Regime Shifts in Empirical Pricing Kernels: A Mixture CAPM∗
Massimo Guidolin†
Manchester Business School and Federal Reserve Bank of St. Louis
August 2009
JEL code: G12, C32.
PRELIMINARY AND INCOMPLETE, PLEASE DO NOT QUOTE WITHOUT PERMISSION
Abstract
We develop a family of Markov switching empirical pricing kernels that mix and nest the standard
conditional CAPM, the downside (semi-variance) conditional CAPM, and richer, four-moment CAPMs
in which coskewness and cokurtosis risks are priced in addition to plain vanilla covariance risk. We
estimate both unconstrained and restricted versions of these kernels, where the restrictions are suggested
by primitive principles concerning preferences and the resulting properties of the intertemporal marginal
rate of substitution, such as non-satiation, global risk aversion, and decreasing absolute risk aversion
over the wealth domain. When a restricted version of the Markov switching pricing kernel that implies
a mixture of the three classical forms of CAPMs is considered, we obtain a pricing performance that is
largely superior to what is provided by a more standard, single-state pricing kernel similar to many popular
implementations of the conditional CAPM. As a result, the alphas characterizing the returns on size-
(SMB), value- (HML), and momentum-sorted portfolios are either small or lack statistical significance,
an indication that a mixture CAPM-based empirical pricing kernel can rationalize the pricing of the cross
section of US stocks in its typical dimensions.
Key words: Capital Asset Pricing Model, Mixture CAPM, Markov Switching, Empirical Pricing Ker-
nel, Asset Pricing Anomaly.
1. Introduction
Is there a fundamental pricing measure (henceforth called the pricing kernel) that can adequately price the
cross-section of US stock returns and at the same time preserve some degree of consistency with elementary
∗This project was supported by INQUIRE UK funding. I would like to thank Kevin Aretz, Mario Cerrato, John Cotter,
Carlo Favero, Marcelo Fernandez, Ana Galvao, Mike Johannes, Frank de Jong (a discussant), Robert Kollmann, Alex Kostakis,
Fulvio Ortu, Peter Pope, Mark Shackleton, Raf Wouters, and seminar participants at Bocconi University, ECARES Brussels,
Lancaster School of Management, University of Manchester (Economics), University of Glasgow, University College Dublin,
Queen Mary University of London, Tilburg University (Finance), Tinbergen Institute Erasmus University Rotterdam, and the
2008 Citigroup Quant conference in Athens for helpful comments and suggestions. All errors remain our own.
†Correspondence to: Prof. Massimo Guidolin, Manchester Business School, Accounting & Finance Division. Phone: +44-
(0)161-306-6406; Fax: +44-(0)161-275-4023. Address: University of Manchester Business School, MBS Crawford House, Booth
Street East, Manchester M13 9PL, United Kingdom. E-mail: [email protected].
economic postulates, such as that free lunches (arbitrage opportunities) are unlikely to be exploitable in low-
frequency (monthly) data and that investors are risk-averse and feature decreasing absolute risk aversion as
their wealth increases? By and large, so far the answer provided by most of the financial economics literature
has been negative. On the one hand, when researchers have postulated a stochastic process for the pricing
kernel which is consistent with micro-founded, rational expectations models, they have normally found that
the resulting kernel can hardly price the cross-section of US stock returns. For instance, it is well known
that simple pricing kernels derived from the process that dynamic general equilibrium models imply for
the intertemporal marginal rate of substitution in consumption generally fail to generate market (excess)
returns with realistic properties, from their mean (the equity premium puzzle) and volatility (the excess
volatility puzzle) to their predictability over alternative horizons (mean reversion puzzles). On the
other hand, when researchers have adopted an empirical approach to the problem, the resulting, estimated
pricing kernels have accurately priced many dimensions of the US cross-section, but — without assuming
rather complicated, often unnatural structures for the kernel (e.g., that the kernel should depend in specific
ways on aggregate labor income) — they have also displayed properties which either open up the door to
arbitrage opportunities or imply that markets be populated by risk-loving individuals (e.g., see Dittmar,
2002).
In this paper, we propose an alternative, empirical route to the problem of finding rational pricing
kernels that can price the cross-section of US stock returns in a number of dimensions used in the earlier
literature, such as the size of listed firms, their book-to-market multiples, or measures of past stock perfor-
mance (momentum). We show that if the parameters that characterize simple versions of the pricing kernel
are governed by a low-dimensional Markov chain that allows us to identify different regimes with different
versions of the basic, textbook Capital Asset Pricing Model (CAPM), the resulting pricing and forecasting
performance is remarkable and the estimated pricing kernel may be restricted to be economically admissible,
i.e., consistent with fundamental economic postulates. The functional form of the pricing kernel is “simple”
in the sense that we only postulate specifications of the kernel that reduce to special cases of the classical
CAPM, such as the plain vanilla CAPM (i.e., in which beta merely depends on conditional covariance risk),
the downside CAPM (henceforth dCAPM, when also downside covariance risk is priced), and a four-moment
CAPM (henceforth 4MOM, when also coskewness and cokurtosis risks are priced). The pricing and forecasting
performance we demonstrate in this paper is “remarkable” in the sense that it is competitive
both with current empirical benchmarks in the literature (the three-factor Fama and French (1993) model
augmented by a momentum factor as in Carhart (1997)) and with unrestricted, Markov switching CAPMs
in which higher-order (co-)moments are priced in a no-arbitrage framework. Because the empirical pricing
kernels (henceforth, EPKs) we estimate are based on the intuition that over time the markets may switch
among different versions of some form of CAPM as determined by a homogeneous but irreducible first-order
Markov chain, we define the class of pricing kernels described in our paper as mixture CAPMs.
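To fix ideas, the three kernels that the mixture nests can be sketched as low-order functions of the market (aggregate wealth) return. The notation below is an assumption made purely for illustration: the symbols m, R_m, S, a, b, b^-, c, d and the threshold tau do not appear in the text above, the downside term is only one possible way of writing a kernel that prices downside covariance risk, and the parameterization adopted later in the paper may differ. The sign pattern shown is the one commonly associated in the polynomial-kernel literature (e.g., Dittmar, 2002) with non-satiation, risk aversion, and decreasing absolute risk aversion.

```latex
% Hypothetical notation: S_{t+1} is the latent Markov state, R_{m,t+1} the market return,
% R_{i,t+1} the gross return on asset i, and \tau a downside threshold (e.g., the market mean).
\begin{align*}
\text{CAPM:}\quad  m_{t+1} &= a_{S_{t+1}} + b_{S_{t+1}} R_{m,t+1}, \\
\text{dCAPM:}\quad m_{t+1} &= a_{S_{t+1}} + b_{S_{t+1}} R_{m,t+1}
                              + b^{-}_{S_{t+1}} R_{m,t+1}\,\mathbf{1}\{R_{m,t+1} < \tau\}, \\
\text{4MOM:}\quad  m_{t+1} &= a_{S_{t+1}} + b_{S_{t+1}} R_{m,t+1}
                              + c_{S_{t+1}} R_{m,t+1}^{2} + d_{S_{t+1}} R_{m,t+1}^{3}, \\
\text{pricing:}\quad & E_t\!\left[ m_{t+1} R_{i,t+1} \right] = 1, \qquad
  m_{t+1} > 0 \ \text{(non-satiation)}, \quad b < 0 \ \text{(risk aversion)}, \quad
  c > 0 \ \text{(decreasing absolute risk aversion)}.
\end{align*}
```

In this sketch the linear term prices covariance risk, the indicator term prices downside covariance risk, and the squared and cubed terms price coskewness and cokurtosis risk, respectively.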
The empirical work undertaken in this paper features three key ingredients: the mixture of alternative
CAPM-type models; the Markov switching (henceforth, MS) component that drives the mixture; the fact
that economic constraints on the coefficients and shape of the pricing kernel are imposed. We use this
Introduction to explain the motivation for each of these ingredients and to connect our efforts to the existing
literature. Our motivation for developing and estimating a rational asset pricing framework that mixes
different “strands” of CAPMs is that a growing literature has shown that rather intuitive modifications
of the classical, plain vanilla unconditional CAPM may lead to accurate — albeit, it must be recognized,
unstable — pricing performance. The dCAPM may include either a specific price of risk for the covariance
of portfolio returns with the market, conditioning on the latter falling below some threshold (most typically,
its mean) or feature asymmetric reactions to downside and upside markets separately.1 It was developed by
Bawa and Lindenberg (1977) and Harlow and Rao (1989). Chan (1988), De Bondt and Thaler (1986), and
Petkova and Zhang (2005) have used it to investigate the value premium but have obtained mixed results.
Recently, Post and Vliet (2006) have used this model — and applied the non-parametric stochastic dominance
tests proposed by Post (2003) — to show that downside risk helps to explain the high returns earned by small
caps, value stocks and recent winner stocks, i.e., the same benchmark stock portfolios describing the US cross
section that we use in this paper. Post and Vliet (2005) also use parametric tests to find indications that
conditional downside risk drives asset prices: their mean-semivariance CAPM outperforms the traditional
mean-variance CAPM, both in unconditional and conditional tests. The low (high) beta stocks involve
more (less) systematic downside risk than expected based on their regular betas. This pattern is especially
pronounced during bad states of the world, when the market risk premium is high. They conclude that
conditional downside risk completely explains average returns within the size deciles, it is not related to
distress risk but can only partially explain the momentum effect. Similarly, Ang, Chen, and Xing (2006)
claim that downside beta is a priced risk attribute because stocks that have high covariation with the market
when the market declines exhibit high average returns over the same period.
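The distinction between a regular beta and a downside beta can be illustrated with a minimal sketch. It assumes one common definition, in which the downside beta is the ratio of covariance to variance computed only over periods when the market return falls below its sample mean; the function names and the toy data are hypothetical and are not taken from any of the papers cited above.

```python
import numpy as np

def regular_beta(asset, market):
    """Covariance beta: cov(r_i, r_m) / var(r_m) over the full sample."""
    a = np.asarray(asset, dtype=float)
    m = np.asarray(market, dtype=float)
    return np.mean((a - a.mean()) * (m - m.mean())) / np.mean((m - m.mean()) ** 2)

def downside_beta(asset, market, threshold=None):
    """Beta computed only over periods in which the market is below `threshold`.

    The default threshold is the sample mean of the market return (an assumption).
    """
    a = np.asarray(asset, dtype=float)
    m = np.asarray(market, dtype=float)
    if threshold is None:
        threshold = m.mean()
    down = m < threshold
    if down.sum() < 2:
        raise ValueError("need at least two downside observations")
    a_d, m_d = a[down], m[down]
    return np.mean((a_d - a_d.mean()) * (m_d - m_d.mean())) / np.mean((m_d - m_d.mean()) ** 2)

# Toy data: the asset moves 1.5-for-1 with the market in down months
# and 0.5-for-1 in up months (purely illustrative numbers).
market = np.array([-0.04, -0.02, -0.01, 0.01, 0.02, 0.04])
asset = np.where(market < 0, 1.5 * market, 0.5 * market)
beta_all = regular_beta(asset, market)
beta_down = downside_beta(asset, market)
```

With these symmetric toy returns the regular beta equals 1 while the downside beta equals 1.5, which is the sense in which a stock can carry more systematic downside risk than its regular beta suggests.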
The 4MOM CAPM introduced by Kraus and Litzenberger (1976) and Harvey and Siddique (2000)
prices higher-order moments. Higher (conditional) moments capture the presence of asymmetries and/or
fat tails in the distribution of asset returns, but are not necessarily the same as upside and downside betas.2
There are several different approaches to including higher moments in CAPM-style frameworks, for example
Friend and Westerfield (1980), Sears and Wei (1985), and Barone-Adesi (1985). Dittmar (2002) finds that
both a quadratic and a cubic pricing kernel are admissible for the cross section of industry portfolios,
whereas the linear single-factor (CAPM) and linear multifactor (Fama-French) pricing kernels are not.
Guidolin and Timmermann (2008a) use a 4MOM international CAPM to model the time series dynamics
of several international equity indices and report that its adoption drastically changes the optimal asset
allocation implications of the classical, mean-variance based CAPM. Although the superior performance
of nonlinear pricing kernels vs. linear pricing kernels has been documented in the literature (e.g., see
1As early as Roy (1952), economists have recognized that investors care differently about downside losses than they care about
upside gains. Markowitz (1959) advocates using semivariance as a measure of risk, rather than variance, because semivariance
measures downside losses rather than upside gains. More recently, the behavioral framework of Kahneman and Tversky’s (1979)
loss aversion preferences and the axiomatic approach taken by Gul’s (1991) disappointment aversion preferences allow agents
to place greater weights on losses relative to gains in their utility functions.
2It is important to recognize that downside covariance risk is different from coskewness risk because downside betas explicitly
condition for market downside movements in a nonlinear fashion, whereas the coskewness statistic does not explicitly emphasize
asymmetries across down- and up-markets, even in settings where coskewness may vary over time. Ang, Chen, and Xing (2006)
control for coskewness risk in assessing the premium for downside beta. Downside beta risk and coskewness risk are found
to be empirically different. The high returns to high downside beta stocks are robust to controlling for coskewness risk and
vice versa. Downside beta risk is strongest for stocks with low coskewness. As a result, in this paper we also experiment with
pricing kernels that admit a differentiation between downside covariance risk, and higher comoment risk (i.e., coskewness and
cokurtosis risk).
Bansal and Viswanathan, 1993, Chapman, 1997), the superiority of these kernels to a flexible multifactor
model, such as the Fama and French (1993) model, has not. This result is particularly interesting because the
nonlinear pricing kernel that we investigate is subject to economic restrictions that cannot affect a standard,
Fama-French type multifactor pricing kernel as many of its factors fail to depend on the wealth process.
Our motivation for modelling the time-variation in the EPK as driven by an unobservable Markov chain is
that the recent empirical asset pricing literature has given growing attention to the fact that the performance
(or lack thereof) of simple, CAPM-style models appears to be unstable over time. Equivalently, while it
has been long recognized that the CAPM and its simple variants easily generate large and statistically
significant “alphas” (i.e., abnormal returns caused by large components of realized excess returns that are
not explained by the risk factors featured by the selected framework), there is now an increasing awareness
that such alphas may be strongly time-varying and difficult to forecast. For instance, Post and Vliet
(2005) find that the role of downside risk is most pronounced in their first subsample (1931-1966) and in
their bad-state subsample (defined by periods of above-average dividend yield). In Post and Vliet’s more
recent subsample (1967-2002) the unconditional mean-variance and unconditional mean-semivariance (i.e.,
dCAPM) models show instead similar performances. Ang and Chen (2007) have stressed that standard
CAPM betas are strongly time-varying, to the point that OLS inferences on alphas and betas might be so
badly misspecified as to become unusable for assessing the fit of a conditional CAPM.3 While some of the recent
literature has predominantly focused on identifying the sub-periods over which the performance of simple,
conditional (or even unconditional) CAPM may be considered “acceptable”, and has often offered some
economic intuition for why this may be the case, in this paper we elect to base our research design on the
longest possible sample period for estimation and forecasting purposes. A MS framework is employed to
capture the unstable time series dynamics of the priced risk factors — or better, to endogenously characterize
such dynamics as deriving from the presence of regime switching patterns in the cross section of US equity
returns. As a result, while other papers have argued that the period 1927-1962 contains a few unique events
that could adversely affect econometric estimation during the pre-1963 period (e.g., the Great Depression
and WWII) because these may identify structural breaks, in our empirical strategy it becomes important
to effectively use these events to identify turning points and regime switches in the way the cross-section is
priced.
Our reduced-form conditional asset pricing model that unifies the standard CAPM, a downside risk
framework, and a 4MOM CAPM falls within the class of conditional CAPM models initially developed by
Harvey (1989), Shanken (1990), Ferson and Harvey (1991, 1993, 1999), Cochrane (1996), and Jagannathan
and Wang (1996). Most of these studies use instrumental variables to model the time-variation in the betas
as a linear function of the instruments. Our risk factors (comparable to time-varying betas) and their
unit prices (comparable to conditional market risk premia) are also time-varying, but instead of relying on
3Ang and Chen (2007) prove that when betas vary over time and are correlated with time-varying market risk premia, OLS
alphas and betas provide inconsistent estimates of conditional alphas and conditional betas, respectively. They also show that
the magnitude of the inconsistency in the unconditional OLS alpha, relative to the true conditional alpha, cannot be determined
without direct estimates of the underlying time-varying conditional beta process. The common practice of employing rolling
OLS estimates of betas understates the variance of the true conditional betas. The limiting distribution of the OLS alpha is
also distorted from the standard asymptotic distribution that assumes constant betas. Consequently, a large unconditional OLS
alpha may not necessarily imply the rejection of a conditional CAPM.
instrumental variables, we parameterize the unit risk prices themselves as driven by an endogenous latent
process. This has the advantage of not relying on exogenous predictor variables to capture the time-variation
in the betas and avoids any potential omitted variable bias from misspecifying the set of predictor variables
(see Harvey, 2001, and Brandt and Kang, 2004). However, a standard set of predictors which summarize
and forecast business cycle conditions — here the dividend yield, the riskless term spread, and the default
spread from the US corporate bond market — is collected in a vector of instruments which are allowed to
affect the market risk premium as well as the definition of the current Markov state.
Finally, two additional (and somewhat related) choices that inform our research design deserve comment.
First, we perform the estimation of our competing EPKs (with and without MS and mixture CAPMs) on a
small but rather significant set of portfolios that are supposed to capture in a parsimonious way the main
features of the US cross section, i.e., we employ the longest available data series on value-weighted market
(excess) returns, and on long-short portfolios that capture size, value, and momentum anomalies.4 As for
size, it is clearly debatable whether one may still speak of an asset pricing anomaly (i.e., the fact that on
average small stocks give higher returns than large stocks).5 Yet, it is the size-sorted long-short portfolio that
in a sense should allow our MS-driven, mixture approach to express its power: Ang, Chen, and Xing (2006)
have recently documented a strongly time-varying size premium that in many sub-samples is incompatible
with the plain vanilla CAPM. It seems crucial that EPKs such as ours — that are built to offer significant
flexibility — be used to capture the dynamics in the size premium.
The value anomaly consists of the empirical regularity by which value (high book-to-market ratio) stocks
would give on average higher returns than growth (low book-to-market ratio) stocks. As stressed by Zhang
(2005), the existence of a value premium is puzzling not only because modern finance implies that expected
returns ought to purely reward systematic risk-taking, but also because this runs contrary to basic economics
wisdom that growth options hinge upon future economic conditions and must therefore be riskier than assets
in place, which characterize instead value firms. There is rich evidence in the literature of time variations
in the value anomaly. For instance, Ang and Chen (2007) report evidence that a conditional CAPM may
explain the value premium over the period 1926-1963. During the 1964-2004 period, value stocks have lower
betas than growth stocks — the reverse of what the CAPM needs to explain the value premium. As a result,
the CAPM fails the tests for 1963 to 2004, whether or not one allows for time-varying betas. During 1926
to 1963, however, value stocks have higher betas than growth stocks.
The momentum anomaly consists of the regularity by which past “winner” (high performance) stocks
4Multifactor models of asset prices have been more successful in pricing the cross section of equities than have single factor models,
i.e., they are by construction consistent with the anomalies. Fama and French (1993) have suggested that the returns to the
portfolios SMB and HML represent hedge portfolios in the sense of Merton (1973). In later work (Fama and French, 1995,
1996), they suggest that the size and book-to-market factors may capture some systematic distress factor. Carhart (1997)
has extended Fama and French’s three-factor model to include a momentum factor. However, multifactor models provide the
researcher with considerable freedom since the models give little guidance for the choice of factors. In contrast, the pricing
kernel in this paper explicitly defines the relevant factor for pricing, the portfolio of aggregate wealth. Further, preference theory
imposes restrictions on the signs of the coefficients associated to each term in the pricing kernel.
5Fama and French (2006) have recently confirmed earlier evidence (see Chan and Chen, 1988, Fama and French, 1996) that
the size premium in average returns is consistent with CAPM pricing. The market beta for SMB (see Section 4 for a definition)
for the period 1926-1963, 0.19, is close to that for the period 1963-2004, 0.21. In their CAPM regression for SMB for 1926-2004,
the intercept is 0.10% (t-stat is 0.92). This means that although there exists a precisely estimated size premium for 1926 to
2004, about half of it is absorbed by SMB’s market covariance beta.
would systematically outperform past “loser” (poorly performing) stocks, i.e., there would exist a substantial
degree of persistence in realized stock returns that is very hard to rationalize using standard asset pricing
frameworks. Momentum has revealed itself to be particularly difficult to explain away: for instance, Griffin,
Ji, and Martin (2003) fail to find a strong association between international business cycles and momentum
effects. Post and Vliet (2005) conclude that their conditional downside risk model completely explains
average returns within the size deciles but that their conditional framework fails to explain the momentum
effect, although downside risk and conditioning lead to substantial improvements in pricing accuracy.
The second choice that characterizes our research design is the extensive use of comparative out-of-sample
forecast performance measures as a way to validate the quality of our estimated EPK and to minimize the
damaging effects of potential over-fitting and data-snooping. Over-fitting is clearly an issue when large
parametric models are employed, which is one of the distinctive features of MS models. Data-snooping may
pose critical problems in our research design because the very features of the EPK framework we build and
estimate are largely suggested by the existing literature and by extensive recognition of the data properties
that we undertake in Section 4. The use of predictions to assess the quality and usefulness of EPKs is
applied to out-of-sample experiments in two different ways. Recursive, pseudo out-of-sample (henceforth,
OOS) exercises are used to check whether recursively updated estimates of competing EPKs may accurately
price the same benchmark portfolios used in estimation 1- and 12-month ahead. These exercises have a
pseudo OOS nature in the sense that the recursively updated specification of each EPK is the one decided
on the basis of the full sample of data, i.e., it suffers by construction from data-snooping issues even though
it provides protection against the perils of over-fitting. Genuine OOS experiments are instead performed
by pricing portfolios different from the benchmark ones — here, industry-sorted and size and value double
sorted portfolios — used in estimation. Because none of this information has been used in setting up our
econometric models, such prediction experiments are likely to provide protection against both over-fitting
and data-snooping issues.6
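The recursive pseudo-OOS design described above can be sketched as an expanding-window loop. This is only an illustration under stated assumptions: the function name `recursive_rmspe`, its parameters, and the use of the historical mean as the forecasting model are hypothetical stand-ins, since the EPK-implied forecasts are not specified at this point in the text; `horizon` is interpreted here as the number of periods between the last observation used in estimation and the single-period return being predicted.

```python
import numpy as np

def recursive_rmspe(returns, horizon=1, first_window=120):
    """Root-mean-squared prediction error from expanding-window forecasts.

    At each date t >= first_window the placeholder model is re-estimated on
    returns[:t] only, and its forecast is compared with the return realized
    `horizon` periods after the last observation used in estimation.
    """
    r = np.asarray(returns, dtype=float)
    if len(r) < first_window + horizon:
        raise ValueError("sample too short for the requested window and horizon")
    errors = []
    for t in range(first_window, len(r) - horizon + 1):
        forecast = r[:t].mean()        # placeholder model: historical mean (assumption)
        realized = r[t + horizon - 1]  # return realized `horizon` steps ahead
        errors.append(realized - forecast)
    errors = np.array(errors)
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy usage on a short deterministic series (illustrative only).
toy = np.arange(10.0)
rmspe_1 = recursive_rmspe(toy, horizon=1, first_window=5)
rmspe_2 = recursive_rmspe(toy, horizon=2, first_window=5)
```

Because each forecast uses only data available at the forecast date, the loop guards against over-fitting; it does not guard against data-snooping when the model specification itself was chosen on the full sample, which is the distinction the text draws between pseudo and genuine OOS experiments.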
Our main results are as follows. First, single-state models in which the standard macro-finance predictors
used in the literature affect the time variation of the unit prices of risk provide a poor performance. Even
though our smooth, single-state model allows already non-negligible degrees of flexibility (and nonlinearity)
in the EPK, the alphas for HML and MOMO are systematically statistically significant, with annualized
abnormal rates of return between -0.9 and 6.8 percent in the case of HML, and 1.3 and 17 percent per
year in the case of MOMO. Occasionally, even the SMB alpha turns positive and statistically significant,
an indication that a smooth macro-driven model may even generate a size anomaly, contrary to common
wisdom in the literature. Additionally, the evidence that the two risk factors in a simple single-state model
— covariance and downside covariance risk — may be priced in the cross-section is weak. Both in terms of
root-mean-squared pricing errors (RMSPE) and of (pseudo-) R2s the single-state model yields a mediocre
performance. For instance, all R2s are between 0 and 2.1 percent, and the performance tends to be worse
than what can be attained using a plain-vanilla unconditional CAPM. We obtain appreciable improvements
6Our focus on the predictive performance of alternative EPKs is not completely new in the asset pricing literature. Ang, Chen,
and Xing (2006) have checked whether past downside betas predict future expected returns. They find that, for the majority
of their cross section, high past downside beta predicts high future returns over the next month. In a MS framework related to
this paper, Guidolin and Timmermann (2008b) have documented that their MS models accurately predict the dynamics of the
joint density of long-short portfolios similar to ours.
of the asset pricing implications of the estimated model under a 4MOM CAPM framework in which the
dynamics in conditional higher order moments is driven by a two-regime MS structure. The first regime is a
persistent bull state in which volatilities are low. The second regime is a less persistent bear regime in which
volatilities are high. The evidence of average abnormal returns across the two regimes disappears almost
completely when the first set of constraints is imposed, i.e., arbitrage opportunities are ruled out, the signs
of the risk premia are constrained to agree with the implications of decreasing absolute risk aversion, and the
estimated pricing kernel implies on average the mean gross short-term rate observed over our sample period.
Even though the bear regime SMB alpha remains rather large in economic terms (4.99%) and commands a
p-value of approximately 0.05, all the remaining alpha coefficients fail to be significant at standard levels.
Importantly, this evidence of non-zero alphas weakening when Markov regimes are modelled in unit risk
prices and the relevant conditional co-moments projected out of the resulting MS model is not a feature we
have assumed a priori; equivalently, the data may have revealed larger, not smaller Markov switching alphas.
When the full set of constraints is imposed, we only find marginal statistical significance for the MOMO’s
alpha in regime 1, while the SMB’s alpha for regime 2 greatly declines and now fails to be significant.
Even though the statistical evidence in favor of a MS structure is strong and the results concerning the
average abnormal returns in the two states are encouraging, some uncertainty remains as to which factors
are priced in the MS 4MOM CAPM. For instance, while in the bear regime only coskewness seems to be
priced, in the bear state there is evidence that all conditional co-moments are priced. Additionally, when
the two-regime 4MOM EPK is constrained to be compatible with standard properties of the intertemporal
marginal rate of substitution for a risk-averse investor, the improvement in pricing performance is uniformly
visible for only one portfolio out of four (the momentum-sorted one). We therefore proceed to estimate a
three-state model in which the nature of the regimes is not left to the data, because the three states are
pre-specified to differ in the way assets are priced, i.e., we further estimate a mixture CAPM in which at all
points in time one and only one asset pricing framework may be generating the observed asset returns. When
the complete set of restrictions is imposed on the EPK, we find that all the alphas stop being statistically
significant, including MOMO’s alpha in the CAPM regime. Interestingly, the alphas remain rather large
in absolute value in correspondence to the third, 4MOM CAPM regime (the alphas range from -2.1%
to 2.8% per month), but they command such high standard errors that the associated p-values are all very
high. Economically, this means that even though the modeled risk factors fail to lead the average abnormal
returns to zero in a mathematical sense, they do so in a statistical sense as the variation in the sample returns
associated to the third state is sufficiently large to drive the classical 95% confidence intervals around the
estimated alphas to include a zero abnormal return. This finding has key economic implications because in
the absence of a rational explanation, the conclusion of Lakonishok, Shleifer, and Vishny (1994) that the
asset pricing anomalies (in particular, value and momentum premia) would be caused by overreaction-fueled
irrational mispricing will apply. On the contrary, the ability to isolate one EPK that prices the US cross-section
without generating large and statistically significant abnormal returns (alphas) is consistent with
rational explanations.
The estimated mixture CAPM generates estimates of the risk premia which are rather sensible, validating
the a priori identification of the three statistical regimes with distinct asset pricing states. Moreover, the
regimes isolated by standard MS filtering techniques appear to make economic sense: The US stock market
has spent a large proportion of the period 1927-2007 in a plain vanilla CAPM regime. Our estimates reveal
that when constraints are imposed, the CAPM regime has a duration of approximately 14 months and
characterizes 44.1% of our sample. Long spans of data are consistent with a CAPM state in which only
covariance risk is priced, such as most of the 1940s, 1950s and 1960s, the period 1991-1996, and more
recently most of the bull markets of the 2003-2006 period. It seems plausible that extended periods of time
be the expression of simple frameworks in which only covariance risk is priced. Additionally, the CAPM regime
typically features high mean excess market returns and high returns on all stock portfolios, accompanied by
moderate volatility. On the contrary, the dCAPM regime is scarcely persistent (its average duration is
approximately 7 months) but because of the structure of the estimated transition matrix, it also occurs
with a remarkable frequency, characterizing 39.4% of our sample. Typical periods in which US stock returns
appear to have been generated by the dCAPM are 1929-1930, 1935-1938, 1941-1945, the late 1960s and
late 1970s, 1998-1999, and more recently the 2002-2003 bear markets. Interestingly, from mid-2007, in
correspondence with a period of financial turmoil, the US equity markets switch from the CAPM to the
dCAPM regime, with an unsettling similarity to the onset of the Great Depression in 1929. Because
many of these periods correspond to bear and volatile markets, the dCAPM regime generates a modest
annualized market risk premium of 3.1% with a volatility of 21.6%. Finally, the 4MOM asset pricing regime
occurs rather infrequently but — when this happens — the state is considerably persistent, with an average
duration of 9 months. However, only 16.6% of our sample is generated by the 4MOM state. The 4MOM
regime is completely characterized by the presence of high volatility. The pricing performance of the three-
state mixture CAPM is attractive: despite the restrictions that define the mixture model, the additional
flexibility offered by a three-state specification offers high payoffs that are especially visible in the absence
of restrictions: all the pseudo-R2 increase when moving from the two-state to the three-state models and
the same holds for the RMSPE which declines. In fact, the mixture CAPM exceeds the R2 levels typical of
a simple, unconditional CAPM and substantially cuts its RMSPE. When the complete set of restrictions is
imposed, a similar result obtains, with the pseudo-R2 now in the range 3.5-6.1%.
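The link between regime persistence, average duration, and long-run regime frequency can be sketched with a small first-order Markov chain. The diagonal of the transition matrix below is backed out from the average durations quoted above (14, 7, and 9 months) using the standard relation duration = 1 / (1 - p_ii); the way each regime's exit probability is split across the other two regimes is a purely hypothetical assumption, so the resulting ergodic probabilities are illustrative and are not expected to reproduce the sample shares reported in the text.

```python
import numpy as np

def expected_durations(P):
    """Expected regime durations 1 / (1 - p_ii) for a transition matrix P."""
    P = np.asarray(P, dtype=float)
    return 1.0 / (1.0 - np.diag(P))

def ergodic_probabilities(P):
    """Stationary distribution pi solving pi @ P = pi with sum(pi) = 1."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    # Stack the stationarity equations with the adding-up constraint.
    A = np.vstack([P.T - np.eye(k), np.ones((1, k))])
    b = np.concatenate([np.zeros(k), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Regime order: CAPM, dCAPM, 4MOM.
# Staying probabilities implied by average durations of 14, 7 and 9 months.
stay = 1.0 - 1.0 / np.array([14.0, 7.0, 9.0])
# Hypothetical split of each regime's exit probability across the other regimes.
split = np.array([[0.0, 0.8, 0.2],
                  [0.9, 0.0, 0.1],
                  [0.5, 0.5, 0.0]])
P = np.diag(stay) + (1.0 - stay)[:, None] * split
durations = expected_durations(P)
pi = ergodic_probabilities(P)
```

The sketch shows why a regime with a short average duration can still account for a large share of the sample: its long-run frequency depends not only on how long each visit lasts but also on how often the other regimes transition into it.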
Three related papers that deserve discussion as they may help us clarify our contribution are Ang and
Chen (2007), Guidolin and Timmermann (2008a), and Petkova and Zhang (2005). Ang and Chen (2007)
propose and estimate a conditional CAPM with time-varying betas, time-varying market risk premia, and
stochastic systematic volatility. They directly take into account the time-variation of conditional betas in
estimating conditional alphas, rather than relying on incorrect OLS inference. In particular, they treat
betas as endogenous variables that vary slowly and continuously over time and directly estimate them
using Kalman filter techniques in a Bayesian setup. By contrast, our (frequentist) MS approach is
closer in spirit to previous estimates of time-varying betas by Campbell and Vuolteenaho (2004), Fama
and French (2006), and Lewellen and Nagel (2006), among others, who assume discrete changes in betas
across subsamples but constant betas within subsamples. However, as in Ang and Chen (2007), we capture
predictable time-variation in both the conditional mean and the conditional volatility of the market excess
return. We model time-varying market premia by using a latent state variable for the conditional mean of
the excess market return. Moreover, while Ang and Chen estimate separate models for each of the long-short
portfolios that capture the essence of the asset pricing anomalies, we estimate a joint model for all portfolios.
Petkova and Zhang (2005) find that a conditional, time-varying CAPM goes in the right direction when
it comes to rationally explaining the value puzzle; in particular, value betas co-vary positively with the expected
market risk premium while growth betas display opposite behavior. As a result, the beta of HML positively
co-varies with the expected market risk premium. However, the magnitudes of these covariances are too
small to fully account for the return differentials between value and growth stocks. As a result, the estimated
alphas of HML from conditional market regressions mostly remain positive and significant. However, this
correlation is only estimated indirectly, through instrumental proxies for conditional betas and market risk
premia. By contrast, in our paper we propose a family of “structural” (in the sense that the priced
conditional comoments are generated endogenously from the presence of MS) econometric models that
generate covariation between expected market risk premia (as well as market volatility and asymmetries)
and betas (including the 4MOM betas) which is therefore estimated in a completely consistent fashion.
This evokes the main features of the 4MOM international CAPM proposed by Guidolin and Timmermann
(2008a), who also use a MS framework and an EPK derived by approximation to a standard intertemporal
marginal rate of substitution to price a number of international equity portfolio indices. However, their
focus is on the optimal portfolio diversification implications of their framework and not on the attempt to
specify an asset pricing framework that “turns off” all abnormal returns. In fact, the size and dynamics of
the alphas in Guidolin and Timmermann (2008a) are compatible with the possibility that important risk factors may
have been omitted. Finally, Guidolin and Timmermann do not explore the possibility that — by imposing
restrictions on a multi-state version of their simpler two-state model — their regimes may receive a clear,
mixture-like CAPM interpretation.
The paper is structured in the following way. Section 2 develops our EPKs exploiting an extended
(to include semi-variance components) Taylor expansion of the intertemporal marginal rate of substitution.
Section 3 turns the EPK models developed in Section 2 into an estimable econometric framework. The
notion of mixture CAPM is introduced and possible restrictions on the EPK deriving from simple economic
principles are discussed. Section 4 describes the data and explains in what sense the asset pricing “properties”
(including the related anomalies) of size-, value-, and momentum-sorted portfolios are unstable over time.
Section 5 reports estimates of the models introduced in Section 3 and shows that a mixture CAPM may
explain away the major asset pricing anomalies that have appeared in the literature. Section 6 concludes.
2. Empirical Pricing Kernel Models
Since the seminal work by Harrison and Kreps (1979), we know that, under some regularity conditions, there
exists a random variable $M_{t+1}$, the pricing kernel, that prices all risky payoffs under the law of one
price and is nonnegative under the condition of no arbitrage. Moreover, the assumption of the
existence of a representative agent allows (at least in static settings) the pricing kernel to be expressed as a
function of aggregate consumption or, equivalently, gross returns on the aggregate wealth portfolio (see e.g.,
Brown and Gibbons, 1985). Similarly to Dittmar (2002) and Guidolin and Timmermann (2008a), we assume
that the pricing kernel may be approximated through a time-varying, third-order Taylor series expansion of
the marginal utility of gross returns on the wealth portfolio (around a zero return on the wealth portfolio,
$R^{W}_{t+1}$, i.e., around the initial wealth level $W_t$):
\[
M_{t+1} = \frac{U'(W_{t+1})}{U'(W_t)} + \frac{U''(W_{t+1})}{U'(W_t)} R^{W}_{t+1}
+ \frac{U'''(W_{t+1})}{U'(W_t)} (R^{W}_{t+1})^{2}
+ \frac{U''''(W_{t+1})}{U'(W_t)} (R^{W}_{t+1})^{3} + o\big((R^{W}_{t+1})^{4}\big).
\]
Under the additional assumption that at least the zeroth- and first-order terms of this expansion may
correspond to a different intertemporal marginal rate of substitution (as in Post and Vliet, 2005) depending on
whether or not the returns on the wealth portfolio exceed some time-varying threshold level $R^{W}_{B,t+1}$, we can
write the resulting, approximate pricing kernel as:
\[
M_{t+1} \simeq g_{0,t} + g_{1,t} R^{W}_{t+1} + g'_{1,t} \min\{R^{W}_{t+1}, R^{W}_{B,t+1}\}
+ g_{2,t} (R^{W}_{t+1})^{2} + g_{3,t} (R^{W}_{t+1})^{3}, \tag{1}
\]
where $g_{j,t} = U^{(j+1)}_t / U^{(1)}_t$ is the ratio of derivatives of the utility function ($U^{(1)} \equiv U'$ is the
first derivative, etc.) evaluated at current wealth. $R^{W}_{B,t}$ is a threshold/benchmark level for gross returns
on aggregate wealth computed on the basis of information at time $t$, $\mathcal{F}_t$, for instance
$R^{W}_{B,t} = E[R^{W}_{t+1}|\mathcal{F}_t]$, the conditional mean return on the aggregate wealth portfolio. Of course,
$R^{W}_{B,t} = R^{f}_t$ (the conditional, gross riskless rate) and $R^{W}_{B,t} = 0$ also represent two natural benchmarks.
Assuming positive marginal utility ($U' > 0$), strict risk aversion ($U'' < 0$), decreasing absolute risk
aversion ($U''' > 0$), and decreasing absolute prudence ($U'''' < 0$, as in Kimball, 1993), it follows that
$g_1 < 0$, $g'_1 < 0$, $g_2 > 0$ and $g_3 < 0$.$^{7}$ Negative exponential utility satisfies such restrictions and the
same applies to constant relative risk aversion preferences. Caballe and Pomansky (1996) show that all HARA
utility functions display standard risk aversion properties (i.e., $U' > 0$, $U'' < 0$, $U''' > 0$, and $U'''' < 0$),
which is consistent with a cubic pricing kernel.$^{8}$ The $\min\{R^{W}_{t+1}, R^{W}_{B,t+1}\}$ term makes the pricing
kernel a function of whether or not the gross returns on the wealth portfolio exceed the benchmark level
$R^{W}_{B,t+1}$, i.e.,
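\[
M_{t+1} =
\begin{cases}
g_0 + g'_1 R^{W}_{B,t+1} & R^{W}_{t+1} > R^{W}_{B,t+1} \\
g_0 & R^{W}_{t+1} \le R^{W}_{B,t+1}
\end{cases}
+
\begin{cases}
g_1 R^{W}_{t+1} & R^{W}_{t+1} > R^{W}_{B,t+1} \\
(g_1 + g'_1) R^{W}_{t+1} & R^{W}_{t+1} \le R^{W}_{B,t+1}
\end{cases}
+ g_2 (R^{W}_{t+1})^{2} + g_3 (R^{W}_{t+1})^{3}. \tag{2}
\]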
This means that in both cases the pricing kernel is a cubic function of the gross returns on the wealth
portfolio, although with different coefficients according to whether $R^{W}_{t+1} \le R^{W}_{B,t+1}$ or
$R^{W}_{t+1} > R^{W}_{B,t+1}$.$^{9}$
Ang et al. (2006) stress that for the types of utility functions that generate terms like
$\min\{R^{W}_{t+1}, R^{W}_{B,t}\}$ in the expression for the pricing kernel (such as Gul's (1991) disappointment
aversion (DA) function, which generalizes power utility to the presence of endogenously determined “kinks”),
the representations commonly employed in empirical finance, such as the downside beta CAPM a la Bawa and
Lindenberg (1977), terms like $\min\{R^{W}_{t+1}, R^{W}_{B,t}\}$, and higher-order Taylor expansions a la Dittmar
(2002), can only be approximations, because the underlying utility functions usually do not admit an explicit
form and Taylor expansions are simply approximations of non-smooth functions.

$^{7}$ Unless marginal utility of wealth is constant, in our Taylor expansion the coefficients $g_{j,t}$ will be
time-varying. However, to simplify the notation we drop the time index on the utility function.

$^{8}$ Scott and Horvath (1980) have shown that a strictly risk-averse individual who always prefers more to less
and consistently (i.e. for all wealth levels) likes skewness will necessarily dislike kurtosis, so that
$U'''' < 0$ may also be derived as an implication of $U' > 0$, $U'' < 0$, $U''' > 0$, when these conditions apply to
all wealth levels.

$^{9}$ In principle, the pricing kernel could also be generalized to include two additional terms such as
$\min\{(R^{W}_{t+1})^{2}, R^{W}_{2B,t+1}\}$ and $\min\{(R^{W}_{t+1})^{3}, R^{W}_{3B,t+1}\}$, where $R^{W}_{2B,t}$ and
$R^{W}_{3B,t}$ are thresholds that apply to squared and cubic values of gross returns on the wealth portfolio.
Clearly, this creates a highly flexible but also complex pricing kernel in which there are 8 ($= 2^{3}$) different
“branches” formed by all the possible combinations of the events $R^{W}_{t+1} > R^{W}_{B,t+1}$,
$R^{W}_{t+1} \le R^{W}_{B,t+1}$, $(R^{W}_{t+1})^{2} > R^{W}_{2B,t+1}$, $(R^{W}_{t+1})^{2} \le R^{W}_{2B,t+1}$, and
$(R^{W}_{t+1})^{3} > R^{W}_{3B,t+1}$, $(R^{W}_{t+1})^{3} \le R^{W}_{3B,t+1}$. We have experimented with a version of our
unconstrained asset pricing model which included the term $\min\{(R^{W}_{t+1})^{2}, R^{W}_{2B,t+1}\}$, so that the
pricing kernel has structure:
\[
M_{t+1} =
\begin{cases}
g_{0,t+1} + (g_{1,t+1} + g'_{1,t+1}) R^{W}_{t+1} + (g_{2,t+1} + g'_{2,t+1}) (R^{W}_{t+1})^{2} + g_{3,t+1} (R^{W}_{t+1})^{3}
& R^{W}_{t+1} \le R^{W}_{B,t+1},\ (R^{W}_{t+1})^{2} \le R^{W}_{2B,t+1} \\
(g_{0,t+1} + g'_{1,t+1} R^{W}_{B,t+1}) + g_{1,t+1} R^{W}_{t+1} + (g_{2,t+1} + g'_{2,t+1}) (R^{W}_{t+1})^{2} + g_{3,t+1} (R^{W}_{t+1})^{3}
& R^{W}_{t+1} > R^{W}_{B,t+1},\ (R^{W}_{t+1})^{2} \le R^{W}_{2B,t+1} \\
(g_{0,t+1} + g'_{2,t+1} R^{W}_{2B,t+1}) + (g_{1,t+1} + g'_{1,t+1}) R^{W}_{t+1} + g_{2,t+1} (R^{W}_{t+1})^{2} + g_{3,t+1} (R^{W}_{t+1})^{3}
& R^{W}_{t+1} \le R^{W}_{B,t+1},\ (R^{W}_{t+1})^{2} > R^{W}_{2B,t+1} \\
(g_{0,t+1} + g'_{1,t+1} R^{W}_{B,t+1} + g'_{2,t+1} R^{W}_{2B,t+1}) + g_{1,t+1} R^{W}_{t+1} + g_{2,t+1} (R^{W}_{t+1})^{2} + g_{3,t+1} (R^{W}_{t+1})^{3}
& R^{W}_{t+1} > R^{W}_{B,t+1},\ (R^{W}_{t+1})^{2} > R^{W}_{2B,t+1}
\end{cases}
\]
We find that the resulting asset pricing model is very hard to estimate and leads to a poor forecasting
performance, probably as a result of overfitting.
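Assuming that a conditionally risk-free asset exists with gross return $R^{f}_t$, and imposing the standard
no-arbitrage condition,
\[
E[M_{t+1} R^{i}_{t+1}|\mathcal{F}_t] = 1, \tag{3}
\]
we get a four-moment asset pricing model with time-varying coefficients. From the definition of
a conditionally riskless asset we have $E[M_{t+1} R^{f}_t|\mathcal{F}_t] = R^{f}_t E[M_{t+1}|\mathcal{F}_t] = 1$, so that
$E[M_{t+1}|\mathcal{F}_t] = 1/R^{f}_t$ and
\[
E[M_{t+1} R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t E[M_{t+1}|\mathcal{F}_t]
= \mathrm{Cov}[M_{t+1}, R^{i}_{t+1}|\mathcal{F}_t] + E[M_{t+1}|\mathcal{F}_t] E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t E[M_{t+1}|\mathcal{F}_t] = 0,
\]
so that
\[
E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t = -R^{f}_t\, \mathrm{Cov}[M_{t+1}, R^{i}_{t+1}|\mathcal{F}_t].
\]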
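The last identity can be verified on a toy discrete economy. This is a hedged illustration only: the three states, their probabilities, the kernel values, and the payoffs below are hypothetical numbers chosen for the example, and the helper names are not part of the paper.

```python
def expectation(probs, values):
    """Probability-weighted mean over a finite set of states."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(probs, x, y):
    """Probability-weighted covariance over a finite set of states."""
    mx, my = expectation(probs, x), expectation(probs, y)
    return sum(p * (a - mx) * (b - my) for p, a, b in zip(probs, x, y))


# Hypothetical three-state economy: probabilities, kernel values, raw payoffs.
probs = [0.3, 0.5, 0.2]
kernel = [1.10, 0.95, 0.80]
payoff = [0.90, 1.05, 1.30]

# Price the payoff with the kernel, then express it as a gross return so that E[M R] = 1.
price = expectation(probs, [m * x for m, x in zip(kernel, payoff)])
gross_return = [x / price for x in payoff]
pricing_check = expectation(probs, [m * r for m, r in zip(kernel, gross_return)])

risk_free = 1.0 / expectation(probs, kernel)              # R^f = 1 / E[M]
premium = expectation(probs, gross_return) - risk_free    # E[R] - R^f
implied = -risk_free * covariance(probs, kernel, gross_return)
print(pricing_check, premium, implied)
```

The asset pays off most in the state where the kernel is lowest, so its covariance with the kernel is negative and the implied premium is positive, which matches the sign logic of the derivation.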
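Plugging our assumed approximate expression for the pricing kernel into this expression for the conditional
risk premium on any asset or portfolio indexed by $i = 1, ..., N$, we obtain:
\[
\begin{aligned}
E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t ={}& -R^{f}_t g_1 \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]
 - R^{f}_t g'_1 \mathrm{Cov}[R^{i}_{t+1}, \min\{R^{W}_{t+1}, R^{W}_{B,t+1}\}|\mathcal{F}_t] \\
& - R^{f}_t g_2 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]
 - R^{f}_t g_3 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t],
\end{aligned} \tag{4}
\]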
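where the coefficients $g_j$ ($j = 1, 2, 3$) and $g'_1$ are measurable with respect to time $t$ information,
$\mathcal{F}_t$, coherently with the fact that (4) determines the time $t$ conditional risk premium. At this point,
we notice that the expression $\mathrm{Cov}[R^{i}_{t+1}, \min\{R^{W}_{t+1}, R^{W}_{B,t+1}\}|\mathcal{F}_t]$ can be
re-written as (letting $\bar{R}^{W}_{t+1} \equiv E[\min\{R^{W}_{t+1}, R^{W}_{B,t+1}\}|\mathcal{F}_t]$):
\[
\begin{aligned}
& \mathrm{Cov}[R^{i}_{t+1},\ I_{\{R^{W}_{t+1} < R^{W}_{B,t+1}\}} R^{W}_{t+1} + I_{\{R^{W}_{t+1} \ge R^{W}_{B,t+1}\}} R^{W}_{B,t+1}|\mathcal{F}_t] \\
&\quad = \int_{R^{i}_{t+1}} (R^{i}_{t+1} - E_t[R^{i}_{t+1}]) \times
\left\{ \int_{R^{W}_{t+1} < R^{W}_{B,t+1}} (R^{W}_{t+1} - \bar{R}^{W}_{t+1})\, dF(R^{W}_{t+1})
+ \int_{R^{W}_{t+1} \ge R^{W}_{B,t+1}} (R^{W}_{B,t+1} - \bar{R}^{W}_{t+1})\, dF(R^{W}_{t+1}) \right\} dF(R^{i}_{t+1}) \\
&\quad = \int_{R^{i}_{t+1}} (R^{i}_{t+1} - E_t[R^{i}_{t+1}]) \int_{R^{W}_{t+1} < R^{W}_{B,t+1}} (R^{W}_{t+1} - \bar{R}^{W}_{t+1})\, dF(R^{W}_{t+1}) \\
&\quad = \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}],
\end{aligned}
\]
since by construction
$\int_{R^{W}_{t+1} \ge R^{W}_{B,t+1}} (R^{W}_{B,t+1} - \bar{R}^{W}_{t+1})\, dF(R^{W}_{t+1}) = (R^{W}_{B,t+1} - \bar{R}^{W}_{t+1}) \Pr\{R^{W}_{t+1} \ge R^{W}_{B,t+1}|\mathcal{F}_t\}$ and
\[
(R^{W}_{B,t+1} - \bar{R}^{W}_{t+1}) \Pr\{R^{W}_{t+1} \ge R^{W}_{B,t+1}|\mathcal{F}_t\} \int_{R^{i}_{t+1}} (R^{i}_{t+1} - E_t[R^{i}_{t+1}])\, dF(R^{i}_{t+1}) = 0.
\]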
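Substituting this expression in (4), we obtain:
\[
\begin{aligned}
E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t ={}& -R^{f}_t g_1 \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]
 - R^{f}_t g'_1 \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] \\
& - R^{f}_t g_2 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]
 - R^{f}_t g_3 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t] \\
={}& \lambda_{2,t} \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]
 + \lambda^{-}_{2,t} \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] \\
& + \lambda_{3,t} \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]
 + \lambda_{4,t} \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t]
\end{aligned} \tag{5}
\]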
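where $\lambda_{j,t} = -R^{f}_t g_{j-1}$ ($j = 2, 3, 4$) and $\lambda^{-}_{2,t} = -R^{f}_t g'_1$, so that
$\lambda_{2,t}, \lambda^{-}_{2,t} > 0$, $\lambda_{3,t} < 0$ and $\lambda_{4,t} > 0$. Notice that
the risk premia carry an index which corresponds to the index of the $g_j$ parameters increased by one, because
these coefficients are evocative of the order of the priced risk factors: 2 for covariance risk, 3 for coskewness
risk, 4 for cokurtosis risk. Finally, using the definitions of CAPM beta, downside beta, coskewness, and
cokurtosis,$^{10}$
\[
\begin{aligned}
\beta_{i,t} &\equiv \frac{\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]}{\mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t]} \\
\beta^{-}_{i,t} &\equiv \frac{\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]}{\mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]} \\
\gamma_{i,t} &\equiv \frac{\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]}{\mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t]}
\quad \text{where } \mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t] \equiv E[(R^{W}_{t+1} - E[R^{W}_{t+1}|\mathcal{F}_t])^{3}|\mathcal{F}_t] \\
\kappa_{i,t} &\equiv \frac{\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t]}{\mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t]}
\quad \text{where } \mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t] \equiv E[(R^{W}_{t+1} - E[R^{W}_{t+1}|\mathcal{F}_t])^{4}|\mathcal{F}_t],
\end{aligned}
\]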
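we obtain:
\[
\begin{aligned}
E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t ={}&
\lambda_{2,t} \frac{\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]}{\mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t]} \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t] \\
& + \lambda^{-}_{2,t} \frac{\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]}{\mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]}
 \times \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] \\
& + \lambda_{3,t} \frac{\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]}{\mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t]} \mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t]
 + \lambda_{4,t} \frac{\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t]}{\mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t]} \mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t] \\
={}& \tilde{\beta}_{i,t} \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t]
 + \tilde{\beta}^{-}_{i,t} \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]
 + \tilde{\gamma}_{i,t} \mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t]
 + \tilde{\kappa}_{i,t} \mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t],
\end{aligned}
\]
where the modified CAPM “betas” for variance, skewness, and kurtosis are
\[
\tilde{\beta}_{i,t} \equiv \lambda_{2,t} \beta_{i,t}, \quad
\tilde{\beta}^{-}_{i,t} \equiv \lambda^{-}_{2,t} \beta^{-}_{i,t}, \quad
\tilde{\gamma}_{i,t} \equiv \lambda_{3,t} \gamma_{i,t}, \quad
\tilde{\kappa}_{i,t} \equiv \lambda_{4,t} \kappa_{i,t}. \tag{6}
\]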
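The four loadings defined above have straightforward sample analogues. The sketch below is illustrative only: it replaces the conditional moments with full-sample moments (an assumption made here for simplicity, not the estimation approach of the paper), uses hypothetical return series, and takes the sample mean of the wealth return as the benchmark.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (divided by n) between two equally long series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def central_moment(xs, order):
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)


def comoment_loadings(asset, wealth, benchmark):
    """Sample analogues of beta, downside beta, gamma (coskewness) and kappa (cokurtosis)."""
    down = [(a, w) for a, w in zip(asset, wealth) if w < benchmark]
    a_dn = [a for a, _ in down]
    w_dn = [w for _, w in down]
    beta = cov(asset, wealth) / cov(wealth, wealth)
    beta_down = cov(a_dn, w_dn) / cov(w_dn, w_dn)
    gamma = cov(asset, [w ** 2 for w in wealth]) / central_moment(wealth, 3)
    kappa = cov(asset, [w ** 3 for w in wealth]) / central_moment(wealth, 4)
    return beta, beta_down, gamma, kappa


# Hypothetical gross returns, purely for illustration.
wealth = [0.93, 1.04, 0.98, 1.07, 0.90, 1.02, 1.01, 1.05]
asset = [0.90, 1.06, 0.97, 1.09, 0.84, 1.03, 1.00, 1.08]
benchmark = mean(wealth)  # conditional-mean benchmark, one of the options in the text

asset_loadings = comoment_loadings(asset, wealth, benchmark)
wealth_loadings = comoment_loadings(wealth, wealth, benchmark)
print(asset_loadings)
print(wealth_loadings[:2])
```

As a sanity check, the wealth portfolio has a beta and a downside beta of one against itself, which is the normalization that makes the modified betas in (6) equal to the risk prices when the model is applied to the wealth portfolio.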
The conditional expression obtained in (5) is a four-moment no-arbitrage asset pricing model which admits
the possibility that the CAPM beta may include a component $\beta^{-}_{i,t}$ that measures the variance risk of an
asset/portfolio during bear markets only. Since Bawa and Lindenberg (1977), $\beta^{-}_{i,t}$ has been defined as a
downside beta and it measures the contribution of each asset or portfolio $i$ to the variance of the wealth
process in downside markets, when $R^{W}_{t+1} < R^{W}_{B,t+1}$. If an asset tends to move downward in a declining
market more than it moves upward in a rising market, it is an unattractive asset to hold, because it tends
to have very low payoffs precisely when the wealth of investors is low. Investors who are sensitive to
downside losses, relative to upside gains, require a positive premium for holding assets that co-vary strongly
with the market when the market declines. Hence we can expect $\lambda^{-}_{2,t}$ to be positive and, in an economy
with agents placing greater emphasis on downside risk than upside gains, assets with high sensitivities to
downside market movements will have high average returns.$^{11}$ Additionally, as in Harvey and Siddique
(2000), Dittmar (2002), and Guidolin and Timmermann (2008a), this asset pricing framework attaches a
positive price to coskewness with the wealth process (the average tendency of an asset to display high (low)
returns when the variance on the wealth process is high (low)), which gives a positive asymmetry to the
joint process for $[R^{i}_{t+1}\ R^{W}_{t+1}]'$, and a negative price to cokurtosis. Because kurtosis can be described
as the degree to which, for a given variance, a distribution is weighted toward its tails, it measures the possible
multi-modality of a distribution, or the probability mass in the tails of the distribution, given its variance.$^{12}$
Therefore cokurtosis captures the average tendency of an asset to exhibit returns of the same sign as the
market, exactly when the wealth process draws returns from its extreme tails. Clearly, a risk-averse investor
attaches a positive price to (i.e., she demands a negative risk premium on) positive coskewness, because
obtaining high returns from an asset when the market is more volatile helps reduce the overall risk of a
portfolio. Moreover, a risk-averse investor also attaches a negative price to (i.e., she demands a positive risk
premium on) positive cokurtosis.

$^{10}$ In the following we assume $\mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t] > 0$, $\mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t] \ne 0$, and
$\mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t] > 0$.

$^{11}$ There are many ways in which one can construct micro-founded models in which downside risk matters in
asset pricing. First, downside risk is often featured by behavioral models. Shumway (1997) develops an
equilibrium model based on loss-averse investors. Barberis and Huang (2001) use a loss aversion utility function,
combined with mental accounting, to construct a cross-sectional equilibrium. Second, constraints and frictions
in rational models where the constraint binds only in one direction obtain the same effect, for example, binding
short-sales constraints (e.g., Chen, Hong, and Stein, 2001) or wealth constraints
3. The Econometric Framework
A large body of evidence suggests that return moments and prices of risk are time varying, and a wide array
of studies have used this evidence as a basis for investigating pricing models that hold conditionally (e.g.,
Harvey, 1989, Ferson and Harvey, 1991, Guidolin and Timmermann 2008a,b). To allow for conditional time-
variations in the return process and the possibility of misspecification biases, we extend the four-moment,
downside CAPMs in (5) as follows. Even though (5) represents a sensible way to write the asset pricing
model, empirical estimation is facilitated (see Guidolin and Timmermann, 2008a, for details) by re-writing
(5) as:
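\[
\begin{aligned}
E[R^{i}_{t+1}|\mathcal{F}_t] - R^{f}_t ={}& \lambda_2 \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]
 + \lambda^{-}_2 \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] \\
& + \lambda_3 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]
 + \lambda_4 \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t].
\end{aligned} \tag{7}
\]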
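Read as a computation, (7) is a sum of risk prices times conditional comoments. The following minimal sketch makes that explicit; the numerical risk prices and comoments are hypothetical (they only respect the sign pattern discussed in the text), and the dictionary keys are names invented for this illustration.

```python
def conditional_premium(prices, comoments):
    """Right-hand side of (7): risk prices times the matching conditional comoments."""
    keys = ("cov", "cov_down", "cov_sq", "cov_cube")
    return sum(prices[k] * comoments[k] for k in keys)


# Hypothetical risk prices: covariance and downside covariance positive,
# squared-return comoment negative, cubed-return comoment positive.
prices = {"cov": 2.0, "cov_down": 1.0, "cov_sq": -5.0, "cov_cube": 10.0}

# Hypothetical conditional comoments of one test portfolio with the wealth portfolio.
comoments = {"cov": 0.0020, "cov_down": 0.0015, "cov_sq": -0.00004, "cov_cube": 0.00001}

premium = conditional_premium(prices, comoments)
print(premium)
```

In this form the risk prices are the parameters to be estimated, while the comoments are inputs that have to be computed separately for each portfolio.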
This model no longer expresses conditional risk premia on a portfolio as a sum of estimable asset/portfolio-
specific beta-style loadings ($\beta_i$, $\beta^{-}_i$, $\gamma_i$, $\kappa_i$) on risk factors multiplied by the prices of
such risks. On the contrary, (7) turns the risk prices into estimable parameters ($\lambda_2$, $\lambda^{-}_2$,
$\lambda_3$, $\lambda_4$) and measures instead the portfolio-specific quantities of risk (expressed in the forms of
conditional covariances, $\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]$,
$\mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]$, $\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]$,
and $\mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t]$) which can be computed in a variety of ways (see below for
details). Notice that when (5) is expressed in this way, (??) and (7) become completely consistent because it is
obvious that if (7) is applied to the wealth process, then
$\mathrm{Cov}[R^{W}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t] = \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t]$,
$\mathrm{Cov}[R^{W}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] = \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}]$,
$\mathrm{Cov}[R^{W}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t] = \mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t]$, and
$\mathrm{Cov}[R^{W}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t] = \mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t]$. This shows that
$\tilde{\beta}_W = \lambda_2$, $\tilde{\beta}^{-}_W = \lambda^{-}_2$, $\tilde{\gamma}_W = \lambda_3$, $\tilde{\kappa}_W = \lambda_4$
because the same model must price all possible portfolios, including the wealth process itself.

$^{11}$ (continued) (e.g., Kyle and Xiong, 2001). Third, fully rational models exist. For instance, Ang, Chen, and
Xing (2006) work with Gul's (1991) disappointment aversion framework in which disappointment utility is
implicitly defined by the preference functional:
\[
U(\mu_W) = \frac{1}{K} \left( \int_{-\infty}^{\mu_W} U(W)\, dF(W) + A \int_{\mu_W}^{+\infty} U(W)\, dF(W) \right)
\]
where $U(W)$ is the felicity function over end-of-period wealth $W$, which they choose to be power utility. The
parameter $0 < A \le 1$ is the coefficient of disappointment aversion, $F(W)$ is the cumulative distribution
function for wealth, $\mu_W$ is the certainty equivalent (the certain level of wealth that generates the same
utility as the portfolio allocation determining $W$), and $K \equiv \Pr(W \le \mu_W) + A \Pr(W > \mu_W)$. Outcomes
above (below) the certainty equivalent $\mu_W$ are termed “elating” (“disappointing”) outcomes. If $0 < A < 1$,
then the utility function down-weights elating outcomes relative to disappointing outcomes. If $A = 1$,
disappointment utility reduces to a special case of standard CRRA utility; while standard CRRA utility
produces aversion to downside risk, the order of magnitude of the downside risk premium is usually negligible
because CRRA preferences are locally mean-variance.

$^{12}$ Kurtosis may be distinguished from variance, which measures the dispersion of observations from the mean,
in that it captures the probability of outcomes that are highly divergent from the mean, that is, extreme outcomes.
Third, to use a flexible representation without imposing too much structure, the price of risk associated
with these moments is allowed to depend on a latent state variable, $S_{t+1}$, that is assumed to follow a
Markov process but is otherwise not restricted. In turn, this state-dependence carries over to the price of
the risk factors appearing in the equations for returns on the individual portfolios, i.e., $\lambda_{2,S_{t+1}}$
(covariance risk), $\lambda^{-}_{2,S_{t+1}}$ (downside covariance risk), $\lambda_{3,S_{t+1}}$ (coskewness risk), and
$\lambda_{4,S_{t+1}}$ (cokurtosis risk). Finally, consistent with empirical evidence in the literature (see e.g.,
Campbell, Chan, and Viceira, 2003) we allow for predictability of returns on the wealth portfolio through an
$M \times 1$ vector of instruments, $z_t$, also assumed to follow a Markov switching first-order autoregressive
process. For instance, similarly to much asset pricing literature that has followed Fama and French (1989),
$z_t$ may include variables such as the dividend yield, the term spread, the default spread, and short-term
interest rates. Interestingly, the EPK in (1) implies that the mixture CAPM holds if and only if the “loadings”
of the variables in $z_t$ onto market excess returns are zeros (i.e., if and only if the null hypothesis of
$c^{W}_{S} = 0$ in all regimes holds), which is a testable restriction. Defining excess returns on any $i$-th
portfolio as $x^{i}_{t+1} \equiv R^{i}_{t+1} - R^{f}_t$ ($i = 1, ..., N$, where $N$ is the number of test portfolios to
be employed) and on the wealth portfolio as $x^{W}_{t+1} \equiv R^{W}_{t+1} - R^{f}_t$, the econometric model can
be summarized as:
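\[
\begin{aligned}
x^{i}_{t+1} ={}& \alpha^{i}_{S_{t+1}} + \lambda_{2,S_{t+1}} \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]
 + \lambda^{-}_{2,S_{t+1}} \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t, R^{W}_{t+1} < R^{W}_{B,t+1}] \\
& + \lambda_{3,S_{t+1}} \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{2}|\mathcal{F}_t]
 + \lambda_{4,S_{t+1}} \mathrm{Cov}[R^{i}_{t+1}, (R^{W}_{t+1})^{3}|\mathcal{F}_t] + \eta^{i}_{t+1} \\
x^{W}_{t+1} ={}& \alpha^{W}_{S_{t+1}} + \lambda_{2,S_{t+1}} \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t]
 + \lambda^{-}_{2,S_{t+1}} \mathrm{Var}[R^{W}_{t+1}|\mathcal{F}_t, x^{W}_{t+1} < x^{W}_{B,t}]
 + \lambda_{3,S_{t+1}} \mathrm{Skew}[R^{W}_{t+1}|\mathcal{F}_t] \\
& + \lambda_{4,S_{t+1}} \mathrm{Kurt}[R^{W}_{t+1}|\mathcal{F}_t] + c^{W}_{S_{t+1}} z_t + \eta^{W}_{t+1} \\
z_{t+1} ={}& \mu_{S_{t+1}} + B_{S_{t+1}} z_t + \eta^{Z}_{t+1}.
\end{aligned} \tag{8}
\]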
Consistent with the restrictions implied by our downside, four-moment CAPM in (??)-(7), the risk premia
$\lambda_{j,S_{t+1}}$ ($j = 2, 3, 4$) and $\lambda^{-}_{2,S_{t+1}}$ are common across the individual portfolios and the
wealth portfolio. However, we allow for asset-specific intercepts, $\alpha^{i}_{S_{t+1}}$, that capture other types of
misspecification; furthermore, the presence of time-dependence in the alphas may be useful in what follows to
propose a simple test of whether flexible mixture asset pricing models combining the traditional CAPM
($\lambda_{2,S_{t+1}} > 0$), the downside/partial-moment CAPM ($\lambda^{-}_{2,S_{t+1}} > 0$), and the four-moment CAPM
($\lambda_{3,S_{t+1}} < 0$ and $\lambda_{4,S_{t+1}} > 0$) may provide a pricing kernel-based explanation for size, value,
and momentum anomalies. The innovations
$\eta_{t+1} \equiv [\eta^{1}_{t+1} \ldots \eta^{N}_{t+1}\ \eta^{W}_{t+1}\ (\eta^{Z}_{t+1})']' \sim N(0, \Omega_{S_{t+1}})$ can display
state-dependent covariance matrices, while the predictor variables, $z_{t+1}$, follow a first order autoregressive
process with state-dependent vector autoregressive coefficients, $B_{S_{t+1}}$, as in Guidolin and Timmermann
(2006). This is consistent with the high degree of persistence commonly found in popular predictor variables,
such as the dividend yield and short-term interest rates. Crucially, although (8) as an asset pricing framework
is a model for conditional means only, and as such it imposes no restrictions on the properties of the
covariance matrix, in our econometric framework the assumption of $\eta_{t+1} \sim N(0, \Omega_{S_{t+1}})$ is important
because it may contribute to the endogenous generation and/or magnification of time-varying kurtosis and
cokurtosis. As a result, our assumption of Markov switching variances and covariances also has key asset
pricing implications.
To “close” the econometric specification of the model, we assume that the Markov state variable, $S_{t+1}$,
follows a $K$-state (homogeneous, irreducible and ergodic) first-order process with constant transition
probability matrix, $\mathbf{P}$:$^{13}$
\[
\mathbf{P}[i, j] = \Pr(S_{t+1} = j|S_t = i) = p_{ij}, \quad i, j = 1, \ldots, K. \tag{9}
\]
Therefore (8) can be interpreted as a time-varying version of the multi-beta latent variable model of Ferson
(1990) where both the risk premia and the amount of risk depend on a latent first-order Markov state
variable.
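A constant transition matrix as in (9) pins down how long each regime is expected to last and how often it occurs in the long run. The sketch below illustrates these standard Markov chain calculations; the two-state matrix is hypothetical (it is not an estimate from the paper) and the function names are chosen here for illustration.

```python
def expected_durations(P):
    """Expected stay in each regime, 1 / (1 - p_ii), for a time-homogeneous chain."""
    return [1.0 / (1.0 - P[i][i]) for i in range(len(P))]


def predict(pi, P):
    """One-step-ahead state probabilities: pi_next[j] = sum_i pi[i] * p_ij."""
    k = len(P)
    return [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]


def ergodic_probabilities(P, iterations=10_000):
    """Long-run regime frequencies obtained by iterating the prediction step."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        pi = predict(pi, P)
    return pi


# Hypothetical two-state transition matrix (each row sums to one).
P = [[0.90, 0.10],
     [0.20, 0.80]]

durations = expected_durations(P)
one_step = predict([1.0, 0.0], P)   # state 1 known with certainty today
long_run = ergodic_probabilities(P)
print(durations, one_step, long_run)
```

Even when the current state is known with certainty, the one-step-ahead probabilities are simply the corresponding row of the transition matrix, so next period's regime remains uncertain unless that row is degenerate.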
There are a number of advantages to modelling equity returns in this way. At time $t$, conditional
on knowing the state next period, $S_{t+1}$, the distribution of portfolio and market returns is multivariate
Gaussian. However, since future states are not known in advance, the time $t$ return distribution is effectively
a mixture of normals with weights reflecting the current state probabilities.$^{14}$ Such mixtures of normals
provide a flexible representation that can be used to approximate many distributions (e.g., see Harvey and
Zhou, 1993, and Guidolin and Timmermann, 2008b). They can accommodate mild serial correlation in
returns (documented for market returns in a number of papers; e.g., see Campbell et al., 2003) and
volatility clustering since they allow the first and second moments to vary as a function of the underlying
state probabilities (e.g., see Guidolin and Timmermann, 2006). Moreover, multivariate regime switching
models allow return correlations across assets to vary with the underlying regime, consistent with the recent
evidence of asymmetric and time-varying correlations in US equity returns in Ang and Chen (2002) and in
size- and value-sorted portfolios in Guidolin and Timmermann (2008b). Finally, it must be stressed that
our asset pricing model depends on moments of (excess) returns on the market portfolio in addition to
the covariances, coskewness and cokurtosis between portfolio returns and the wealth portfolio. Estimating
the (co-)skewness and (co-)kurtosis of asset returns is difficult (see Harvey and Siddique, 2000). However,
our mixture MS model allows us to obtain precise conditional estimates in a flexible manner as it captures
coskewness and cokurtosis as a function of the mean, variance and persistence parameters of the underlying
Markov states.
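How a mixture of normals produces skewness and excess kurtosis can be seen from the closed-form moments of a finite Gaussian mixture. The sketch below is a univariate illustration under assumed inputs: the regime weights, means, and variances are hypothetical monthly figures, not estimates from the paper.

```python
def mixture_moments(weights, means, variances):
    """Mean, variance, skewness and kurtosis of a finite mixture of normals."""
    m = sum(w * mu for w, mu in zip(weights, means))
    var = sum(w * (s2 + (mu - m) ** 2)
              for w, mu, s2 in zip(weights, means, variances))
    third = sum(w * ((mu - m) ** 3 + 3 * (mu - m) * s2)
                for w, mu, s2 in zip(weights, means, variances))
    fourth = sum(w * ((mu - m) ** 4 + 6 * (mu - m) ** 2 * s2 + 3 * s2 ** 2)
                 for w, mu, s2 in zip(weights, means, variances))
    return m, var, third / var ** 1.5, fourth / var ** 2


# Hypothetical regimes: a calm state and a volatile state with a lower mean.
calm_and_bear = mixture_moments([0.8, 0.2], [0.010, -0.020], [0.03 ** 2, 0.08 ** 2])

# Degenerate case with a single regime, which collapses to a plain normal.
single_normal = mixture_moments([1.0], [0.010], [0.03 ** 2])

print(calm_and_bear)
print(single_normal)
```

With these inputs the mixture is negatively skewed and fat-tailed even though each regime is Gaussian, whereas the single-regime case returns zero skewness and a kurtosis of three.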
$^{13}$ Of course, at the price of considerable complication in the estimation methods, the assumption of a
time-homogeneous Markov chain (transition probability matrix) driving $S_{t+1}$ might be removed. However, for our
purposes the model in (8), allowing time-varying premia as well as risk factors (i.e., conditional covariances,
co-skewness, and co-kurtosis) appears to be flexible enough to the point that additional degrees of flexibility
would simply lead to a substantial risk of over-fitting.

$^{14}$ Even in the case in which one regime receives a unit (filtered) probability at time $t$, unless the
transition matrix $\mathbf{P}$ is degenerate, the time $t + T$ ($T \ge 1$) state will remain uncertain. For instance,
assuming $K = 2$ and calling $\pi_t$ the time $t$ probability of state 1, the predicted probability of state 1 at
time $t+1$ is $\pi_t p_{11} + (1 - \pi_t) p_{21}$, where $p_{ij} \equiv \Pr\{S_t = j|S_{t-1} = i\}$ is a generic element of the
transition matrix $\mathbf{P}$. Even if $\pi_t = 1$, as long as $p_{11} < 1$, we have that the predicted probability
of state 1 is $p_{11} < 1$. $p_{11} < 1$ means that the Markov chain fails to have an absorbing state, which we have
assumed.
3.1. Mixture Pricing Kernel Models
A mixture CAPM is a time-varying combination of different CAPM-type models (defined as models in which
the pricing kernel only depends on wealth portfolio returns and functions thereof) where one and only one
specific CAPM applies at each point in time. As already discussed in the Introduction, the basic intuition is
that both because preferences may change over time and because the stochastic process of asset returns may
undergo periods of structural instability that change their essential features, it is possible or even plausible
that over different intervals of time, the same asset or portfolio may be priced using a different rational asset
pricing framework. A mixture CAPM is a special case of the general Markov switching framework in (8)
for which a specific structure (that can be interpreted as a set of zero restrictions placed on the risk premia
coefficients) is imposed on the Markov chain:
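\[
K = 3 \quad \text{and} \quad
\begin{cases}
\lambda^{-}_{2,S_{t+1}} = \lambda_{3,S_{t+1}} = \lambda_{4,S_{t+1}} = 0 & \text{if } S_{t+1} = 1 \\
\lambda_{3,S_{t+1}} = \lambda_{4,S_{t+1}} = 0 & \text{if } S_{t+1} = 2 \\
\lambda^{-}_{2,S_{t+1}} = 0 & \text{if } S_{t+1} = 3
\end{cases}.
\]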
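Clearly under the first regime, this restriction delivers a simple, standard CAPM but under the remaining
two regimes, it does not. As a result, each of the three assumed regimes corresponds to a specific asset
pricing model according to the mapping (for $i = 1, 2, ..., N$):$^{15}$
\[
\begin{cases}
\left\{\begin{array}{l}
x^{i}_{t+1} = \alpha^{i}_{1} + \lambda_{2,1} \mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{1} + \lambda_{2,1} \mathrm{Var}_t[R^{W}_{t+1}] + c^{W}_{1} z_t + \eta^{W}_{t+1} \\
z_{t+1} = \mu_1 + B_1 z_t + \eta^{Z}_{t+1}, \quad \eta_{t+1} \sim N(0, \Omega_1)
\end{array}\right. & \text{if } S_{t+1} = 1 \\[2ex]
\left\{\begin{array}{l}
x^{i}_{t+1} = \alpha^{i}_{2} + \lambda_{2,2} \mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}] + \lambda^{-}_{2,2} \mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}|R^{W}_{t+1} < R^{W}_{B,t+1}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{2} + \lambda_{2,2} \mathrm{Var}_t[R^{W}_{t+1}] + \lambda^{-}_{2,2} \mathrm{Var}_t[R^{W}_{t+1}|R^{W}_{t+1} < R^{W}_{B,t+1}] + c^{W}_{2} z_t + \eta^{W}_{t+1} \\
z_{t+1} = \mu_2 + B_2 z_t + \eta^{Z}_{t+1}, \quad \eta_{t+1} \sim N(0, \Omega_2)
\end{array}\right. & \text{if } S_{t+1} = 2 \\[2ex]
\left\{\begin{array}{l}
x^{i}_{t+1} = \alpha^{i}_{3} + \lambda_{2,3} \mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}] + \lambda_{3,3} \mathrm{Cov}_t[R^{i}_{t+1}, (R^{W}_{t+1})^{2}] + \lambda_{4,3} \mathrm{Cov}_t[R^{i}_{t+1}, (R^{W}_{t+1})^{3}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{3} + \lambda_{2,3} \mathrm{Var}_t[R^{W}_{t+1}] + \lambda_{3,3} \mathrm{Skew}_t[R^{W}_{t+1}] + \lambda_{4,3} \mathrm{Kurt}_t[R^{W}_{t+1}] + c^{W}_{3} z_t + \eta^{W}_{t+1} \\
z_{t+1} = \mu_3 + B_3 z_t + \eta^{Z}_{t+1}, \quad \eta_{t+1} \sim N(0, \Omega_3)
\end{array}\right. & \text{if } S_{t+1} = 3
\end{cases} \tag{10}
\]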
To save space, the moments that condition on the information set $\mathcal{F}_t$ are simply written using a time $t$
subscript, e.g., $\mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}] \equiv \mathrm{Cov}[R^{i}_{t+1}, R^{W}_{t+1}|\mathcal{F}_t]$. In qualitative
terms, (10) is very different from (8): while in the latter, in principle, all the risk factors are priced in each
of the $K$ regimes, although potentially with different risk premia as a function of the state $S_{t+1}$, in a
mixture CAPM specific sub-sets of risk factors enter the asset pricing equations in each of the three possible
regimes, where $K = 3$ is dictated not by the empirical features of the data but by the fact that three
alternative CAPM-style pricing frameworks are being mixed. From an economic viewpoint, (10) means that over
time, markets switch among a limited number of alternative asset pricing frameworks to price assets; the only
risk factor which is commonly priced across all frameworks is covariance risk, although the related unit risk
premium may differ across regimes, i.e., $\lambda_{2,1}$, $\lambda_{2,2}$, and $\lambda_{2,3}$ do not have to be identical.
In quantitative terms, (10) is no different from (8); in fact, if one were unable to reject the hypothesis that
$\lambda^{-}_{2,1} = \lambda_{3,1} = \lambda_{4,1} = \lambda_{3,2} = \lambda_{4,2} = \lambda^{-}_{2,3} = 0$ in (8), it is
clear that (10) and (8) would be identical.

$^{15}$ We also consider the case in which state 2 is a “pure” dCAPM regime, i.e., if $S_{t+1} = 2$ then
\[
\left\{\begin{array}{l}
x^{i}_{t+1} = \alpha^{i}_{2} + \lambda^{-}_{2,2} \mathrm{Cov}_t[R^{i}_{t+1}, R^{W}_{t+1}|R^{W}_{t+1} < R^{W}_{B,t}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{2} + \lambda^{-}_{2,2} \mathrm{Var}_t[R^{W}_{t+1}|R^{W}_{t+1} < R^{W}_{B,t}] + c^{W}_{2} z_t + \eta^{W}_{t+1} \\
z_{t+1} = \mu_2 + B_2 z_t + \eta^{Z}_{t+1}, \quad \eta_{t+1} \sim N(0, \Omega_2)
\end{array}\right.
\]
However the pricing performance of such a model turns out to be similar to the model in the main text.
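The zero restrictions that turn the general model (8) into the mixture CAPM (10) can be written as a regime-by-regime mask on the risk prices. The sketch below illustrates this idea only; the unrestricted risk prices and the comoments are hypothetical numbers, the key names are invented here, and intercepts and instruments are left out.

```python
# Risk prices allowed to be non-zero in each regime of the mixture CAPM in (10).
ACTIVE_PRICES = {
    1: ("cov",),                       # standard CAPM
    2: ("cov", "cov_down"),            # downside CAPM
    3: ("cov", "cov_sq", "cov_cube"),  # four-moment CAPM
}


def restrict_prices(unrestricted):
    """Zero out, regime by regime, the risk prices excluded by the mixture restrictions."""
    return {
        regime: {name: (value if name in ACTIVE_PRICES[regime] else 0.0)
                 for name, value in prices.items()}
        for regime, prices in unrestricted.items()
    }


def regime_premium(prices, comoments):
    """Model-implied premium in one regime (intercepts and instruments omitted)."""
    return sum(prices[name] * comoments[name] for name in prices)


# Hypothetical unrestricted risk prices for K = 3 regimes; the covariance price
# differs across regimes, as the text allows.
unrestricted = {
    s: {"cov": 2.0 + s, "cov_down": 1.0, "cov_sq": -5.0, "cov_cube": 10.0}
    for s in (1, 2, 3)
}
restricted = restrict_prices(unrestricted)

# Hypothetical conditional comoments of one test portfolio.
comoments = {"cov": 0.002, "cov_down": 0.0015, "cov_sq": -0.00004, "cov_cube": 0.00001}
premia = {s: regime_premium(restricted[s], comoments) for s in restricted}
print(premia)
```

Covariance risk is the only factor priced in all three regimes, so the restricted premia differ across regimes both through the regime-specific covariance price and through the set of additional factors that are switched on.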
3.2. Other Benchmarks
It is important to recognize that it is a specific assumption on the most adequate empirical framework
that has taken us from (??) to (7), where the prices of risk are time-varying and follow a K-state Markov
switching specification with fixed transition probabilities, (8)-(9). In fact, in the literature we can find three
recent examples of empirical papers that have dealt with EPK models similar in spirit to (1) and that have
proposed empirical strategies that parametrize time-varying risk premia and risk exposure in a different way.
These yield two natural benchmarks for our empirical efforts.
Ang and Chen (2007) propose a fully specified conditional CAPM which can be written as:
xit+1 = αi + βi,t+1E[xWt+1|Ft] + σi i
t+1 i = 1, ..., N
xWt+1 = μt+1 +√υt+1
Wt+1
βi,t+1 = b0,i + b1,iβi,t + σβiβit+1
μt+1 = m0 +m1μt + σμ μt+1
lnυt+1 = v0 + v1 ln υt +υt+1 (11)
where [ it+1Wt+1
βit+1
μt+1
υt+1]
0 ∼ IID N(0,Ψ), and Ψ admits the existence of simultaneous correlations
between μt+1, and
υt+1 (capturing a leverage effect as in Brandt and Kang, 2004) as well as between
βit+1
and μt+1. All other shocks are orthogonal to each other. Although formally (11) does not nest (8)-(9) and
it also not nested in it, there are clear connections. On the one hand, (11) obtains when only covariance
risk is priced and in a symmetric fashion, i.e., λ−2,t = λ3,t = λ4,t = 0, while the expression for xWt+1 rescinds
any connections between market variance and risk premia. This makes (11) considerably more parsimonious
than (8)-(9). On the other hand, while (8)-(9) simply allows variances and covariances of the shocks in ηt+1
to be a function of a latent Markov state variable, (11) incorporates a truly (log-) stochastic volatility model
for excess market returns, which is clearly richer and more sophisticated.16 Additionally, (11) implies that
estimation has to be based uniquely on portfolio returns, while (8)-(9) allows the predictors in zt to
affect the market risk premium as well as the definition of the current Markov state. This feature is rather general,
may allow a further misspecification test of the mixture CAPM, and is useful in trying to connect the
Markov state variable, St+1, to business cycle conditions.
One additional difference between (11) and (8)-(9) obviously concerns the assumed dynamics for the quantity of risk, $\beta_{i,t}$. In (11) $\beta_{i,t}$ follows a continuous Gaussian AR(1) process; in our model, the quantity of risk and the exposure of each asset to it, via the conditional moments $Cov[R^i_{t+1}, (R^W_{t+1})^j|\mathcal{F}_t]$ (j = 1, 2, 3) and $Cov[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$, follow a first-order, discrete Markov process. Clearly, an AR(1) process is ideal to capture slow, highly persistent variation, while a Markov process captures more abrupt, sudden changes. Ang and Chen (2007) admit to a strong prior that their conditional betas ought to vary slowly over time. However, Ang and Chen also expect that the conditional shocks hitting the betas ($\epsilon^{\beta_i}_{t+1}$) may be quite variable, so $\sigma_{\beta_i}$ could be potentially large. Additionally, notice that while the risk premia $\lambda_{j,S_{t+1}}$ and $\lambda^-_{2,S_{t+1}}$ (j = 2, 3, 4) may only take a finite number K of possible values, the risk quantities $Cov[R^i_{t+1}, (R^W_{t+1})^j|\mathcal{F}_t]$ (j = 1, 2, 3) and $Cov[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$ change as a function of all the information collected in $\mathcal{F}_t$ (in particular, the current set of perceived state probabilities), so that if the Markov states are persistent, then the risk factors in (8)-(9) may yield levels of persistence similar to those directly modeled by Ang and Chen (2007) in (11). We therefore adopt (11) as a first, natural benchmark.

16 However, while our MS model may roughly capture the evidence of time-varying idiosyncratic volatility, the Gaussian AR(1) time-varying beta model with fixed $\sigma_i$ does not. Campbell et al. (2001) show that the idiosyncratic volatility has noticeably trended up for individual stocks over the 1990's. Notice however that incorporating continuous, time-varying idiosyncratic volatility would introduce a difficult identification problem between time-varying risk factors and prices and idiosyncratic risk. Therefore it seems that even a rough MS approach may be sufficient to capture the most important empirical features of the data.
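The persistence argument above can be illustrated with a small simulation. The following Python sketch is not part of the estimation in this paper: it assumes a hypothetical two-state chain with staying probabilities `p11` and `p22` (illustrative values, not estimates) and compares the sample first-order autocorrelation of the state indicator with its theoretical value, p11 + p22 - 1. When both staying probabilities are close to one, the regime indicator is roughly as persistent as an AR(1) process with a coefficient of that size.

```python
import numpy as np

def simulate_two_state_chain(p11, p22, T, seed=0):
    """Simulate a two-state, first-order Markov chain; returns an array of 0/1 states."""
    rng = np.random.default_rng(seed)
    stay = (p11, p22)                      # probability of remaining in state 0 / state 1
    states = np.empty(T, dtype=int)
    states[0] = 0
    for t in range(1, T):
        s = states[t - 1]
        states[t] = s if rng.random() < stay[s] else 1 - s
    return states

def first_order_autocorr(x):
    """Sample first-order autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

# Illustrative staying probabilities (assumptions, not estimates from the paper).
p11, p22 = 0.95, 0.90
states = simulate_two_state_chain(p11, p22, T=50_000)
rho_sample = first_order_autocorr(states)
rho_theory = p11 + p22 - 1.0               # autocorrelation of a two-state chain indicator
print(f"sample: {rho_sample:.3f}, theoretical: {rho_theory:.3f}")
```

In the actual model the conditional comoments depend on filtered state probabilities rather than on the latent state itself, so this is only a rough guide to how persistent regimes translate into persistent risk quantities.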
Post and Vliet (2005) resort to a much simpler framework (to side-step the task of estimating the dynamics of the pricing kernel from portfolio returns data) but, similarly to our paper, they directly posit that the pricing kernel has a simple structure given by $M_{t+1} = g_{0,t} + g'_{1,t}\min\{R^W_{t+1}, 1\}$ in which
$$g_{0,t} = b_{00} + b_{01} z_t$$
$$g'_{1,t} = b_{10} + b_{11} z_t,$$
where $z_t$ is identified with the dividend yield (but they show that this choice is relatively unimportant). Clearly, their framework corresponds to a case of (1) in which $g_1 = 0$ (so that only downside covariance risk matters), $g_2 = g_3 = 0$ (hence $\lambda_{3,t} = \lambda_{4,t} = 0$), and $R^W_{B,t+1} = 1$ so that the downside event simply corresponds to a negative net return on the wealth portfolio. Post and Vliet's framework can be seen as an application of the approach first proposed by Dumas and Solnik (1995) and Cochrane (1996), who have treated the time-varying coefficients in representations of the pricing kernel as simple linear functions of time t information variables, $g_{j,t} = b_j' z_t$ (j = 1, 2, 3), where the Z × 1 vector $z_t$ collects standard predictors/instrumental variables used in the empirical finance literature. Dittmar (2002) also adopts the approach of parameterizing the pricing kernel as a function of a set of exogenous instruments with predictive power and, in an application to a higher-order CAPM, in fact imposes sign restrictions by setting
$$g_{j,t} = (-1)^j (b_j' z_t)^2.$$
Of course, this implies $\lambda_{j,t} = -(-1)^{j-1}(b_j' z_t)^2 R^f_t$ and restrictions on the risk premia of sign opposite to the $g_{j,t}$s.
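To make the sign restrictions concrete, here is a minimal Python sketch of the squared-instrument parameterization $g_{j,t} = (-1)^j (b_j' z_t)^2$ and of the mapping $\lambda_{j,t} = -R^f_t g_{j-1,t}$ used in the text. The loading matrix `b`, the instrument vector `z_t`, and the gross riskless rate `rf_gross` are made-up numbers chosen only for illustration; they are not estimates from this paper or from Dittmar (2002).

```python
import numpy as np

def dittmar_coefficients(b, z_t):
    """Return (g_1, g_2, g_3) with g_j = (-1)**j * (b_j' z_t)**2.

    b   : array of shape (3, Z), one row of loadings per coefficient (hypothetical values)
    z_t : array of shape (Z,), instruments observed at time t
    """
    lin = np.asarray(b, dtype=float) @ np.asarray(z_t, dtype=float)  # b_j' z_t, j = 1, 2, 3
    signs = np.array([-1.0, 1.0, -1.0])                              # (-1)**j, j = 1, 2, 3
    return signs * lin**2

def risk_premia(g, rf_gross):
    """Map (g_1, g_2, g_3) into (lambda_2, lambda_3, lambda_4) via lambda_j = -R^f * g_{j-1}."""
    return -rf_gross * np.asarray(g, dtype=float)

# Illustrative inputs: a constant plus three instruments (all values are assumptions).
b = np.array([[0.5, -0.2, 0.1, 0.3],
              [0.1, 0.4, -0.3, 0.2],
              [-0.2, 0.1, 0.2, 0.1]])
z_t = np.array([1.0, 0.04, 0.015, 0.01])
g = dittmar_coefficients(b, z_t)
lam = risk_premia(g, rf_gross=1.003)
print("g:", g)
print("lambda:", lam)
```

Because each coefficient is a signed square, the sign pattern holds for any `b` and `z_t`, although the inequalities are only weak when some $b_j' z_t$ happens to equal zero.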
In our paper we employ a second benchmark econometric framework similar to Dittmar (2002) and Post
and Vliet (2005) in which:
g1,t = b01zt, g01,t = b
01zt,
(therefore λ2,t+1 = −Rft (g1 + b
01Bzt), λ
−2,t+1 = −R
ft (g
01 + b
01Bzt), where B ≡ [ι3 B] and zt ≡ [1 z0t]
0),
g2,t = g3,t = 0, and there is one single state:
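$$
\begin{aligned}
x^i_{t+1} &= \alpha_i + \lambda_{2,t+1} Cov[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] + \lambda^-_{2,t+1} Cov[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] + \eta^i_{t+1} \\
x^W_{t+1} &= \alpha_W + \lambda_{2,t+1} Var[R^W_{t+1}|\mathcal{F}_t] + \lambda^-_{2,t+1} Var[x^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] + c^W z_t + \eta^W_{t+1} \\
z_{t+1} &= \mu + B z_t + \eta^Z_{t+1}. \qquad (12)
\end{aligned}
$$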
Here $\eta_{t+1} \sim N(0, \Omega)$ and $z_t$ includes the same variables used in our MS model, i.e., the dividend yield, the
term spread, and the default spread. Notice that the fact that $z_t$ forecasts excess returns on the “market”
portfolio implies that the conditional variance and partial variances as well as the covariance and partial
covariances in (12) will be time-varying.17 In the following, we refer to (12) as the single-state model in which
risk premia are driven by standard predictors of business cycle conditions, or SSBC (single-state
business cycle) for short.
3.3. Economically Admissible Pricing Kernels
Similarly to previous papers (see e.g., Dittmar, 2002, and Post and Vliet, 2005) we ask whether, or better, under what additional restrictions, (1) can represent an admissible pricing kernel, where “admissible” means consistent with basic postulates on individual behavior such as non-satiation (i.e., positive marginal utility), weak risk-aversion (non-increasing marginal utility), and non-increasing absolute risk aversion (convex or linear marginal utility). The logic is that in all equilibrium asset pricing models and in settings with standard preferences, the pricing kernel $M_{t+1}$ ought to correspond (or be proportional) to the intertemporal marginal rate of substitution (IMRS) in consumption (see Hansen and Jagannathan, 1991), i.e.,
$$M_{t+1} \propto \frac{U'(C_{t+1})}{U'(C_t)}.$$
Because $U'(C_t) \in \mathcal{F}_t$, this implies that the basic properties of $M_{t+1}$ conditional on $\mathcal{F}_t$ ought to follow from those characterizing $U'(C_{t+1})$ while, similarly to Dittmar (2002), (1) can be read as a Taylor series expansion of the IMRS. Therefore non-satiation is sufficient for $M_{t+1} > 0$ everywhere; weak risk-aversion is sufficient for $M_{t+1}$ to be non-increasing; non-increasing absolute risk aversion implies either a linear or a convex $M_{t+1}$.
At this point notice that because $R^W_{t+1}$ is a gross return and therefore $R^W_{t+1} > 0$, the signs of $M_{t+1}$ and of its derivatives,
$$
M'_{t+1} = \begin{cases}
g_{1,t} + g'_{1,t} + g_{2,t}R^W_{t+1} + g_{3,t}(R^W_{t+1})^2 & R^W_{t+1} < R^W_{B,t+1} \\
g_{1,t} + g_{2,t}R^W_{t+1} + g_{3,t}(R^W_{t+1})^2 & R^W_{t+1} \geq R^W_{B,t+1}
\end{cases}
$$
$$M''_{t+1} = g_{2,t} + g_{3,t}R^W_{t+1}$$
only depend on the values of the coefficients, recalling that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$ and $g_{3,t} < 0$. Although these values are in principle unknown, exploiting the fact that $\lambda_{j,t} = -R^f_t g_{j-1,t}$ (j = 2, 3, 4) and $\lambda^-_{2,t} = -R^f_t g'_{1,t}$, they can easily be recovered ∀t in correspondence to the estimated values for the risk premia, i.e., $g_{j-1,t} = -\lambda_{j,t}/R^f_t$ and $g'_{1,t} = -\lambda^-_{2,t}/R^f_t$. Of course, in the absence of adequate restrictions, these estimated values of the time-varying coefficients characterizing the EPK may imply that time periods in a subset T exist such that if t ∈ T, then it is possible that $M_{t+1} < 0$ (which opens the door to arbitrage opportunities caused by negative state prices), or that $M'_{t+1} > 0$ (which implies risk-loving) or that $M''_{t+1} < 0$ (which implies increasing absolute risk aversion). We therefore estimate (8)-(9) as well as all other models described in Section 3 under three alternative sets of restrictions:
1. We initially estimate our models without imposing any economic structure. This delivers unconstrained estimates for the $\lambda_{j,t+1}$s (j = 2, 3, 4) and $\lambda^-_{2,t+1}$ and therefore for $g_{j,t}$ (j = 1, 2, 3) and $g'_{1,t}$. These estimated values of the time-varying coefficients characterizing the pricing kernel may imply the existence of time periods when $M_{t+1} < 0$, or $M'_{t+1} > 0$, or $M''_{t+1} < 0$. Clearly, the first two violations are particularly significant: when arbitrage opportunities exist, then the standard portfolio/consumption optimization problem that leads to the first-order, Euler condition in (3) fails to characterize the optimum, which implies that the asset pricing model we have derived is meaningless; if the representative agent is not globally risk-averse, then the Euler condition in (3) is still a necessary condition for the investor's optimum, but it is no longer sufficient (i.e., it may represent a minimum and not a maximum); finally, the possibility that an investor has a positive, decreasing, but concave pricing kernel contradicts much empirical and experimental evidence on investors displaying constant or decreasing absolute risk aversion.
2. We then estimate all of our models imposing a rather weak set of economic restrictions: (i) that $M_{t+1} > 0$ at all times in the exercise; (ii) that at all times
$$E[M_{t+1}|\mathcal{F}_t] = \frac{1}{R^f_t},$$
as implied by (3) for a conditional riskless asset; (iii) that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$ and $g_{3,t} < 0$ ∀t which, because $\lambda_{j,t} = -R^f_t g_{j-1,t}$ (j = 2, 3, 4) and $\lambda^-_{2,t} = -R^f_t g'_{1,t}$, map into $\lambda_{2,t} > 0$, $\lambda^-_{2,t} > 0$, $\lambda_{3,t} < 0$, $\lambda_{4,t} > 0$ ∀t. Dittmar (2002) calls the set of conditions in (iii) local restrictions implied by preference theory. Also notice that the condition under (ii) is essentially void of implications if (i) is not simultaneously imposed, as one may otherwise simply “use” the time-varying constant $g_{0,t}$ to set the conditional expectation of the kernel equal to the inverse of the conditionally riskless interest rate. However, when $g_{0,t}$ is also jointly restricted by the condition $M_{t+1} > 0$, (ii) may have a chance “to bite” on the data and influence the estimation outcome.18
3. Finally, we impose the full economic structure derived/emphasized in this sub-section, i.e.: (i) $M_{t+1} > 0$ ∀t; (ii) $E[M_{t+1}|\mathcal{F}_t] = 1/R^f_t$ ∀t; (iii) $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$ and $g_{3,t} < 0$ ∀t which, because $\lambda_{j,t} = -R^f_t g_{j-1,t}$ (j = 2, 3, 4) and $\lambda^-_{2,t} = -R^f_t g'_{1,t}$, map into $\lambda_{2,t} > 0$, $\lambda^-_{2,t} > 0$, $\lambda_{3,t} < 0$, $\lambda_{4,t} > 0$ ∀t; (iv) $M'_{t+1} \leq 0$ ∀t; (v) $M''_{t+1} \geq 0$ ∀t. Dittmar (2002) calls restrictions such as (i), (iv), and (v), global restrictions implied by preference theory.

17 When $c^W = 0$ it is clear that variances and covariances will all be constant and simply derive from the off-diagonal elements of Ω.
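The local and global restrictions can be checked numerically once a set of risk premia is in hand. The sketch below is an illustration under stated assumptions, not the estimation code used in this paper: it takes hypothetical values for the risk premia, recovers the kernel coefficients through $g_{j-1,t} = -\lambda_{j,t}/R^f_t$ and $g'_{1,t} = -\lambda^-_{2,t}/R^f_t$, and evaluates the expressions for $M'_{t+1}$ and $M''_{t+1}$ given in this sub-section on a grid of gross wealth returns. The function names, the grid, and all numbers are made up. Positivity of $M_{t+1}$ is not checked because it also requires $g_{0,t}$.

```python
import numpy as np

def recover_g(lam2, lam2_down, lam3, lam4, rf_gross):
    """Recover (g1, g1_down, g2, g3) from risk premia using g_{j-1} = -lambda_j / R^f."""
    return (-lam2 / rf_gross, -lam2_down / rf_gross, -lam3 / rf_gross, -lam4 / rf_gross)

def kernel_derivatives(g1, g1_down, g2, g3, r_w, r_bench):
    """First and second derivatives of the kernel, following the expressions in the text."""
    r_w = np.asarray(r_w, dtype=float)
    downside = r_w < r_bench                     # downside region: R^W below the benchmark
    d1 = g1 + g2 * r_w + g3 * r_w**2 + np.where(downside, g1_down, 0.0)
    d2 = g2 + g3 * r_w
    return d1, d2

def check_restrictions(lam2, lam2_down, lam3, lam4, rf_gross, r_grid, r_bench=1.0):
    """Report whether the local sign restrictions and the two global shape restrictions hold."""
    g1, g1_down, g2, g3 = recover_g(lam2, lam2_down, lam3, lam4, rf_gross)
    d1, d2 = kernel_derivatives(g1, g1_down, g2, g3, r_grid, r_bench)
    return {
        "local_signs": bool(g1 < 0 and g1_down < 0 and g2 > 0 and g3 < 0),
        "non_increasing": bool(np.all(d1 <= 0)),   # M' <= 0 on the grid
        "convex": bool(np.all(d2 >= 0)),           # M'' >= 0 on the grid
    }

# Hypothetical premia (assumptions): the second set has a much larger cokurtosis premium.
grid = np.linspace(0.7, 1.3, 61)
ok = check_restrictions(3.0, 1.0, -1.0, 0.5, rf_gross=1.003, r_grid=grid)
bad = check_restrictions(3.0, 1.0, -1.0, 2.0, rf_gross=1.003, r_grid=grid)
print(ok)
print(bad)
```

The second set of premia satisfies the local sign restrictions but not convexity on the grid, which is the kind of case that separates the second set of restrictions from the third.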
3.4. Estimation Strategy
Our estimation strategy is similar to Guidolin and Timmermann (2008a,b). Here we briefly describe a few of the technical issues involved with reference to the two-state MS 4MOM CAPM in (8)-(9). An Appendix details how the conditional comoments implied by MS can be computed in closed-form following results in Guidolin and Timmermann (2004, 2008a), which greatly simplifies the iterated (simulated) MLE used in this paper. Obviously, the issue with the econometric framework in (8)-(9) is that conditional (higher) comoments that are driven by the presence of MS parameters appear in the conditional mean function that defines the residual vector $\eta_{t+1} \equiv [\eta^1_{t+1} ... \eta^N_{t+1}\ \eta^W_{t+1}\ (\eta^Z_{t+1})']'$, on which the conditional (higher) comoments themselves depend. However, apart from the fact that the conditional covariances in the model depend on the MS parameters in highly non-linear ways, (8)-(9) is qualitatively similar to a popular and frequently used family of nonlinear models that are often also estimated by MLE, ARCH-in-mean models, where the conditional mean function of some variable $y_t$ also depends on the filtered conditional variance of y at time t, $h^y_t$, which is a simple, basic form of second co-moment (the conditional variance of y is the conditional covariance of y with itself). What makes the estimation of (8)-(9) tricky and interesting at the same time is that while in an ARCH-in-mean model the conditional variance function depends on the residual from the conditional mean function of y in a very simple way, this is not the case for (8)-(9), and the dynamic system defined by the model needs to be simulated.

18 Dahlquist and Soderlind (1999) show that failure to impose this restriction can result in estimation of an admissible pricing kernel that implies a mean-variance tangency portfolio that is not on the efficient frontier.
In practice, the estimation is performed by fixing some initial parameter vector $\theta_0$ (inclusive of the estimable elements of the transition probability matrix P) and taking the initial value of the predictors ($z_0$) and of the relevant co-moments ($Cov_0[R^i_{t+1}, R^W_{t+1}]$, $Cov_0[R^i_{t+1}, (R^W_{t+1})^2]$, etc.) as given. One also needs some initial, starting value for the full-sample path of the regime probabilities, $\{\pi^0_t\}_{t=0}^T$, where $\pi^0_t$ is a K × 1 vector such that $\iota_K'\pi^0_t = 1$. Experience shows that (whenever possible) setting $\theta_0$ to the unconstrained MS estimates, $\{\pi^0_t\}_{t=0}^T$ to the vectors of filtered probabilities that one can derive from a simple MS time series model (as in Guidolin and Timmermann, 2008b) for a given K, and the initial comoments to their full-sample unconditional estimates works well in terms of speed of convergence and stability of the algorithm. At this point, using $\theta_0$ and $\{\pi^0_t\}_{t=0}^T$, one can compute the full sample path of conditional comoments ($Cov_t[R^i_{t+1}, R^W_{t+1}; \theta_0, \{\pi^0_t\}_{t=0}^T]$, $Cov_t[R^i_{t+1}, (R^W_{t+1})^2; \theta_0, \{\pi^0_t\}_{t=0}^T]$, etc. for t = 1, ..., T) using the actual values of the predictors (when relevant) and the fixed starting values for $\theta_0$ and $\{\pi^0_t\}_{t=0}^T$ as inputs to the closed-form expressions reported in the Appendix. These comoments can then be taken as given and standard MLE estimation routines can be deployed to obtain a first update of the parameter vector, $\theta_1$, and of the filtered probabilities, $\{\pi^1_t\}_{t=0}^T$ (using standard Hamilton-Kim type filters). Next, $\theta_1$ and $\{\pi^1_t\}_{t=0}^T$ are taken as the new initial “conditions” to obtain new sequences of predicted comoments $Cov_t[R^i_{t+1}, R^W_{t+1}; \theta_1, \{\pi^1_t\}_{t=0}^T]$, $Cov_t[R^i_{t+1}, (R^W_{t+1})^2; \theta_1, \{\pi^1_t\}_{t=0}^T]$, etc. for t = 1, ..., T. One can then iterate on the algorithm until a pre-set convergence criterion is satisfied.19 Obviously, within each step in which the comoments $Cov_t[R^i_{t+1}, R^W_{t+1}; \theta_0, \{\pi^0_t\}_{t=0}^T]$, $Cov_t[R^i_{t+1}, (R^W_{t+1})^2; \theta_0, \{\pi^0_t\}_{t=0}^T]$, etc. for t = 1, ..., T are taken as given (i.e., as if they were predetermined regressors), the algorithm features standard ML estimation of a MS model with switching regression coefficients (see Hamilton, 1993). What is special about this estimation algorithm is the step in which, for given parameter estimates and filtered probabilities from the previous step, the conditional comoments have to be computed to yield the updated matrix of regressors appearing in the conditional mean function.
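The outer loop just described can be sketched in a few lines of Python. This is a schematic skeleton, not the estimation code used in this paper: `compute_comoments` and `ml_update` are hypothetical callables standing in for the closed-form comoment formulas of the Appendix and for the ML/filtering step, and the tolerance `1e-3` is an assumed reading of the stopping threshold. The stopping rule follows the footnoted description: the Euclidean distance between successive parameter vectors, and the sum of the column-wise Euclidean distances between successive probability paths over K − 1 columns, must both be small.

```python
import numpy as np

def iterate_estimation(theta0, pi0, compute_comoments, ml_update, tol=1e-3, max_iter=500):
    """Alternate between computing comoments and re-estimating until a fixed point is reached.

    theta0 : initial parameter vector
    pi0    : initial (T + 1, K) array of state probabilities, rows summing to one
    """
    theta = np.asarray(theta0, dtype=float)
    pi = np.asarray(pi0, dtype=float)
    for step in range(1, max_iter + 1):
        comoments = compute_comoments(theta, pi)          # taken as given in the ML step
        theta_new, pi_new = ml_update(comoments, theta, pi)
        theta_new = np.asarray(theta_new, dtype=float)
        pi_new = np.asarray(pi_new, dtype=float)
        dist_theta = np.linalg.norm(theta_new - theta)
        # Sum of Euclidean distances over K - 1 columns (the last column is redundant).
        dist_pi = np.linalg.norm(pi_new[:, :-1] - pi[:, :-1], axis=0).sum()
        theta, pi = theta_new, pi_new
        if dist_theta < tol and dist_pi < tol:
            return theta, pi, step
    raise RuntimeError("no convergence within max_iter iterations")

# Toy stand-ins (assumptions) so that the skeleton runs: a contraction with fixed point 2.
def toy_comoments(theta, pi):
    return 0.5 * theta

def toy_ml_update(comoments, theta, pi):
    return 1.0 + comoments, pi                            # probabilities unchanged in this toy

theta_hat, pi_hat, n_steps = iterate_estimation(
    np.zeros(2), np.full((5, 2), 0.5), toy_comoments, toy_ml_update
)
print(theta_hat, n_steps)
```

With real inputs, each call to `ml_update` would itself run a full switching-regression estimation, so the number of outer iterations matters for run time.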
19 Our stopping rule at step j ≥ 1 requires that the Euclidean distance between $\theta_j$ and $\theta_{j-1}$ as well as the arithmetic sum of the K − 1 Euclidean distances between each of the columns of $\{\pi^j_t\}_{t=0}^T$ and the corresponding column of $\{\pi^{j-1}_t\}_{t=0}^T$ be less than e−3. This means that convergence is achieved if and only if a fixed point in the space of conditional comoments and parameter/filtered probability estimates has been found.

As for the standard ML estimation step, letting $y_{t+1} = [x'_{t+1}, x^W_{t+1}, z'_{t+1}]'$ be a vector of excess returns and predictor variables with intercepts $\mu_{S_{t+1}} = (\alpha^1_{S_{t+1}}, ..., \alpha^N_{S_{t+1}}, \alpha^W_{S_{t+1}}, \mu'_{zS_{t+1}})'$, we can collect the conditional moments of returns and the world price of comoment risk in the matrices $M_{S_t}$ and $\Lambda_{S_{t+1}}$ as follows
$$
M_{S_t} \equiv \left( \begin{bmatrix} \begin{bmatrix} Cov[x_{t+1}, x^W_{t+1}|\mathcal{F}_t] & Cov[x_{t+1}, (x^W_{t+1})^2|\mathcal{F}_t] & Cov[x_{t+1}, (x^W_{t+1})^3|\mathcal{F}_t] \\ Var[x^W_{t+1}|\mathcal{F}_t] & Sk[x^W_{t+1}|\mathcal{F}_t] & K[x^W_{t+1}|\mathcal{F}_t] \end{bmatrix} \\ O \end{bmatrix} \otimes \iota_3' \right) \odot \left( \iota_3' \otimes I \right)
$$
$$
\Lambda_{S_{t+1}} \equiv \begin{bmatrix} \lambda_{2,S_{t+1}} & \cdots & \lambda_{2,S_{t+1}} & \lambda_{2,S_{t+1}} & 0 & \cdots & 0 \\ \lambda_{3,S_{t+1}} & \cdots & \lambda_{3,S_{t+1}} & \lambda_{3,S_{t+1}} & 0 & \cdots & 0 \\ \lambda_{4,S_{t+1}} & \cdots & \lambda_{4,S_{t+1}} & \lambda_{4,S_{t+1}} & 0 & \cdots & 0 \end{bmatrix},
$$
where $\iota_3$ is a 3 × 1 vector of ones and J is a matrix that selects the co-moments of excess returns (see Guidolin and Timmermann, 2008a). We can then write the asset pricing model (5) more compactly as
$$y_{t+1} = \mu_{S_{t+1}} + M_{S_t} vec(\Lambda_{S_{t+1}}) + B_{S_{t+1}} y_t + \eta_{t+1}. \qquad (13)$$
Here $B_{S_{t+1}}$ captures autoregressive terms in state $S_{t+1}$ and also collects the coefficients $c^W_{S_{t+1}}$ that measure the impact of the lagged instruments $z_t$ on the market risk premium. Finally $\eta_{t+1} \sim N(0, \Omega_{S_{t+1}})$ is the vector of state-dependent innovations. At this point, if $\eta_{jS_{t+1}}$ is a vector of residuals in state $S_{t+1}$, the contribution to the log-likelihood function conditional on being in state $S_{t+1}$ at time t + 1 is given (up to a constant) by:
$$\ln p(y_{t+1}|\mathcal{F}_t, S_{t+1}; \theta_j) \propto -\frac{1}{2}\ln|\Omega_{jS_{t+1}}| - \frac{1}{2}\eta'_{jS_{t+1}}\Omega^{-1}_{jS_{t+1}}\eta_{jS_{t+1}},$$
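where $\theta_j$ collects the mean (φ), variance (Ω), and transition probability (P) parameters of the model. The expected value of the log-likelihood employed by the EM algorithm is maximized by choosing the parameters $\theta_{j+1}$ in the j + 1 iteration to satisfy (see Hamilton (1990, p.51)):
$$\sum_{t=1}^{T} \sum_{S_{t+1}=1}^{K} \left. \frac{\partial \ln p(y_{t+1}|\mathcal{F}_t, S_{t+1}; \theta)}{\partial \theta} \right|_{\theta=\theta_{j+1}} p(S_{t+1}|y_2, y_3, ..., y_T; \theta_j) = 0,$$
where $\{p(S_{t+1}|y_2, y_3, ..., y_T; \theta_j)\}_{S_{t+1}=1}^{K}$ are the smoothed state probabilities for each of the K states. Letting $y \equiv [y_2'\ y_3'\ ...\ y_T']'$ and $\eta \equiv [\eta_1'\ \eta_2'\ ...\ \eta_K']'$, it is useful to re-write the log-likelihood as:
$$
\begin{aligned}
\ell(y_1, ..., y_T|\delta) &\propto -\frac{1}{2}\sum_{s=1}^{K} \ln|\Omega_s| \sum_{t=2}^{T} p(S_{t+1}; \theta_j) - \frac{1}{2}\sum_{s=1}^{K} \eta_s'(\Sigma_s \otimes \Omega_s^{-1})\eta_s \\
&= -\frac{1}{2}\sum_{s=1}^{K} \ln|\Omega_s| \sum_{t=2}^{T} p(S_{t+1}; \theta_j) - \frac{1}{2}\eta' W^{-1} \eta
\end{aligned}
$$
where
$$
Z \equiv \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_K \end{bmatrix}; \qquad
Z_i \equiv \begin{bmatrix} [e_i'\ \ e_i' \otimes y_1'] \otimes I_N \\ [e_i'\ \ e_i' \otimes y_2'] \otimes I_N \\ \vdots \\ [e_i'\ \ e_i' \otimes y_{T-1}'] \otimes I_N \end{bmatrix}
$$
$$
W^{-1} \equiv \begin{bmatrix} \Sigma_1 \otimes \Omega_1^{-1} & \cdots & O \\ \vdots & \ddots & \vdots \\ O & \cdots & \Sigma_K \otimes \Omega_K^{-1} \end{bmatrix}; \qquad
\Sigma_i \equiv diag\{p(s_2 = i; \theta_j), p(s_3 = i; \theta_j), ..., p(s_T = i; \theta_j)\}.
$$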
The EM updating equation for the transition probabilities is based on the smoothed state probabilities and can be found in equation (4.1) of Hamilton (1990, p. 51). Filtered state probabilities are calculated as a by-product. The first order conditions for the mean and variance parameters, φ and Ω, are:
$$\frac{\partial \ln \ell(y_t|\delta)}{\partial \phi} = -\frac{1}{2}\eta' W^{-1} Z = 0 \qquad (14)$$
$$\frac{\partial \ln \ell(y_t|\delta)}{\partial \Omega_s} = -\frac{1}{2}\sum_{t=1}^{T} p(S_{t+1} = s; \theta_j)\Omega_s^{-1} + \frac{1}{2}\Omega_s^{-1}\varepsilon_s'\Sigma_s\varepsilon_s\Omega_s^{-1} = O \qquad s = 1, 2, ..., K, \qquad (15)$$
where $\varepsilon_s \equiv [(y_2 - Z_{s_2=i}\phi)'\ (y_3 - Z_{s_3=i}\phi)'\ ...\ (y_T - Z_{s_T=i}\phi)']'$ are the residuals in state s and $W^{-1}$ is a function of $\{\Omega_s\}_{s=1}^{K}$. Equation (14) implies that $\phi_{j+1}$ is a GLS estimator once observations are replaced by their smoothed probability-weighted counterparts:
$$\phi_{j+1} = (Z'W^{-1}Z)^{-1}Z'W^{-1}(\iota_K \otimes y). \qquad (16)$$
Similarly, (15) implies the covariance estimator
$$\Omega_s = \frac{\varepsilon_s'\Sigma_s\varepsilon_s}{\sum_{t=1}^{T} p(S_{t+1}; \theta_j)}. \qquad (17)$$
$\phi_{j+1}$ and $\{\Omega_{s,j+1}\}_{s=1}^{K}$ must be solved jointly since $\varepsilon_s$ enters the expression for the covariance matrix and also depends on $\phi_{j+1}$, while the regime-dependent covariance matrices $\{\Omega_{s,j+1}\}_{s=1}^{K}$ enter (16) via $W^{-1}$. Hence, within each step of the EM algorithm, (16)-(17) is iterated upon until convergence of the estimates $\phi_{j+1}$ and $\{\Omega_{s,j+1}\}_{s=1}^{K}$.
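The joint iteration on (16)-(17) can be illustrated in a deliberately simplified setting. The Python sketch below is a scalar analogue, not the multivariate system of this paper: it assumes a single observed series, one regressor with a slope common to all states (so that the slope and the state variances genuinely depend on each other), and smoothed probabilities that are taken as given. All names and simulated numbers are illustrative assumptions.

```python
import numpy as np

def m_step_common_slope(y, x, probs, tol=1e-8, max_iter=200):
    """Iterate a probability-weighted GLS slope update and state-variance updates.

    y, x  : arrays of shape (T,), scalar observations and regressor
    probs : array of shape (T, K), smoothed state probabilities (taken as given)
    """
    y, x, probs = (np.asarray(a, dtype=float) for a in (y, x, probs))
    sigma2 = np.ones(probs.shape[1])             # starting values for the state variances
    phi = 0.0
    for _ in range(max_iter):
        weights = (probs / sigma2).sum(axis=1)   # GLS weights: sum_s p_ts / sigma_s^2
        phi_new = np.sum(weights * x * y) / np.sum(weights * x * x)
        resid2 = (y - phi_new * x) ** 2
        # Scalar analogue of (17): probability-weighted squared residuals over summed weights.
        sigma2_new = (probs * resid2[:, None]).sum(axis=0) / probs.sum(axis=0)
        done = abs(phi_new - phi) < tol and np.max(np.abs(sigma2_new - sigma2)) < tol
        phi, sigma2 = phi_new, sigma2_new
        if done:
            break
    return phi, sigma2

# Simulated data (assumptions): true slope 2, state standard deviations 0.5 and 2.
rng = np.random.default_rng(1)
T = 2000
state = rng.integers(0, 2, size=T)
x = rng.normal(size=T)
y = 2.0 * x + np.where(state == 0, 0.5, 2.0) * rng.normal(size=T)
probs = np.column_stack([state == 0, state == 1]).astype(float)  # degenerate probabilities
phi_hat, sigma2_hat = m_step_common_slope(y, x, probs)
print(phi_hat, sigma2_hat)
```

The weights in the slope update depend on the current variances, and the variances depend on the current slope through the residuals, which is why the two updates are repeated until neither changes.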
4. The Data
We use CRSP equity data (i.e., value-weighted NYSE and, when available, AMEX and NASDAQ) for the period from January 1927 to March 2008, for a total of 975 observations per series. As stressed by Post and Vliet (2006), when analyzing risk and risk preferences it is particularly important to include periods during which investment risks are high and investors may be conjectured to have been keenly sensitive to risk. For this reason, it appears justified to use extended sample periods that include the prolonged bear markets of the 1930s, 1970s and early 2000s.20 In particular, we collect the series of value-weighted returns on the market portfolio in excess of the 1-month T-bill yield (from Ibbotson Associates) and of returns on Fama and French’s (1993) SMB (Small Minus Big) and HML (High Minus Low) portfolios.21 The case of the momentum portfolio (for short, MOMO) is a bit more complicated as six value-weighted portfolios formed on size and prior returns are used. The portfolios, which are formed monthly, are the intersections of two portfolios formed on size and three portfolios formed on prior returns. The monthly size breakpoint is the median NYSE market equity. The monthly prior return breakpoints are the 30th and 70th NYSE realized return percentiles. MOMO returns are the average returns on the two high (above the 70th percentile) prior return portfolios minus the average return on the two low (below the 30th percentile) prior return portfolios, independently of size. This methodology captures the standard logic of measuring the returns of past “winners” minus past “losers”.22

20 In particular, we purposefully retain the early 1927-1962 period. Ang and Chen (2007) show that the value premium for the 1926-1963 period can be explained using a conditional CAPM. Fama and French (2006a) concur that the value premium can be captured by the CAPM during the pre-1963 period.

21 SMB measures the average return on the stocks in the lowest tercile of the size distribution minus the average return on the stocks in the highest tercile among the size-sorted portfolios; size is measured as total market value of firm equity. HML measures the average return on the stocks in the highest 50% of the book-to-market-sorted distribution minus the average return on the stocks in the lowest 50% of the book-to-market-sorted portfolios.
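The construction of MOMO from the size and prior-return portfolios amounts to a simple difference of averages. The following Python sketch shows it under the assumption that the four relevant return series (small and big firms, high and low prior returns) are already available; the function name and the return figures are made up for illustration.

```python
import numpy as np

def momentum_factor(small_high, big_high, small_low, big_low):
    """Average of the two high prior-return portfolios minus average of the two low ones."""
    high = 0.5 * (np.asarray(small_high, dtype=float) + np.asarray(big_high, dtype=float))
    low = 0.5 * (np.asarray(small_low, dtype=float) + np.asarray(big_low, dtype=float))
    return high - low

# Made-up monthly returns (in percent) for three months, for illustration only.
momo = momentum_factor(
    small_high=[2.0, -1.0, 0.5],
    big_high=[1.0, -0.5, 0.3],
    small_low=[0.5, -2.0, 0.1],
    big_low=[0.3, -1.5, -0.1],
)
print(momo)
```

Averaging across the two size groups before differencing is what makes the resulting long-short return independent of size.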
The choice of the instrument set $z_t$ is motivated by two considerations. First, the instruments should consist of variables that are able to predict equity portfolio returns. Second, the choice of instruments should be parsimonious due to power considerations, as argued in Dittmar (2002). Consequently, we use three instrument series, represented by variables that are well known in the literature for their ability to forecast (market) stock returns. The first is the value-weighted CRSP dividend yield series, computed as a ratio between the trailing sum of 12-month dividends and the time t − 1 monthly price index (multiplied by 100). The second variable is the term spread, computed as the monthly difference between the annualized percentage yield on a constant maturity 10-year government bond (Treasury note) and the annualized percentage 1-month Treasury Bill yield. The third variable is the default spread, constructed as the monthly difference between the annualized Moody’s seasoned Aaa and Baa corporate bond yields (in percentage). In the case of the default spread, monthly series are obtained as averages of daily series.23 The interest rate data are obtained from FRED II® at the Federal Reserve Bank of St. Louis.
We also obtain series for the 25 Fama and French (F-F, 1993) portfolios and 17 industry portfolios. The 25 F-F portfolios are obtained by applying a double, 5 × 5 sorting on both size (measured by market equity) and book-to-market ratio. The size breakpoints for year t are the NYSE market equity quintiles at the end of June of t. The book-to-market ratio for June of year t is the book equity for the last fiscal year end in t − 1 divided by the market value of the equity for December of t − 1; the book-to-market ratio breakpoints are also NYSE quintiles. The industry portfolios are simply obtained by assigning each NYSE, AMEX, and NASDAQ stock to an industry portfolio at the end of June of year t based on its four-digit SIC code at that time; Compustat SIC codes are used when available, and CRSP SIC codes are used otherwise. Then returns are computed from July of t to June of t + 1. The 25 F-F portfolios and the 17 industry portfolios are used to investigate the “out-of-sample” pricing and forecasting performance of the EPK models to be estimated in Section 5.
The literature shows that there are many competing definitions of value-sorted portfolios and, correspondingly, of value and growth style-portfolios.24 We also assess how successful our competing empirical models for the pricing kernel are with reference to two alternative portfolio construction methods. First, we construct value-minus-growth (VMG) portfolios using deciles instead of the more rudimentary 2 × 3 sorting originally used by Fama and French (1993). Using a decile classification, we produce the portfolios Hd (highest book-to-market decile), Ld (lowest book-to-market decile) and, as a difference, HMLd. This definition of VMG has been used by Ang and Chen (2007), because a decile-based VMG is expected to give stronger evidence on the value premium by using the highest and lowest book-to-market decile portfolios. Second, we employ a notion of VMG that emphasizes small capitalization stocks. Because the value premium is supposedly stronger among smaller firms (see Loughran, 1997, and Fama and French, 2006), from a five-by-five sort on size and book-to-market the small-value (Hs) and small-growth (Ls) portfolios can be calculated, and the small value premium HMLs is the small-value (Hs) minus the small-growth (Ls).25 For symmetry, we also use one alternative portfolio construction method applied to size-sorted portfolios: every year, we sort the CRSP universe in ten size deciles and produce Sd (lowest size decile), Bd (highest size decile) and, as a difference, SMBd.

22 All of these data are downloaded from Kenneth French’s data library, which is at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/. SMB and HML returns are computed with reference to annual sortings that are formed in July of year t and held until June of t + 1; these portfolios include all NYSE, AMEX, and NASDAQ stocks for which market equity data are available for December of t − 1 and June of t, and (positive) book equity data for t − 1. In the case of MOMO, to be included in a portfolio for month t (formed at the end of month t − 1), a stock must have a price for the end of month t − 13 and a good (valid) return for t − 2.

23 Moody’s tries to include bonds with remaining maturities as close as possible to 30 years. Moody’s drops bonds if the remaining life falls below 20 years, if the bond is susceptible to redemption, or if the rating changes.

24 High HML portfolios are defined to be value portfolios, while low HML portfolios are growth portfolios. Our definition of HML has been used as a standard measure of the value premium since Fama and French (1993), and has been used in many recent studies, including Petkova and Zhang (2005) and Fama and French (2006a).
Table 1 starts by presenting standard summary statistics for the portfolios which are the object of our
analysis, and for the instruments used to capture predictability in the returns on the wealth process. For
comparison, we include in the Table also summary statistics concerning HMLd, SMBd, and HMLs, as a
way of performing robustness checks. Panel A of the table concerns long-run, full-sample results spanning
the 81 years and 3 months that go from January 1927 to March 2008. Data on excess market returns
show customary features, i.e., a mean excess return (in annualized terms) of 7.6%, (annualized, assuming
counter-factual independence over time) volatility of 18.8%, and an annualized Sharpe ratio of 0.40. Excess
market returns also display a moderately positive skewness (0.23, which is not typical in the literature and
entirely attributable to data from the 90s and the 00s) and rather large excess kurtosis. Overall, there is
little doubt that excess market returns fail to have an unconditional Gaussian density, as shown by the
large Jarque-Bera test statistic. The full-sample data also show clear evidence of the existence of value and
momentum anomalies, to be interpreted as the inability of the unconditional CAPM to explain the returns
from the HML and MOMO long-short portfolios.26 Standard, unconditional CAPM (single-index model)
regressions yield large alphas (5.2% per year for HML, 11.3% for MOMO, up to a stunning 13.4% per year in
the case of HMLs) with p-values virtually indistinguishable from 0, i.e., that are compatible with the null of
the CAPM practically with zero probability. As often stressed in the literature, the betas for these portfolios
are either too small (although statistically significant) or even negative and as such cannot rationalize the
high and statistically significant returns on value and momentum-sorted portfolios (6.1% for HML, 9.2%
for MOMO, and 11.5% for HMLs, in annualized terms); in fact, negative betas can only exacerbate the
phenomenon and they clearly imply alphas which are even larger than the raw, total mean returns from the
long-short portfolios.27 The percentage R2s from a plain vanilla, standard CAPM are practically negligible,
25However, Fama and French (2006a) show that Loughran’s (1997) evidence that there is no value premium among large
stocks seems to be peculiar to (1) the post-1963 period, (2) using the book-to-market ratio as the value-growth indicator, and
(3) restricting the tests to U.S. stocks. In particular, during the period 1927-1962, the value premium is nearly identical for
small and big US stocks.26This is different from Ang and Chen (2007) who report weak value effects for their overall 1927-2001 sample. As we shall
observe later, this is attributable to large HML mean return over the recent 2002-2007 period, when HML has displayed a mean
comparable to the full-sample (0.46%) which cannot be explained by the CAPM (the HML alpha is a large 0.51% with a p-value
of 0.027, while beta is negative).27Our findings for HMLd are mixed. On the one hand, the annualized mean return on HMLd is 6.3% and this mean is
25
between 3 and 14 percent, which confirms the standard adage that equity portfolio returns are extremely
hard to fit using (too) simple asset pricing models.
As reported in many recent papers, the size anomaly fails to be present in the full sample, since the
annualized average return on SMB and SMBd are only 1.9% and 7.4% respectively, only the second mean
is statistically significant, and most importantly, the corresponding alphas are relatively small (only 0.4 and
3.3 percent, respectively) and statistically insignificant. While in the case of the classical Fama and French’s
(1993) SMB portfolio there is indeed very little to explain, in the case of SMBd, a plain vanilla unconditional
CAPM yields a rather large beta of 0.53 capable to account for a substantial portion of SMBd returns (the
R2 is a non-negligible 14%); as a result the alpha is small and not statistically significant. All the long-
short portfolios entail large amounts of excess kurtosis (always statistically significant) and of distributional
asymmetries. However, while most portfolios have positive and significant skewness, HMLs and MOMO
imply negative skewness. In any event, none of the return series under investigation is compatible with an
unconditional Gaussian distribution, since there is clear evidence of the Jarque-Bera statistics exceeding 1-
and 5 percent size critical values. The three predictors used in our analysis confirm standard properties
reported in the literature, in particular a moderate volatility (between 3 and 6 times smaller than the
volatilities typical of equity portfolio returns) and the fact that the term and default spreads were on
average positive over our sample period, as one would expect.
Panels B and C distinguish between the pre- (1927-1963) and post- (1964-2007) COMPUSTAT periods.
Similarly to Ang and Chen (2007), in the pre-COMPUSTAT period we fail to find evidence of a value
anomaly: for both HML and HMLd, the alphas are small and not statistically significant, even though the
mean return on HML is relatively large (6.1% per year); in fact, the point estimate for the HMLd alpha
is negative. This is due to the fact that the pre-1964 CAPM betas are rather large and able to explain
a large portion of the variability in the positive value premia (with R2 of 27% for HML and of 31% for
HMLd). However, the value anomaly persists for small capitalization stocks, because HMLs has a mean
return of over 12% in annualized term and yields a highly statistically significant and large alpha of 13.7%
(which is clearly caused by the fact the single-index beta is once more negative). On the contrary, the value
anomaly is strong in the COMPUSTAT sample, because all the mean portfolio return estimates are high
and statistically significant (from 6.2% for HML to 11% for HMLs) and especially the single-index alphas
are economically large (from 7.3% for HMLd to 13.4% for HMLs) and highly statistically significant. These
results are nearly identical to those reported by Ang and Chen (2007) with reference to HMLd and the period July
1963 - December 2001, i.e., 0.53% per month average return (we have a 0.55%) and a monthly alpha of
0.60% (we have 0.61%); similarly we find that while in the pre-1964 sample the HMLd beta is estimated to
be 0.74 and large enough to explain the performance of the long-short portfolio (R2 is 31%), in the post-1964
subsample, the beta turns negative (-0.11) and can no longer explain the performance of the book-to-market
strategy (R2 is 1%).
Panels B and C show that there is no strong time-variation in the strength of the momentum anomaly.
Although MOMO’s average returns increase from 8.1 to 10 percent (in annualized terms) when going from
the 1927-1963 to the 1964-2008 sub-sample, MOMO’s alpha in fact declines from a stunning 12.4% to a still
large 10.3%, and both estimates command p-values which are close to zero, while betas are either zero or
negative. Contrary to what has been sometimes reported (obviously, on shorter or different sub-samples)
there was never substantial evidence of a size anomaly. Both SMB and SMBd command neither statistically
significant and positive average returns (SMBd implies a modest 4.6% over the 1964-2008 period, but the
corresponding p-value is close to 10%) nor significant and positive alphas. While it is sometimes contended
that the size anomaly relates to the early part of the CRSP/NYSE sample, in fact the only feeble signal of
a size anomaly is given by SMBd for the sub-sample 1964-2008, when the alpha is 3.6% in annualized terms
with a p-value of 0.17.
[footnote, continued] statistically different from zero with a p-value between 1 and 5%. On the other hand, the associated single-index alpha is a
moderate 2.9% per annum which fails to be statistically significant. This is probably due to the fact that returns for both
deciles 1 and 10 of the book-to-market sorted distribution are extremely noisy.
Finally, panel D reports summary statistics for the recent sub-period 1994-2008, which collects only 2-3
complete market cycles and yet includes a sufficient number of observations to allow us to express some
comments on the dynamics of the anomalies in the very recent financial history of the US. Clearly, in recent
periods value and momentum anomalies have been stronger than ever. For instance, HML commands a
plain-vanilla single-index alpha of 9.8% (in annualized terms) and MOMO an alpha of 11.2%, both with
p-values which are essentially zero. As in panels A and C, the anomaly is even stronger for HMLs, with an alpha
that is almost double the alpha of the standard Fama and French (1993) HML definition. Both the HML
and MOMO portfolios imply negative betas, which is of course the cause of the large and positive alphas. On
the contrary, it becomes clearly visible that especially in the recent period, the size anomaly has disappeared
altogether, with both negative (or anyway, not significant) raw returns and alphas.
Of course, we are far from being able to claim that we are the first researchers to observe that value-
and momentum-sorted portfolios imply positive and statistically significant abnormal returns that cannot
be explained by pure “beta risk”. For instance, starting with Fama and French (1993), many authors have
estimated simple OLS regressions on portfolios of stocks sorted by book-to-market ratios and rejected the
hypothesis that the OLS alpha is equal to zero. However, as argued by Ang and Chen (2007), using an
unconditional factor model estimated by OLS to make inferences regarding the conditional CAPM may be
treacherous for a number of reasons. Since the seminal work by Jagannathan and Wang (1996), we know that
if time-varying conditional betas are correlated with time-varying market risk premia, then the conditional
CAPM is observationally equivalent to an unconditional multifactor model in which $\mathrm{Cov}[E_t[x^W_{t+1}], \beta_t]$ is an
additional, priced factor. Under the null of a conditional CAPM, we would expect that the estimate of
the unconditional OLS alpha captures both the conditional alpha and the interaction of time-varying factor
loadings and market risk premia. Hence, any statement made about the failure of an unconditional CAPM
to capture the spread of average returns in the cross-section does not imply that a conditional CAPM cannot
explain the cross-sectional spread of average returns. This implies that when conditional betas and market
risk premia are correlated, OLS fails to provide consistent estimates of both the conditional alpha and the
conditional betas. Moreover, Ang and Chen (2007) have shown that the degree of the inconsistency depends
on unknown parameters driving the conditional beta process that are not directly observed. In Section 5
we show that extended (mixture) conditional CAPM models in which the betas change over time and
are potentially correlated with the risk premium (for instance, because both sets of parameters follow a
Markov chain process driven by the same latent state variable) may generate zero (or at least, statistically
insignificant) abnormal returns.
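The argument that an unconditional OLS alpha can be non-zero even when the conditional CAPM holds exactly can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: a symmetric two-state Markov chain drives both the market risk premium and the portfolio beta, and every numerical value (`mu`, `beta`, the volatilities, the 0.95 persistence) is hypothetical rather than an estimate from this paper. The conditional alpha is zero by construction in every period.

```python
import numpy as np

def simulate_ols_alpha(n_obs=200_000, stay=0.95, seed=0):
    """Unconditional OLS alpha and beta when the conditional CAPM holds exactly.

    All parameter values below are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    # Symmetric two-state chain: the state flips with probability 1 - stay.
    flips = rng.random(n_obs) > stay
    state = np.cumsum(flips) % 2
    mu = np.array([2.0, -1.0])    # regime-specific market risk premium
    beta = np.array([1.5, 0.5])   # regime-specific beta, high when mu is high
    x = mu[state] + 4.0 * rng.standard_normal(n_obs)           # market excess return
    r = beta[state] * x + 2.0 * rng.standard_normal(n_obs)     # zero conditional alpha
    # Unconditional single-index regression r = alpha + b * x + e.
    design = np.column_stack([np.ones(n_obs), x])
    alpha, slope = np.linalg.lstsq(design, r, rcond=None)[0]
    return alpha, slope

if __name__ == "__main__":
    alpha, slope = simulate_ols_alpha()
    print(f"OLS alpha = {alpha:.3f}, OLS beta = {slope:.3f}")
```

With these hypothetical values the population OLS alpha is roughly 0.74 per period and the OLS beta roughly 1.02, even though no regime has a non-zero conditional alpha: the intercept absorbs the co-movement between the beta and the risk premium.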
4.1. Time-Varying Abnormal Returns and Co-Moments for Size, Value, and Momentum Portfolios
As a way to provide some intuition for our empirical strategy, we deepen our preliminary investigation of the
data by computing and plotting time-varying (OLS, unconditional) alphas, covariances, partial (downside)
covariances, coskewness, and cokurtosis coefficients for SMB, HML, and MOMO portfolio returns. The
co-moments are all computed vs. value-weighted CRSP market portfolio returns. Co-skewness for portfolio
$i$ is defined as $\mathrm{Coskew}[R^i_t, R^W_t] \equiv \mathrm{Cov}[R^i_t, (R^W_t)^2]$ and cokurtosis as
$\mathrm{Cokurt}[R^i_t, R^W_t] \equiv \mathrm{Cov}[R^i_t, (R^W_t)^3]$.
Of course, the covariance of these long-short portfolios vs. the market portfolio is proportional to a
time-varying measure of beta, similarly to Ang and Chen (2007); the partial covariance (computed conditioning
on negative market returns, $R^W_{t+1} < 0$) is proportional to a time-varying downside beta as in Ang, Chen,
and Xing (2006). However, we extend our investigation to coskewness and cokurtosis as well. Because
our analysis in this Section is meant to be purely exploratory, we compute time-varying alphas and
co-moments using the simplest possible method, i.e., 60-month rolling window estimates. To facilitate
interpretation, we report covariances scaled by the variance of the market portfolio over the same 60-month
period (i.e., we report standard, unconditional CAPM and downside betas), coskewnesses scaled by the
factor $\mathrm{Var}[R^W_t]\sqrt{\mathrm{Var}[R^i_t]}$, and cokurtosis scaled by the factor
$(\mathrm{Var}[R^W_t])^{3/2}\sqrt{\mathrm{Var}[R^i_t]}$. These choices follow
Ang, Chen, and Xing (2006).
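The rolling-window statistics described above can be sketched as follows. This is a minimal illustration, not the code used for Figure 1: the function names are hypothetical, and the downside beta is scaled here by the market variance over the same downside months, which is an assumption about how the conditioning is applied.

```python
import numpy as np

def co_moments(ri, rw):
    """Scaled co-moments of portfolio returns ri against market returns rw."""
    ri = np.asarray(ri, dtype=float)
    rw = np.asarray(rw, dtype=float)
    var_w = rw.var(ddof=1)
    sd_i = ri.std(ddof=1)

    def cov(x, y):
        return np.cov(x, y, ddof=1)[0, 1]

    beta = cov(ri, rw) / var_w
    # Downside beta: restrict the sample to months with negative market returns.
    down = rw < 0
    if down.sum() > 2:
        beta_down = cov(ri[down], rw[down]) / rw[down].var(ddof=1)
    else:
        beta_down = np.nan
    # Coskewness: Cov[R_i, R_W^2] scaled by Var[R_W] * sqrt(Var[R_i]).
    coskew = cov(ri, rw ** 2) / (var_w * sd_i)
    # Cokurtosis: Cov[R_i, R_W^3] scaled by Var[R_W]^(3/2) * sqrt(Var[R_i]).
    cokurt = cov(ri, rw ** 3) / (var_w ** 1.5 * sd_i)
    return beta, beta_down, coskew, cokurt

def rolling_co_moments(ri, rw, window=60):
    """Rows are windows; columns are beta, downside beta, coskewness, cokurtosis."""
    ri = np.asarray(ri, dtype=float)
    rw = np.asarray(rw, dtype=float)
    rows = [co_moments(ri[s:s + window], rw[s:s + window])
            for s in range(len(ri) - window + 1)]
    return np.array(rows)
```

Calling `rolling_co_moments(ri, rw, window=120)` gives the longer-window variant; each row corresponds to one window ending at the associated month.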
Figure 1 plots the results. In the first plot we report 60-month rolling window estimates of the unconditional
CAPM alphas. All long-short portfolios display an enormous amount of variation over time and in all
cases the OLS alpha frequently changes sign. The HML alpha is mostly positive, although
recursive estimates from the mid-1930s turn negative and large (below -2% per month at the apex of the
Great Depression, between 1933 and 1934), indicative of abnormal and large returns of growth companies in
excess of value companies, relative to what is justified by the CAPM. Apart from three short-lived episodes
coinciding with the beginning of WWII and the recessions of 1970 and 1974, the HML alpha remains positive
throughout and it tends to persistently exceed 1% per month for long periods such as the 1950s, the late
1980s, the period 1996-2000, and more recently the Spring of 2007. Although there is evidence of negative
HML alphas in the 1930s, the plot does not support the simplistic conclusion that HML abnormal returns
originate only in the post-1964 period, although it is true that the 1927-1963 average (0.41%) is lower
than the 1964-2008 one (0.75%) and that the 1980s and late 1990s feature two remarkable peaks in the
HML alphas. What is evident is that the value alphas are strongly time-varying as well as considerably
persistent.28
Similar remarks hold for the recursive estimates of MOMO’s alpha, although MOMO seems to leave
the Great Depression “regime” before HML does (starting in late 1935), while the negative/low alphas
corresponding to the bear markets of the late 1960s, 1970s, and early 1980s are more negative than those
found for HML. MOMO’s alpha also enters negative territory in the bear markets of 2001-2002. In fact,
the correlation between HML and MOMO’s alphas is positive and statistically significant (0.51). By
contrast, the SMB alpha is positive and negative with equal frequency: initially positive and large during
the 1930s and 1940s, and then mostly negative between the mid-1980s and 2004. Apart from the early
part of the sample, the size anomaly seems then to be related to two alpha-spike periods (reaching monthly
values of approximately 1%) between 1967 and 1972 and then 1978 and 1985. In fact, the SMB alphas are
negatively correlated with both HML (-0.35) and MOMO (-0.15).
28 The first-order serial correlation coefficient for the HML alpha is 0.986; those for MOMO and SMB are 0.972 and 0.991,
respectively.
The second and third plot show the standard and downside CAPM betas, respectively. Both definitions
of beta show considerable time variation, as already observed by Ang and Chen (2007).29 However, while
the SMB beta is moderate in absolute terms but generally positive (which explains why on average the SMB
alphas are smaller than those of other long-short portfolios), the HML beta starts positive and relatively
high in the 1930s (around 0.5), then trends down, reaching negative values (of around -0.5) in 2004, and
finally rises again between 2005 and 2008. This is consistent with Ang and Chen’s finding
of a structural difference between the pre-1963 and the post-1963 samples: in the pre-1963 subsample, the
betas of the book-to-market strategy are statistically positive and large enough to explain the performance of
the strategy; in the post-1963 subsample, book-to-market betas become negative and can no longer explain
the performance of the book-to-market strategy. MOMO betas are instead mostly characterized by high
volatility, although also in this case three clear patches of negative betas — 1930s and 1940s, 1970s, and
the post-2001 period — can be distinguished from periods of positive betas, i.e., the 1950s and the 1980s
and 1990s. The recursive downside betas tend to follow the general patterns revealed by the unconditional
betas: the HML downside beta is generally small, although it becomes negative and rather large (around -0.6)
between 1998 and 2004; the MOMO downside beta is very volatile and turns negative during
bear periods, such as the 1930s, the 1970s, the late 1980s, and recently the post-2001 period. The SMB
downside beta is of moderate size but also remains persistently positive getting close to a unit value in the
bull periods of the 1960s, 1980s, and 1990s. Of course, the erratic and volatile dynamics of the HML and
MOMO betas (both unconditional and downside) make it hard to believe that either the unconditional
CAPM or a semi-variance, conditional form of the CAPM may explain the high and positive returns on
these two portfolios. By contrast, the dynamics over time of both the standard and downside SMB betas
leave open the possibility that some small modification of the CAPM may explain away the presence of
high-alpha periods.
The fourth and fifth plots show recursive, rolling window estimates of (scaled) coskewness and excess
cokurtosis. MOMO has dreadful coskewness properties: with few exceptions, coskewness is always negative,
i.e., MOMO returns are below-average when the variance of the market portfolio is above average; half of the
time, coskewness is below -0.5, i.e., it is hardly negligible; during the 1930s and 1940s, the late 1980s, and
in the period that follows 2005, MOMO’s coskewness falls below -1. By contrast (with the exception
of the 1930s and 1940s, when cokurtosis is negative), MOMO’s excess cokurtosis oscillates around zero,
which illustrates that most of the time the MOMO returns and the skewness of market returns are weakly
correlated. However, coskewness is certainly a possible candidate to explain away MOMO’s high alphas,
since negative coskewness needs to be rewarded by high risk premia. The coskewness of HML tends instead
to be moderate, although after a start in the positive numbers (which would make HML more appealing
than what is implied by a standard mean-variance framework) in the 1930s, HML coskewness remains in
the negative region most of the time and becomes rather large (around -1) during the 1990s. HML shows
instead persistently positive and high excess cokurtosis between the 1930s and the 1950s; however, after
1987 HML’s excess cokurtosis turns mostly negative. The dynamics in SMB coskewness and cokurtosis are
much less interesting, although coskewness turns large and negative for most of the 1980s and 1990s, while
cokurtosis spikes in the 1940s and between 1987 and 1992. All in all, it remains unclear whether the third
and fourth co-moments may provide a substantial contribution to explaining the high HML alphas in
the first panel. Section 5 explores this avenue by proposing the concept of mixture CAPM and estimating
a number of related econometric models.
29 Of course, the finding of strong time-variation in recursively estimated betas stresses that it may be difficult to produce a
correct inference regarding the true conditional alpha, because the standard unconditional CAPM OLS regressions are likely to
be misspecified.
Figure 2 repeats these calculations using 120-month rolling windows. The patterns are qualitatively
identical. The only interesting difference is that MOMO’s alphas are now essentially always positive,
although they become rather small between the 1970s and 1980s. As conjectured by Ang and Chen (2007),
other features besides CAPM alphas and betas exhibit considerable time variation. In particular, we have
seen in Figure 1 that variation in coskewness (for MOMO and partially HML) and cokurtosis (for MOMO)
may play an important role in explaining away the high and positive alphas of value- and momentum-sorted
long-short portfolios. Moreover, it is clear from Figures 1-2 that many of the series display time-varying
patterns of persistence potentially consistent with the presence of regime switching dynamics, with recurrent
periods of positive and high values followed by periods of negative and small values.30
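Part of the persistence visible in rolling-window series is mechanical, because consecutive windows share all but one observation. The sketch below illustrates this under a deliberately simple assumption: it uses rolling means of serially independent simulated shocks rather than rolling OLS alphas, so it only conveys the order of magnitude. For a window of length w, the population first-order autocorrelation of such a rolling mean is (w - 1)/w, i.e., about 0.983 for 60 months and 0.992 for 120 months.

```python
import numpy as np

def rolling_mean_autocorr(window=60, n_obs=20_000, seed=0):
    """First-order autocorrelation of a rolling mean of i.i.d. shocks."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n_obs)
    # Equal-weight moving average over `window` consecutive observations.
    rolled = np.convolve(shocks, np.ones(window) / window, mode="valid")
    return np.corrcoef(rolled[1:], rolled[:-1])[0, 1]

if __name__ == "__main__":
    for w in (60, 120):
        print(w, round(rolling_mean_autocorr(w), 4), round((w - 1) / w, 4))
```

Serial correlation coefficients of this size therefore cannot, on their own, be read as evidence of persistent regimes; the comparison is only a benchmark for what overlapping windows generate by construction.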
5. Empirical Estimates
In this section we report and comment on both unrestricted and restricted estimates of the pricing kernel
models reviewed in Section 3 and estimated on a 4 × 1 vector that includes value-weighted excess market
returns, returns on the standard Fama-French SMB and HML portfolios, as well as returns on the long-short
momentum portfolio; when relevant, the instruments in zt that capture any kind of predictability of market
excess returns are the dividend yield, the term spread, and the default spread, i.e., N = 4 and M = 3.
In what follows we assume that $R^W_{B,t+1} = E_t[R^W_{t+1}]$, where the conditional expectation of one-month ahead
market gross returns is computed in each case using the model under estimation. Of course, under
the assumption that the pricing kernel is unique and identifiable, there is nothing compelling in the choice
of basing our estimates on a vector of portfolio returns that — besides excess market returns — picks up
equity portfolio returns that reflect size, value, and momentum anomalies. However, given the key role
that these US equity portfolios have played in the development of the empirical literature on the US equity
cross-section, we believe this estimation exercise to be a sensible enough starting point.
5.1. The Smooth, Single-State Benchmark
We start by presenting empirical estimates of the single-state SSBC benchmark in (12). Table 2 reports
parameter estimates and summary statistics concerning the fit provided by the model to the four equity
portfolios under investigation.31 Besides the maximized value of the log-likelihood function, we also report
30 For instance, the 120-month rolling window first-order serial correlation coefficient for the HML alpha is 0.992; those for
MOMO and SMB are 0.981 and 0.996, respectively. Of course, this is also an implication of the rolling window structure used
in the construction of the statistics presented.
31 To save space, we have omitted estimates of the covariances among shocks to the predictors and between shocks to the
predictors and shocks to portfolio returns, i.e., $\mathrm{Cov}[\eta^Z_{t+1}, \eta^W_{t+1}]$, $\mathrm{Cov}[\eta^Z_{t+1}, \eta^i_{t+1}]$ ($i$ = SMB, HML, and MOMO),
and $\mathrm{Var}[\eta^Z_{t+1}]$. They are available from the author upon request.
a few standard information criteria (Akaike’s, Hannan-Quinn’s, and Bayes-Schwarz’s) that trade off fit and
parsimony and penalize the log-likelihood using an increasing function of the number of parameters to be
estimated. In the table, three blocks of estimates appear, corresponding to different sets of restrictions
on the pricing kernel (the first set of estimates is obtained under an empty set of restrictions). The table
shows that as tighter and tighter constraints on the sign and shape of the pricing kernel are imposed, the
maximized log-likelihood rapidly declines, moving from -10339 for the unrestricted case to -11884 when
sign, monotonicity, and convexity restrictions are all imposed. Consequently, the information criteria increase
as we move from the left to the right in Table 2.32 A few indications of problems for (12) in providing an
adequate fit come from the parameter estimates displayed in the table. First, we notice that even though
(12) already allows a non-negligible degree of flexibility in the pricing kernel, the alphas for HML and/or
MOMO are systematically statistically significant, with annualized abnormal rates of return between -0.9
and 6.8 percent in the case of HML, and 1.3 and 17 percent per year in the case of MOMO. In one case
(restriction set 2), also the SMB alpha actually turns positive and statistically significant. This means
that the pricing kernel defined by (12) fails to bring about consistency between the risk factors that are
assumed to be priced (as well as the assumed mapping from predictors to the unit risk prices) — in this
case, covariance and semi-covariance risk — and the dynamics of excess market returns and net returns on
short-long equity portfolios built exploiting the cross-sectional dispersion of firms in the size, value, and
momentum dimensions. Worse, and especially in the unconstrained case, the evidence that the two assumed
risk factors are consistently priced in the cross-section is rather weak. For instance, in the first column,
neither of the joint nulls $g_1 = 0 \cap b_1 = 0$ and $g_1' = 0 \cap b_1 = 0$ can be rejected using a standard
joint Wald test (and all the individual coefficients fail to be significant at standard test size levels), which
means that both covariance and downside-covariance risk fail to be priced in the US cross-section. When
economic restrictions are imposed, it remains true that the null of $g_1 = 0 \cap b_1 = 0$ cannot be rejected,
although this is no longer the case for the composite hypothesis $g_1' = 0 \cap b_1 = 0$, i.e., there is at least
weak evidence of downside covariance risk being priced when unit risk prices evolve “smoothly”. In fact,
when restrictions are imposed there is evidence of the loading of the downside covariance risk price on the
term spread being statistically significant (with a borderline p-value just below 0.01), while (recalling that
$\lambda_{2,t+1}' = -R^f_t (g_1' + b_1' B z_t)$) it is clear that a few of the (as yet insignificant) estimates in the $b_1$
loading vectors produce
embarrassing coefficient estimates: for instance, a higher time t default spread (typical of business cycle
contractions) forecasts a higher one-month ahead price of downside covariance risk $\lambda_{2,t+1}'$, which means that
in “bad times” equity prices should be increasing.33
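The link between the maximized log-likelihood and the information criteria mentioned earlier in this paragraph can be made concrete with the textbook formulas. The sketch below is only an illustration: the parameter count `n_params = 40` is hypothetical (the actual count is not stated in this section), the sample size of 975 months is the one quoted for the bootstrap in this Section, and the criteria in Table 2 may be normalized differently (for instance, divided by the sample size).

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Textbook AIC, Hannan-Quinn, and Bayes-Schwarz criteria (lower is better)."""
    aic = -2.0 * loglik + 2.0 * n_params
    hq = -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return {"AIC": aic, "HQ": hq, "BIC": bic}

if __name__ == "__main__":
    # Log-likelihood values quoted in the text; n_params is a hypothetical placeholder.
    unrestricted = information_criteria(-10339.0, n_params=40, n_obs=975)
    restricted = information_criteria(-11884.0, n_params=40, n_obs=975)
    print(unrestricted)
    print(restricted)
```

Because inequality restrictions lower the maximized log-likelihood without reducing the number of estimated parameters in this sketch, every criterion rises by the same amount, twice the drop in the log-likelihood.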
Table 2 also shows the estimates of the VAR(1) for the predictor vector zt when the unit risk prices are
functions of the variables in zt. Clearly, given the structure of (12), the VAR model for zt is autonomous
(i.e., it satisfies a block-exogeneity property) vs. the rest of the model in (12). However, our estimation
strategy in practice produces estimators which are of a covariance matrix-weighted type (like GLS, which
is just a special case of MLE) and as a result the estimates of the VAR for zt turn out to also depend on the
fact that economic restrictions on the pricing kernel (hence, the unit risk prices) are imposed. As a matter
of fact, the point estimates of the coefficients in μ and B hardly change with the type of restrictions imposed,
and only the reported standard errors show some degree of sensitivity to the different estimation exercises.
All of the three predictors are strongly persistent, as one would expect, with own-lag VAR coefficients of
between 0.73 (for the term spread) and 1, even though in all cases standard stationarity conditions are
satisfied. Otherwise, there is only some evidence that past values of the default spread help forecast
subsequent values of the term spread.
32 As usual, information criteria are defined in such a way that the best performing model returns the lowest value of the
information criteria.
33 Interestingly, imposing restrictions that ensure that the estimated $M_{t+1}$ process is a pricing kernel in an economic sense
causes the evidence in favor of the hypothesis that covariance and semi-covariance risks are priced to get stronger, with $g_1$
systematically negative and statistically significant. However, this good news is countered by the fact that $g_1'$ is estimated
to be negative and significant, which implies that the downside covariance risk premium is negative, which is hardly a sensible
finding. Moreover, a few of the unexplained variance estimates (especially for MOMO and the market portfolio) grow very large
(exceeding 25% per year) and seem “too high” to be sensible from an economic perspective.
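The remark that own-lag coefficients close to (or equal to) 1 can coexist with stationarity follows from the usual VAR(1) condition that all eigenvalues of the coefficient matrix lie strictly inside the unit circle. The sketch below checks this condition; the matrices are hypothetical examples and are not the estimates of B reported in Table 2.

```python
import numpy as np

def is_stationary_var1(B):
    """Return (flag, moduli): flag is True when all eigenvalues of B lie inside the unit circle."""
    moduli = np.abs(np.linalg.eigvals(np.asarray(B, dtype=float)))
    return bool(moduli.max() < 1.0), moduli

if __name__ == "__main__":
    # Hypothetical matrix with an own-lag coefficient of exactly 1 on the first variable.
    B_example = [[1.0, -0.1],
                 [0.1, 0.9]]
    print(is_stationary_var1(B_example))   # stationary: both moduli equal sqrt(0.91)
    print(is_stationary_var1(np.eye(2)))   # two unit roots: not stationary
```

In the first example the off-diagonal feedback pulls the eigenvalues inside the unit circle, which is why a diagonal entry of 1 does not by itself signal a unit root.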
Table 3 reports the comparative, in-sample pricing performance of all models examined in this Section,
starting from the SSBC benchmark. Both in terms of root-mean-squared pricing errors (RMSPE) and of
(pseudo-) R2s the model yields a rather mediocre performance. For instance, all R2s are between 0 and 2.1
percent, and the performance tends to be worse than what can be attained using a plain-vanilla unconditional
CAPM. Interestingly, while in the absence of constraints, the RMSPEs are almost entirely caused by variance
in the pricing errors (i.e., the mean pricing errors are generally very small), once constraints are imposed
mean pricing errors also increase and contribute to the total RMSPEs. To get a sense of how
disappointing the model performance may be, recall that in Table 1 the monthly volatility
estimates for SMB, HML, MOMO, and the market were 3.4, 3.6, 4.7, and 5.4 percent, respectively. As a result,
RMSPEs between 2.9 and 5 percent (in short, of a scale comparable to the total volatility of portfolio
returns) appear to be quite unsatisfactory.
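The split of the RMSPE into a mean pricing error and a pricing-error variance used in this paragraph rests on the identity RMSPE^2 = (mean error)^2 + Var(error). The sketch below computes both pieces; the function name is hypothetical, and the pseudo-R2 is defined here as one minus the ratio of the sum of squared pricing errors to the total sum of squares, which is an assumption since the exact definition used in Table 3 is not given in this section.

```python
import numpy as np

def pricing_error_summary(actual, fitted):
    """RMSPE, its mean/variance decomposition, and a pseudo-R2 (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    err = actual - fitted
    mean_err = err.mean()
    var_err = err.var()  # population variance, so the identity below holds exactly
    rmspe = np.sqrt(np.mean(err ** 2))
    pseudo_r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"rmspe": rmspe, "mean_error": mean_err,
            "error_variance": var_err, "pseudo_r2": pseudo_r2}

if __name__ == "__main__":
    out = pricing_error_summary([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
    print(out)
```

When the mean error is close to zero, as in the unconstrained case described above, the RMSPE is essentially the standard deviation of the pricing errors.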
One final measure of plausibility of the benchmark (12) is given by average sample measures of the
estimated price of risk coefficients,
$$\lambda_2 = -R^f_t \,(g_1 + b_1' \hat{B} z_t), \qquad \lambda_2' = -R^f_t \,(g_1' + \hat{b}_1' \hat{B} z_t),$$ (18)
where hatted matrices refer to iterated MLE estimates, and $R^f_t$ and $z_t$ are set to the sample means of the gross
riskless interest rate and of the predictor variables, respectively. In the unrestricted case, we obtain $\lambda_2 = 0.0075$
and $\lambda_2' = -0.0012$, which is clearly not sensible as the downside covariance risk premium would
actually be negative. We also perform the following exercise: we collect the official, monthly NBER recession
dates and separately compute $z^{rec}_t$ over recession months and $z^{exp}_t$ over expansion months. We can then
compute $\lambda_{2,rec}$, $\lambda_{2,exp}$, $\lambda_{2,rec}'$, and $\lambda_{2,exp}'$, obtaining $\lambda_{2,exp} = 0.0093$ and $\lambda_{2,rec} = 0.0001$, $\lambda_{2,exp}' = -0.0067$
and $\lambda_{2,rec}' = 0.0213$. Oddly, economic expansions (that according to NBER dating would characterize
approximately 80% of our 1927-2008 sample) are characterized by either economically negligible or even
negative prices of risk. These results show how important it is to impose a minimal set of sensible economic
restrictions when estimating empirical pricing kernels. Under the first set of constraints, the risk premia
estimates are on average $\lambda_2 = 0.0026$ and $\lambda_2' = 0.3497$, i.e., the downside risk premium becomes particularly
sizeable; under the second set of restrictions also involving the shape of the pricing kernel, the risk premia
estimates at the mean values of the predictors are instead $\lambda_2 = 0.0444$ and $\lambda_2' = 0.1653$. As a
reflection of the risk prices being relatively small, we have observed in Table 3 that the single-state smooth
benchmark may not produce appreciable in-sample R2s and/or reduce the RMSPE statistics below the
sample standard deviation of the portfolio returns under consideration.34
Finally, we compute confidence intervals for these risk price estimates using the following parametric
bootstrap strategy: using the sets of parameter estimates in Table 2 and their (unreported) estimated
covariance matrix, we draw 20,000 975-month long independent samples from the (asymptotic) multivariate
normal distribution of the parameter estimates; for each of the 20,000 draws, we proceed to compute the
estimates of the risk premia implied by (18); we then compute 90% bootstrapped confidence bands by
recording the $\lambda_{2,0.05}$ and $\lambda_{2,0.05}'$ that leave 5% of the bootstrapped distribution of $\lambda_2$ and $\lambda_2'$ below them,
and the $\lambda_{2,0.95}$ and $\lambda_{2,0.95}'$ that leave 5% of the bootstrapped distribution of $\lambda_2$ and $\lambda_2'$ above
them.35 Limiting ourselves to the case in which the second set of constraints has been imposed, we
find the following bootstrapped 90% confidence bands for the unit risk prices:
$$\lambda_2 \in [-0.1422, 0.2269], \qquad \lambda_2' \in [-0.6935, 0.7775].$$
Clearly, these confidence bands are wide and do not allow us to reject the null hypothesis that in fact $\lambda_2 = 0$
and $\lambda_2' = 0$, i.e., there is in practice no compensation for either covariance or downside covariance risk, in
which case (18) reduces to a simple way of estimating the means of the equity portfolios within a multivariate
Gaussian framework, when the wealth process is predictable and the predictors follow a standard VAR(1)
process as in much of the existing literature.
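The percentile step of the parametric bootstrap described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually coded for the paper: only the intercept and the loading vector of one risk price are drawn from a multivariate normal, while the VAR matrix, the predictor means, and the mean riskless rate are held fixed, and every number in the usage example is hypothetical.

```python
import numpy as np

def bootstrap_risk_price(theta_hat, theta_cov, B, z_bar, rf_bar,
                         n_draws=20_000, seed=0):
    """Mean and 90% percentile band for lambda = -rf_bar * (g1 + b1' B z_bar).

    theta_hat stacks (g1, b1); theta_cov is its estimated covariance matrix.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, theta_cov, size=n_draws)
    g1 = draws[:, 0]
    b1 = draws[:, 1:]
    # One risk price per parameter draw.
    lam = -rf_bar * (g1 + b1 @ (np.asarray(B) @ np.asarray(z_bar)))
    lower, upper = np.percentile(lam, [5, 95])
    return lam.mean(), lower, upper

if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the sketch runnable.
    theta_hat = np.array([-0.04, 0.10, -0.20, 0.05])
    theta_cov = np.diag([0.01, 1e-4, 1e-4, 1e-4])
    B = 0.9 * np.eye(3)
    z_bar = np.array([0.04, 0.02, 0.01])
    print(bootstrap_risk_price(theta_hat, theta_cov, B, z_bar, rf_bar=1.003))
```

A band that straddles zero, as in the hypothetical example, corresponds to the situation described in the text in which the null of a zero risk price cannot be rejected.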
5.2. Four-Moment Markov Switching CAPM
Table 4 presents the empirical estimates of the complete two-state model (8)-(9) in which, similarly to
Guidolin and Timmermann (2008a), covariance, coskewness, and cokurtosis risk are all priced, while downside
covariance risk is not (i.e., $\lambda^-_{2,S_{t+1}} = 0$ is imposed). The structure of the table and its panels are
otherwise similar to Table 2.36 The unconstrained version of the model reveals that modeling regime shifts
in predictability within a four-moment CAPM framework does bring most alphas towards zero in statis-
tical terms and at the same time also greatly weakens the statistical evidence of the presence of average
abnormal returns that the assumed risk factors (here, covariance, coskewness, and cokurtosis risk) cannot
capture. In fact, only for HML in the first regime and for SMB in the second regime we find evidence of
a statistically significant (positive) alpha. The latter alpha is also the only economically large coefficient
(5.4% per month).37 The evidence of average abnormal returns across the two regimes disappears almost
completely when the first set of constraints is imposed, i.e., arbitrage opportunities are ruled out, the signs
34 For instance, when all constraints are imposed, we also compute the recession- and expansion-specific risk prices and obtain
$\lambda_{2,rec} = 0.0271$ and $\lambda_{2,exp} = 0.0487$, $\lambda_{2,rec}' = 0.0341$ and $\lambda_{2,exp}' = 0.2974$. Therefore expansions would be characterized by
appreciably higher risk premia on both types of covariance risk and as such command higher expected returns and lower equity
prices, assuming that measured conditional covariances are not sensitive to business cycles (which is counterfactual).
35 When simulating the risk premia levels in correspondence to the means, we set $R^f_t = 1/M_{t+1}(z_t)$, i.e., we compute the mean
riskless rate as an implication of the mean empirical pricing kernel implied by the predictors taken at their means.
36 We omit estimates of the (regime-dependent) covariances between shocks to the predictors and shocks to portfolio
returns, as well as the (regime-dependent) covariance matrix of shocks to the predictors within the assumed VAR(1) framework.
37 It appears that $\alpha^W_1 = 2.14$ is also statistically significant with a p-value between 0.01 and 0.05. However $\alpha^W_1 > 0$ does not
imply the presence of any abnormal returns and in fact $\alpha^W_1$ can be simply interpreted as an intercept term that adjusts the
mean of fitted excess market returns after taking account of variance, skewness, kurtosis, and of the effect of the predictors
in zt.
of the risk premia are constrained to agree with the implications of decreasing absolute risk aversion, and the
estimated pricing kernel implies the mean short-term rate observed over our sample period. Even though
the regime 2 SMB alpha remains rather large in economic terms (4.99%) and commands a p-value around
0.05, all the remaining alpha coefficients fail to be significant at standard levels. This is quite remarkable
in light of Table 1, where the HML and MOMO alphas were all between 5.5 and 11.5 percent per
annum and highly statistically significant. As already stressed, this tendency of the evidence of non-zero
alphas to weaken when Markov regimes are modelled in unit risk prices and the relevant conditional co-
moments projected (predicted) out of the resulting regime switching model is not a feature we have assumed
a priori (equivalently, the data may have revealed larger, not smaller Markov switching alphas). Finally,
when the second set of constraints is additionally imposed, we only find marginal statistical significance
for the MOMO’s alpha in regime 1, while the SMB’s alpha for regime 2 greatly declines and now fails to
be significant. All in all, comparing Table 4 with Table 2 shows that when the presence of regimes and
their implications for (co-) higher order moments and predictability are taken into account, most of the
evidence on the US cross-sectional anomalies tends to disappear. Focussing on the last three columns of the
table, where all economic constraints have been imposed, we observe that the residual alphas range from an
annual 2.1% (for MOMO in regime 2) to 33% (for SMB in regime 2, which however fails to be statistically
significant).
Even though the results concerning the average abnormal returns in the two states are promising, some
uncertainty remains as to which factors are priced in this model. Focussing again on the estimates obtained
under the third set of constraints, we notice that while in regime 2 only coskewness seems to be priced with
$\lambda_{32} = -R^f_t g_{22} = -2.7651$, in regime 1 there is evidence that all conditional co-moments are priced, with
$\lambda_{21} = -R^f_t g_{11} = 0.5008$, $\lambda_{31} = -R^f_t g_{21} = -0.3736$, and $\lambda_{41} = -R^f_t g_{31} = 0.0732$, and the hypotheses that
$g_{22} = 0$, $g_{11} = 0$, $g_{21} = 0$, and $g_{31} = 0$ are all rejected with p-values between 0.01 and 0.05. Interestingly, the
two regimes have a starkly different asset pricing characterization: regime 1 is a truly four-moment CAPM
state, in the sense that independently of the restrictions imposed, covariance, coskewness, and cokurtosis
are all significantly priced and the premia are far from negligible; regime 2 implies to the contrary that only
coskewness risk is priced and (when all restrictions are imposed) with a unit risk premium that appears not
only statistically significant but also economically large. Additionally, in regime 2 when no sign restrictions
are imposed, the covariance risk premium takes on an incorrect sign (i.e., the higher the covariance between
a portfolio and the wealth process, the lower the portfolio average return in excess of the riskless asset). In
fact, simple calculations based on the smoothed probabilities implied by the model (see Figure 3) reveal that
there are structural differences in the average levels — obtained by simulating the model over time under the
parameter estimates in Table 4 — of covariance, coskewness, and cokurtosis (all measured with respect to
the wealth process) across the two regimes. In regime 1, conditional comoments are small in absolute value
(the average covariances are 1.82, -0.49, and 1.20 for SMB, HML, and MOMO, respectively; the average
coskewnesses are -1.91, -1.11, and -0.98; the average cokurtosis coefficients are 47.97, -17.27, and 43.39, for SMB, HML,
and MOMO, respectively).38 However, as argued in the Introduction, a two-state Markov switching model
38 These regime-specific estimates are simply obtained by computing the one-step-ahead predicted comoments under the ML
parameter estimates obtained in Table 4 and then classifying each sample period as a regime 1 period if the corresponding
smoothed (full-sample) probability of regime 1 exceeds (or equals) 0.5, and as a regime 2 period otherwise. For completeness
and because these moments are also priced in our econometric framework, market variance is 11.6 in regime 1 (94.4 in regime
does generate rich dynamics in conditional comoments and therefore in the quantity of the risks represented
by the different comoments in regime 1. Correspondingly, we have obtained moderate but statistically
significant estimates of the unit risk premia in this state. On the contrary all conditional comoments are
high in absolute value in regime 2 (the average covariances are 20.58, 21.58, and -45.95 for SMB, HML,
and MOMO, respectively; the average coskewnesses are 168.0, 422.8, and -624.0; the average cokurtosis
coefficients are 9584, 20309, and -31645 for SMB, HML, and MOMO, respectively).39 This means that in
regime 2 all comoments jump to very high absolute values, while the estimated unit prices of risk decline
towards zero, to the point of commanding p-values in excess of 5%, with the exception of the coskewness
risk premium, which in fact remains high.40
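The regime-specific averages quoted above rest on a simple classification rule (see footnote 38): a month is assigned to regime 1 when its smoothed probability of regime 1 is at least 0.5, and to regime 2 otherwise. A minimal Python sketch of that rule follows. The function names and the short input series are hypothetical and only illustrate the mechanics; they do not reproduce the estimates in Table 4.

```python
def classify_regimes(smoothed_p1, threshold=0.5):
    """Label a period 1 if its smoothed regime-1 probability is >= threshold, else 2."""
    return [1 if p >= threshold else 2 for p in smoothed_p1]


def regime_averages(values, labels):
    """Average a per-period series separately over regime-1 and regime-2 periods."""
    averages = {}
    for regime in (1, 2):
        picked = [v for v, lab in zip(values, labels) if lab == regime]
        averages[regime] = sum(picked) / len(picked) if picked else float("nan")
    return averages


# Hypothetical inputs: smoothed regime-1 probabilities and one-step-ahead
# predicted covariances with the wealth process (illustrative numbers only).
smoothed_p1 = [0.97, 0.91, 0.50, 0.12, 0.05, 0.88]
predicted_cov = [1.5, 2.0, 2.5, 22.0, 19.0, 1.0]

labels = classify_regimes(smoothed_p1)
avg_cov = regime_averages(predicted_cov, labels)
print(labels)
print(avg_cov)
```

Note that a probability of exactly 0.5 is assigned to regime 1, as the footnote states ("exceeds (or equals) 0.5").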
Additional aid in interpreting the economic meaning of the two states we have specified comes from
Figure 3, where the (full-sample, ex-post) smoothed probabilities are plotted for both the unconstrained and
set 2-constrained estimates reported in Table 4. The first clear point that the figure allows us to make is that
imposing constraints on the EPK changes the inference on the state probabilities, although only by marginal
amounts. In practice, the correlation between unconstrained and constrained probabilities is very high (e.g.,
0.994 between unconstrained and constrained-set 2 smoothed probabilities of regime 2) and the only visible
difference is a tendency for the unconstrained probabilities to “wiggle” in a restricted range around the 0
and 1 bounds which by construction characterize a probability measure. Even though the figure shows state
probabilities for both regimes, comments are easier to express with reference to the second state. As per the
estimates of the transition probability matrix in Table 4, this regime tends to characterize approximately
24% of our sample (i.e., 234 months) and has an implied average duration of 4.5 months, i.e., once markets
enter this state, they tend to remain there between 4 and 5 months, which is hardly negligible. However, the
figure also clarifies that, apart from these average duration properties, at least four sample periods tend
to be characterized by a prevalence of the second regime: the Great Depression (1929-1935, when the state
2 probability exceeds 0.9 in 56 months out of 72), the period that leads up to WWII (1938-1942, when state
2 probability exceeds 0.9 in 17 months out of 48), the recession and first oil shock of the early and mid-1970s
(1970-1976, when state 2 probability exceeds 0.9 in 19 months out of 72), and the “dot-com” bubble and its
burst of the late 1990s and early part of the new millennium (1999-2002, when state 2 probability exceeds
0.9 in 31 months out of 48). Clearly all these periods correspond to stages of bubbling, hyper-active and
highly volatile US stock markets which always led to spectacular bubble bursting price action, ending in
protracted bear patches. For instance, focussing on the constrained parameter estimates of the last three
columns of Table 4, the model implies an annualized market volatility of 9.0% in regime 1, and of 87% in
2); the third central moment for the market is 15.0 in regime 1 (367.5 in regime 2); the fourth central moment for the market
is 434.8 in regime 1 (44238 in regime 2).
39 These co-skewness values are only apparently enormous. When scaled by the standard factor $\mathrm{Var}[R^W_t]\,\mathrm{Var}[R^i_t]$, the
co-skewness coefficients are 0.27, 0.63, and -0.71, for SMB, HML, and MOMO, respectively. A similar caveat applies to the
co-kurtosis estimates reported in the main text: for instance, a co-kurtosis with the market of 20309 for HML implies in reality
a scaled (using the factor $3\,\mathrm{Var}[R^W_t]\,\mathrm{Var}[R^i_t]$) co-kurtosis coefficient of 2.38, which is not at all exceptional.
40 Intuitively, while it is hard to perform effective back-of-the-envelope calculations for regime 1, it is obvious that only
coskewness dynamics may explain portfolio returns in regime 2. Given the negative and large coskewness unit price of risk,
it will be easy to explain the high MOMO returns in this state, as MOMO displays on average negative coskewness with the
market portfolio; however, it remains hard to explain SMB and HML returns as their coskewness coefficients are on average
positive (and this generates negative average returns). This is consistent with the magnitudes of the regime 2 alphas reported
in Table 4.
regime 2, almost ten times higher.41 Obviously, even outside the four periods of prevalence of regime 2 we
have just listed, this state also receives a high ex-post likelihood in many other but shorter sample periods
(e.g., the short 1980-1981 recession, the market correction of late 1987, and the early stages of the short
1990-1991 recession).42
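The link between the "stayer" probabilities, the average regime durations, and the long-run share of each regime can be made explicit with a short calculation. The sketch below uses the textbook results for a two-state Markov chain: the expected duration of a regime is 1/(1 - p), and the ergodic probability of regime 1 is (1 - p22)/(2 - p11 - p22). The stayer probabilities 0.72 and 0.78 for regime 2 are those reported with Table 4; the value p11 = 0.919 is a hypothetical, back-solved figure that is not reported in the text and is used only so that the ergodic probabilities come out near 0.73 and 0.27.

```python
def expected_duration(stay_prob):
    """Expected number of consecutive periods spent in a regime with the given stayer probability."""
    return 1.0 / (1.0 - stay_prob)


def ergodic_probs(p11, p22):
    """Long-run (ergodic) probabilities of regimes 1 and 2 in a two-state Markov chain."""
    pi1 = (1.0 - p22) / (2.0 - p11 - p22)
    return pi1, 1.0 - pi1


# Regime-2 stayer probabilities as discussed for Table 4 (unconstrained vs. constrained).
duration_unconstrained = expected_duration(0.72)
duration_constrained = expected_duration(0.78)

# p11 = 0.919 is an assumed value, not an estimate taken from the paper.
pi1, pi2 = ergodic_probs(0.919, 0.78)

print(round(duration_unconstrained, 1), round(duration_constrained, 1))
print(round(pi1, 2), round(pi2, 2))
```

With the rounded inputs, the implied durations are roughly 3.6 and 4.5 months, in line with the statement that markets tend to remain in regime 2 for between 4 and 5 months.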
In any event, historical memory serves us well in leading us to characterize the second regime as a bear
state of high volatility and low or negative excess market returns. Also the covariance, (co-) skewness, and
(co-) kurtosis of the equity portfolios under investigation differ dramatically across the two regimes. This
is confirmed by an analysis of the estimates obtained in Table 4.43 In state 2, the equity portfolio volatilities
of the shocks to returns are systematically higher than in regime 1: in annualized terms, these are 17.1%
vs. 7.1% for SMB, 23.5% vs. 6.1% for HML, and 28.3% vs. 6.1% for MOMO. Although these are not readily
apparent from the table, we also compute typical means and (total) volatilities for the
equity portfolio returns in each regime. This is done using the parameter estimates in Table 4 and simulating
the estimated multivariate model in real time, to also compute the co-moments that enter the asset pricing
(conditional mean) equations. The moments are then computed by weighting the real-time co-moments with
the smoothed probabilities in Figure 3. In annualized terms, the market exhibits an average excess return of
-3.3% in state 2 vs. 10.5% in state 1, and a volatility of 33.7% in state 2 vs. 12.0% in state 1. These results
extend to the annualized total volatilities of SMB, HML, and MOMO in regime 2 vs. regime 1: 20.8% vs.
7.5%, 22.5% vs. 7.4%, and 31.3% vs. 8.0%, respectively. We have already reported state-specific estimates
of covariances, coskewnesses, and cokurtosis coefficients computed vs. the market portfolio and verified that
they are considerably higher in state 2 than in state 1; in particular, all of these co-moments are higher
(lower) in state 2 than in state 1 for SMB and HML (MOMO); also in the case of the market while (scaled)
skewness is roughly constant across the two regimes (0.40 in state 2 vs. 0.36 in state 1), (scaled) kurtosis
is 4.97 in state 2 vs. 3.04 in state 1.44 Naturally, as also stressed by Figure 3, regime 1 is the complement
of regime 2 and as such it can be best characterized as a bull state of low volatility and high and positive
excess market returns; covariances, (co-) skewness, and (co-) kurtosis coefficients of the equity portfolios
tend to be lower (in absolute value) than in the bear state.
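The scaled market moments quoted above can be approximately recovered from the regime-specific central moments reported in footnote 38. The sketch below assumes those moments refer to monthly percentage returns, annualizes volatility as the square root of 12 times the monthly variance, and uses the standard definitions of skewness (third central moment over variance to the power 3/2) and kurtosis (fourth central moment over squared variance). The function names are illustrative.

```python
import math


def annualized_vol(monthly_variance):
    """Annualized volatility (in %) from a monthly variance, assuming 12 uncorrelated months."""
    return math.sqrt(12.0 * monthly_variance)


def scaled_skewness(third_central_moment, variance):
    """Standardized skewness: third central moment divided by variance^(3/2)."""
    return third_central_moment / variance ** 1.5


def scaled_kurtosis(fourth_central_moment, variance):
    """Standardized kurtosis: fourth central moment divided by variance^2."""
    return fourth_central_moment / variance ** 2


# Regime-specific market moments as reported in footnote 38: (variance, third, fourth).
market_moments = {1: (11.6, 15.0, 434.8), 2: (94.4, 367.5, 44238.0)}

for regime, (var, m3, m4) in market_moments.items():
    print(regime,
          round(annualized_vol(var), 1),
          round(scaled_skewness(m3, var), 2),
          round(scaled_kurtosis(m4, var), 2))
```

For regime 2 this closely matches the reported figures (volatility of 33.7%, skewness of 0.40, kurtosis of 4.97). For regime 1 the same arithmetic gives slightly higher values than those reported (about 0.38 vs. 0.36 for skewness and 3.23 vs. 3.04 for kurtosis), which may reflect rounding in the footnote inputs.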
41 Nonetheless, one should bear in mind that the average duration of regime 2 does not exceed 5 months, and that these
annualized estimates are obtained assuming independence of the states over time, which is clearly not the case in Table 4.
In practice, a regime classification-based estimate of the annualized market volatility yields 12.0% in regime 1 vs. 33.7%
in regime 2. Even the latter, high volatility estimate is a rather plausible one: for instance, between 1929 and 1934, the US
value-weighted portfolio displayed an annualized volatility of 41.7%.
42 Interestingly, the model signals an increasing probability of having entered a state 2 period around the end of 2007, which
is rather plausible given the acute market crises in the Summer and Fall of 2008 (a period not included in our sample).
43 Also in this case, it appears more meaningful to focus on the constrained estimates.
44 As a result of the heterogeneous regime properties of co-skewness and co-kurtosis, the two-state Markov switching model
fails to have stark predictions for expected returns on the three long-short portfolios. In the case of MOMO, the mean return
is considerably lower in state 2 (-5.0%) than in state 1 (12.9%); for SMB the mean return is similar and anyway modest across
the two states (0.7% in state 2 and 2.2% in state 1); for HML the mean return is higher in state 2 (20.1%) than in state 1
(2.4%). In this sense, while high momentum returns appear to be a regime 1 phenomenon and as such tend to characterize
more than 2/3 of the sample, the high value returns are a regime 2 phenomenon. However, the spreads across regimes for HML
(20.1 − 2.4 = 17.7) and momentum-sorted portfolios (12.9 − (−5.0) = 17.9) are similar.
There is one final dimension along which the two regimes appear to differ: the implied strength of the
predictability patterns from the selected instruments to excess market returns as well as the time series
dynamics followed by the predictors themselves. Table 4 shows that while in regime 1 none of the variables in
zt predicts subsequent market excess returns (there is a partial exception for the dividend yield in regime 1,
but the corresponding p-value is around 0.05), in regime 2 all the relevant coefficients are larger, econom-
ically relevant, and at least the coefficient of the dividend yield is relatively large (1.59) and statistically
significant.45 Additionally, while all the predictor variables are highly persistent in regime 1, in regime 2 the
dividend yield and the term spread only exhibit intermediate persistency, while the default spread remains
highly persistent. Clearly, these differences are then reflected in the pricing performance of the model and in
the implied R2s. Interestingly, the presence of a few coefficients illustrating “direct” predictability of excess
market returns from the predictors in zt may be construed as a disappointing result that should lead to a
rejection of our asset pricing model, because it implies that it is not only conditional co-moments that drive
risk premia in our framework. We return to this perspective in Section 6.
In asset pricing terms, these characterizations for the two regimes estimated under (8)-(9) make intuitive
sense. In the stable regime 1, comoments are low in absolute value but display a rich dynamics and as such
they are priced, as revealed by Table 4. In the volatile regime 2, comoments are higher and the pricing
function is dominated by the coskewness term $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2 \mid \mathcal{F}_t]$, i.e., by the ability of equity portfolios
to provide a hedge against the volatility of the wealth process, because only the parameter $\lambda_{32}$ is statistically
significant and large in an economic sense. This is consistent with the fact that it is volatility that dominates
the second regime, and the most appreciable property of any asset in that state is the ability to compensate
the higher variance with adequate returns.
The lower portion of Table 4 reports estimates of the transition matrix characterizing the Markov chain
that drives the switching behavior in the pricing kernel. While the estimate of Pr(St = 1| St−1 = 1), the
“stayer” probability of regime 1, is relatively stable as increasingly stringent constraints are imposed, the
estimate of Pr(St = 2| St−1 = 2) increases from 0.72 to 0.78. Correspondingly, regime 2 becomes more and
more persistent, with the estimate of Pr(St = 2| St−1 = 2) that equals 0.72 implying an average duration of
3.6 months and an estimate of Pr(St = 2| St−1 = 2) of 0.78 an average duration of 4.6 months. The set
2-constrained estimates of P mean that the long-run, ergodic probabilities of regimes 1 and 2 are, respectively,
0.73 and 0.27. The maximized log-likelihood function of the two-state MS model is considerably higher than
the single state benchmark in Section 5.1, -9598 vs. -10339. Correspondingly, all information criteria decline
from 21-22 in Table 2 to 19-20 in Table 4. This means that even when the log-likelihood improvement is
penalized by the considerable increase in the number of estimated parameters (from 55 to 92, a difference
of 37 additional parameters to be estimated), the superior performance of the two-state model is difficult to
refute. Along the same lines, because the single-state and the two-state Markov switching models are not
simply nested, it is also sensible to proceed to a likelihood ratio test, which is essentially a way to trade-off
the maximized log-likelihood improvement caused by the flexibility and better fit of the two-state model
and the higher parsimony of the single-state model:
$LR = 2[-9598.04 - (-10339.15)] = 1482.22 \overset{a}{\sim} \chi^2_{37}$.
45 As for the economic “significance” of the coefficients, in regime 2 a one standard deviation increase in the dividend yield
causes an increase in excess market returns of 2.48% (vs. 0.14% in regime 1), a one standard deviation increase in the term
spread causes an increase in excess market returns of 0.69% (vs. -0.01% in regime 1), and a one standard deviation increase in
the default spread causes a change in excess market returns of -0.40% (vs. +0.16% in regime 1).
Under a $\chi^2_{37}$ distribution, an LR statistic of 1482.22 implies a p-value that is essentially zero, a sign of a strong rejection
of the single-state model in favor of the two-state one. As already noticed in Table 2, when all constraints
are imposed on the estimation of the kernel, the maximized log-likelihood declines somewhat, to -10140.
However, this value remains substantially higher than the corresponding log-likelihood obtained when the single-state
model was estimated under all the sensible constraints, -11834. The information criteria in Table 4
(between 21 and 22) all signal that the two-state model ought to be preferred to the single-state one,
for which the information criteria all fell between 24 and 25 in Table 2. Ignoring for simplicity potential
complications for the structure of the asymptotic distribution that may be caused by the fact that we are
comparing constrained ML estimates, a likelihood ratio test produces a test statistic in excess of 3388, which
is clearly highly statistically significant.
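The two likelihood ratio statistics discussed above follow directly from the reported log-likelihoods. The sketch below recomputes them and compares the unconstrained statistic with an approximate 5% critical value of a chi-square distribution with 37 degrees of freedom. The critical value is obtained from the Wilson-Hilferty approximation, which is a choice made here to keep the example dependency-free and is not the procedure used in the paper; the function names are illustrative.

```python
from statistics import NormalDist


def lr_statistic(loglik_larger_model, loglik_smaller_model):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_larger_model - loglik_smaller_model)


def chi2_critical_value(df, level=0.95):
    """Approximate chi-square quantile via the Wilson-Hilferty cube-root transformation."""
    z = NormalDist().inv_cdf(level)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


# Unconstrained case: two-state (-9598.04) vs. single-state (-10339.15) log-likelihoods.
lr_unconstrained = lr_statistic(-9598.04, -10339.15)

# Fully constrained case: two-state (-10140) vs. single-state (-11834) log-likelihoods.
lr_constrained = lr_statistic(-10140.0, -11834.0)

critical_5pct = chi2_critical_value(37)

print(round(lr_unconstrained, 2), lr_constrained, round(critical_5pct, 1))
```

Both statistics exceed the approximate critical value (about 52) by a very wide margin, which is what drives the essentially zero p-values reported in the text.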
Finally, Table 3 gives an account of the comparative in-sample pricing performance of the two-state, four-moment
CAPM model. For the sake of brevity, we only comment on the unconstrained and set 2-constrained
estimates. The top panel of the table clearly shows that the unconstrained two-state model provides by and
large an improvement over the single-state benchmark. All of the pseudo-R2 improve and reach levels of
3.5-6.6% which start being far from negligible for the cross section of US equity returns. Correspondingly,
the RMSPEs generally decline (but excess market returns are an exception), with the most substantial
improvement characterizing the fit to MOMO returns (with a decline in excess of 11%). Interestingly, most
of this decline comes from a reduction of the variance of the pricing errors. When the EPK is constrained to
be compatible with standard properties of the intertemporal rate of marginal substitution for a risk-averse
investor, the improvement is uniformly visible for only one portfolio out of four. Oddly, even though $R^2$ and
RMSPEs do measure different aspects of the notion of “fit”, while the $R^2$ improvements concern MOMO
and the market portfolio, the RMSPE improvements involve SMB and MOMO. Interestingly, fitting HML
returns using the two-state model proves rather difficult, also because a substantially negative average pricing
error (bias) of almost -1% appears, i.e., the two-state model systematically over-predicts value minus growth
return spreads. It is also remarkable that a similarly large bias appears for excess market returns (-1.8%
per month), even though the final RMSPE is actually lower than under the single-state smooth benchmark,
thanks to the fact that the Markov switching model greatly reduces the variance of the pricing errors, i.e.,
the regime switching model prices “worse” on average, but much more consistently, and avoids producing
huge and aberrant pricing errors. More generally, the resulting pseudo-R2 are much less impressive than in
the first panel of the table, achieving levels of 3.0-3.7% only. This implies that a substantial portion of the
explanatory power of a Markov-switching, four-moment CAPM is effectively lost when economic constraints
are imposed to ensure its admissibility. It remains to be seen whether also a Markov-switching mixture
CAPM may be subject to the same effect.
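The observation that a model can display a larger bias and still achieve a lower RMSPE follows from the decomposition of the mean squared pricing error into squared bias plus error variance. The sketch below illustrates this with two short, purely hypothetical series of pricing errors (in percent per month); neither series is taken from Table 3.

```python
import math


def rmspe_decomposition(errors):
    """Return bias, variance, and RMSPE of pricing errors, where MSE = bias^2 + variance."""
    n = len(errors)
    bias = sum(errors) / n
    variance = sum((e - bias) ** 2 for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"bias": bias, "variance": variance, "mse": mse, "rmspe": math.sqrt(mse)}


# Hypothetical pricing errors: the first series is unbiased but very dispersed,
# the second is biased downward but far less dispersed.
single_state_errors = [6.0, -7.0, 5.0, -4.0]
two_state_errors = [-0.5, -3.0, 0.5, -2.0]

single = rmspe_decomposition(single_state_errors)
two = rmspe_decomposition(two_state_errors)

print(single)
print(two)
```

In this illustration the second series has a bias of -1.25 against zero for the first, yet its RMSPE is much lower because its error variance is far smaller, which is the pattern described above for excess market returns.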
5.3. Mixture CAPM
Table 5 presents the empirical estimates of the three-state mixture CAPM model (10).46 Notice immediately
that even though the mixture CAPM is logically simpler and obtained as a “restriction” imposed on the
completely flexible Markov switching four-moment CAPM, being based on a K = 3-state specification (10)
46 For additional clarity and to save space, in what follows we denote as “dCAPM” the downside covariance CAPM and as
“4MOM” the four-moment CAPM.
does imply a higher number of parameters to be estimated, 138 vs. 92 in the two-state case of (8)-(9).47 In
the table, a portion of the information criteria — essentially the Akaike and Hannan-Quinn criteria — point
to the fact that the three-state mixture CAPM may be preferred to the two-state four-moment model of
Section 5.2 and, a fortiori, to the single-state smooth benchmark of Section 5.1, and this in spite of the
fact that the mixture model implies a higher number of parameters to be estimated. For instance, under
the constraint set 2, the AIC criterion declines from 20.99 to 20.81 (it was 24.39 in the single-state case)
and the H-Q criterion from 21.16 to 21.07 (it was 24.49 in the single-state model). The indications given by
the Bayesian information criterion (BIC) are instead harder to read, as the BIC decreases when going from
Table 4 to Table 5 only under the constraint set 1 (from 22.40 to 22.10).
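The information criteria quoted for the two-state and single-state models can be recovered, to two decimals, from the reported log-likelihoods and parameter counts if each criterion is divided by the sample size T = 975. This per-observation normalization is an assumption inferred from the magnitudes in the text, not a definition stated in this section. The sketch below applies it to the fully constrained log-likelihoods (-10140 with 92 parameters and -11834 with 55 parameters) and also recomputes the saturation ratio of footnote 47.

```python
import math


def aic(loglik, n_params, n_obs):
    """Akaike criterion, divided by the sample size (assumed normalization)."""
    return (-2.0 * loglik + 2.0 * n_params) / n_obs


def hannan_quinn(loglik, n_params, n_obs):
    """Hannan-Quinn criterion, divided by the sample size (assumed normalization)."""
    return (-2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))) / n_obs


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) criterion, divided by the sample size (assumed normalization)."""
    return (-2.0 * loglik + n_params * math.log(n_obs)) / n_obs


T = 975  # number of monthly observations

aic_two_state = aic(-10140.0, 92, T)
hq_two_state = hannan_quinn(-10140.0, 92, T)
aic_single_state = aic(-11834.0, 55, T)
hq_single_state = hannan_quinn(-11834.0, 55, T)

# Saturation ratio of footnote 47: 975 x 7 observations over 138 parameters.
saturation_ratio = (T * 7) / 138

print(round(aic_two_state, 2), round(hq_two_state, 2))
print(round(aic_single_state, 2), round(hq_single_state, 2))
print(round(saturation_ratio, 1))
```

Under this normalization the two-state values are close to the 20.99 (AIC) and 21.16 (H-Q) figures cited above, and the single-state values are close to 24.39 and 24.49. The `bic` helper is included for completeness; the BIC values cited in the text refer to constraint set 1, for which the log-likelihoods are not reported in this section.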
Table 5 presents empirical estimates. Similarly to Tables 2 and 4, we omit to report estimates of the
(regime-dependent) covariances between shocks to the predictors and shocks to portfolio returns, as well as
the (regime-dependent) covariance matrix of shocks to the predictors within the assumed VAR(1) framework.
To save additional space, in Table 5 we also omit to report estimates of the MS VAR(1) coefficients of the
predictors.48 Clearly, the three-state mixture CAPM model achieves the goal of leading the estimates of
the average abnormal returns not justified by risk exposure towards zero and certainly to levels of weak
statistical significance. In Table 5, we obtain 3 different estimates for each of the long-short portfolio alphas,
one in correspondence to each of the three pre-assigned asset pricing regimes, e.g., $\alpha^{HML}_{CAPM}$, $\alpha^{HML}_{down-CAPM}$,
and $\alpha^{HML}_{4mom-CAPM}$. We immediately notice that the toughest alpha to “send” to zero (at least, in a statistical
sense) is MOMO’s alpha, especially in correspondence to the first, CAPM-driven regime. In fact, in the
unrestricted case, $\alpha^{MOMO}_{CAPM} = 0.90$ (which corresponds to 10.8% per year) and commands a p-value close
to 0.01. However, when the full set of economic restrictions is imposed on the EPK, we find that all the
alphas stop being statistically significant, including $\alpha^{MOMO}_{CAPM}$. Interestingly, the alphas remain rather large
in absolute value in correspondence to the third, 4-moment CAPM regime (the alphas range between -2.1% and
2.8% per month), but they command such high standard errors that the associated p-values are all very
high. Economically, this means that even though the modeled risk factors fail to lead the average abnormal
returns to zero in a mathematical sense, they do in a statistical sense as the variation in the sample returns
associated with the third state is sufficiently large to drive the classical 95% confidence intervals around the
estimated $\alpha^{SMB}_{4mom-CAPM}$, $\alpha^{HML}_{4mom-CAPM}$, and $\alpha^{MOMO}_{4mom-CAPM}$ to always include a zero abnormal return. This
finding has key economic implications because in the absence of a rational explanation (see Fama and
French, 1993, who interpret SMB and HML as priced risk factors in an intertemporal CAPM framework,
and Carhart, 1997, who adds MOMO as a fourth, priced factor), the conclusion of Lakonishok, Shleifer,
and Vishny (1994) that the asset pricing anomalies (in particular, value and momentum premia) are
caused by overreaction-fueled irrational mispricing would hold. On the contrary, the ability to isolate one EPK
that — especially under economically meaningful constraints — prices the US cross-section without generating
large and statistically significant abnormal returns (alphas) is consistent with rational explanations. We also
47 In any event, let us remark that 138 parameters are not “too many” (apart from numerical considerations), as with
975 × (N + M) = 6,825 observations, this implies a saturation ratio (the ratio between total number of observations and
number of unknown parameters to be estimated) of almost 50, which is well in excess of the lower bound of approximately 20
normally advised in the non-linear time series literature.
48 These coefficients (under three alternative sets of restrictions on the estimation program) are available from the Author
upon request.
notice that this type of evidence as well as its “progression” across restriction sets is very similar to what
we had documented in Table 4 for the two-state MS CAPM model.
Imposing economic constraints on the estimation of the mixture CAPM appears to generate interesting
payoffs also for the precision with which it is possible to estimate the prices of risk in the model. While
in the unrestricted case, the evidence on the EPK g coefficients is actually a bit puzzling, in the sense
that the statistically significant coefficients raise doubts on the possibility to attribute a compelling asset
pricing interpretation to the three regimes (e.g., the covariance risk coefficient fails to be significant in the
CAPM state, a regime that ought to be only characterized by the pricing of covariance risk), as additional
economic restrictions on the coefficient themselves and the overall shape of the pricing kernel are imposed,
the fraction of 95% classic confidence intervals for the g coefficients that fail to include the zero increases.
The last three columns of Table 5 show that the constant coefficient $g_0$ is significant in all three states,
that $\lambda^{-}_{2,down-CAPM} = -R^f_t g_{12} = 0.2230$ with a p-value between 0.05 and 0.06, and that in the third,
four-moment CAPM regime all the unit risk premia are statistically significant with p-values of 0.01 or
lower ($\lambda_{2,4mom-CAPM} = -R^f_t g_{13} = 0.0120$, $\lambda_{3,4mom-CAPM} = -R^f_t g_{23} = -0.0075$, and
$\lambda_{4,4mom-CAPM} = -R^f_t g_{33} = 0.0065$). These results at least partially validate the goodness of the a priori identification of the
three statistical regimes with distinct asset pricing states, in the sense that the second regime produces a
(borderline) statistically significant and positive premium on downside covariance risk, while the third regime
sees all three conditional (symmetric) comoments priced and statistically significant, with a negative risk
premium on coskewness risk.49 In any event, the joint null hypothesis that g11 = g12 = g012 = g13 = g23 = g33
is always rejected, with a p-value between 0.01 and 0.05 in the unrestricted case, and with p-values close
to zero when economic restrictions are imposed. This makes sense because constraining the signs of the g
coefficients to be consistent with non-satiation and (increasing) absolute risk aversion has the predictable
effect of reducing most of the standard errors associated with the estimation.
As customary at this point, additional help in interpreting the economic meaning of the three states we
have specified comes from Figure 4, where the (full-sample, ex-post) smoothed probabilities are plotted for
both the unconstrained and set 2-constrained estimates reported in Table 5. Also in this case, the figure
shows that imposing constraints in the estimation implies modest changes in the smoothed full-sample
probabilities. Although it must be interpreted with caution because probabilities are bounded to the [0, 1] interval, the
correlation between CAPM smoothed probabilities across unconstrained and constrained estimates is 0.96,
and the correlation for dCAPM probabilities is 0.85. Because its occurrence is less frequent, the analogous
pairwise correlation is only 0.71 for the 4MOM CAPM probabilities. In practice, Figure 5 shows that while
an unconstrained model would imply several entries and exits between the downside and 4MOM CAPMs over
the periods 1929-1935, and then again in 2000-2001, this is not the case for the constrained smoothed regime
probabilities that instead simply illustrate that both these periods are characterized as 4MOM intervals.
The first panel of Figure 4 makes it obvious that according to the estimates in Table 5, the US stock market
has spent a large proportion of the period 1927-2008 in a plain vanilla CAPM. In fact, the estimates reveal
that when constraints are imposed, the CAPM regime has a duration of approximately 14 months and
49 The finding that $\lambda_{2,CAPM} = -R^f_t g_{11} = 0.0206$, which is not statistically significant in any
sense (its p-value is 0.22), remains somewhat problematic. This means that the first regime is pre-assigned to be a CAPM regime in which only covariance risk
is priced, but our restricted estimates reveal that the resulting estimate for the price of risk is not statistically positive.
characterizes 44.1% of the data.50 Long spans of data are captured by the properties of the CAPM state
in which only covariance risk is priced, such as most of the 1940s, 1950s and 1960s, the period 1991-1996,
and more recently most of the bull markets that have occurred between 2003 and 2006. Figure 5 displays
the same smoothed probabilities as in Figure 4, but limited to the 1985-2008 sub-sample. The point of
these plots is to show the remarkable persistence of the regimes isolated, and in particular of the CAPM
state. Visibly, long stretches of time would have been characterized by the CAPM state in the past 23
years, such as 1985-1986, 1988-1990, 1993-1996, 1998, and the recent up-turn in stock prices of the interval
2004-2006. The horizontal arrows are used to stress the length of these periods of CAPM pricing. It seems
plausible that long stretches of time are the expression of simple frameworks in which only covariance risk is priced,
which is the key intuition of the classical CAPM. Additionally, it is easy to verify that the CAPM regime
typically features high mean excess market returns and high returns on the remaining stock portfolios under
consideration, accompanied by moderate volatility. For instance, focussing on the constrained parameter
estimates of the last three columns of Table 5, the model implies an annualized market risk premium in the
CAPM state of 10.1% with volatility of 10.4% (i.e., this is an approximate market Sharpe ratio of 0.28 per
month).
The intermediate panels in Figures 4 and 5 show that the dCAPM regime, in which covariance and downside
covariance risks are priced differently, is scarcely persistent (its average duration is approximately 7
months). However, because of the structure of the estimated transition matrix (in particular, the fact that
Pr(St = dCAPM| St−1 = CAPM) is estimated at 0.086 and that the estimate of Pr(St = dCAPM| St−1 = 4MOM) is
0.110, i.e., transition probabilities that make the switch to the dCAPM regime from both
the CAPM and the 4MOM regimes quite likely), it occurs with a remarkable frequency, characterizing 39.4%
of our sample. Needless to say, for many practical applications, to know that a dCAPM pricing regime
has an average duration of 7 months is far from negligible. The moderate persistence associated with this
regime and its frequent occurrence make it hard to eyeball specific time periods in which US stock returns
were predominantly generated by the dCAPM, even though “spikes” of this state are clearly visible in corre-
spondence of 1929-1930, 1935-1938, 1941-1945, the late 1960s and late 1970s, 1998-1999, and more recently
2002-2003. Interestingly, from mid-2007, in correspondence with a deep situation of financial turmoil, the US
equity market switches from the CAPM to the dCAPM regime, with an unsettling similarity to the onset
of the Great Depression in 1929. Because many of these periods correspond to bear and volatile markets,
the dCAPM regime generates an annualized market risk premium of 3.1% with a volatility of 21.6% (for
a low market Sharpe ratio of 0.04 per month); while the market risk premium is roughly half its historical
long-term mean, the volatility is slightly higher than its long-run estimate.
The bottom panels of Figures 4 and 5 stress instead the fact that the 4MOM asset pricing regime
occurs rather infrequently but — when this happens — the state is considerably persistent, with an average
duration of 9 months. However, in terms of ergodic probability, only 16.6% of any long sample should be
generated by the 4MOM state, which in our case means 161 months out of 975. Figure 4 clearly shows that
most of these 161 months in our sample can be identified with 1931-1934, 1973, and 2000-2001, for a total
of approximately 84 months.51 Of course, there are many additional probability spikes of the smoothed
50 This is the long-run, ergodic probability of the CAPM regime implied by the estimated transition matrix under the full set
of constraints.
51 It may appear counter-intuitive that a regime with low ergodic probability may be rather persistent. This derives from the
probability of this regime, for instance in early 1975 and late 1991. The 4MOM regime is completely
characterized by the presence of high volatility. The constrained estimates in Table 5 imply an annualized
market risk premium of 6.2% (which is close to the historical mean over the full sample) and a stunning
volatility of 47.8%.52 As a result, the corresponding Sharpe ratio is 0.037, which turns out to be even
lower than the reward-to-risk ratio estimated for the dCAPM state. One may also notice that — apart from
the structure we have imposed ex-ante when specifying this mixture model — the three regimes possess a
rather clean ex-post statistical configuration: the CAPM state is a bull regime in which volatility is low; the
dCAPM state is a bear regime in which volatility takes intermediate values; the 4MOM state is dominated
by extraordinarily high volatility, even though the implied risk premium is close to the historical average.
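The monthly Sharpe ratios and the average regime duration quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that annualized premia and volatilities are converted to a monthly Sharpe ratio by dividing the annual ratio by the square root of 12, and that the expected duration of a regime with "stay" probability p is 1/(1 - p), the standard result for a first-order Markov chain. The function names are hypothetical; the inputs are the figures reported in the text and in footnote 51.

```python
import math

def monthly_sharpe(annual_premium_pct, annual_vol_pct):
    """Monthly Sharpe ratio implied by annualized figures.

    Assumption: premium scales with 12 and volatility with sqrt(12),
    so the monthly ratio is the annual ratio divided by sqrt(12).
    """
    return (annual_premium_pct / annual_vol_pct) / math.sqrt(12.0)

def expected_duration(stay_probability):
    """Expected number of periods spent in a Markov regime once entered."""
    return 1.0 / (1.0 - stay_probability)

# Figures reported in the text for the dCAPM and 4MOM regimes.
sharpe_dcapm = monthly_sharpe(3.1, 21.6)
sharpe_4mom = monthly_sharpe(6.2, 47.8)

# Pr(S_t = 4MOM | S_{t-1} = 4MOM) = 0.889, as reported in footnote 51.
duration_4mom = expected_duration(0.889)

print(f"dCAPM monthly Sharpe: {sharpe_dcapm:.3f}")
print(f"4MOM monthly Sharpe:  {sharpe_4mom:.3f}")
print(f"4MOM expected duration (months): {duration_4mom:.1f}")
```

Under these assumptions the three quantities round to 0.04, 0.037 and 9 months, in line with the values discussed in the text.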
Table 5 also shows the existence of structural differences in predictability patterns (for excess market
returns) across the three regimes and these patterns change as additional economic constraints are imposed.
Under the full set of restrictions, only the CAPM state implies accurately estimated predictability effects,
with higher dividend yields and term spreads in the risk-free yield curve forecasting higher future market
risk premia, and a higher default spread in the private corporate bond market forecasting lower future
risk premia. All these effects are sensible and the corresponding coefficients have the signs that one would
expect in the light of the existing literature. On the contrary, the dCAPM and 4MOM regimes imply very
weak evidence of predictability, with the only exception of the default spread which forecasts higher risk
premia in the 4MOM regime. Even though these differences across asset pricing regimes appear in all the
models we have estimated, the opposition between CAPM and other regimes becomes starker when economic
constraints are imposed in the estimation.53
Table 3 offers evidence of the in-sample pricing performance of the three-state mixture CAPM. Despite
the restrictions that define the mixture model in terms of imposing a characterization of the regimes based on
the asset pricing framework generating returns, the additional flexibility offered by a three-state specification
offers obvious payoffs that are especially visible in the absence of restrictions: all the pseudo-R2 increase when
moving from the two-state to the three-state models, and the RMSPE declines in every case except for MOMO
returns, for which it increases from 3.41 to 3.49 percent. In
fact, the mixture CAPM exceeds the R2 levels typical of a simple, unconditional CAPM and substantially
reduces the RMSPE. When the complete set of restrictions is imposed, a similar result obtains, with the
pseudo-R2 now in the range 3.5-6.1%.54
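To make the comparison of pricing-error measures concrete, the following sketch computes an average pricing error and a root mean squared pricing error from a series of realized and fitted excess returns. It assumes the textbook definitions (the mean of the errors and the square root of the mean squared error); the numbers are made up purely for illustration and are not taken from Table 3, and the function names are hypothetical.

```python
import math

def pricing_errors(realized, fitted):
    """Pricing error in each period: realized minus model-fitted excess return."""
    return [x - x_hat for x, x_hat in zip(realized, fitted)]

def average_pricing_error(errors):
    """Mean of the pricing errors (positive and negative errors offset)."""
    return sum(errors) / len(errors)

def root_mean_squared_pricing_error(errors):
    """Square root of the mean squared pricing error (no offsetting)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical monthly excess returns (percent) and fitted values.
realized = [1.0, -2.0, 3.0, 0.0]
fitted = [0.0, -1.0, 1.0, 2.0]

errors = pricing_errors(realized, fitted)
ape = average_pricing_error(errors)
rmspe_value = root_mean_squared_pricing_error(errors)

print(f"APE = {ape:.4f}, RMSPE = {rmspe_value:.4f}")
```

In this toy example the average pricing error is zero while the RMSPE is not, which is why the two measures are best read side by side.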
fact that we are estimating a three-state Markov chain: Pr(St = 4MOM|St−1 = 4MOM) is estimated at 0.889; the low ergodic
frequency of the state derives instead from the fact that Pr(St = 4MOM|St−1 = CAPM) and Pr(St = 4MOM|St−1 = dCAPM)
are estimated to equal 0.001 and 0.048, respectively, which are rather low values. More generally, it is of some interest to notice
that the estimated transition matrix has a "band-diagonal" structure in which (to an approximation) the markets can only
leave the CAPM to a dCAPM pricing regime, and the 4MOM state to a dCAPM regime. This means that transitions from
the first, low-volatility state to the third, high-volatility state (and vice versa) will mostly occur going through the second, low risk
premia and moderate volatility regime.
52 Also in this case one should bear in mind that the average duration of this regime 2 is only 9 months and that the regime
occurs rather infrequently.
53 As for the economic "significance" of the coefficients, in regime 2 a one standard deviation increase in the dividend yield
causes an increase in excess market returns of 2.48% (vs. 0.14% in regime 1), a one standard deviation increase in the term
spread causes an increase in excess market returns of 0.69% (vs. -0.01% in regime 1), and a one standard deviation increase in
the default spread causes a change in excess market returns of -0.40% (vs. +0.16% in regime 1).
54 Also in this case, MOMO produces a RMSE that increases from 3.91 to 3.99 percent when going from two- to three-states.
6. Conclusion
This paper has shown that — even when sensible economic restrictions such as non-satiation and global
risk aversion are imposed in the estimation routines — a flexible EPK can be found that prices the US
cross-section of equity returns — as represented by the value-weighted market portfolio and by three long-
short portfolios representing size, value, and momentum sorting — without leaving high and statistically
significant average abnormal returns (alphas) on the table. The existence of the pricing kernel makes the
dynamic properties of cross-sectional US stock returns consistent with rational asset pricing, because the
possible presence of arbitrage opportunities (i.e., the possibility for the estimated EPK to turn negative)
and of risk-loving pricing (even locally, i.e., the possibility for the EPK to turn increasing over the wealth
domain) are explicitly ruled out. The flexibility of the EPK derives from our choice to model regime switches
across different asset pricing frameworks, where the switches are governed by a simple (yet, latent) first-
order Markov chain. In particular, we have found evidence that when the MS EPK is restricted to have
regimes identified ex-ante with distinct asset pricing frameworks (in the sense that each regime corresponds
to periods in which one and only one type of CAPM applies, from the set of standard CAPM, dCAPM, and 4MOM
CAPM), its performance is particularly satisfactory and in some dimensions superior to what could be
otherwise attained either by modelling a single state in which risk premia change as a function of standard
macro-style predictors (dividend yield, term spread, default spread), similarly to the conditional (4MOM)
CAPMs in Dittmar (2002) and Post and van Vliet (2005), or by adopting a 4MOM MS pricing framework along
the lines of Guidolin and Timmermann (2008a).
From an empirical perspective, we have reported two main findings. First, a mixture CAPM obtained from
the MS mixing of CAPM, dCAPM, and 4MOM CAPM delivers alphas which are relatively small and which,
especially, fail to be statistically significant. This means that a mixture of CAPMs that is consistent with
the standard properties of investors' rationality (i.e., with the typical features of economically admissible
intertemporal marginal rates of substitution) appears to bar the existence of average abnormal returns in
excess of what is justified by the genuine risk-taking activities typical of rational investing decisions. In
short: the right econometric model that mixes simple asset pricing frameworks yields the sensible
conclusion that there are no free lunches out there in the markets. This result is consistent with earlier
literature based on different econometric frameworks, for instance Ang and Chen's (2007) result that when
the correlation between conditional betas and market risk premia is taken into account, the
Bayesian posterior for size and value alphas is usually concentrated around zero. However, it is interesting to
notice that our zero-alpha result fully obtains also with reference to momentum portfolio strategies, whose
rationalization has proven more elusive in the literature. Second, even though the use of simple CAPM
mixtures (in which only functions of gross returns on aggregate wealth can serve as state variables affecting
the EPK) poses obvious limitations to the ambitions of such exercises, we have reported that the estimated
mixtures produce an interesting OOS pricing performance, both in pseudo OOS schemes in which the benchmark
portfolios (market, SMB, HML, and MOMO) are priced 1- and 12-month ahead and in genuine OOS
exercises in which portfolios (industry- and value/size-sorted) not used in estimation are priced 1- and 12-month
ahead. In particular, the mixture CAPM model seems to systematically outperform the single-state
smooth benchmark for a majority of the portfolios we have experimented with. This means that the models
Under the first set of constraints, we observe mixed results, with half of the R2s and RMSEs failing to improve.
proposed in this paper may also have some further potential for their practical applications.
As usual, there are a number of unanswered questions that our research design has left open. For
instance, even though in a MS perspective the persistence and duration properties of the asset pricing
regimes isolated in Section 5.3 do seem appealing, it remains unclear to what these shifts in (rational)
asset pricing "paradigms" may correspond, not to mention that it remains hard to identify any micro-founded
structural economic model that may explain what factors or events cause these shifts.55
Apart from trying to impose additional structure which might lead to deeper insights on the causes and
nature of the regime shifts in pricing frameworks, our paper could be extended in a number of additional
directions. Our pricing kernel is a function only of the return on aggregate wealth. However, several recent
papers have shown that the specification of aggregate wealth impacts the conclusions of empirical asset
pricing studies. Consequently, the priced factor could be specified as a function of both the return on equity and
the return on human capital. Dittmar (2002) incorporates human capital, since recent evidence (Campbell,
1996, Jagannathan and Wang, 1996) suggests that the incorporation of human capital into the pricing
kernel substantially improves the performance of the conditional CAPM. However, in both of these studies,
the return on human capital impacts the cross section of returns linearly. The evidence in Dittmar (2002)
suggests that this linear impact is not sufficient to explain cross-sectional variation in returns. Rather, it is a
nonlinear function of the return on human capital that improves the performance of the model. It would be
interesting to also augment the definition of wealth returns in our framework to include returns on human
capital. It would also be interesting to examine the relation of the estimated EPKs to the volatility bounds
of Hansen and Jagannathan (1997). The bounds represent the minimum volatility that a pricing kernel
must exhibit, given its mean, to be admissible. In this respect, the bounds depict the set of admissible
pricing kernels in the mean-standard deviation space. Dittmar (2002) finds that a cubic pricing kernel
(incorporating returns on human capital) is actually able to generate sufficient volatility to be inside the
Hansen-Jagannathan bounds, but its mean is slightly too high for the pricing kernel to actually lie within
the bounds. Finally, Aretz, Bartram, and Pope (2007) use multivariate GMM models to show that book-to-market,
size, and momentum capture cross-sectional variation in exposures to a broad set of macroeconomic
factors. The performance of their pricing model based on the macroeconomic factors is comparable to the
performance of the Fama and French (1993) model, even though the momentum factor is found to contain
incremental information for asset pricing, consistent with Carhart (1997). However, as discussed in Aretz,
Bartram, and Pope (2007), these macro-driven models are also subject to remarkable instability, as a number
of relationships between risk factors and macroeconomic variables do seem to change sign as analysis and
data are updated (e.g., the relation between HML and changes in economic growth expectations), or become
insignificant (e.g., the relation between HML and default risk). As is typical in the empirical finance literature,
this may be due to the presence of regimes and/or structural breaks in structural asset pricing relationships.
It may be interesting to explore the notion of MS-driven mixtures as a way to capture, exploit, and forecast
this tendency for macro factors to generate elusive pricing implications over time.
55In other words, these comments imply that the asset pricing regimes identified in the mixture CAPM framework may
be predictable (hence, useful) in a statistical sense, although a lack of understanding for the economic causes underlying the
switches may impose an important upper bound on how much predictability may be uncovered and exploited in practice.
References
[1] Aretz, K., S., Bartram, and P., Pope, 2007, Macroeconomic risks and characteristic-based factor models.
Lancaster University Working Paper.
[2] Ang, A., and J., Chen, 2002, Asymmetric correlations of equity portfolios. Journal of Financial
Economics 63, 443-494.
[3] Ang, A. and J., Chen, 2007, CAPM over the long run: 1926-2001. Journal of Empirical Finance 14,
1-40.
[4] Ang, A., J., Chen, and Y., Xing, 2006, Downside risk. Review of Financial Studies 19, 1191-1239.
[5] Barberis, N., and M., Huang, 2001, Mental accounting, loss aversion, and individual stock returns.
Journal of Finance 56, 1247-1292.
[6] Barone-Adesi, G., 1985, Arbitrage equilibrium with skewed assets. Journal of Financial and
Quantitative Analysis 20, 299-313.
[7] Bawa, V., and E., Lindenberg, 1977, Capital market equilibrium in a mean-lower partial moment
framework. Journal of Financial Economics 5, 189-200.
[8] Brandt, M., and Q., Kang, 2004, On the relationship between the conditional mean and volatility of
stock returns: a latent VAR approach. Journal of Financial Economics 72, 217-257.
[9] Brown, D., and M., Gibbons, 1985, A simple econometric approach for utility-based asset pricing
models. Journal of Finance 40, 359-381.
[10] Campbell, J., M., Lettau, B., Malkiel, and Y., Xu, 2001, Have individual stocks become more volatile?
An empirical exploration of idiosyncratic risk. Journal of Finance 56, 1-44.
[11] Campbell, J., Y., Chan, and L., Viceira, 2003, A multivariate model of strategic asset allocation.
Journal of Financial Economics 67, 41-80.
[12] Campbell, J., and T., Vuolteenaho, 2004, Bad beta, good beta. American Economic Review 94, 1249-
1275.
[13] Carhart, M., 1997, On persistence in mutual fund performance. Journal of Finance 51, 1681-1714.
[14] Chan, K., 1988, On the contrarian investment strategy. Journal of Business 61, 147-163.
[15] Chan, K., and N.-F., Chen, 1988, An unconditional asset-pricing test and the role of firm size as an
instrumental variable for risk. Journal of Finance 43, 309-325.
[16] Chen, J., H., Hong, and J., Stein, 2001, Forecasting crashes: trading volume, past returns and
conditional skewness in stock prices. Journal of Financial Economics 61, 345-381.
[17] Cochrane, J., 1996, A cross-sectional test of an investment-based asset pricing model. Journal of Political
Economy 104, 572-621.
[18] Dahlquist, M., and P., Soderlind, 1999, Evaluating portfolio performance with stochastic discount
factors. Journal of Business 72, 347-383.
[19] De Bondt, W., and R., Thaler, 1986, Further evidence on investor overreaction and stock market
seasonality. Journal of Finance 42, 557-581.
[20] Dittmar, R., 2002, Nonlinear pricing kernels, kurtosis preference, and evidence from the cross-section
of equity returns. Journal of Finance 57, 369-403.
[21] Dumas, B., and B., Solnik, 1995, The world price of foreign exchange risk. Journal of Finance 50,
445-479.
[22] Fama, E., and K., French, 1989, Business conditions and expected returns on stocks and bonds. Journal
of Financial Economics 29, 23-49.
[23] Fama, E., and K., French, 1993, Common risk factors in the returns of stocks and bonds. Journal of
Financial Economics 33, 3-56.
[24] Fama, E., and K., French, 1996, Multifactor explanations of asset pricing anomalies. Journal of Finance
51, 55-84.
[25] Fama, E., and K., French, 1997, Industry costs of equity. Journal of Financial Economics 43, 153-193.
[26] Fama, E., and K., French, 2006, The value premium and the CAPM. Journal of Finance 61, 2163-2186.
[27] Fama, E. and J., MacBeth, 1973, Risk, return and equilibrium: empirical tests. Journal of Political
Economy 71, 607-636.
[28] Ferson, W., 1990, Are the latent variables in time-varying expected returns compensation for
consumption risk? Journal of Finance 45, 397-429.
[29] Ferson, W., and C., Harvey, 1991, The variation of economic risk premiums. Journal of Political
Economy 99, 385-415.
[30] Ferson, W., and C., Harvey, 1993, The risk and predictability of international equity returns. Review
of Financial Studies 6, 527-566.
[31] Ferson, W., and C. Harvey, 1999, Conditioning variables and the cross-section of stock returns. Journal
of Finance 54, 1325-1360.
[32] Friend, I., and R., Westerfield, 1980, Co-skewness and capital asset pricing. Journal of Finance 35,
897-913.
[33] Griffin, J., X., Ji, and S., Martin, 2003, Momentum investing and business cycle risk: evidence from
pole to pole. Journal of Finance 58, 2515-2547.
[34] Guidolin, M. and A., Timmermann, 2004, Value at risk and expected shortfall under regime switching.
Working paper, University of Virginia and UCSD.
[35] Guidolin, M., and A., Timmermann, 2006, An econometric model of nonlinear dynamics in the joint
distribution of stock and bond returns. Journal of Applied Econometrics 21, 1-22.
[36] Guidolin, M., and A., Timmermann, 2008a, International asset allocation under regime switching, skew
and kurtosis preferences. Review of Financial Studies 21, 889-935.
[37] Guidolin, M., and A., Timmermann, 2008b, Size and value anomalies under regime shifts. Journal of
Financial Econometrics 6, 1-48.
[38] Gul, F., 1991, A theory of disappointment aversion. Econometrica 59, 667-686.
[39] Hansen, L. P. and R., Jagannathan, 1997, Assessing specification errors in stochastic discount factor
models. Journal of Finance 52, 557-590.
[40] Harlow, W., and R., Rao, 1989, Asset pricing in a generalized mean-lower partial moment framework:
theory and evidence. Journal of Financial and Quantitative Analysis 24, 285-311.
[41] Harrison, M., and D., Kreps, 1979, Martingales and arbitrage in multiperiod securities markets. Journal
of Economic Theory 20, 381-408.
[42] Harvey, C., 1989, Time-varying conditional covariances in tests of asset pricing models. Journal of
Financial Economics 24, 289-317.
[43] Harvey, C., 2001, The specification of conditional expectations. Journal of Empirical Finance 8, 573-
638.
[44] Harvey, C., and A., Siddique, 2000, Conditional skewness in asset pricing tests. Journal of Finance 55,
1263-1295.
[45] Jagannathan, R., and Z., Wang, 1996, The conditional CAPM and the cross-section of expected returns.
Journal of Finance 51, 3-53.
[46] Kraus, A., and R., Litzenberger, 1976, Skewness preference and the valuation of risk assets. Journal of
Finance 31, 1085-1100.
[47] Kyle, A., and W., Xiong, 2001, Contagion as wealth effect of financial intermediaries. Journal of Finance
56, 1401-1440.
[48] Lewellen, J., and S., Nagel, 2006, The conditional CAPM does not explain asset-pricing anomalies.
Journal of Financial Economics 82, 289-314.
[49] Loughran, T., 1997, Book-to-market across firm size, exchange, and seasonality: is there an effect?
Journal of Financial and Quantitative Analysis 32, 249-268.
[50] Markowitz, H., 1959, Portfolio Selection. Yale University Press, New Haven, CT.
[51] Merton, R., 1973, An intertemporal capital asset pricing model. Econometrica 41, 867-887.
[52] Petkova, R., and L., Zhang, 2005, Is value riskier than growth? Journal of Financial Economics 78,
187-202.
[53] Post, T., 2003, Empirical tests for stochastic dominance efficiency. Journal of Finance 58, 1905-1931.
[54] Post, T., and H., Levy, 2005, Does risk seeking drive stock prices? Review of Financial Studies 18,
925-953.
[55] Post, T. and P., van Vliet, 2005, Conditional downside risk and the CAPM. Research Paper ERS-2004-
048-F&A Revision, Erasmus Research Institute of Management.
[56] Post, T., and P., van Vliet, 2006, Downside risk and asset pricing. Journal of Banking and Finance 30,
823-849.
[57] Roy, A., 1952, Safety first and the holding of assets. Econometrica 20, 431-449.
[58] Scott, R. and P., Horvath, 1980, On the direction of preference for moments of higher order than the
variance. Journal of Finance 35, 915-919.
[59] Sears, R., and K.-C., Wei, 1985, Asset pricing, higher moments and the market risk premium: a note.
Journal of Finance 40, 1251-1253.
[60] Shanken, J., 1990, Intertemporal asset pricing: an empirical investigation. Journal of Econometrics 45,
99-120.
[61] Shumway, T., 1997, Explaining returns with loss aversion. Working paper, University of Michigan.
[62] Zhang, L., 2005, The value premium. Journal of Finance 60, 67-103.
Appendix: Moments of Portfolio Returns
To characterize the moments of returns on the world market portfolio and the co-moments with local
market returns, note that mean returns can be computed from
\[
\bar{y}_{t+1} \equiv E[y_{t+1}|\mathcal{F}_t] = \sum_{l=1}^{K} (\pi_t' P e_l)\,\tilde{\mu}_l + \sum_{l=1}^{K} (\pi_t' P e_l)\, A_l y_t, \tag{19}
\]
where $\pi_t$ is the vector of state probabilities, $e_l$ is a vector of zeros with a one in the $l$-th position so
$(\pi_t' P e_l)$ is the ex-ante probability of being in state $S_{t+1} = l$ at time $t+1$ given information at time $t$, $\mathcal{F}_t$, and
$\tilde{\mu}_l \equiv \mu_l + M_{S_t}\mathrm{vec}(\Upsilon_l)$.
Because $\tilde{\mu}_l$ involves higher order moments of the world market portfolio such as $M_{S_t}\mathrm{vec}(\Lambda_l)$ as well
as higher order co-moments between individual portfolio returns and returns on the global market portfolio,
the (conditional) mean returns $E[y_{t+1}|\mathcal{F}_t]$ enter the right-hand side of (8). For instance, computing
$Cov[x_{t+1}, x^W_{t+1}|\mathcal{F}_t]$ requires knowledge of the first $h$ elements of $E[y_{t+1}|\mathcal{F}_t]$. Appendix B explains our iterative
estimation procedure used to solve the associated nonlinear optimization problem.
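The conditional variance, skew and kurtosis of returns on the world market portfolio, $x^W_{t+1}$, can now be
computed as follows:
\[
\begin{aligned}
Var[x^W_{t+1}|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^2\right]
 + \sum_{l=1}^{K} (\pi_t' P e_l)\, Var[\eta^W_{t+1}|S_{t+1}=l] \\
Sk[x^W_{t+1}|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^3\right] \\
 &\quad + 3\sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right) Var[\eta^W_{t+1}|S_{t+1}=l]\right] \\
K[x^W_{t+1}|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^4\right] \\
 &\quad + 6\sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^2 Var[\eta^W_{t+1}|S_{t+1}=l]\right].
\end{aligned}
\tag{20}
\]
Clearly the skew and kurtosis are functions of the mean and variance parameters $\{\mu_{1,l}, \ldots, \mu_{h,l}, A_l, \Omega_l\}_{l=1}^{K}$,
state probabilities, $\pi_t$, and the mean of the VAR coefficients, $\alpha_j \equiv e_j' \sum_{l=1}^{K} (\pi_t' P e_l) A_l$. Hence, no new
parameters are introduced to capture the higher moments of the return distribution. Such model-based
estimates are typically determined with considerably more accuracy than estimates of the third and fourth
moments obtained directly from realized returns which tend to be very sensitive to outliers.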
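The structure of these expressions can be illustrated in the simplest setting, in which the state-conditional means and variances of a single return are taken as given. The sketch below is a stylized version under stated assumptions: innovations are Gaussian within each state, the list `probs` plays the role of the predicted state probabilities $(\pi_t' P e_l)$, and `means` stands for the state-specific conditional means (which in the full model would include the VAR term $A_l y_t$). The function name and the numerical inputs are hypothetical. Only the mean, variance and third central moment are computed.

```python
def mixture_moments(probs, means, variances):
    """Mean, variance and third central moment of a Gaussian mixture.

    probs:     predicted probabilities of each state (sum to one)
    means:     state-specific conditional means
    variances: state-specific innovation variances
    """
    overall_mean = sum(p * m for p, m in zip(probs, means))
    variance = 0.0
    third_moment = 0.0
    for p, m, v in zip(probs, means, variances):
        d = m - overall_mean  # deviation of the state mean from the overall mean
        variance += p * (d ** 2 + v)
        third_moment += p * (d ** 3 + 3.0 * d * v)
    return overall_mean, variance, third_moment

# Hypothetical two-state example: a calm state and a volatile, low-mean state.
mean, var, third = mixture_moments([0.5, 0.5], [1.0, -1.0], [1.0, 4.0])
print(mean, var, third)

# With a single state the deviation terms vanish and the skew is zero.
single = mixture_moments([1.0], [2.0], [3.0])
print(single)
```

The example shows how asymmetry arises purely from mixing: each state is symmetric, yet the mixture has a negative third moment because the low-mean state is also the high-variance one.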
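Similarly, the covariance between country returns, $x^i_{t+1}$, and the world market return, $x^W_{t+1}$, is
\[
\begin{aligned}
Cov[x^i_{t+1}, x^W_{t+1}|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \alpha_i)y_t\right)\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)\right] \\
 &\quad + \sum_{l=1}^{K} (\pi_t' P e_l)\, Cov[\eta^i_{t+1}, \eta^W_{t+1}|S_{t+1}=l].
\end{aligned}
\tag{21}
\]
Given estimates of the parameters and state probabilities, $Cov[x^i_{t+1}, x^W_{t+1}|\mathcal{F}_t, S_t]$ can easily be calculated.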
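Finally, the coskewness and cokurtosis between local market returns and the world market return are
\[
\begin{aligned}
Cov[x^i_{t+1}, (x^W_{t+1})^2|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \alpha_i)y_t\right)\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^2\right] \\
 &\quad + \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \alpha_i)y_t\right) Var[\eta^W_{t+1}|S_{t+1}=l]\right] \\
 &\quad + 2\sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right) Cov[\eta^i_{t+1}, \eta^W_{t+1}|S_{t+1}=l]\right]
\end{aligned}
\]
and
\[
\begin{aligned}
Cov[x^i_{t+1}, (x^W_{t+1})^3|\mathcal{F}_t] &= \sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \alpha_i)y_t\right)\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^3\right] \\
 &\quad + 3\sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \alpha_i)y_t\right)\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right) Var[\eta^W_{t+1}|S_{t+1}=l]\right] \\
 &\quad + 3\sum_{l=1}^{K} (\pi_t' P e_l)\left[\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \alpha_{h+1})y_t\right)^2 Cov[\eta^i_{t+1}, \eta^W_{t+1}|S_{t+1}=l]\right].
\end{aligned}
\]
Terms such as $\left(\mu_{i,l} - e_i'\bar{y}_{t+1}\right)\left(\mu^W_l - e_{h+1}'\bar{y}_{t+1}\right)$ show the deviations of the state-specific mean from the
overall mean and do not arise in single-state models.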
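A stylized numerical version of the covariance and coskewness expressions may help to see how the deviation terms operate. The sketch assumes that, within each state, the portfolio and market innovations are jointly Gaussian, and it takes the state-conditional means, the market innovation variances and the innovation covariances as given inputs; all names and numbers are hypothetical and serve only as an illustration.

```python
def mixture_comoments(probs, means_i, means_w, var_w, cov_iw):
    """Covariance and coskewness of portfolio i with the market in a mixture.

    probs:   predicted state probabilities
    means_i: state-specific conditional means of portfolio i
    means_w: state-specific conditional means of the market
    var_w:   state-specific variances of the market innovation
    cov_iw:  state-specific covariances between the two innovations
    """
    mean_i = sum(p * m for p, m in zip(probs, means_i))
    mean_w = sum(p * m for p, m in zip(probs, means_w))
    covariance = 0.0
    coskewness = 0.0
    for p, mi, mw, vw, c in zip(probs, means_i, means_w, var_w, cov_iw):
        d_i = mi - mean_i  # deviation of the state mean of i from its overall mean
        d_w = mw - mean_w  # same for the market
        covariance += p * (d_i * d_w + c)
        coskewness += p * (d_i * d_w ** 2 + d_i * vw + 2.0 * d_w * c)
    return covariance, coskewness

# Hypothetical two-state inputs.
cov, coskew = mixture_comoments(
    probs=[0.5, 0.5],
    means_i=[1.0, -1.0],
    means_w=[2.0, -2.0],
    var_w=[1.0, 4.0],
    cov_iw=[0.5, 1.0],
)
print(cov, coskew)
```

With a single state both deviations are zero, so the covariance collapses to the innovation covariance and the coskewness to zero, which is the point made in the closing sentence of the appendix.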
Table 1
Descriptive Statistics for Portfolio Stock Returns
This table reports monthly sample statistics for percentage portfolio returns on a variety of equity portfolios. The market corresponds to the value-weighted CRSP (NYSE/AMEX/NASDAQ) portfolio; market returns are measured in excess of 1-month T-bill yields. HML and SMB are zero-investment portfolios: HML F-F shorts stocks with below-median book-to-market ratio and goes long in stocks with above-median book-to-market ratio, independently of size (as in Fama and French, 1993); HMLd (decile-based) shorts the lowest book-to-market decile and goes long in the highest book-to-market decile, independently of size; HMLs (small) shorts the two lowest book-to-market deciles among small capitalization stocks (in the two lowest deciles) and goes long in the two highest book-to-market deciles among small capitalization stocks (in the two highest deciles); SMB F-F shorts stocks in the lowest size tercile and goes long in stocks in the highest size tercile, independently of their book-to-market ratio (as in Fama and French, 1993); SMBd (deciles) shorts the lowest size decile and goes long in the highest size, independently of book-to-market ratio. "MOMO" is a portfolio that shorts stocks below the 30th percentile of the distribution of prior returns and goes long in stocks above the 70th percentile, independently of size and book-to-market. DY is the CRSP-implied dividend yield; TERM is the spread between 10-year (constant maturity) government bonds and 1-month T-bill yields; DEFAULT is the spread between Baa and Aaa (long-term) corporate bond yields. In the "Unconditional CAPM" regression section, we have boldfaced the alphas that are statistically significant at a size of 10% or lower; this implies that the corresponding portfolio returns cannot be adequately explained by the standard, unconditional CAPM.

                                                                                Unconditional CAPM
Portfolio Mean      St. Dev.  Skew       Excess Kurtosis  Jarque-Bera  p-value  α       p-val.  β       p-val.  Perc. R2

Panel A – Full Sample (1927:01 – 2008:03)
Market    0.629***  5.425***  0.227*     7.918***         >1000        0.000    −       −       −       −       −
HML F-F   0.510***  3.573***  2.172***   15.84***         >1000        0.000    0.428   0.000   0.131   0.000   3.95
HMLd      0.524**   6.658***  3.003***   26.92***         >1000        0.000    0.241   0.230   0.441   0.000   12.9
HMLs      0.954***  7.877***  -2.123***  30.81***         >1000        0.000    1.120   0.000   -0.258  0.000   3.18
SMB F-F   0.155     3.361***  1.540***   22.44***         >1000        0.000    0.032   0.759   0.196   0.000   10.1
SMBd      0.615**   7.728***  4.842***   47.13***         >1000        0.000    0.273   0.238   0.533   0.000   14.0
MOMO      0.764***  4.658***  -3.019***  28.43***         >1000        0.000    0.943   0.000   -0.285  0.000   10.7
DY        3.927***  1.566***  0.529***   0.163            46.59        0.000    −       −       −       −       −
TERM      1.522***  1.222***  -0.228     0.596*           22.87        0.000    −       −       −       −       −
SPREAD    1.123***  0.708***  2.480***   8.825***         >1000        0.000    −       −       −       −       −

Panel B – Early Sample (1927:01 – 1963:12)
Market    0.852***  6.464***  0.422**    7.562***         >1000        0.000    −       −       −       −       −
HML F-F   0.504**   4.252***  2.676***   16.33***         >1000        0.000    0.211   0.224   0.343   0.000   27.3
HMLd      0.489     8.554***  3.047***   20.76***         >1000        0.000    -0.143  0.674   0.742   0.000   31.4
HMLs      1.002*    10.71***  -1.812**   18.47***         >1000        0.000    1.142   0.026   -0.164  0.000   9.66
SMB F-F   0.210     3.518***  3.733***   35.86***         >1000        0.000    0.046   0.770   0.192   0.000   12.5
SMBd      0.891     10.03***  4.688***   34.77***         >1000        0.000    0.274   0.520   0.724   0.000   21.8
MOMO      0.677***  5.334***  -4.077***  33.792***        >1000        0.000    1.028   0.000   -0.412  0.000   24.9
DY        4.867***  1.420***  0.717**    -0.181           38.66        0.000    −       −       −       −       −
TERM      1.497***  0.947***  -0.595**   0.999**          44.29        0.000    −       −       −       −       −
SPREAD    1.256***  0.925***  1.959***   4.476***         654.7        0.000    −       −       −       −       −

* Denotes significance at the 10% level. ** Denotes significance at the 5% level. *** Denotes significance at the 1% level.
Table 1 (continued)
Descriptive Statistics for Portfolio Stock Returns

                                                                                Unconditional CAPM
Portfolio Mean      St. Dev.  Skew       Excess Kurtosis  Jarque-Bera  p-value  α       p-val.  β       p-val.  R2

Panel C – Compustat Sample (1964:01 – 2008:03)
Market    0.443**   4.369***  -0.496**   2.103**          119.6        0.000    −       −       −       −       −
HML F-F   0.515***  2.890***  0.430**    2.356***         139.2        0.000    0.629   0.000   -0.257  0.000   15.1
HMLd      0.554***  4.490***  0.392*     1.245**          47.67        0.000    0.605   0.002   -0.109  0.015   1.05
HMLs      0.914***  4.225***  -0.693***  5.432***         691.3        0.000    1.115   0.000   -0.434  0.000   20.1
SMB F-F   0.109     3.227***  -0.850***  5.884***         829.9        0.000    0.019   0.889   0.203   0.000   7.57
SMBd      0.383*    5.039***  0.737***   3.909***         383.9        0.000    0.301   0.168   0.178   0.000   2.41
MOMO      0.836***  4.009***  -0.636***  5.330***         664.4        0.000    0.860   0.000   -0.053  0.186   3.25
DY        3.141***  1.208***  0.282      -0.619**         15.51        0.000    −       −       −       −       −
TERM      1.543***  1.411***  -0.145     0.034            1.896        0.388    −       −       −       −       −
SPREAD    1.012***  0.423**   1.259***   1.765**          209.1        0.000    −       −       −       −       −

Panel D – Recent Sample (1994:01 – 2008:03)
Market    0.538*    4.167***  -0.735**   0.969**          22.07        0.000    −       −       −       −       −
HML F-F   0.611**   3.424***  0.765***   2.581***         64.15        0.000    0.814   0.000   0.379   0.000   21.3
HMLd      0.246     4.282***  0.075      1.180**          9.903        0.007    0.420   0.194   -0.287  0.000   7.84
HMLs      1.082**   5.566***  -0.895***  4.601***         170.6        0.000    1.509   0.000   -0.705  0.000   27.7
SMB F-F   -0.300    3.821***  -1.592***  7.443***         467.0        0.000    -0.364  0.215   0.121   0.087   1.73
SMBd      0.362     5.534***  0.988**    6.848***         355.5        0.000    0.353   0.416   0.015   0.885   0.04
MOMO      0.811**   5.007***  -0.661**   5.243***         208.3        0.000    0.929   0.016   -0.220  0.017   3.25
DY        1.784***  0.449**   0.766**    0.120            16.81        0.000    −       −       −       −       −
TERM      1.456***  1.281***  0.160      -0.910**         6.632        0.036    −       −       −       −       −
SPREAD    0.834***  0.219**   0.955***   0.143            26.16        0.000    −       −       −       −       −

* Denotes significance at the 10% level. ** Denotes significance at the 5% level. *** Denotes significance at the 1% level.
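As a rough check on how the Jarque-Bera column relates to the skewness and excess kurtosis columns, the sketch below applies the textbook formula JB = T/6 (S^2 + K^2/4), with a p-value from the chi-square distribution with two degrees of freedom, whose survival function is exp(-JB/2). The number of observations is an assumption inferred from the panel dates (1964:01 to 2008:03, i.e. 531 months); the exact estimators used in the table may differ slightly, so only approximate agreement should be expected. Function names are hypothetical.

```python
import math

def months_between(start_year, start_month, end_year, end_month):
    """Number of monthly observations in an inclusive date range."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

def jarque_bera(n_obs, skew, excess_kurtosis):
    """Textbook Jarque-Bera statistic from sample skewness and excess kurtosis."""
    return n_obs / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)

def jb_p_value(jb_stat):
    """p-value under a chi-square(2) null; its survival function is exp(-x/2)."""
    return math.exp(-jb_stat / 2.0)

# Panel C sample length, inferred from the dates in the panel title.
n_panel_c = months_between(1964, 1, 2008, 3)

# DY row of Panel C: skew 0.282, excess kurtosis -0.619.
jb_dy = jarque_bera(n_panel_c, 0.282, -0.619)

# TERM row of Panel C reports a statistic of 1.896.
p_term = jb_p_value(1.896)

print(n_panel_c, round(jb_dy, 2), round(p_term, 3))
```

Under these assumptions the statistic for DY comes out close to the tabulated 15.51, and the p-value implied by 1.896 is close to the tabulated 0.388.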
Table 2
Estimates of Benchmark Single-State Model with Risk Premia Driven by Standard Predictors
This table reports the iterated SMLE estimates of the single-state, SSBC "smooth" model:
\[
\begin{aligned}
x^i_{t+1} &= \alpha^i + \lambda_{2,t+1}\, Cov[R^i_{t+1}, R^W_{t+1}|\Im_t] + \lambda'_{2,t+1}\, Cov[R^i_{t+1}, R^W_{t+1}|\Im_t, R^W_{t+1} < R^W_{B,t+1}] + \eta^i_{t+1} \\
x^W_{t+1} &= \alpha^W + \lambda_{2,t+1}\, Var[R^W_{t+1}|\Im_t] + \lambda'_{2,t+1}\, Var[x^W_{t+1}|\Im_t, R^W_{t+1} < R^W_{B,t+1}] + c' z_t + \eta^W_{t+1} \\
z_{t+1} &= \mu_z + B z_t + \eta^Z_{t+1},
\end{aligned}
\]
in which $\lambda_{2,t+1} = -R^f_{t+1}(g_1 + b'\ddot{B}\ddot{z}_t)$, $\lambda'_{2,t+1} = -R^f_{t+1}(g'_1 + b''\ddot{B}\ddot{z}_t)$ where $\ddot{B} \equiv [\iota_3 \; B]$ and $\ddot{z}_t \equiv [1 \; z_t']'$, and $z_t$ collects the predictors (dividend yield, term spread, and default spread). $R^W_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^W_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1}|\Im_t] = 1/R^f_t$, and
that $g_{1,t}<0$, $g'_{1,t}<0$, $g_{2,t}>0$, and $g_{3,t}<0$ ∀t. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ ∀t. In the table, DY stands for dividend yield, TERM for the term spread, and DEF for the default spread. Standard errors and (pseudo) t-stats associated to correlations refer to estimates of covariance coefficients.
Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat
αSMB0.1386 0.1050 1.320 0.5397 0.0972 5.552 2.3002 1.3377 1.720
αHML 0.4024 0.0920 4.374 0.5621 0.0764 7.362 ‐0.0727 0.2931 ‐0.248
αMOMO 0.7622 0.1400 5.444 0.1053 0.1342 0.785 1.4152 0.2653 5.334
αW1.1914 0.8587 1.387 2.6618 0.7009 3.798 1.6825 4.1811 0.402
g1 (Cov. risk prem. constant) ‐0.0005 0.0350 ‐0.014 ‐0.0996 0.0176 ‐5.657 ‐0.1447 0.0447 ‐3.236
b1 (Cov. risk prem. loading on DY) ‐0.0040 0.0068 ‐0.588 0.0000 0.0032 ‐0.007 ‐0.0009 0.0226 ‐0.040
b2 (Cov. risk prem. load on TERM) 0.0154 0.0117 1.316 0.0000 0.0091 0.000 0.0014 0.0607 0.023
b3 (Cov. risk prem. load on DEF) ‐0.0014 0.0251 ‐0.056 ‐0.0002 0.0126 ‐0.016 ‐0.0003 0.1151 ‐0.003
g'1 (Semicov. risk prem. const.) 0.0819 0.0581 1.410 0.8434 0.0145 58.200 0.9027 0.2636 3.424
b'1 (Semicov. risk prem. load on DY) ‐0.0100 0.0169 ‐0.592 ‐0.0752 0.0100 ‐7.523 ‐0.0792 0.0281 ‐2.818
b'2 (Semicov. risk prem. load on TERM) ‐0.0148 0.0186 ‐0.796 ‐0.0029 0.0194 ‐0.149 ‐0.0075 0.1664 ‐0.045
b'3 (Semicov. risk prem. load on DEF) ‐0.0173 0.0584 ‐0.296 ‐0.0024 0.0377 ‐0.064 0.0253 0.2456 0.103
c1 (Pred. coeff. of market on DY) ‐0.0377 0.1598 ‐0.236 ‐0.1021 0.1249 ‐0.817 0.3745 2.4484 0.153
c2 (Pred. coeff. of market on TERM) 0.2785 0.2120 1.314 0.2490 0.1712 1.455 1.4156 4.1293 0.343
c3 (Pred. coeff. of market on DEF) ‐0.7380 0.5796 ‐1.273 0.0057 0.5194 0.011 ‐0.0335 2.5563 ‐0.013
SMB volatility 2.9399 0.1598 54.087 4.3461 15.5514 2.395 3.4483 5.9948 1.984
SMB‐HML correlation ‐0.0374 0.2120 ‐1.491 ‐0.2633 0.0293 ‐10.647 ‐0.1685 21.2033 ‐0.106
HML volatility 2.8739 0.5797 14.249 3.1829 0.2698 9.752 3.8574 5.2287 2.846
SMB‐MOMO correlation ‐0.2296 0.1510 ‐17.195 ‐0.0915 0.0071 1.246 ‐0.0331 1.3879 ‐0.382
HML‐MOMO correlation ‐0.0888 0.2411 ‐4.071 0.0261 0.0026 7.118 0.0228 0.0592 6.910
MOMO volatility 3.8467 0.1503 98.430 8.7391 0.0054 519.72 4.6510 0.4180 51.755
SMB‐Mkt correlation 0.2060 0.3187 8.536 0.2794 0.0200 33.113 0.0145 0.1977 1.822
HML‐Mkt correlation ‐0.0434 0.0303 ‐1.846 ‐0.0374 0.0124 ‐4.126 0.0024 0.6108 0.110
MOMO‐Mkt correlation ‐0.0204 0.1842 ‐1.915 0.1803 0.2239 12.857 0.0187 1.0363 0.602
Mkt volatility 4.4924 0.1763 114.52 7.7226 0.0287 175.89 7.1932 14.2156 3.640
DY 0.0622 0.0262 2.374 0.0624 0.0244 2.560 0.0633 0.0246 2.567
TERM 0.2150 0.0667 3.223 0.2239 0.0674 3.321 0.2337 0.0602 3.881
DEF 0.0254 0.0144 1.764 0.0252 0.0186 1.357 0.0246 0.0239 1.029
B11 (DYt on DYt‐1) 0.9802 0.0063 155.587 0.9842 0.0079 124.610 0.9753 0.0086 113.895
B12 (DYt on TERMt‐1) ‐0.0102 0.0078 ‐1.308 ‐0.0101 0.0090 ‐1.113 ‐0.0097 0.0116 ‐0.833
B13 (DYt on DEFt‐1) 0.0251 0.0147 1.707 0.0247 0.0195 1.271 0.0256 0.0210 1.215
B21 (TERMt on DYt‐1) ‐0.0191 0.0160 ‐1.194 ‐0.0199 0.0162 ‐1.227 ‐0.0203 0.0186 ‐1.091
B22 (TERMt on TERMt‐1) 0.7748 0.0199 38.935 0.7516 0.0204 36.831 0.7278 0.0248 29.400
B23 (TERMt on DEFt‐1) 0.1817 0.0374 4.858 0.1760 0.0485 3.629 0.1718 0.0439 3.913
B31 (DEFt on DYt‐1) 0.0008 0.0035 0.229 0.0008 0.0033 0.257 0.0009 0.0039 0.228
B32 (DEFt on TERMt‐1) ‐0.0005 0.0043 ‐0.116 ‐0.0005 0.0042 ‐0.120 ‐0.0005 0.0044 ‐0.111
B33 (DEFt on DEFt‐1) 0.9759 0.0081 120.481 0.9915 0.0077 129.273 1.0005 0.0095 104.823
Maximum log‐likelihood function: ‐10339.14 ‐11425.70 ‐11833.51
Number of parameters: 55 55 55
Akaike information criterion: 21.3213 23.5502 24.3867
Hannan‐Quinn inf. criterion: 21.4261 23.6550 24.4915
Bayes‐Schwarz inf. criterion: 21.9850 24.2138 25.0503
Column groups, left to right: Unconstrained Estimates; Constrained Estimates (Set 1); Constrained Estimates (Set 2).
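The information criteria reported with the single-state estimates can be checked by hand. The sketch below is only an arithmetic illustration: it assumes the criteria are expressed per observation, that the sample size is T = 975 (a value not stated in this section, chosen because it reproduces the reported figures), and that the Schwarz-type penalty is 2k ln T rather than the textbook k ln T. The function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Per-observation AIC, Hannan-Quinn, and Schwarz-type criteria.

    Assumption: every criterion is (-2*loglik + penalty) / n_obs.
    """
    deviance = -2.0 * loglik
    aic = (deviance + 2.0 * n_params) / n_obs
    hq = (deviance + 2.0 * n_params * math.log(math.log(n_obs))) / n_obs
    # Assumption: penalty of 2*k*ln(T), twice the textbook k*ln(T),
    # used here only because it matches the reported values.
    bic = (deviance + 2.0 * n_params * math.log(n_obs)) / n_obs
    return aic, hq, bic


# Unconstrained single-state model: log-likelihood and parameter count
# are taken from the table; T = 975 is an assumed sample size.
aic, hq, bic = information_criteria(-10339.14, 55, 975)
print(round(aic, 4), round(hq, 4), round(bic, 4))
```

Under these assumptions the three values agree with the unconstrained column to four decimals, and the same function can be applied to the other columns by changing the log-likelihood.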
Table 3
Comparing In-Sample Pricing Performances
This table reports the average pricing errors,
\[ APE^{m} \equiv T^{-1} \sum_{t=1}^{T} e_t^{i,m}, \]
where $e_t^{i,m} \equiv x_t^{i} - \hat{x}_t^{i,m}$, $\hat{x}_t^{i,m}$ is the fitted value of the (excess) return on equity portfolio i from model m at time t, and T is the sample size; the standard deviation of the pricing errors,
\[ SDPE^{m} \equiv \sqrt{ T^{-1} \sum_{t=1}^{T} \left( e_t^{i,m} - \bar{e}^{i,m} \right)^{2} }; \]
the root mean-squared error,
\[ RMSE^{m} \equiv \sqrt{ (SDPE^{m})^{2} + (APE^{m})^{2} }; \]
and the (pseudo) R2.
Columns: Portfolio; Average Pricing Error; Std. Dev. of Pricing Error; Root Mean Squared Error; Pseudo R2 (%)

Unconstrained Models
Single‐state SSBC Model
SMB ‐0.00064 2.9424 2.9424 1.1608
HML 0.00117 2.8760 2.8760 1.3443
MOMO 0.00298 3.8491 3.8491 0.1136
Market 0.00022 4.4951 4.4951 0.7178
Two‐state Four‐Moment MS CAPM
SMB 0.00680 2.8035 2.8035 4.2885
HML 0.02667 2.7869 2.7870 6.5740
MOMO ‐0.07148 3.4091 3.4099 5.0587
Market 0.03855 4.8531 4.8533 3.4760
Two‐state Mixture CAPM
SMB 0.01183 2.4930 2.4930 6.5383
HML 0.01053 2.6250 2.6250 6.4159
MOMO ‐0.04492 3.4925 3.4928 8.6224
Market 0.04493 4.0647 4.0650 8.1910

Constraint Set 1
Single‐state SSBC Model
SMB 0.24296 3.0197 3.0294 2.0945
HML ‐0.16609 3.0451 3.0496 1.4341
MOMO 0.33278 4.1483 4.1616 0.9428
Market 0.01231 5.0465 5.0465 0.0025
Two‐state Four‐Moment MS CAPM
SMB ‐0.73823 2.9117 3.0038 3.4561
HML 0.70154 3.2533 3.3281 2.8311
MOMO ‐0.00957 3.7799 3.7799 3.1195
Market ‐0.17185 4.4096 4.4130 3.6814
Two‐state Mixture CAPM
SMB 0.71827 3.1252 3.2067 2.6495
HML 1.07252 5.0777 5.1898 5.4751
MOMO ‐0.34627 3.6035 3.6201 1.4030
Market 1.05476 4.3812 4.5063 1.6269

Constraint Set 2
Single‐state SSBC Model
SMB ‐0.33362 3.2700 3.2869 4.1617
HML 0.17569 3.2268 3.2316 1.9661
MOMO 0.13262 4.3909 4.3929 3.2501
Market ‐0.25995 5.3993 5.4055 2.9886
Two‐state Four‐Moment MS CAPM
SMB ‐0.56820 3.1801 3.2305 3.1977
HML ‐0.98299 4.3435 4.4533 2.9803
MOMO 0.31320 3.9068 3.9193 3.1932
Market ‐1.84665 4.8318 5.1727 3.7194
Two‐state Mixture CAPM
SMB 0.22152 2.9357 2.9440 3.5328
HML 0.47107 3.3497 3.3827 4.0592
MOMO ‐0.13697 3.9912 3.9935 4.7922
Market ‐0.65454 5.0741 5.1162 6.1394
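The pricing-error statistics defined in the Table 3 caption are simple to compute from a series of realized and fitted excess returns. The following is a minimal sketch, not the author's code: the function name `pricing_error_stats` and the toy inputs are hypothetical, and the standard deviation is assumed to use the 1/T normalization, under which the RMSE formula in the caption coincides with the square root of the mean squared error.

```python
import math


def pricing_error_stats(actual, fitted):
    """Return (APE, SDPE, RMSE) for one portfolio and one model.

    Assumption: SDPE divides by T (not T - 1), so that
    RMSE = sqrt(SDPE**2 + APE**2) equals sqrt(mean(e**2)).
    """
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    ape = sum(errors) / n                      # average pricing error
    sdpe = math.sqrt(sum((e - ape) ** 2 for e in errors) / n)
    rmse = math.sqrt(sdpe ** 2 + ape ** 2)     # root mean-squared error
    return ape, sdpe, rmse


# Toy example with made-up monthly excess returns (in percent).
actual = [1.0, -2.0, 0.5, 3.0]
fitted = [0.5, -1.0, 0.0, 2.0]
ape, sdpe, rmse = pricing_error_stats(actual, fitted)
print(ape, sdpe, rmse)
```

The decomposition makes clear why the RMSE and the standard deviation columns are nearly identical whenever the average pricing error is close to zero.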
Table 4
Estimates of Two-State Markov Switching Four-Moment CAPM Model
This table reports the iterated SMLE estimates of the two-state, four-moment CAPM model:
\[
\begin{aligned}
x^{i}_{t+1} &= \alpha^{i}_{S_{t+1}} + \lambda_{2,S_{t+1}} Cov[R^{i}_{t+1}, R^{W}_{t+1} \mid \Im_t] + \lambda'_{2,S_{t+1}} Cov[R^{i}_{t+1}, R^{W}_{t+1} \mid R^{W}_{t+1} < R^{W}_{B,t+1}, \Im_t] \\
&\quad + \lambda_{3,S_{t+1}} Cov[R^{i}_{t+1}, (R^{W}_{t+1})^{2} \mid \Im_t] + \lambda_{4,S_{t+1}} Cov[R^{i}_{t+1}, (R^{W}_{t+1})^{3} \mid \Im_t] + \eta^{i}_{t+1} \\
x^{W}_{t+1} &= \alpha^{W}_{S_{t+1}} + c'_{S_{t+1}} z_t + \lambda_{2,S_{t+1}} Var[x^{W}_{t+1} \mid \Im_t] + \lambda'_{2,S_{t+1}} Var[x^{W}_{t+1} \mid x^{W}_{t+1} < x^{W}_{B,t+1}, \Im_t] \\
&\quad + \lambda_{3,S_{t+1}} Skew[x^{W}_{t+1} \mid \Im_t] + \lambda_{4,S_{t+1}} Kurt[x^{W}_{t+1} \mid \Im_t] + \eta^{W}_{t+1} \\
z_{t+1} &= \mu_{S_{t+1}} + B_{S_{t+1}} z_t + \eta^{Z}_{t+1}
\end{aligned}
\]
with constant transition probabilities and in which $\lambda_{j,S_{t+1}} = -R^{f}_{t} g_{j-1,S_{t+1}}$ for j = 2, 3, 4. The vector $z_t$ collects the predictors (dividend yield, term spread, and default spread). $R^{W}_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^{W}_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1} \mid \Im_t] = 1/R^{f}_{t}$, and that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ ∀t. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ ∀t. Standard errors and (pseudo) t-stats associated with correlations refer to estimates of covariance coefficients.
Column groups, left to right: Unconstrained Estimates; Constrained Estimates (Set 1); Constrained Estimates (Set 2).
Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat
SMB ‐ Regime 1 ‐0.1908 0.1554 ‐1.228 ‐0.2566 0.8511 ‐0.301 ‐0.1283 0.5902 ‐0.217
HML ‐ Regime 1 0.4610 0.1409 3.272 0.1152 0.7981 0.144 0.0576 0.6557 0.088
MOMO ‐ Regime 1 0.1122 0.1886 0.595 0.9411 0.6877 1.368 0.4706 0.6635 0.709
W ‐ Regime 1 2.1354 0.4954 4.310 1.5548 0.7385 2.105 0.7774 0.8232 0.944
SMB ‐ Regime 2 5.4202 2.6533 2.043 4.9932 2.1683 2.303 2.4966 2.0095 1.242
HML ‐ Regime 2 0.2228 7.7102 0.029 ‐0.4640 2.3332 ‐0.199 ‐0.2320 1.6180 ‐0.143
MOMO ‐ Regime 2 ‐0.6970 10.2330 ‐0.068 ‐0.0810 4.8504 ‐0.017 ‐0.0405 3.6367 ‐0.011
W ‐ Regime 2 3.9820 11.0010 0.362 3.0722 3.9249 0.783 1.5361 3.7104 0.414
g0 ‐ Regime 1 0.6229 0.1255 4.963 0.0410 0.5724 0.072 0.0205 0.4091 0.050
g1 (Covariance risk prem.) ‐ Regime 1 ‐2.3316 1.1770 ‐1.981 ‐0.9025 0.5738 ‐1.573 ‐0.4513 0.1101 ‐4.099
g2 (Coskew risk premium) ‐ Regime 1 0.5886 0.0464 12.69 0.7450 0.0235 31.767 0.3725 0.1012 3.681
g3 (Cokurt risk premium) ‐ Regime 1 ‐0.0836 0.0017 ‐49.18 ‐0.1459 0.0166 ‐8.766 ‐0.0730 0.0290 ‐2.513
g0 ‐ Regime 2 1.0548 0.6890 1.531 0.6571 1.6680 0.394 0.3286 1.4518 0.226
g1 (Covariance risk premium) ‐ Regime 2 0.4856 0.0067 72.48 ‐0.1994 0.0675 ‐2.953 ‐0.0997 0.0783 ‐1.274
g2 (Coskew risk premium) ‐ Regime 2 0.0283 0.0001 205.1 0.0133 0.0095 1.394 0.0067 0.0110 0.607
g3 (Cokurt risk premium) ‐ Regime 2 ‐0.0012 0.0000 ‐6549.5 0.0000 0.0002 ‐0.239 0.0000 0.0005 ‐0.039
c1 (Pred. coeff. on DY) ‐ Regime 1 0.1395 0.0956 1.459 0.2088 0.0952 2.159 0.0886 0.0863 1.027
c2 (Pred. coeff. on TERM) ‐ Regime 1 ‐0.0112 0.1254 ‐0.089 ‐0.0057 0.0970 ‐0.114 ‐0.0095 0.1183 ‐0.081
c3 (Pred. coeff. on DEF) ‐ Regime 1 0.2074 0.3334 0.622 0.1384 0.2585 0.447 0.2270 0.2210 1.027
c1 (Pred. coeff. on DY) ‐ Regime 2 0.8431 0.4632 1.820 0.6199 0.3700 2.994 1.5865 0.4623 3.431
c2 (Pred. coeff. on TERM) ‐ Regime 2 0.4242 0.5201 0.816 0.6342 0.8025 0.536 0.5621 0.8518 0.660
c3 (Pred. coeff. on DEF) ‐ Regime 2 ‐0.9681 0.9131 ‐1.060 ‐0.8542 0.6479 ‐1.656 ‐0.5673 0.9073 ‐0.625
DY ‐ Regime 1 0.0241 0.0204 1.181 0.0235 0.0318 0.740 0.0279 0.0305 0.915
TERM ‐ Regime 1 0.0684 0.0635 1.077 0.8952 0.0843 10.616 0.8448 0.0856 9.871
DEF ‐ Regime 1 0.0336 0.0066 5.091 0.0472 0.0058 8.127 0.0484 0.0064 7.571
B11 (DYt on DYt‐1) ‐ Regime 1 0.9939 0.0044 225.89 0.8699 0.0067 130.366 0.8076 0.0068 119.283
B12 (DYt on TERMt‐1) ‐ Regime 1 ‐0.0008 0.0055 ‐0.145 ‐0.0009 0.0082 ‐0.113 ‐0.0013 0.0054 ‐0.235
B13 (DYt on DEFt‐1) ‐ Regime 1 ‐0.0101 0.0162 ‐0.623 ‐0.0076 0.0252 ‐0.302 ‐0.0078 0.0323 ‐0.241
B21 (TERMt on DYt‐1) ‐ Regime 1 ‐0.0110 0.0127 ‐0.866 ‐0.0165 0.0197 ‐0.836 ‐0.0104 0.0161 ‐0.644
B22 (TERMt on TERMt‐1) ‐ Regime 1 0.8551 0.0188 45.484 1.2705 0.0317 40.109 0.9845 0.0388 27.201
B23 (TERMt on DEFt‐1) ‐ Regime 1 0.1905 0.0522 3.649 0.2333 0.0712 3.277 0.1789 0.0856 2.090
B31 (DEFt on DYt‐1) ‐ Regime 1 ‐0.0008 0.0013 ‐0.615 ‐0.0010 0.0014 ‐0.731 ‐0.0005 0.0009 ‐0.575
B32 (DEFt on TERMt‐1) ‐ Regime 1 ‐0.0059 0.0019 ‐3.105 ‐0.0053 0.0026 ‐2.017 ‐0.0079 0.0032 ‐2.453
B33 (DEFt on DEFt‐1) ‐ Regime 1 0.9695 0.0057 170.09 1.1935 0.0067 177.925 1.0320 0.0084 122.236
Table 4 (continued)
Estimates of Two-State Markov Switching Four-Moment CAPM Model
Column groups, left to right: Unconstrained Estimates; Constrained Estimates (Set 1); Constrained Estimates (Set 2).
Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat
DY ‐ Regime 2 0.5610 0.1781 3.150 0.5617 0.1447 3.882 0.4485 0.1971 2.275
TERM ‐ Regime 2 0.5123 0.3875 1.322 0.5430 0.3336 1.628 0.6756 0.4471 1.511
DEF ‐ Regime 2 0.2078 0.1041 1.996 0.1242 0.0761 1.632 0.1717 0.0666 2.577
B11 (DYt on DYt‐1) ‐ Regime 2 0.8633 0.0422 20.457 0.9216 0.0671 13.732 0.4952 0.0796 6.222
B12 (DYt on TERMt‐1) ‐ Regime 2 ‐0.0304 0.0285 ‐1.067 ‐0.0403 0.0303 ‐1.331 ‐0.0393 0.0288 ‐1.366
B13 (DYt on DEFt‐1) ‐ Regime 2 0.1041 0.0595 1.750 0.0768 0.0722 1.064 0.0722 0.0560 1.289
B21 (TERMt on DYt‐1) ‐ Regime 2 ‐0.0571 0.0827 ‐0.690 ‐0.0675 0.0771 ‐0.875 ‐0.0618 0.0475 ‐1.302
B22 (TERMt on TERMt‐1) ‐ Regime 2 0.5807 0.0637 9.116 0.3456 0.0825 4.190 0.3981 0.0776 5.133
B23 (TERMt on DEFt‐1) ‐ Regime 2 0.2858 0.1191 2.400 0.2829 0.1206 2.347 0.2477 0.1125 2.201
B31 (DEFt on DYt‐1) ‐ Regime 2 ‐0.0091 0.0232 ‐0.392 ‐0.0065 0.0325 ‐0.201 ‐0.0078 0.0457 ‐0.171
B32 (DEFt on TERMt‐1) ‐ Regime 2 0.0175 0.0173 1.012 0.0227 0.0139 1.641 0.0179 0.0203 0.879
B33 (DEFt on DEFt‐1) ‐ Regime 2 0.9223 0.0335 27.531 1.1681 0.0489 23.903 1.0586 0.0418 25.342
SMB volatility ‐ Regime 1 1.8313 0.0031 1095.6 1.9066 0.0038 966.1 2.0409 0.0336 60.81
SMB‐HML correlation ‐ Regime 1 ‐0.1224 0.0014 ‐262.4 ‐0.1193 0.0052 ‐74.21 ‐0.0937 0.0169 ‐5.540
HML volatility ‐ Regime 1 1.6766 0.0022 1284.0 1.6829 0.0021 1327.1 1.7483 0.0405 43.18
SMB‐MOMO correlation ‐ Regime 1 0.0567 0.0065 29.32 0.0459 0.0079 19.51 0.0479 0.0168 2.855
HML‐MOMO correlation ‐ Regime 1 ‐0.0061 0.0069 ‐2.725 ‐0.0040 0.0110 ‐1.095 ‐0.0016 0.0803 ‐0.020
MOMO volatility ‐ Regime 1 1.8311 0.0024 1402.6 1.7664 0.0024 1303.7 1.7483 0.0206 84.82
SMB‐Mkt correlation ‐ Regime 1 0.2104 0.0149 79.08 0.1883 0.0176 63.88 0.2621 0.0741 3.538
HML‐Mkt correlation ‐ Regime 1 ‐0.1043 0.0137 ‐38.90 ‐0.0974 0.0137 ‐37.34 ‐0.1038 0.1084 ‐0.958
MOMO‐Mkt correlation ‐ Regime 1 0.2243 0.0012 1055.5 0.2246 0.0023 544.0 0.1722 0.0072 23.83
Mkt volatility ‐ Regime 1 3.0513 0.0054 1720.0 3.1321 0.0072 1368.9 2.5946 0.0288 90.11
SMB volatility ‐ Regime 2 6.7129 0.4381 102.9 6.2541 0.7113 54.99 4.9379 0.4541 10.874
SMB‐HML correlation ‐ Regime 2 0.1370 0.4535 14.99 0.0927 0.6247 6.330 0.0507 0.2902 0.175
HML volatility ‐ Regime 2 7.3944 0.2610 209.5 6.8230 0.1652 281.8 6.7947 0.2796 24.30
SMB‐MOMO correlation ‐ Regime 2 ‐0.2904 0.7511 ‐25.24 ‐0.2518 0.9721 ‐17.81 ‐0.0182 0.3269 ‐0.056
HML‐MOMO correlation ‐ Regime 2 ‐0.3402 1.3249 ‐18.47 ‐0.2904 0.7483 ‐29.12 ‐0.0721 0.2521 ‐0.286
MOMO volatility ‐ Regime 2 9.7261 0.2634 359.1 10.9971 0.6427 188.18 8.1625 0.1461 55.86
SMB‐Mkt correlation ‐ Regime 2 0.3446 0.5834 41.55 0.2934 1.1449 16.75 0.0066 0.3114 0.021
HML‐Mkt correlation ‐ Regime 2 0.2580 0.9900 20.20 0.2518 0.9582 18.73 0.0042 0.4239 0.010
MOMO‐Mkt correlation ‐ Regime 2 ‐0.4557 0.0346 ‐1340.8 ‐0.4295 0.6281 ‐78.58 ‐0.0117 0.0143 ‐0.819
Mkt volatility ‐ Regime 2 10.4780 0.2163 507.5 10.4498 0.6923 157.7 25.1914 0.0498 506.21
Prob(State 1 | State 1) 0.9182 0.0596 15.406 0.9241 0.0602 15.361 0.9192 0.1026 8.958
Prob(State 2 | State 2) 0.7200 0.1459 4.935 0.7442 0.1426 5.220 0.7778 0.1565 4.969
Maximum log‐likelihood function: ‐9597.84 ‐10286.81 ‐10140.12
Number of parameters: 92 92 92
Akaike information criterion: 19.8766 21.2899 20.9890
Hannan‐Quinn inf. criterion: 20.0519 21.4652 21.1643
Bayes‐Schwarz inf. criterion: 20.9867 22.4000 22.0991
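With constant transition probabilities, the two "stayer" probabilities reported at the bottom of Table 4 are enough to characterize the regime process. As a hedged illustration (standard first-order Markov chain algebra, not a computation taken from the paper), the sketch below derives the expected duration of each regime and the ergodic (long-run) regime probabilities from the unconstrained estimates 0.9182 and 0.7200. The function name `two_state_summary` is hypothetical.

```python
def two_state_summary(p11, p22):
    """Expected durations and ergodic probabilities of a two-state chain.

    p11 = Prob(State 1 at t | State 1 at t-1), p22 likewise for State 2.
    Assumes a time-homogeneous, first-order Markov chain.
    """
    duration_1 = 1.0 / (1.0 - p11)   # expected months spent in State 1
    duration_2 = 1.0 / (1.0 - p22)   # expected months spent in State 2
    pi_1 = (1.0 - p22) / (2.0 - p11 - p22)   # long-run share of State 1
    return duration_1, duration_2, pi_1, 1.0 - pi_1


# Unconstrained estimates from Table 4.
d1, d2, pi1, pi2 = two_state_summary(0.9182, 0.7200)
print(f"durations: {d1:.2f}, {d2:.2f} months; ergodic: {pi1:.4f}, {pi2:.4f}")
```

Under these assumptions the first regime lasts roughly a year on average and the second a few months, which is one way to read the persistence implied by the estimates.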
Table 5
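Estimates of Three-State Markov Switching Mixture CAPM
This table reports the SMLE estimates of the three-state mixture CAPM model:
\[
\begin{aligned}
&\text{if } S_{t+1} = 1: &
\begin{cases}
x^{i}_{t+1} = \alpha^{i}_{1} + \lambda_{2,1} Cov[x^{i}_{t+1}, x^{W}_{t+1}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{1} + c'_{1} z_t + \lambda_{2,1} Var[x^{W}_{t+1}] + \eta^{W}_{t+1}
\end{cases} \\
&\text{if } S_{t+1} = 2: &
\begin{cases}
x^{i}_{t+1} = \alpha^{i}_{2} + \lambda_{2,2} Cov[x^{i}_{t+1}, x^{W}_{t+1}] + \lambda'_{2,2} Cov[R^{i}_{t+1}, R^{W}_{t+1} \mid R^{W}_{t+1} < R^{W}_{B,t+1}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{2} + c'_{2} z_t + \lambda_{2,2} Var[x^{W}_{t+1}] + \lambda'_{2,2} Var[x^{W}_{t+1} \mid x^{W}_{t+1} < x^{W}_{B,t+1}] + \eta^{W}_{t+1}
\end{cases} \\
&\text{if } S_{t+1} = 3: &
\begin{cases}
x^{i}_{t+1} = \alpha^{i}_{3} + \lambda_{2,3} Cov[x^{i}_{t+1}, x^{W}_{t+1}] + \lambda_{3,3} Cov[R^{i}_{t+1}, (R^{W}_{t+1})^{2}] + \lambda_{4,3} Cov[R^{i}_{t+1}, (R^{W}_{t+1})^{3}] + \eta^{i}_{t+1} \\
x^{W}_{t+1} = \alpha^{W}_{3} + c'_{3} z_t + \lambda_{2,3} Var[x^{W}_{t+1}] + \lambda_{3,3} Skew[x^{W}_{t+1}] + \lambda_{4,3} Kurt[x^{W}_{t+1}] + \eta^{W}_{t+1}
\end{cases} \\
&& z_{t+1} = \mu + B z_t + \eta^{Z}_{t+1}, \qquad \eta_{t+1} \sim N(0, \Omega_{S_{t+1}})
\end{aligned}
\]
with constant transition probabilities and in which $\lambda_{j,S_{t+1}} = -R^{f}_{t} g_{j-1,S_{t+1}}$ for j = 2, 3, 4. The vector $z_t$ collects the predictors (dividend yield, term spread, and default spread). $R^{W}_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^{W}_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1} \mid \Im_t] = 1/R^{f}_{t}$, and that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ ∀t. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ ∀t. Standard errors and (pseudo) t-stats associated with correlations refer to estimates of covariance coefficients.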
Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat
SMB ‐ CAPM regime 0.0650 0.2330 0.279 0.4681 0.2352 1.990 0.3510 0.2612 1.344
HML ‐ CAPM regime ‐0.1494 0.2569 ‐0.582 ‐0.3964 0.2049 1.934 0.1211 0.2322 0.521
MOMO ‐ CAPM regime 0.9010 0.2423 3.719 0.6669 0.2404 2.774 0.7614 0.3978 1.914
W ‐ CAPM regime 1.7053 0.7871 2.166 0.7557 0.5951 1.270 0.9083 0.5061 1.795
SMB ‐ downside CAPM regime 0.8940 0.7794 1.147 0.1972 0.8277 0.238 0.6194 0.5102 1.214
HML ‐ downside CAPM regime 0.0309 0.6098 0.051 0.3130 0.4481 0.698 0.2736 0.7412 0.369
MOMO ‐ downside CAPM regime 1.1223 1.0434 1.076 0.1343 0.7873 0.171 0.4376 2.8043 0.156
W ‐ downside CAPM regime ‐1.9437 2.2832 ‐0.851 0.4217 2.1489 0.196 ‐0.0083 1.5558 ‐0.005
SMB ‐ 4‐mom. CAPM regime 5.3599 8.0236 0.668 0.1066 8.5619 0.012 1.1572 11.9759 0.097
HML ‐ 4‐mom. CAPM regime ‐3.3910 9.1532 ‐0.370 ‐0.3837 8.3565 ‐0.046 ‐2.1277 25.8484 ‐0.082
MOMO ‐ 4‐mom. CAPM regime 0.6324 6.1167 0.103 3.3813 5.7244 0.591 2.8169 4.9767 0.566
W ‐ 4‐mom. CAPM regime ‐1.6057 4.7328 ‐0.339 ‐0.0077 4.2635 ‐0.002 ‐0.3425 15.0226 ‐0.023
g0 ‐ CAPM Regime 0.1395 0.1350 1.033 0.8193 0.1037 7.898 0.9364 0.1738 5.388
g1 (Covariance risk prem.) ‐ CAPM Reg. ‐0.0770 0.0538 ‐1.431 ‐0.0452 0.0549 ‐0.823 ‐0.0060 0.0049 ‐1.230
g0 ‐ downside CAPM Regime 0.5326 0.2649 2.011 0.0359 0.2404 0.149 0.8992 0.4390 2.048
g1 (Covariance risk prem.) ‐ CAPM Reg. ‐0.0085 0.0063 ‐1.365 ‐0.0091 0.0057 ‐1.586 ‐0.0013 0.0028 ‐0.459
g'1 (Downside Cov. risk) ‐ downs. CAPM Reg. ‐0.0327 0.0241 ‐1.354 ‐0.0310 0.0199 ‐1.562 ‐0.0022 0.0012 ‐1.877
g0 ‐ 4‐mom. CAPM Reg. 1.3369 2.8294 0.473 7.0556 2.3529 2.999 2.2445 1.0509 2.136
g1 (Covariance risk prem.) ‐ 4‐mom. CAPM 0.0689 0.0315 2.189 ‐0.0585 0.0273 ‐2.143 ‐0.0119 0.0036 ‐3.261
g2 (Coskew risk prem.) ‐ 4‐mom. CAPM Reg. 0.0258 0.0135 1.905 0.0080 0.0105 0.755 0.0075 0.0015 5.028
g3 (Cokurt risk premium) ‐ 4‐mom. CAPM ‐0.0014 0.0006 ‐2.418 ‐0.0011 0.0005 ‐2.061 ‐0.0065 0.0013 ‐5.024
c1 (Pred. coeff. on DY) ‐ CAPM Regime 0.5744 0.3614 1.589 0.5854 0.2954 1.981 0.7654 0.3579 2.138
c2 (Pred. coeff. on TERM) ‐ CAPM Regime 1.2347 0.4882 2.529 0.8188 0.4738 1.728 0.7754 0.4872 1.592
c3 (Pred. coeff. on DEF) ‐ CAPM Regime ‐4.5438 0.6278 ‐7.238 ‐3.5446 0.4648 ‐7.626 ‐3.8692 1.1840 ‐3.268
c1 (Pred. coeff. on DY) ‐ down CAPM Reg. 0.1070 0.1131 0.946 0.0988 0.1238 0.798 0.0724 0.2110 0.343
c2 (Pred. coeff. on TERM) ‐ down CAPM ‐0.1063 0.1453 ‐0.731 ‐0.0871 0.1234 ‐0.706 ‐0.0788 0.1921 ‐0.410
c3 (Pred. coeff. on DEF) ‐ down CAPM Reg. 0.6745 0.3550 1.900 0.8172 0.4166 1.962 0.6198 0.4688 1.322
c1 (Pred. coeff. on DY) ‐ 4‐mom. CAPM 0.9111 0.8407 1.084 0.7255 0.9071 0.800 0.9073 1.0534 0.861
c2 (Pred. coeff. on TERM) ‐ 4‐mom. CAPM ‐2.4127 0.9392 ‐2.569 ‐1.4961 0.8765 ‐1.707 ‐1.4353 1.4446 ‐0.994
c3 (Pred. coeff. on DEF) ‐ 4‐mom. CAPM 9.1773 1.6555 5.544 8.8686 1.7603 5.038 5.1898 2.6266 1.976
Column groups, left to right: Unconstrained Estimates; Constrained Estimates (Set 1); Constrained Estimates (Set 2).
Table 5 (continued)
Estimates of Three-State Markov Switching Mixture CAPM
Column groups, left to right: Unconstrained Estimates; Constrained Estimates (Set 1); Constrained Estimates (Set 2).
Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat
SMB volatility ‐ CAPM Regime 1.6569 0.0013 1709.3 1.7033 0.0057 510.31 1.5788 0.0019 1298.9
SMB‐HML correlation ‐ CAPM Reg. ‐0.1090 0.0530 ‐4.131 ‐0.0801 0.0944 ‐2.073 ‐0.0967 0.0961 ‐2.047
HML volatility ‐ CAPM Regime 1.4545 0.0650 27.11 1.4344 0.0947 21.73 1.2879 0.1081 15.35
SMB‐MOMO correlation ‐ CAPM Reg. 0.0359 0.0444 1.759 ‐0.0188 0.0833 ‐0.608 0.0680 0.0843 1.964
HML‐MOMO correlation ‐ CAPM Reg. ‐0.0989 0.0771 ‐2.444 ‐0.1352 0.1137 ‐2.698 ‐0.0811 0.0845 ‐1.905
MOMO volatility ‐ CAPM Regime 1.5726 0.0745 27.65 1.5816 0.2646 9.455 1.5420 0.0716 33.22
SMB‐Mkt correlation ‐ CAPM Regime 0.0522 0.0452 4.372 0.1114 0.1141 4.655 0.0728 0.0846 3.637
HML‐Mkt correlation ‐ CAPM Regime ‐0.0874 0.1270 ‐2.287 ‐0.0959 0.4202 ‐0.916 ‐0.0755 0.1586 ‐1.642
MOMO‐Mkt correlation ‐ CAPM Reg. 0.2457 0.1189 7.417 0.2601 0.2434 4.732 0.2234 0.1758 5.250
Mkt volatility ‐ CAPM Regime 2.7398 0.1204 51.96 2.8006 0.1975 39.72 2.6789 0.2159 33.24
SMB volatility ‐ down. CAPM Reg. 3.8752 0.0759 158.32 4.8936 0.2360 101.45 4.4653 0.1149 121.42
SMB‐HML correlation ‐ down. CAPM 0.0721 0.1305 6.824 0.0402 0.2783 2.800 0.0591 0.2023 3.668
HML volatility ‐ down. CAPM Reg. 3.9834 0.2267 56.00 3.9633 0.4890 32.12 4.0197 0.2509 45.08
SMB‐MOMO correlation ‐ down. CAPM ‐0.1882 0.1405 ‐20.86 ‐0.4826 0.2756 ‐60.45 ‐0.1440 0.2112 ‐16.65
HML‐MOMO correlation ‐ down. CAPM ‐0.0976 0.2981 ‐5.244 ‐0.2272 0.6762 ‐9.395 ‐0.0469 0.5465 ‐1.888
MOMO volatility ‐ down. CAPM Reg. 5.0246 0.2760 73.19 7.0542 0.4646 107.11 7.8123 0.3510 86.94
SMB‐Mkt correlation ‐ down. CAPM 0.4117 0.1649 48.88 0.6000 0.3764 76.06 0.2021 0.1708 40.24
HML‐Mkt correlation ‐ down. CAPM ‐0.0495 0.3472 ‐2.871 0.0174 0.6326 1.062 ‐0.0396 0.4240 ‐2.856
MOMO‐Mkt correlation ‐ down. CAPM ‐0.1183 0.3540 ‐8.484 ‐0.6022 0.7880 ‐52.57 ‐0.0770 0.5071 ‐9.028
Mkt volatility ‐ down. CAPM Reg. 6.3165 0.3377 94.53 9.7518 0.5446 174.60 10.8749 0.5041 117.30
SMB volatility ‐ 4‐mom. CAPM 8.5112 0.2321 218.5 18.6513 0.5653 615.39 3.3854 0.3220 35.59
SMB‐HML correlation ‐ 4‐mom. CAPM 0.1921 1.4870 7.365 0.0197 3.5654 4.477 0.0612 2.6795 1.097
HML volatility ‐ 4‐mom. CAPM 9.5690 1.5889 40.34 43.5582 2.4036 789.37 14.1903 1.9475 51.70
SMB‐MOMO corr. ‐ 4‐mom. CAPM ‐0.3591 1.1009 ‐26.87 ‐0.0567 3.7844 ‐12.46 ‐0.0683 1.6099 ‐1.494
HML‐MOMO correlation ‐ 4‐mom. CAPM ‐0.6313 2.2883 ‐25.54 ‐0.0915 4.5872 ‐38.77 ‐0.0767 4.2157 ‐2.682
MOMO volatility ‐ 4‐mom. CAPM 13.8237 2.5161 53.16 44.6068 3.7032 512.73 10.3948 4.3923 24.60
SMB‐Mkt correlation ‐ 4‐mom. CAPM 0.3196 1.0799 23.87 0.1835 2.3552 40.22 0.1664 1.5363 2.809
HML‐Mkt correlation ‐ 4‐mom. CAPM 0.5821 1.8216 28.98 0.5585 3.0887 218.10 0.2398 2.5376 10.27
MOMO‐Mkt correlation ‐ 4‐mom. CAPM ‐0.6842 1.9366 ‐46.29 ‐0.5578 3.3145 ‐207.77 ‐0.1142 3.0021 ‐3.030
Mkt volatility ‐ 4‐mom. CAPM 13.5395 1.0733 119.6 27.6779 2.7683 276.73 7.6612 1.5991 36.70
Prob(CAPM at t | CAPM at t‐1) 0.6391 0.2733 2.338 0.8463 0.1495 5.661 0.9265 0.1886 4.913
Prob(dCAPM at t | CAPM at t‐1) 0.1473 0.0784 1.879 0.1520 0.0733 2.074 0.0727 0.0359 2.023
Prob(CAPM at t | dCAPM at t‐1) 0.2794 0.0836 3.342 0.1339 0.0836 1.601 0.0883 0.1147 0.770
Prob(dCAPM at t | dCAPM at t‐1) 0.8527 0.0084 101.512 0.8582 0.1495 5.741 0.8637 0.2278 3.791
Prob(CAPM at t | 4MOM at t‐1) 0.5029 0.1404 3.582 0.0006 0.0059 0.102 0.0008 0.0062 0.125
Prob(4MOM at t | 4MOM at t‐1) 0.4917 0.0973 5.053 0.8980 0.3049 2.945 0.8894 0.3828 2.324
Maximum log‐likelihood function: ‐9482.62 ‐9823.95 ‐10005.88
Number of parameters: 138 138 138
Akaike information criterion: 19.7346 20.4348 20.8080
Hannan‐Quinn inf. criterion: 19.9976 20.6977 21.0709
Bayes‐Schwarz inf. criterion: 21.3998 22.1000 22.4731
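Table 5 reports two transition probabilities per row of the three-state transition matrix; the third follows from the requirement that each row sums to one. The sketch below is an illustration under stated assumptions, not the paper's procedure: it fills in the missing entries for the Set 1 constrained estimates and approximates the long-run regime probabilities by repeatedly applying the transition matrix. The helper names `complete_row` and `stationary_distribution` are hypothetical.

```python
def complete_row(row):
    """Fill the single missing entry (None) so that the row sums to one."""
    known = sum(p for p in row if p is not None)
    return [1.0 - known if p is None else p for p in row]


def stationary_distribution(P, n_iter=5000):
    """Approximate the ergodic distribution by iterating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# State order: CAPM, downside CAPM, four-moment CAPM.
# Entries are the Set 1 constrained estimates from Table 5;
# None marks the probability that the table does not report.
P = [
    complete_row([0.8463, 0.1520, None]),
    complete_row([0.1339, 0.8582, None]),
    complete_row([0.0006, None, 0.8980]),
]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])
```

Power iteration is used here only because it needs no linear-algebra library; solving the linear system for the stationary vector directly would serve equally well.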
Table 6
Out-of-Sample Forecasting Performance on the Benchmark Portfolios
The table reports the 1- and 12-month horizon predictive performance of three competing models: the single-state smooth model in which the risk prices are driven by macroeconomic factors, the two-state four-moment Markov switching model, and the three-state Markov switching mixture CAPM.
Columns: Portfolio; Average Pricing Error; Std. Dev. of Pricing Error; Root Mean Squared Error; Mean Absolute Error; Predictive R2 (%)

Forecast Horizon: 1 month

Unconstrained Models
Single‐state SSMBCP Model
SMB ‐0.00566 2.9405 2.9405 2.1936 1.0755
HML ‐0.00275 2.8754 2.8754 2.1313 1.2859
MOMO 0.00621 3.8501 3.8501 2.6917 0.1313
Market ‐0.00538 4.5019 4.5019 3.5598 0.3446
Two‐state Four‐Moment MS CAPM
SMB 0.00883 2.6151 2.6151 2.0990 1.3874
HML 0.01687 3.1577 3.1577 2.3451 1.5036
MOMO ‐0.06698 3.4442 3.4449 2.5866 0.8481
Market 0.04378 4.4614 4.4617 3.9421 0.1833
Two‐state Mixture CAPM
SMB 0.03777 2.4974 2.4977 1.6563 3.1642
HML ‐0.07177 2.6399 2.6408 1.6929 4.3791
MOMO ‐0.02077 3.4984 3.4984 2.1035 4.0546
Market 0.14020 4.0747 4.0771 2.7775 3.2755

Constraint Set 2
Single‐state SSMBCP Model
SMB ‐0.33778 2.9686 2.9878 2.1774 0.9275
HML 0.17153 2.9318 2.9368 2.0711 1.1611
MOMO 0.13743 3.9912 3.9936 2.7219 0.1175
Market ‐0.26621 4.9058 4.9130 3.7294 0.2941
Two‐state Four‐Moment MS CAPM
SMB ‐0.85940 2.1334 2.3000 1.8701 1.5024
HML ‐1.49462 2.6564 3.0480 2.5435 1.5470
MOMO 0.47967 2.6992 2.7414 2.1196 0.9239
Market ‐2.81160 4.5737 5.3688 3.8993 0.1872
Two‐state Mixture CAPM
SMB 0.73636 1.9345 2.0699 2.0300 1.6074
HML 1.12975 3.0826 3.2831 2.5721 2.4086
MOMO ‐0.37987 2.3443 2.3749 2.2393 4.4602
Market 1.17152 3.6709 3.8534 3.1547 0.9094

Forecast Horizon: 12 months

Unconstrained Models
Single‐state SSMBCP Model
SMB 0.02486 3.5172 3.5173 2.9113 0.8982
HML 0.02387 3.4362 3.4363 2.8238 0.8709
MOMO 0.02512 4.6317 4.6318 3.5895 0.0293
Market 0.04569 5.3320 5.3322 4.6575 0.9106
Two‐state Four‐Moment MS CAPM
SMB 0.03309 3.3208 3.3210 2.7019 1.3914
HML 0.01526 3.0309 3.0310 2.4449 1.4603
MOMO ‐0.06990 4.8916 4.8921 3.1575 0.9636
Market 0.04021 4.6861 4.6862 4.4024 2.3639
Two‐state Mixture CAPM
SMB 0.10494 2.8029 2.8048 2.0611 4.6791
HML ‐0.17681 2.9716 2.9768 2.1375 4.2279
MOMO ‐0.00237 3.8990 3.8990 2.6108 0.5962
Market 0.24436 4.5264 4.5330 3.4655 2.0401

Constraint Set 2
Single‐state SSMBCP Model
SMB ‐0.29965 3.5292 3.5419 2.9914 0.7930
HML 0.23363 3.5096 3.5173 2.8972 0.7524
MOMO 0.19072 4.7811 4.7849 3.7876 0.0260
Market ‐0.18283 5.7397 5.7426 5.0554 0.7913
Two‐state Four‐Moment MS CAPM
SMB ‐0.58413 3.1485 3.2023 2.8302 1.2170
HML ‐1.02235 3.8358 3.9698 2.8893 1.2782
MOMO 0.34895 4.0579 4.0729 3.1820 0.8509
Market ‐1.95166 3.8184 4.2883 5.5399 1.9725
Two‐state Mixture CAPM
SMB 0.72266 2.8895 2.9785 2.6069 4.9447
HML 1.13727 3.4490 3.6317 2.5441 7.2385
MOMO ‐0.39749 3.6983 3.7196 2.9050 3.0578
Market 1.27271 4.4686 4.6463 4.4074 3.6034
Figure 1
60-Month Rolling Window Estimates of OLS (Unconditional CAPM) Alphas, Betas, and of Co-Skewness, Co-Kurtosis, and Downside Beta
[Figure: five panels (OLS Alphas; OLS Betas; Downside Beta; Co-Skewness with Market Ptf. Returns; Co-Kurtosis with Market Ptf. Returns), each plotting the SMB, HML, and MOMO series over 1927-2007. Vertical axis label on every panel: "% monthly abnormal return".]
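The rolling-window OLS alphas and betas plotted in Figure 1 can be obtained by re-estimating a market-model regression on each block of 60 consecutive months. The following is a minimal sketch under stated assumptions rather than the author's code: the function name `rolling_alpha_beta` and the synthetic inputs are hypothetical, and only the alpha and beta panels are covered, since the definitions of downside beta, co-skewness, and co-kurtosis are not given in this section.

```python
def rolling_alpha_beta(asset, market, window=60):
    """OLS alpha and beta of asset on market over rolling windows.

    Both inputs are equal-length lists of monthly excess returns.
    Returns one (alpha, beta) pair per window, oldest window first.
    """
    estimates = []
    for start in range(len(asset) - window + 1):
        y = asset[start:start + window]
        x = market[start:start + window]
        mean_x = sum(x) / window
        mean_y = sum(y) / window
        s_xx = sum((xi - mean_x) ** 2 for xi in x)
        s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        beta = s_xy / s_xx                  # slope of the market model
        alpha = mean_y - beta * mean_x      # intercept (abnormal return)
        estimates.append((alpha, beta))
    return estimates


# Synthetic data: the asset is an exact linear function of the market,
# so every window should recover alpha = 0.5 and beta = 2.0.
market = [((i * 7) % 11) - 5.0 for i in range(80)]
asset = [0.5 + 2.0 * m for m in market]
estimates = rolling_alpha_beta(asset, market, window=60)
print(len(estimates), estimates[0])
```

Setting `window=120` gives the longer-window variant; shorter windows track changes faster at the cost of noisier estimates.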
Figure 2
120-Month Rolling Window Estimates of OLS (Unconditional CAPM) Alphas, Betas, and of Co-Skewness, Co-Kurtosis, and Downside Beta
[Figure: five panels (OLS Alphas; OLS Betas; Downside Beta; Co-Skewness with Market Ptf. Returns; Co-Kurtosis with Market Ptf. Returns), each plotting the SMB, HML, and MOMO series over 1927-2007. Vertical axis label on every panel: "% monthly abnormal return".]
Figure 3
Smoothed State Probabilities Estimated from Two-State Markov Switching Four-Moment CAPM Model
[Figure: two panels (State 1 smoothed Probabilities; State 2 smoothed Probabilities), each plotting the unconstrained and constrained smoothed probabilities on a 0 to 1 scale over 1927-2007.]
Figure 4
Smoothed State Probabilities Estimated from Three-State Markov Switching Mixture CAPM Model
[Figure: three panels (CAPM State, Smoothed Probabilities; Downside CAPM State, Smoothed Probabilities; Four-Moment CAPM State, Smoothed Probabilities), each plotting the unconstrained and constrained regime probabilities on a 0 to 1 scale over 1927-2007.]
Figure 5
Smoothed State Probabilities Estimated from Three-State Markov Switching Mixture CAPM Model: 1985-2008 Sub-Sample
[Figure: three panels (CAPM State, Smoothed Probabilities; Downside CAPM State, Smoothed Probabilities; Four-Moment CAPM State, Smoothed Probabilities), each plotting the unconstrained and constrained regime probabilities on a 0 to 1 scale over 1985-2007.]