
Regime Shifts in Empirical Pricing Kernels: A Mixture CAPM∗

Massimo Guidolin†

Manchester Business School and Federal Reserve Bank of St. Louis

August 2009

JEL code: G12, C32.

PRELIMINARY AND INCOMPLETE, PLEASE DO NOT QUOTE WITHOUT PERMISSION

Abstract

We develop a family of Markov switching empirical pricing kernels that mix and nest the standard conditional CAPM, the downside (semi-variance) conditional CAPM, and richer, four-moment CAPMs in which coskewness and cokurtosis risks are priced in addition to plain vanilla covariance risk. We estimate both unconstrained and restricted versions of these kernels, where the restrictions are suggested by primitive principles concerning preferences and the resulting properties of the intertemporal marginal rate of substitution, such as non-satiation, global risk aversion, and decreasing absolute risk aversion over the wealth domain. When a restricted version of the Markov switching pricing kernel that implies a mixture of the three classical forms of CAPMs is considered, we obtain a pricing performance that is largely superior to what is provided by a more standard, single-state pricing kernel similar to many popular implementations of the conditional CAPM. As a result, the alphas characterizing the returns on size- (SMB), value- (HML), and momentum-sorted portfolios are either small or lack statistical significance, an indication that a mixture CAPM-based empirical pricing kernel can rationalize the pricing of the cross section of US stocks in its typical dimensions.

Key words: Capital Asset Pricing Model, Mixture CAPM, Markov Switching, Empirical Pricing Kernel, Asset Pricing Anomaly.

1. Introduction

Is there a fundamental pricing measure (henceforth called the pricing kernel) that can adequately price the cross-section of US stock returns and at the same time preserve some degree of consistency with elementary economic postulates, such as that free lunches (arbitrage opportunities) are unlikely to be exploitable in low-frequency (monthly) data and that investors are risk-averse and feature decreasing absolute risk aversion as their wealth increases? By and large, the answer provided so far by most of the financial economics literature has been negative. On the one hand, when researchers have postulated a stochastic process for the pricing kernel that is consistent with micro-founded, rational expectations models, they have normally found that the resulting kernel can hardly price the cross-section of US stock returns. For instance, it is well known that simple pricing kernels derived from the process that dynamic general equilibrium models imply for the intertemporal marginal rate of substitution in consumption generally fail to generate market (excess) returns with realistic properties, starting from their mean (the equity premium puzzle) and volatility (the excess volatility puzzle) and extending to their predictability over alternative horizons (mean reversion puzzles). On the other hand, when researchers have adopted an empirical approach to the problem, the resulting estimated pricing kernels have accurately priced many dimensions of the US cross-section but, unless rather complicated and often unnatural structures are assumed for the kernel (e.g., that the kernel should depend in specific ways on aggregate labor income), they have also displayed properties that either open the door to arbitrage opportunities or imply that markets are populated by risk-loving individuals (e.g., see Dittmar, 2002).

∗This project was supported by INQUIRE UK funding. I would like to thank Kevin Aretz, Mario Cerrato, John Cotter, Carlo Favero, Marcelo Fernandez, Ana Galvao, Mike Johannes, Frank de Jong (a discussant), Robert Kollmann, Alex Kostakis, Fulvio Ortu, Peter Pope, Mark Shackleton, Raf Wouters, and seminar participants at Bocconi University, ECARES Brussels, Lancaster School of Management, University of Manchester (Economics), University of Glasgow, University College Dublin, Queen Mary University of London, Tilburg University (Finance), Tinbergen Institute Erasmus University Rotterdam, and the 2008 Citigroup Quant conference in Athens for helpful comments and suggestions. All errors remain my own.

†Correspondence to: Prof. Massimo Guidolin, Manchester Business School, Accounting & Finance Division. Phone: +44-(0)161-306-6406; Fax: +44-(0)161-275-4023. Address: University of Manchester Business School, MBS Crawford House, Booth Street East, Manchester M13 9PL, United Kingdom. E-mail: [email protected].

In this paper, we propose an alternative, empirical route to the problem of finding rational pricing kernels that can price the cross-section of US stock returns in a number of dimensions used in the earlier literature, such as the size of listed firms, their book-to-market multiples, or measures of past stock performance (momentum). We show that if the parameters that characterize simple versions of the pricing kernel are governed by a low-dimensional Markov chain whose regimes can be identified with different versions of the basic, textbook Capital Asset Pricing Model (CAPM), the resulting pricing and forecasting performance is remarkable and the estimated pricing kernel may be restricted to be economically admissible, i.e., consistent with fundamental economic postulates. The functional form of the pricing kernel is “simple” in the sense that we only postulate specifications of the kernel that reduce to special cases of the classical CAPM, such as the plain vanilla CAPM (i.e., in which beta merely depends on conditional covariance risk), the downside CAPM (henceforth dCAPM, in which downside covariance risk is also priced), and a four-moment CAPM (henceforth 4MOM, in which coskewness and cokurtosis risks are also priced). The pricing and forecasting performance we demonstrate in this paper is “remarkable” in the sense that it is competitive both with current empirical benchmarks in the literature (the Fama and French (1993) three-factor model augmented by a momentum factor as in Carhart (1997), i.e., a four-factor model) and with unrestricted, Markov switching CAPMs in which higher-order (co-)moments are priced in a no-arbitrage framework. Because the empirical pricing kernels (henceforth, EPKs) we estimate are based on the intuition that over time the markets may switch among different versions of some form of CAPM, as determined by a homogeneous and irreducible first-order Markov chain, we define the class of pricing kernels described in our paper as mixture CAPMs.
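To fix ideas, the sketch below shows one hypothetical way to code a kernel that switches among the three CAPM-type forms. It is not the specification estimated in the paper: the functional forms (a linear kernel in the market return for the CAPM state, the same kernel plus a term active only below a threshold for the dCAPM state, and a cubic polynomial for the 4MOM state), the zero threshold, and every coefficient value are illustrative assumptions. Given a vector of regime probabilities, the function returns the probability-weighted average of the state-specific kernels, i.e., the kernel an observer would expect when the current regime is not known with certainty.

```python
from typing import Sequence

# Hypothetical state-specific kernels, written as functions of the market
# (net) return rm. Coefficient values are illustrative, NOT paper estimates.


def capm_kernel(rm: float, a: float = 1.0, b: float = -2.0) -> float:
    """Linear kernel: only covariance risk with the market is priced."""
    return a + b * rm


def dcapm_kernel(rm: float, a: float = 1.0, b: float = -1.5,
                 b_down: float = -1.0, threshold: float = 0.0) -> float:
    """Linear kernel plus a term that is non-zero only when rm < threshold."""
    return a + b * rm + b_down * min(rm - threshold, 0.0)


def four_moment_kernel(rm: float, a: float = 1.0, b: float = -2.0,
                       c: float = 3.0, d: float = -10.0) -> float:
    """Cubic kernel: covariance, coskewness and cokurtosis risks are priced."""
    return a + b * rm + c * rm ** 2 + d * rm ** 3


# Order of regimes: CAPM, dCAPM, 4MOM (hypothetical labelling).
KERNELS = (capm_kernel, dcapm_kernel, four_moment_kernel)


def mixture_kernel(rm: float, state_probs: Sequence[float]) -> float:
    """Probability-weighted kernel given (e.g., filtered) regime probabilities."""
    if len(state_probs) != len(KERNELS):
        raise ValueError("need one probability per regime")
    if min(state_probs) < 0.0 or abs(sum(state_probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to one")
    return sum(p * k(rm) for p, k in zip(state_probs, KERNELS))


# A 10% market drop with regime probabilities 0.5 / 0.3 / 0.2.
print(round(mixture_kernel(-0.10, (0.5, 0.3, 0.2)), 6))  # prints 1.223
```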

The empirical work undertaken in this paper features three key ingredients: the mixture of alternative CAPM-type models; the Markov switching (henceforth, MS) component that drives the mixture; and the imposition of economic constraints on the coefficients and shape of the pricing kernel. We use this Introduction to explain the motivation for each of these ingredients and to connect our efforts to the existing literature. Our motivation for developing and estimating a rational asset pricing framework that mixes different “strands” of CAPMs is that a growing literature has shown that rather intuitive modifications of the classical, plain vanilla unconditional CAPM may lead to accurate (albeit, it must be recognized, unstable) pricing performance. The dCAPM may either include a specific price of risk for the covariance of portfolio returns with the market, conditional on the latter falling below some threshold (most typically, its mean), or feature asymmetric reactions to downside and upside markets separately.¹ It was developed by Bawa and Lindenberg (1977) and Harlow and Rao (1989). Chan (1988), De Bondt and Thaler (1986), and Petkova and Zhang (2005) have used it to investigate the value premium but have obtained mixed results. Recently, Post and Vliet (2006) have used this model, together with the non-parametric stochastic dominance tests proposed by Post (2003), to show that downside risk helps to explain the high returns earned by small caps, value stocks, and recent winner stocks, i.e., the same benchmark stock portfolios describing the US cross-section that we use in this paper. Post and Vliet (2005) also use parametric tests to find indications that conditional downside risk drives asset prices: their mean-semivariance CAPM outperforms the traditional mean-variance CAPM in both unconditional and conditional tests. Low (high) beta stocks involve more (less) systematic downside risk than expected based on their regular betas. This pattern is especially pronounced during bad states of the world, when the market risk premium is high. They conclude that conditional downside risk completely explains average returns within the size deciles, that it is not related to distress risk, and that it can only partially explain the momentum effect. Similarly, Ang, Chen, and Xing (2006) claim that downside beta is a priced risk attribute because stocks that have high covariation with the market when the market declines exhibit high average returns over the same period.
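The distinction between a regular beta and a downside beta can be illustrated in a few lines. The sketch below computes a covariance beta, a downside beta in the spirit of Ang, Chen, and Xing (2006) (a beta estimated only over periods in which the market return falls below its sample mean), and the below-mean semivariance advocated by Markowitz (1959). The function names, the use of the sample mean as threshold, and the toy return series are assumptions made for illustration.

```python
from statistics import fmean


def sample_cov(x, y):
    """Unbiased sample covariance of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def beta(ri, rm):
    """Regular (covariance) beta of asset returns ri on market returns rm."""
    return sample_cov(ri, rm) / sample_cov(rm, rm)


def downside_beta(ri, rm):
    """Beta computed only over periods in which rm is below its sample mean."""
    mu_m = fmean(rm)
    down = [(a, b) for a, b in zip(ri, rm) if b < mu_m]
    if len(down) < 2:
        raise ValueError("too few down-market observations")
    ri_down, rm_down = zip(*down)
    return beta(ri_down, rm_down)


def semivariance(r):
    """Below-mean semivariance: mean squared shortfall below the sample mean."""
    mu = fmean(r)
    return sum(min(x - mu, 0.0) ** 2 for x in r) / len(r)


# Hypothetical toy data: the asset moves 0.5-for-1 with the market in up
# months and 1.5-for-1 in down months.
rm = [0.04, -0.03, 0.02, -0.05, 0.01, -0.02, 0.03, -0.04]
ri = [0.5 * x if x >= 0 else 1.5 * x for x in rm]
print(round(beta(ri, rm), 3), round(downside_beta(ri, rm), 3))
```

In this toy series the regular beta lies between the up-market and down-market sensitivities, whereas the downside beta recovers the 1.5 down-market slope exactly.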

The 4MOM CAPM introduced by Kraus and Litzenberger (1976) and Harvey and Siddique (2000) prices higher-order moments. Higher (conditional) moments capture the presence of asymmetries and/or fat tails in the distribution of asset returns, but they are not necessarily the same as upside and downside betas.² There are several different approaches to including higher moments in CAPM-style frameworks, for example Friend and Westerfield (1980), Sears and Wei (1985), and Barone-Adesi (1985). Dittmar (2002) finds that both a quadratic and a cubic pricing kernel are admissible for the cross-section of industry portfolios, whereas the linear single-factor (CAPM) and linear multifactor (Fama-French) pricing kernels are not. Guidolin and Timmermann (2008a) use a 4MOM international CAPM to model the time series dynamics of several international equity indices and report that its adoption drastically changes the optimal asset allocation implications of the classical, mean-variance based CAPM. Although the superior performance of nonlinear pricing kernels over linear pricing kernels has been documented in the literature (e.g., see Bansal and Viswanathan, 1993, Chapman, 1997), the superiority of these kernels to a flexible multifactor model, such as the Fama-French (1993) model, has not. This result is particularly interesting because the nonlinear pricing kernel that we investigate is subject to economic restrictions that cannot affect a standard, Fama-French type multifactor pricing kernel, as many of its factors fail to depend on the wealth process.

¹ As early as Roy (1952), economists have recognized that investors care differently about downside losses than they care about upside gains. Markowitz (1959) advocates using semivariance as a measure of risk, rather than variance, because semivariance measures downside losses rather than upside gains. More recently, the behavioral framework of Kahneman and Tversky’s (1979) loss aversion preferences and the axiomatic approach taken by Gul’s (1991) disappointment aversion preferences allow agents to place greater weights on losses relative to gains in their utility functions.

² It is important to recognize that downside covariance risk is different from coskewness risk because downside betas explicitly condition on market downside movements in a nonlinear fashion, whereas the coskewness statistic does not explicitly emphasize asymmetries across down- and up-markets, even in settings where coskewness may vary over time. Ang, Chen, and Xing (2006) control for coskewness risk in assessing the premium for downside beta. Downside beta risk and coskewness risk are found to be empirically different: the high returns to high downside beta stocks are robust to controlling for coskewness risk and vice versa, and downside beta risk is strongest for stocks with low coskewness. As a result, in this paper we also experiment with pricing kernels that admit a differentiation between downside covariance risk and higher comoment risk (i.e., coskewness and cokurtosis risk).
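To make the comoments priced by a 4MOM CAPM concrete, the sketch below computes the standardized coskewness and cokurtosis of an asset with the market, using population (divide-by-n) moments. This normalization is one common choice and is assumed here; it is not necessarily the one adopted in the paper, and the function names are hypothetical. When the asset is the market itself, the two measures reduce to the market's skewness and kurtosis, which provides a simple sanity check.

```python
import math
from statistics import fmean


def _demean(x):
    """Deviations from the sample mean."""
    m = fmean(x)
    return [v - m for v in x]


def coskewness(ri, rm):
    """Standardized coskewness E[e_i * e_m^2] / (sd_i * sd_m^2)."""
    ei, em = _demean(ri), _demean(rm)
    sd_i = math.sqrt(fmean([v * v for v in ei]))
    var_m = fmean([v * v for v in em])
    return fmean([a * b * b for a, b in zip(ei, em)]) / (sd_i * var_m)


def cokurtosis(ri, rm):
    """Standardized cokurtosis E[e_i * e_m^3] / (sd_i * sd_m^3)."""
    ei, em = _demean(ri), _demean(rm)
    sd_i = math.sqrt(fmean([v * v for v in ei]))
    sd_m = math.sqrt(fmean([v * v for v in em]))
    return fmean([a * b ** 3 for a, b in zip(ei, em)]) / (sd_i * sd_m ** 3)


# Hypothetical example: a symmetric market and an asset whose payoff is
# convex in the market return (it gains in both large up and down moves).
rm_sym = [-0.02, -0.01, 0.0, 0.01, 0.02]
ri_convex = [x * x for x in rm_sym]
print(coskewness(rm_sym, rm_sym), cokurtosis(rm_sym, rm_sym))
print(coskewness(ri_convex, rm_sym))
```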

Our motivation for modelling the time-variation in the EPK as driven by an unobservable Markov chain is that the recent empirical asset pricing literature has given growing attention to the fact that the performance (or lack thereof) of simple, CAPM-style models appears to be unstable over time. Equivalently, while it has long been recognized that the CAPM and its simple variants easily generate large and statistically significant “alphas” (i.e., abnormal returns caused by large components of realized excess returns that are not explained by the risk factors featured by the selected framework), there is now an increasing awareness that such alphas may be strongly time-varying and difficult to forecast. For instance, Post and Vliet (2005) find that the role of downside risk is most pronounced in their first subsample (1931-1966) and in their bad-state subsample (defined by periods of above-average dividend yield). In Post and Vliet’s more recent subsample (1967-2002), the unconditional mean-variance and unconditional mean-semivariance (i.e., dCAPM) models instead show similar performance. Ang and Chen (2007) have stressed that standard CAPM betas are strongly time-varying, to the point that OLS inferences on alphas and betas might be so badly misspecified as to become unusable for assessing the fit of a conditional CAPM.³ While some of the recent literature has predominantly focused on identifying the sub-periods over which the performance of simple, conditional (or even unconditional) CAPMs may be considered “acceptable”, and has often offered some economic intuition for why this may be the case, in this paper we elect to base our research design on the longest possible sample period for estimation and forecasting purposes. An MS framework is employed to capture the unstable time series dynamics of the priced risk factors or, better, to endogenously characterize such dynamics as deriving from the presence of regime switching patterns in the cross-section of US equity returns. As a result, while other papers have argued that the period 1927-1962 contains a few unique events (e.g., the Great Depression and WWII) that could adversely affect econometric estimation during the pre-1963 period because these may identify structural breaks, in our empirical strategy it becomes important to use these events effectively to identify turning points and regime switches in the way the cross-section is priced.
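The filtering step behind MS estimation can be sketched with the Hamilton (1989) filter. The version below is a deliberately simplified stand-in for the paper's multivariate, multi-state models: a single return series that is Gaussian with a regime-specific mean and volatility, and a fixed transition matrix. All parameter values in the example are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density of x with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hamilton_filter(returns, mus, sigmas, P, init_probs):
    """Filtered probabilities Pr(S_t = j | r_1..r_t) and the log-likelihood.

    P[i][j] is the probability of moving from regime i to regime j;
    init_probs are the regime probabilities before the first observation.
    """
    k = len(mus)
    prob = list(init_probs)
    filtered, loglik = [], 0.0
    for r in returns:
        # Prediction step: Pr(S_t = j | data up to t-1).
        pred = [sum(prob[i] * P[i][j] for i in range(k)) for j in range(k)]
        # Update step: weight each regime by the likelihood of r under it.
        joint = [pred[j] * normal_pdf(r, mus[j], sigmas[j]) for j in range(k)]
        density = sum(joint)
        loglik += math.log(density)
        prob = [x / density for x in joint]
        filtered.append(prob)
    return filtered, loglik


# Hypothetical two-regime monthly parameters: a calm bull state (0) and a
# volatile bear state (1). These are NOT estimates from the paper.
mus, sigmas = [0.01, -0.01], [0.03, 0.08]
P = [[0.95, 0.05], [0.10, 0.90]]
init = [2 / 3, 1 / 3]  # ergodic probabilities implied by P
returns = [0.01, 0.02, -0.15, -0.12, 0.01]
filtered, loglik = hamilton_filter(returns, mus, sigmas, P, init)
for r, p in zip(returns, filtered):
    print(f"r={r:+.2f}  Pr(bear)={p[1]:.3f}")
```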

Our reduced-form conditional asset pricing model that unifies the standard CAPM, a downside risk framework, and a 4MOM CAPM falls within the class of conditional CAPM models initially developed by Harvey (1989), Shanken (1990), Ferson and Harvey (1991, 1993, 1999), Cochrane (1996), and Jagannathan and Wang (1996). Most of these studies use instrumental variables to model the time-variation in the betas as a linear function of the instruments. Our risk factors (comparable to time-varying betas) and their unit prices (comparable to conditional market risk premia) are also time-varying, but instead of relying on instrumental variables, we parameterize the unit risk prices themselves as driven by an endogenous latent process. This has the advantage of not relying on exogenous predictor variables to capture the time-variation in the betas and avoids any potential omitted variable bias from misspecifying the set of predictor variables (see Harvey, 2001, and Brandt and Kang, 2004). However, a standard set of predictors which summarize and forecast business cycle conditions (here the dividend yield, the riskless term spread, and the default spread from the US corporate bond market) is collected in a vector of instruments which are allowed to affect the market risk premium as well as the definition of the current Markov state.

³ Ang and Chen (2007) prove that when betas vary over time and are correlated with time-varying market risk premia, OLS alphas and betas provide inconsistent estimates of conditional alphas and conditional betas, respectively. They also show that the magnitude of the inconsistency in the unconditional OLS alpha, relative to the true conditional alpha, cannot be determined without direct estimates of the underlying time-varying conditional beta process. The common practice of employing rolling OLS estimates of betas understates the variance of the true conditional betas. The limiting distribution of the OLS alpha is also distorted from the standard asymptotic distribution that assumes constant betas. Consequently, a large unconditional OLS alpha may not necessarily imply the rejection of a conditional CAPM.
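The point made by Ang and Chen (2007), that a sizable unconditional OLS alpha does not by itself reject a conditional CAPM, can be verified with a population calculation. The sketch below assumes a hypothetical two-state economy in which the conditional alpha is zero in every state, but the asset's beta is high exactly when the expected market excess return is high. Because beta and the market risk premium co-vary, the OLS regression of the asset return on the market return produces a positive intercept even though no state carries any abnormal return.

```python
def population_ols(states):
    """Population OLS alpha and beta of r_i on r_m in a regime-mixture economy.

    states: iterable of (prob, beta, mu_m, sigma_m). Within each state the
    market excess return has mean mu_m and volatility sigma_m, and
    r_i = beta * r_m + u with E[u] = 0 and u independent of r_m, so the
    conditional CAPM holds exactly (zero conditional alpha) in every state.
    """
    states = list(states)
    # Unconditional moments via the law of total expectation.
    e_rm = sum(p * mu for p, _, mu, _ in states)
    e_rm2 = sum(p * (mu ** 2 + s ** 2) for p, _, mu, s in states)
    e_ri = sum(p * b * mu for p, b, mu, _ in states)
    e_rirm = sum(p * b * (mu ** 2 + s ** 2) for p, b, mu, s in states)
    beta_ols = (e_rirm - e_ri * e_rm) / (e_rm2 - e_rm ** 2)
    alpha_ols = e_ri - beta_ols * e_rm
    return alpha_ols, beta_ols


# Hypothetical economy: beta = 1.5 when the expected market excess return is
# 1% per month, beta = 0.5 when it is zero; 4% monthly volatility in both.
alpha_u, beta_u = population_ols([(0.5, 1.5, 0.01, 0.04), (0.5, 0.5, 0.0, 0.04)])
print(f"OLS alpha = {alpha_u:.4%} per month, OLS beta = {beta_u:.3f}")
# prints: OLS alpha = 0.2462% per month, OLS beta = 1.008
```

Setting both state betas equal removes the intercept, which isolates the covariance between beta and the risk premium as the source of the spurious unconditional alpha.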

Finally, two additional (and somewhat related) choices that inform our research design deserve comment. First, we perform the estimation of our competing EPKs (with and without MS and mixture CAPMs) on a small but rather significant set of portfolios that are supposed to capture in a parsimonious way the main features of the US cross-section, i.e., we employ the longest available data series on value-weighted market (excess) returns and on long-short portfolios that capture size, value, and momentum anomalies.⁴ As for size, it is clearly debatable whether one may still speak of an asset pricing anomaly (i.e., the fact that on average small stocks give higher returns than large stocks).⁵ Yet it is the size-sorted long-short portfolio that, in a sense, should allow our MS-driven, mixture approach to express its power: Ang, Chen, and Xing (2006) have recently documented a strongly time-varying size premium that in many sub-samples is incompatible with the plain vanilla CAPM. It seems crucial that EPKs such as ours, which are built to offer significant flexibility, be used to capture the dynamics in the size premium.

The value anomaly consists of the empirical regularity by which value (high book-to-market ratio) stocks give on average higher returns than growth (low book-to-market ratio) stocks. As stressed by Zhang (2005), the existence of a value premium is puzzling not only because modern finance implies that expected returns ought to purely reward systematic risk-taking, but also because it runs contrary to the basic economic wisdom that growth options hinge upon future economic conditions and must therefore be riskier than assets in place, which instead characterize value firms. There is rich evidence in the literature of time variation in the value anomaly. For instance, Ang and Chen (2007) report evidence that a conditional CAPM may explain the value premium over the period 1926-1963. During the 1964-2004 period, value stocks have lower betas than growth stocks, the reverse of what the CAPM needs to explain the value premium. As a result, the CAPM fails the tests for 1963 to 2004, whether or not one allows for time-varying betas. During 1926 to 1963, however, value stocks have higher betas than growth stocks.

The momentum anomaly consists of the regularity by which past “winner” (high performance) stocks systematically outperform past “loser” (poorly performing) stocks, i.e., there appears to be a substantial degree of persistence in realized stock returns that is very hard to rationalize using standard asset pricing frameworks. Momentum has proven particularly difficult to explain away: for instance, Griffin, Ji, and Martin (2003) fail to find a strong association between international business cycles and momentum effects. Post and Vliet (2005) conclude that their conditional downside risk model completely explains average returns within the size deciles but that their conditional framework fails to explain the momentum effect, although downside risk and conditioning lead to substantial improvements in pricing accuracy.

⁴ Multifactor models of asset prices have been more successful than single-factor models in pricing the cross-section of equities, i.e., they are by construction consistent with the anomalies. Fama and French (1993) have suggested that the returns to the portfolios SMB and HML represent hedge portfolios in the sense of Merton (1973). In later work (Fama and French, 1995, 1996), they suggest that the size and book-to-market factors may capture some systematic distress factor. Carhart (1997) has extended Fama and French’s three-factor model to include a momentum factor. However, multifactor models provide the researcher with considerable freedom, since the models give little guidance for the choice of factors. In contrast, the pricing kernel in this paper explicitly defines the relevant factor for pricing, the portfolio of aggregate wealth. Further, preference theory imposes restrictions on the signs of the coefficients associated with each term in the pricing kernel.

⁵ Fama and French (2006) have recently confirmed earlier evidence (see Chan and Chen, 1988, Fama and French, 1996) that the size premium in average returns is consistent with CAPM pricing. The market beta for SMB (see Section 4 for a definition) for the period 1926-1963, 0.19, is close to that for the period 1963-2004, 0.21. In their CAPM regression for SMB for 1926-2004, the intercept is 0.10% (t-stat is 0.92). This means that although there exists a precisely estimated size premium for 1926 to 2004, about half of it is absorbed by SMB’s market covariance beta.

The second choice that characterizes our research design is the extensive use of comparative out-of-sample forecast performance measures as a way to validate the quality of our estimated EPKs and to minimize the damaging effects of potential over-fitting and data-snooping. Over-fitting is clearly an issue when large parametric models are employed, which is one of the distinctive features of MS models. Data-snooping may pose critical problems in our research design because the very features of the EPK framework we build and estimate are largely suggested by the existing literature and by the extensive exploration of the data properties that we undertake in Section 4. Predictions are used to assess the quality and usefulness of EPKs in two different types of out-of-sample experiments. Recursive, pseudo out-of-sample (henceforth, OOS) exercises are used to check whether recursively updated estimates of competing EPKs may accurately price, 1 and 12 months ahead, the same benchmark portfolios used in estimation. These exercises have a pseudo OOS nature in the sense that the recursively updated specification of each EPK is the one decided on the basis of the full sample of data, i.e., it suffers by construction from data-snooping issues even though it provides protection against the perils of over-fitting. Genuine OOS experiments are instead performed by pricing portfolios different from the benchmark ones used in estimation (here, industry-sorted portfolios and portfolios double-sorted on size and value). Because none of this information has been used in setting up our econometric models, such prediction experiments are likely to provide protection against both over-fitting and data-snooping issues.⁶
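The two pricing metrics used in this paper, RMSPE and pseudo-R², can be computed as in the sketch below. The RMSPE is the square root of the mean squared gap between realized and predicted returns. For the pseudo-R², the sketch assumes one common out-of-sample definition, one minus the ratio of the model's sum of squared errors to that of a benchmark forecast (by default, the sample mean); the exact definition used in the paper is not given in this section, and the function names are hypothetical.

```python
import math
from statistics import fmean


def rmspe(realized, predicted):
    """Root-mean-squared pricing error between realized and predicted returns."""
    if len(realized) != len(predicted) or not realized:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(fmean([(r - p) ** 2 for r, p in zip(realized, predicted)]))


def pseudo_r2(realized, predicted, benchmark=None):
    """1 - SSE(model) / SSE(benchmark); benchmark defaults to the sample mean.

    Assumed definition for illustration only; it turns negative when the
    model prices worse than the naive benchmark forecast.
    """
    if benchmark is None:
        benchmark = [fmean(realized)] * len(realized)
    sse_model = sum((r - p) ** 2 for r, p in zip(realized, predicted))
    sse_bench = sum((r - b) ** 2 for r, b in zip(realized, benchmark))
    return 1.0 - sse_model / sse_bench


# Hypothetical monthly realized returns and 1-month-ahead model predictions.
realized = [0.012, -0.004, 0.020, 0.003]
predicted = [0.010, 0.000, 0.015, 0.005]
print(rmspe(realized, predicted), pseudo_r2(realized, predicted))
```

In a recursive pseudo OOS exercise, the predicted values would come from re-estimating each EPK on data up to the forecast origin before comparing its predictions with the realized returns.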

Our main results are as follows. First, single-state models in which the standard macro-finance predictors used in the literature affect the time variation of the unit prices of risk provide a poor performance. Even though our smooth, single-state model already allows non-negligible degrees of flexibility (and nonlinearity) in the EPK, the alphas for HML and MOMO are systematically statistically significant, with annualized abnormal rates of return between -0.9 and 6.8 percent in the case of HML, and between 1.3 and 17 percent per year in the case of MOMO. Occasionally, even the SMB alpha turns positive and statistically significant, an indication that a smooth macro-driven model may even generate a size anomaly, contrary to common wisdom in the literature. Additionally, the evidence that the two risk factors in a simple single-state model (covariance and downside covariance risk) may be priced in the cross-section is weak. Both in terms of root-mean-squared pricing errors (RMSPE) and of (pseudo-) R²s, the single-state model yields a mediocre performance. For instance, all R²s are between 0 and 2.1 percent, and the performance tends to be worse than what can be attained using a plain-vanilla unconditional CAPM. We obtain sizable improvements in the asset pricing implications of the estimated model under a 4MOM CAPM framework in which the dynamics in conditional higher-order moments are driven by a two-regime MS structure. The first regime is a persistent bull state in which volatilities are low. The second regime is a less persistent bear regime in which volatilities are high. The evidence of average abnormal returns across the two regimes disappears almost completely when the first set of constraints is imposed, i.e., arbitrage opportunities are ruled out, the signs of the risk premia are constrained to agree with the implications of decreasing absolute risk aversion, and the estimated pricing kernel implies on average the mean gross short-term rate observed over our sample period. Even though the bear regime SMB alpha remains rather large in economic terms (4.99%) and commands a p-value of approximately 0.05, all the remaining alpha coefficients fail to be significant at standard levels. Importantly, the weakening of the evidence of non-zero alphas when Markov regimes are modelled in unit risk prices, and the relevant conditional co-moments are projected out of the resulting MS model, is not a feature we have assumed a priori; equivalently, the data may have revealed larger, not smaller, Markov switching alphas. When the full set of constraints is imposed, we only find marginal statistical significance for MOMO’s alpha in regime 1, while SMB’s alpha for regime 2 greatly declines and now fails to be significant.

⁶ Our focus on the predictive performance of alternative EPKs is not completely new in the asset pricing literature. Ang, Chen, and Xing (2006) have checked whether past downside betas predict future expected returns. They find that, for the majority of their cross-section, high past downside beta predicts high future returns over the next month. In an MS framework related to this paper, Guidolin and Timmermann (2008b) have documented that their MS models accurately predict the dynamics of the joint density of long-short portfolios similar to ours.
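The shape restrictions behind the first set of constraints can be checked numerically for a candidate kernel. The sketch below assumes, only for illustration, a cubic kernel in the net market return, m(x) = a + bx + cx² + dx³, of the type studied by Dittmar (2002), with hypothetical coefficients. On a grid of returns it checks that the kernel is strictly positive (non-satiation, ruling out arbitrage), decreasing (risk aversion), and convex (a positive third derivative of utility, which is necessary for decreasing absolute risk aversion), and it measures how far the sample mean of the kernel is from the inverse of the gross riskless rate.

```python
from statistics import fmean

# Hypothetical cubic kernel in the net market return x:
# m(x) = a + b*x + c*x**2 + d*x**3. Coefficients are illustrative only.


def kernel(x, a, b, c, d):
    return a + b * x + c * x ** 2 + d * x ** 3


def kernel_slope(x, a, b, c, d):
    """First derivative dm/dx."""
    return b + 2 * c * x + 3 * d * x ** 2


def kernel_curvature(x, a, b, c, d):
    """Second derivative d2m/dx2."""
    return 2 * c + 6 * d * x


def admissibility(coefs, grid):
    """Pointwise shape checks on a grid of market returns.

    positive:   m > 0 (non-satiation / no arbitrage)
    decreasing: m' < 0 (risk aversion)
    convex:     m'' > 0 (necessary for decreasing absolute risk aversion)
    """
    return {
        "positive": all(kernel(x, *coefs) > 0 for x in grid),
        "decreasing": all(kernel_slope(x, *coefs) < 0 for x in grid),
        "convex": all(kernel_curvature(x, *coefs) > 0 for x in grid),
    }


def mean_rate_gap(coefs, sample_returns, rf_gross):
    """Sample mean of m minus 1/Rf; zero when the riskless asset is priced."""
    return fmean([kernel(x, *coefs) for x in sample_returns]) - 1.0 / rf_gross


grid = [i / 100 for i in range(-30, 31)]   # net returns from -30% to +30%
ok = (1 / 1.003, -2.0, 3.0, -1.0)
bad = (1 / 1.003, 2.0, 3.0, -1.0)          # b > 0: m increasing near x = 0
print(admissibility(ok, grid))
print(admissibility(bad, grid))
```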

Even though the statistical evidence in favor of an MS structure is strong and the results concerning the average abnormal returns in the two states are encouraging, some uncertainty remains as to which factors are priced in the MS 4MOM CAPM. For instance, while in the bear regime only coskewness seems to be priced, in the bear state there is evidence that all conditional co-moments are priced. Additionally, when the two-regime 4MOM EPK is constrained to be compatible with standard properties of the intertemporal marginal rate of substitution for a risk-averse investor, the improvement in pricing performance is uniformly visible for only one portfolio out of four (the momentum-sorted one). We therefore proceed to estimate a three-state model in which the nature of the regimes is not left to the data, because the three states are pre-specified to differ in the way assets are priced, i.e., we further estimate a mixture CAPM in which at all points in time one and only one asset pricing framework may be generating the observed asset returns. When the complete set of restrictions is imposed on the EPK, we find that all the alphas stop being statistically significant, including MOMO’s alpha in the CAPM regime. Interestingly, the alphas remain rather large in absolute value in the third, 4MOM CAPM regime (they range between -2.1% and 2.8% per month), but they command such high standard errors that the associated p-values are all very high. Economically, this means that even though the modeled risk factors fail to drive the average abnormal returns to zero in a mathematical sense, they do so in a statistical sense, as the variation in the sample returns associated with the third state is sufficiently large for the classical 95% confidence intervals around the estimated alphas to include a zero abnormal return. This finding has key economic implications because, in the absence of a rational explanation, the conclusion of Lakonishok, Shleifer, and Vishny (1994) that the asset pricing anomalies (in particular, value and momentum premia) are caused by overreaction-fueled irrational mispricing would apply. On the contrary, the ability to isolate one EPK that prices the US cross-section without generating large and statistically significant abnormal returns (alphas) is consistent with rational explanations.
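The statistical argument made above, large point estimates of alpha whose 95% confidence intervals nevertheless include zero, reduces to a short calculation. The sketch below uses a normal approximation to the sampling distribution of the alpha; the 2% standard error in the example is hypothetical, because this section reports the range of the 4MOM-regime alphas but not their standard errors.

```python
from statistics import NormalDist


def alpha_inference(alpha: float, se: float, level: float = 0.95):
    """t-statistic, two-sided p-value and confidence interval for an alpha,
    using a normal approximation to its sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    t_stat = alpha / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, (alpha - z * se, alpha + z * se)


# Hypothetical numbers: a 2.8% monthly alpha with a 2% standard error.
t_stat, p_value, (lo, hi) = alpha_inference(0.028, 0.020)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

With these inputs the interval straddles zero, so an economically large alpha is statistically indistinguishable from no abnormal return.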

The estimated mixture CAPM generates estimates of the risk premia that are rather sensible, validating the a priori identification of the three statistical regimes with distinct asset pricing states. Moreover, the regimes isolated by standard MS filtering techniques appear to make economic sense: the US stock market has spent a large proportion of the period 1927-2007 in a plain vanilla CAPM regime. Our estimates reveal that when constraints are imposed, the CAPM regime has an average duration of approximately 14 months and characterizes 44.1% of our sample. Long spans of data are consistent with a CAPM state in which only covariance risk is priced, such as most of the 1940s, 1950s and 1960s, the period 1991-1996, and, more recently, most of the bull markets of the 2003-2006 period. It seems plausible that extended periods of time are generated by simple frameworks in which only covariance risk is priced. Additionally, the CAPM regime typically features high mean excess market returns and high returns on all stock portfolios, accompanied by moderate volatility. On the contrary, the dCAPM regime is scarcely persistent (its average duration is approximately 7 months) but, because of the structure of the estimated transition matrix, it also occurs with remarkable frequency, characterizing 39.4% of our sample. Typical periods in which US stock returns appear to have been generated by the dCAPM are 1929-1930, 1935-1938, 1941-1945, the late 1960s and late 1970s, 1998-1999, and, more recently, the 2002-2003 bear markets. Interestingly, from mid-2007, in correspondence with a situation of financial turmoil, the US equity markets switch from the CAPM to the dCAPM regime, with an unsettling similarity to the onset of the Great Depression in 1929. Because many of these periods correspond to bear and volatile markets, the dCAPM regime generates a modest annualized market risk premium of 3.1% with a volatility of 21.6%. Finally, the 4MOM asset pricing regime occurs rather infrequently but, when it does, the state is considerably persistent, with an average duration of 9 months. However, only 16.6% of our sample is generated by the 4MOM state. The 4MOM regime is completely characterized by the presence of high volatility. The pricing performance of the three-state mixture CAPM is attractive: despite the restrictions that define the mixture model, the additional flexibility offered by a three-state specification yields high payoffs that are especially visible in the absence of restrictions, as all the pseudo-R²s increase and the RMSPE declines when moving from the two-state to the three-state models. In fact, the mixture CAPM exceeds the R² levels typical of a simple, unconditional CAPM and substantially cuts its RMSPE. When the complete set of restrictions is imposed, a similar result obtains, with the pseudo-R²s now in the range 3.5-6.1%.
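The average durations quoted above follow from the diagonal of the transition matrix of a first-order Markov chain: a regime with staying probability p_kk lasts on average 1/(1 - p_kk) months. The chain's long-run regime frequencies are its ergodic probabilities, which solve π = πP. The sketch below uses a hypothetical three-state matrix: only its diagonal was chosen to give durations close to 14, 7, and 9 months, the off-diagonal entries are invented, and the implied ergodic probabilities therefore need not match the sample shares reported in the text.

```python
def expected_durations(P):
    """Average regime duration 1 / (1 - p_kk) for each regime k."""
    return [1.0 / (1.0 - P[k][k]) for k in range(len(P))]


def ergodic_probs(P, tol=1e-14, max_iter=100_000):
    """Long-run probabilities pi solving pi = pi P, by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical transition matrix (rows/columns: CAPM, dCAPM, 4MOM).
# Only the diagonal targets the durations in the text; the rest is invented.
P = [[0.93, 0.05, 0.02],
     [0.10, 0.86, 0.04],
     [0.03, 0.08, 0.89]]
print([round(d, 1) for d in expected_durations(P)])  # prints [14.3, 7.1, 9.1]
print([round(p, 3) for p in ergodic_probs(P)])
```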

Three related papers that deserve discussion, as they may help us clarify our contribution, are Ang and Chen (2007), Guidolin and Timmermann (2008a), and Petkova and Zhang (2005). Ang and Chen (2007) propose and estimate a conditional CAPM with time-varying betas, time-varying market risk premia, and stochastic systematic volatility. They directly take into account the time-variation of conditional betas in estimating conditional alphas, rather than relying on incorrect OLS inference. In particular, they treat betas as endogenous variables that vary slowly and continuously over time and directly estimate them using Kalman filter techniques in a Bayesian setup. On the contrary, our (frequentist) MS approach is closer in spirit to previous estimates of time-varying betas by Campbell and Vuolteenaho (2004), Fama and French (2006), and Lewellen and Nagel (2006), among others, who assume discrete changes in betas across subsamples but constant betas within subsamples. However, as in Ang and Chen (2007), we capture predictable time-variation in both the conditional mean and the conditional volatility of the market excess return. We model time-varying market premia by using a latent state variable for the conditional mean of the excess market return. Moreover, while Ang and Chen estimate separate models for each of the long-short portfolios that capture the essence of the asset pricing anomalies, we estimate a joint model for all portfolios.

Petkova and Zhang (2005) find that a conditional, time-varying CAPM goes in the right direction when it comes to rationally explaining the value puzzle; in particular, value betas co-vary positively with the expected market risk premium, while growth betas display the opposite behavior. As a result, the beta of HML co-varies positively with the expected market risk premium. However, the magnitudes of these covariances are too small to fully account for the return differentials between value and growth stocks, so the estimated alphas of HML from conditional market regressions mostly remain positive and significant. Moreover, this correlation is only estimated indirectly, through instrumental proxies for conditional betas and market risk premia. By contrast, in our paper we propose a family of "structural" econometric models (in the sense that the priced conditional comoments are generated endogenously by the presence of MS) that generate covariation between expected market risk premia (as well as market volatility and asymmetries) and betas (including the 4MOM betas), which is therefore estimated in a fully consistent fashion.

This evokes the main features of the 4MOM international CAPM proposed by Guidolin and Timmermann (2008a), who also use an MS framework and an EPK derived by approximating a standard intertemporal marginal rate of substitution to price a number of international equity portfolio indices. However, their focus is on the optimal portfolio diversification implications of their framework, not on specifying an asset pricing framework that "turns off" all abnormal returns. In fact, the size and dynamics of the alphas in Guidolin and Timmermann (2008a) are compatible with the omission of important risk factors. Finally, Guidolin and Timmermann do not explore the possibility that, by imposing restrictions on a multi-state version of their simpler two-state model, their regimes may receive a clear, mixture-like CAPM interpretation.

The paper is structured in the following way. Section 2 develops our EPKs exploiting an extended

(to include semi-variance components) Taylor expansion of the intertemporal marginal rate of substitution.

Section 3 turns the EPK models developed in Section 2 into an estimable econometric framework. The

notion of mixture CAPM is introduced and possible restrictions on the EPK deriving from simple economic

principles are discussed. Section 4 describes the data and explains in what sense the asset pricing “properties”

(including the related anomalies) of size-, value-, and momentum-sorted portfolios are unstable over time.

Section 5 reports estimates of the models introduced in Section 3 and shows that a mixture CAPM may

explain away the major asset pricing anomalies that have appeared in the literature. Section 6 concludes.

2. Empirical Pricing Kernel Models

Since the seminal work by Harrison and Kreps (1979), we know that, under some regularity conditions, there exists a random variable $M_{t+1}$, the pricing kernel, that prices all risky payoffs under the law of one price and is nonnegative under the condition of no arbitrage. Moreover, the assumption that a representative agent exists allows (at least in static settings) the pricing kernel to be expressed as a function of aggregate consumption or, equivalently, of gross returns on the aggregate wealth portfolio (see, e.g., Brown and Gibbons, 1985). Similarly to Dittmar (2002) and Guidolin and Timmermann (2008a), we assume that the pricing kernel may be approximated through a time-varying, third-order Taylor series expansion of the marginal utility of gross returns on the wealth portfolio, $R^W_{t+1}$, around a zero return on the wealth portfolio, i.e., around the initial wealth level $W_t$:

$$
M_{t+1} = \frac{U'(W_t)}{U'(W_t)} + \frac{U''(W_t)}{U'(W_t)}R^W_{t+1} + \frac{U'''(W_t)}{U'(W_t)}\left(R^W_{t+1}\right)^2 + \frac{U''''(W_t)}{U'(W_t)}\left(R^W_{t+1}\right)^3 + o\left(\left(R^W_{t+1}\right)^4\right).
$$

Under the additional assumption that at least the zeroth- and first-order terms of this expansion may correspond to a different intertemporal marginal rate of substitution (as in Post and Vliet, 2005), depending on whether or not the returns on the wealth portfolio exceed some time-varying threshold level $R^W_{B,t+1}$, we can write the resulting approximate pricing kernel as:

$$
M_{t+1} \simeq g_{0,t} + g_{1,t}R^W_{t+1} + g_{01,t}\min\left\{R^W_{t+1}, R^W_{B,t+1}\right\} + g_{2,t}\left(R^W_{t+1}\right)^2 + g_{3,t}\left(R^W_{t+1}\right)^3, \tag{1}
$$

where $g_{j,t} = U^{(j+1)}_t / U^{(1)}_t$ is the ratio of derivatives of the utility function ($U^{(1)} \equiv U'$ is the first derivative, etc.) evaluated at current wealth. $R^W_{B,t}$ is a threshold/benchmark level for gross returns on aggregate wealth computed on the basis of information at time $t$, $\mathcal{F}_t$, for instance $R^W_{B,t} = E[R^W_{t+1}|\mathcal{F}_t]$, the conditional mean return on the aggregate wealth portfolio. Of course, $R^W_{B,t} = R^f_t$ (the conditional gross riskless rate) and $R^W_{B,t} = 0$ also represent two natural benchmarks.

Assuming positive marginal utility ($U' > 0$), strict risk aversion ($U'' < 0$), decreasing absolute risk aversion ($U''' > 0$), and decreasing absolute prudence ($U'''' < 0$, as in Kimball, 1993), it follows that $g_1 < 0$, $g_{01} < 0$, $g_2 > 0$, and $g_3 < 0$.⁷ Negative exponential utility satisfies these restrictions, and the same applies to constant relative risk aversion preferences. Caballe and Pomansky (1996) show that all HARA utility functions display standard risk aversion properties (i.e., $U' > 0$, $U'' < 0$, $U''' > 0$, and $U'''' < 0$), which is consistent with a cubic pricing kernel.⁸ The $\min\{R^W_{t+1}, R^W_{B,t+1}\}$ term makes the pricing kernel depend on whether or not the gross returns on the wealth portfolio exceed the benchmark level $R^W_{B,t+1}$, i.e.,

$$
M_{t+1} = \begin{cases} g_0 + g_{01}R^W_{B,t+1} & R^W_{t+1} > R^W_{B,t+1} \\ g_0 & R^W_{t+1} \le R^W_{B,t+1} \end{cases}
+ \begin{cases} g_1 R^W_{t+1} & R^W_{t+1} > R^W_{B,t+1} \\ (g_1 + g_{01})R^W_{t+1} & R^W_{t+1} \le R^W_{B,t+1} \end{cases}
+ g_2\left(R^W_{t+1}\right)^2 + g_3\left(R^W_{t+1}\right)^3. \tag{2}
$$

This means that in both cases the pricing kernel is a cubic function of the gross returns on the wealth portfolio, although with different coefficients according to whether $R^W_{t+1} \le R^W_{B,t+1}$ or $R^W_{t+1} > R^W_{B,t+1}$.⁹
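Because (2) is just (1) with the min term unpacked, the two forms should agree at every return level. The Python sketch below evaluates both on a grid of gross wealth returns; the coefficients and the benchmark are hypothetical values chosen only to respect the sign restrictions $g_1 < 0$, $g_{01} < 0$, $g_2 > 0$, $g_3 < 0$, and the function names are ours.

```python
import numpy as np


def kernel_min_form(r_w, r_b, g0, g1, g01, g2, g3):
    """Approximate pricing kernel written with the min term, as in (1)."""
    r_w = np.asarray(r_w, dtype=float)
    return g0 + g1 * r_w + g01 * np.minimum(r_w, r_b) + g2 * r_w**2 + g3 * r_w**3


def kernel_branch_form(r_w, r_b, g0, g1, g01, g2, g3):
    """Same kernel written branch by branch, as in (2)."""
    r_w = np.asarray(r_w, dtype=float)
    above = r_w > r_b
    intercept = np.where(above, g0 + g01 * r_b, g0)   # first brace in (2)
    slope = np.where(above, g1, g1 + g01)             # second brace in (2)
    return intercept + slope * r_w + g2 * r_w**2 + g3 * r_w**3


# Hypothetical coefficients respecting g1 < 0, g01 < 0, g2 > 0, g3 < 0.
coeffs = dict(g0=3.0, g1=-1.5, g01=-0.5, g2=0.4, g3=-0.1)
r_b = 1.01                                  # hypothetical benchmark gross return
grid = np.linspace(0.70, 1.30, 61)          # gross returns on the wealth portfolio

m1 = kernel_min_form(grid, r_b, **coeffs)
m2 = kernel_branch_form(grid, r_b, **coeffs)
print("max |(1) - (2)| on the grid:", np.max(np.abs(m1 - m2)))
```

The kernel is continuous at the benchmark, but its slope with respect to $R^W_{t+1}$ is steeper by $|g_{01}|$ below $R^W_{B,t+1}$, which is exactly the kink that the downside term introduces.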

Ang et al. (2006) stress that, for the types of utility functions that generate terms like $\min\{R^W_{t+1}, R^W_{B,t}\}$ in the expression for the pricing kernel (such as Gul's (1991) disappointment aversion (DA) function, which generalizes power utility to allow for endogenously determined "kinks"), the representations commonly employed in empirical finance, such as the downside beta CAPM à la Bawa and Lindenberg (1977), terms like $\min\{R^W_{t+1}, R^W_{B,t}\}$, and higher-order Taylor expansions à la Dittmar (2002), can only be approximations, because the underlying utility functions usually do not admit an explicit form and Taylor expansions are simply approximations of non-smooth functions.

⁷ Unless the marginal utility of wealth is constant, the coefficients $g_{j,t}$ in our Taylor expansion will be time-varying. However, to simplify the notation we drop the time index on the utility function.

⁸ Scott and Horvath (1980) have shown that a strictly risk-averse individual who always prefers more to less and consistently (i.e., for all wealth levels) likes skewness will necessarily dislike kurtosis, so that $U'''' < 0$ may also be derived as an implication of $U' > 0$, $U'' < 0$, $U''' > 0$, when these conditions apply to all wealth levels.

⁹ In principle, the pricing kernel could also be generalized to include two additional terms, such as $\min\{(R^W_{t+1})^2, R^W_{2B,t+1}\}$ and $\min\{(R^W_{t+1})^3, R^W_{3B,t+1}\}$, where $R^W_{2B,t}$ and $R^W_{3B,t}$ are thresholds that apply to squared and cubed values of gross returns on the wealth portfolio. Clearly, this creates a highly flexible but also complex pricing kernel with 8 ($= 2^3$) different "branches", formed by all the possible combinations of the events $R^W_{t+1} > R^W_{B,t+1}$, $R^W_{t+1} \le R^W_{B,t+1}$, $(R^W_{t+1})^2 > R^W_{2B,t+1}$, $(R^W_{t+1})^2 \le R^W_{2B,t+1}$, and $(R^W_{t+1})^3 > R^W_{3B,t+1}$, $(R^W_{t+1})^3 \le R^W_{3B,t+1}$. We have experimented with a version of our unconstrained asset pricing model that includes the term $\min\{(R^W_{t+1})^2, R^W_{2B,t+1}\}$, so that the pricing kernel has the structure:

$$
M_{t+1} = \begin{cases}
g_{0,t+1} + (g_{1,t+1} + g_{01,t+1})R^W_{t+1} + (g_{2,t+1} + g_{02,t+1})(R^W_{t+1})^2 + g_{3,t+1}(R^W_{t+1})^3 & R^W_{t+1} \le R^W_{B,t+1},\ (R^W_{t+1})^2 \le R^W_{2B,t+1}\\
(g_{0,t+1} + g_{01,t+1}R^W_{B,t+1}) + g_{1,t+1}R^W_{t+1} + (g_{2,t+1} + g_{02,t+1})(R^W_{t+1})^2 + g_{3,t+1}(R^W_{t+1})^3 & R^W_{t+1} > R^W_{B,t+1},\ (R^W_{t+1})^2 \le R^W_{2B,t+1}\\
(g_{0,t+1} + g_{02,t+1}R^W_{2B,t+1}) + (g_{1,t+1} + g_{01,t+1})R^W_{t+1} + g_{2,t+1}(R^W_{t+1})^2 + g_{3,t+1}(R^W_{t+1})^3 & R^W_{t+1} \le R^W_{B,t+1},\ (R^W_{t+1})^2 > R^W_{2B,t+1}\\
(g_{0,t+1} + g_{01,t+1}R^W_{B,t+1} + g_{02,t+1}R^W_{2B,t+1}) + g_{1,t+1}R^W_{t+1} + g_{2,t+1}(R^W_{t+1})^2 + g_{3,t+1}(R^W_{t+1})^3 & R^W_{t+1} > R^W_{B,t+1},\ (R^W_{t+1})^2 > R^W_{2B,t+1}
\end{cases}
$$

We find that the resulting asset pricing model is very hard to estimate and leads to poor forecasting performance, probably as a result of overfitting.

Assuming that a conditionally risk-free asset exists with gross return $R^f_t$, and imposing the standard no-arbitrage condition

$$
E[M_{t+1}R^i_{t+1}|\mathcal{F}_t] = 1, \tag{3}
$$

we get a four-moment asset pricing model with time-varying coefficients. From the definition of a conditionally riskless asset, $E[M_{t+1}R^f_t|\mathcal{F}_t] = R^f_t E[M_{t+1}|\mathcal{F}_t] = 1$, so that $E[M_{t+1}|\mathcal{F}_t] = 1/R^f_t$ and

$$
E[M_{t+1}R^i_{t+1}|\mathcal{F}_t] - R^f_t E[M_{t+1}|\mathcal{F}_t] = \mathrm{Cov}[M_{t+1}, R^i_{t+1}|\mathcal{F}_t] + E[M_{t+1}|\mathcal{F}_t]E[R^i_{t+1}|\mathcal{F}_t] - R^f_t E[M_{t+1}|\mathcal{F}_t] = 0,
$$

so that

$$
E[R^i_{t+1}|\mathcal{F}_t] - R^f_t = -R^f_t\,\mathrm{Cov}[M_{t+1}, R^i_{t+1}|\mathcal{F}_t].
$$
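The step from (3) to the covariance form of the risk premium uses only two facts: $E[M_{t+1}R^i_{t+1}|\mathcal{F}_t] = 1$ and $E[M_{t+1}|\mathcal{F}_t] = 1/R^f_t$. The Python check below works on a hypothetical five-state economy whose kernel values and payoffs are made up for illustration: once the risky payoff is priced so that (3) holds, $E[R^i] - R^f$ should coincide with $-R^f\,\mathrm{Cov}[M, R^i]$ up to rounding error.

```python
import numpy as np

# Hypothetical one-period economy: five equally likely states (all numbers illustrative).
prob = np.full(5, 0.2)
m = np.array([1.30, 1.10, 0.98, 0.90, 0.75])        # pricing kernel M_{t+1} > 0
payoff = np.array([0.80, 0.95, 1.02, 1.10, 1.25])   # risky payoff, high when M is low


def expect(x):
    return float(np.sum(prob * x))


def cov(x, y):
    return expect((x - expect(x)) * (y - expect(y)))


r_f = 1.0 / expect(m)                  # riskless rate implied by E[M] = 1 / R^f
r_i = payoff / expect(m * payoff)      # price the payoff so that E[M R^i] = 1, as in (3)

premium = expect(r_i) - r_f
print(f"E[R^i] - R^f     = {premium:.6f}")
print(f"-R^f Cov[M, R^i] = {-r_f * cov(m, r_i):.6f}")
```

Because the payoff is high precisely when the kernel is low, the covariance is negative and the asset earns a positive premium, which is the mechanism that (4) and (5) decompose into covariance, downside, coskewness, and cokurtosis components.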

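Plugging our approximate expression for the pricing kernel into this expression for the conditional risk premium on any asset or portfolio indexed by $i = 1, \ldots, N$ (the constant $g_0$ drops out because it does not covary with returns), we obtain:

$$
\begin{aligned}
E[R^i_{t+1}|\mathcal{F}_t] - R^f_t &= -R^f_t g_1\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] - R^f_t g_{01}\mathrm{Cov}[R^i_{t+1}, \min\{R^W_{t+1}, R^W_{B,t+1}\}|\mathcal{F}_t]\\
&\quad - R^f_t g_2\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] - R^f_t g_3\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t],
\end{aligned}
\tag{4}
$$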

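where the coefficients $g_j$ ($j = 1, 2, 3$) and $g_{01}$ are measurable with respect to time-$t$ information, $\mathcal{F}_t$, consistently with the fact that (4) determines the time-$t$ conditional risk premium. At this point, we notice that the term $\mathrm{Cov}[R^i_{t+1}, \min\{R^W_{t+1}, R^W_{B,t+1}\}|\mathcal{F}_t]$ can be re-written as (letting $\bar{R}^W_{t+1} \equiv E[\min\{R^W_{t+1}, R^W_{B,t+1}\}|\mathcal{F}_t]$):

$$
\begin{aligned}
&\mathrm{Cov}\left[R^i_{t+1},\, I_{\{R^W_{t+1} < R^W_{B,t+1}\}}R^W_{t+1} + I_{\{R^W_{t+1} \ge R^W_{B,t+1}\}}R^W_{B,t+1}\,\middle|\,\mathcal{F}_t\right]\\
&\quad = \int_{R^i_{t+1}}\left(R^i_{t+1} - E_t[R^i_{t+1}]\right)\Bigg(\int_{R^W_{t+1} < R^W_{B,t+1}}\left(R^W_{t+1} - \bar{R}^W_{t+1}\right)dF(R^W_{t+1}) + \int_{R^W_{t+1} \ge R^W_{B,t+1}}\left(R^W_{B,t+1} - \bar{R}^W_{t+1}\right)dF(R^W_{t+1})\Bigg)dF(R^i_{t+1})\\
&\quad = \int_{R^i_{t+1}}\left(R^i_{t+1} - E_t[R^i_{t+1}]\right)\int_{R^W_{t+1} < R^W_{B,t+1}}\left(R^W_{t+1} - \bar{R}^W_{t+1}\right)dF(R^W_{t+1})\,dF(R^i_{t+1})\\
&\quad = \mathrm{Cov}\left[R^i_{t+1}, R^W_{t+1}\,\middle|\,\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}\right],
\end{aligned}
$$

since by construction $\int_{R^W_{t+1} \ge R^W_{B,t+1}}(R^W_{B,t+1} - \bar{R}^W_{t+1})\,dF(R^W_{t+1}) = (R^W_{B,t+1} - \bar{R}^W_{t+1})\Pr\{R^W_{t+1} \ge R^W_{B,t+1}|\mathcal{F}_t\}$ and

$$
(R^W_{B,t+1} - \bar{R}^W_{t+1})\Pr\{R^W_{t+1} \ge R^W_{B,t+1}|\mathcal{F}_t\}\int_{R^i_{t+1}}\left(R^i_{t+1} - E_t[R^i_{t+1}]\right)dF(R^i_{t+1}) = 0.
$$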

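Substituting this result into (4), we obtain:

$$
\begin{aligned}
E[R^i_{t+1}|\mathcal{F}_t] - R^f_t &= -R^f_t g_1\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] - R^f_t g_{01}\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]\\
&\quad - R^f_t g_2\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] - R^f_t g_3\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t]\\
&= \lambda_{2,t}\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] + \lambda^-_{2,t}\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]\\
&\quad + \lambda_{3,t}\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] + \lambda_{4,t}\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t],
\end{aligned}
\tag{5}
$$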

where $\lambda_{j,t} = -R^f_t g_{j-1}$ ($j = 2, 3, 4$) and $\lambda^-_{2,t} = -R^f_t g_{01}$, so that $\lambda_{2,t}, \lambda^-_{2,t} > 0$, $\lambda_{3,t} < 0$, and $\lambda_{4,t} > 0$. Notice that each risk premium carries an index equal to the index of the corresponding $g_j$ parameter increased by one, because these indices evoke the order of the priced risk factors: 2 for covariance risk, 3 for coskewness risk, and 4 for cokurtosis risk. Finally, using the definitions of CAPM beta, downside beta, coskewness, and cokurtosis,¹⁰

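$$
\begin{aligned}
\beta_{i,t} &\equiv \frac{\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t]}{\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t]}, \qquad
\beta^-_{i,t} \equiv \frac{\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]}{\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]},\\
\gamma_{i,t} &\equiv \frac{\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t]}{\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t]}, \quad \text{where } \mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t] \equiv E\left[\left(R^W_{t+1} - E[R^W_{t+1}|\mathcal{F}_t]\right)^3\middle|\mathcal{F}_t\right],\\
\kappa_{i,t} &\equiv \frac{\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t]}{\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t]}, \quad \text{where } \mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t] \equiv E\left[\left(R^W_{t+1} - E[R^W_{t+1}|\mathcal{F}_t]\right)^4\middle|\mathcal{F}_t\right],
\end{aligned}
$$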

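we obtain:

$$
\begin{aligned}
E[R^i_{t+1}|\mathcal{F}_t] - R^f_t
&= \lambda_{2,t}\frac{\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t]}{\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t]}\,\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t]\\
&\quad + \lambda^-_{2,t}\frac{\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]}{\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]}\,\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]\\
&\quad + \lambda_{3,t}\frac{\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t]}{\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t]}\,\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t]
+ \lambda_{4,t}\frac{\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t]}{\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t]}\,\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t]\\
&= \tilde{\beta}_{i,t}\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t] + \tilde{\beta}^-_{i,t}\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] + \tilde{\gamma}_{i,t}\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t] + \tilde{\kappa}_{i,t}\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t],
\end{aligned}
$$

where the modified CAPM "betas" for variance, downside variance, skewness, and kurtosis are

$$
\tilde{\beta}_{i,t} \equiv \lambda_{2,t}\beta_{i,t}, \qquad \tilde{\beta}^-_{i,t} \equiv \lambda^-_{2,t}\beta^-_{i,t}, \qquad \tilde{\gamma}_{i,t} \equiv \lambda_{3,t}\gamma_{i,t}, \qquad \tilde{\kappa}_{i,t} \equiv \lambda_{4,t}\kappa_{i,t}. \tag{6}
$$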
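To make the four "quantities of risk" concrete, the Python sketch below computes sample analogues of $\beta_i$, $\beta^-_i$, $\gamma_i$, and $\kappa_i$ from paired draws of $R^i$ and $R^W$. It is only an illustration under simplifying assumptions: it uses unconditional sample moments in place of the conditional ones (which, in our framework, are generated by the MS model), a fixed benchmark $R^W_B$, and simulated returns whose parameters are hypothetical; the function names are ours.

```python
import numpy as np


def _cov(x, y):
    """Population (1/n) covariance, matching np.var's default normalization."""
    return np.mean((x - x.mean()) * (y - y.mean()))


def comoment_betas(r_i, r_w, r_b):
    """Sample analogues of beta, downside beta, coskewness and cokurtosis."""
    r_i = np.asarray(r_i, dtype=float)
    r_w = np.asarray(r_w, dtype=float)
    dev_w = r_w - r_w.mean()
    down = r_w < r_b                                      # bear-market observations
    return {
        "beta": _cov(r_i, r_w) / np.var(r_w),
        "beta_down": _cov(r_i[down], r_w[down]) / np.var(r_w[down]),
        "gamma": _cov(r_i, r_w**2) / np.mean(dev_w**3),   # denominator: Skew[R^W]
        "kappa": _cov(r_i, r_w**3) / np.mean(dev_w**4),   # denominator: Kurt[R^W]
    }


rng = np.random.default_rng(0)
n = 20_000
# Hypothetical, negatively skewed gross wealth returns with mean 1.01.
r_w = 1.01 - 0.04 * (rng.exponential(1.0, n) - 1.0)
# Hypothetical portfolio whose true (unconditional) beta is 1.3.
r_i = 1.0 + 1.3 * (r_w - 1.0) + 0.02 * rng.standard_normal(n)

print(comoment_betas(r_i, r_w, r_b=1.01))
print(comoment_betas(r_w, r_w, r_b=1.01))   # wealth portfolio: beta = beta_down = 1
```

Multiplying each loading by the corresponding price of risk, as in (6), would then give the modified betas; the second call illustrates that the wealth portfolio has unit beta and unit downside beta by construction.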

The conditional expression obtained in (5) is a four-moment no-arbitrage asset pricing model which admits the possibility that the CAPM beta may include a component $\beta^-_{i,t}$ that measures the variance risk of an asset or portfolio during bear markets only. Since Bawa and Lindenberg (1977), $\beta^-_{i,t}$ has been defined as a downside beta; it measures the contribution of each asset or portfolio $i$ to the variance of the wealth process in downside markets, when $R^W_{t+1} < R^W_{B,t+1}$. If an asset tends to move downward in a declining market more than it moves upward in a rising market, it is an unattractive asset to hold, because it tends to have very low payoffs precisely when the wealth of investors is low. Investors who are sensitive to downside losses, relative to upside gains, require a positive premium for holding assets that co-vary strongly with the market when the market declines. Hence we can expect $\lambda^-_{2,t}$ to be positive and, in an economy with agents placing greater emphasis on downside risk than on upside gains, assets with high sensitivities to downside market movements will have high average returns.¹¹ Additionally, as in Harvey and Siddique (2000), Dittmar (2002), and Guidolin and Timmermann (2008a), this asset pricing framework attaches a positive price to coskewness with the wealth process (the average tendency of an asset to display high (low) returns when the variance of the wealth process is high (low)), which gives a positive asymmetry to the joint process for $[R^i_{t+1}\ R^W_{t+1}]'$, and a negative price to cokurtosis. Because kurtosis can be described as the degree to which, for a given variance, a distribution is weighted toward its tails, it measures the possible multi-modality of a distribution, or the probability mass in the tails of the distribution, given its variance.¹² Therefore cokurtosis captures the average tendency of an asset to exhibit returns of the same sign as the market exactly when the wealth process draws returns from its extreme tails. Clearly, a risk-averse investor attaches a positive price to (i.e., she demands a negative risk premium on) positive coskewness, because obtaining high returns from an asset when the market is more volatile helps reduce the overall risk of a portfolio. Moreover, a risk-averse investor also attaches a negative price to (i.e., she demands a positive risk premium on) positive cokurtosis.

¹⁰ In the following we assume $\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t] > 0$, $\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t] \neq 0$, and $\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t] > 0$.

¹¹ There are many ways in which one can construct micro-founded models in which downside risk matters in asset pricing. First, downside risk is often featured in behavioral models. Shumway (1997) develops an equilibrium model based on loss-averse investors. Barberis and Huang (2001) use a loss aversion utility function, combined with mental accounting, to construct a cross-sectional equilibrium. Second, constraints and frictions in rational models where the constraint binds only in one direction obtain the same effect, for example, binding short-sales constraints (e.g., Chen, Hong, and Stein, 2001) or wealth constraints (e.g., Kyle and Xiong, 2001). Third, fully rational models exist. For instance, Ang, Chen, and Xing (2006) work with Gul's (1991) disappointment aversion framework, in which disappointment utility is implicitly defined by the preference functional:

$$
U(\mu_W) = \frac{1}{K}\left(\int_{-\infty}^{\mu_W} U(W)\,dF(W) + A\int_{\mu_W}^{+\infty} U(W)\,dF(W)\right),
$$

where $U(W)$ is the felicity function over end-of-period wealth $W$, which they choose to be power utility. The parameter $0 < A \le 1$ is the coefficient of disappointment aversion, $F(W)$ is the cumulative distribution function for wealth, $\mu_W$ is the certainty equivalent (the certain level of wealth that generates the same utility as the portfolio allocation determining $W$), and $K \equiv \Pr(W \le \mu_W) + A\Pr(W > \mu_W)$. Outcomes above (below) the certainty equivalent $\mu_W$ are termed "elating" ("disappointing") outcomes. If $0 < A < 1$, the utility function down-weights elating outcomes relative to disappointing outcomes. If $A = 1$, disappointment utility reduces to standard CRRA utility as a special case; while standard CRRA utility produces aversion to downside risk, the order of magnitude of the downside risk premium is usually negligible because CRRA preferences are locally mean-variance.

¹² Kurtosis may be distinguished from variance, which measures the dispersion of observations from the mean, in that it captures the probability of outcomes that are highly divergent from the mean, that is, extreme outcomes.

3. The Econometric Framework

A large body of evidence suggests that return moments and prices of risk are time varying, and a wide array of studies have used this evidence as a basis for investigating pricing models that hold conditionally (e.g., Harvey, 1989; Ferson and Harvey, 1991; Guidolin and Timmermann, 2008a,b). To allow for conditional time variation in the return process and for possible misspecification biases, we extend the four-moment, downside CAPM in (5) as follows. Even though (5) represents a sensible way to write the asset pricing model, empirical estimation is facilitated (see Guidolin and Timmermann, 2008a, for details) by re-writing (5) as:

$$
\begin{aligned}
E[R^i_{t+1}|\mathcal{F}_t] - R^f_t &= \lambda_2\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] + \lambda^-_2\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]\\
&\quad + \lambda_3\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] + \lambda_4\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t].
\end{aligned}
\tag{7}
$$

This model no longer expresses conditional risk premia on a portfolio as a sum of estimable asset/portfolio-specific beta-style loadings ($\beta_i$, $\beta^-_i$, $\gamma_i$, $\kappa_i$) on risk factors multiplied by the prices of such risks. On the contrary, (7) turns the risk prices into estimable parameters ($\lambda_2$, $\lambda^-_2$, $\lambda_3$, $\lambda_4$) and instead measures the portfolio-specific quantities of risk, expressed in the form of conditional covariances ($\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t]$, $\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$, $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t]$, and $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t]$), which can be computed in a variety of ways (see below for details). Notice that when (5) is expressed in this way, (??) and (7) become completely consistent, because if (7) is applied to the wealth process itself, then $\mathrm{Cov}[R^W_{t+1}, R^W_{t+1}|\mathcal{F}_t] = \mathrm{Var}[R^W_{t+1}|\mathcal{F}_t]$, $\mathrm{Cov}[R^W_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] = \mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$, $\mathrm{Cov}[R^W_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] = \mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t]$, and $\mathrm{Cov}[R^W_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t] = \mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t]$. This shows that $\tilde{\beta}_W = \lambda_2$, $\tilde{\beta}^-_W = \lambda^-_2$, $\tilde{\gamma}_W = \lambda_3$, and $\tilde{\kappa}_W = \lambda_4$, because the same model must price all possible portfolios, including the wealth process itself.

Third, to use a flexible representation without imposing too much structure, the price of risk associated with these moments is allowed to depend on a latent state variable, $S_{t+1}$, that is assumed to follow a Markov process but is otherwise unrestricted. In turn, this state dependence carries over to the prices of the risk factors appearing in the equations for returns on the individual portfolios, i.e., $\lambda_{2,S_{t+1}}$ (covariance risk), $\lambda^-_{2,S_{t+1}}$ (downside covariance risk), $\lambda_{3,S_{t+1}}$ (coskewness risk), and $\lambda_{4,S_{t+1}}$ (cokurtosis risk). Finally, consistent with empirical evidence in the literature (see, e.g., Campbell, Chan, and Viceira, 2003), we allow for predictability of returns on the wealth portfolio through an $M \times 1$ vector of instruments, $z_t$, also assumed to follow a Markov switching first-order autoregressive process. For instance, similarly to much of the asset pricing literature that has followed Fama and French (1989), $z_t$ may include variables such as the dividend yield, the term spread, the default spread, and short-term interest rates. Interestingly, the EPK in (1) implies that the mixture CAPM holds if and only if the "loadings" of the variables in $z_t$ onto market excess returns are zero (i.e., if and only if the null hypothesis $c^W_{S_{t+1}} = 0$ holds in all regimes), which is a testable restriction.

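Defining excess returns on any $i$-th portfolio as $x^i_{t+1} \equiv R^i_{t+1} - R^f_t$ ($i = 1, \ldots, N$, where $N$ is the number of test portfolios to be employed) and on the wealth portfolio as $x^W_{t+1} \equiv R^W_{t+1} - R^f_t$, the econometric model can be summarized as:

$$
\begin{aligned}
x^i_{t+1} &= \alpha^i_{S_{t+1}} + \lambda_{2,S_{t+1}}\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t] + \lambda^-_{2,S_{t+1}}\mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]\\
&\quad + \lambda_{3,S_{t+1}}\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2|\mathcal{F}_t] + \lambda_{4,S_{t+1}}\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^3|\mathcal{F}_t] + \eta^i_{t+1}\\
x^W_{t+1} &= \alpha^W_{S_{t+1}} + \lambda_{2,S_{t+1}}\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t] + \lambda^-_{2,S_{t+1}}\mathrm{Var}[R^W_{t+1}|\mathcal{F}_t, x^W_{t+1} < x^W_{B,t}] + \lambda_{3,S_{t+1}}\mathrm{Skew}[R^W_{t+1}|\mathcal{F}_t]\\
&\quad + \lambda_{4,S_{t+1}}\mathrm{Kurt}[R^W_{t+1}|\mathcal{F}_t] + c^W_{S_{t+1}}z_t + \eta^W_{t+1}\\
z_{t+1} &= \mu_{S_{t+1}} + B_{S_{t+1}}z_t + \eta^Z_{t+1}.
\end{aligned}
\tag{8}
$$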

Consistent with the restrictions implied by our downside, four-moment CAPM in (??)-(7), the risk premia $\lambda_{j,S_{t+1}}$ ($j = 2, 3, 4$) and $\lambda^-_{2,S_{t+1}}$ are common across the individual portfolios and the wealth portfolio. However, we allow for asset-specific intercepts, $\alpha^i_{S_{t+1}}$, that capture other types of misspecification; furthermore, the time dependence of the alphas may be useful in what follows to propose a simple test of whether flexible mixture asset pricing models combining the traditional CAPM ($\lambda_{2,S_{t+1}} > 0$), the downside/partial-moment CAPM ($\lambda^-_{2,S_{t+1}} > 0$), and the four-moment CAPM ($\lambda_{3,S_{t+1}} < 0$ and $\lambda_{4,S_{t+1}} > 0$) may provide a pricing kernel-based explanation for the size, value, and momentum anomalies. The innovations $\eta_{t+1} \equiv [\eta^1_{t+1} \ldots \eta^N_{t+1}\ \eta^W_{t+1}\ (\eta^Z_{t+1})']' \sim N(0, \Omega_{S_{t+1}})$ can display state-dependent covariance matrices, while the predictor variables, $z_{t+1}$, follow a first-order autoregressive process with state-dependent vector autoregressive coefficients, $B_{S_{t+1}}$, as in Guidolin and Timmermann (2006). This is consistent with the high degree of persistence commonly found in popular predictor variables, such as the dividend yield and short-term interest rates. Crucially, although (8), as an asset pricing framework, is a model for conditional means only and as such imposes no restrictions on the properties of the covariance matrix, in our econometric framework the assumption $\eta_{t+1} \sim N(0, \Omega_{S_{t+1}})$ is important because it may contribute to the endogenous generation and/or magnification of time-varying kurtosis and cokurtosis. As a result, our assumption of Markov switching variances and covariances also has key asset pricing implications.

To "close" the econometric specification of the model, we assume that the Markov state variable, $S_{t+1}$, follows a $K$-state (homogeneous, irreducible, and ergodic) first-order process with constant transition probability matrix $\mathbf{P}$:¹³

$$
\mathbf{P}[i,j] = \Pr(S_{t+1} = j|S_t = i) = p_{ij}, \qquad i, j = 1, \ldots, K. \tag{9}
$$

Therefore (8) can be interpreted as a time-varying version of the multi-beta latent variable model of Ferson (1990), in which both the risk premia and the amount of risk depend on a latent first-order Markov state variable.
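The Markov chain in (9) is easy to simulate, and doing so clarifies what the transition matrix controls: expected regime durations through its diagonal, and long-run regime frequencies through its ergodic distribution. The Python sketch below uses a hypothetical three-state matrix (not the paper's estimates) whose second and third diagonal elements were picked to give durations close to 7 and 9 periods; it simulates the chain, re-estimates $\mathbf{P}$ from transition counts (the maximum likelihood estimator when states are observed, which in our model they are not), and solves for the ergodic probabilities.

```python
import numpy as np


def simulate_chain(P, s0, T, rng):
    """Draw T states from a first-order Markov chain with P[i, j] = Pr(S_{t+1}=j | S_t=i)."""
    states = np.empty(T, dtype=int)
    s = s0
    for t in range(T):
        s = rng.choice(P.shape[0], p=P[s])
        states[t] = s
    return states


def estimate_transition_matrix(states, K):
    """Row-normalized transition counts (ML estimate when states are observed)."""
    counts = np.zeros((K, K))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def ergodic_probabilities(P):
    """Stationary distribution: solve pi' P = pi' together with sum(pi) = 1."""
    K = P.shape[0]
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    b = np.concatenate([np.zeros(K), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Hypothetical 3-state matrix (rows sum to one); NOT the paper's estimates.
P = np.array([[0.950, 0.030, 0.020],
              [0.100, 0.857, 0.043],
              [0.050, 0.061, 0.889]])

rng = np.random.default_rng(42)
states = simulate_chain(P, s0=0, T=20_000, rng=rng)
P_hat = estimate_transition_matrix(states, K=3)
print("ergodic probabilities:", ergodic_probabilities(P).round(3))
print("expected durations:", (1.0 / (1.0 - np.diag(P))).round(1))
print("estimated P:\n", P_hat.round(3))
```

In the actual model the states are latent, so $\mathbf{P}$ is estimated jointly with the other parameters from filtered or smoothed state probabilities rather than from observed counts.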

There are a number of advantages to modelling equity returns in this way. At time $t$, conditional on knowing the state next period, $S_{t+1}$, the distribution of portfolio and market returns is multivariate Gaussian. However, since future states are not known in advance, the time-$t$ return distribution is effectively a mixture of normals with weights reflecting the current state probabilities.¹⁴ Such mixtures of normals provide a flexible representation that can be used to approximate many distributions (see, e.g., Harvey and Zhou, 1993, and Guidolin and Timmermann, 2008b). They can accommodate mild serial correlation in returns, documented for market returns in a number of papers (see, e.g., Campbell et al., 2003), as well as volatility clustering, since they allow the first and second moments to vary as a function of the underlying state probabilities (see, e.g., Guidolin and Timmermann, 2006). Moreover, multivariate regime switching models allow return correlations across assets to vary with the underlying regime, consistent with the recent evidence of asymmetric and time-varying correlations in US equity returns in Ang and Chen (2002) and in size- and value-sorted portfolios in Guidolin and Timmermann (2008b). Finally, it must be stressed that our asset pricing model depends on moments of (excess) returns on the market portfolio in addition to the covariances, coskewness, and cokurtosis between portfolio returns and the wealth portfolio. Estimating the (co-)skewness and (co-)kurtosis of asset returns is difficult (see Harvey and Siddique, 2000). However, our mixture MS model allows us to obtain precise conditional estimates in a flexible manner, as it captures coskewness and cokurtosis as a function of the mean, variance, and persistence parameters of the underlying Markov states.
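The argument that a regime switching model generates skewness and kurtosis endogenously can be illustrated in a univariate setting. The Python sketch below first forms one-step-ahead regime probabilities as $\pi_{t+1|t}' = \pi_t'\mathbf{P}$ (the two-state formula in footnote 14), then computes the exact central moments of the resulting mixture of normals. The transition matrix, regime means, and volatilities are hypothetical monthly values, not estimates from this paper, and the predictors $z_t$ are ignored.

```python
import numpy as np


def predicted_probabilities(pi_t, P):
    """One-step-ahead regime probabilities: pi_{t+1|t}' = pi_t' P."""
    return np.asarray(pi_t, dtype=float) @ np.asarray(P, dtype=float)


def mixture_moments(weights, means, sds):
    """Mean, variance, skewness and kurtosis of a univariate mixture of normals."""
    w, m, s = (np.asarray(a, dtype=float) for a in (weights, means, sds))
    mu = np.sum(w * m)
    d = m - mu                      # component means measured from the mixture mean
    c2 = np.sum(w * (d**2 + s**2))
    c3 = np.sum(w * (d**3 + 3.0 * d * s**2))
    c4 = np.sum(w * (d**4 + 6.0 * d**2 * s**2 + 3.0 * s**4))
    return mu, c2, c3 / c2**1.5, c4 / c2**2


# Hypothetical two-regime monthly excess market returns (illustrative numbers only).
P = np.array([[0.90, 0.10],
              [0.30, 0.70]])
weights = predicted_probabilities([1.0, 0.0], P)   # regime 1 known today
mean, var, skew, kurt = mixture_moments(weights, means=[0.010, -0.020], sds=[0.035, 0.070])
print(f"weights={weights}, mean={mean:.4f}, vol={np.sqrt(var):.4f}, "
      f"skew={skew:.2f}, kurt={kurt:.2f}")
```

Even though each regime is Gaussian, the small probability of moving into a low-mean, high-volatility regime produces negative skewness and kurtosis above 3; a single-component "mixture" returns skewness 0 and kurtosis 3, as it should.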

¹³ Of course, at the price of considerable complication in the estimation methods, the assumption of a time-homogeneous Markov chain (transition probability matrix) driving $S_{t+1}$ might be removed. However, for our purposes the model in (8), which allows for time-varying premia as well as time-varying risk factors (i.e., conditional covariances, coskewness, and cokurtosis), appears to be flexible enough that additional degrees of flexibility would simply create a substantial risk of over-fitting.

¹⁴ Even when one regime receives a unit (filtered) probability at time $t$, unless the transition matrix $\mathbf{P}$ is degenerate, the state at time $t+T$ ($T \ge 1$) will remain uncertain. For instance, assuming $K = 2$ and calling $\pi_t$ the time-$t$ probability of state 1, the predicted probability of state 1 at time $t+1$ is $\pi_t p_{11} + (1 - \pi_t)p_{21}$, where $p_{ij} \equiv \Pr\{S_t = j|S_{t-1} = i\}$ is a generic element of the transition matrix $\mathbf{P}$. Even if $\pi_t = 1$, as long as $p_{11} < 1$ the predicted probability of state 1 is $p_{11} < 1$. The condition $p_{11} < 1$ means that the Markov chain has no absorbing state, as we have assumed.

3.1. Mixture Pricing Kernel Models

A mixture CAPM is a time-varying combination of different CAPM-type models (defined as models in which the pricing kernel depends only on wealth portfolio returns and functions thereof) in which one and only one specific CAPM applies at each point in time. As already discussed in the Introduction, the basic intuition is that, both because preferences may change over time and because the stochastic process of asset returns may undergo periods of structural instability that change its essential features, it is possible or even plausible that over different intervals of time the same asset or portfolio may be priced using a different rational asset pricing framework. A mixture CAPM is a special case of the general Markov switching framework in (8) for which a specific structure (which can be interpreted as a set of zero restrictions placed on the risk premia coefficients) is imposed on the Markov chain:

$$
K = 3 \quad\text{and}\quad
\begin{cases}
\lambda^-_{2,S_{t+1}} = \lambda_{3,S_{t+1}} = \lambda_{4,S_{t+1}} = 0 & \text{if } S_{t+1} = 1\\
\lambda_{3,S_{t+1}} = \lambda_{4,S_{t+1}} = 0 & \text{if } S_{t+1} = 2\\
\lambda^-_{2,S_{t+1}} = 0 & \text{if } S_{t+1} = 3.
\end{cases}
$$

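Clearly, under the first regime this restriction delivers a simple, standard CAPM, but under the remaining two regimes it does not. As a result, each of the three assumed regimes corresponds to a specific asset pricing model according to the mapping (for $i = 1, 2, \ldots, N$):¹⁵

$$
\begin{cases}
\left.\begin{aligned}
x^i_{t+1} &= \alpha^i_1 + \lambda_{2,1}\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}] + \eta^i_{t+1}\\
x^W_{t+1} &= \alpha^W_1 + \lambda_{2,1}\mathrm{Var}_t[R^W_{t+1}] + c^W_1 z_t + \eta^W_{t+1}\\
z_{t+1} &= \mu_1 + B_1 z_t + \eta^Z_{t+1}, \qquad \eta_{t+1} \sim N(0, \Omega_1)
\end{aligned}\right\} & \text{if } S_{t+1} = 1\\[6pt]
\left.\begin{aligned}
x^i_{t+1} &= \alpha^i_2 + \lambda_{2,2}\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}] + \lambda^-_{2,2}\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}|R^W_{t+1} < R^W_{B,t+1}] + \eta^i_{t+1}\\
x^W_{t+1} &= \alpha^W_2 + \lambda_{2,2}\mathrm{Var}_t[R^W_{t+1}] + \lambda^-_{2,2}\mathrm{Var}_t[R^W_{t+1}|R^W_{t+1} < R^W_{B,t+1}] + c^W_2 z_t + \eta^W_{t+1}\\
z_{t+1} &= \mu_2 + B_2 z_t + \eta^Z_{t+1}, \qquad \eta_{t+1} \sim N(0, \Omega_2)
\end{aligned}\right\} & \text{if } S_{t+1} = 2\\[6pt]
\left.\begin{aligned}
x^i_{t+1} &= \alpha^i_3 + \lambda_{2,3}\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}] + \lambda_{3,3}\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^2] + \lambda_{4,3}\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^3] + \eta^i_{t+1}\\
x^W_{t+1} &= \alpha^W_3 + \lambda_{2,3}\mathrm{Var}_t[R^W_{t+1}] + \lambda_{3,3}\mathrm{Skew}_t[R^W_{t+1}] + \lambda_{4,3}\mathrm{Kurt}_t[R^W_{t+1}] + c^W_3 z_t + \eta^W_{t+1}\\
z_{t+1} &= \mu_3 + B_3 z_t + \eta^Z_{t+1}, \qquad \eta_{t+1} \sim N(0, \Omega_3)
\end{aligned}\right\} & \text{if } S_{t+1} = 3
\end{cases}
\tag{10}
$$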
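Operationally, the mixture CAPM in (10) amounts to masking the unrestricted risk prices of (8) regime by regime. The Python sketch below encodes the zero restrictions as boolean masks and computes the implied conditional mean excess return of one portfolio in each regime. The risk prices and comoments are hypothetical numbers, and for brevity the same price vector is passed for every regime, whereas in (10) $\lambda_{2,1}$, $\lambda_{2,2}$, and $\lambda_{2,3}$ may differ; the names are ours.

```python
import numpy as np

# Risk prices left free in each regime of (10), in the order
# (lambda_2, lambda_2^-, lambda_3, lambda_4); False marks a zero restriction.
FREE_PRICES = {
    1: np.array([True, False, False, False]),   # standard CAPM
    2: np.array([True, True, False, False]),    # downside CAPM (dCAPM)
    3: np.array([True, False, True, True]),     # four-moment CAPM (4MOM)
}


def regime_risk_premium(state, prices, comoments, alpha=0.0):
    """Conditional mean excess return of portfolio i when S_{t+1} = state.

    prices    : unrestricted risk prices for this regime, as in (8)
    comoments : [Cov_t(R^i, R^W), Cov_t(R^i, R^W | R^W < R^W_B),
                 Cov_t(R^i, (R^W)^2), Cov_t(R^i, (R^W)^3)]
    The zero restrictions of the mixture CAPM are imposed by masking prices.
    """
    restricted = np.where(FREE_PRICES[state], np.asarray(prices, dtype=float), 0.0)
    return alpha + float(restricted @ np.asarray(comoments, dtype=float))


# Hypothetical inputs, reused across regimes only to keep the illustration short.
prices = [3.0, 2.0, -1.5, 0.8]
comoments = [0.002, 0.003, 0.0001, 0.00005]
for s in (1, 2, 3):
    print(s, round(regime_risk_premium(s, prices, comoments), 6))
```

Dropping the masks (setting every entry to True) recovers the unrestricted specification (8), which is the sense in which (10) is a restricted version of (8).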

To save space, the moments that condition on the information set $\mathcal{F}_t$ are simply written using a time-$t$ subscript, e.g., $\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}] \equiv \mathrm{Cov}[R^i_{t+1}, R^W_{t+1}|\mathcal{F}_t]$. In qualitative terms, (10) is very different from (8). In the latter, all the risk factors are in principle priced in each of the $K$ regimes, although potentially with different risk premia as a function of the state $S_{t+1}$. In a mixture CAPM, instead, specific subsets of risk factors enter the asset pricing equations in each of the three possible regimes, where $K = 3$ is dictated not by the empirical features of the data but by the fact that three alternative CAPM-style pricing frameworks are being mixed. From an economic viewpoint, (10) means that over time markets switch among a limited number of alternative asset pricing frameworks to price assets; the only risk factor that is priced across all frameworks is covariance risk, although the related unit risk premium may differ across regimes, i.e., $\lambda_{2,1}$, $\lambda_{2,2}$, and $\lambda_{2,3}$ do not have to be identical. In quantitative terms, (10) is simply a restricted version of (8): if one were unable to reject the hypothesis that $\lambda^-_{2,1} = \lambda_{3,1} = \lambda_{4,1} = \lambda_{3,2} = \lambda_{4,2} = \lambda^-_{2,3} = 0$ in (8), then (10) and (8) would be identical.

¹⁵ We also consider the case in which state 2 is a "pure" dCAPM regime, i.e., if $S_{t+1} = 2$ then

$$
\begin{cases}
x^i_{t+1} = \alpha^i_2 + \lambda^-_{2,2}\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}|R^W_{t+1} < R^W_{B,t}] + \eta^i_{t+1}\\
x^W_{t+1} = \alpha^W_2 + \lambda^-_{2,2}\mathrm{Var}_t[R^W_{t+1}|R^W_{t+1} < R^W_{B,t}] + c^W_2 z_t + \eta^W_{t+1}\\
z_{t+1} = \mu_2 + B_2 z_t + \eta^Z_{t+1}, \qquad \eta_{t+1} \sim N(0, \Omega_2).
\end{cases}
$$

However, the pricing performance of such a model turns out to be similar to that of the model in the main text.

3.2. Other Benchmarks

It is important to recognize that it is a specific assumption about the most adequate empirical framework that has taken us from (??) to (7), where the prices of risk are time-varying and follow a $K$-state Markov switching specification with fixed transition probabilities, (8)-(9). In fact, in the literature we can find three recent examples of empirical papers that have dealt with EPK models similar in spirit to (1) and that have proposed empirical strategies that parameterize time-varying risk premia and risk exposures in a different way. These yield two natural benchmarks for our empirical efforts.

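Ang and Chen (2007) propose a fully specified conditional CAPM that can be written as:

$$
\begin{aligned}
x^i_{t+1} &= \alpha_i + \beta_{i,t+1}E[x^W_{t+1}|\mathcal{F}_t] + \sigma_i\varepsilon^i_{t+1}, \qquad i = 1, \ldots, N\\
x^W_{t+1} &= \mu_{t+1} + \sqrt{\upsilon_{t+1}}\,\varepsilon^W_{t+1}\\
\beta_{i,t+1} &= b_{0,i} + b_{1,i}\beta_{i,t} + \sigma_{\beta i}\varepsilon^{\beta i}_{t+1}\\
\mu_{t+1} &= m_0 + m_1\mu_t + \sigma_\mu\varepsilon^\mu_{t+1}\\
\ln\upsilon_{t+1} &= v_0 + v_1\ln\upsilon_t + \varepsilon^\upsilon_{t+1}
\end{aligned}
\tag{11}
$$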

where $[\varepsilon^i_{t+1}\ \varepsilon^W_{t+1}\ \varepsilon^{\beta i}_{t+1}\ \varepsilon^{\mu}_{t+1}\ \varepsilon^{\upsilon}_{t+1}]' \sim \text{IID } N(0, \Psi)$, and $\Psi$ admits simultaneous correlations between $\varepsilon^{\mu}_{t+1}$ and $\varepsilon^{\upsilon}_{t+1}$ (capturing a leverage effect as in Brandt and Kang, 2004) as well as between $\varepsilon^{\beta i}_{t+1}$ and $\varepsilon^{\mu}_{t+1}$. All other shocks are orthogonal to each other. Although formally (11) does not nest (8)-(9) and is not nested in it either, there are clear connections. On the one hand, (11) obtains when only covariance risk is priced, and in a symmetric fashion, i.e., $\lambda^-_{2,t} = \lambda_{3,t} = \lambda_{4,t} = 0$, while the expression for $x^W_{t+1}$ severs any connection between market variance and risk premia. This makes (11) considerably more parsimonious than (8)-(9). On the other hand, while (8)-(9) simply allows the variances and covariances of the shocks in $\eta_{t+1}$ to be a function of a latent Markov state variable, (11) incorporates a true (log-)stochastic volatility model for excess market returns, which is clearly richer and more sophisticated.16 Additionally, (11) implies that estimation has to be based exclusively on portfolio returns, while (8)-(9) allows us to use the predictors in $z_t$ to affect both the market risk premium and the identification of the current Markov state. This is rather general, may provide a further misspecification test of the mixture CAPM, and is useful to connect the Markov state variable, $S_{t+1}$, to business cycle conditions.

One additional difference between (11) and (8)-(9) obviously concerns the assumed dynamics for the

quantity of risk, βi,t. In (11) βi,t follows a continuous Gaussian AR(1) process; in our model, the quantity

16 However, while our MS model may roughly capture the evidence of time-varying idiosyncratic volatility, the Gaussian AR(1) time-varying beta model with constant $\sigma_i$ does not. Campbell et al. (2001) show that idiosyncratic volatility trended up noticeably for individual stocks over the 1990s. Notice, however, that incorporating continuous, time-varying idiosyncratic volatility would introduce a difficult identification problem between time-varying risk factors and prices, on the one hand, and idiosyncratic risk, on the other. Therefore, it seems that even a rough MS approach may be sufficient to capture the most important empirical features of the data.


of risk and the exposure of each asset to it, via the conditional moments $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^j \mid \mathcal{F}_t]$ ($j = 1, 2, 3$) and $\mathrm{Cov}[R^i_{t+1}, R^W_{t+1} \mid \mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$, follow a first-order, discrete Markov process. Clearly, an AR(1) process is ideal to capture slow, highly persistent variation, while a Markov process accommodates more abrupt, sudden changes. Ang and Chen (2007) admit to a strong prior that their conditional betas ought to vary slowly over time. However, Ang and Chen also expect that the shocks hitting the betas ($\varepsilon^{\beta i}_{t+1}$) may be quite variable, so that $\sigma_{\beta i}$ could be large. Additionally, notice that while the risk premia $\lambda_{j,S_{t+1}}$ ($j = 2, 3, 4$) and $\lambda^-_{2,S_{t+1}}$ may only take a finite number $K$ of possible values, the risk quantities $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^j \mid \mathcal{F}_t]$ ($j = 1, 2, 3$) and $\mathrm{Cov}[R^i_{t+1}, R^W_{t+1} \mid \mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}]$ change as a function of all the information collected in $\mathcal{F}_t$ (in particular, the current set of perceived state probabilities), so that if the Markov states are persistent, the risk factors in (8)-(9) may display levels of persistence similar to those directly modeled by Ang and Chen (2007) in (11). We therefore adopt (11) as a first, natural benchmark.
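The persistence argument can be made concrete with a standard property of a two-state Markov chain with staying probabilities $p_{11}$ and $p_{22}$: the regime indicator has first-order autocorrelation $p_{11} + p_{22} - 1$, so a coefficient that switches between two values inherits the persistence of an AR(1) with that autoregressive root. The sketch below checks this by simulation; the transition probabilities and the two beta values are hypothetical and are not estimates from this paper.

```python
import numpy as np


def ms_autocorrelation(p11: float, p22: float) -> float:
    """Lag-1 autocorrelation of a stationary two-state Markov chain indicator."""
    return p11 + p22 - 1.0


def simulate_ms_beta(p11, p22, betas, n, seed=0):
    """Simulate a coefficient equal to betas[0] in state 1 and betas[1] in state 2."""
    rng = np.random.default_rng(seed)
    stay = (p11, p22)                       # probability of remaining in each state
    pi1 = (1.0 - p22) / (2.0 - p11 - p22)   # ergodic probability of state 1
    state = 0 if rng.random() < pi1 else 1
    path = np.empty(n)
    for t in range(n):
        path[t] = betas[state]
        if rng.random() >= stay[state]:     # switch regime with probability 1 - p_ss
            state = 1 - state
    return path


def sample_autocorr(x, lag=1):
    """Sample autocorrelation at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


if __name__ == "__main__":
    p11, p22 = 0.95, 0.90                   # hypothetical staying probabilities
    path = simulate_ms_beta(p11, p22, betas=(0.8, 1.3), n=200_000)
    print("AR(1) root implied by the chain:", ms_autocorrelation(p11, p22))
    print("sample lag-1 autocorrelation:  ", round(sample_autocorr(path), 3))
```

With persistent regimes ($p_{11}$ and $p_{22}$ close to one), a regime-switching beta can therefore look as slowly moving as a highly persistent AR(1), even though each individual change is abrupt.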

Post and Vliet (2005) resort to a much simpler framework (to side-step the task of estimating the dynamics of the pricing kernel from portfolio returns data) but, similarly to our paper, they directly posit that the pricing kernel has a simple structure given by $M_{t+1} = g_{0,t} + g'_{1,t}\min\{R^W_{t+1}, 1\}$, in which

$$
g_{0,t} = b_{00} + b_{01} z_t, \qquad g'_{1,t} = b_{10} + b_{11} z_t,
$$

where $z_t$ is identified with the dividend yield (but they show that this choice is relatively unimportant).

Clearly, their framework corresponds to a case of (1) in which $g_1 = 0$ (so that only downside covariance risk matters), $g_2 = g_3 = 0$ (hence $\lambda_{3,t} = \lambda_{4,t} = 0$), and $R^W_{B,t+1} = 1$, so that the downside event simply corresponds to a negative net return on the wealth portfolio. Post and Vliet's framework can be seen as an application of the approach first proposed by Dumas and Solnik (1995) and Cochrane (1996), who treat the time-varying coefficients in representations of the pricing kernel as simple linear functions of time-$t$ information variables, $g_{j,t} = b_j' z_t$ ($j = 1, 2, 3$), where the $Z \times 1$ vector $z_t$ collects standard predictors/instrumental variables used in the empirical finance literature. Dittmar (2002) also parameterizes the pricing kernel as a function of a set of exogenous instruments with predictive power and, in an application to a higher-order CAPM, imposes sign restrictions by setting

$$
g_{j,t} = (-1)^j \left(b_j' z_t\right)^2 .
$$

Because $\lambda_{j,t} = -R^f_t\, g_{j-1,t}$, this implies $\lambda_{j,t} = -(-1)^{j-1}\left(b_{j-1}' z_t\right)^2 R^f_t$ ($j = 2, 3, 4$), i.e., restrictions on the risk premia of sign opposite to those on the $g_{j,t}$s.
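The two parameterizations just described can be sketched as follows. The function names, instrument values, and loadings are hypothetical. The sketch only evaluates the Post and Vliet kernel and verifies the sign pattern implied by Dittmar's squared specification ($g_{1,t} < 0$, $g_{2,t} > 0$, $g_{3,t} < 0$) together with the mapping $\lambda_{j,t} = -R^f_t\, g_{j-1,t}$.

```python
import numpy as np


def post_vliet_kernel(r_w, z, b0, b1):
    """M = g0 + g1' * min(R^W, 1), with g0 = b0[0] + b0[1]*z and g1' = b1[0] + b1[1]*z."""
    g0 = b0[0] + b0[1] * z
    g1_down = b1[0] + b1[1] * z
    return g0 + g1_down * np.minimum(r_w, 1.0)


def dittmar_coefficients(z, b):
    """g_j = (-1)**j * (b_j' z)**2 for j = 1, 2, 3; b is a (3, Z) array of hypothetical loadings."""
    z = np.asarray(z, dtype=float)
    return np.array([(-1) ** j * float(b[j - 1] @ z) ** 2 for j in (1, 2, 3)])


def risk_premia(g, rf):
    """(lambda_2, lambda_3, lambda_4) = -R^f * (g_1, g_2, g_3)."""
    return -rf * np.asarray(g)


if __name__ == "__main__":
    z_t = np.array([1.0, 3.5, 1.2, 0.9])     # hypothetical [constant, div. yield, term, default]
    b = np.array([[0.2, -0.1, 0.3, 0.1],
                  [0.1, 0.05, -0.2, 0.4],
                  [-0.3, 0.02, 0.1, 0.2]])
    g = dittmar_coefficients(z_t, b)
    print("g_1, g_2, g_3:", g)               # signs: -, +, -
    print("lambda_2, _3, _4:", risk_premia(g, rf=1.003))  # signs: +, -, +
    print("Post-Vliet M:", post_vliet_kernel(np.array([0.9, 1.1]), 3.5, (1.5, 0.01), (-0.5, -0.01)))
```

Squaring the linear index is what guarantees the signs for every realization of $z_t$, at the cost of making the coefficients insensitive to the sign of $b_j' z_t$.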

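In our paper we employ a second benchmark econometric framework, similar to Dittmar (2002) and Post and Vliet (2005), in which

$$
g_{1,t} = b_1' z_t, \qquad g'_{1,t} = \bar{b}_1' z_t
$$

(therefore $\lambda_{2,t+1} = -R^f_t\,(g_1 + b_1'\bar{B}\bar{z}_t)$ and $\lambda^-_{2,t+1} = -R^f_t\,(g'_1 + \bar{b}_1'\bar{B}\bar{z}_t)$, where $\bar{B} \equiv [\iota_3\ B]$ and $\bar{z}_t \equiv [1\ z_t']'$), $g_{2,t} = g_{3,t} = 0$, and there is one single state:

$$
\begin{aligned}
x^i_{t+1} &= \alpha_i + \lambda_{2,t+1}\,\mathrm{Cov}[R^i_{t+1}, R^W_{t+1} \mid \mathcal{F}_t] + \lambda^-_{2,t+1}\,\mathrm{Cov}[R^i_{t+1}, R^W_{t+1} \mid \mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] + \eta^i_{t+1} \\
x^W_{t+1} &= \alpha_W + \lambda_{2,t+1}\,\mathrm{Var}[R^W_{t+1} \mid \mathcal{F}_t] + \lambda^-_{2,t+1}\,\mathrm{Var}[x^W_{t+1} \mid \mathcal{F}_t, R^W_{t+1} < R^W_{B,t+1}] + c_W' z_t + \eta^W_{t+1} \\
z_{t+1} &= \mu + B z_t + \eta^Z_{t+1}
\end{aligned}
\tag{12}
$$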


Here $\eta_{t+1} \sim N(0, \Omega)$ and $z_t$ includes the same variables used in our MS model, i.e., the dividend yield, the term spread, and the default spread. Notice that, because $z_t$ forecasts excess returns on the “market” portfolio, the conditional variance and partial variances, as well as the covariances and partial covariances, in (12) will be time-varying.17 In the following, we refer to (12) as the single-state model in which risk premia are driven by standard predictors of business cycle conditions, or SSBC (single-state business cycle) for short.
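One channel behind this time variation can be illustrated with a textbook result: if the conditional distribution of the market return is normal, the partial (downside) variance $\mathrm{Var}[R \mid R < R_B]$ depends on the conditional mean even when the conditional variance is held fixed. The sketch below assumes normality, a fixed volatility, and a threshold $R_B$ chosen purely for illustration; it is not the closed-form comoment machinery described in the paper's Appendix.

```python
import math


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def downside_variance(mu: float, sigma: float, threshold: float) -> float:
    """Var[R | R < threshold] for R ~ N(mu, sigma^2), i.e. an upper-truncated normal."""
    a = (threshold - mu) / sigma
    mills = _pdf(a) / _cdf(a)             # inverse Mills ratio for upper truncation
    return sigma ** 2 * (1.0 - a * mills - mills ** 2)


if __name__ == "__main__":
    sigma, r_b = 0.054, 1.0               # hypothetical monthly volatility and gross-return threshold
    for mu in (0.99, 1.00, 1.01):         # hypothetical conditional means of the gross return
        print(mu, sigma ** 2, downside_variance(mu, sigma, r_b))
```

As the conditional mean moves with the predictors, the full variance stays at $\sigma^2$ while the downside variance changes, which is why downside comoments inherit time variation from $z_t$.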

3.3. Economically Admissible Pricing Kernels

Similarly to previous papers (see, e.g., Dittmar, 2002, and Post and Vliet, 2005), we ask whether, or rather under what additional restrictions, (1) can represent an admissible pricing kernel, where “admissible” means consistent with basic postulates on individual behavior such as non-satiation (i.e., positive marginal utility), weak risk aversion (non-increasing marginal utility), and non-increasing absolute risk aversion (convex or linear marginal utility). The logic is that in equilibrium asset pricing models with standard preferences, the pricing kernel $M_{t+1}$ ought to correspond (or be proportional) to the intertemporal marginal rate of substitution (IMRS) in consumption (see Hansen and Jagannathan, 1991), i.e.,

$$
M_{t+1} \propto \frac{U'(C_{t+1})}{U'(C_t)}.
$$

Because $U'(C_t) \in \mathcal{F}_t$, the basic properties of $M_{t+1}$ conditional on $\mathcal{F}_t$ ought to follow from those characterizing $U'(C_{t+1})$, while, similarly to Dittmar (2002), (1) can be read as a Taylor series expansion of the IMRS. Therefore, non-satiation is sufficient for $M_{t+1} > 0$ everywhere; weak risk aversion is sufficient for $M_{t+1}$ to be non-increasing; and non-increasing absolute risk aversion implies either a linear or a convex $M_{t+1}$.
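As a worked example of these three properties (a standard textbook case, not an assumption of this paper), take power utility $U(C) = C^{1-\gamma}/(1-\gamma)$ with $\gamma > 0$, $\gamma \neq 1$ (log utility is the limit $\gamma \to 1$). Its derivatives and coefficient of absolute risk aversion $A(C)$ are

$$
U'(C) = C^{-\gamma} > 0, \qquad U''(C) = -\gamma\,C^{-\gamma-1} < 0, \qquad U'''(C) = \gamma(\gamma+1)\,C^{-\gamma-2} > 0,
$$

$$
A(C) \equiv -\frac{U''(C)}{U'(C)} = \frac{\gamma}{C}, \qquad A'(C) = -\frac{\gamma}{C^{2}} < 0,
$$

so marginal utility is positive, decreasing, and convex, and absolute risk aversion is decreasing. The implied kernel $M_{t+1} = \delta\,(C_{t+1}/C_t)^{-\gamma}$, with subjective discount factor $\delta$, is therefore positive, decreasing, and convex in $C_{t+1}$, i.e., it satisfies all three admissibility conditions listed above.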

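At this point, notice that because $R^W_{t+1}$ is a gross return and therefore $R^W_{t+1} > 0$, the signs of $M_{t+1}$ and of its derivatives,

$$
M'_{t+1} =
\begin{cases}
g_{1,t} + g'_{1,t} + g_{2,t} R^W_{t+1} + g_{3,t} \left(R^W_{t+1}\right)^2 & \text{if } R^W_{t+1} < R^W_{B,t+1} \\
g_{1,t} + g_{2,t} R^W_{t+1} + g_{3,t} \left(R^W_{t+1}\right)^2 & \text{if } R^W_{t+1} \geq R^W_{B,t+1}
\end{cases}
\qquad
M''_{t+1} = g_{2,t} + g_{3,t} R^W_{t+1},
$$

only depend on the values of the coefficients, recalling that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$.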

Although these values are in principle unknown, exploiting the fact that $\lambda_{j,t} = -R^f_t\, g_{j-1,t}$ ($j = 2, 3, 4$) and $\lambda^-_{2,t} = -R^f_t\, g'_{1,t}$, they can easily be recovered $\forall t$ from the estimated risk premia, i.e., $g_{j-1,t} = -\lambda_{j,t}/R^f_t$ and $g'_{1,t} = -\lambda^-_{2,t}/R^f_t$. Of course, in the absence of adequate restrictions, these estimated values of the time-varying coefficients characterizing the EPK may imply that there exists a subset $\mathcal{T}$ of time periods such that, for $t \in \mathcal{T}$, $M_{t+1} < 0$ (which opens the door to arbitrage opportunities caused by negative state prices), $M'_{t+1} > 0$ (which implies risk-loving behavior), or $M''_{t+1} < 0$ (which implies increasing absolute risk aversion). We therefore estimate (8)-(9), as well as all other models described in Section 3, under three alternative sets of restrictions:

1. We initially estimate our models without imposing any economic structure. This delivers unconstrained estimates for the $\lambda_{j,t+1}$s ($j = 2, 3, 4$) and $\lambda^-_{2,t+1}$, and therefore for $g_{j,t}$ ($j = 1, 2, 3$) and $g'_{1,t}$.

17 When $c_W = 0$, it is clear that variances and covariances will all be constant and simply derive from the elements of $\Omega$.


These estimated values of the time-varying coefficients characterizing the pricing kernel may imply the existence of time periods when $M_{t+1} < 0$, $M'_{t+1} > 0$, or $M''_{t+1} < 0$. Clearly, the first two violations are particularly significant. When arbitrage opportunities exist, the standard portfolio/consumption optimization problem that leads to the first-order Euler condition in (3) fails to characterize the optimum, which makes the asset pricing model we have derived meaningless. If the representative agent is not globally risk-averse, the Euler condition in (3) is only a necessary condition for the investor's optimum (i.e., it may identify a minimum rather than a maximum). Finally, a positive, decreasing, but concave pricing kernel contradicts much empirical and experimental evidence that investors display constant or decreasing absolute risk aversion.

2. We then estimate all of our models imposing a rather weak set of economic restrictions: (i) $M_{t+1} > 0$ at all times in the exercise; (ii) at all times

$$
E[M_{t+1} \mid \mathcal{F}_t] = \frac{1}{R^f_t},
$$

as implied by (3) for a conditionally riskless asset; (iii) $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ $\forall t$, which, because $\lambda_{j,t} = -R^f_t\, g_{j-1,t}$ ($j = 2, 3, 4$) and $\lambda^-_{2,t} = -R^f_t\, g'_{1,t}$, map into $\lambda_{2,t} > 0$, $\lambda^-_{2,t} > 0$, $\lambda_{3,t} < 0$, and $\lambda_{4,t} > 0$ $\forall t$.

Dittmar (2002) calls the set of conditions in (iii) local restrictions implied by preference theory. Also notice that condition (ii) is essentially void of implications if (i) is not simultaneously imposed, as one could otherwise simply “use” the time-varying constant $g_{0,t}$ to set the conditional expectation of the kernel equal to the inverse of the conditionally riskless gross interest rate. However, when $g_{0,t}$ is also jointly restricted by the condition $M_{t+1} > 0$, (ii) may “bite” on the data and influence the estimation outcome.18

3. Finally, we impose the full economic structure derived/emphasized in this sub-section, i.e.: (i) $M_{t+1} > 0$ $\forall t$; (ii) $E[M_{t+1} \mid \mathcal{F}_t] = 1/R^f_t$ $\forall t$; (iii) $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ $\forall t$, which, because $\lambda_{j,t} = -R^f_t\, g_{j-1,t}$ ($j = 2, 3, 4$) and $\lambda^-_{2,t} = -R^f_t\, g'_{1,t}$, map into $\lambda_{2,t} > 0$, $\lambda^-_{2,t} > 0$, $\lambda_{3,t} < 0$, and $\lambda_{4,t} > 0$ $\forall t$; (iv) $M'_{t+1} \leq 0$ $\forall t$; (v) $M''_{t+1} \geq 0$ $\forall t$. Dittmar (2002) calls restrictions such as (i), (iv), and (v) global restrictions implied by preference theory.
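A minimal sketch of how restrictions (i), (iii), (iv), and (v) could be checked for a single period, given estimated risk premia. It recovers the coefficients through $g_{j-1,t} = -\lambda_{j,t}/R^f_t$ and $g'_{1,t} = -\lambda^-_{2,t}/R^f_t$ and evaluates the kernel on a grid of gross market returns. Two elements are assumptions made only for illustration. First, the kernel is taken to have the hypothetical cubic form $M = g_0 + g_1 R + g'_1 \min(R, R_B) + g_2 R^2 + g_3 R^3$, so its derivatives carry the constants 2, 3, and 6, which need not match the scaling of the higher-order terms in (1). Second, the intercept $g_0$ and the grid are supplied by the user rather than pinned down by restriction (ii).

```python
import numpy as np


def coefficients_from_premia(lam2, lam2_down, lam3, lam4, rf):
    """Invert lambda_j = -R^f g_{j-1} (j = 2, 3, 4) and lambda^-_2 = -R^f g'_1."""
    return {"g1": -lam2 / rf, "g1_down": -lam2_down / rf,
            "g2": -lam3 / rf, "g3": -lam4 / rf}


def kernel_and_derivatives(r, g0, g, r_b):
    """HYPOTHETICAL cubic kernel M = g0 + g1 R + g1' min(R, R_B) + g2 R^2 + g3 R^3.

    Returns M, dM/dR and d2M/dR2 on the grid r (the kink at R_B is ignored in d2M/dR2).
    """
    r = np.asarray(r, dtype=float)
    below = (r < r_b).astype(float)
    m = (g0 + g["g1"] * r + g["g1_down"] * np.minimum(r, r_b)
         + g["g2"] * r**2 + g["g3"] * r**3)
    m1 = g["g1"] + g["g1_down"] * below + 2.0 * g["g2"] * r + 3.0 * g["g3"] * r**2
    m2 = 2.0 * g["g2"] + 6.0 * g["g3"] * r
    return m, m1, m2


def check_restrictions(g0, g, r_b, grid):
    """Local sign restrictions (iii) and global restrictions (i), (iv), (v) on a grid."""
    m, m1, m2 = kernel_and_derivatives(grid, g0, g, r_b)
    return {
        "local_signs": g["g1"] < 0 and g["g1_down"] < 0 and g["g2"] > 0 and g["g3"] < 0,
        "positive": bool(np.all(m > 0)),          # (i)  M > 0
        "non_increasing": bool(np.all(m1 <= 0)),  # (iv) M' <= 0
        "convex": bool(np.all(m2 >= 0)),          # (v)  M'' >= 0
    }


if __name__ == "__main__":
    grid = np.linspace(0.5, 1.5, 201)             # hypothetical range of gross market returns
    g = coefficients_from_premia(1.003, 0.5015, -0.2006, 0.01003, rf=1.003)
    print(check_restrictions(3.0, g, r_b=1.0, grid=grid))
```

Note that the local sign restrictions can hold while a global one fails: a large enough $g_{2,t} > 0$ makes $M'_{t+1}$ positive at high market returns.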

3.4. Estimation Strategy

Our estimation strategy is similar to Guidolin and Timmermann (2008a,b). Here we briefly describe a few of the technical issues involved, with reference to the two-state MS 4MOM CAPM in (8)-(9). An Appendix details how the conditional comoments implied by MS can be computed in closed form following results in Guidolin and Timmermann (2004, 2008a), which greatly simplifies the iterated (simulated) MLE used in this paper. The key difficulty with the econometric framework in (8)-(9) is that conditional (higher) comoments driven by the MS parameters appear in the conditional mean function that defines the residual vector $\eta_{t+1} \equiv [\eta^1_{t+1} \ldots \eta^N_{t+1}\ \eta^W_{t+1}\ (\eta^Z_{t+1})']'$, which the conditional (higher) comoments

18Dahlquist and Soderlind (1999) show that failure to impose this restriction can result in estimation of an admissible pricing

kernel that implies a mean-variance tangency portfolio that is not on the efficient frontier.


themselves depend on. However, apart from the fact that the conditional covariances in the model depend on the MS parameters in highly non-linear ways, (8)-(9) is qualitatively similar to a popular family of nonlinear models that are also frequently estimated by MLE, namely ARCH-in-mean models, in which the conditional mean function of some variable $y_t$ depends on the filtered conditional variance of $y$ at time $t$, $h^y_t$, which is a simple, basic form of second comoment (the conditional variance of $y$ is the conditional covariance of $y$ with itself). What makes the estimation of (8)-(9) tricky and interesting at the same time is that, while in an ARCH-in-mean model the conditional variance function depends on the residual from the conditional mean function of $y$ in a very simple way, this is not the case for (8)-(9), and the dynamic system defined by the model needs to be simulated.
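For comparison, the sketch below simulates a GARCH(1,1)-in-mean process, a standard member of the ARCH-in-mean family (chosen here for illustration; all parameter values are hypothetical). It shows the simple feedback the text refers to: the conditional variance $h_t$ follows a closed-form recursion in the lagged residual and enters the conditional mean linearly.

```python
import numpy as np


def simulate_garch_m(n, mu=0.005, delta=2.0, omega=1e-5, alpha=0.08, beta=0.90, seed=0):
    """Simulate a GARCH(1,1)-in-mean process (hypothetical parameter values).

    y_t = mu + delta * h_t + e_t,  e_t = sqrt(h_t) * z_t,  z_t ~ N(0, 1),
    h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}.
    """
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    e = np.empty(n)
    y = np.empty(n)
    h[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t in range(n):
        if t > 0:
            # the variance recursion needs only the lagged residual and lagged variance
            h[t] = omega + alpha * e[t - 1] ** 2 + beta * h[t - 1]
        e[t] = np.sqrt(h[t]) * rng.standard_normal()
        y[t] = mu + delta * h[t] + e[t]   # the conditional variance enters the mean
    return y, h


if __name__ == "__main__":
    y, h = simulate_garch_m(1000)
    print(y[:5], h[:5])
```

Because $h_t$ can be rebuilt exactly from past residuals, a GARCH-in-mean likelihood can be evaluated in a single forward pass; the comoments in (8)-(9) admit no such one-line recursion, which motivates the iterated scheme described next in the text.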

In practice, the estimation is performed by fixing some initial parameter vector $\theta^0$ (inclusive of the estimable elements of the transition probability matrix $P$) and taking the initial value of the predictors ($z_0$) and of the relevant comoments ($\mathrm{Cov}_0[R^i_{t+1}, R^W_{t+1}]$, $\mathrm{Cov}_0[R^i_{t+1}, (R^W_{t+1})^2]$, etc.) as given. One also needs some initial, starting value for the full-sample path of the regime probabilities, $\{\pi^0_t\}_{t=0}^T$, where $\pi^0_t$ is a $K \times 1$ vector such that $\iota_K'\pi^0_t = 1$. Experience shows that (whenever possible) setting $\theta^0$ to the unconstrained MS estimates, $\{\pi^0_t\}_{t=0}^T$ to the vectors of filtered probabilities that one can derive from a simple MS time series model (as in Guidolin and Timmermann, 2008b) for a given $K$, and the initial comoments to their full-sample unconditional estimates works well in terms of speed of convergence and stability of the algorithm. Using $\theta^0$ and $\{\pi^0_t\}_{t=0}^T$, one can compute the full-sample path of conditional comoments ($\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}; \theta^0, \{\pi^0_t\}_{t=0}^T]$, $\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^2; \theta^0, \{\pi^0_t\}_{t=0}^T]$, etc., for $t = 1, \ldots, T$) using the actual values of the predictors (when relevant) and the fixed starting values $\theta^0$ and $\{\pi^0_t\}_{t=0}^T$ as inputs to the closed-form expressions reported in the Appendix. At this point, $\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}; \theta^0, \{\pi^0_t\}_{t=0}^T]$, $\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^2; \theta^0, \{\pi^0_t\}_{t=0}^T]$, etc., for $t = 1, \ldots, T$ can be taken as given, and standard MLE routines can be deployed to obtain a first update of the parameter vector, $\theta^1$, and of the filtered probabilities, $\{\pi^1_t\}_{t=0}^T$ (using standard Hamilton-Kim type filters). Next, $\theta^1$ and $\{\pi^1_t\}_{t=0}^T$ are taken as the new initial “conditions” to obtain new sequences of predicted comoments $\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}; \theta^1, \{\pi^1_t\}_{t=0}^T]$, $\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^2; \theta^1, \{\pi^1_t\}_{t=0}^T]$, etc., for $t = 1, \ldots, T$. One can then iterate on the algorithm until a pre-set convergence criterion is satisfied.19 Within each step in which the comoments $\mathrm{Cov}_t[R^i_{t+1}, R^W_{t+1}; \theta^j, \{\pi^j_t\}_{t=0}^T]$, $\mathrm{Cov}_t[R^i_{t+1}, (R^W_{t+1})^2; \theta^j, \{\pi^j_t\}_{t=0}^T]$, etc., for $t = 1, \ldots, T$ are taken as given (i.e., as if they were predetermined regressors), the algorithm features standard ML estimation of an MS model with switching regression coefficients (see Hamilton, 1993). What is special about this estimation algorithm is the step in which, for given parameter estimates and filtered probabilities from the previous step, the conditional comoments have to be computed to yield the updated matrix of regressors appearing in the conditional mean function.
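The outer loop just described, together with the stopping rule in footnote 19, can be sketched as follows. The callable `update_step` is a hypothetical placeholder for one pass of "compute the comoments from the current estimates, then re-estimate by ML"; in the usage block it is replaced by a toy contraction so that the sketch runs. Following footnote 19, convergence requires both the Euclidean distance between successive parameter vectors and the sum of the Euclidean distances between the first $K-1$ columns of the $T \times K$ regime-probability matrices to fall below a tolerance, which is left as a parameter here.

```python
import numpy as np


def has_converged(theta_new, theta_old, pi_new, pi_old, tol):
    """Stopping rule in the spirit of footnote 19.

    pi_* are T x K arrays of regime probabilities; only the first K-1 columns are
    compared, since the last one is implied by the adding-up constraint.
    """
    d_theta = np.linalg.norm(np.asarray(theta_new) - np.asarray(theta_old))
    diff = np.asarray(pi_new) - np.asarray(pi_old)
    d_pi = np.linalg.norm(diff[:, :-1], axis=0).sum()
    return d_theta < tol and d_pi < tol


def iterate_to_fixed_point(update_step, theta0, pi0, tol, max_iter=500):
    """Repeat (comoments -> ML re-estimation) until parameters and probabilities stop moving.

    update_step(theta, pi) -> (theta_new, pi_new) is a HYPOTHETICAL callable standing in
    for one pass of: compute conditional comoments given (theta, pi), then re-estimate by ML.
    """
    theta = np.asarray(theta0, dtype=float)
    pi = np.asarray(pi0, dtype=float)
    for j in range(1, max_iter + 1):
        theta_new, pi_new = update_step(theta, pi)
        if has_converged(theta_new, theta, pi_new, pi, tol):
            return theta_new, pi_new, j
        theta, pi = theta_new, pi_new
    raise RuntimeError("no fixed point reached within max_iter iterations")


if __name__ == "__main__":
    # Toy contraction standing in for the real update (purely illustrative).
    target = np.array([0.5, -0.2, 1.0])
    T, K = 50, 2
    p1 = np.linspace(0.1, 0.9, T)
    pi_target = np.column_stack([p1, 1.0 - p1])

    def toy_step(theta, pi):
        return 0.5 * (theta + target), 0.5 * (pi + pi_target)

    theta_hat, pi_hat, n_iter = iterate_to_fixed_point(
        toy_step, np.zeros(3), np.full((T, K), 0.5), tol=1e-3)
    print(n_iter, theta_hat)
```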

As for the standard ML estimation step, letting $y_{t+1} = [x_{t+1}'\ x^W_{t+1}\ z_{t+1}']'$ be a vector of excess returns and predictor variables with intercepts $\mu_{S_{t+1}} = (\alpha_{1S_{t+1}}, \ldots, \alpha_{NS_{t+1}}, \alpha_{WS_{t+1}}, \mu_{zS_{t+1}}')'$, we can collect the conditional

19 Our stopping rule at step $j \geq 1$ requires that the Euclidean distance between $\theta^j$ and $\theta^{j-1}$, as well as the arithmetic sum of the $K-1$ Euclidean distances between each of the columns of $\{\pi^j_t\}_{t=0}^T$ and the corresponding column of $\{\pi^{j-1}_t\}_{t=0}^T$, be less than $e^{-3}$. This means that convergence is achieved if and only if a fixed point in the space of conditional comoments and parameter/filtered probability estimates has been found.


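moments of returns and the world price of comoment risk in the matrices $M_{S_t}$ and $\Lambda_{S_{t+1}}$ as follows:

$$
M_{S_t} \equiv \left( \begin{bmatrix} \begin{bmatrix} \mathrm{Cov}[x_{t+1}, x^W_{t+1} \mid \mathcal{F}_t] & \mathrm{Cov}[x_{t+1}, (x^W_{t+1})^2 \mid \mathcal{F}_t] & \mathrm{Cov}[x_{t+1}, (x^W_{t+1})^3 \mid \mathcal{F}_t] \\ \mathrm{Var}[x^W_{t+1} \mid \mathcal{F}_t] & \mathrm{Sk}[x^W_{t+1} \mid \mathcal{F}_t] & \mathrm{K}[x^W_{t+1} \mid \mathcal{F}_t] \end{bmatrix} \\ O \end{bmatrix} \otimes \iota_3' \right) \odot \left( \iota_3' \otimes I \right),
$$

$$
\Lambda_{S_{t+1}} \equiv \begin{bmatrix} \lambda_{2,S_{t+1}} & \cdots & \lambda_{2,S_{t+1}} & \lambda_{2,S_{t+1}} & 0 & \cdots & 0 \\ \lambda_{3,S_{t+1}} & \cdots & \lambda_{3,S_{t+1}} & \lambda_{3,S_{t+1}} & 0 & \cdots & 0 \\ \lambda_{4,S_{t+1}} & \cdots & \lambda_{4,S_{t+1}} & \lambda_{4,S_{t+1}} & 0 & \cdots & 0 \end{bmatrix},
$$

where $\iota_3$ is a $3 \times 1$ vector of ones and $J$ is a matrix that selects the comoments of excess returns (see Guidolin and Timmermann, 2008a). We can then write the asset pricing model (5) more compactly as

$$
y_{t+1} = \mu_{S_{t+1}} + M_{S_t}\,\mathrm{vec}(\Lambda_{S_{t+1}}) + B_{S_{t+1}}\,y_t + \eta_{t+1}. \tag{13}
$$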

Here $B_{S_{t+1}}$ captures autoregressive terms in state $S_{t+1}$ and also collects the coefficients $c_{WS_{t+1}}$ that measure the impact of the lagged instruments $z_t$ on the market risk premium. Finally, $\eta_{t+1} \sim N(0, \Omega_{S_{t+1}})$ is the vector of state-dependent innovations. At this point, if $\eta_{jS_{t+1}}$ is the vector of residuals in state $S_{t+1}$, the contribution to the log-likelihood function conditional on being in state $S_{t+1}$ at time $t+1$ is given (up to an additive constant) by:

$$
\ln p(y_{t+1} \mid \mathcal{F}_t, S_{t+1}; \theta_j) = \text{const} - \frac{1}{2}\ln\left|\Omega_{jS_{t+1}}\right| - \frac{1}{2}\,\eta_{jS_{t+1}}'\,\Omega_{jS_{t+1}}^{-1}\,\eta_{jS_{t+1}},
$$

where $\theta_j$ collects the mean ($\phi$), variance ($\Omega$), and transition probability ($P$) parameters of the model. The expected value of the log-likelihood employed by the EM algorithm is maximized by choosing the parameters $\theta_{j+1}$ in the $(j+1)$th iteration to satisfy (see Hamilton, 1990, p. 51):

$$
\sum_{t=1}^{T} \sum_{S_{t+1}=1}^{K} \left. \frac{\partial \ln p(y_{t+1} \mid \mathcal{F}_t, S_{t+1}; \theta)}{\partial \theta} \right|_{\theta = \theta_{j+1}} p(S_{t+1} \mid y_2, y_3, \ldots, y_T; \theta_j) = 0,
$$

where $\{p(S_{t+1} \mid y_2, y_3, \ldots, y_T; \theta_j)\}_{S_{t+1}=1}^{K}$ are the smoothed state probabilities for each of the $K$ states. Letting $y \equiv [y_2'\ y_3' \ldots y_T']'$ and $\eta \equiv [\eta_1'\ \eta_2' \ldots \eta_K']'$, it is useful to rewrite the log-likelihood as:

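$$
\begin{aligned}
\ln L(y_1, \ldots, y_T \mid \delta) &\propto -\frac{1}{2}\sum_{s=1}^{K} \ln|\Omega_s| \sum_{t=2}^{T} p(S_{t+1} = s; \theta_j) - \frac{1}{2}\sum_{s=1}^{K} \eta_s'(\Sigma_s \otimes \Omega_s^{-1})\eta_s \\
&= -\frac{1}{2}\sum_{s=1}^{K} \ln|\Omega_s| \sum_{t=2}^{T} p(S_{t+1} = s; \theta_j) - \frac{1}{2}\,\eta' W^{-1} \eta,
\end{aligned}
$$

where

$$
Z \equiv \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_K \end{bmatrix}, \qquad
Z_i \equiv \begin{bmatrix} [e_i'\ \ e_i' \otimes y_1'] \otimes I_N \\ [e_i'\ \ e_i' \otimes y_2'] \otimes I_N \\ \vdots \\ [e_i'\ \ e_i' \otimes y_{T-1}'] \otimes I_N \end{bmatrix},
$$

$$
W^{-1} \equiv \begin{bmatrix} \Sigma_1 \otimes \Omega_1^{-1} & \cdots & O \\ \vdots & \ddots & \vdots \\ O & \cdots & \Sigma_K \otimes \Omega_K^{-1} \end{bmatrix}, \qquad
\Sigma_i \equiv \mathrm{diag}\{p(s_2 = i; \theta_j), p(s_3 = i; \theta_j), \ldots, p(s_T = i; \theta_j)\}.
$$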
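The smoothed probabilities that populate $\Sigma_i$ come from a forward filter and a backward smoother. The sketch below is a generic Hamilton filter and Kim smoother for a $K$-state chain with a fixed transition matrix $P$. It takes as input a $T \times K$ array of state-conditional densities $p(y_t \mid \mathcal{F}_{t-1}, S_t = k)$, which in the model would come from the Gaussian densities above. The input values, the row convention $P_{ik} = \Pr(S_t = k \mid S_{t-1} = i)$, and the initialization at the ergodic distribution are assumptions of this illustration.

```python
import numpy as np


def ergodic_probs(P):
    """Stationary distribution of a row-stochastic matrix P (P[i, k] = Pr(S_t = k | S_{t-1} = i))."""
    K = P.shape[0]
    A = np.vstack([P.T - np.eye(K), np.ones(K)])
    b = np.concatenate([np.zeros(K), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]


def hamilton_filter(lik, P):
    """Forward pass: filtered Pr(S_t | F_t), predicted Pr(S_t | F_{t-1}) and the log-likelihood.

    lik[t, k] is the density of y_t given F_{t-1} and S_t = k (T x K array).
    """
    T, K = lik.shape
    filtered = np.empty((T, K))
    predicted = np.empty((T, K))
    prior = ergodic_probs(P)              # assumption: start from the ergodic distribution
    loglik = 0.0
    for t in range(T):
        predicted[t] = prior
        joint = prior * lik[t]
        density = joint.sum()
        loglik += np.log(density)
        filtered[t] = joint / density
        prior = filtered[t] @ P           # one-step-ahead state prediction
    return filtered, predicted, loglik


def kim_smoother(filtered, predicted, P):
    """Backward pass: smoothed Pr(S_t | F_T)."""
    T, _ = filtered.shape
    smoothed = np.empty_like(filtered)
    smoothed[-1] = filtered[-1]
    for t in range(T - 2, -1, -1):
        smoothed[t] = filtered[t] * (P @ (smoothed[t + 1] / predicted[t + 1]))
    return smoothed


if __name__ == "__main__":
    P = np.array([[0.9, 0.1], [0.2, 0.8]])                            # hypothetical
    lik = np.array([[1.0, 0.1], [0.8, 0.2], [0.1, 1.0], [0.2, 0.9]])  # hypothetical densities
    f, pr, ll = hamilton_filter(lik, P)
    print(kim_smoother(f, pr, P))
```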


The EM updating equation for the transition probabilities is based on the smoothed state probabilities and can be found in equation (4.1) of Hamilton (1990, p. 51). Filtered state probabilities are calculated as a by-product. The first-order conditions for the mean and variance parameters, $\phi$ and $\Omega$, are:

$$
\frac{\partial \ln L(y_t \mid \delta)}{\partial \phi} = -\frac{1}{2}\,\eta' W^{-1} Z = 0 \tag{14}
$$

$$
\frac{\partial \ln L(y_t \mid \delta)}{\partial \Omega_s} = -\frac{1}{2}\sum_{t=1}^{T} p(S_{t+1} = s; \theta_j)\,\Omega_s^{-1} + \frac{1}{2}\,\Omega_s^{-1}\varepsilon_s'\Sigma_s\varepsilon_s\Omega_s^{-1} = O, \qquad s = 1, 2, \ldots, K, \tag{15}
$$

where $\varepsilon_s \equiv [(y_2 - Z_{s_2=i}\phi)'\ (y_3 - Z_{s_3=i}\phi)' \ldots (y_T - Z_{s_T=i}\phi)']'$ are the residuals in state $s$ and $W^{-1}$ is a function of $\{\Omega_s\}_{s=1}^{K}$. Equation (14) implies that $\phi_{j+1}$ is a GLS estimator once observations are replaced by their smoothed probability-weighted counterparts:

$$
\phi_{j+1} = (Z'W^{-1}Z)^{-1} Z'W^{-1}(\iota_K \otimes y). \tag{16}
$$

Similarly, (15) implies the covariance estimator

$$
\Omega_s = \frac{\varepsilon_s'\Sigma_s\varepsilon_s}{\sum_{t=1}^{T} p(S_{t+1} = s; \theta_j)}. \tag{17}
$$

$\phi_{j+1}$ and $\{\Omega_{s,j+1}\}_{s=1}^{K}$ must be solved jointly, since $\varepsilon_s$ enters the expression for the covariance matrix and also depends on $\phi_{j+1}$, while the regime-dependent covariance matrices $\{\Omega_{s,j+1}\}_{s=1}^{K}$ enter (16) via $W^{-1}$. Hence, within each step of the EM algorithm, (16)-(17) are iterated upon until convergence of the estimates $\phi_{j+1}$ and $\{\Omega_{s,j+1}\}_{s=1}^{K}$.
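A minimal sketch of probability-weighted updates in the spirit of (16)-(17), under a simplifying assumption that is ours rather than the paper's: within each regime, all equations share the same regressors, as in a regime-switching VAR. In that case, by the standard seemingly-unrelated-regressions result, GLS reduces to equation-by-equation weighted least squares with the smoothed probabilities as weights, so the inner iteration between (16) and (17) settles in one pass; with the comoment regressors of (13), which differ across equations, the joint iteration in the text is needed. The transition update follows the usual EM formula based on smoothed joint probabilities $\Pr(S_{t-1} = i, S_t = k \mid \mathcal{F}_T)$, rebuilt here from filtered, predicted, and smoothed probabilities assumed to be available from a filter/smoother pass.

```python
import numpy as np


def weighted_regime_regressions(Y, X, smoothed):
    """Per-regime WLS coefficients and probability-weighted covariance matrices.

    Y: (T, N) dependent variables; X: (T, p) regressors shared by all equations (assumption);
    smoothed: (T, K) smoothed regime probabilities used as weights.
    """
    _, K = smoothed.shape
    coefs, covs = [], []
    for s in range(K):
        w = smoothed[:, s]
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ Y)         # (p, N): X'WX b = X'WY
        resid = Y - X @ beta
        omega = (resid * w[:, None]).T @ resid / w.sum()   # weighted covariance, cf. (17)
        coefs.append(beta)
        covs.append(omega)
    return coefs, covs


def update_transition_matrix(filtered, predicted, smoothed, P):
    """EM update: P_ik = sum_t Pr(S_{t-1}=i, S_t=k | F_T) / sum_t Pr(S_{t-1}=i | F_T)."""
    T, K = filtered.shape
    num = np.zeros((K, K))
    for t in range(1, T):
        # smoothed joint probability of (S_{t-1} = i, S_t = k)
        num += filtered[t - 1][:, None] * P * (smoothed[t] / predicted[t])[None, :]
    return num / num.sum(axis=1, keepdims=True)   # row sums equal sum_t Pr(S_{t-1}=i | F_T)
```

When the smoothed probabilities are degenerate (each date assigned to one regime with certainty), the transition update reduces to observed transition frequencies, which is a convenient sanity check.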

4. The Data

We use CRSP equity data (i.e., value-weighted NYSE and, when available, AMEX and NASDAQ) for the period from January 1927 to March 2008, for a total of 975 observations per series. As stressed by Post and Vliet (2006), when analyzing risk and risk preferences it is particularly important to include periods during which investment risks are high and investors may be conjectured to have been keenly sensitive to risk. For this reason, it appears justified to use extended sample periods that include the prolonged bear markets of the 1930s, the 1970s, and the early 2000s.20 In particular, we collect the series of value-weighted returns on the market portfolio in excess of the 1-month T-bill yield (from Ibbotson Associates) and of returns on Fama and French's (1993) SMB (Small Minus Big) and HML (High Minus Low) portfolios.21 The case of the momentum portfolio (MOMO for short) is a bit more complicated, as six value-weighted portfolios formed on size and prior returns are used. The portfolios, which are formed monthly, are the intersections of two portfolios formed on size and three portfolios formed on prior returns. The monthly size breakpoint is the

20 In particular, we purposefully retain the early 1927-1962 period. Ang and Chen (2007) show that the value premium for the 1926-1963 period can be explained using a conditional CAPM. Fama and French (2006a) concur that the value premium can be captured by the CAPM during the pre-1963 period.

21 SMB measures the average return on the stocks in the lowest tercile of the size distribution minus the average return on the stocks in the highest tercile among the size-sorted portfolios; size is measured as the total market value of firm equity. HML measures the average return on the stocks in the highest 50% of the book-to-market-sorted distribution minus the average return on the stocks in the lowest 50% of the book-to-market-sorted portfolios.


median NYSE market equity. The monthly prior-return breakpoints are the 30th and 70th NYSE percentiles of realized prior returns. MOMO returns are the average return on the two high (above the 70th percentile) prior-return portfolios minus the average return on the two low (below the 30th percentile) prior-return portfolios, independently of size. This methodology captures the standard logic of measuring the returns of past “winners” minus past “losers”.22
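The 2 × 3 momentum sort just described can be sketched for a single formation month as follows. The inputs (market equity, prior returns, next-month returns, and an NYSE flag) are hypothetical arrays. The sketch uses NYSE breakpoints (median size; 30th and 70th prior-return percentiles), value weights within each portfolio, and MOMO = ½(small high + big high) − ½(small low + big low). The treatment of ties at the breakpoints is our assumption, and the data screens described in footnote 22 are omitted.

```python
import numpy as np


def momo_return(me, prior_ret, next_ret, is_nyse):
    """One month of the MOMO long-short return from a 2 (size) x 3 (prior return) sort."""
    me, prior_ret, next_ret = (np.asarray(a, dtype=float) for a in (me, prior_ret, next_ret))
    is_nyse = np.asarray(is_nyse, dtype=bool)

    size_bp = np.median(me[is_nyse])                        # NYSE median market equity
    lo_bp, hi_bp = np.percentile(prior_ret[is_nyse], [30, 70])
    small = me <= size_bp                                   # tie handling is an assumption
    losers = prior_ret <= lo_bp
    winners = prior_ret >= hi_bp

    def vw(mask):
        # value-weighted average next-month return within one portfolio
        w = me[mask]
        return float(np.sum(w * next_ret[mask]) / np.sum(w))

    high = 0.5 * (vw(small & winners) + vw(~small & winners))
    low = 0.5 * (vw(small & losers) + vw(~small & losers))
    return high - low
```

Averaging across the two size groups before differencing is what makes the long-short return approximately neutral to size, as the text notes.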

The choice of the instrument set $z_t$ is motivated by two considerations. First, the instruments should consist of variables that are able to predict equity portfolio returns. Second, the choice of instruments should be parsimonious because of power considerations, as argued in Dittmar (2002). Consequently, we use three instrument series, represented by variables that are well known in the literature for their ability to forecast (market) stock returns. The first is the value-weighted CRSP dividend yield series, computed as the ratio between the trailing sum of 12-month dividends and the time $t-1$ monthly price index (multiplied by 100). The second variable is the term spread, computed as the monthly difference between the annualized percentage yield on a constant-maturity 10-year government bond (Treasury note) and the annualized percentage 1-month Treasury bill yield. The third variable is the default spread, constructed as the monthly difference between the annualized Moody's seasoned Baa and Aaa corporate bond yields (in percentage). In the case of the default spread, monthly series are obtained as averages of daily series.23 The interest rate data are obtained from FRED® at the Federal Reserve Bank of St. Louis.
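The three instruments can be sketched as simple transformations of raw series. All inputs are hypothetical arrays. The alignment of the 12-month dividend window (months $t-11$ through $t$, divided by the price index at $t-1$, as the text states) is our reading of the description, and the default spread is computed as Baa minus Aaa so that it is positive.

```python
import numpy as np


def dividend_yield(dividends, price):
    """100 * (sum of dividends over months t-11..t) / price index at t-1; NaN where undefined."""
    dividends = np.asarray(dividends, dtype=float)
    price = np.asarray(price, dtype=float)
    dy = np.full(dividends.shape, np.nan)
    for t in range(11, len(dividends)):
        dy[t] = 100.0 * dividends[t - 11 : t + 1].sum() / price[t - 1]
    return dy


def term_spread(yield_10y, yield_1m):
    """Annualized 10-year Treasury yield minus annualized 1-month T-bill yield (percent)."""
    return np.asarray(yield_10y, dtype=float) - np.asarray(yield_1m, dtype=float)


def monthly_average(values, month_ids):
    """Average daily observations within each month label."""
    values, month_ids = np.asarray(values, dtype=float), np.asarray(month_ids)
    months = np.unique(month_ids)
    return months, np.array([values[month_ids == m].mean() for m in months])


def default_spread(baa_daily, aaa_daily, month_ids):
    """Monthly averages of daily Moody's Baa and Aaa yields, then Baa minus Aaa (percent)."""
    months, baa = monthly_average(baa_daily, month_ids)
    _, aaa = monthly_average(aaa_daily, month_ids)
    return months, baa - aaa
```

Because averaging is linear, taking monthly means before or after differencing the two daily yield series gives the same default spread when both series cover the same days.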

We also obtain series for the 25 Fama and French (F-F, 1993) portfolios and 17 industry portfolios. The 25 F-F portfolios are obtained by applying a double, 5 × 5 sort on both size (measured by market equity) and book-to-market ratio. The size breakpoints for year t are the NYSE market equity quintiles at the end of June of t. The book-to-market ratio for June of year t is the book equity for the last fiscal year end in t − 1 divided by the market value of equity for December of t − 1; the book-to-market ratio breakpoints are also NYSE quintiles. The industry portfolios are obtained by assigning each NYSE, AMEX, and NASDAQ stock to an industry portfolio at the end of June of year t based on its four-digit SIC code at that time; Compustat SIC codes are used when available, and CRSP SIC codes otherwise. Returns are then computed from July of t to June of t + 1. The 25 F-F portfolios and the 17 industry portfolios are used to investigate the “out-of-sample” pricing and forecasting performance of the EPK models to be estimated in Section 5.

The literature shows that there are many competing definitions of value-sorted portfolios and, corre-

spondingly, of value and growth style-portfolios.24 We also assess how successful our competing empirical

models for the pricing kernel are with reference to two alternative portfolio construction methods. First,

22 All of these data are downloaded from Kenneth French's data library, which is at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data Library/. SMB and HML returns are computed with reference to annual sorts that are formed in July of year t and held until June of t + 1; these portfolios include all NYSE, AMEX, and NASDAQ stocks for which market equity data are available for December of t − 1 and June of t, and (positive) book equity data for t − 1. In the case of MOMO, to be included in a portfolio for month t (formed at the end of month t − 1), a stock must have a price for the end of month t − 13 and a good (valid) return for t − 2.

23 Moody's tries to include bonds with remaining maturities as close as possible to 30 years. Moody's drops bonds if the remaining life falls below 20 years, if the bond is susceptible to redemption, or if the rating changes.

24 High HML portfolios are defined to be value portfolios, while low HML portfolios are growth portfolios. Our definition of HML has been used as a standard measure of the value premium since Fama and French (1993), and has been used in many recent studies, including Petkova and Zhang (2005) and Fama and French (2006a).


we construct value-minus-growth (VMG) portfolios using deciles instead of the more rudimentary 2 × 3 sorting originally used by Fama and French (1993). Using a decile classification, we produce the portfolios Hd (highest book-to-market decile), Ld (lowest book-to-market decile) and, as their difference, HMLd. This definition of VMG has been used by Ang and Chen (2007), because a decile-based VMG is expected to give stronger evidence on the value premium by using the highest and lowest book-to-market decile portfolios. Second, we employ a notion of VMG that emphasizes small capitalization stocks. Because the value premium is supposedly stronger among smaller firms (see Loughran, 1997, and Fama and French, 2006), the small-value (Hs) and small-growth (Ls) portfolios can be calculated from a five-by-five sort on size and book-to-market, and the small value premium HMLs is the small-value (Hs) minus the small-growth (Ls) return.25 For symmetry, we also use one alternative portfolio construction method applied to size-sorted portfolios: every year, we sort the CRSP universe into ten size deciles and produce Sd (lowest size decile), Bd (highest size decile) and, as their difference, SMBd.

Table 1 starts by presenting standard summary statistics for the portfolios that are the object of our analysis and for the instruments used to capture predictability in the returns on the wealth process. For comparison, and as a way of performing robustness checks, the table also includes summary statistics for HMLd, SMBd, and HMLs. Panel A of the table concerns long-run, full-sample results spanning the 81 years and 3 months from January 1927 to March 2008. Data on excess market returns show customary features, i.e., a mean excess return (in annualized terms) of 7.6%, a volatility (annualized, assuming counterfactual independence over time) of 18.8%, and an annualized Sharpe ratio of 0.40. Excess market returns also display a moderately positive skewness (0.23, which is not typical in the literature and entirely attributable to data from the 1990s and 2000s) and rather large excess kurtosis. Overall, there is little doubt that excess market returns fail to have an unconditional Gaussian density, as shown by the large Jarque-Bera test statistic. The full-sample data also show clear evidence of value and momentum anomalies, to be interpreted as the inability of the unconditional CAPM to explain the returns from the HML and MOMO long-short portfolios.26 Standard, unconditional CAPM (single-index model) regressions yield large alphas (5.2% per year for HML, 11.3% for MOMO, and up to a stunning 13.4% per year in the case of HMLs) with p-values virtually indistinguishable from 0, i.e., the null of the CAPM is rejected at any conventional significance level. As often stressed in the literature, the betas for these portfolios are either too small (although statistically significant) or even negative and, as such, cannot rationalize the high and statistically significant returns on value- and momentum-sorted portfolios (6.1% for HML, 9.2% for MOMO, and 11.5% for HMLs, in annualized terms); in fact, negative betas can only exacerbate the phenomenon, as they imply alphas that are even larger than the raw, total mean returns from the long-short portfolios.27 The percentage R2s from a plain vanilla, standard CAPM are practically negligible,

25 However, Fama and French (2006a) show that Loughran's (1997) evidence that there is no value premium among large stocks seems to be peculiar to (1) the post-1963 period, (2) using the book-to-market ratio as the value-growth indicator, and (3) restricting the tests to U.S. stocks. In particular, during the period 1927-1962, the value premium is nearly identical for small and big US stocks.

26 This is different from Ang and Chen (2007), who report weak value effects for their overall 1927-2001 sample. As we shall observe later, this is attributable to a large HML mean return over the recent 2002-2007 period, when HML displayed a mean comparable to the full-sample one (0.46%) that cannot be explained by the CAPM (the HML alpha is a large 0.51% with a p-value of 0.027, while beta is negative).

27 Our findings for HMLd are mixed. On the one hand, the annualized mean return on HMLd is 6.3% and this mean is


between 3 and 14 percent, which confirms the standard adage that equity portfolio returns are extremely

hard to fit using (too) simple asset pricing models.
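The annualization conventions and the normality test used above can be made explicit with a short sketch: monthly means are multiplied by 12 and monthly volatilities by $\sqrt{12}$ (the latter valid only under the counterfactual independence assumption noted in the text), and the Jarque-Bera statistic is $\frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right]$, with $S$ and $K$ the sample skewness and (raw) kurtosis. The function is a hypothetical helper applied to whatever monthly series is passed in; nothing here reproduces Table 1.

```python
import numpy as np


def summary_stats(monthly_excess):
    """Annualized mean, volatility and Sharpe ratio, plus skewness, excess kurtosis and JB."""
    x = np.asarray(monthly_excess, dtype=float)
    n = x.size
    mean, sd = x.mean(), x.std(ddof=1)
    z = x - mean
    m2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / m2 ** 1.5
    kurt = np.mean(z ** 4) / m2 ** 2          # raw kurtosis; a normal has 3
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    ann_mean = 12.0 * mean
    ann_vol = np.sqrt(12.0) * sd              # assumes serially independent returns
    return {"ann_mean": ann_mean, "ann_vol": ann_vol, "sharpe": ann_mean / ann_vol,
            "skew": skew, "excess_kurt": kurt - 3.0, "jarque_bera": jb}
```

For instance, the annualized figures reported for the market, 7.6% and 18.8%, imply a Sharpe ratio of 7.6/18.8 ≈ 0.40, matching the value in the text.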

As reported in many recent papers, the size anomaly fails to be present in the full sample: the annualized average returns on SMB and SMBd are only 1.9% and 7.4%, respectively, only the second mean is statistically significant, and, most importantly, the corresponding alphas are relatively small (only 0.4 and 3.3 percent, respectively) and statistically insignificant. While in the case of the classical Fama and French (1993) SMB portfolio there is indeed very little to explain, in the case of SMBd a plain vanilla unconditional CAPM yields a rather large beta of 0.53, capable of accounting for a substantial portion of SMBd returns (the R2 is a non-negligible 14%); as a result, the alpha is small and not statistically significant. All the long-short portfolios display large amounts of excess kurtosis (always statistically significant) and distributional asymmetries. However, while most portfolios have positive and significant skewness, HMLs and MOMO display negative skewness. In any event, none of the return series under investigation is compatible with an unconditional Gaussian distribution, since the Jarque-Bera statistics clearly exceed the 1 and 5 percent critical values. The three predictors used in our analysis confirm standard properties reported in the literature, in particular a moderate volatility (between 3 and 6 times smaller than the volatilities typical of equity portfolio returns) and the fact that the term and default spreads were on average positive over our sample period, as one would expect.

Panels B and C distinguish between the pre- (1927-1963) and post- (1964-2007) COMPUSTAT periods.

Similarly to Ang and Chen (2007), in the pre-COMPUSTAT period we fail to find evidence of a value

anomaly: for both HML and HMLd, the alphas are small and not statistically significant, even though the

mean return on HML is relatively large (6.1% per year); in fact, the point estimate for the HMLd alpha

is negative. This is due to the fact that the pre-1964 CAPM betas are rather large and able to explain

a large portion of the variability in the positive value premia (with R2 of 27% for HML and of 31% for

HMLd). However, the value anomaly persists for small capitalization stocks, because HMLs has a mean

return of over 12% in annualized term and yields a highly statistically significant and large alpha of 13.7%

(which is clearly caused by the fact the single-index beta is once more negative). On the contrary, the value

anomaly is strong in the COMPUSTAT sample, because all the mean portfolio return estimates are high

and statistically significant (from 6.2% for HML to 11% for HMLs) and especially the single-index alphas

are economically large (from 7.3% for HMLd to 13.4% for HMLs) and highly statistically significant. These

results are identical to those reported by Ang and Chen (2007) with reference to HMLd and the period July

1963 - December 2001, i.e., 0.53% per month average return (we have a 0.55%) and a monthly alpha of

0.60% (we have 0.61%); similarly we find that while in the pre-1964 sample the HMLd beta is estimated to

be 0.74 and large enough to explain the performance of the long-short portfolio (R2 is 31%), in the post-1964

subsample, the beta falls negative to -0.11 and can no longer explain the performance of the book-to-market

strategy (R2 is 1%).

Panels B and C show that there is no strong time-variation in the strength of the momentum anomaly.

Although MOMO’s average returns increase from 8.1 to 10 percent (in annualized terms) when going from

the 1927-1963 to the 1964-2008 sub-sample, MOMO’s alpha in fact declines from a stunning 12.4% to a still

statistically different from zero with a p-value between 1 and 5%. On the other hand, the associated single-index alpha is a

moderate 2.9% per annum which fails to be statistically significant. This is probably due to the fact that returns for both

deciles 1 and 10 of the book-to-market sorted distribution are extremely noisy.

26

large 10.3%, and both estimates command p-values which are close to zero, while betas are either zero or

negative. Contrary to what has been sometimes reported (obviously, on shorter or different sub-samples)

there was never substantial evidence of a size anomaly. Both SMB and SMBd command neither statistically

significant and positive average returns (SMBd implies a modest 4.6% over the 1964-2008 period, but the

corresponding p-value is close to 10%) nor significant and positive alphas. While it is sometimes contended

that the size anomaly relates to the early part of the CRSP/NYSE sample, in fact the only feeble signal of

a size anomaly is given by SMBd for the sub-sample 1964-2008, when the alpha is 3.6% in annualized terms

with a p-value of 0.17.
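The single-index alphas, betas, and Jarque-Bera statistics discussed above can be illustrated with a minimal sketch. It assumes monthly excess returns, plain OLS with homoskedastic standard errors, and annualization of the monthly alpha by a factor of 12; these choices, the helper names, and the simulated data are assumptions for illustration, not the exact procedure behind Table 1.

```python
import numpy as np

def single_index_alpha(r_i, r_w):
    """OLS of portfolio excess returns r_i on market excess returns r_w.

    Returns the monthly alpha, the beta, the alpha t-statistic
    (homoskedastic OLS standard errors, an assumption) and the R^2.
    """
    r_i = np.asarray(r_i, dtype=float)
    r_w = np.asarray(r_w, dtype=float)
    X = np.column_stack([np.ones_like(r_w), r_w])   # intercept + market
    coef, *_ = np.linalg.lstsq(X, r_i, rcond=None)
    resid = r_i - X @ coef
    n, k = X.shape
    s2 = resid @ resid / (n - k)                     # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                # coefficient covariance
    alpha, beta = coef
    t_alpha = alpha / np.sqrt(cov[0, 0])
    r2 = 1.0 - resid.var() / r_i.var()
    return alpha, beta, t_alpha, r2

def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); approx. chi-square(2) under normality."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    m2 = np.mean(z**2)
    skew = np.mean(z**3) / m2**1.5
    kurt = np.mean(z**4) / m2**2
    return len(x) / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)

# Simulated (hypothetical) data: true monthly alpha 0.2%, beta 0.5,
# fat-tailed Student-t(3) idiosyncratic shocks.
rng = np.random.default_rng(0)
r_w = rng.normal(0.005, 0.05, size=20_000)
r_i = 0.002 + 0.5 * r_w + 0.02 * rng.standard_t(3, size=20_000)

alpha, beta, t_alpha, r2 = single_index_alpha(r_i, r_w)
print(f"annualized alpha = {12 * alpha:.4f}, beta = {beta:.3f}, t = {t_alpha:.2f}")
print(f"JB = {jarque_bera(r_i):.1f} (5% critical value of chi2(2) = 5.99)")
```

With Student-t(3) shocks the Jarque-Bera statistic far exceeds the 5% critical value, mirroring the rejections of normality reported above, while OLS recovers the simulated alpha and beta.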

Finally, panel D reports summary statistics for the recent sub-period 1994-2008, which spans only 2-3 complete market cycles yet includes enough observations to allow some comments on the dynamics of the anomalies in the most recent financial history of the US. Clearly, in recent periods the value and momentum anomalies have been stronger than ever. For instance, HML commands a plain-vanilla single-index alpha of 9.8% (in annualized terms) and MOMO an alpha of 11.2%, both with p-values that are essentially zero. As in panels A and C, the anomaly is even larger for HMLs, whose alpha is almost double that of the standard Fama and French (1993) HML definition. Both the HML and MOMO portfolios imply negative betas, which is of course what drives their large and positive alphas. By contrast, in the recent period the size anomaly has clearly disappeared altogether, with both raw returns and alphas that are negative or, in any case, not significant.

We are, of course, not the first to observe that value- and momentum-sorted portfolios earn positive and statistically significant abnormal returns that cannot be explained by pure "beta risk". For instance, starting with Fama and French (1993), many authors have estimated simple OLS regressions on portfolios of stocks sorted by book-to-market ratios and rejected the hypothesis that the OLS alpha is equal to zero. However, as argued by Ang and Chen (2007), using an unconditional factor model estimated by OLS to make inferences about the conditional CAPM may be treacherous for a number of reasons. Since the seminal work by Jagannathan and Wang (1996), we know that if time-varying conditional betas are correlated with time-varying market risk premia, then the conditional CAPM is observationally equivalent to an unconditional multifactor model in which $\mathrm{Cov}[E_t[x^W_{t+1}], \beta_t]$ is an additional, priced factor. Under the null of a conditional CAPM, we would expect the unconditional OLS alpha to capture both the conditional alpha and the interaction between time-varying factor loadings and market risk premia. Hence, the failure of an unconditional CAPM to capture the spread of average returns in the cross-section does not imply that a conditional CAPM cannot explain it. Moreover, when conditional betas and market risk premia are correlated, OLS fails to provide consistent estimates of both the conditional alpha and the conditional betas, and Ang and Chen (2007) have shown that the degree of inconsistency depends on unobserved parameters driving the conditional beta process. In Section 5 we show that an extended (mixture) conditional CAPM in which the betas change over time and are potentially correlated with the risk premium (for instance, because both sets of parameters follow a Markov chain driven by the same latent state variable) may generate zero (or at least statistically insignificant) abnormal returns.
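To make the Jagannathan and Wang (1996) and Ang and Chen (2007) argument concrete, the sketch below simulates a conditional CAPM in which one hypothetical two-state Markov chain drives both the portfolio beta and the market risk premium. The conditional alpha is zero by construction, yet the unconditional OLS alpha is not, because the regime-dependent beta comoves with the regime-dependent premium. All parameter values and function names are illustrative assumptions, not estimates from this paper.

```python
import numpy as np

def simulate_conditional_capm(T, p_stay, beta, mu, sigma_m, sigma_e, seed=0):
    """Two-state Markov chain: in state s the beta is beta[s] and the
    market risk premium is mu[s]; the conditional alpha is zero."""
    rng = np.random.default_rng(seed)
    states = np.empty(T, dtype=int)
    states[0] = 0
    u = rng.random(T)
    for t in range(1, T):
        # stay in the current state with probability p_stay
        states[t] = states[t - 1] if u[t] < p_stay else 1 - states[t - 1]
    b = np.asarray(beta)[states]
    m = np.asarray(mu)[states]
    r_m = m + sigma_m * rng.standard_normal(T)            # market excess return
    r_i = b * r_m + sigma_e * rng.standard_normal(T)      # zero conditional alpha
    return r_i, r_m

def ols_alpha_beta(r_i, r_m):
    beta_hat = np.cov(r_i, r_m, ddof=0)[0, 1] / np.var(r_m)
    return r_i.mean() - beta_hat * r_m.mean(), beta_hat

def population_alpha(beta, mu, sigma_m, p=(0.5, 0.5)):
    """Unconditional OLS alpha implied by the regime mixture
    (a symmetric chain has stationary probabilities 1/2, 1/2)."""
    p, beta, mu = map(np.asarray, (p, beta, mu))
    e_m = p @ mu
    e_r = p @ (beta * mu)
    e_rm = p @ (beta * (mu**2 + sigma_m**2))
    var_m = p @ (mu**2 + sigma_m**2) - e_m**2
    b = (e_rm - e_r * e_m) / var_m
    return e_r - b * e_m

beta_s, mu_s = (1.5, 0.5), (0.015, 0.0)      # hypothetical regime parameters
r_i, r_m = simulate_conditional_capm(200_000, 0.95, beta_s, mu_s, 0.04, 0.02)
alpha_hat, beta_hat = ols_alpha_beta(r_i, r_m)
print(f"OLS alpha = {alpha_hat:.4f}, theory = {population_alpha(beta_s, mu_s, 0.04):.4f}")
```

With these hypothetical parameters the population OLS alpha is about 0.36% per month, although no portfolio earns an abnormal return conditionally on the regime.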


4.1. Time-Varying Abnormal Returns and Co-Moments for Size, Value, and Momentum Portfolios

As a way to provide some intuition for our empirical strategy, we deepen our preliminary investigation of the

data by computing and plotting time-varying (OLS, unconditional) alphas, covariances, partial (downside)

covariances, coskewness, and cokurtosis coefficients for SMB, HML, and MOMO portfolio returns. The

co-moments are all computed against value-weighted CRSP market portfolio returns. Coskewness for portfolio $i$ is defined as $\mathrm{Coskew}[R^i_t, R^W_t] \equiv \mathrm{Cov}[R^i_t, (R^W_t)^2]$ and cokurtosis as $\mathrm{Cokurt}[R^i_t, R^W_t] \equiv \mathrm{Cov}[R^i_t, (R^W_t)^3]$.

Of course, the covariance of these long-short portfolios with the market portfolio is proportional to a time-varying measure of beta, as in Ang and Chen (2007); the partial covariance, computed conditioning on negative market returns ($R^W_{t+1} < 0$), is proportional to a time-varying downside beta as in Ang, Chen, and Xing (2006). However, we extend our investigation to coskewness and cokurtosis as well. Because our analysis in this section is purely exploratory, we compute time-varying alphas and co-moments using the simplest possible method, i.e., 60-month rolling window estimates. To facilitate interpretation, we report covariances scaled by the variance of the market portfolio over the same 60-month period (i.e., we report standard, unconditional CAPM and downside betas), coskewness scaled by the factor $\mathrm{Var}[R^W_t]\sqrt{\mathrm{Var}[R^i_t]}$, and cokurtosis scaled by the factor $(\mathrm{Var}[R^W_t])^{3/2}\sqrt{\mathrm{Var}[R^i_t]}$. These choices follow Ang, Chen, and Xing (2006).
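The rolling-window statistics behind Figure 1 can be sketched as follows. The code applies the definitions given above (coskewness as $\mathrm{Cov}[R^i_t, (R^W_t)^2]$ and cokurtosis as $\mathrm{Cov}[R^i_t, (R^W_t)^3]$, with the stated scaling). The downside beta uses months with $R^W < 0$ and scales the partial covariance by the market variance over those same months, which is our assumption about the exact normalization; no adjustment is made to convert cokurtosis into the excess cokurtosis plotted in the figure. The function name is hypothetical.

```python
import numpy as np

def rolling_comoments(r_i, r_w, window=60):
    """Rolling CAPM alpha, beta, downside beta, scaled coskewness and
    scaled cokurtosis; one row per window of `window` months."""
    r_i = np.asarray(r_i, dtype=float)
    r_w = np.asarray(r_w, dtype=float)
    out = []
    for end in range(window, len(r_i) + 1):
        x = r_w[end - window:end]
        y = r_i[end - window:end]
        var_w, var_i = np.var(x), np.var(y)
        beta = np.cov(y, x, ddof=0)[0, 1] / var_w
        alpha = y.mean() - beta * x.mean()
        down = x < 0                                  # months with R_W < 0
        beta_down = (np.cov(y[down], x[down], ddof=0)[0, 1] / np.var(x[down])
                     if down.sum() > 2 else np.nan)
        coskew = np.cov(y, x**2, ddof=0)[0, 1] / (var_w * np.sqrt(var_i))
        cokurt = np.cov(y, x**3, ddof=0)[0, 1] / (var_w**1.5 * np.sqrt(var_i))
        out.append((alpha, beta, beta_down, coskew, cokurt))
    # columns: alpha, beta, downside beta, coskewness, cokurtosis
    return np.array(out)

rng = np.random.default_rng(1)
r_w = rng.normal(0.006, 0.054, size=240)   # hypothetical monthly market returns
r_i = 0.001 + 2.0 * r_w                    # exact linear relation: beta = 2
res = rolling_comoments(r_i, r_w)
print(res.shape)  # (181, 5)
```

Because the simulated portfolio is an exact linear function of the market, every window returns an alpha of 0.001 and standard and downside betas of 2, a convenient check of the implementation.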

Figure 1 plots the results. In the first plot we report 60-month rolling window estimates of the unconditional CAPM alphas. All long-short portfolios display an enormous amount of variation over time, and in all cases the OLS alpha frequently changes sign. The HML alpha is mostly positive, although recursive estimates from the mid-1930s turn negative and large (below -2% per month at the apex of the Great Depression, between 1933 and 1934), indicating large abnormal returns of growth companies in excess of value companies, relative to what is justified by the CAPM. Apart from three short-lived episodes coinciding with the beginning of WWII and the recessions of 1970 and 1974, the HML alpha remains positive throughout, and it persistently exceeds 1% per month during periods such as the 1950s, the late 1980s, 1996-2000, and, more recently, the spring of 2007. Although there is evidence of negative HML alphas in the 1930s, the plot does not support the simplistic conclusion that HML abnormal returns originate only in the post-1964 period; it is true, however, that the 1927-1963 average (0.41%) is lower than the 1964-2008 one (0.75%) and that both the 1980s and the late 1990s feature two remarkable peaks in the HML alphas. What is evident is that the value alphas are strongly time-varying as well as considerably persistent.28

Similar remarks hold for the recursive estimates of MOMO's alpha, although MOMO seems to leave the Great Depression "regime" before HML does (starting in late 1935), while the negative or low alphas corresponding to the bear markets of the late 1960s, 1970s, and early 1980s are more negative than those found for HML. MOMO's alpha also falls into negative territory during the 2001-2002 bear market. In fact, the correlation between the HML and MOMO alphas is positive and statistically significant (0.51). By contrast, the SMB alpha is positive and negative about equally often: initially positive and large during the 1930s and 1940s, and then mostly negative between the mid-1980s and 2004. Apart from the early part of the sample, the size anomaly then seems to be related to two alpha-spike periods (with monthly values of approximately 1%), between 1967 and 1972 and between 1978 and 1985. In fact, the SMB alphas are negatively correlated with both the HML (-0.35) and MOMO (-0.15) alphas.

28 The first-order serial correlation coefficient for the HML alpha is 0.986; those for MOMO and SMB are 0.972 and 0.991, respectively.

The second and third plots show the standard and downside CAPM betas, respectively. Both definitions of beta show considerable time variation, as already observed by Ang and Chen (2007).29 However, while the SMB beta is moderate in absolute terms but generally positive (which explains why, on average, the SMB alphas are smaller than those of the other long-short portfolios), the HML beta starts positive and relatively high in the 1930s (around 0.5), then trends down to negative values (around -0.5) in 2004, and then recovers towards higher values between 2005 and 2008. This is consistent with Ang and Chen's finding of a structural difference between the pre-1963 and the post-1963 samples: in the pre-1963 subsample, the beta of the book-to-market strategy is significantly positive and large enough to explain the performance of the strategy; in the post-1963 subsample, book-to-market betas become negative and can no longer explain it. MOMO betas are instead mostly characterized by high volatility, although in this case too three clear patches of negative betas (the 1930s and 1940s, the 1970s, and the post-2001 period) can be distinguished from periods of positive betas, i.e., the 1950s, the 1980s, and the 1990s. The recursive downside betas tend to follow the general patterns of the unconditional betas: the HML downside beta is generally small, although it becomes negative and rather large (around -0.6) between 1998 and 2004; the MOMO downside beta is very volatile and turns negative during bear periods, such as the 1930s, the 1970s, the late 1980s, and, recently, the post-2001 period. The SMB downside beta is of moderate size but remains persistently positive, approaching one in the bull periods of the 1960s, 1980s, and 1990s. Of course, the erratic and volatile dynamics of the HML and MOMO betas (both unconditional and downside) make it hard to believe that either the unconditional CAPM or a semi-variance, conditional form of the CAPM may explain the high and positive returns on these two portfolios. By contrast, the dynamics of both the standard and downside SMB betas leave open the possibility that some small modification of the CAPM may explain away the presence of high-alpha periods.

The fourth and fifth plots show recursive, rolling window estimates of (scaled) coskewness and excess cokurtosis. MOMO has very unattractive coskewness properties: with few exceptions, coskewness is always negative, i.e., MOMO returns are below average when the variance of the market portfolio is above average; half of the time coskewness is below -0.5, i.e., it is hardly negligible; and during the 1930s and 1940s, the late 1980s, and the period that follows 2005, MOMO's coskewness falls below -1. By contrast (with the exception of the 1930s and 1940s, when cokurtosis is negative), MOMO's excess cokurtosis oscillates around zero, which shows that most of the time MOMO returns and the skewness of market returns are weakly correlated. However, coskewness is certainly a possible candidate to explain away the high MOMO alphas, since assets with negative coskewness should be rewarded with high risk premia. The coskewness of HML instead tends to be moderate: after starting in positive territory in the 1930s (which would make HML more appealing than implied by a standard mean-variance framework), HML coskewness remains negative most of the time and becomes rather large in absolute value (around -1) during the 1990s. HML instead shows persistently positive and high excess cokurtosis between the 1930s and the 1950s; however, after 1987 HML's excess cokurtosis turns mostly negative. The dynamics of SMB coskewness and cokurtosis are much less interesting, although coskewness turns large and negative for most of the 1980s and 1990s, while cokurtosis spikes in the 1940s and between 1987 and 1992. All in all, it remains unclear whether the third and fourth co-moments can contribute substantially to explaining the high HML alphas in the first panel. Section 5 explores this avenue by proposing the concept of a mixture CAPM and estimating a number of related econometric models.

29 Of course, the finding of strong time variation in recursively estimated betas underscores that correct inference on the true conditional alpha may be difficult, because the standard unconditional CAPM OLS regressions are likely to be misspecified.

Figure 2 repeats these calculations using 120-month rolling windows. The patterns are qualitatively identical. The only notable difference is that MOMO's alphas are now essentially always positive, although they become rather small between the 1970s and 1980s. As conjectured by Ang and Chen (2007), features other than CAPM alphas and betas exhibit considerable time variation. In particular, we have seen in Figure 1 that variation in coskewness (for MOMO and partially HML) and cokurtosis (for MOMO) may play an important role in explaining away the high and positive alphas of value- and momentum-sorted long-short portfolios. Moreover, it is clear from Figures 1-2 that many of the series display time-varying patterns of persistence potentially consistent with regime switching dynamics, with recurrent periods of positive and high values followed by periods of negative and small values.30
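Footnotes 28 and 30 report very high first-order serial correlations for the rolling alphas and note that this is partly an implication of the rolling window structure. A purely mechanical benchmark helps to read those numbers: for a $w$-month rolling mean of serially uncorrelated data, consecutive windows share $w-1$ observations, so the lag-1 autocorrelation is $(w-1)/w$, about 0.983 for $w = 60$ and 0.992 for $w = 120$. The sketch below checks this on simulated white noise; treating a rolling OLS alpha as behaving like a rolling mean is only a first approximation, and the benchmark by itself says nothing about how much of the persistence in the estimated alphas is genuine.

```python
import numpy as np

def rolling_mean(x, window):
    """Mean over each block of `window` consecutive observations."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(42)
noise = rng.standard_normal(200_000)       # i.i.d. data: no true persistence
results = {w: lag1_autocorr(rolling_mean(noise, w)) for w in (60, 120)}
for w, rho in results.items():
    print(f"window {w}: simulated {rho:.3f}, mechanical benchmark {(w - 1) / w:.3f}")
```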

5. Empirical Estimates

In this section we report and comment on both unrestricted and restricted estimates of the pricing kernel models reviewed in Section 3, estimated on a $4 \times 1$ vector that includes value-weighted excess market returns, returns on the standard Fama-French SMB and HML portfolios, and returns on the long-short momentum portfolio; when relevant, the instruments in $z_t$ that capture any predictability of market excess returns are the dividend yield, the term spread, and the default spread, i.e., $N = 4$ and $M = 3$.

In what follows we assume that $R^W_{B,t+1} = E_t[R^W_{t+1}]$, where the conditional expectation of one-month-ahead gross market returns is computed in each case using the model under estimation. Of course, under the assumption that the pricing kernel is unique and identifiable, nothing compels us to base our estimates on a vector of portfolio returns that, besides excess market returns, includes equity portfolios reflecting the size, value, and momentum anomalies. However, given the key role that these US equity portfolios have played in the development of the empirical literature on the US equity cross-section, we believe this estimation exercise to be a sensible starting point.

5.1. The Smooth, Single-State Benchmark

We start by presenting empirical estimates of the single-state SSBC benchmark in (12). Table 2 reports parameter estimates and summary statistics concerning the fit provided by the model to the four equity portfolios under investigation.31 Besides the maximized value of the log-likelihood function, we also report a few standard information criteria (Akaike's, Hannan-Quinn's, and Bayes-Schwarz's) that trade off fit and parsimony by penalizing the log-likelihood with an increasing function of the number of parameters to be estimated. In the table, three blocks of estimates appear, corresponding to different sets of restrictions on the pricing kernel (the first set of estimates is obtained under an empty set of restrictions). The table shows that as tighter and tighter constraints on the sign and shape of the pricing kernel are imposed, the maximized log-likelihood rapidly declines, moving from -10339 for the unrestricted case to -11884 when sign, monotonicity, and convexity restrictions are all imposed. Consequently, the information criteria increase as we move from left to right in Table 2.32 A few indications of problems for (12) in providing an adequate fit come from the parameter estimates displayed in the table. First, even though (12) already allows a non-negligible degree of flexibility in the pricing kernel, the alphas for HML and/or MOMO are systematically statistically significant, with annualized abnormal returns between -0.9 and 6.8 percent for HML and between 1.3 and 17 percent for MOMO. In one case (restriction set 2), even the SMB alpha turns positive and statistically significant. This means that the pricing kernel defined by (12) fails to reconcile the risk factors assumed to be priced (in this case, covariance and semi-covariance risk), together with the assumed mapping from predictors to unit risk prices, with the dynamics of excess market returns and of net returns on long-short equity portfolios built by exploiting the cross-sectional dispersion of firms in the size, value, and momentum dimensions. Worse, and especially in the unconstrained case, the evidence that the two assumed risk factors are consistently priced in the cross-section is rather weak. For instance, in the first column neither the joint null $g_1 = 0 \cap b_1 = 0$ nor the joint null $g'_1 = 0 \cap b'_1 = 0$ can be rejected using a standard joint Wald test (and all the individual coefficients fail to be significant at standard test size levels), which means that neither covariance nor downside-covariance risk appears to be priced in the US cross-section. When economic restrictions are imposed, it remains true that the null $g_1 = 0 \cap b_1 = 0$ cannot be rejected, although this is no longer the case for the composite hypothesis $g'_1 = 0 \cap b'_1 = 0$, i.e., there is at least weak evidence of downside covariance risk being priced when unit risk prices evolve "smoothly". In fact, when restrictions are imposed there is evidence that the loading of the downside covariance risk price on the term spread is statistically significant (with a borderline p-value just below 0.01), while, recalling that $\lambda'_{2,t+1} = -R^f_t(g'_1 + b'_1 B z_t)$, it is clear that a few of the (insignificant) estimates in $b_1$ and $b'_1$ have embarrassing implications: for instance, a higher time-$t$ default spread (typical of business cycle contractions) forecasts a higher one-month-ahead price of downside covariance risk $\lambda'_{2,t+1}$, which means that in "bad times" equity prices should be increasing.33

30 For instance, the 120-month rolling window first-order serial correlation coefficient for the HML alpha is 0.992; those for MOMO and SMB are 0.981 and 0.996, respectively. Of course, this is also an implication of the rolling window structure used in the construction of the statistics presented.

31 To save space, we do not report estimates of the covariances between shocks to the predictors and shocks to portfolio returns, i.e., $\mathrm{Cov}[\eta^Z_{t+1}, \eta^W_{t+1}]$ and $\mathrm{Cov}[\eta^Z_{t+1}, \eta^i_{t+1}]$ ($i$ = SMB, HML, and MOMO), or of the covariance matrix of the shocks to the predictors themselves, $\mathrm{Var}[\eta^Z_{t+1}]$. They are available from the author upon request.
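The two model-comparison tools used in Table 2 can be sketched as follows, under textbook definitions that we assume here (the paper may normalize the criteria differently, e.g., by dividing by the sample size). The Akaike, Hannan-Quinn, and Schwarz criteria penalize $-2$ times the maximized log-likelihood by a function of the number of parameters $k$ and the sample size $T$; a joint Wald test of restrictions such as $g_1 = 0 \cap b_1 = 0$ compares $W = (R\hat\theta)^\top (R\hat V R^\top)^{-1} (R\hat\theta)$ with a $\chi^2$ distribution with as many degrees of freedom as restrictions. The parameter vector, its covariance matrix, and the count $k = 40$ below are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def information_criteria(loglik, k, T):
    """Textbook AIC, HQ and BIC (lower values favor a model)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "HQ": -2.0 * loglik + 2.0 * k * np.log(np.log(T)),
        "BIC": -2.0 * loglik + k * np.log(T),
    }

def wald_test(theta_hat, V_hat, R, r=None):
    """Joint Wald test of H0: R @ theta = r (default r = 0)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.zeros(R.shape[0]) if r is None else np.asarray(r, dtype=float)
    d = R @ theta_hat - r
    W = float(d @ np.linalg.solve(R @ np.asarray(V_hat) @ R.T, d))
    return W, chi2.sf(W, df=R.shape[0])

# Hypothetical estimates ordered as (g1, b1 on dividend yield, term, default)
theta = np.array([-0.8, 0.1, -0.3, 0.2])
V = np.diag([0.5, 0.04, 0.09, 0.04])
W, p = wald_test(theta, V, np.eye(4))          # H0: g1 = 0 and b1 = 0
print(f"W = {W:.2f}, p-value = {p:.3f}")

# Unrestricted log-likelihood from Table 2, hypothetical k, T = 975 months
print(information_criteria(-10339.0, k=40, T=975))
```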

Table 2 also shows the estimates of the VAR(1) for the predictor vector $z_t$ when the unit risk prices are functions of the variables in $z_t$. Clearly, given the structure of (12), the VAR model for $z_t$ is autonomous (i.e., it satisfies a block-exogeneity property) with respect to the rest of the model in (12). However, our estimation strategy in practice produces covariance matrix-weighted estimators (like GLS, which is a special case of MLE), and as a result the estimates of the VAR for $z_t$ also depend on the economic restrictions imposed on the pricing kernel (hence, on the unit risk prices). In practice, the point estimates of the coefficients in $\mu$ and $B$ hardly change with the type of restrictions imposed, and only the reported standard errors show some sensitivity to the different estimation exercises. All three predictors are strongly persistent, as one would expect, with own-lag VAR coefficients between 0.73 (for the term spread) and 1, even though in all cases standard stationarity conditions are satisfied. Otherwise, there is only some evidence that past values of the default spread help forecast subsequent values of the term spread.

32 As usual, information criteria are defined in such a way that the best performing model returns the lowest value of the information criteria.

33 Interestingly, imposing restrictions that ensure that the estimated $M_{t+1}$ process is a pricing kernel in an economic sense strengthens the evidence in favor of the hypothesis that covariance and semi-covariance risks are priced, with $g_1$ systematically negative and statistically significant. However, this good news is countered by the fact that $g'_1$ is estimated to be negative and significant, which implies that the downside covariance risk premium is negative, which is hardly a sensible finding. Moreover, a few of the unexplained variance estimates (especially for MOMO and the market portfolio) grow very large (exceeding 25% per year) and seem "too high" to be sensible from an economic perspective.
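The remark that own-lag coefficients close to 1 coexist with stationarity can be checked through the eigenvalues of $B$: assuming the VAR(1) takes the form $z_{t+1} = \mu + B z_t + \eta_{t+1}$, it is covariance stationary when every eigenvalue of $B$ has modulus strictly below one. The matrix below is hypothetical (it is not the Table 2 estimate); it shows that an own-lag coefficient equal to 1 is compatible with stationarity when cross-predictor feedback pulls the largest eigenvalue modulus below one.

```python
import numpy as np

def var1_is_stationary(B):
    """VAR(1) stationarity check: all eigenvalues of B inside the unit circle."""
    radius = np.max(np.abs(np.linalg.eigvals(np.asarray(B, dtype=float))))
    return radius < 1.0, radius

# Hypothetical B for (dividend yield, term spread, default spread).
# The dividend yield has an own-lag coefficient of exactly 1, but negative
# feedback from the default spread keeps the system stationary.
B = np.array([
    [1.00, 0.00, -0.10],
    [0.00, 0.73,  0.20],   # lagged default spread helps forecast the term spread
    [0.12, 0.00,  0.80],
])
ok, radius = var1_is_stationary(B)
print(ok, round(radius, 3))
```

Here the largest eigenvalue modulus is about 0.90 even though the first own-lag coefficient equals 1.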

Table 3 reports the comparative in-sample pricing performance of all the models examined in this section, starting from the SSBC benchmark. Both in terms of root-mean-squared pricing errors (RMSPE) and of (pseudo-) $R^2$s, the model yields a rather mediocre performance. For instance, all $R^2$s are between 0 and 2.1 percent, and the performance tends to be worse than what can be attained using a plain-vanilla unconditional CAPM. Interestingly, while in the absence of constraints the RMSPEs are almost entirely driven by the variance of the pricing errors (i.e., the mean pricing errors are generally very small), once constraints are imposed the mean pricing errors also increase and contribute to the total RMSPEs. To appreciate how disappointing the model performance is, recall that in Table 1 the monthly volatility estimates for SMB, HML, MOMO, and the market were 3.4, 3.6, 4.7, and 5.4 percent, respectively. As a result, RMSPEs between 2.9 and 5 percent, i.e., of a scale comparable to the total volatility of portfolio returns, appear quite unsatisfactory.
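The distinction drawn above between RMSPEs driven by the variance of pricing errors and those driven by their mean rests on the identity $\mathrm{RMSPE}^2 = (\text{mean error})^2 + \mathrm{Var}(\text{error})$, with the variance computed without a degrees-of-freedom correction. A minimal sketch with simulated pricing errors; their scale (about 3.6% per month, similar to the HML volatility in Table 1) and the helper name are hypothetical.

```python
import numpy as np

def rmspe_decomposition(pricing_errors):
    """Return (RMSPE, mean error, s.d. of errors); RMSPE^2 = mean^2 + s.d.^2."""
    e = np.asarray(pricing_errors, dtype=float)
    rmspe = np.sqrt(np.mean(e**2))
    bias = e.mean()
    sd = e.std()                    # population (ddof=0) standard deviation
    return rmspe, bias, sd

rng = np.random.default_rng(3)
e = 0.002 + 0.036 * rng.standard_normal(975)   # hypothetical monthly pricing errors
rmspe, bias, sd = rmspe_decomposition(e)
print(f"RMSPE = {rmspe:.4f}; share due to the mean = {bias**2 / rmspe**2:.3f}")
```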

One final measure of plausibility of the benchmark (12) is given by average sample measures of the estimated price of risk coefficients,

$$\bar\lambda_2 = -\bar R^f\,(\hat g_1 + \hat b_1 \hat B \bar z), \qquad \bar\lambda'_2 = -\bar R^f\,(\hat g'_1 + \hat b'_1 \hat B \bar z), \qquad (18)$$

where hatted quantities denote iterated MLE estimates, and $\bar R^f$ and $\bar z$ are the sample means of the gross riskless interest rate and of the predictor variables, respectively. In the unrestricted case, we obtain $\bar\lambda_2 = 0.0075$ and $\bar\lambda'_2 = -0.0012$, which is clearly not sensible, as the downside covariance risk premium would actually be negative. We also perform the following exercise: we collect the official monthly NBER recession dates and separately compute the predictor means $\bar z^{rec}$ over recession months and $\bar z^{exp}$ over expansion months. We can then compute $\lambda_{2,rec}$, $\lambda_{2,exp}$, $\lambda'_{2,rec}$, and $\lambda'_{2,exp}$, obtaining $\lambda_{2,exp} = 0.0093$ and $\lambda_{2,rec} = 0.0001$, $\lambda'_{2,exp} = -0.0067$ and $\lambda'_{2,rec} = 0.0213$. Oddly, recessions imply an economically negligible price of covariance risk, while expansions (which, according to NBER dating, characterize approximately 80% of our 1927-2008 sample) imply a negative price of downside covariance risk. These results show how important it is to impose a minimal set of sensible economic restrictions when estimating empirical pricing kernels. Under the first set of constraints, the risk premia estimates are on average $\bar\lambda_2 = 0.0026$ and $\bar\lambda'_2 = 0.3497$, i.e., the downside risk premium becomes particularly sizeable; under the second set of restrictions, which also involve the shape of the pricing kernel, the risk premia estimates evaluated at the mean values of the predictors are instead $\bar\lambda_2 = 0.0444$ and $\bar\lambda'_2 = 0.1653$. As a reflection of the risk prices being relatively small, we have observed in Table 3 that the single-state smooth benchmark fails to produce appreciable in-sample $R^2$s or to reduce the RMSPE statistics below the sample standard deviation of the portfolio returns under consideration.34
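The averages in (18) and the recession/expansion split can be sketched as below. All parameter values are hypothetical placeholders rather than the Table 2 estimates, the NBER indicator is assumed to be supplied by the user as a 0/1 monthly series, and $b_1$, $b'_1$ are treated as row vectors so that $b_1 B \bar z$ is a scalar; the helper names are ours.

```python
import numpy as np

def mean_risk_prices(g1, b1, g1_down, b1_down, B, z_bar, rf_bar):
    """Evaluate (18) at a given vector of predictor means z_bar:
    lambda_2 = -Rf (g1 + b1 B z_bar), and the analogous downside price."""
    Bz = np.asarray(B, dtype=float) @ np.asarray(z_bar, dtype=float)
    lam2 = -rf_bar * (g1 + np.dot(b1, Bz))
    lam2_down = -rf_bar * (g1_down + np.dot(b1_down, Bz))
    return lam2, lam2_down

def split_by_regime(z, recession):
    """Predictor means over NBER recession months and over expansion months."""
    z = np.asarray(z, dtype=float)
    rec = np.asarray(recession, dtype=bool)
    return z[rec].mean(axis=0), z[~rec].mean(axis=0)

# Hypothetical inputs: 3 predictors, 12 months, the first 3 flagged as recessions
rng = np.random.default_rng(7)
z = rng.normal([0.04, 0.015, 0.01], 0.005, size=(12, 3))
nber = np.array([1, 1, 1] + [0] * 9)
z_rec, z_exp = split_by_regime(z, nber)
B = np.diag([0.98, 0.73, 0.95])
for label, zb in (("recession", z_rec), ("expansion", z_exp)):
    l2, l2d = mean_risk_prices(-0.005, [0.1, -0.2, 0.3], -0.002,
                               [0.0, 0.1, -0.4], B, zb, 1.003)
    print(f"{label}: lambda_2 = {l2:.5f}, downside lambda_2 = {l2d:.5f}")
```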

Finally, we compute confidence intervals for these risk price estimates using the following parametric bootstrap strategy. Using the sets of parameter estimates in Table 2 and their (unreported) estimated covariance matrix, we draw 20,000 independent 975-month samples from the (asymptotic) multivariate normal distribution of the parameter estimates; for each of the 20,000 draws, we compute the risk premia implied by (18); we then compute 90% bootstrapped confidence bands by recording the values $\lambda_{2,0.05}$ and $\lambda'_{2,0.05}$ that leave 5% of the bootstrapped distributions of $\bar\lambda_2$ and $\bar\lambda'_2$ below them, and the values $\lambda_{2,0.95}$ and $\lambda'_{2,0.95}$ that leave 5% of the bootstrapped distributions above them.35 Limiting ourselves to the case in which the second set of constraints is imposed, we find the following bootstrapped 90% confidence bands for the unit risk prices:

$$\bar\lambda_2 \in [-0.1422,\ 0.2269], \qquad \bar\lambda'_2 \in [-0.6935,\ 0.7775].$$
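The parametric bootstrap described above can be sketched under simplifying assumptions: parameter vectors are drawn directly from a multivariate normal centered at the point estimates with a supplied covariance matrix, each draw is mapped into risk prices by a supplied function (here a hypothetical stand-in for (18) that keeps only the constant terms $g_1$ and $g'_1$), and the 90% band is read off the 5th and 95th percentiles. The sketch does not reproduce the simulation of 975-month samples mentioned in the text, and all numbers are placeholders.

```python
import numpy as np

def bootstrap_risk_price_bands(theta_hat, V_hat, risk_price_fn,
                               n_draws=20_000, level=0.90, seed=0):
    """Draw parameters from N(theta_hat, V_hat), map each draw into risk
    prices, and return percentile-based confidence bands."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, V_hat, size=n_draws)
    lam = np.array([risk_price_fn(th) for th in draws])  # (n_draws, n_prices)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(lam, tail, axis=0)         # 5% of draws lie below
    upper = np.quantile(lam, 1.0 - tail, axis=0)   # 5% of draws lie above
    return lower, upper

RF_BAR = 1.003   # hypothetical mean gross monthly riskless rate

def constant_risk_prices(theta):
    """Hypothetical stand-in for (18) with predictor terms switched off:
    theta = (g1, g1_down) -> (lambda_2, downside lambda_2)."""
    return -RF_BAR * np.asarray(theta)

theta_hat = np.array([-0.04, -0.16])           # hypothetical point estimates
V_hat = np.array([[0.010, 0.002],
                  [0.002, 0.050]])             # hypothetical covariance matrix
lower, upper = bootstrap_risk_price_bands(theta_hat, V_hat, constant_risk_prices)
print(f"lambda_2 in [{lower[0]:.4f}, {upper[0]:.4f}], "
      f"downside lambda_2 in [{lower[1]:.4f}, {upper[1]:.4f}]")
```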

Clearly, these confidence bands are wide and do not allow us to reject the null hypothesis that $\bar\lambda_2 = 0$ and $\bar\lambda'_2 = 0$, i.e., that in practice neither covariance nor downside covariance risk is compensated, in which case the benchmark (12) reduces to a simple way of estimating the means of the equity portfolios within a multivariate Gaussian framework in which the wealth process is predictable and the predictors follow a standard VAR(1) process, as in much of the existing literature.

5.2. Four-Moment Markov Switching CAPM

Table 4 presents the empirical estimates of the complete two-state model (8)-(9) in which, similarly to Guidolin and Timmermann (2008a), covariance, coskewness, and cokurtosis risk are all priced, while downside covariance risk is not (i.e., $\lambda^{-}_{2,S_{t+1}} = 0$ is imposed). The structure of the table and its panels is otherwise similar to Table 2.36 The unconstrained version of the model reveals that modeling regime shifts in predictability within a four-moment CAPM framework brings most alphas statistically towards zero and greatly weakens the evidence of average abnormal returns that the assumed risk factors (here, covariance, coskewness, and cokurtosis risk) cannot capture. In fact, only for HML in the first regime and for SMB in the second regime do we find evidence of a statistically significant (positive) alpha. The latter alpha is also the only economically large coefficient (5.4% per month).37 The evidence of average abnormal returns across the two regimes disappears almost completely when the first set of constraints is imposed, i.e., arbitrage opportunities are ruled out, the signs of the risk premia are constrained to agree with the implications of decreasing absolute risk aversion, and the estimated pricing kernel implies the mean short-term rate observed over our sample period. Even though the regime 2 SMB alpha remains rather large in economic terms (4.99%) and commands a p-value around 0.05, all the remaining alpha coefficients fail to be significant at standard levels. This is quite remarkable in light of Table 1, where the HML and MOMO alphas were all between 5.5 and 11.5 percent per annum and highly statistically significant. As already stressed, this tendency of the evidence of non-zero alphas to weaken when Markov regimes are modelled in unit risk prices, and the relevant conditional co-moments are projected (predicted) out of the resulting regime switching model, is not a feature we have assumed a priori (the data could equally have revealed larger, not smaller, Markov switching alphas). Finally, when the second set of constraints is additionally imposed, we only find marginal statistical significance for MOMO's alpha in regime 1, while SMB's alpha for regime 2 greatly declines and is no longer significant. All in all, comparing Table 4 with Table 2 shows that when the presence of regimes and their implications for higher-order (co-)moments and predictability are taken into account, most of the evidence on the US cross-sectional anomalies tends to disappear. Focusing on the last three columns of the table, where all economic constraints have been imposed, we observe that the residual alphas range from an annual 2.1% (for MOMO in regime 2) to 33% (for SMB in regime 2, which however fails to be statistically significant).

34 For instance, when all constraints are imposed, we also compute the recession- and expansion-specific risk prices and obtain $\lambda_{2,rec} = 0.0271$ and $\lambda_{2,exp} = 0.0487$, $\lambda'_{2,rec} = 0.0341$ and $\lambda'_{2,exp} = 0.2974$. Therefore expansions would be characterized by appreciably higher risk premia on both types of covariance risk and as such would command higher expected returns and lower equity prices, assuming that measured conditional covariances are not sensitive to business cycles (which is counterfactual).

35 When simulating the risk premia levels at the means, we set $\bar R^f = 1/M_{t+1}(\bar z)$, i.e., we compute the mean riskless rate as an implication of the mean empirical pricing kernel implied by the predictors taken at their means.

36 We do not report estimates of the (regime-dependent) covariances between shocks to the predictors and shocks to portfolio returns, nor of the (regime-dependent) covariance matrix of shocks to the predictors within the assumed VAR(1) framework.

37 It appears that $\alpha^W_1 = 2.14$ is also statistically significant, with a p-value between 0.01 and 0.05. However, $\alpha^W_1 > 0$ does not imply the presence of any abnormal returns; in fact, $\alpha^W_1$ can simply be interpreted as an intercept term that adjusts the mean of fitted excess market returns after taking into account variance, skewness, kurtosis, and the effect of the predictors in $z_t$.

Even though the results concerning the average abnormal returns in the two states are promising, some uncertainty remains as to which factors are priced in this model. Focussing again on the estimates obtained under the third set of constraints, we notice that while in regime 2 only coskewness seems to be priced, with $\lambda_{32} = -R^f_t g_{22} = -2.7651$, in regime 1 there is evidence that all conditional co-moments are priced, with $\lambda_{21} = -R^f_t g_{11} = 0.5008$, $\lambda_{31} = -R^f_t g_{21} = -0.3736$, and $\lambda_{41} = -R^f_t g_{31} = 0.0732$; the hypotheses $g_{22} = 0$, $g_{11} = 0$, $g_{21} = 0$, and $g_{31} = 0$ are all rejected with p-values between 0.01 and 0.05. Interestingly, the two regimes have starkly different asset pricing characterizations: regime 1 is a truly four-moment CAPM state, in the sense that, independently of the restrictions imposed, covariance, coskewness, and cokurtosis are all significantly priced and the premia are far from negligible; regime 2 implies, on the contrary, that only coskewness risk is priced and, when all restrictions are imposed, with a unit risk premium that appears not only statistically significant but also economically large. Additionally, in regime 2, when no sign restrictions are imposed, the covariance risk premium takes on an incorrect sign (i.e., the higher the covariance between a portfolio and the wealth process, the lower the portfolio average return in excess of the riskless asset). In fact, simple calculations based on the smoothed probabilities implied by the model (see Figure 3) reveal structural differences across the two regimes in the average levels of covariance, coskewness, and cokurtosis (all measured with respect to the wealth process), obtained by simulating the model over time under the parameter estimates in Table 4. In regime 1, conditional comoments are small in absolute value (the average covariances are 1.82, -0.49, and 1.20 for SMB, HML, and MOMO, respectively; the average coskewnesses are -1.91, -1.11, and -0.98; the average cokurtoses are 47.97, -17.27, and 43.39, for SMB, HML, and MOMO, respectively).$^{38}$ However, as argued in the Introduction, a two-state Markov switching model

$^{38}$ These regime-specific estimates are obtained simply by computing the one-step-ahead predicted comoments under the ML parameter estimates in Table 4 and then classifying each sample period as a regime 1 period if the corresponding smoothed (full-sample) probability of regime 1 is greater than or equal to 0.5, and as a regime 2 period otherwise. For completeness, and because these moments are also priced in our econometric framework, market variance is 11.6 in regime 1 (94.4 in regime


does generate rich dynamics in conditional comoments, and therefore in the quantity of the risks represented by the different comoments, in regime 1. Correspondingly, we have obtained moderate but statistically significant estimates of the unit risk premia in this state. On the contrary, all conditional comoments are high in absolute value in regime 2 (the average covariances are 20.58, 21.58, and -45.95 for SMB, HML, and MOMO, respectively; the average coskewnesses are 168.0, 422.8, and -624.0; the average cokurtosis coefficients are 9584, 20309, and -31645 for SMB, HML, and MOMO, respectively).$^{39}$ This means that in regime 2 all comoments jump to very high absolute values, while the estimated unit prices of risk decline towards zero, to the point of commanding p-values in excess of 5%, with the exception of the coskewness risk premium, which in fact remains high.$^{40}$
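The regime-specific averages above rest on the classification rule described in footnote 38: a month is assigned to regime 1 when its smoothed probability of regime 1 is at least 0.5, and to regime 2 otherwise. A minimal Python sketch of that rule follows. The function name `regime_averages` is hypothetical, and the toy inputs stand in for the model's one-step-ahead predicted co-moments and smoothed probabilities, which are not reproduced here.

```python
import numpy as np

def regime_averages(comoments, prob_regime1, threshold=0.5):
    """Hypothetical helper: average predicted co-moments by classified regime.

    comoments: (T, K) array of one-step-ahead predicted co-moments, one row per
        month (e.g. covariances of SMB, HML and MOMO with the wealth portfolio).
    prob_regime1: (T,) array of full-sample smoothed probabilities of regime 1.
    A month is assigned to regime 1 if its probability is >= threshold and to
    regime 2 otherwise.
    Returns (regime 1 averages, regime 2 averages, share of regime 1 months).
    """
    comoments = np.asarray(comoments, dtype=float)
    in_regime1 = np.asarray(prob_regime1, dtype=float) >= threshold
    if in_regime1.all() or not in_regime1.any():
        raise ValueError("both regimes need at least one classified month")
    return (comoments[in_regime1].mean(axis=0),
            comoments[~in_regime1].mean(axis=0),
            in_regime1.mean())

# Toy three-month example with two co-moments per month (illustrative numbers).
c = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
p = np.array([0.9, 0.5, 0.1])
avg1, avg2, share1 = regime_averages(c, p)  # avg1 = [2, 3], avg2 = [10, 20]
```

Note that a probability of exactly 0.5 is assigned to regime 1, matching the "greater than or equal to" wording of the footnote.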

Additional aid in interpreting the economic meaning of the two states we have specified comes from Figure 3, where the (full-sample, ex-post) smoothed probabilities are plotted for both the unconstrained and the set 2-constrained estimates reported in Table 4. The first clear point that the figure allows us to make is that imposing constraints on the EPK changes the inference on the state probabilities, although only by marginal amounts. In practice, the correlation between unconstrained and constrained probabilities is very high (e.g., 0.994 between the unconstrained and the set 2-constrained smoothed probabilities of regime 2), and the only visible difference is a tendency for the unconstrained probabilities to “wiggle” in a narrow range around the 0 and 1 bounds that, by construction, characterize a probability measure. Even though the figure shows state probabilities for both regimes, comments are easier to express with reference to the second state. As per the estimates of the transition probability matrix in Table 4, this regime tends to characterize approximately 24% of our sample (i.e., 234 months) and has an implied average duration of 4.5 months, i.e., once markets enter this state, they tend to remain there for 4 to 5 months, which is hardly negligible. However, the figure also clarifies that, apart from these average duration properties, at least four sample periods tend to be characterized by a prevalence of the second regime: the Great Depression (1929-1935, when the state 2 probability exceeds 0.9 in 56 months out of 72), the period leading up to WWII (1938-1942, when the state 2 probability exceeds 0.9 in 17 months out of 48), the recession and first oil shock of the early and mid-1970s (1970-1976, when the state 2 probability exceeds 0.9 in 19 months out of 72), and the “dot-com” bubble and its burst in the late 1990s and early part of the new millennium (1999-2002, when the state 2 probability exceeds 0.9 in 31 months out of 48). Clearly, all these periods correspond to stages of bubbling, hyper-active, and highly volatile US stock markets, which always led to spectacular bubble-bursting price action and ended in protracted bear phases. For instance, focussing on the constrained parameter estimates in the last three columns of Table 4, the model implies an annualized market volatility of 9.0% in regime 1, and of 87% in

2); the third central moment for the market is 15.0 in regime 1 (367.5 in regime 2); the fourth central moment for the market is 434.8 in regime 1 (44238 in regime 2).

$^{39}$ These co-skewness values are only apparently enormous. When scaled by the standard factor $\mathrm{Var}[R^W_t]\,\mathrm{Var}[R^i_t]$, the co-skewness coefficients are 0.27, 0.63, and -0.71 for SMB, HML, and MOMO, respectively. A similar caveat applies to the co-kurtosis estimates reported in the main text: for instance, a co-kurtosis with the market of 20309 for HML in fact implies a scaled (using the factor $3\,\mathrm{Var}[R^W_t]\,\mathrm{Var}[R^i_t]$) co-kurtosis coefficient of 2.38, which is not at all exceptional.

$^{40}$ Intuitively, while it is hard to perform effective back-of-the-envelope calculations for regime 1, it is clear that only coskewness dynamics can explain portfolio returns in regime 2. Given the large and negative unit price of coskewness risk, it is easy to explain the high MOMO returns in this state, as MOMO displays on average negative coskewness with the market portfolio; however, it remains hard to explain SMB and HML returns, as their coskewness coefficients are on average positive (and this generates negative average returns). This is consistent with the magnitudes of the regime 2 alphas reported in Table 4.


regime 2, almost ten times higher.$^{41}$ Moreover, even outside the four periods of prevalence of regime 2 we have just listed, this state also receives a high ex-post likelihood in many other, shorter sample periods (e.g., the short 1980-1981 recession, the market correction of late 1987, and the early stages of the short 1990-1991 recession).$^{42}$
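The duration and long-run frequency figures quoted for regime 2 follow from standard Markov chain algebra: a regime with stayer probability $p_{kk}$ lasts $1/(1-p_{kk})$ periods on average, and the ergodic probabilities $\pi$ solve $\pi' P = \pi'$ with $\sum_k \pi_k = 1$. The Python sketch below illustrates both calculations under an assumed row convention, P[i, j] = Pr(S_t = j | S_{t-1} = i). The function names are hypothetical, and the regime 1 stayer probability of 0.92 is an illustrative value, not the Table 4 estimate; only the regime 2 value of 0.78 is taken from the text.

```python
import numpy as np

def expected_durations(P):
    """Hypothetical helper: expected duration of each regime, 1 / (1 - p_kk)."""
    P = np.asarray(P, dtype=float)
    return 1.0 / (1.0 - np.diag(P))

def ergodic_probabilities(P):
    """Hypothetical helper: stationary distribution pi with pi' P = pi', sum(pi) = 1.

    Assumes rows of P sum to one: P[i, j] = Pr(S_t = j | S_{t-1} = i).
    """
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    # Stack the K balance equations (P' - I) pi = 0 with the adding-up condition
    # and solve the (K + 1) x K system by least squares (exact for a valid P).
    A = np.vstack([P.T - np.eye(K), np.ones((1, K))])
    b = np.append(np.zeros(K), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Illustrative two-state matrix: p_22 = 0.78 from the text, p_11 = 0.92 assumed.
P = np.array([[0.92, 0.08],
              [0.22, 0.78]])
print(expected_durations(P))     # roughly [12.5, 4.5] months
print(ergodic_probabilities(P))  # roughly [0.73, 0.27]
```

With the assumed $p_{11}$ the ergodic probabilities come out near the 0.73 and 0.27 reported for the set 2-constrained estimates, but that agreement is a consequence of the illustrative choice, not a replication of Table 4.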

In any event, historical memory serves us well in leading us to characterize the second regime as a bear state of high volatility and low or negative excess market returns. The covariance, (co-)skewness, and (co-)kurtosis of the equity portfolios under investigation also differ dramatically across the two regimes. This is confirmed by an analysis of the estimates obtained in Table 4.$^{43}$ In state 2, the volatilities of the shocks to equity portfolio returns are systematically higher than in regime 1: in annualized terms, these are 17.1% vs. 7.1% for SMB, 23.5% vs. 6.1% for HML, and 28.3% vs. 6.1% for MOMO. Because these are not easily read off the table, we also compute state-specific means and (total) volatilities for the equity portfolio returns. This is done by simulating the estimated multivariate model in real time under the parameter estimates in Table 4, which also yields the co-moments that enter the asset pricing (conditional mean) equations; the moments are then computed by weighting the real-time co-moments with the smoothed probabilities in Figure 3. In annualized terms, the market exhibits an average excess return of -3.3% in state 2 vs. 10.5% in state 1, and a volatility of 33.7% in state 2 vs. 12.0% in state 1. These results extend to the annualized total volatilities of SMB, HML, and MOMO in regime 2 vs. regime 1: 20.8% vs. 7.5%, 22.5% vs. 7.4%, and 31.3% vs. 8.0%, respectively. We have already reported state-specific estimates of covariances, coskewnesses, and cokurtosis coefficients computed vs. the market portfolio and verified that they are considerably larger in absolute value in state 2 than in state 1; in particular, all of these co-moments are higher (lower) in state 2 than in state 1 for SMB and HML (MOMO). In the case of the market, while (scaled) skewness is roughly constant across the two regimes (0.40 in state 2 vs. 0.36 in state 1), (scaled) kurtosis is 4.97 in state 2 vs. 3.04 in state 1.$^{44}$ Naturally, as also stressed by Figure 3, regime 1 is the complement of regime 2 and as such can best be characterized as a bull state of low volatility and high, positive excess market returns; covariances, (co-)skewness, and (co-)kurtosis coefficients of the equity portfolios tend to be lower (in absolute value) than in the bear state.
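Under the usual definitions, scaled skewness and kurtosis are the third and fourth central moments divided by the variance raised to the powers 3/2 and 2. As a minimal Python sketch (the function name is hypothetical), applying these definitions to the rounded regime 2 market moments in footnote 38 (variance 94.4, third central moment 367.5, fourth central moment 44238) gives about 0.40 and 4.96, close to the reported 0.40 and 4.97. The regime 1 moments give about 0.38 and 3.23 rather than the reported 0.36 and 3.04, possibly because the scaled figures in the text come from the probability-weighted calculation rather than the classification in footnote 38, so the sketch illustrates the scaling rather than reproducing the paper's computations.

```python
def standardized_moments(variance, third_central, fourth_central):
    """Hypothetical helper: (skewness, kurtosis) = (m3 / var**1.5, m4 / var**2)."""
    skewness = third_central / variance ** 1.5
    kurtosis = fourth_central / variance ** 2
    return skewness, kurtosis

# Regime 2 market moments as reported (rounded) in footnote 38.
skew2, kurt2 = standardized_moments(94.4, 367.5, 44238.0)
print(round(skew2, 2), round(kurt2, 2))  # 0.4 4.96
```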

There is one final dimension along which the two regimes appear to differ: the implied strength of the predictability patterns from the selected instruments to excess market returns, as well as the time series dynamics followed by the predictors themselves. Table 4 shows that while in regime 1 none of the variables in $z_t$ predicts subsequent market excess returns (there is a partial exception for the dividend yield in regime 1, but the corresponding p-value is around 0.05), in regime 2 all the relevant coefficients are larger and economically relevant, and at least the coefficient of the dividend yield is relatively large (1.59) and statistically significant.$^{45}$ Additionally, while all the predictor variables are highly persistent in regime 1, in regime 2 the dividend yield and the term spread exhibit only intermediate persistence, while the default spread remains highly persistent. Clearly, these differences are then reflected in the pricing performance of the model and in the implied $R^2$s. Interestingly, the presence of a few coefficients illustrating “direct” predictability of excess market returns from the predictors in $z_t$ may be construed as a disappointing result that should lead to a rejection of our asset pricing model, because it implies that conditional co-moments are not the only drivers of risk premia in our framework. We return to this perspective in Section 6.

$^{41}$ Nonetheless, one should bear in mind that the average duration of regime 2 does not exceed 5 months, and that these annualized estimates are obtained assuming independence of the states over time, which is clearly not the case in Table 4. In practice, a regime classification-based estimate of the annualized market volatility yields 12.0% in regime 1 vs. 33.7% in regime 2. Even the latter, high volatility estimate is a rather plausible one: for instance, between 1929 and 1934, the US value-weighted portfolio displayed an annualized volatility of 41.7%.

$^{42}$ Interestingly, the model signals an increasing probability of having entered a state 2 period around the end of 2007, which is rather plausible given the acute market crises in the Summer and Fall of 2008 (a period not included in our sample).

$^{43}$ Also in this case, it appears more meaningful to focus on the constrained estimates.

$^{44}$ As a result of the heterogeneous regime properties of co-skewness and co-kurtosis, the two-state Markov switching model fails to deliver sharp predictions for expected returns on the three long-short portfolios. In the case of MOMO, the mean return is considerably lower in state 2 (-5.0%) than in state 1 (12.9%); for SMB the mean return is similar, and in any case modest, across the two states (0.7% in state 2 and 2.2% in state 1); for HML the mean return is higher in state 2 (20.1%) than in state 1 (2.4%). In this sense, while high momentum returns appear to be a regime 1 phenomenon and as such tend to characterize more than 2/3 of the sample, high value returns are a regime 2 phenomenon. However, the spreads across regimes for HML (20.1 − 2.4 = 17.7) and for momentum-sorted portfolios (12.9 − (−5.0) = 17.9) are similar.

In asset pricing terms, these characterizations of the two regimes estimated under (8)-(9) make intuitive sense. In the stable regime 1, comoments are low in absolute value but display rich dynamics, and as such they are priced, as revealed by Table 4. In the volatile regime 2, comoments are higher and the pricing function is dominated by the coskewness term $\mathrm{Cov}[R^i_{t+1}, (R^W_{t+1})^2 \mid \mathcal{F}_t]$, i.e., by the ability of equity portfolios to provide a hedge against the volatility of the wealth process, because only the parameter $\lambda_{32}$ is statistically significant and large in an economic sense. This is consistent with the fact that volatility dominates the second regime, and the most valuable property of any asset in that state is the ability to compensate for the higher variance with adequate returns.
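To make the link between the kernel coefficients and the unit risk prices concrete, the Python sketch below assumes, for illustration, a kernel that is cubic in the wealth return within a given regime, $M_{t+1} = g_0 + g_1 R^W_{t+1} + g_2 (R^W_{t+1})^2 + g_3 (R^W_{t+1})^3$ (our reading of the four-moment specification, not a transcription of (8)-(9)). From $E_t[M_{t+1} R^i_{t+1}] = 1$ and $E_t[M_{t+1}] = 1/R^f_t$ it follows that $E_t[R^i_{t+1}] - R^f_t = -R^f_t\,\mathrm{Cov}_t[M_{t+1}, R^i_{t+1}]$, which splits into covariance, coskewness, and cokurtosis terms with unit prices $-R^f_t g_1$, $-R^f_t g_2$, and $-R^f_t g_3$, mirroring the pairing $\lambda_{32} = -R^f_t g_{22}$ used in the text. The function name and the simulated inputs are hypothetical, and sample covariances stand in for the conditional ones.

```python
import numpy as np

def comoment_premium(r_i, r_w, g, r_f):
    """Hypothetical helper: split -R^f * Cov(M, R^i) into co-moment terms.

    Illustrative kernel (an assumption): M = g0 + g1*R^W + g2*(R^W)**2 + g3*(R^W)**3.
    Returns the unit risk prices (-R^f*g1, -R^f*g2, -R^f*g3), the co-moments
    Cov[R^i, (R^W)**p] for p = 1, 2, 3, and the implied risk premium.
    """
    r_i = np.asarray(r_i, dtype=float)
    r_w = np.asarray(r_w, dtype=float)
    _, g1, g2, g3 = g  # the constant g0 does not affect covariances
    comoments = np.array([np.cov(r_i, r_w ** p)[0, 1] for p in (1, 2, 3)])
    prices = -r_f * np.array([g1, g2, g3])
    return prices, comoments, float(prices @ comoments)

# Hypothetical simulated data: monthly wealth and portfolio returns (decimal units).
rng = np.random.default_rng(0)
r_w = rng.normal(0.01, 0.05, size=1000)
r_i = 0.5 * r_w + rng.normal(0.0, 0.03, size=1000)
g = (1.0, -2.0, 5.0, -10.0)
prices, comoments, premium = comoment_premium(r_i, r_w, g, r_f=1.003)

# Because covariance is bilinear, the split matches -R^f * Cov(M, R^i) exactly.
m = g[0] + g[1] * r_w + g[2] * r_w ** 2 + g[3] * r_w ** 3
assert np.isclose(premium, -1.003 * np.cov(m, r_i)[0, 1])
```

In this parameterization a positive $g_2$ produces a negative unit price of coskewness risk, the sign estimated for $\lambda_{32}$ in regime 2.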

The lower portion of Table 4 reports estimates of the transition matrix characterizing the Markov chain that drives the switching behavior in the pricing kernel. While the estimate of $\Pr(S_t = 1 \mid S_{t-1} = 1)$, the “stayer” probability of regime 1, is relatively stable as increasingly stringent constraints are imposed, the estimate of $\Pr(S_t = 2 \mid S_{t-1} = 2)$ increases from 0.72 to 0.78. Correspondingly, regime 2 becomes more and more persistent: an estimate of $\Pr(S_t = 2 \mid S_{t-1} = 2)$ equal to 0.72 implies an average duration of 3.6 months, and an estimate of 0.78 an average duration of 4.6 months. The set 2-constrained estimate of $P$ implies that the long-run, ergodic probabilities of regimes 1 and 2 are, respectively, 0.73 and 0.27. The maximized log-likelihood function of the two-state MS model is considerably higher than that of the single-state benchmark in Section 5.1, -9598 vs. -10339. Correspondingly, all information criteria decline from 21-22 in Table 2 to 19-20 in Table 4. This means that even when the log-likelihood improvement is penalized for the considerable increase in the number of estimated parameters (from 55 to 92, i.e., 37 additional parameters), the superior performance of the two-state model is difficult to refute. Along the same lines, although the single-state and the two-state Markov switching models are not simply nested, it is also sensible to compute a likelihood ratio statistic, which trades off the maximized log-likelihood improvement brought by the flexibility and better fit of the two-state model against the higher parsimony of the single-state model:

$$LR = 2[-9598.04 - (-10339.15)] = 1482.22 \overset{a}{\sim} \chi^2_{37}.$$

$^{45}$ As for the economic “significance” of the coefficients, in regime 2 a one standard deviation increase in the dividend yield causes an increase in excess market returns of 2.48% (vs. 0.14% in regime 1), a one standard deviation increase in the term spread causes an increase in excess market returns of 0.69% (vs. -0.01% in regime 1), and a one standard deviation increase in the default spread causes a change in excess market returns of -0.40% (vs. +0.16% in regime 1).


Under a $\chi^2_{37}$, an LR statistic of 1482.22 implies a p-value that is essentially zero, a sign of a strong rejection of the single-state model in favor of the two-state one. As already noticed in Table 2, when all constraints are imposed on the estimation of the kernel, the maximized log-likelihood declines somewhat, to -10140. However, this value remains considerably higher than the corresponding log-likelihood of the single-state model estimated under all the sensible constraints, -11834. The information criteria in Table 4 (between 21 and 22) all signal that the two-state model ought to be preferred to the single-state one, for which the information criteria all fell between 24 and 25 in Table 2. Ignoring for simplicity potential complications for the asymptotic distribution caused by the fact that we are comparing constrained ML estimates, a likelihood ratio test produces a test statistic in excess of 3388, which is clearly highly statistically significant.
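The likelihood ratio arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that uses SciPy's `scipy.stats.chi2` distribution as the asymptotic reference. As the text cautions, the two models are not simply nested, so treating the statistic as $\chi^2_{37}$ is itself an approximation and the p-values are indicative only. The helper name `lr_test` is hypothetical, and the degrees of freedom for the constrained comparison are assumed to remain 37.

```python
from scipy.stats import chi2

def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Hypothetical helper: LR = 2 * (logL_unrestricted - logL_restricted) and p-value."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, chi2.sf(lr, df)

# Unconstrained estimates: single-state vs. two-state model, 92 - 55 = 37 extra parameters.
lr, p_value = lr_test(-10339.15, -9598.04, df=37)
critical_5pct = chi2.ppf(0.95, df=37)  # about 52.2, far below the statistic
print(f"LR = {lr:.2f}, p-value = {p_value:.3g}, 5% critical value = {critical_5pct:.2f}")

# Fully constrained estimates (rounded log-likelihoods from the text; df assumed 37).
lr_c, _ = lr_test(-11834.0, -10140.0, df=37)
```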

Finally, Table 3 reports the comparative in-sample pricing performance of the two-state, four-moment CAPM model. For the sake of brevity, we only comment on the unconstrained and set 2-constrained estimates. The top panel of the table clearly shows that the unconstrained two-state model provides, by and large, an improvement over the single-state benchmark. All of the pseudo-$R^2$s improve and reach levels of 3.5-6.6%, which are far from negligible for the cross section of US equity returns. Correspondingly, the RMSPEs generally decline (excess market returns are an exception), with the most substantial improvement characterizing the fit to MOMO returns (a decline in excess of 11%). Interestingly, most of this decline comes from a reduction of the variance of the pricing errors. When the EPK is constrained to be compatible with standard properties of the intertemporal marginal rate of substitution of a risk-averse investor, the improvement is uniformly visible for only one portfolio out of four. Oddly (even though $R^2$ and RMSPEs measure different aspects of the notion of “fit”), while the $R^2$ improvements concern MOMO and the market portfolio, the RMSPE improvements involve SMB and MOMO. Interestingly, fitting HML returns with the two-state model proves rather difficult, partly because a substantially negative average pricing error (bias) of almost -1% appears, i.e., the two-state model systematically over-predicts value-minus-growth return spreads. It is also remarkable that a similarly large bias appears for excess market returns (-1.8% per month), even though the final RMSPE is actually lower than under the single-state smooth benchmark, thanks to the fact that the Markov switching model greatly reduces the variance of the pricing errors; i.e., the regime switching model prices “worse” on average, but much more consistently, avoiding huge and aberrant pricing errors. More generally, the resulting pseudo-$R^2$s are much less impressive than in the first panel of the table, reaching levels of only 3.0-3.7%. This implies that a substantial portion of the explanatory power of a Markov switching, four-moment CAPM is effectively lost when economic constraints are imposed to ensure its admissibility. It remains to be seen whether a Markov switching mixture CAPM is also subject to the same effect.
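The observation that a model can price “worse” on average yet deliver a lower RMSPE follows from the identity MSPE = bias² + variance of the pricing errors. The Python sketch below makes this trade-off explicit. It assumes pricing errors are defined as realized minus fitted returns (so a negative bias means over-prediction, as in the HML and market cases above); the helper name and the two error patterns are hypothetical.

```python
import numpy as np

def pricing_error_summary(realized, fitted):
    """Hypothetical helper: (RMSPE, bias, error variance) with errors = realized - fitted.

    Uses the population variance (ddof=0), so RMSPE**2 = bias**2 + variance exactly.
    """
    errors = np.asarray(realized, dtype=float) - np.asarray(fitted, dtype=float)
    bias = errors.mean()
    variance = errors.var()
    rmspe = np.sqrt(np.mean(errors ** 2))
    return rmspe, bias, variance

# Hypothetical monthly returns (percent) and two illustrative sets of fitted values.
realized = np.array([1.0, -0.5, 2.0, 0.3])
fitted_a = realized - np.array([4.0, -4.0, 3.0, -3.0])   # errors average zero but are erratic
fitted_b = realized - np.array([-1.5, -0.5, -1.2, -0.8])  # over-predicts by 1 on average
for name, fitted in (("A", fitted_a), ("B", fitted_b)):
    rmspe, bias, variance = pricing_error_summary(realized, fitted)
    print(name, round(rmspe, 3), round(bias, 3), round(variance, 3))
```

Model B is biased yet has the smaller RMSPE, because the variance term dominates the squared bias for model A.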

5.3. Mixture CAPM

Table 5 presents the empirical estimates of the three-state mixture CAPM model (10).$^{46}$ Notice immediately that, even though the mixture CAPM is logically simpler and obtained as a “restriction” imposed on the completely flexible Markov switching four-moment CAPM, being based on a $K = 3$-state specification, (10)

$^{46}$ For additional clarity and to save space, in what follows we denote the downside covariance CAPM as “dCAPM” and the four-moment CAPM as “4MOM”.


does imply a higher number of parameters to be estimated, 138 vs. 92 in the two-state case of (8)-(9).$^{47}$ In the table, some of the information criteria (namely, the Akaike and Hannan-Quinn criteria) indicate that the three-state mixture CAPM may be preferred to the two-state four-moment model of Section 5.2 and, a fortiori, to the single-state smooth benchmark of Section 5.1, in spite of the higher number of parameters that the mixture model requires. For instance, under constraint set 2, the AIC declines from 20.99 to 20.81 (it was 24.39 in the single-state case) and the H-Q criterion from 21.16 to 21.07 (it was 24.49 in the single-state model). The indications given by the Bayesian information criterion (BIC) are instead harder to read, as the BIC decreases when going from Table 4 to Table 5 only under constraint set 1 (from 22.40 to 22.10).
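The information criteria appear to be reported per observation. As a minimal Python sketch, assuming AIC $= (-2\ln L + 2k)/T$, BIC $= (-2\ln L + k\ln T)/T$, and H-Q $= (-2\ln L + 2k\ln\ln T)/T$ with $T = 975$ monthly observations, the rounded log-likelihoods reported earlier for the fully constrained models (-10140 with $k = 92$ for the two-state model, -11834 with $k = 55$ for the single-state one) reproduce the AIC values of 20.99 and 24.39 and the H-Q values of 21.16 and 24.49 quoted above for constraint set 2. The per-observation scaling is our inference from these matches, not a definition taken from the paper, and the helper name is hypothetical.

```python
import math

def per_observation_ic(loglik, n_params, n_obs):
    """Hypothetical helper: AIC, BIC and Hannan-Quinn criteria divided by sample size."""
    base = -2.0 * loglik
    aic = (base + 2.0 * n_params) / n_obs
    bic = (base + n_params * math.log(n_obs)) / n_obs
    hq = (base + 2.0 * n_params * math.log(math.log(n_obs))) / n_obs
    return aic, bic, hq

T = 975  # monthly observations
two_state = per_observation_ic(-10140.0, 92, T)
one_state = per_observation_ic(-11834.0, 55, T)
print([round(x, 2) for x in two_state])  # AIC and H-Q round to 20.99 and 21.16
print([round(x, 2) for x in one_state])  # AIC and H-Q round to 24.39 and 24.49
```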

Table 5 presents empirical estimates. Similarly to Tables 2 and 4, we do not report estimates of the (regime-dependent) covariances between shocks to the predictors and shocks to portfolio returns, nor of the (regime-dependent) covariance matrix of shocks to the predictors within the assumed VAR(1) framework. To save additional space, in Table 5 we also omit the estimates of the MS VAR(1) coefficients of the predictors.$^{48}$ Clearly, the three-state mixture CAPM model achieves the goal of driving the estimates of the average abnormal returns not justified by risk exposure towards zero, and certainly to levels of weak statistical significance. In Table 5, we obtain three different estimates for each of the long-short portfolio alphas, one for each of the three pre-assigned asset pricing regimes, e.g., $\alpha^{HML}_{CAPM}$, $\alpha^{HML}_{down\text{-}CAPM}$, and $\alpha^{HML}_{4mom\text{-}CAPM}$. We immediately notice that the toughest alpha to “send” to zero (at least in a statistical sense) is MOMO's alpha, especially in the first, CAPM-driven regime. In fact, in the unrestricted case, $\alpha^{MOMO}_{CAPM} = 0.90$ (which corresponds to 10.8% per year) and commands a p-value close to 0.01. However, when the full set of economic restrictions is imposed on the EPK, we find that all the alphas stop being statistically significant, including $\alpha^{MOMO}_{CAPM}$. Interestingly, the alphas remain rather large in absolute value in the third, four-moment CAPM regime (they range between -2.1% and 2.8% per month), but they command such high standard errors that the associated p-values are all very high. Economically, this means that even though the modeled risk factors fail to drive the average abnormal returns to zero in a mathematical sense, they do so in a statistical sense, as the variation in the sample returns associated with the third state is sufficiently large for the classical 95% confidence intervals around the estimated $\alpha^{SMB}_{4mom\text{-}CAPM}$, $\alpha^{HML}_{4mom\text{-}CAPM}$, and $\alpha^{MOMO}_{4mom\text{-}CAPM}$ to always include a zero abnormal return. This finding has key economic implications because, in the absence of a rational explanation (see Fama and French, 1993, who interpret SMB and HML as priced risk factors in an intertemporal CAPM framework, and Carhart, 1997, who adds MOMO as a fourth priced factor), the conclusion of Lakonishok, Shleifer, and Vishny (1994) that the asset pricing anomalies (in particular, value and momentum premia) are caused by overreaction-fueled irrational mispricing would hold. On the contrary, the ability to isolate one EPK that, especially under economically meaningful constraints, prices the US cross-section without generating large and statistically significant abnormal returns (alphas) is consistent with rational explanations. We also

$^{47}$ In any event, let us remark that 138 parameters are not “too many” (apart from numerical considerations): with $975 \times (N + M) = 6{,}825$ observations, this implies a saturation ratio (the ratio between the total number of observations and the number of unknown parameters to be estimated) of almost 50, which is well in excess of the lower bound of approximately 20 normally advised in the non-linear time series literature.

$^{48}$ These coefficients (under three alternative sets of restrictions on the estimation program) are available from the author upon request.


notice that this type of evidence as well as its “progression” across restriction sets is very similar to what

we had documented in Table 4 for the two-state MS CAPM model.

Imposing economic constraints on the estimation of the mixture CAPM appears to generate interesting payoffs also for the precision with which the prices of risk in the model can be estimated. In the unrestricted case, the evidence on the EPK g coefficients is actually a bit puzzling, in the sense that the pattern of statistically significant coefficients raises doubts on the possibility of attributing a compelling asset pricing interpretation to the three regimes (e.g., the covariance risk coefficient fails to be significant in the CAPM state, a regime that ought to be characterized only by the pricing of covariance risk). However, as additional economic restrictions on the coefficients themselves and on the overall shape of the pricing kernel are imposed, the fraction of classical 95% confidence intervals for the g coefficients that exclude zero increases. The last three columns of Table 5 show that the constant coefficient $g_0$ is significant in all three states, that $\lambda^{-}_{2,down\text{-}CAPM} = -R^f_t g_{12} = 0.2230$ with a p-value between 0.05 and 0.06, and that in the third, four-moment CAPM regime all the unit risk premia are statistically significant with p-values of 0.01 or lower ($\lambda_{2,4mom\text{-}CAPM} = -R^f_t g_{13} = 0.0120$, $\lambda_{3,4mom\text{-}CAPM} = -R^f_t g_{23} = -0.0075$, and $\lambda_{4,4mom\text{-}CAPM} = -R^f_t g_{33} = 0.0065$). These results at least partially validate the a priori identification of the three statistical regimes with distinct asset pricing states, in the sense that the second regime produces a (borderline) statistically significant and positive premium on downside covariance risk, while in the third regime all three conditional (symmetric) comoments are priced and statistically significant, with a negative risk

premium on coskewness risk.$^{49}$ In any event, the joint null hypothesis that $g_{11} = g_{12} = g'_{12} = g_{13} = g_{23} = g_{33} = 0$ is always rejected, with a p-value between 0.01 and 0.05 in the unrestricted case and with p-values close to zero when economic restrictions are imposed. This makes sense because constraining the signs of the g

coefficients to be consistent with non-satiation and (decreasing) absolute risk aversion has the predictable effect of reducing most of the standard errors associated with the estimation.

As customary at this point, additional help in interpreting the economic meaning of the three states we have specified comes from Figure 4, where the (full-sample, ex-post) smoothed probabilities are plotted for both the unconstrained and the set 2-constrained estimates reported in Table 5. Also in this case, the figure shows that imposing constraints in the estimation implies modest changes in the smoothed full-sample probabilities. Although correlations must be interpreted with caution because the probabilities are bounded to the [0, 1] interval, the correlation between the CAPM smoothed probabilities across unconstrained and constrained estimates is 0.96,

and the correlation for the dCAPM probabilities is 0.85. Because the 4MOM regime occurs less frequently, the analogous pairwise correlation is only 0.71 for the 4MOM CAPM probabilities. In practice, Figures 4 and 5 show that while an unconstrained model would imply several entries and exits between the downside and the 4MOM CAPMs over the period 1929-1935, and then again in 2000-2001, this is not the case for the constrained smoothed regime probabilities, which instead simply characterize both periods as 4MOM intervals.

The first panel of Figure 4 makes it clear that, according to the estimates in Table 5, the US stock market has spent a large proportion of the period 1927-2008 in a plain vanilla CAPM regime. In fact, the estimates reveal that, when constraints are imposed, the CAPM regime has an average duration of approximately 14 months and

$^{49}$ The finding that $\lambda_{2,CAPM} = -R^f_t g_{11} = 0.0206$, which is not statistically significant in any sense (its p-value is 0.22), remains somewhat problematic. The first regime is pre-assigned to be a CAPM regime in which only covariance risk is priced, but our restricted estimates reveal that the resulting estimate of the price of risk is not significantly positive.


characterizes 44.1% of the data.$^{50}$ Long spans of data are captured by the properties of the CAPM state in which only covariance risk is priced, such as most of the 1940s, 1950s, and 1960s, the period 1991-1996, and, more recently, most of the bull markets that occurred between 2003 and 2006. Figure 5 displays the same smoothed probabilities as Figure 4, but limited to the 1985-2008 sub-sample. The point of these plots is to show the remarkable persistence of the isolated regimes, in particular of the CAPM state. Visibly, long stretches of the past 23 years are characterized by the CAPM state, such as 1985-1986, 1988-1990, 1993-1996, 1998, and the recent up-turn in stock prices over 2004-2006. The horizontal arrows stress the length of these periods of CAPM pricing. It seems plausible that long stretches of time are governed by simple frameworks in which only covariance risk is priced, which is the key intuition of the classical CAPM. Additionally, it is easy to verify that the CAPM regime typically features high mean excess market returns and high returns on the remaining stock portfolios under consideration, accompanied by moderate volatility. For instance, focussing on the constrained parameter estimates in the last three columns of Table 5, the model implies an annualized market risk premium in the CAPM state of 10.1% with a volatility of 10.4% (i.e., an approximate market Sharpe ratio of 0.28 per month).
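The monthly Sharpe ratios quoted for the three regimes follow from the annualized figures under the usual i.i.d. scaling (mean times 12, volatility times $\sqrt{12}$), the same independence-over-time approximation flagged in footnote 41 for the two-state model. The short Python check below uses the annualized premia and volatilities reported in this section for the CAPM, dCAPM, and 4MOM states; the helper name is hypothetical.

```python
import math

def monthly_sharpe(annual_premium, annual_vol):
    """Hypothetical helper: monthly Sharpe ratio from annualized premium and volatility.

    Assumes i.i.d. monthly returns: the mean scales with 12 and the volatility
    with sqrt(12), so the result equals the annual ratio divided by sqrt(12).
    """
    return (annual_premium / 12.0) / (annual_vol / math.sqrt(12.0))

# Annualized market risk premia and volatilities (percent) for the three regimes.
states = {"CAPM": (10.1, 10.4), "dCAPM": (3.1, 21.6), "4MOM": (6.2, 47.8)}
for name, (premium, vol) in states.items():
    print(name, round(monthly_sharpe(premium, vol), 3))  # 0.28, 0.041, 0.037
```

The results match the 0.28, 0.04, and 0.037 per month reported for the CAPM, dCAPM, and 4MOM states.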

The intermediate panels in Figures 4 and 5 show that the dCAPM regime, in which covariance and downside covariance risks are priced differently, is not very persistent (its average duration is approximately 7 months). Nevertheless, it occurs with remarkable frequency, characterizing 39.4% of our sample, because of the structure of the estimated transition matrix: Pr(S_t = dCAPM | S_{t-1} = CAPM) is estimated at 0.086 and Pr(S_t = dCAPM | S_{t-1} = 4MOM) at 0.110, values that make a switch into the dCAPM regime from both the CAPM and the 4MOM regimes quite likely. For many practical applications, knowing that a dCAPM pricing regime has an average duration of 7 months is far from negligible. The moderate persistence of this regime and its frequent occurrence make it hard to eyeball specific periods in which US stock returns were predominantly generated by the dCAPM, even though "spikes" in the probability of this state are clearly visible around 1929-1930, 1935-1938, 1941-1945, the late 1960s and late 1970s, 1998-1999, and, more recently, 2002-2003. Interestingly, from mid-2007, amid severe financial turmoil, US equity markets switch from the CAPM to the dCAPM regime, with an unsettling similarity to the onset of the Great Depression in 1929. Because many of these periods correspond to bearish and volatile markets, the dCAPM regime generates an annualized market risk premium of 3.1% with a volatility of 21.6% (a low market Sharpe ratio of 0.04 per month); while the market risk premium is roughly half its long-term historical mean, the volatility is slightly higher than its long-run estimate.
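As a rough check on the durations quoted here and in footnote 51, recall that in a first-order Markov chain the expected duration of a regime with staying probability p is 1/(1 - p), and that the ergodic probabilities are the fixed point of pi' = pi' P. The minimal Python sketch below computes both, assuming a row-stochastic matrix with P[i][j] = Pr(S_t = j | S_{t-1} = i). The band-diagonal matrix `P_hyp` is purely hypothetical and is not the estimated transition matrix, whose full set of entries is not reported in this section.

```python
def expected_duration(p_stay: float) -> float:
    """Expected number of periods spent in a regime once entered: 1 / (1 - p_stay)."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must lie in [0, 1)")
    return 1.0 / (1.0 - p_stay)


def implied_stay_probability(duration: float) -> float:
    """Staying probability consistent with a given average duration (inverse of the above)."""
    if duration < 1.0:
        raise ValueError("duration must be at least one period")
    return 1.0 - 1.0 / duration


def ergodic_probabilities(P, tol=1e-12, max_iter=100_000):
    """Long-run state probabilities of a row-stochastic matrix P,
    found by power iteration pi' <- pi' P from a uniform starting vector."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Reported estimate (footnote 51): Pr(S_t = 4MOM | S_t-1 = 4MOM) = 0.889
print(round(expected_duration(0.889), 2))       # 9.01 (months)
# A 7-month average duration (dCAPM) implies a staying probability of about 0.857
print(round(implied_stay_probability(7.0), 3))  # 0.857

# Hypothetical band-diagonal matrix, NOT the estimated one: no direct 1 <-> 3 moves
P_hyp = [[0.9, 0.1, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]]
print([round(p, 3) for p in ergodic_probabilities(P_hyp)])  # [0.4, 0.4, 0.2]
```

The hypothetical matrix shows how a state can be reached rarely yet persist once entered: the third state has a staying probability of 0.8 but an ergodic probability of only 0.2.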

The bottom panels of Figures 4 and 5 stress instead that the 4MOM asset pricing regime occurs rather infrequently but, when it does, the state is considerably persistent, with an average duration of 9 months. However, in terms of ergodic probability, only 16.6% of any long sample should be generated by the 4MOM state, which in our case means 161 months out of 975. Figure 4 clearly shows that most of these 161 months in our sample can be identified with 1931-1934, 1973, and 2000-2001, for a total of approximately 84 months.51 Of course, there are many additional spikes in the smoothed probability of this regime, for instance in early 1975 and late 1991. The 4MOM regime is characterized above all by high volatility. The constrained estimates in Table 5 imply an annualized market risk premium of 6.2% (close to the historical mean over the full sample) and a stunning volatility of 47.8%.52 As a result, the corresponding Sharpe ratio is 0.037, even lower than the reward-to-risk ratio estimated for the dCAPM state. One may also notice that, apart from the structure we have imposed ex ante when specifying this mixture model, the three regimes possess a rather clean ex-post statistical configuration: the CAPM state is a bull regime in which volatility is low; the dCAPM state is a bear regime in which volatility takes intermediate values; the 4MOM state is dominated by extraordinarily high volatility, even though the implied risk premium is close to the historical average.

50This is the long-run, ergodic probability of the CAPM regime implied by the estimated transition matrix under the full set of constraints.

51It may appear counter-intuitive that a regime with low ergodic probability may be rather persistent. This derives from the
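The regime Sharpe ratios quoted in this section can be recovered from the annualized figures if one assumes that means scale with 12 and volatilities with the square root of 12. The short Python sketch below makes that assumed conversion explicit. It reproduces the reported values of 0.28, 0.04, and 0.037 up to rounding.

```python
import math


def monthly_sharpe(annual_premium: float, annual_vol: float) -> float:
    """Monthly Sharpe ratio from annualized figures, assuming the mean scales
    with 12 and the volatility with sqrt(12)."""
    monthly_mean = annual_premium / 12.0
    monthly_vol = annual_vol / math.sqrt(12.0)
    return monthly_mean / monthly_vol


# Constrained regime-specific figures quoted in the text (annualized, percent)
regimes = {"CAPM": (10.1, 10.4), "dCAPM": (3.1, 21.6), "4MOM": (6.2, 47.8)}
for name, (premium, vol) in regimes.items():
    print(name, round(monthly_sharpe(premium, vol), 3))
# CAPM 0.28
# dCAPM 0.041
# 4MOM 0.037
```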

Table 5 also shows structural differences in predictability patterns (for excess market returns) across the three regimes, and these patterns change as additional economic constraints are imposed. Under the full set of restrictions, only the CAPM state implies precisely estimated predictability effects. In that state, higher dividend yields and term spreads in the risk-free yield curve forecast higher future market risk premia, and a higher default spread in the corporate bond market forecasts lower future risk premia. All these effects are sensible, and the corresponding coefficients have the signs one would expect in light of the existing literature. By contrast, the dCAPM and 4MOM regimes imply very weak evidence of predictability. The only exception is the default spread, which forecasts higher risk premia in the 4MOM regime. Even though these differences across asset pricing regimes appear in all the models we have estimated, the contrast between the CAPM and the other regimes becomes starker when economic constraints are imposed in the estimation.53

Table 3 offers evidence of the in-sample pricing performance of the three-state mixture CAPM. The mixture model is defined by restrictions that characterize each regime by the asset pricing framework generating returns. Despite these restrictions, the additional flexibility of a three-state specification yields obvious payoffs, especially visible in the absence of restrictions. All the pseudo-R2s increase when moving from the two-state to the three-state models, and the RMSPE declines in all cases except for MOMO returns, for which it increases from 3.41 to 3.49 percent. In fact, the mixture CAPM exceeds the R2 levels typical of a simple, unconditional CAPM and substantially reduces the RMSPE. When the complete set of restrictions is imposed, a similar result obtains, with the pseudo-R2 now in the range 3.5-6.1%.54
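For readers who want to see how the two in-sample metrics behave, the sketch below computes a root mean squared pricing error and a pseudo-R2 from vectors of realized and model-implied returns. The pseudo-R2 formula used here (1 - SSE/SST) is one common definition and is only an assumption; the definition behind Table 3 may differ. The returns in the example are hypothetical.

```python
import math


def rmspe(realized, fitted):
    """Root mean squared pricing error (same units as the returns)."""
    errors = [r - f for r, f in zip(realized, fitted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pseudo_r2(realized, fitted):
    """Assumed definition: 1 - SSE/SST, i.e. the share of the variation in realized
    returns accounted for by the model-implied (fitted) returns."""
    mean_r = sum(realized) / len(realized)
    sse = sum((r - f) ** 2 for r, f in zip(realized, fitted))
    sst = sum((r - mean_r) ** 2 for r in realized)
    return 1.0 - sse / sst


# Hypothetical monthly returns and model-implied returns, in percent
realized = [1.0, -2.0, 0.5, 3.0]
fitted = [0.8, -1.5, 0.2, 2.0]
print(round(rmspe(realized, fitted), 4))      # 0.5874
print(round(pseudo_r2(realized, fitted), 4))  # 0.8912
```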

fact that we are estimating a three-state Markov chain: Pr(S_t = 4MOM | S_{t-1} = 4MOM) is estimated at 0.889. The low ergodic frequency of the state derives instead from the fact that Pr(S_t = 4MOM | S_{t-1} = CAPM) and Pr(S_t = 4MOM | S_{t-1} = dCAPM) are estimated at 0.001 and 0.048, respectively, which are rather low values. More generally, it is of some interest to notice that the estimated transition matrix has a "band-diagonal" structure in which, to an approximation, the market can leave the CAPM regime only for the dCAPM regime, and the 4MOM state only for the dCAPM regime. This means that transitions from the first, low-volatility state to the third, high-volatility state (and vice versa) will mostly occur through the second regime, which has low risk premia and moderate volatility.

52Also in this case, one should bear in mind that the average duration of this regime is only 9 months and that the regime occurs rather infrequently.

53As for the economic "significance" of the coefficients, in regime 2 a one-standard-deviation increase in the dividend yield causes an increase in excess market returns of 2.48% (vs. 0.14% in regime 1). A one-standard-deviation increase in the term spread causes an increase in excess market returns of 0.69% (vs. -0.01% in regime 1). A one-standard-deviation increase in the default spread causes a decrease in excess market returns of 0.40% (vs. an increase of 0.16% in regime 1).

54Also in this case, MOMO produces an RMSE that increases from 3.91 to 3.99 percent when going from two to three states.

6. Conclusion

This paper has shown that a flexible EPK can be found that prices the US cross-section of equity returns without leaving large and statistically significant average abnormal returns (alphas) on the table, even when sensible economic restrictions such as non-satiation and global risk aversion are imposed in the estimation routines. Here the cross-section is represented by the value-weighted market portfolio and by three long-short portfolios based on size, value, and momentum sorting. The existence of such a pricing kernel makes the dynamic properties of cross-sectional US stock returns consistent with rational asset pricing. This is because two possibilities are explicitly ruled out: arbitrage opportunities (i.e., the possibility for the estimated EPK to turn negative) and risk-loving pricing, even locally (i.e., the possibility for the EPK to become increasing over the wealth domain). The flexibility of the EPK derives from our choice to model regime switches across different asset pricing frameworks, where the switches are governed by a simple (yet latent) first-order Markov chain. In particular, we have restricted the MS EPK so that its regimes are identified ex ante with distinct asset pricing frameworks. In this specification, each regime corresponds to periods in which one and only one type of CAPM applies from the set {standard CAPM, dCAPM, 4MOM CAPM}. We have found that its performance is then particularly satisfactory, and in some dimensions superior to either of two alternatives. The first is a single-state model in which risk premia change as a function of standard macro-style predictors (dividend yield, term spread, default spread), similarly to the conditional (4MOM) CAPMs in Dittmar (2002) and Post and van Vliet (2005). The second is a 4MOM MS pricing framework along the lines of Guidolin and Timmermann (2008a).
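The admissibility conditions discussed above can be checked numerically on a grid of wealth returns: a kernel that is positive, non-increasing, and convex. The sketch below does so for a cubic kernel M(R) = g0 + g1 R + g2 R^2 + g3 R^3, in the spirit of Dittmar (2002). The coefficients, the grid of gross returns from 0.70 to 1.30, and the function names are hypothetical illustrations, not estimates from the paper.

```python
def kernel(g, r):
    """Hypothetical cubic kernel M(R) = g0 + g1*R + g2*R**2 + g3*R**3."""
    g0, g1, g2, g3 = g
    return g0 + g1 * r + g2 * r ** 2 + g3 * r ** 3


def kernel_slope(g, r):
    """First derivative M'(R)."""
    _, g1, g2, g3 = g
    return g1 + 2.0 * g2 * r + 3.0 * g3 * r ** 2


def kernel_curvature(g, r):
    """Second derivative M''(R)."""
    _, _, g2, g3 = g
    return 2.0 * g2 + 6.0 * g3 * r


def is_admissible(g, grid):
    """True if, at every grid point, M > 0 (non-satiation, no arbitrage),
    M' <= 0 (risk aversion) and M'' >= 0 (a necessary condition for
    decreasing absolute risk aversion)."""
    return all(kernel(g, r) > 0.0 and kernel_slope(g, r) <= 0.0
               and kernel_curvature(g, r) >= 0.0 for r in grid)


grid = [0.70 + 0.01 * k for k in range(61)]   # gross wealth returns 0.70 ... 1.30
g_ok = (3.0, -2.5, 1.0, -0.2)                 # hypothetical coefficients
g_bad = (3.0, -2.5, 3.0, -0.2)                # larger g2 makes M increasing somewhere
print(is_admissible(g_ok, grid), is_admissible(g_bad, grid))  # True False
```

A grid check like this only certifies the conditions at the grid points; it is a diagnostic, not a proof that the restrictions hold over the whole wealth domain.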

From an empirical perspective, we have reported two main findings. First, a mixture CAPM obtained from the MS mixing of the CAPM, dCAPM, and 4MOM CAPM delivers alphas that are relatively small and, above all, fail to be statistically significant. Consider a mixture of CAPMs that is consistent with the standard properties of investors' rationality (i.e., with the typical features of economically admissible intertemporal marginal rates of substitution). Such a mixture appears to rule out average abnormal returns in excess of what is justified by the genuine risk-taking typical of rational investment decisions. In short, the right econometric model, one that mixes otherwise simple asset pricing frameworks, yields the sensible conclusion that there are no free lunches in the markets. This result is consistent with earlier literature based on different econometric frameworks. For instance, Ang and Chen (2007) find that the Bayesian posterior for size and value alphas is usually concentrated around zero once the correlation between conditional betas and market risk premia is taken into account. However, it is interesting to notice that our zero-alpha result also fully obtains for momentum portfolio strategies, whose rationalization has proven more elusive in the literature. Second, the use of simple CAPM mixtures, in which only functions of gross returns on aggregate wealth can serve as state variables affecting the EPK, poses obvious limitations to the ambitions of such exercises. Even so, the estimated mixtures produce an interesting OOS pricing performance in two settings. The first is a pseudo OOS scheme in which the benchmark portfolios (market, SMB, HML, and MOMO) are priced 1 and 12 months ahead. The second is a genuine OOS exercise in which portfolios not used in estimation (industry- and value/size-sorted) are priced 1 and 12 months ahead. In particular, the mixture CAPM model seems to systematically outperform the single-state smooth benchmark for a majority of the portfolios we have experimented with. This means that the models proposed in this paper may also have some further potential for practical applications.

54 (continued) Under the first set of constraints, we observe mixed results, with half of the R2s and RMSEs failing to improve.

As is usual, our research design has left a number of questions open. For instance, from an MS perspective the persistence and duration properties of the asset pricing regimes isolated in Section 5.3 seem appealing. Even so, it remains unclear what these shifts in (rational) asset pricing "paradigms" may correspond to. It also remains hard to identify any micro-founded structural economic model that may explain which factors or events cause these shifts.55 One extension would impose additional structure that might lead to deeper insights on the causes and nature of the regime shifts in pricing frameworks. Beyond that, our paper could be extended in a number of additional directions. Our pricing kernel is a function only of the return on aggregate wealth. However, several recent papers have shown that the specification of aggregate wealth affects the conclusions of empirical asset pricing studies. Dittmar (2002), for instance, specifies the priced factor as a function of both the return on equity and the return on human capital. He does so because earlier evidence (Campbell, 1996; Jagannathan and Wang, 1996) suggests that incorporating human capital into the pricing kernel substantially improves the performance of the conditional CAPM. However, in both of these earlier studies, the return on human capital affects the cross-section of returns linearly. The evidence in Dittmar (2002) suggests that this linear impact is not sufficient to explain cross-sectional variation in returns; rather, it is a nonlinear function of the return on human capital that improves the performance of the model. It would therefore be interesting to augment the definition of wealth returns in our framework to include returns on human capital. It would also be interesting to examine the relation between the estimated EPKs and the volatility bounds of Hansen and Jagannathan (1997). The bounds represent the minimum volatility that a pricing kernel must exhibit, given its mean, to be admissible; in this sense, they depict the set of admissible pricing kernels in the mean-standard deviation space. Dittmar (2002) finds that a cubic pricing kernel (incorporating returns on human capital) is able to generate sufficient volatility. However, its mean is slightly too high for the pricing kernel to actually lie within the Hansen-Jagannathan bounds. Finally, Aretz, Bartram, and Pope (2007) use multivariate GMM models to show that book-to-market, size, and momentum capture cross-sectional variation in exposures to a broad set of macroeconomic factors. The performance of their pricing model based on macroeconomic factors is comparable to that of the Fama and French (1993) model, even though the momentum factor is found to contain incremental information for asset pricing, consistent with Carhart (1997). However, as discussed in Aretz, Bartram, and Pope (2007), these macro-driven models are also subject to remarkable instability. A number of relationships between risk factors and macroeconomic variables seem to change sign as the analysis and data are updated (e.g., the relation between HML and changes in economic growth expectations), or become insignificant (e.g., the relation between HML and default risk). As is typical in the empirical finance literature, this may be due to the presence of regimes and/or structural breaks in asset pricing relationships. It may be interesting to explore MS-driven mixtures as a way to capture, exploit, and forecast this tendency of macro factors to generate elusive pricing implications over time.
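The Hansen-Jagannathan bound mentioned above has a simple closed form when the kernel is required to price a set of excess returns. Any M with E[M x] = 0 must satisfy sigma(M) >= E[M] * sqrt(mu' Sigma^{-1} mu), where mu and Sigma are the mean vector and covariance matrix of the excess returns. The sketch below implements this excess-return version of the bound in plain Python. The choice E[M] = 1 and the use of the full-sample market moments from Table 1 (a 0.629% mean and a 5.425% standard deviation per month) are illustrative assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def hj_min_std(mean_m, mu, cov):
    """Lowest sigma(M) for a kernel with mean mean_m that prices excess returns with
    mean vector mu and covariance matrix cov: mean_m * sqrt(mu' cov^{-1} mu)."""
    w = solve(cov, mu)
    return mean_m * math.sqrt(sum(m_i * w_i for m_i, w_i in zip(mu, w)))


# Illustration: full-sample market moments from Table 1 (monthly, percent), E[M] = 1 assumed
print(round(hj_min_std(1.0, [0.629], [[5.425 ** 2]]), 4))  # 0.1159
# Two hypothetical uncorrelated excess returns with Sharpe ratios 0.3 and 0.4
print(round(hj_min_std(1.0, [0.3, 0.4], [[1.0, 0.0], [0.0, 1.0]]), 4))  # 0.5
```

With a single excess return the bound reduces to E[M] times the absolute Sharpe ratio, which is why the first example returns 0.629/5.425.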

55In other words, these comments imply that the asset pricing regimes identified in the mixture CAPM framework may be predictable (hence, useful) in a statistical sense. However, a lack of understanding of the economic causes underlying the switches may impose an important upper bound on how much predictability can be uncovered and exploited in practice.

References

[1] Aretz, K., S., Bartram, and P., Pope, 2007, Macroeconomic risks and characteristic-based factor models. Lancaster University Working Paper.

[2] Ang, A., and J., Chen, 2002, Asymmetric correlations of equity portfolios. Journal of Financial Economics 63, 443-494.

[3] Ang, A., and J., Chen, 2007, CAPM over the long run: 1926-2001. Journal of Empirical Finance 14, 1-40.

[4] Ang, A., J., Chen, and Y., Xing, 2006, Downside risk. Review of Financial Studies 19, 1191-1239.

[5] Barberis, N., and M., Huang, 2001, Mental accounting, loss aversion, and individual stock returns.

Journal of Finance 56, 1247-1292.

[6] Barone-Adesi, G., 1985, Arbitrage equilibrium with skewed assets. Journal of Financial and Quantita-

tive Analysis 20, 299-313.

[7] Bawa, V., and E., Lindenberg, 1977, Capital market equilibrium in a mean-lower partial moment framework. Journal of Financial Economics 5, 189-200.

[8] Brandt, M., and Q., Kang, 2004, On the relationship between the conditional mean and volatility of

stock returns: a latent VAR approach. Journal of Financial Economics 72, 217-257.

[9] Brown, D., and M., Gibbons, 1985, A simple econometric approach for utility-based asset pricing

models. Journal of Finance 40, 359-381.

[10] Campbell, J., M., Lettau, B., Malkiel, and Y., Xu, 2001, Have individual stocks become more volatile?

An empirical exploration of idiosyncratic risk. Journal of Finance 56, 1-44.

[11] Campbell, J., Y., Chan, and L., Viceira, 2003, A multivariate model of strategic asset allocation.

Journal of Financial Economics 67, 41-80.

[12] Campbell, J., and T., Vuolteenaho, 2004, Bad beta, good beta. American Economic Review 94, 1249-

1275.

[13] Carhart, M., 1997, On persistence in mutual fund performance. Journal of Finance 52, 57-82.

[14] Chan, K., 1988, On the contrarian investment strategy. Journal of Business 61, 147-163.

[15] Chan, K., and N.-F., Chen, 1988, An unconditional asset-pricing test and the role of firm size as an

instrumental variable for risk. Journal of Finance 43, 309-325.

[16] Chen, J., H., Hong, and J., Stein, 2001, Forecasting crashes: trading volume, past returns and condi-

tional skewness in stock prices. Journal of Financial Economics 61, 345-381.

[17] Cochrane, J., 1996, A cross-sectional test of an investment-based asset pricing model. Journal of Political

Economy 104, 572-621.


[18] Dahlquist, M., and P., Soderlind, 1999, Evaluating portfolio performance with stochastic discount

factors. Journal of Business 72, 347-383.

[19] De Bondt, W., and R., Thaler, 1987, Further evidence on investor overreaction and stock market seasonality. Journal of Finance 42, 557-581.

[20] Dittmar, R., 2002, Nonlinear pricing kernels, kurtosis preference, and evidence from the cross-section

of equity returns. Journal of Finance 57, 369-403.

[21] Dumas, B., and B., Solnik, 1995, The world price of foreign exchange risk. Journal of Finance 50,

445-479.

[22] Fama, E., and K., French, 1989, Business conditions and expected returns on stocks and bonds. Journal of Financial Economics 25, 23-49.

[23] Fama, E., and K., French, 1993, Common risk factors in the returns of stocks and bonds. Journal of

Financial Economics 33, 3-56.

[24] Fama, E., and K., French, 1996, Multifactor explanations of asset pricing anomalies. Journal of Finance 51, 55-84.

[25] Fama, E., and K., French, 1997, Industry costs of equity. Journal of Financial Economics 43, 153-193.

[26] Fama, E., and K., French, 2006, The value premium and the CAPM. Journal of Finance 61, 2163-2186.

[27] Fama, E., and J., MacBeth, 1973, Risk, return and equilibrium: empirical tests. Journal of Political Economy 81, 607-636.

[28] Ferson, W., 1990, Are the latent variables in time-varying expected returns compensation for consumption risk? Journal of Finance 45, 397-429.

[29] Ferson, W., and C., Harvey, 1991, The variation of economic risk premiums. Journal of Political Econ-

omy 99, 385-415.

[30] Ferson, W., and C., Harvey, 1993, The risk and predictability of international equity returns. Review

of Financial Studies 6, 527-566.

[31] Ferson, W., and C., Harvey, 1999, Conditioning variables and the cross-section of stock returns. Journal

of Finance 54, 1325-1360.

[32] Friend, I., and R., Westerfield, 1980, Co-skewness and capital asset pricing. Journal of Finance 35,

897-913.

[33] Griffin, J., X., Ji, and S., Martin, 2003, Momentum investing and business cycle risk: evidence from

pole to pole. Journal of Finance 58, 2515-2547.

[34] Guidolin, M., and A., Timmermann, 2004, Value at risk and expected shortfall under regime switching.

Working paper, University of Virginia and UCSD.


[35] Guidolin, M., and A., Timmermann, 2006, An econometric model of nonlinear dynamics in the joint

distribution of stock and bond returns. Journal of Applied Econometrics 21, 1-22.

[36] Guidolin, M., and A., Timmermann, 2008a, International asset allocation under regime switching, skew

and kurtosis preferences. Review of Financial Studies 21, 889-935.

[37] Guidolin, M., and A., Timmermann, 2008b, Size and value anomalies under regime shifts. Journal of

Financial Econometrics 6, 1-48.

[38] Gul, F., 1991, A theory of disappointment aversion. Econometrica 59, 667-686.

[39] Hansen, L. P., and R., Jagannathan, 1997, Assessing specification errors in stochastic discount factor

models. Journal of Finance 52, 557-590.

[40] Harlow, W., and R., Rao, 1989, Asset pricing in a generalized mean-lower partial moment framework:

theory and evidence. Journal of Financial and Quantitative Analysis 24, 285-311.

[41] Harrison, M., and D., Kreps, 1979, Martingales and arbitrage in multiperiod securities markets. Journal

of Economic Theory 20, 381-408.

[42] Harvey, C., 1989, Time-varying conditional covariances in tests of asset pricing models. Journal of

Financial Economics 24, 289-317.

[43] Harvey, C., 2001, The specification of conditional expectations. Journal of Empirical Finance 8, 573-

638.

[44] Harvey, C., and A., Siddique, 2000, Conditional skewness in asset pricing tests. Journal of Finance 55,

1263-1295.

[45] Jagannathan, R., and Z., Wang, 1996, The conditional CAPM and the cross-section of expected returns. Journal of Finance 51, 3-53.
[46] Kraus, A., and R., Litzenberger, 1976, Skewness preference and the valuation of risk assets. Journal of

Finance 31, 1085-1100.

[47] Kyle, A., and W., Xiong, 2001, Contagion as wealth effect of financial intermediaries. Journal of Finance

56, 1401-1440.

[48] Lewellen, J., and S., Nagel, 2006, The conditional CAPM does not explain asset-pricing anomalies.

Journal of Financial Economics 82, 289-314.

[49] Loughran, T., 1997, Book-to-market across firm size, exchange, and seasonality: is there an effect?

Journal of Financial and Quantitative Analysis 32, 249-268.

[50] Markowitz, H., 1959, Portfolio Selection. Yale University Press, New Haven, CT.

[51] Merton, R., 1973, An intertemporal capital asset pricing model. Econometrica 41, 867-887.


[52] Petkova, R., and L., Zhang, 2005, Is value riskier than growth? Journal of Financial Economics 78,

187-202.

[53] Post, T., 2003, Empirical tests for stochastic dominance efficiency. Journal of Finance 58, 1905-1931.

[54] Post, T., and H., Levy, 2005, Does risk seeking drive stock prices? Review of Financial Studies 18,

925-953.

[55] Post, T., and P., van Vliet, 2005, Conditional downside risk and the CAPM. Research Paper ERS-2004-048-F&A Revision, Erasmus Research Institute of Management.

[56] Post, T., and P., van Vliet, 2006, Downside risk and asset pricing. Journal of Banking and Finance 30,

823-849.

[57] Roy, A., 1952, Safety first and the holding of assets. Econometrica 20, 431-449.

[58] Scott, R., and P., Horvath, 1980, On the direction of preference for moments of higher order than the

variance. Journal of Finance 35, 915-919.

[59] Sears, R., and K.-C., Wei, 1985, Asset pricing, higher moments and the market risk premium: a note.


[60] Shanken, J., 1990, Intertemporal asset pricing: an empirical investigation. Journal of Econometrics 45,

99-120.

[61] Shumway, T., 1997, Explaining returns with loss aversion. Working paper, University of Michigan.

[62] Zhang, L., 2005, The value premium. Journal of Finance 60, 67-103.

Appendix: Moments of Portfolio Returns

To characterize the moments of returns on the world market portfolio and their co-moments with local market returns, note that mean returns can be computed from

$$\bar{y}_{t+1} \equiv E[y_{t+1}\mid\mathcal{F}_t] = \sum_{l=1}^{K}(\pi_t'Pe_l)\,\bar{\mu}_l + \sum_{l=1}^{K}(\pi_t'Pe_l)\,A_l\,y_t, \tag{19}$$

where $\pi_t$ is the vector of state probabilities and $e_l$ is a vector of zeros with a one in the $l$-th position. Hence $(\pi_t'Pe_l)$ is the ex-ante probability, given information at time $t$, $\mathcal{F}_t$, that $S_{t+1} = l$ at time $t+1$. Finally, $\bar{\mu}_l \equiv \mu_l + M_{S_t}\,\mathrm{vec}(\Upsilon_l)$.
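Equation (19) is a probability-weighted average of regime-specific VAR forecasts. The minimal sketch below assumes that P is row-stochastic with P[i][l] = Pr(S_{t+1} = l | S_t = i), that pi_t holds the time-t state probabilities, and that the vectors mu_bar_l are already available. All inputs in the example are hypothetical.

```python
def next_state_probs(pi_t, P):
    """Ex-ante probabilities (pi_t' P e_l) of each state l at t+1."""
    k = len(P)
    return [sum(pi_t[i] * P[i][l] for i in range(k)) for l in range(k)]


def conditional_mean(pi_t, P, mu_bar, A, y_t):
    """Eq. (19): sum_l (pi_t' P e_l) * (mu_bar_l + A_l y_t).
    mu_bar[l] is a length-n list and A[l] an n x n nested list for regime l."""
    probs = next_state_probs(pi_t, P)
    n = len(y_t)
    mean = [0.0] * n
    for l, prob in enumerate(probs):
        for j in range(n):
            regime_mean = mu_bar[l][j] + sum(A[l][j][c] * y_t[c] for c in range(n))
            mean[j] += prob * regime_mean
    return mean


# Hypothetical two-regime, one-variable example
pi_t = [0.7, 0.3]
P = [[0.9, 0.1], [0.2, 0.8]]
mu_bar = [[1.0], [-0.5]]
A = [[[0.1]], [[0.3]]]
y_t = [2.0]
print([round(p, 2) for p in next_state_probs(pi_t, P)])                  # [0.69, 0.31]
print([round(v, 3) for v in conditional_mean(pi_t, P, mu_bar, A, y_t)])  # [0.859]
```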

Because $\bar{\mu}_l$ involves higher-order moments of the world market portfolio, such as $M_{S_t}\mathrm{vec}(\Lambda_l)$, as well as higher-order co-moments between individual portfolio returns and returns on the global market portfolio, the (conditional) mean returns $E[y_{t+1}\mid\mathcal{F}_t]$ enter the right-hand side of (8). For instance, computing $Cov[x_{t+1}, x^W_{t+1}\mid\mathcal{F}_t]$ requires knowledge of the first $h$ elements of $E[y_{t+1}\mid\mathcal{F}_t]$. Appendix B explains the iterative estimation procedure used to solve the associated nonlinear optimization problem.

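The conditional variance, skew, and kurtosis of returns on the world market portfolio, $x^W_{t+1}$, can now be computed as follows:

$$
\begin{aligned}
Var[x^W_{t+1}\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^2\Big] + \sum_{l=1}^{K}(\pi_t'Pe_l)\,Var[\eta^W_{t+1}\mid S_{t+1}=l],\\
Sk[x^W_{t+1}\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^3\Big]\\
&+ 3\sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)\,Var[\eta^W_{t+1}\mid S_{t+1}=l]\Big],\\
K[x^W_{t+1}\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^4\Big]\\
&+ 6\sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^2\,Var[\eta^W_{t+1}\mid S_{t+1}=l]\Big].
\end{aligned}
\tag{20}
$$

Clearly, the skew and kurtosis are functions of three sets of quantities. The first is the mean and variance parameters $\{\mu_{1,l},\ldots,\mu_{h,l}, A_l, \Omega_l\}_{l=1}^{K}$. The second is the state probabilities, $\pi_t$. The third is the mean of the VAR coefficients, $\bar{\alpha}_j \equiv e_j'\sum_{l=1}^{K}(\pi_t'Pe_l)A_l$. Hence, no new parameters are introduced to capture the higher moments of the return distribution. Such model-based estimates are typically determined with considerably more accuracy than estimates of the third and fourth moments obtained directly from realized returns, which tend to be very sensitive to outliers.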
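The first two lines of (20) have the structure of the variance and third central moment of a finite mixture. Each combines a between-regime part, driven by deviations of the regime means from the mixture mean, with a within-regime part, driven by the regime variances. The sketch below computes both under two assumptions: the shocks are Gaussian within each regime, and the next-period regime probabilities and regime-conditional means of $x^W$ (that is, $\bar{\mu}^W_l + e_{h+1}'A_l y_t$) are given. The numbers are hypothetical.

```python
def mixture_var_and_third(probs, means, variances):
    """Mixture mean, variance and third central moment, assuming Gaussian shocks in each regime.

    probs     -- next-period regime probabilities (pi_t' P e_l)
    means     -- regime-conditional means of x^W
    variances -- regime-conditional shock variances Var[eta^W | S_{t+1} = l]
    """
    mean = sum(p * m for p, m in zip(probs, means))
    dev = [m - mean for m in means]                            # regime mean minus mixture mean
    var = (sum(p * d ** 2 for p, d in zip(probs, dev))         # between-regime part
           + sum(p * v for p, v in zip(probs, variances)))     # within-regime part
    third = (sum(p * d ** 3 for p, d in zip(probs, dev))
             + 3.0 * sum(p * d * v for p, d, v in zip(probs, dev, variances)))
    return mean, var, third


print([round(x, 3) for x in mixture_var_and_third([0.8, 0.2], [0.0, 1.0], [1.0, 4.0])])
# [0.2, 1.76, 1.536]
```

With a single regime, the between-regime terms vanish, so the variance equals the shock variance and the third central moment is zero. This is the sense in which the deviation terms "do not arise in single-state models".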

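Similarly, the covariance between country returns, $x^i_{t+1}$, and the world market return, $x^W_{t+1}$, is

$$
\begin{aligned}
Cov[x^i_{t+1}, x^W_{t+1}\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \bar{\alpha}_i)y_t\big)\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)\Big]\\
&+ \sum_{l=1}^{K}(\pi_t'Pe_l)\,Cov[\eta^i_{t+1}, \eta^W_{t+1}\mid S_{t+1}=l].
\end{aligned}
\tag{21}
$$

Given estimates of the parameters and state probabilities, $Cov[x^i_{t+1}, x^W_{t+1}\mid\mathcal{F}_t]$ can easily be calculated.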
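Equation (21) has the same between-plus-within structure. The sketch below computes the mixture covariance from three inputs: the regime probabilities, the regime-conditional means of the two returns (for example, $\bar{\mu}_{i,l} + e_i'A_l y_t$), and the regime-conditional shock covariances. All inputs are hypothetical.

```python
def mixture_cov(probs, means_i, means_w, shock_covs):
    """Covariance of x^i and x^W under a regime mixture: between-regime co-movement of the
    regime means plus the probability-weighted within-regime shock covariances."""
    mean_i = sum(p * m for p, m in zip(probs, means_i))
    mean_w = sum(p * m for p, m in zip(probs, means_w))
    between = sum(p * (mi - mean_i) * (mw - mean_w)
                  for p, mi, mw in zip(probs, means_i, means_w))
    within = sum(p * c for p, c in zip(probs, shock_covs))
    return between + within


print(round(mixture_cov([0.5, 0.5], [1.0, -1.0], [2.0, 0.0], [0.3, 0.1]), 3))  # 1.2
```

Here the two returns have zero within-regime correlation contributions of only 0.2, yet the covariance reaches 1.2. This happens because their regime means move together across states, which a single-state model cannot capture.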

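Finally, the coskewness and cokurtosis between local market returns and the world market return are

$$
\begin{aligned}
Cov[x^i_{t+1},(x^W_{t+1})^2\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \bar{\alpha}_i)y_t\big)\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^2\Big]\\
&+ \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \bar{\alpha}_i)y_t\big)\,Var[\eta^W_{t+1}\mid S_{t+1}=l]\Big]\\
&+ 2\sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)\,Cov[\eta^i_{t+1},\eta^W_{t+1}\mid S_{t+1}=l]\Big]
\end{aligned}
$$

and

$$
\begin{aligned}
Cov[x^i_{t+1},(x^W_{t+1})^3\mid\mathcal{F}_t] ={}& \sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \bar{\alpha}_i)y_t\big)\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^3\Big]\\
&+ 3\sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1} + (e_i'A_l - \bar{\alpha}_i)y_t\big)\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)\,Var[\eta^W_{t+1}\mid S_{t+1}=l]\Big]\\
&+ 3\sum_{l=1}^{K}(\pi_t'Pe_l)\Big[\big(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1} + (e_{h+1}'A_l - \bar{\alpha}_{h+1})y_t\big)^2\,Cov[\eta^i_{t+1},\eta^W_{t+1}\mid S_{t+1}=l]\Big].
\end{aligned}
$$

Terms such as $(\bar{\mu}_{i,l} - e_i'\bar{y}_{t+1})(\bar{\mu}^W_l - e_{h+1}'\bar{y}_{t+1})$ show the deviations of the state-specific means from the overall mean and do not arise in single-state models.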
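The coskewness expression can be checked numerically. The sketch below computes E[(x^i - E x^i)(x^W - E x^W)^2] for a regime mixture with Gaussian shocks, which gives the three terms of the first formula above. As a consistency check, treating x^i as x^W (equal means and shock covariance equal to the market shock variance) recovers the third central moment in (20). Inputs are hypothetical.

```python
def mixture_coskew(probs, means_i, means_w, var_w, cov_iw):
    """E[(x^i - E x^i)(x^W - E x^W)^2] under a regime mixture with Gaussian shocks:
    the regime average of d_i*d_W^2 + d_i*Var[eta^W] + 2*d_W*Cov[eta^i, eta^W],
    where d_i and d_W are deviations of regime means from the mixture means."""
    mean_i = sum(p * m for p, m in zip(probs, means_i))
    mean_w = sum(p * m for p, m in zip(probs, means_w))
    total = 0.0
    for p, mi, mw, v, c in zip(probs, means_i, means_w, var_w, cov_iw):
        d_i, d_w = mi - mean_i, mw - mean_w
        total += p * (d_i * d_w ** 2 + d_i * v + 2.0 * d_w * c)
    return total


probs, means, var_w = [0.8, 0.2], [0.0, 1.0], [1.0, 4.0]
print(round(mixture_coskew(probs, means, means, var_w, [0.5, 2.0]), 3))  # 1.056
# Treating x^i as x^W (Cov = Var) recovers the third central moment of x^W
print(round(mixture_coskew(probs, means, means, var_w, var_w), 3))       # 1.536
```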


Table 1

Descriptive Statistics for Portfolio Stock Returns

This table reports monthly sample statistics for percentage portfolio returns on a variety of equity portfolios. The market corresponds to the value-weighted CRSP (NYSE/AMEX/NASDAQ) portfolio; market returns are measured in excess of 1-month T-bill yields. HML and SMB are zero-investment portfolios:

- HML F-F shorts stocks with below-median book-to-market ratio and goes long in stocks with above-median book-to-market ratio, independently of size (as in Fama and French, 1993).
- HMLd (decile-based) shorts the lowest book-to-market decile and goes long in the highest book-to-market decile, independently of size.
- HMLs (small) shorts the two lowest book-to-market deciles among small capitalization stocks (in the two lowest deciles) and goes long in the two highest book-to-market deciles among small capitalization stocks (in the two highest deciles).
- SMB F-F shorts stocks in the lowest size tercile and goes long in stocks in the highest size tercile, independently of their book-to-market ratio (as in Fama and French, 1993).
- SMBd (deciles) shorts the lowest size decile and goes long in the highest size decile, independently of book-to-market ratio.

"MOMO" is a portfolio that shorts stocks below the 30th percentile of the distribution of prior returns and goes long in stocks above the 70th percentile, independently of size and book-to-market. DY is the CRSP-implied dividend yield; TERM is the spread between 10-year (constant maturity) government bonds and 1-month T-bill yields; DEFAULT (SPREAD in the table) is the spread between Baa and Aaa (long-term) corporate bond yields. In the "Unconditional CAPM" regression section, we have boldfaced the alphas that are statistically significant at a size of 10% or lower; this implies that the corresponding portfolio returns cannot be adequately explained by the standard, unconditional CAPM. "n.a." marks cells that do not apply.

Portfolio | Mean | St. Dev. | Skew | Excess Kurtosis | Jarque-Bera | p-value | Unconditional CAPM: α | p-val. | β | p-val. | R2 (%)

Panel A – Full Sample (1927:01 – 2008:03)
Market | 0.629*** | 5.425*** | 0.227* | 7.918*** | >1000 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
HML F-F | 0.510*** | 3.573*** | 2.172*** | 15.84*** | >1000 | 0.000 | 0.428 | 0.000 | 0.131 | 0.000 | 3.95
HMLd | 0.524** | 6.658*** | 3.003*** | 26.92*** | >1000 | 0.000 | 0.241 | 0.230 | 0.441 | 0.000 | 12.9
HMLs | 0.954*** | 7.877*** | -2.123*** | 30.81*** | >1000 | 0.000 | 1.120 | 0.000 | -0.258 | 0.000 | 3.18
SMB F-F | 0.155 | 3.361*** | 1.540*** | 22.44*** | >1000 | 0.000 | 0.032 | 0.759 | 0.196 | 0.000 | 10.1
SMBd | 0.615** | 7.728*** | 4.842*** | 47.13*** | >1000 | 0.000 | 0.273 | 0.238 | 0.533 | 0.000 | 14.0
MOMO | 0.764*** | 4.658*** | -3.019*** | 28.43*** | >1000 | 0.000 | 0.943 | 0.000 | -0.285 | 0.000 | 10.7
DY | 3.927*** | 1.566*** | 0.529*** | 0.163 | 46.59 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
TERM | 1.522*** | 1.222*** | -0.228 | 0.596* | 22.87 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
SPREAD | 1.123*** | 0.708*** | 2.480*** | 8.825*** | >1000 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.

Panel B – Early Sample (1927:01 – 1963:12)
Market | 0.852*** | 6.464*** | 0.422** | 7.562*** | >1000 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
HML F-F | 0.504** | 4.252*** | 2.676*** | 16.33*** | >1000 | 0.000 | 0.211 | 0.224 | 0.343 | 0.000 | 27.3
HMLd | 0.489 | 8.554*** | 3.047*** | 20.76*** | >1000 | 0.000 | -0.143 | 0.674 | 0.742 | 0.000 | 31.4
HMLs | 1.002* | 10.71*** | -1.812** | 18.47*** | >1000 | 0.000 | 1.142 | 0.026 | -0.164 | 0.000 | 9.66
SMB F-F | 0.210 | 3.518*** | 3.733*** | 35.86*** | >1000 | 0.000 | 0.046 | 0.770 | 0.192 | 0.000 | 12.5
SMBd | 0.891 | 10.03*** | 4.688*** | 34.77*** | >1000 | 0.000 | 0.274 | 0.520 | 0.724 | 0.000 | 21.8
MOMO | 0.677*** | 5.334*** | -4.077*** | 33.792*** | >1000 | 0.000 | 1.028 | 0.000 | -0.412 | 0.000 | 24.9
DY | 4.867*** | 1.420*** | 0.717** | -0.181 | 38.66 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
TERM | 1.497*** | 0.947*** | -0.595** | 0.999** | 44.29 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
SPREAD | 1.256*** | 0.925*** | 1.959*** | 4.476*** | 654.7 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.

* Denotes significance at the 10% level. ** Denotes significance at the 5% level. *** Denotes significance at the 1% level.

Table 1 (continued)

Descriptive Statistics for Portfolio Stock Returns

Portfolio | Mean | St. Dev. | Skew | Excess Kurtosis | Jarque-Bera | p-value | Unconditional CAPM: α | p-val. | β | p-val. | R2 (%)

Panel C – Compustat Sample (1964:01 – 2008:03)
Market | 0.443** | 4.369*** | -0.496** | 2.103** | 119.6 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
HML F-F | 0.515*** | 2.890*** | 0.430** | 2.356*** | 139.2 | 0.000 | 0.629 | 0.000 | -0.257 | 0.000 | 15.1
HMLd | 0.554*** | 4.490*** | 0.392* | 1.245** | 47.67 | 0.000 | 0.605 | 0.002 | -0.109 | 0.015 | 1.05
HMLs | 0.914*** | 4.225*** | -0.693*** | 5.432*** | 691.3 | 0.000 | 1.115 | 0.000 | -0.434 | 0.000 | 20.1
SMB F-F | 0.109 | 3.227*** | -0.850*** | 5.884*** | 829.9 | 0.000 | 0.019 | 0.889 | 0.203 | 0.000 | 7.57
SMBd | 0.383* | 5.039*** | 0.737*** | 3.909*** | 383.9 | 0.000 | 0.301 | 0.168 | 0.178 | 0.000 | 2.41
MOMO | 0.836*** | 4.009*** | -0.636*** | 5.330*** | 664.4 | 0.000 | 0.860 | 0.000 | -0.053 | 0.186 | 3.25
DY | 3.141*** | 1.208*** | 0.282 | -0.619** | 15.51 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
TERM | 1.543*** | 1.411*** | -0.145 | 0.034 | 1.896 | 0.388 | n.a. | n.a. | n.a. | n.a. | n.a.
SPREAD | 1.012*** | 0.423** | 1.259*** | 1.765** | 209.1 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.

Panel D – Recent Sample (1994:01 – 2008:03)
Market | 0.538* | 4.167*** | -0.735** | 0.969** | 22.07 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
HML F-F | 0.611** | 3.424*** | 0.765*** | 2.581*** | 64.15 | 0.000 | 0.814 | 0.000 | 0.379 | 0.000 | 21.3
HMLd | 0.246 | 4.282*** | 0.075 | 1.180** | 9.903 | 0.007 | 0.420 | 0.194 | -0.287 | 0.000 | 7.84
HMLs | 1.082** | 5.566*** | -0.895*** | 4.601*** | 170.6 | 0.000 | 1.509 | 0.000 | -0.705 | 0.000 | 27.7
SMB F-F | -0.300 | 3.821*** | -1.592*** | 7.443*** | 467.0 | 0.000 | -0.364 | 0.215 | 0.121 | 0.087 | 1.73
SMBd | 0.362 | 5.534*** | 0.988** | 6.848*** | 355.5 | 0.000 | 0.353 | 0.416 | 0.015 | 0.885 | 0.04
MOMO | 0.811** | 5.007*** | -0.661** | 5.243*** | 208.3 | 0.000 | 0.929 | 0.016 | -0.220 | 0.017 | 3.25
DY | 1.784*** | 0.449** | 0.766** | 0.120 | 16.81 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.
TERM | 1.456*** | 1.281*** | 0.160 | -0.910** | 6.632 | 0.036 | n.a. | n.a. | n.a. | n.a. | n.a.
SPREAD | 0.834*** | 0.219** | 0.955*** | 0.143 | 26.16 | 0.000 | n.a. | n.a. | n.a. | n.a. | n.a.

* Denotes significance at the 10% level. ** Denotes significance at the 5% level. *** Denotes significance at the 1% level.
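The Jarque-Bera column of Table 1 can be approximately reproduced from the reported skewness S and excess kurtosis K with the standard statistic JB = T/6 (S^2 + K^2/4). Under normality, its p-value comes from a chi-square distribution with two degrees of freedom, whose survival function is exp(-JB/2). In the sketch below, the sample sizes T = 531 and T = 171 are our own month counts for Panels C and D. This is an assumption that ignores any observations possibly lost in estimation. The small gaps with the table reflect rounding of the reported moments.

```python
import math


def jarque_bera(n_obs, skew, excess_kurt):
    """JB = T/6 * (S^2 + K^2/4), with K the excess kurtosis; asymptotically chi-square(2)."""
    jb = n_obs / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of a chi-square with 2 d.f.
    return jb, p_value


# TERM, Panel C (1964:01-2008:03, T = 531 months); reported JB = 1.896, p = 0.388
print([round(v, 3) for v in jarque_bera(531, -0.145, 0.034)])  # [1.886, 0.389]
# TERM, Panel D (1994:01-2008:03, T = 171 months); reported JB = 6.632, p = 0.036
print([round(v, 3) for v in jarque_bera(171, 0.160, -0.910)])  # [6.63, 0.036]
```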


Table 2

Estimates of Benchmark Single-State Model with Risk Premia Driven by Standard Predictors

This table reports the iterated SMLE estimates of the single-state, SSBC "smooth" model:

$$
\begin{aligned}
x^i_{t+1} &= \alpha_i + \lambda_{1,t}\,Cov[x^i_{t+1}, R^W_{t+1}\mid\Im_t] + \lambda_{2,t}\,Cov[x^i_{t+1}, R^W_{t+1}\mid\Im_t, R^W_{t+1} < R^W_{B,t+1}] + \eta^i_{t+1},\\
x^W_{t+1} &= \alpha_W + \lambda_{1,t}\,Var[R^W_{t+1}\mid\Im_t] + \lambda_{2,t}\,Var[R^W_{t+1}\mid\Im_t, R^W_{t+1} < R^W_{B,t+1}] + c'z_t + \eta^W_{t+1},\\
z_{t+1} &= \mu + B z_t + \eta^z_{t+1}.
\end{aligned}
$$

Here the covariance and semicovariance risk premia, $\lambda_{1,t}$ and $\lambda_{2,t}$, are functions of the augmented predictor vector $\ddot{z}_t \equiv [1\;\; z_t']'$. For $\lambda_{1,t}$ the relevant coefficients are the constant $g_1$ and the loadings $b_1, b_2, b_3$; for $\lambda_{2,t}$ they are $g'_1$ and $b'_1, b'_2, b'_3$, all reported below. The vector $z_t$ collects the predictors (dividend yield, term spread, and default spread). $R^W_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^W_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1}\mid\Im_t] = 1/R^f_t$, and that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ for all $t$. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ for all $t$. In the table, DY stands for dividend yield, TERM for the term spread, and DEF for the default spread. Standard errors and (pseudo) t-stats associated with correlations refer to estimates of covariance coefficients.

Unconstrained Estimates | Constrained Estimates (Set 1) | Constrained Estimates (Set 2)

Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat

αSMB 0.1386 0.1050 1.320 0.5397 0.0972 5.552 2.3002 1.3377 1.720

αHML 0.4024 0.0920 4.374 0.5621 0.0764 7.362 -0.0727 0.2931 -0.248

αMOMO 0.7622 0.1400 5.444 0.1053 0.1342 0.785 1.4152 0.2653 5.334

αW 1.1914 0.8587 1.387 2.6618 0.7009 3.798 1.6825 4.1811 0.402

g1 (Cov. risk prem. constant) -0.0005 0.0350 -0.014 -0.0996 0.0176 -5.657 -0.1447 0.0447 -3.236

b1 (Cov. risk prem. loading on DY) -0.0040 0.0068 -0.588 0.0000 0.0032 -0.007 -0.0009 0.0226 -0.040

b2 (Cov. risk prem. load on TERM) 0.0154 0.0117 1.316 0.0000 0.0091 0.000 0.0014 0.0607 0.023

b3 (Cov. risk prem. load on DEF) -0.0014 0.0251 -0.056 -0.0002 0.0126 -0.016 -0.0003 0.1151 -0.003

g'1 (Semicov. risk prem. const.) 0.0819 0.0581 1.410 0.8434 0.0145 58.200 0.9027 0.2636 3.424

b'1 (Semicov. risk prem. load on DY) -0.0100 0.0169 -0.592 -0.0752 0.0100 -7.523 -0.0792 0.0281 -2.818

b'2 (Semicov. risk prem. load on TERM) -0.0148 0.0186 -0.796 -0.0029 0.0194 -0.149 -0.0075 0.1664 -0.045

b'3 (Semicov. risk prem. load on DEF) -0.0173 0.0584 -0.296 -0.0024 0.0377 -0.064 0.0253 0.2456 0.103

c1 (Pred. coeff. of market on DY) -0.0377 0.1598 -0.236 -0.1021 0.1249 -0.817 0.3745 2.4484 0.153

c2 (Pred. coeff. of market on TERM) 0.2785 0.2120 1.314 0.2490 0.1712 1.455 1.4156 4.1293 0.343

c3 (Pred. coeff. of market on DEF) -0.7380 0.5796 -1.273 0.0057 0.5194 0.011 -0.0335 2.5563 -0.013

SMB volatility 2.9399 0.1598 54.087 4.3461 15.5514 2.395 3.4483 5.9948 1.984

SMB-HML correlation -0.0374 0.2120 -1.491 -0.2633 0.0293 -10.647 -0.1685 21.2033 -0.106

HML volatility 2.8739 0.5797 14.249 3.1829 0.2698 9.752 3.8574 5.2287 2.846

SMB-MOMO correlation -0.2296 0.1510 -17.195 -0.0915 0.0071 1.246 -0.0331 1.3879 -0.382

HML-MOMO correlation -0.0888 0.2411 -4.071 0.0261 0.0026 7.118 0.0228 0.0592 6.910

MOMO volatility 3.8467 0.1503 98.430 8.7391 0.0054 519.72 4.6510 0.4180 51.755

SMB-Mkt correlation 0.2060 0.3187 8.536 0.2794 0.0200 33.113 0.0145 0.1977 1.822

HML-Mkt correlation -0.0434 0.0303 -1.846 -0.0374 0.0124 -4.126 0.0024 0.6108 0.110

MOMO-Mkt correlation -0.0204 0.1842 -1.915 0.1803 0.2239 12.857 0.0187 1.0363 0.602

Mkt volatility 4.4924 0.1763 114.52 7.7226 0.0287 175.89 7.1932 14.2156 3.640

DY 0.0622 0.0262 2.374 0.0624 0.0244 2.560 0.0633 0.0246 2.567

TERM 0.2150 0.0667 3.223 0.2239 0.0674 3.321 0.2337 0.0602 3.881

DEF 0.0254 0.0144 1.764 0.0252 0.0186 1.357 0.0246 0.0239 1.029

B11 (DYt on DYt-1) 0.9802 0.0063 155.587 0.9842 0.0079 124.610 0.9753 0.0086 113.895

B12 (DYt on TERMt-1) -0.0102 0.0078 -1.308 -0.0101 0.0090 -1.113 -0.0097 0.0116 -0.833

B13 (DYt on DEFt-1) 0.0251 0.0147 1.707 0.0247 0.0195 1.271 0.0256 0.0210 1.215

B21 (TERMt on DYt-1) -0.0191 0.0160 -1.194 -0.0199 0.0162 -1.227 -0.0203 0.0186 -1.091

B22 (TERMt on TERMt-1) 0.7748 0.0199 38.935 0.7516 0.0204 36.831 0.7278 0.0248 29.400

B23 (TERMt on DEFt-1) 0.1817 0.0374 4.858 0.1760 0.0485 3.629 0.1718 0.0439 3.913

B31 (DEFt on DYt-1) 0.0008 0.0035 0.229 0.0008 0.0033 0.257 0.0009 0.0039 0.228

B32 (DEFt on TERMt-1) -0.0005 0.0043 -0.116 -0.0005 0.0042 -0.120 -0.0005 0.0044 -0.111

B33 (DEFt on DEFt-1) 0.9759 0.0081 120.481 0.9915 0.0077 129.273 1.0005 0.0095 104.823

Maximum log-likelihood function: -10339.14 -11425.70 -11833.51

Number of parameters: 55 55 55

Akaike information criterion: 21.3213 23.5502 24.3867

Hannan-Quinn inf. criterion: 21.4261 23.6550 24.4915

Bayes-Schwarz inf. criterion: 21.9850 24.2138 25.0503
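The information criteria at the bottom of Tables 2, 4, and 5 appear to be expressed per observation. The sketch below assumes T = 975 monthly observations (the number of months from 1927:01 to 2008:03, an assumption about the estimation sample); under that assumption it reproduces the reported Akaike and Hannan-Quinn values to four decimals, and it reproduces the reported Bayes-Schwarz values only when that penalty is taken as 2k ln(T)/T, so this scaling is inferred from the numbers rather than stated in the paper. The function name is hypothetical.

```python
import math


def information_criteria(loglik: float, n_params: int, n_obs: int) -> dict:
    """Information criteria expressed per observation (smaller values are preferred)."""
    fit = -2.0 * loglik / n_obs
    return {
        "AIC": fit + 2.0 * n_params / n_obs,
        "HQ": fit + 2.0 * n_params * math.log(math.log(n_obs)) / n_obs,
        # Assumed penalty 2k ln(T)/T: this scaling matches the Bayes-Schwarz values reported here
        "BIC": fit + 2.0 * n_params * math.log(n_obs) / n_obs,
    }


T = 975  # assumption: monthly observations from 1927:01 to 2008:03
table2 = {"Unconstrained": -10339.14, "Set 1": -11425.70, "Set 2": -11833.51}
for label, loglik in table2.items():
    ic = information_criteria(loglik, n_params=55, n_obs=T)
    print(label, {name: round(value, 4) for name, value in ic.items()})
```

Because all three columns share the same number of parameters, the ranking by any of the three criteria here is simply the ranking by log-likelihood.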


Table 3

Comparing In-Sample Pricing Performances

This table reports the average pricing errors,

$$APE_m \equiv \frac{1}{T}\sum_{t=1}^{T} e^i_{t,m},$$

where $e^i_{t,m} \equiv x^i_t - \hat{x}^i_{t,m}$, $\hat{x}^i_{t,m}$ is the fitted value of the (excess) returns on equity portfolio $i$ from model $m$ at time $t$, and $T$ is the sample size; the standard deviation of the pricing errors,

$$SDPE_m \equiv \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(e^i_{t,m} - APE_m\right)^2};$$

the root mean-squared error, $RMSE_m \equiv \sqrt{(APE_m)^2 + (SDPE_m)^2}$; and the (pseudo) $R^2$.

Average Pricing Error | Std. Dev. of Pricing Error | Root Mean Squared Error | Pseudo R2 (%)

Unconstrained Models

Single-state SSBC Model

SMB -0.00064 2.9424 2.9424 1.1608

HML 0.00117 2.8760 2.8760 1.3443

MOMO 0.00298 3.8491 3.8491 0.1136

Market 0.00022 4.4951 4.4951 0.7178

Two-state Four-Moment MS CAPM

SMB 0.00680 2.8035 2.8035 4.2885

HML 0.02667 2.7869 2.7870 6.5740

MOMO -0.07148 3.4091 3.4099 5.0587

Market 0.03855 4.8531 4.8533 3.4760

Two-state Mixture CAPM

SMB 0.01183 2.4930 2.4930 6.5383

HML 0.01053 2.6250 2.6250 6.4159

MOMO -0.04492 3.4925 3.4928 8.6224

Market 0.04493 4.0647 4.0650 8.1910

Constraint Set 1

Single-state SSBC Model

SMB 0.24296 3.0197 3.0294 2.0945

HML -0.16609 3.0451 3.0496 1.4341

MOMO 0.33278 4.1483 4.1616 0.9428

Market 0.01231 5.0465 5.0465 0.0025

Two-state Four-Moment MS CAPM

SMB -0.73823 2.9117 3.0038 3.4561

HML 0.70154 3.2533 3.3281 2.8311

MOMO -0.00957 3.7799 3.7799 3.1195

Market -0.17185 4.4096 4.4130 3.6814

Two-state Mixture CAPM

SMB 0.71827 3.1252 3.2067 2.6495

HML 1.07252 5.0777 5.1898 5.4751

MOMO -0.34627 3.6035 3.6201 1.4030

Market 1.05476 4.3812 4.5063 1.6269

Constraint Set 2

Single-state SSBC Model

SMB -0.33362 3.2700 3.2869 4.1617

HML 0.17569 3.2268 3.2316 1.9661

MOMO 0.13262 4.3909 4.3929 3.2501

Market -0.25995 5.3993 5.4055 2.9886

Two-state Four-Moment MS CAPM

SMB -0.56820 3.1801 3.2305 3.1977

HML -0.98299 4.3435 4.4533 2.9803

MOMO 0.31320 3.9068 3.9193 3.1932

Market -1.84665 4.8318 5.1727 3.7194

Two-state Mixture CAPM

SMB 0.22152 2.9357 2.9440 3.5328

HML 0.47107 3.3497 3.3827 4.0592

MOMO -0.13697 3.9912 3.9935 4.7922

Market -0.65454 5.0741 5.1162 6.1394
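The pricing-error summaries defined in the notes to Table 3, plus the mean absolute error that Table 6 also reports, can be computed from a series of realized and fitted excess returns as below. The function name and the toy numbers are illustrative; the pseudo and predictive R² are left out because their exact definitions are not given in these notes.

```python
import numpy as np


def pricing_error_stats(actual, fitted) -> dict:
    """Summary statistics of pricing errors e_t = x_t - xhat_t for one portfolio and one model."""
    e = np.asarray(actual, dtype=float) - np.asarray(fitted, dtype=float)
    ape = e.mean()                          # average pricing error, (1/T) * sum_t e_t
    sdpe = e.std(ddof=1)                    # standard deviation with 1/(T-1) normalization
    rmse = np.sqrt(ape ** 2 + sdpe ** 2)    # RMSE as defined in the table notes
    mae = np.abs(e).mean()                  # mean absolute error (reported in Table 6)
    return {"APE": float(ape), "SDPE": float(sdpe), "RMSE": float(rmse), "MAE": float(mae)}


stats = pricing_error_stats([1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 2.0, 4.0])
print({name: round(value, 4) for name, value in stats.items()})
```

Because SDPE uses a 1/(T-1) normalization, this RMSE differs slightly from the square root of the plain mean of squared errors; the difference is negligible for samples of the length used here.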





Table 4

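Estimates of Two-State Markov Switching Four-Moment CAPM Model

This table reports the iterated SMLE estimates of the two-state, four-moment CAPM model:

$$
\begin{aligned}
x^i_{t+1} &= \alpha^i_{S_{t+1}} + \lambda_{2,S_{t+1}}\,\mathrm{Cov}\left[R^i_{t+1}, R^W_{t+1} \mid \Im_t\right] + \lambda'_{2,S_{t+1}}\,\mathrm{Cov}\left[R^i_{t+1}, R^W_{t+1} \mid R^W_{t+1} < R^W_{B,t+1}, \Im_t\right]\\
&\quad + \lambda_{3,S_{t+1}}\,\mathrm{Cov}\left[R^i_{t+1}, (R^W_{t+1})^2 \mid \Im_t\right] + \lambda_{4,S_{t+1}}\,\mathrm{Cov}\left[R^i_{t+1}, (R^W_{t+1})^3 \mid \Im_t\right] + \eta^i_{t+1},\\
x^W_{t+1} &= \alpha^W_{S_{t+1}} + \lambda_{2,S_{t+1}}\,\mathrm{Var}\left[x^W_{t+1} \mid \Im_t\right] + \lambda'_{2,S_{t+1}}\,\mathrm{Var}\left[x^W_{t+1} \mid R^W_{t+1} < R^W_{B,t+1}, \Im_t\right]\\
&\quad + \lambda_{3,S_{t+1}}\,\mathrm{Skew}\left[x^W_{t+1} \mid \Im_t\right] + \lambda_{4,S_{t+1}}\,\mathrm{Kurt}\left[x^W_{t+1} \mid \Im_t\right] + \mathbf{c}'_{S_{t+1}}\mathbf{z}_t + \eta^W_{t+1},\\
\mathbf{z}_{t+1} &= \boldsymbol{\mu}_{S_{t+1}} + \mathbf{B}_{S_{t+1}}\mathbf{z}_t + \boldsymbol{\eta}^Z_{t+1},
\end{aligned}
$$

with constant transition probabilities and in which $\lambda_{j,S_{t+1}} = -R^f_t\,g_{j-1,S_{t+1}}$ for $j = 2, 3, 4$. The vector $\mathbf{z}_t$ collects the predictors (dividend yield, term spread, and default spread). $R^W_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^W_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1} \mid \Im_t] = 1/R^f_t$, and that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ $\forall t$. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ $\forall t$. Standard errors and (pseudo) t-stats associated with correlations refer to estimates of the corresponding covariance coefficients.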

Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat

SMB - Regime 1 -0.1908 0.1554 -1.228 -0.2566 0.8511 -0.301 -0.1283 0.5902 -0.217

HML - Regime 1 0.4610 0.1409 3.272 0.1152 0.7981 0.144 0.0576 0.6557 0.088

MOMO - Regime 1 0.1122 0.1886 0.595 0.9411 0.6877 1.368 0.4706 0.6635 0.709

W - Regime 1 2.1354 0.4954 4.310 1.5548 0.7385 2.105 0.7774 0.8232 0.944

SMB - Regime 2 5.4202 2.6533 2.043 4.9932 2.1683 2.303 2.4966 2.0095 1.242

HML - Regime 2 0.2228 7.7102 0.029 -0.4640 2.3332 -0.199 -0.2320 1.6180 -0.143

MOMO - Regime 2 -0.6970 10.2330 -0.068 -0.0810 4.8504 -0.017 -0.0405 3.6367 -0.011

W - Regime 2 3.9820 11.0010 0.362 3.0722 3.9249 0.783 1.5361 3.7104 0.414

g0 - Regime 1 0.6229 0.1255 4.963 0.0410 0.5724 0.072 0.0205 0.4091 0.050

g1 (Covariance risk prem.) - Regime 1 -2.3316 1.1770 -1.981 -0.9025 0.5738 -1.573 -0.4513 0.1101 -4.099

g2 (Coskew risk premium) - Regime 1 0.5886 0.0464 12.69 0.7450 0.0235 31.767 0.3725 0.1012 3.681

g3 (Cokurt risk premium) - Regime 1 -0.0836 0.0017 -49.18 -0.1459 0.0166 -8.766 -0.0730 0.0290 -2.513

g0 - Regime 2 1.0548 0.6890 1.531 0.6571 1.6680 0.394 0.3286 1.4518 0.226

g1 (Covariance risk prem.) - Regime 2 0.4856 0.0067 72.48 -0.1994 0.0675 -2.953 -0.0997 0.0783 -1.274

g2 (Coskew risk premium) - Regime 2 0.0283 0.0001 205.1 0.0133 0.0095 1.394 0.0067 0.0110 0.607

g3 (Cokurt risk premium) - Regime 2 -0.0012 0.0000 -6549.5 0.0000 0.0002 -0.239 0.0000 0.0005 -0.039

c1 (Pred. coeff. on DY) - Regime 1 0.1395 0.0956 1.459 0.2088 0.0952 2.159 0.0886 0.0863 1.027

c2 (Pred. coeff. on TERM) - Regime 1 -0.0112 0.1254 -0.089 -0.0057 0.0970 -0.114 -0.0095 0.1183 -0.081

c3 (Pred. coeff. on DEF) - Regime 1 0.2074 0.3334 0.622 0.1384 0.2585 0.447 0.2270 0.2210 1.027

c1 (Pred. coeff. on DY) - Regime 2 0.8431 0.4632 1.820 0.6199 0.3700 2.994 1.5865 0.4623 3.431

c2 (Pred. coeff. on TERM) - Regime 2 0.4242 0.5201 0.816 0.6342 0.8025 0.536 0.5621 0.8518 0.660

c3 (Pred. coeff. on DEF) - Regime 2 -0.9681 0.9131 -1.060 -0.8542 0.6479 -1.656 -0.5673 0.9073 -0.625

DY - Regime 1 0.0241 0.0204 1.181 0.0235 0.0318 0.740 0.0279 0.0305 0.915

TERM - Regime 1 0.0684 0.0635 1.077 0.8952 0.0843 10.616 0.8448 0.0856 9.871

DEF - Regime 1 0.0336 0.0066 5.091 0.0472 0.0058 8.127 0.0484 0.0064 7.571

B11 (DYt on DYt-1) - Regime 1 0.9939 0.0044 225.89 0.8699 0.0067 130.366 0.8076 0.0068 119.283

B12 (DYt on TERMt-1) - Regime 1 -0.0008 0.0055 -0.145 -0.0009 0.0082 -0.113 -0.0013 0.0054 -0.235

B13 (DYt on DEFt-1) - Regime 1 -0.0101 0.0162 -0.623 -0.0076 0.0252 -0.302 -0.0078 0.0323 -0.241

B21 (TERMt on DYt-1) - Regime 1 -0.0110 0.0127 -0.866 -0.0165 0.0197 -0.836 -0.0104 0.0161 -0.644

B22 (TERMt on TERMt-1) - Regime 1 0.8551 0.0188 45.484 1.2705 0.0317 40.109 0.9845 0.0388 27.201

B23 (TERMt on DEFt-1) - Regime 1 0.1905 0.0522 3.649 0.2333 0.0712 3.277 0.1789 0.0856 2.090

B31 (DEFt on DYt-1) - Regime 1 -0.0008 0.0013 -0.615 -0.0010 0.0014 -0.731 -0.0005 0.0009 -0.575

B32 (DEFt on TERMt-1) - Regime 1 -0.0059 0.0019 -3.105 -0.0053 0.0026 -2.017 -0.0079 0.0032 -2.453

B33 (DEFt on DEFt-1) - Regime 1 0.9695 0.0057 170.09 1.1935 0.0067 177.925 1.0320 0.0084 122.236



Table 4 (continued)

Estimates of Two-State Markov Switching Four-Moment CAPM Model

Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat

DY - Regime 2 0.5610 0.1781 3.150 0.5617 0.1447 3.882 0.4485 0.1971 2.275

TERM - Regime 2 0.5123 0.3875 1.322 0.5430 0.3336 1.628 0.6756 0.4471 1.511

DEF - Regime 2 0.2078 0.1041 1.996 0.1242 0.0761 1.632 0.1717 0.0666 2.577

B11 (DYt on DYt-1) - Regime 2 0.8633 0.0422 20.457 0.9216 0.0671 13.732 0.4952 0.0796 6.222

B12 (DYt on TERMt-1) - Regime 2 -0.0304 0.0285 -1.067 -0.0403 0.0303 -1.331 -0.0393 0.0288 -1.366

B13 (DYt on DEFt-1) - Regime 2 0.1041 0.0595 1.750 0.0768 0.0722 1.064 0.0722 0.0560 1.289

B21 (TERMt on DYt-1) - Regime 2 -0.0571 0.0827 -0.690 -0.0675 0.0771 -0.875 -0.0618 0.0475 -1.302

B22 (TERMt on TERMt-1) - Regime 2 0.5807 0.0637 9.116 0.3456 0.0825 4.190 0.3981 0.0776 5.133

B23 (TERMt on DEFt-1) - Regime 2 0.2858 0.1191 2.400 0.2829 0.1206 2.347 0.2477 0.1125 2.201

B31 (DEFt on DYt-1) - Regime 2 -0.0091 0.0232 -0.392 -0.0065 0.0325 -0.201 -0.0078 0.0457 -0.171

B32 (DEFt on TERMt-1) - Regime 2 0.0175 0.0173 1.012 0.0227 0.0139 1.641 0.0179 0.0203 0.879

B33 (DEFt on DEFt-1) - Regime 2 0.9223 0.0335 27.531 1.1681 0.0489 23.903 1.0586 0.0418 25.342

SMB volatility - Regime 1 1.8313 0.0031 1095.6 1.9066 0.0038 966.1 2.0409 0.0336 60.81

SMB-HML correlation - Regime 1 -0.1224 0.0014 -262.4 -0.1193 0.0052 -74.21 -0.0937 0.0169 -5.540

HML volatility - Regime 1 1.6766 0.0022 1284.0 1.6829 0.0021 1327.1 1.7483 0.0405 43.18

SMB-MOMO correlation - Regime 1 0.0567 0.0065 29.32 0.0459 0.0079 19.51 0.0479 0.0168 2.855

HML-MOMO correlation - Regime 1 -0.0061 0.0069 -2.725 -0.0040 0.0110 -1.095 -0.0016 0.0803 -0.020

MOMO volatility - Regime 1 1.8311 0.0024 1402.6 1.7664 0.0024 1303.7 1.7483 0.0206 84.82

SMB-Mkt correlation - Regime 1 0.2104 0.0149 79.08 0.1883 0.0176 63.88 0.2621 0.0741 3.538

HML-Mkt correlation - Regime 1 -0.1043 0.0137 -38.90 -0.0974 0.0137 -37.34 -0.1038 0.1084 -0.958

MOMO-Mkt correlation - Regime 1 0.2243 0.0012 1055.5 0.2246 0.0023 544.0 0.1722 0.0072 23.83

Mkt volatility - Regime 1 3.0513 0.0054 1720.0 3.1321 0.0072 1368.9 2.5946 0.0288 90.11

SMB volatility - Regime 2 6.7129 0.4381 102.9 6.2541 0.7113 54.99 4.9379 0.4541 10.874

SMB-HML correlation - Regime 2 0.1370 0.4535 14.99 0.0927 0.6247 6.330 0.0507 0.2902 0.175

HML volatility - Regime 2 7.3944 0.2610 209.5 6.8230 0.1652 281.8 6.7947 0.2796 24.30

SMB-MOMO correlation - Regime 2 -0.2904 0.7511 -25.24 -0.2518 0.9721 -17.81 -0.0182 0.3269 -0.056

HML-MOMO correlation - Regime 2 -0.3402 1.3249 -18.47 -0.2904 0.7483 -29.12 -0.0721 0.2521 -0.286

MOMO volatility - Regime 2 9.7261 0.2634 359.1 10.9971 0.6427 188.18 8.1625 0.1461 55.86

SMB-Mkt correlation - Regime 2 0.3446 0.5834 41.55 0.2934 1.1449 16.75 0.0066 0.3114 0.021

HML-Mkt correlation - Regime 2 0.2580 0.9900 20.20 0.2518 0.9582 18.73 0.0042 0.4239 0.010

MOMO-Mkt correlation - Regime 2 -0.4557 0.0346 -1340.8 -0.4295 0.6281 -78.58 -0.0117 0.0143 -0.819

Mkt volatility - Regime 2 10.4780 0.2163 507.5 10.4498 0.6923 157.7 25.1914 0.0498 506.21

Prob(State 1 | State 1) 0.9182 0.0596 15.406 0.9241 0.0602 15.361 0.9192 0.1026 8.958

Prob(State 2 | State 2) 0.7200 0.1459 4.935 0.7442 0.1426 5.220 0.7778 0.1565 4.969

Maximum log-likelihood function: -9597.84 -10286.81 -10140.12

Number of parameters: 92 92 92

Akaike information criterion: 19.8766 21.2899 20.9890

Hannan-Quinn inf. criterion: 20.0519 21.4652 21.1643

Bayes-Schwarz inf. criterion: 20.9867 22.4000 22.0991
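Because the transition probabilities are constant, the diagonal estimates at the bottom of Table 4 translate directly into ergodic (long-run) regime probabilities, π1 = (1 − p22)/(2 − p11 − p22), and expected regime durations, 1/(1 − pkk) months. The sketch below plugs in the first column (p11 = 0.9182, p22 = 0.7200), which implies expected durations of roughly 12 months in regime 1 and 3.6 months in regime 2; the function name is illustrative.

```python
import numpy as np


def two_state_summary(p11: float, p22: float):
    """Ergodic probabilities and expected durations for a two-state Markov chain.

    p11 = Pr(S_t = 1 | S_{t-1} = 1), p22 = Pr(S_t = 2 | S_{t-1} = 2).
    """
    pi1 = (1.0 - p22) / (2.0 - p11 - p22)               # long-run share of time in regime 1
    ergodic = np.array([pi1, 1.0 - pi1])
    durations = (1.0 / (1.0 - p11), 1.0 / (1.0 - p22))  # expected months per visit
    return ergodic, durations


pi, dur = two_state_summary(0.9182, 0.7200)  # first column of Table 4
print(pi.round(3), [round(d, 2) for d in dur])
```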


Table 5

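Estimates of Three-State Markov Switching Mixture CAPM

This table reports the SMLE estimates of the three-state mixture CAPM model:

$$
x^i_{t+1} =
\begin{cases}
\alpha^i_1 + \lambda_{2,1}\,\mathrm{Cov}\left[x^i_{t+1}, x^W_{t+1} \mid \Im_t\right] + \eta^i_{t+1} & \text{if } S_{t+1} = 1,\\
\alpha^i_2 + \lambda_{2,2}\,\mathrm{Cov}\left[x^i_{t+1}, x^W_{t+1} \mid \Im_t\right] + \lambda'_{2,2}\,\mathrm{Cov}\left[R^i_{t+1}, R^W_{t+1} \mid R^W_{t+1} < R^W_{B,t+1}, \Im_t\right] + \eta^i_{t+1} & \text{if } S_{t+1} = 2,\\
\alpha^i_3 + \lambda_{2,3}\,\mathrm{Cov}\left[x^i_{t+1}, x^W_{t+1} \mid \Im_t\right] + \lambda_{3,3}\,\mathrm{Cov}\left[R^i_{t+1}, (R^W_{t+1})^2 \mid \Im_t\right] + \lambda_{4,3}\,\mathrm{Cov}\left[R^i_{t+1}, (R^W_{t+1})^3 \mid \Im_t\right] + \eta^i_{t+1} & \text{if } S_{t+1} = 3,
\end{cases}
$$

$$
x^W_{t+1} =
\begin{cases}
\alpha^W_1 + \lambda_{2,1}\,\mathrm{Var}\left[x^W_{t+1} \mid \Im_t\right] + \mathbf{c}_1'\mathbf{z}_t + \eta^W_{t+1} & \text{if } S_{t+1} = 1,\\
\alpha^W_2 + \lambda_{2,2}\,\mathrm{Var}\left[x^W_{t+1} \mid \Im_t\right] + \lambda'_{2,2}\,\mathrm{Var}\left[x^W_{t+1} \mid R^W_{t+1} < R^W_{B,t+1}, \Im_t\right] + \mathbf{c}_2'\mathbf{z}_t + \eta^W_{t+1} & \text{if } S_{t+1} = 2,\\
\alpha^W_3 + \lambda_{2,3}\,\mathrm{Var}\left[x^W_{t+1} \mid \Im_t\right] + \lambda_{3,3}\,\mathrm{Skew}\left[x^W_{t+1} \mid \Im_t\right] + \lambda_{4,3}\,\mathrm{Kurt}\left[x^W_{t+1} \mid \Im_t\right] + \mathbf{c}_3'\mathbf{z}_t + \eta^W_{t+1} & \text{if } S_{t+1} = 3,
\end{cases}
$$

$$
\mathbf{z}_{t+1} = \boldsymbol{\mu}_Z + \mathbf{B}\mathbf{z}_t + \boldsymbol{\eta}^Z_{t+1}, \qquad \boldsymbol{\eta}_{t+1} \sim N\left(\mathbf{0}, \boldsymbol{\Omega}_{S_{t+1}}\right),
$$

where $S_{t+1} = 1, 2, 3$ denote the CAPM, downside CAPM, and four-moment CAPM regimes, respectively, with constant transition probabilities and in which $\lambda_{j,S_{t+1}} = -R^f_t\,g_{j-1,S_{t+1}}$ for $j = 2, 3, 4$. The vector $\mathbf{z}_t$ collects the predictors (dividend yield, term spread, and default spread). $R^W_{B,t+1}$ is set to correspond to the conditional expectation of the gross market portfolio return, $R^W_{t+1}$. The first set of constraints imposes that $M_{t+1} > 0$ at all times, that $E[M_{t+1} \mid \Im_t] = 1/R^f_t$, and that $g_{1,t} < 0$, $g'_{1,t} < 0$, $g_{2,t} > 0$, and $g_{3,t} < 0$ $\forall t$. The second set of restrictions further imposes that $M'_{t+1} \le 0$ and $M''_{t+1} \ge 0$ $\forall t$. Standard errors and (pseudo) t-stats associated with correlations refer to estimates of the corresponding covariance coefficients.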

Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat Estimate Std. Error Pseudo t‐stat

SMB ‐ CAPM regime 0.0650 0.2330 0.279 0.4681 0.2352 1.990 0.3510 0.2612 1.344

HML ‐ CAPM regime ‐0.1494 0.2569 ‐0.582 ‐0.3964 0.2049 1.934 0.1211 0.2322 0.521

MOMO ‐ CAPM regime 0.9010 0.2423 3.719 0.6669 0.2404 2.774 0.7614 0.3978 1.914

W ‐ CAPM regime 1.7053 0.7871 2.166 0.7557 0.5951 1.270 0.9083 0.5061 1.795

SMB ‐ downside CAPM regime 0.8940 0.7794 1.147 0.1972 0.8277 0.238 0.6194 0.5102 1.214

HML ‐ downside CAPM regime 0.0309 0.6098 0.051 0.3130 0.4481 0.698 0.2736 0.7412 0.369

MOMO ‐ downside CAPM regime 1.1223 1.0434 1.076 0.1343 0.7873 0.171 0.4376 2.8043 0.156

W ‐ downside CAPM regime ‐1.9437 2.2832 ‐0.851 0.4217 2.1489 0.196 ‐0.0083 1.5558 ‐0.005

SMB ‐ 4‐mom. CAPM regime 5.3599 8.0236 0.668 0.1066 8.5619 0.012 1.1572 11.9759 0.097

HML ‐ 4‐mom. CAPM regime ‐3.3910 9.1532 ‐0.370 ‐0.3837 8.3565 ‐0.046 ‐2.1277 25.8484 ‐0.082

MOMO ‐ 4‐mom. CAPM regime 0.6324 6.1167 0.103 3.3813 5.7244 0.591 2.8169 4.9767 0.566

W ‐ 4‐mom. CAPM regime ‐1.6057 4.7328 ‐0.339 ‐0.0077 4.2635 ‐0.002 ‐0.3425 15.0226 ‐0.023

g0 ‐ CAPM Regime 0.1395 0.1350 1.033 0.8193 0.1037 7.898 0.9364 0.1738 5.388

g1 (Covariance risk prem.) ‐ CAPM Reg. ‐0.0770 0.0538 ‐1.431 ‐0.0452 0.0549 ‐0.823 ‐0.0060 0.0049 ‐1.230

g0 ‐ downside CAPM Regime 0.5326 0.2649 2.011 0.0359 0.2404 0.149 0.8992 0.4390 2.048

g1 (Covariance risk prem.) - downs. CAPM Reg. -0.0085 0.0063 -1.365 -0.0091 0.0057 -1.586 -0.0013 0.0028 -0.459

g'1 (Downside Cov. risk) ‐ downs. CAPM Reg. ‐0.0327 0.0241 ‐1.354 ‐0.0310 0.0199 ‐1.562 ‐0.0022 0.0012 ‐1.877

g0 ‐ 4‐mom. CAPM Reg. 1.3369 2.8294 0.473 7.0556 2.3529 2.999 2.2445 1.0509 2.136

g1 (Covariance risk prem.) ‐ 4‐mom. CAPM 0.0689 0.0315 2.189 ‐0.0585 0.0273 ‐2.143 ‐0.0119 0.0036 ‐3.261

g2 (Coskew risk prem.) ‐ 4‐mom. CAPM Reg. 0.0258 0.0135 1.905 0.0080 0.0105 0.755 0.0075 0.0015 5.028

g3 (Cokurt risk premium) ‐ 4‐mom. CAPM ‐0.0014 0.0006 ‐2.418 ‐0.0011 0.0005 ‐2.061 ‐0.0065 0.0013 ‐5.024

c1 (Pred. coeff. on DY) ‐ CAPM Regime 0.5744 0.3614 1.589 0.5854 0.2954 1.981 0.7654 0.3579 2.138

c2 (Pred. coeff. on TERM) ‐ CAPM Regime 1.2347 0.4882 2.529 0.8188 0.4738 1.728 0.7754 0.4872 1.592

c3 (Pred. coeff. on DEF) ‐ CAPM Regime ‐4.5438 0.6278 ‐7.238 ‐3.5446 0.4648 ‐7.626 ‐3.8692 1.1840 ‐3.268

c1 (Pred. coeff. on DY) ‐ down CAPM Reg. 0.1070 0.1131 0.946 0.0988 0.1238 0.798 0.0724 0.2110 0.343

c2 (Pred. coeff. on TERM) ‐ down CAPM ‐0.1063 0.1453 ‐0.731 ‐0.0871 0.1234 ‐0.706 ‐0.0788 0.1921 ‐0.410

c3 (Pred. coeff. on DEF) ‐ down CAPM Reg. 0.6745 0.3550 1.900 0.8172 0.4166 1.962 0.6198 0.4688 1.322

c1 (Pred. coeff. on DY) ‐ 4‐mom. CAPM 0.9111 0.8407 1.084 0.7255 0.9071 0.800 0.9073 1.0534 0.861

c2 (Pred. coeff. on TERM) ‐ 4‐mom. CAPM ‐2.4127 0.9392 ‐2.569 ‐1.4961 0.8765 ‐1.707 ‐1.4353 1.4446 ‐0.994

c3 (Pred. coeff. on DEF) ‐ 4‐mom. CAPM 9.1773 1.6555 5.544 8.8686 1.7603 5.038 5.1898 2.6266 1.976



Table 5 (continued)

Estimates of Three-State Markov Switching Mixture CAPM

Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat Estimate Std. Error Pseudo t-stat

SMB volatility - CAPM Regime 1.6569 0.0013 1709.3 1.7033 0.0057 510.31 1.5788 0.0019 1298.9

SMB-HML correlation - CAPM Reg. -0.1090 0.0530 -4.131 -0.0801 0.0944 -2.073 -0.0967 0.0961 -2.047

HML volatility - CAPM Regime 1.4545 0.0650 27.11 1.4344 0.0947 21.73 1.2879 0.1081 15.35

SMB-MOMO correlation - CAPM Reg. 0.0359 0.0444 1.759 -0.0188 0.0833 -0.608 0.0680 0.0843 1.964

HML-MOMO correlation - CAPM Reg. -0.0989 0.0771 -2.444 -0.1352 0.1137 -2.698 -0.0811 0.0845 -1.905

MOMO volatility - CAPM Regime 1.5726 0.0745 27.65 1.5816 0.2646 9.455 1.5420 0.0716 33.22

SMB-Mkt correlation - CAPM Regime 0.0522 0.0452 4.372 0.1114 0.1141 4.655 0.0728 0.0846 3.637

HML-Mkt correlation - CAPM Regime -0.0874 0.1270 -2.287 -0.0959 0.4202 -0.916 -0.0755 0.1586 -1.642

MOMO-Mkt correlation - CAPM Reg. 0.2457 0.1189 7.417 0.2601 0.2434 4.732 0.2234 0.1758 5.250

Mkt volatility - CAPM Regime 2.7398 0.1204 51.96 2.8006 0.1975 39.72 2.6789 0.2159 33.24

SMB volatility - down. CAPM Reg. 3.8752 0.0759 158.32 4.8936 0.2360 101.45 4.4653 0.1149 121.42

SMB-HML correlation - down. CAPM 0.0721 0.1305 6.824 0.0402 0.2783 2.800 0.0591 0.2023 3.668

HML volatility - down. CAPM Reg. 3.9834 0.2267 56.00 3.9633 0.4890 32.12 4.0197 0.2509 45.08

SMB-MOMO correlation - down. CAPM -0.1882 0.1405 -20.86 -0.4826 0.2756 -60.45 -0.1440 0.2112 -16.65

HML-MOMO correlation - down. CAPM -0.0976 0.2981 -5.244 -0.2272 0.6762 -9.395 -0.0469 0.5465 -1.888

MOMO volatility - down. CAPM Reg. 5.0246 0.2760 73.19 7.0542 0.4646 107.11 7.8123 0.3510 86.94

SMB-Mkt correlation - down. CAPM 0.4117 0.1649 48.88 0.6000 0.3764 76.06 0.2021 0.1708 40.24

HML-Mkt correlation - down. CAPM -0.0495 0.3472 -2.871 0.0174 0.6326 1.062 -0.0396 0.4240 -2.856

MOMO-Mkt correlation - down. CAPM -0.1183 0.3540 -8.484 -0.6022 0.7880 -52.57 -0.0770 0.5071 -9.028

Mkt volatility - down. CAPM Reg. 6.3165 0.3377 94.53 9.7518 0.5446 174.60 10.8749 0.5041 117.30

SMB volatility - 4-mom. CAPM 8.5112 0.2321 218.5 18.6513 0.5653 615.39 3.3854 0.3220 35.59

SMB-HML correlation - 4-mom. CAPM 0.1921 1.4870 7.365 0.0197 3.5654 4.477 0.0612 2.6795 1.097

HML volatility - 4-mom. CAPM 9.5690 1.5889 40.34 43.5582 2.4036 789.37 14.1903 1.9475 51.70

SMB-MOMO corr. - 4-mom. CAPM -0.3591 1.1009 -26.87 -0.0567 3.7844 -12.46 -0.0683 1.6099 -1.494

HML-MOMO correlation - 4-mom. CAPM -0.6313 2.2883 -25.54 -0.0915 4.5872 -38.77 -0.0767 4.2157 -2.682

MOMO volatility - 4-mom. CAPM 13.8237 2.5161 53.16 44.6068 3.7032 512.73 10.3948 4.3923 24.60

SMB-Mkt correlation - 4-mom. CAPM 0.3196 1.0799 23.87 0.1835 2.3552 40.22 0.1664 1.5363 2.809

HML-Mkt correlation - 4-mom. CAPM 0.5821 1.8216 28.98 0.5585 3.0887 218.10 0.2398 2.5376 10.27

MOMO-Mkt correlation - 4-mom. CAPM -0.6842 1.9366 -46.29 -0.5578 3.3145 -207.77 -0.1142 3.0021 -3.030

Mkt volatility - 4-mom. CAPM 13.5395 1.0733 119.6 27.6779 2.7683 276.73 7.6612 1.5991 36.70

Prob(CAPM at t | CAPM at t-1) 0.6391 0.2733 2.338 0.8463 0.1495 5.661 0.9265 0.1886 4.913

Prob(dCAPM at t | CAPM at t-1) 0.1473 0.0784 1.879 0.1520 0.0733 2.074 0.0727 0.0359 2.023

Prob(CAPM at t | dCAPM at t-1) 0.2794 0.0836 3.342 0.1339 0.0836 1.601 0.0883 0.1147 0.770

Prob(dCAPM at t | dCAPM at t-1) 0.8527 0.0084 101.512 0.8582 0.1495 5.741 0.8637 0.2278 3.791

Prob(CAPM at t | 4MOM at t-1) 0.5029 0.1404 3.582 0.0006 0.0059 0.102 0.0008 0.0062 0.125

Prob(4MOM at t | 4MOM at t-1) 0.4917 0.0973 5.053 0.8980 0.3049 2.945 0.8894 0.3828 2.324

Maximum log-likelihood function: -9482.62 -9823.95 -10005.88

Number of parameters: 138 138 138

Akaike information criterion: 19.7346 20.4348 20.8080

Hannan-Quinn inf. criterion: 19.9976 20.6977 21.0709

Bayes-Schwarz inf. criterion: 21.3998 22.1000 22.4731
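The two sets of restrictions described in the table notes (a positive kernel with sign restrictions on the coefficients, then M′ ≤ 0 and M″ ≥ 0) can be verified numerically once a functional form for the kernel is fixed. The sketch below assumes, for illustration only, a cubic kernel in the gross market return, M(R) = g0 + g1 R + g2 R² + g3 R³, a common way to generate a four-moment CAPM; this form is not taken from these tables, the function name is hypothetical, the coefficient values are made up rather than the estimates reported above, and the E[M] = 1/Rf restriction is not checked.

```python
import numpy as np


def check_cubic_kernel(g, returns) -> dict:
    """Check kernel restrictions on a grid of gross market returns.

    Assumes (for illustration) M(R) = g0 + g1*R + g2*R**2 + g3*R**3.
    """
    g0, g1, g2, g3 = g
    r = np.asarray(returns, dtype=float)
    m = g0 + g1 * r + g2 * r**2 + g3 * r**3
    m1 = g1 + 2 * g2 * r + 3 * g3 * r**2       # first derivative dM/dR
    m2 = 2 * g2 + 6 * g3 * r                   # second derivative d2M/dR2
    sign_ok = g1 < 0 and g2 > 0 and g3 < 0     # coefficient sign restrictions
    return {
        "positive": bool(np.all(m > 0)),       # M > 0 on the whole grid
        "signs": sign_ok,
        "decreasing": bool(np.all(m1 <= 0)),   # M' <= 0 on the whole grid
        "convex": bool(np.all(m2 >= 0)),       # M'' >= 0 on the whole grid
    }


grid = np.linspace(0.7, 1.3, 61)  # hypothetical range of monthly gross returns
print(check_cubic_kernel((3.0, -2.0, 0.5, -0.05), grid))
```

Checking on a grid only certifies the restrictions over that range of returns; a wider grid, or the analytic roots of M′ and M″, would be needed for a global statement.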


Table 6

Out-of-Sample Forecasting Performance on the Benchmark Portfolios

The table reports the 1- and 12-month horizon predictive performance of three competing models: the single-state smooth model in which the risk prices are driven by macroeconomic factors, the two-state four-moment Markov switching model, and the three-state Markov switching mixture CAPM.

Average Pricing Error | Std. Dev. of Pricing Error | Root Mean Squared Error | Mean Absolute Error | Predictive R2 (%)

Forecast Horizon: 1 month

Single-state SSMBCP Model

SMB -0.00566 2.9405 2.9405 2.1936 1.0755

HML -0.00275 2.8754 2.8754 2.1313 1.2859

MOMO 0.00621 3.8501 3.8501 2.6917 0.1313

Market -0.00538 4.5019 4.5019 3.5598 0.3446

Two-state Four-Moment MS CAPM

SMB 0.00883 2.6151 2.6151 2.0990 1.3874

HML 0.01687 3.1577 3.1577 2.3451 1.5036

MOMO -0.06698 3.4442 3.4449 2.5866 0.8481

Market 0.04378 4.4614 4.4617 3.9421 0.1833

Three-state MS Mixture CAPM

SMB 0.03777 2.4974 2.4977 1.6563 3.1642

HML -0.07177 2.6399 2.6408 1.6929 4.3791

MOMO -0.02077 3.4984 3.4984 2.1035 4.0546

Market 0.14020 4.0747 4.0771 2.7775 3.2755

Forecast Horizon: 1 month, Constraint Set 2

Single-state SSMBCP Model

SMB -0.33778 2.9686 2.9878 2.1774 0.9275

HML 0.17153 2.9318 2.9368 2.0711 1.1611

MOMO 0.13743 3.9912 3.9936 2.7219 0.1175

Market -0.26621 4.9058 4.9130 3.7294 0.2941

Two-state Four-Moment MS CAPM

SMB -0.85940 2.1334 2.3000 1.8701 1.5024

HML -1.49462 2.6564 3.0480 2.5435 1.5470

MOMO 0.47967 2.6992 2.7414 2.1196 0.9239

Market -2.81160 4.5737 5.3688 3.8993 0.1872

Three-state MS Mixture CAPM

SMB 0.73636 1.9345 2.0699 2.0300 1.6074

HML 1.12975 3.0826 3.2831 2.5721 2.4086

MOMO -0.37987 2.3443 2.3749 2.2393 4.4602

Market 1.17152 3.6709 3.8534 3.1547 0.9094

Forecast Horizon: 12 months

Single-state SSMBCP Model

SMB 0.02486 3.5172 3.5173 2.9113 0.8982

HML 0.02387 3.4362 3.4363 2.8238 0.8709

MOMO 0.02512 4.6317 4.6318 3.5895 0.0293

Market 0.04569 5.3320 5.3322 4.6575 0.9106

Two-state Four-Moment MS CAPM

SMB 0.03309 3.3208 3.3210 2.7019 1.3914

HML 0.01526 3.0309 3.0310 2.4449 1.4603

MOMO -0.06990 4.8916 4.8921 3.1575 0.9636

Market 0.04021 4.6861 4.6862 4.4024 2.3639

Three-state MS Mixture CAPM

SMB 0.10494 2.8029 2.8048 2.0611 4.6791

HML -0.17681 2.9716 2.9768 2.1375 4.2279

MOMO -0.00237 3.8990 3.8990 2.6108 0.5962

Market 0.24436 4.5264 4.5330 3.4655 2.0401

Forecast Horizon: 12 months, Constraint Set 2

Single-state SSMBCP Model

SMB -0.29965 3.5292 3.5419 2.9914 0.7930

HML 0.23363 3.5096 3.5173 2.8972 0.7524

MOMO 0.19072 4.7811 4.7849 3.7876 0.0260

Market -0.18283 5.7397 5.7426 5.0554 0.7913

Two-state Four-Moment MS CAPM

SMB -0.58413 3.1485 3.2023 2.8302 1.2170

HML -1.02235 3.8358 3.9698 2.8893 1.2782

MOMO 0.34895 4.0579 4.0729 3.1820 0.8509

Market -1.95166 3.8184 4.2883 5.5399 1.9725

Three-state MS Mixture CAPM

SMB 0.72266 2.8895 2.9785 2.6069 4.9447

HML 1.13727 3.4490 3.6317 2.5441 7.2385

MOMO -0.39749 3.6983 3.7196 2.9050 3.0578

Market 1.27271 4.4686 4.6463 4.4074 3.6034


Figure 1

60-Month Rolling Window Estimates of OLS (Unconditional CAPM) Alphas, Betas, and of Co-Skewness, Co-Kurtosis, and Downside Beta

[Panels: OLS Alphas; OLS Betas; Downside Beta; Co-Skewness with Market Ptf. Returns; Co-Kurtosis with Market Ptf. Returns. Each panel plots SMB, HML, and MOMO over 1927-2007; vertical axis: % monthly abnormal return.]
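Figures 1 and 2 plot rolling-window statistics of the unconditional CAPM. A minimal sketch of how such series can be produced from monthly excess returns follows. The downside beta conditions on months in which the market falls below its window mean, and the standardized coskewness and cokurtosis formulas are common choices assumed here, not necessarily the definitions behind the figures; the function names and the simulated data are hypothetical.

```python
import numpy as np


def window_stats(x: np.ndarray, m: np.ndarray) -> tuple:
    """Unconditional-CAPM statistics for one window of monthly excess returns.

    x: excess returns on a portfolio (e.g. SMB, HML, MOMO); m: market excess returns.
    Returns (alpha, beta, downside beta, coskewness, cokurtosis).
    """
    beta, alpha = np.polyfit(m, x, 1)            # OLS slope and intercept
    dx, dm = x - x.mean(), m - m.mean()
    # Downside beta: covariance and variance computed only over months in which
    # the market falls below its window mean (assumed downside threshold)
    down = m < m.mean()
    dxd, dmd = x[down] - x[down].mean(), m[down] - m[down].mean()
    downside_beta = (dxd @ dmd) / (dmd @ dmd)
    # Standardized comoments (assumed definitions)
    sx, sm = dx.std(), dm.std()
    coskew = np.mean(dx * dm**2) / (sx * sm**2)
    cokurt = np.mean(dx * dm**3) / (sx * sm**3)
    return alpha, beta, downside_beta, coskew, cokurt


def rolling_stats(x, m, window: int = 60) -> np.ndarray:
    """Apply window_stats to every trailing window of `window` months."""
    x, m = np.asarray(x, dtype=float), np.asarray(m, dtype=float)
    rows = [window_stats(x[t - window:t], m[t - window:t])
            for t in range(window, len(x) + 1)]
    return np.array(rows)                        # shape (len(x) - window + 1, 5)


# Simulated data with a known beta of 0.8 (illustrative only)
rng = np.random.default_rng(0)
mkt = rng.normal(0.5, 4.0, 300)
ptf = 0.2 + 0.8 * mkt + rng.normal(0.0, 0.1, 300)
stats60 = rolling_stats(ptf, mkt, window=60)
stats120 = rolling_stats(ptf, mkt, window=120)
print(stats60.shape, stats120.shape)
```

Setting `window=120` gives the longer windows used for Figure 2; longer windows smooth the series at the cost of reacting more slowly to changes in exposures.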


Figure 2

120-Month Rolling Window Estimates of OLS (Unconditional CAPM) Alphas, Betas, and of Co-Skewness, Co-Kurtosis, and Downside Beta

[Panels: OLS Alphas; OLS Betas; Downside Beta; Co-Skewness with Market Ptf. Returns; Co-Kurtosis with Market Ptf. Returns. Each panel plots SMB, HML, and MOMO over 1927-2007; vertical axis: % monthly abnormal return.]


Figure 3

Smoothed State Probabilities Estimated from Two-State Markov Switching Four-Moment CAPM Model

[Panels: State 1 smoothed probabilities; State 2 smoothed probabilities. Each panel plots the unconstrained and constrained estimates on a 0-1 scale over 1927-2007.]
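Figures 3 to 5 report smoothed state probabilities. Given estimated parameters, such probabilities are usually obtained by running Hamilton's filter forward and Kim's smoother backward over the per-regime densities of each observation. The sketch below is a generic version of that recursion, not a reproduction of the SMLE procedure behind the tables; the function name, the simulated data, and the regime means and volatilities are hypothetical, and only the transition matrix is taken from the first column of Table 4.

```python
import numpy as np


def hamilton_kim(lik: np.ndarray, P: np.ndarray, p0) -> tuple:
    """Filtered and smoothed regime probabilities for a Markov switching model.

    lik: (T, K) array with lik[t, k] = density of observation t if S_t = k
    P:   (K, K) transition matrix with P[i, j] = Pr(S_t = j | S_{t-1} = i)
    p0:  (K,) regime probabilities before the first observation
    """
    T, K = lik.shape
    pred = np.zeros((T, K))   # Pr(S_t = k | information up to t-1)
    filt = np.zeros((T, K))   # Pr(S_t = k | information up to t)
    prior = np.asarray(p0, dtype=float)
    for t in range(T):        # Hamilton filter (forward pass)
        pred[t] = prior
        joint = pred[t] * lik[t]
        filt[t] = joint / joint.sum()
        prior = filt[t] @ P
    smooth = np.zeros((T, K))  # Pr(S_t = k | full sample)
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):  # Kim smoother (backward pass)
        smooth[t] = filt[t] * (P @ (smooth[t + 1] / pred[t + 1]))
    return filt, smooth


# Illustration: simulated two-regime data with hypothetical means and volatilities
rng = np.random.default_rng(1)
P = np.array([[0.9182, 0.0818], [0.2800, 0.7200]])
states = [0]
for _ in range(199):
    states.append(rng.choice(2, p=P[states[-1]]))
states = np.array(states)
mu, sd = np.array([1.0, -2.0]), np.array([3.0, 10.0])
y = rng.normal(mu[states], sd[states])
lik = np.exp(-0.5 * ((y[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
filt, smooth = hamilton_kim(lik, P, p0=[0.5, 0.5])
print(smooth[:5].round(3))
```

The smoothed probabilities use the whole sample, so they are typically sharper than the filtered ones; the filtered series is the one relevant for real-time forecasting.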


Figure 4

Smoothed State Probabilities Estimated from Three-State Markov Switching Mixture CAPM Model

[Panels: CAPM State, Downside CAPM State, and Four-Moment CAPM State smoothed probabilities. Each panel plots the unconstrained and constrained estimates on a 0-1 scale over 1927-2007.]


Figure 5

Smoothed State Probabilities Estimated from Three-State Markov Switching Mixture CAPM Model: 1985-2008 Sub-Sample

[Panels: CAPM State, Downside CAPM State, and Four-Moment CAPM State smoothed probabilities. Each panel plots the unconstrained and constrained estimates on a 0-1 scale over 1985-2008.]

