Model Risk and Disappointment Aversion
Monash CQFIS working paper 2018 – 7

Hasan Fallahgoul, School of Mathematical Sciences, Monash University and Centre for Quantitative Finance and Investment Strategies
Loriano Mancini, Università della Svizzera italiana (USI), Institute of Finance, Lugano, Switzerland
Stoyan Stoyanov, Stony Brook University, College of Business, New York, USA

Abstract

Extensions of utility functions sensitive to the tail behavior of the portfolio return distribution may not be approximated reliably through higher-order moment expansions and require specifying a complete distribution. This, however, implies that an optimal allocation can be adversely influenced by an incorrect distribution specification. We develop a novel approach for model risk assessment based on a projection method which is applied to portfolio construction. In an out-of-sample empirical study, we find that the marginal utility gains of the optimal portfolio of a generalized disappointment aversion investor are remarkably robust to misspecifications in the marginal distributions but are very sensitive to the structural assumption of stock returns implemented through a factor model.

Centre for Quantitative Finance and Investment Strategies


    http://www.monash.edu

  • Model Risk and Disappointment Aversion∗

    Hasan Fallahgoul†

    Monash University

    Loriano Mancini‡

    Swiss Finance Institute

    and USI Lugano

    Stoyan Stoyanov§

    Stony Brook University

    July 30, 2018

    Preliminary and incomplete

    Abstract

    Extensions of utility functions sensitive to the tail behavior of the portfolio return distribution may not be approximated reliably through higher-order moment expansions and require specifying a complete distribution. This, however, implies that an optimal allocation can be adversely influenced by an incorrect distribution specification. We develop a novel approach for model risk assessment based on a projection method which is applied to portfolio construction. In an out-of-sample empirical study, we find that the marginal utility gains of the optimal portfolio of a generalized disappointment aversion investor are remarkably robust to misspecifications in the marginal distributions but are very sensitive to the structural assumption of stock returns implemented through a factor model.

    Keywords : Model risk, utility function, disappointment aversion.

    JEL classification: C5, G12

    ∗ The Centre for Quantitative Finance and Investment Strategies has been supported by BNP Paribas.

    † Hasan Fallahgoul, Monash University, School of Mathematics and Centre for Quantitative Finance and Investment Strategies, 9 Rainforest Walk, 3800 Victoria, Australia. E-mail: [email protected].

    ‡ Loriano Mancini, Università della Svizzera italiana (USI), Institute of Finance, Via Buffi 6, CH-6904 Lugano, Switzerland. E-mail: [email protected].

    § Stoyan Stoyanov, Stony Brook University, College of Business, New York, USA. E-mail: [email protected]

  • 1 Introduction

    Since the introduction of classical mean-variance analysis by Markowitz (1952), empirical research has established stylized facts about financial asset returns, including asymmetry, fat tails, and time-dependent volatilities and correlations. Portfolio construction models that take such empirical facts into account come in two forms: they either (i) include a particular multivariate model for the asset returns or (ii) rely on higher-order empirical moments through a Taylor expansion of the expected utility function. The benefit of the former is that an analytic solution might be feasible because of the structure of the assumed multivariate law, while the latter approach is non-parametric and is subject to weaker assumptions.¹

    Martellini and Ziemann (2010) provide an extensive out-of-sample analysis of the benefits of including skewness and kurtosis in a stock portfolio through a Taylor expansion. Because the number of parameters to be estimated explodes in covariance, co-skewness and co-kurtosis estimators, they consider various methods, such as a single-factor model and shrinkage, which ameliorate the estimation problem at the cost of assuming a certain structure. Martellini and Ziemann (2010) find that the structural assumptions lead to significant monetary utility gains. However, the relative impact of skewness and kurtosis remains marginal even in the case of the more sophisticated estimators.

    Apart from the stylized facts which deviate from the classical distributional assumption of normality, extensions of the risk-aversion preferences behind expected utility theory have also been considered. Gul (1991) develops the disappointment aversion (DA) theory, which includes standard expected utility theory as a special case and is consistent with behavioral evidence such as the Allais paradox.² Ang et al. (2005) apply the DA theory of Gul (1991) to the static and dynamic portfolio construction problems and provide some justifications to address the equity premium puzzle. Routledge and Zin (2010) suggest an extension, generalized disappointment aversion (GDA), which captures aversion to disappointing outcomes, and apply it to a general problem in asset pricing. Because the outcomes that disappoint are penalized differently, GDA preferences exhibit a stronger aversion to negative skewness and excess kurtosis compared to the corresponding preferences defined by the utility function and disappointment aversion.

    ¹ Many families of distributions have been considered in the literature, such as elliptical distributions (Owen and Rabinovitch (1983)), hyperbolic skewed t, stable Paretian (Mittnik and Rachev (2000)), etc.

    ² The Allais paradox is defined as a violation of the independence axiom in the choice problem.

    Dahlquist et al. (2017) apply the generalized disappointment-aversion utility to portfolio

    construction and find that the opportunity cost of ignoring skewness can be substantially

    higher than that of an expected utility maximizer. In contrast to Martellini and Ziemann

    (2010), the authors assume a parametric model with a Gaussian multivariate component

    and a single exponentially distributed linear factor capturing systematic tail events. Apart

    from relying on a particular parametric structure, Dahlquist et al. (2017) consider a portfolio

    of three funds – a bond, a stock, and cash – and the analysis is carried out through fund

    separation. It is, however, unclear if the cost of ignoring skewness is the same out-of-sample

    for equity-only investors and if the cost of ignoring kurtosis is of similar magnitude.

    An out-of-sample analysis of utility gains resulting from including skewness and kurtosis

    with GDA preferences is not straightforward. We demonstrate that the Taylor expansion

    approach is not applicable and, as a consequence, an explicit parametric hypothesis is needed

    which introduces model risk – the parametric hypothesis may be incorrect rendering the

    analysis biased.

    This paper contributes to the literature in several ways. Firstly, we develop a novel general

    methodology, i.e., a semi-parametric L-moment approximation, which has the capacity to

    improve a parametric model in a non-parametric fashion. The parametric model is regarded

    only as a starting point. Thus, instead of attempting to approximate the objective function

    through a Taylor expansion, we approximate the portfolio return distribution. The new

    approach is applicable not only in portfolio construction but also in other fields such as risk

    measurement.

    Secondly, we apply the new method to portfolio construction with GDA preferences and also separately to expected utility where the new method is an alternative to the Taylor expansion approach. Since we use the factor structure for the distribution of the returns, it is possible to check the robustness of the optimal solution with respect to certain misspecifications of the marginal distribution. Furthermore, thanks to our semi-parametric approximation, it is possible to evaluate the cost of ignoring higher moments, such as skewness and kurtosis, for an investor with GDA preferences.

  • To this end, we compare several model specifications in an extensive out-of-sample empirical study. We find that for all parameter combinations of GDA preferences, introducing a structure through a single factor model and taking into account higher-order moments results in significant monetary utility gains relative to a mean-variance investor employing a sample estimator for the covariance matrix. However, the out-of-sample cost of ignoring skewness and kurtosis remains marginal. The average monetary utility gains relative to a Gaussian GDA investor using the same structural assumption do not exceed 0.3%. Our findings extend Martellini and Ziemann (2010) and Das and Uppal (2004) beyond standard power utility theory and imply that the high cost of ignoring skewness in-sample reported in Dahlquist et al. (2017) does not materialize out-of-sample for pure equity investors.

    The rest of the paper is organized as follows. In Section 2, we discuss the theory of generalized disappointment aversion as well as the models of asset returns. The semi-parametric construction for the asset returns is given in Section 3. Section 4 describes the methodology of the empirical study. Finally, a comprehensive empirical study is provided in Section 5.

    2 Portfolio construction with generalized disappointment aversion

    2.1 Generalized disappointment aversion

    The attitude of an investor toward risk is not symmetric. Gul (1991) develops an axiomatic model which captures downside risk through disappointment aversion. Routledge and Zin (2010) extend Gul's model by introducing an exogenous parameter for the level of disappointment aversion. The investor's objective is to maximize the utility of the certainty equivalent of terminal wealth, which is the same as maximizing the certainty equivalent itself because utility functions are non-decreasing.

    Let W and µ be the terminal wealth and the certainty equivalent, respectively. Following Routledge and Zin (2010), the certainty equivalent for generalized disappointment aversion (GDA) preferences is defined implicitly by the equation

    $$u(\mu) = E\,u(W) - \theta\,E\left[\left(u(\delta\mu) - u(W)\right) I_{\{W \le \delta\mu\}}\right] \tag{1}$$

    where I is an indicator function that equals one if the condition holds and zero otherwise, δ ∈ [0, 1] defines the disappointment threshold, θ ≥ 0 is the sensitivity to disappointment, and

    $$u(x) = \begin{cases} \dfrac{x^{1-\gamma}}{1-\gamma}, & \text{if } \gamma > 0 \text{ and } \gamma \ne 1, \\ \ln x, & \text{if } \gamma = 1, \end{cases} \tag{2}$$

    is the standard power utility function, where γ is the constant relative risk aversion parameter.

    From (1), the outcomes that lie below the disappointment threshold δµ are penalized in proportion to θ. If θ = 0, equation (1) reduces to standard expected utility. If δ = 1, the generalized disappointment aversion of Routledge and Zin (2010) reduces to the disappointment aversion model of Gul (1991). As noted in Routledge and Zin (2010), the GDA preferences defined through (1) are also sensible for δ > 1; however, the right-hand side then needs to be scaled by $A = (1 - \theta(\delta^{1-\gamma} - 1))^{-1}$ so that the certainty equivalent of a constant amount equals the amount itself.³

    By incorporating (2) in (1), Routledge and Zin (2010) highlight a few key properties. Firstly, the generalized disappointment aversion preferences have an axiomatic structure which proves to be more compelling for asset pricing than the disappointment aversion model by Gul (1991) and traditional expected utility theory. Secondly, there is a direct relation between the risk aversion parameter γ, the disappointment threshold δ, and the sensitivity to disappointment θ. More precisely, an increase in θ or δ increases risk aversion in that the certainty equivalent falls. Finally, it is possible to show that µ is also implicitly defined through the equation

    $$\mu = \begin{cases} \left( \dfrac{E\left[W^{1-\gamma}\left(1 + \theta I_{\{W \le \delta\mu\}}\right)\right]}{1 + \theta\,\delta^{1-\gamma}\,E\left[I_{\{W \le \delta\mu\}}\right]} \right)^{1/(1-\gamma)}, & \text{if } \gamma > 0 \text{ and } \gamma \ne 1, \\[2ex] \exp\left( \dfrac{E\left[\ln W \left(1 + \theta I_{\{W \le \delta\mu\}}\right)\right] - \theta \ln\delta\; E\left[I_{\{W \le \delta\mu\}}\right]}{1 + \theta\,E\left[I_{\{W \le \delta\mu\}}\right]} \right), & \text{if } \gamma = 1, \end{cases} \tag{3}$$

    and is linearly homogeneous as a function of W.
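    As a numerical illustration of how (1) pins down the certainty equivalent, the following Python sketch solves the implicit equation by bisection for a small set of equally likely wealth outcomes. The function names, the sample outcomes, the restriction to positive wealth with 0 < δ ≤ 1, and the choice of bisection are assumptions made for this illustration only; they are not the procedure used in the paper.

```python
import math

def power_utility(x, gamma):
    """Power utility of equation (2); log utility when gamma == 1."""
    if gamma == 1:
        return math.log(x)
    return x ** (1.0 - gamma) / (1.0 - gamma)

def gda_certainty_equivalent(wealth, gamma, theta, delta, iterations=200):
    """Solve equation (1) for mu by bisection.

    Illustrative assumptions: wealth outcomes are positive and equally
    likely, theta >= 0 and 0 < delta <= 1.
    """
    n = len(wealth)
    expected_utility = sum(power_utility(w, gamma) for w in wealth) / n

    def gap(mu):
        # Left-hand side minus right-hand side of (1); increasing in mu.
        # (u(delta*mu) - u(W)) * 1{W <= delta*mu} equals max(., 0)
        # because u is increasing.
        reference = power_utility(delta * mu, gamma)
        penalty = sum(max(reference - power_utility(w, gamma), 0.0)
                      for w in wealth) / n
        return power_utility(mu, gamma) - expected_utility + theta * penalty

    lo, hi = min(wealth), max(wealth)  # gap(lo) <= 0 <= gap(hi) if delta <= 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

outcomes = [0.8, 0.95, 1.05, 1.3]  # hypothetical gross wealth outcomes
mu_eu = gda_certainty_equivalent(outcomes, gamma=2.0, theta=0.0, delta=1.0)
mu_gda = gda_certainty_equivalent(outcomes, gamma=2.0, theta=1.5, delta=1.0)
print(mu_eu, mu_gda)
```

    With θ = 0 the result is the usual expected-utility certainty equivalent, and a positive θ lowers it, in line with the comparative statics described above. Doubling every outcome doubles the certainty equivalent, which is the linear homogeneity property.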

    In the context of myopic portfolio construction, the terminal wealth can be written as $W = W_0 R_w$, where $W_0$ is the initial wealth and $R_w$ is the gross return on the investor's portfolio over the investment horizon. The gross return is calculated through the portfolio weights, $R_w = 1 + w'r$, where w denotes the vector of portfolio weights and r the vector of stock returns.

    ³ It is straightforward to verify that this value of the constant A makes sense only if γ ≠ 1. If γ = 1, the value of the threshold δ cannot exceed one.

    Assuming that the terminal wealth W is entirely determined by the portfolio outcome,

    an investor whose preferences are consistent with (3) constructs a portfolio which maximizes

    the certainty equivalent,

    $$\max_{w}\ \mu_w \quad \text{s.t.} \quad w'e = 1 \tag{4}$$

    where the constraint indicates that the portfolio is self-financing and µw satisfies (3) with

    W = 1 + w′r. We ignore the impact of W0 because by the linear homogeneity property of

    µw, the optimal solution of (4) does not depend on the initial wealth. If θ = 0, then the

    optimal allocation of (4) maximizes the expected utility in equation (2).

    Because the outcomes that disappoint are penalized, GDA preferences exhibit a stronger aversion to negative skewness and excess kurtosis compared to the corresponding preferences defined by the utility function in (2). However, an attempt to analyze the impact of higher-order moments on the optimal allocation through the classical Taylor expansion method as in Martellini and Ziemann (2010) fails because equation (3) is not a smooth function in the state space.⁴ We resolve this problem by developing a novel semi-parametric method which improves non-parametrically the portfolio return distribution starting from an initial parametric model. One important advantage of this construction is that it works through the quantile function and is, therefore, uniquely positioned to be combined with the Monte Carlo method. As a consequence, we employ the Monte Carlo method to compute the expectations in (3) in order to find the optimal allocation for the investor from (4).

    We study out of sample the relative value added of assuming a one-factor structure in the

    vector of stock returns and also the opportunity cost of ignoring the information in skewness

    and kurtosis for a GDA investor.

    2.2 Models of asset returns

    Empirical research has established stylized facts about financial asset returns, including asymmetry, fat tails, and time-dependent volatilities and correlations. Portfolio construction models that take such empirical facts into account come in two forms: they either (i) include a particular multivariate model for the asset returns, e.g., Owen and Rabinovitch (1983), or (ii) rely on higher-order empirical moments, estimated with or without assuming a linear structure, through a Taylor-series expansion of the expected utility function, e.g., Martellini and Ziemann (2010).

    ⁴ Further details are provided in the Appendix.

    In this paper, we develop an intermediate method which is neither fully parametric like (i) nor completely non-parametric like (ii) and is referred to as semi-parametric. We impose a factor structure which goes back to Sharpe (1963), who assumes a single-factor model for a vector of returns as follows:

    $$r_{it} = \alpha + \beta_i F_t + \epsilon_{it} \tag{5}$$

    where $r_{it}$, $\epsilon_{it}$, and $\beta_i$ are the return, the idiosyncratic noise, and the factor loading of asset i, respectively, and $F_t$ is the factor return. The regression residuals are assumed to be homoscedastic and cross-sectionally uncorrelated.
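    To make the structure imposed by (5) concrete, the Python sketch below computes the covariance matrix implied by a single-factor model and simulates returns under the Gaussian specification. It additionally assumes that the residuals are independent of the factor and that the factor has zero mean; these assumptions, the function names, and all numerical values are illustrative and do not come from the paper.

```python
import random

def factor_model_covariance(betas, factor_var, resid_vars):
    """Covariance implied by r_i = alpha + beta_i * F + eps_i.

    Off-diagonal entries come only from the common factor because the
    residuals are cross-sectionally uncorrelated.
    """
    n = len(betas)
    return [
        [betas[i] * betas[j] * factor_var + (resid_vars[i] if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

def simulate_factor_returns(alpha, betas, factor_sd, resid_sds, periods, seed=0):
    """Draw Gaussian factor and residual shocks and build returns per period."""
    rng = random.Random(seed)
    returns = []
    for _ in range(periods):
        factor = rng.gauss(0.0, factor_sd)
        returns.append([alpha + b * factor + rng.gauss(0.0, s)
                        for b, s in zip(betas, resid_sds)])
    return returns

betas = [0.8, 1.2, 1.0]                      # hypothetical factor loadings
cov = factor_model_covariance(betas, factor_var=0.04 ** 2,
                              resid_vars=[0.03 ** 2] * 3)
sample = simulate_factor_returns(0.005, betas, 0.04, [0.03] * 3,
                                 periods=20000, seed=1)
```

    The point of the structure is parsimony: for N assets the covariance matrix is described by N loadings, one factor variance, and N residual variances rather than N(N + 1)/2 free entries.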

    We model asset returns by imposing the factor structure in three ways. First, we make the

    classical assumption that the factor returns and the idiosyncratic noise follow the Gaussian

    distribution. This distributional assumption ignores skewness and excess kurtosis.

    Second, we improve the Gaussian distribution for the factor returns and the residuals

    by incorporating skewness and kurtosis non-parametrically through a new method based on

    L-moments. The normal distribution is, thus, regarded as an initial model, possibly not a

    very good one. If the factor returns and the residuals are symmetric and do not exhibit any

    excess kurtosis, then there is no improvement and the return distribution defaults to the

    Gaussian law.

    Finally, for the purpose of a robustness test we assume a subclass of the generalized hyperbolic distribution for the factor returns and the residuals. This represents a fully parametric assumption including the Gaussian law as a special case.⁵ In contrast, the semi-parametric method improves the initial model only through the information in the third and the fourth moments rather than the full distribution.

    ⁵ There is strong evidence in favor of the variance gamma and normal inverse Gaussian distributions; see, e.g., Carr et al. (2003). These two distributions are special cases of the generalized hyperbolic distribution. Detailed information about the generalized hyperbolic distribution can be found in Eberlein (2001).

  • 3 Semi-parametric construction

    Denote by X a random variable describing stock returns and by F the true distribution of X. Also, denote by G the distribution of a model we have selected for X, which may be quite different from the true distribution F. Since the true distribution is generally unknown, it is important to measure the impact of assuming an incorrect but simpler model, which comes in the form of an opportunity cost resulting from model risk. In portfolio construction, we can measure the out-of-sample utility loss of working with a simpler model.

    In this section, we introduce a methodology that has the capacity to assess the utility

    loss of using G compared to an improvement of G which is closer to F in some sense. Under

    certain conditions, the improvement of G converges to the true F and, thus, the starting

    model G becomes irrelevant at the limit. However, starting from a more relevant initial

    model leads to a better rate of convergence to the true but unknown model.

    Figure 1 illustrates the idea behind this construction. We make the assumption that the random variable X has finite variance, which is non-restrictive in portfolio construction. Because G is a distribution function, G(X) transforms the random variable X into V, which belongs to the set of random variables with support in [0, 1]. If G is the true distribution of X, then G(X) is uniformly distributed on the interval (0, 1). Next, by projecting V onto a sequence of increasing sets $L_n$, we construct a sequence $V_n$. Here $L_n$ represents the set of all random variables with support in [0, 1] such that all L-moments of order higher than n are equal to zero. Finally, one selected element of the sequence, $V_n$, is transformed back into the initial space, $X_n = G^{-1}(V_n)$, and becomes a better model for X. By increasing n one recovers V, and thus X through the inverse transformation, because asymptotically the sequence $L_n$ fills the entire set of random variables with support in [0, 1].

    Insert Figure 1 about here

    There are two important steps in the above methodology: (i) the transformation G, which determines V, and (ii) the projection of V onto $L_n$, which yields $V_n$. The former is done by the assumed base model G and the latter by the Legendre-Fourier series expansion of the quantile function of V, denoted by $Q_V$.


  • In summary, there are two important advantages of this methodology. Firstly, as pointed

    out earlier, it can improve a base model if there are features of the data not captured properly

    by the model. Secondly, the only way to solve problem (4) efficiently when the Taylor

    series technique is not applicable is through simulation. Because the method provides an

    approximate model by its quantile function which appears almost in closed form, it greatly

    facilitates the Monte Carlo method. The L-moments appear in a natural way because we

    choose to work with the quantile function.

    3.1 L-moments

    Introduced by Sillito (1951) and popularized later by Hosking (1990), L-moments describe

    geometric properties of the probability distribution of a random variable just like the ordinary

    moments. They are theoretically more appealing due to their relative robustness and, just

    like the ordinary moments, they are easy to interpret. Further on, any random variable

    with a finite variance can be characterized in terms of its L-moments. To the best of our

    knowledge, L-moments have not been extensively applied in the field of finance. Shalit and

    Yitzhaki (1984) explore the mean-Gini framework as an alternative to mean-variance; the

    Gini measure is related to the second L-moment.

    L-moments are defined as weighted averages of the quantile function in which the weights are determined by the shifted Legendre polynomials. More precisely, the L-moment of order r ≥ 1 of the random variable X is defined by

    $$\lambda_r = \int_0^1 Q(u)\, P^*_{r-1}(u)\, du$$

    where $Q(u) = \inf\{x : P(X \le x) \ge u\}$ is the inverse distribution function of X and

    $$P^*_j(u) = \sum_{k=0}^{j} p^*_{j,k}\, u^k$$

    in which

    $$p^*_{j,k} = (-1)^{j-k} \binom{j}{k} \binom{j+k}{k} = \frac{(-1)^{j-k}\,(j+k)!}{(k!)^2\,(j-k)!}.$$

    For a detailed discussion of L-moments, see Hosking (2005).
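    The definition can be checked on a case where everything is available in closed form. The Python sketch below builds the coefficients $p^*_{j,k}$ from the formula above and integrates exactly when the quantile function is a polynomial in u. The helper names and the example distribution (uniform on [2, 5], whose quantile function is Q(u) = 2 + 3u) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def shifted_legendre_coeffs(j):
    """Coefficients p*_{j,k}, k = 0..j, of the shifted Legendre polynomial P*_j."""
    return [(-1) ** (j - k) * comb(j, k) * comb(j + k, k) for k in range(j + 1)]

def l_moment_polynomial_quantile(q_coeffs, r):
    """Exact lambda_r when Q(u) = sum_a q_coeffs[a] * u**a on [0, 1]."""
    total = Fraction(0)
    for a, q in enumerate(q_coeffs):
        for k, p in enumerate(shifted_legendre_coeffs(r - 1)):
            # The integral of u**(a + k) over [0, 1] is 1 / (a + k + 1).
            total += Fraction(q) * p / (a + k + 1)
    return total

uniform_q = [2, 3]  # Q(u) = 2 + 3u, i.e. the uniform distribution on [2, 5]
uniform_l_moments = [l_moment_polynomial_quantile(uniform_q, r) for r in (1, 2, 3, 4)]
print(uniform_l_moments)
```

    For this example the first L-moment is the mean 7/2, the second is 1/2 (one sixth of the range), and the third and fourth vanish, as expected for a distribution whose quantile function is linear.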

    Finally, the sample L-moments are given by

    $$\hat\lambda_r = \sum_{k=0}^{r-1} p^*_{r-1,k}\, b_k, \tag{6}$$

    where

    $$b_k = \frac{1}{n} \sum_{j=k+1}^{n} \frac{(j-1)(j-2)\cdots(j-k)}{(n-1)(n-2)\cdots(n-k)}\, X_{j:n}$$

    and $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ is the sorted sample. They are unbiased estimators of $\lambda_r$; see Hosking (2005) for additional details.
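    A direct implementation of the sample estimator may help fix the indexing. The Python sketch below uses the convention in which the r-th sample L-moment combines $b_0, \dots, b_{r-1}$ with the coefficients of $P^*_{r-1}$, so that the first sample L-moment is the sample mean, matching the population definition of $\lambda_r$. The function name and the toy data are assumptions for illustration.

```python
from math import comb

def sample_l_moments(data, max_order):
    """Unbiased sample L-moments of orders 1..max_order.

    Requires len(data) >= max_order so that every weight is well defined.
    """
    x = sorted(data)
    n = len(x)
    b = []
    for k in range(max_order):
        total = 0.0
        for j in range(k + 1, n + 1):      # j is the 1-based rank
            weight = 1.0
            for m in range(1, k + 1):
                weight *= (j - m) / (n - m)
            total += weight * x[j - 1]
        b.append(total / n)
    moments = []
    for r in range(1, max_order + 1):
        coeffs = [(-1) ** (r - 1 - k) * comb(r - 1, k) * comb(r - 1 + k, k)
                  for k in range(r)]
        moments.append(sum(c * bk for c, bk in zip(coeffs, b)))
    return moments

toy_l_moments = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0], 3)
print(toy_l_moments)
```

    For the symmetric toy sample the first value is the mean, the second equals half the average absolute difference between two distinct observations, and the third is zero up to rounding.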

    Even though L-moments are different from the classical moments, they describe similar geometric properties of the return distribution. For example, $\lambda_1 = EX$, $\lambda_2$ equals one half of Gini's mean difference and is a measure of scale, and $\lambda_3$ and $\lambda_4$ are measures of skewness and kurtosis, respectively. In contrast to the classical moments, which appear naturally in portfolio theory from a Taylor expansion of expected utility, L-moments of a return distribution arise naturally from an expansion of its quantile function.

    3.2 The approximation and rate of convergence

    The shifted Legendre polynomials form an orthogonal basis in the space of square-integrable functions defined on the interval [0, 1], denoted here by $L^2(0, 1)$. The squared $L^2$-norms of the polynomials equal $\|P^*_{r-1}\|_2^2 = 1/(2r - 1)$, where r ≥ 1.⁶ The connection with the space of random variables with a finite second moment is that if $EX^2 < \infty$, then the quantile function $Q_X \in L^2(0, 1)$ and the L-moments appear in the Legendre-Fourier series expansion of $Q_X$,

    $$Q_X(u) = \sum_{k=1}^{\infty} a_k\, P^*_{k-1}(u), \tag{7}$$

    where $a_k = (2k - 1)\lambda_k$.

    An approximation of the portfolio return quantile function through the first four terms

    in (7) incorporates its skewness and kurtosis as measured by the third and the fourth L-

    moment. Even though this is not a direct decomposition of the objective function in (4),

    it is straightforward to generate simulations, evaluate the corresponding expectations in

    equation (3), and solve the optimization problem numerically.

    Such a naive attempt to approximate the portfolio return quantile function through (7) has two problems. First, simply truncating the right-hand side of (7) may not lead to a proper quantile function. Second, even if it does, the approximation quality may be bad.⁷

    We resolve the two problems in the following way. Instead of working directly with the portfolio returns, we start from a given initial model G which is then improved by including L-moments of the transformed portfolio returns. More precisely, we use (7) to approximate the quantile function $Q_V(u)$ of the random variable V = G(X) in Figure 1. We demonstrate that the approximation quality in the center of the portfolio return distribution improves quickly while the tails are essentially determined by G. This resolves the second issue. The first problem is resolved by a projection method which is developed in this section.

    ⁶ See, for example, Lebedev (1972).

    ⁷ In fact, the support of a random variable arising in this way is bounded.

    We consider a non-atomic probability space $(\Omega, \mathcal{A}, P)$ and the space $\mathcal{X} = L^2(\Omega, \mathcal{A}, P)$ of random variables $X : \Omega \to \mathbb{R}$ having a finite second moment. Denote the set of square-integrable random variables with zero L-moments of order higher than n by $L_n$,

    $$L_n = \{X \in \mathcal{X} : \lambda_k(X) = 0,\ k > n\}. \tag{8}$$

    The following properties hold.⁸

    Proposition 1. If $L_n$ is defined as in (8), then

    $$L_1 \subset L_2 \subset \dots \subset L_\infty = \mathcal{X}.$$

    Furthermore, $L_1$ is the set of all constants and $L_2$ is the set of all uniform distributions on [a, b]. Finally, the quantile function of $Y \in L_n$, n ≥ 1, is given by

    $$Q_Y(u) = \sum_{i=0}^{n-1} y_{i+1} \binom{n-1}{i} u^i (1-u)^{n-1-i} \tag{9}$$

    where $y_1 \le y_2 \le \dots \le y_n$ and the support of Y is the interval $[y_1, y_n]$.
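    Because (9) gives the quantile function explicitly, drawing from a member of $L_n$ only requires evaluating a polynomial at uniform random numbers. The Python sketch below evaluates (9) and samples by inverse transform; the coefficient vector, the grid, and the function names are hypothetical choices made for illustration.

```python
import random
from math import comb

def bernstein_quantile(y, u):
    """Quantile function (9) for ordered coefficients y = (y_1, ..., y_n)."""
    n = len(y)
    return sum(y[i] * comb(n - 1, i) * u ** i * (1.0 - u) ** (n - 1 - i)
               for i in range(n))

def sample_from_quantile(y, size, seed=0):
    """Inverse-transform sampling: evaluate the quantile function at uniforms."""
    rng = random.Random(seed)
    return [bernstein_quantile(y, rng.random()) for _ in range(size)]

y = [-0.10, -0.02, 0.01, 0.03, 0.12]   # hypothetical ordered coefficients, n = 5
grid = [i / 100 for i in range(101)]
values = [bernstein_quantile(y, u) for u in grid]
draws = sample_from_quantile(y, 1000, seed=42)
```

    The ordering of the coefficients is what makes the polynomial non-decreasing, so it is a valid quantile function with end points $y_1$ and $y_n$. For n = 2 the expression reduces to the quantile function of a uniform distribution on $[y_1, y_2]$, in agreement with Proposition 1.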

    Denote the L-moments of the transformed random variable V by $\lambda_1, \lambda_2, \dots$ and define $a_k = (2k - 1)\lambda_k$, k ≥ 1. The quantile function of the approximation $V_n$ is given by

    $$Q_{V_n}(u) = \sum_{i=0}^{n-1} y^*_{i+1} \binom{n-1}{i} u^i (1-u)^{n-1-i} \tag{10}$$

    where

    $$y^* = \arg\min_y \|a - By\|_1 \quad \text{s.t.} \quad y_i \le y_j,\ i \le j,$$

    in which $\|x - z\|_1 = \sum_i |x_i - z_i|$, $a = (a_1, \dots, a_{n-1})$, $y = (y_1, \dots, y_{n-1})$ with the additional constraint that $y_0 = 0$ and $y_{n-1} = 1$, and $B = \{\beta_{i,j}\}_{i,j=1}^{n}$ is an n × n matrix with elements

    $$\beta_{i,j} = (2i - 1) \int_0^1 \binom{n-1}{j-1} u^{j-1} (1-u)^{n-j} P^*_{i-1}(u)\, du. \tag{11}$$

    ⁸ Proofs are provided in the Appendix.

  • A rate of convergence result and an explicit formula for $\beta_{i,j}$ are provided in the following theorem. The convergence rate is stated in terms of the uniform distance between the unknown distribution function $F_X$ and that of the approximation, $F_{X_n}$.

    Theorem 1. Suppose $X \in \mathcal{X}$ with density $f_X = F'_X$ and assume a parametric hypothesis for X defined by a cdf G with a density g = G′. Set V = G(X) and assume that $F_V = F_X \circ G^{-1}$ has absolutely continuous derivatives $F'_V, \dots, F^{(k)}_V$. If $Q_{V_n}$ is constructed as in (10) and $X_n = G^{-1}(V_n)$, then

    $$\sup_x |F_X(x) - F_{X_n}(x)| \le C_1 \left( \frac{C\, \omega_k\!\left(\frac{1}{n}\right) \sum_{m=1}^{n} \sqrt{2m-1}}{n^k} + \frac{C_{k+1}}{k \left(n - \frac{3}{2}\right) \cdots \left(n - \frac{2k+1}{2}\right)} \sqrt{\frac{\pi}{2(n-k-2)}} \right) \tag{12}$$

    where $C_1 = 1 + \sup_x \frac{f_X(x)}{g(x)}$, $C_{k+1} = 2 \int_0^1 \frac{|F^{(k+1)}_V(u)|}{\sqrt{u(1-u)}}\, du$, which is assumed finite, C is an absolute constant, and $\omega_k(x)$ is the modulus of continuity of $F^{(k)}_V$. The matrix elements of B in (10) are provided by the expression

    $$\beta_{i,j} = (2i - 1) \binom{n-1}{j-1} \sum_{l=0}^{i-1} \frac{(-1)^{i-1+l}}{j+l}\, \frac{\binom{i-1}{l}^2}{\binom{n+i-1}{j+l}}.$$

    In practice, the approximation is constructed by matching the first n sample L-moments in (6), and the optimization problem in (10) is solved as a linear program. The interpretation of (12) is that, depending on the smoothness of $F_V$, the approximation quality improves quickly at the center.
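    The closed-form expression for the elements of B can be cross-checked against the integral definition (11) in exact rational arithmetic. The Python sketch below does this for i, j = 1, …, n, using the monomial coefficients of the shifted Legendre polynomials and the Beta integral. The function names and the choice n = 5 are assumptions for illustration, and the sketch covers only the construction of B, not the linear program.

```python
from fractions import Fraction
from math import comb, factorial

def beta_closed_form(i, j, n):
    """Closed-form beta_{i,j} from Theorem 1, for 1 <= i, j <= n."""
    m = i - 1
    total = Fraction(0)
    for l in range(m + 1):
        total += Fraction((-1) ** (m + l) * comb(m, l) ** 2,
                          (j + l) * comb(n + m, j + l))
    return (2 * i - 1) * comb(n - 1, j - 1) * total

def beta_by_integration(i, j, n):
    """beta_{i,j} from (11), integrating each monomial of P*_{i-1} exactly."""
    m = i - 1
    total = Fraction(0)
    for k in range(m + 1):
        p = (-1) ** (m - k) * comb(m, k) * comb(m + k, k)
        # Integral of u**(j-1+k) * (1-u)**(n-j) over [0, 1] (a Beta function).
        total += p * Fraction(factorial(j + k - 1) * factorial(n - j),
                              factorial(n + k))
    return (2 * i - 1) * comb(n - 1, j - 1) * total

n = 5
B = [[beta_closed_form(i, j, n) for j in range(1, n + 1)]
     for i in range(1, n + 1)]
```

    The first row of B is constant and equal to 1/n, which reflects the fact that the mean of a random variable with quantile function (9) is the average of its coefficients.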

    3.3 On the role of the parametric assumption

    By design, as n → ∞ the L-moments of $V_n$ match the L-moments of V and therefore $V_n$ converges in distribution to V by (7), regardless of the choice of G. By the continuous mapping theorem, $X_n$ then converges in distribution to X regardless of G. Since G is only our “initial guess” for $F_X$, what is the price of a bad parametric hypothesis?

    As an extreme example, consider the case in which we completely avoid specifying G by

    projecting X directly on Ln. Then, by Proposition 1 any approximation based on a finite

    number of L-moments is a bounded random variable which implies that the tails of X are

    very poorly approximated.

    The result in (12) indicates that G has an impact on the rate of convergence. As another extreme case, assume G is chosen very well and is very “close” to $F_X$. In fact, we assume that the cdf $F_V$ of V = G(X) is “close” to a uniform distribution in the following sense: $F'_V(u) < 1 + \epsilon$ and $|F''_V(u)| < \epsilon$. Then, taking k = 1, the right-hand side of (12) can be simplified using

    $$C_1 \le 2 + \epsilon, \qquad C_{k+1} = C_2 \le 2\pi\epsilon, \qquad \omega_1(1/n) \le \epsilon/n,$$

    and we obtain

    $$\sup_x |F_X(x) - F_{X_n}(x)| \le \epsilon(2 + \epsilon) \left( \frac{C \sum_{m=1}^{n} \sqrt{2m-1}}{n^2} + \frac{\pi\sqrt{2\pi}}{\left(n - \frac{3}{2}\right) \sqrt{n-3}} \right).$$

    Furthermore, if higher-order derivatives can be bounded by ε, then the rate of convergence increases.

    3.4 An example

    We provide an empirical example which illustrates the quality of the approximation. We take the daily returns of the S&P 500 index in the two-year period 2015-2016 and calculate the approximations for different n, with G a fitted Gaussian distribution.

    The results are provided in Figure 2. The upper panel shows a histogram with the

    data together with the fitted parametric assumptions and the approximations for n equal to

    three, four, and five. The case n = 2 is not of interest because it reproduces the parametric

    hypothesis.

    The lower panel shows box-plots of 50 bootstrapped values of the Kolmogorov-Smirnov

    (KS) distance between the data and the corresponding approximation for n = 2, . . . , 10. The

    upper panel illustrates the improvement visually.

    Insert Figure 2 about here

    Clearly, increasing n leads to a significant improvement in the KS distance in both cases

    and the improvement is more dramatic when the selected model is the Gaussian, which comes

    as no surprise given that daily stock returns are known to be leptokurtic.
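The bootstrapped KS distance for the Gaussian base model (the case n = 2) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic heavy-tailed draws rather than S&P500 returns, the Gaussian is refitted on every resample (the text does not say whether this is done), and the function names `normal_cdf`, `ks_distance`, and `bootstrap_ks` are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """Gaussian cdf evaluated elementwise via the error function."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def ks_distance(sample, cdf):
    """sup_x |F_empirical(x) - cdf(x)| for a one-dimensional sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)
    upper = np.arange(1, n + 1) / n - f  # empirical cdf just after each jump
    lower = f - np.arange(0, n) / n      # empirical cdf just before each jump
    return float(max(upper.max(), lower.max()))

def bootstrap_ks(sample, n_boot=50, seed=0):
    """KS distance to a refitted Gaussian on n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    out = np.empty(n_boot)
    for b in range(n_boot):
        res = rng.choice(sample, size=sample.size, replace=True)
        mu, sigma = res.mean(), res.std(ddof=1)  # assumed: refit per resample
        out[b] = ks_distance(res, lambda x: normal_cdf(x, mu, sigma))
    return out

# Synthetic leptokurtic "returns" standing in for the index data.
returns = 0.01 * np.random.default_rng(1).standard_t(df=3, size=500)
distances = bootstrap_ks(returns)
```

Replacing `normal_cdf` with the cdf of a higher-order approximation would give the corresponding box-plot input for n > 2.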

    4 Methodology

    The empirical testing of the out-of-sample performance for the L-moment approximation and

    GDA is mostly based on the empirical methodology adopted by Jagannathan and Ma (2003)

    and Martellini and Ziemann (2010).

    Our methodology is in line with Martellini and Ziemann (2010) and deviates from

    Jagannathan and Ma (2003) in two important aspects. First, we extract one hundred ran-

    domly chosen baskets of stocks of dimension 10, 20, and 30 stocks from the CRSP database,

    rather than considering a single out-of-sample portfolio. Second, instead of resampling every

    year, we hold constant the universe of stocks for each portfolio. This procedure has several

    advantages. Firstly, we are able to measure the stability of each of the constructed portfolios

    based on distribution assumptions for factors and errors. This is in line with

    Martellini and Ziemann (2010), in which the estimation quality of the underlying estimators

    is assessed by fixing the universe of each constructed portfolio. Secondly, this procedure

    allows us to check the robustness of the constructed portfolios for each randomly chosen

    basket of stocks.

    We collect monthly returns on common stocks listed on the NYSE, NYSE Amex Equities

    (formerly known as the AMEX), and NASDAQ stock exchanges, with a sample period ranging

    from January 1975 until December 2017. We discard the penny stocks from the sample.⁹ We

    obtain a valid series for 306 stocks from the CRSP database. We build optimal portfolios

    at the end of April each year estimating model parameters using the previous 60 monthly

    returns. As in Jagannathan and Ma (2003) and Martellini and Ziemann (2010), the optimized

    portfolios are held throughout the next 12 months until a new allocation decision takes place.

    As a result, we have 38 years of monthly out-of-sample data.¹⁰
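The sampling and rebalancing design described above can be sketched as follows. This is an illustration only: stock identifiers are replaced by integer indices 0..305, and the first allocation is assumed to take place at the first April for which 60 prior monthly returns are available. The function names are hypothetical.

```python
import numpy as np

def draw_baskets(n_stocks=306, sizes=(10, 20, 30), n_baskets=100, seed=0):
    """Draw random baskets (without replacement) for each basket size."""
    rng = np.random.default_rng(seed)
    return {
        size: [np.sort(rng.choice(n_stocks, size=size, replace=False))
               for _ in range(n_baskets)]
        for size in sizes
    }

def rebalancing_schedule(first_year=1975, last_year=2017, window=60, hold=12):
    """List of (estimation_months, holding_months) pairs of (year, month) tuples."""
    months = [(y, m) for y in range(first_year, last_year + 1)
              for m in range(1, 13)]
    schedule = []
    for i, (_, m) in enumerate(months):
        # Rebalance at the end of April once a full estimation window exists.
        if m != 4 or i + 1 < window:
            continue
        estimation = months[i + 1 - window:i + 1]  # previous 60 months incl. April
        holding = months[i + 1:i + 1 + hold]       # next 12 months (fewer at the end)
        if holding:
            schedule.append((estimation, holding))
    return schedule

baskets = draw_baskets()
schedule = rebalancing_schedule()
```

Each basket keeps its stock universe fixed across all rebalancing dates, so the same `schedule` is reused for every entry of `baskets`.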

    Our main objective is to investigate the impact of incorporating information about higher-

    order moments in the portfolio construction problem for investors with GDA preferences.

    To this end, the rebalancing problem is solved under four different assumptions. Firstly,

    we assume the factor structure (5) for the returns of stocks in which the distribution of the

    factor returns and residuals can be semi-parametric (i.e. approximated through L-moments

    with a Gaussian base model) or parametric (i.e. Gaussian or normal inverse Gaussian).

    After estimation of the factor and residuals, a sample of 10,000 scenarios is generated for

    ⁹ Penny stocks are stocks with a listed price smaller than $5.

    ¹⁰ After the portfolio is rebalanced at the end of April of a given year, we maintain a fixed-mix strategy on a monthly basis until the portfolio is rebalanced again in April of the following year. In contrast, Martellini and Ziemann (2010) employ buy-and-hold. A fixed-mix strategy maintains the optimal allocation until the next rebalancing while a buy-and-hold leads to overweighting of stocks with rising prices. Regardless, as long as one and the same method is followed for all portfolios and models, the out-of-sample results are comparable.

    the stock returns through (5).¹¹ Based on the generated scenarios, the optimal weights are

    obtained by solving the optimization problem in (4) in which the mathematical expectation

    is computed through the Monte Carlo method. In this way, the pure Gaussian case is nested

    in both the semi-parametric case and the NIG case within the factor model structure. That

    is, if the factor returns and the residuals happen to be Gaussian, then there should be no

    incentive out-of-sample to employ a more complicated model. We consider the NIG case as

    a robustness check; that is, we test if the conclusions that we reach with the semi-parametric

    model are also valid under a complete parametric specification which includes the Normal

    distribution as a special case.
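The scenario-generation step can be sketched as follows for the pure Gaussian base case. The sketch assumes that the factor structure (5) has the usual single-factor form r_i = α_i + β_i f + ε_i with independent residuals, and it uses power utility (the θ = 0 case) as a stand-in for the GDA objective in (4). The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_single_factor(alpha, beta, factor_sd, resid_sd,
                           n_scen=10_000, seed=0):
    """Simulate stock returns r = alpha + beta * f + eps (assumed form of (5))."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    resid_sd = np.asarray(resid_sd, dtype=float)
    f = rng.normal(0.0, factor_sd, size=(n_scen, 1))               # factor scenarios
    eps = rng.normal(0.0, 1.0, size=(n_scen, beta.size)) * resid_sd  # residuals
    return alpha + f * beta + eps

def mc_expected_power_utility(weights, scenarios, gamma):
    """Monte Carlo estimate of E[U(1 + w'r)] with U(x) = x^(1-gamma)/(1-gamma)."""
    wealth = 1.0 + scenarios @ np.asarray(weights, dtype=float)
    if gamma == 1:
        return float(np.mean(np.log(wealth)))
    return float(np.mean(wealth ** (1.0 - gamma) / (1.0 - gamma)))

beta = np.array([0.8, 1.0, 1.2])
resid_sd = np.array([0.05, 0.06, 0.07])
scenarios = simulate_single_factor(np.zeros(3), beta, 0.04, resid_sd)
value = mc_expected_power_utility(np.full(3, 1.0 / 3.0), scenarios, gamma=10)
```

In the semi-parametric or NIG case only the two `rng.normal` draws would change; the Monte Carlo average over the scenarios stays the same.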

    Secondly, we treat the multivariate normal as the data generating process for the distribution

    of the stock returns directly without assuming a factor structure for the covariance

    matrix. Then, the objective function in (4) is optimized based on 10,000 simulated returns

    from a multivariate normal distribution with a covariance matrix obtained through the sample estimator.

    Thirdly, the optimal weights for rebalancing are obtained from the global minimum variance

    (GMV) problem using again the sample estimator for the covariance matrix.¹²
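For reference, the textbook closed-form GMV weights for a fully invested portfolio without further constraints are w = Σ⁻¹1 / (1′Σ⁻¹1). The sketch below assumes this unconstrained form; the text does not state whether additional constraints (for example, no short sales) are imposed, and the function name is hypothetical.

```python
import numpy as np

def gmv_weights(cov):
    """Unconstrained, fully invested global minimum variance weights."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # Sigma^{-1} 1 without forming the inverse
    return x / x.sum()              # normalize so the weights sum to one

# Two uncorrelated assets with variances 1 and 4.
weights = gmv_weights(np.diag([1.0, 4.0]))
```

With uncorrelated assets the weights are proportional to the inverse variances, which gives a quick sanity check on the implementation.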

    Finally, in order to compare the empirical performance of the L-moments with the classical

    higher-order moment portfolio model as in Martellini and Ziemann (2010), the portfolio

    is rebalanced such that the investor’s optimization problem is approximated based on the first

    four moments. In this case, the investor’s preferences follow the standard power utility function

    approximated through the Taylor series using the first four moments – mean, variance,

    skewness, and kurtosis. This case is equivalent to the objective function of Martellini and

    Ziemann (2010).¹³
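As a reminder of what such an approximation looks like, the generic fourth-order Taylor expansion of expected utility around mean terminal wealth, specialized to power utility, is given below. The symbols W, \bar{W}, and \mu_k are notation assumed here for illustration; the exact arrangement of equation (10) in Martellini and Ziemann (2010) may differ.

```latex
\begin{align*}
E[U(W)] &\approx U(\bar{W}) + \frac{U''(\bar{W})}{2}\,\mu_2
          + \frac{U'''(\bar{W})}{6}\,\mu_3
          + \frac{U''''(\bar{W})}{24}\,\mu_4,
\qquad \mu_k = E\big[(W-\bar{W})^k\big], \\
U(W) &= \frac{W^{1-\gamma}}{1-\gamma}: \quad
U''(\bar{W}) = -\gamma\,\bar{W}^{-\gamma-1}, \quad
U'''(\bar{W}) = \gamma(\gamma+1)\,\bar{W}^{-\gamma-2}, \quad
U''''(\bar{W}) = -\gamma(\gamma+1)(\gamma+2)\,\bar{W}^{-\gamma-3}.
\end{align*}
```

The first-order term vanishes because E[W - \bar{W}] = 0, so the approximation depends on the distribution only through its variance, third, and fourth central moments.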

    The semi-parametric construction requires choosing the L-moment order n. Theorem 1

    indicates that the approximation improves with n and one should thus consider an order as

    high as possible. On the other hand, increasing n leads to more parameters and therefore

    higher estimation risk. The choice of n has to be predicated on the right balance between

    ¹¹ A separate simulation study indicates that the estimation noise in the optimal solution when working with about 500 observations for parameter estimation is about one order of magnitude higher than the Monte Carlo noise in the optimal solution caused by approximating the expectations in the objective function with 10,000 scenarios. As a consequence, the Monte Carlo noise when working with 10,000 scenarios can be safely ignored.

    ¹² The covariance matrix for the GMV problem is the same as under the multivariate normal assumption.

    ¹³ In particular, equation (10) of Martellini and Ziemann (2010).

    increasing accuracy and keeping estimation risk at a minimum. The example in Figure 2

    implies that n = 4 is the lowest order at which we have the highest marginal improvement

    in the KS statistic.¹⁴

    To check whether this choice of n provides a better fit in-sample compared to the Gaussian

    base model for our data set, at each rebalancing point we estimated the KS distance between

    each stock and three models – the Gaussian, the semi-parametric with n = 4, and NIG with

    parameters estimated by maximum likelihood. We use the previous 60 months of stock

    returns in this calculation as well as in the portfolio construction problem. Because we have

    306 KS distances per model per rebalancing, we aggregate the results by computing the

    average distance at each rebalancing point per model and also the average distance across

    time for each stock per model.

    Insert Figure 3 about here

    Figure 3 provides a visual representation of the averages. The plot on the left shows that

    whenever a rebalancing decision is to be made, the average fit of the semi-parametric model

    as measured by the KS distance is better than that of the Gaussian and is slightly worse

    than the average quality of the NIG fit. The plot on the right implies that if we average

    the quality of the fit for each stock across time, we obtain a better distribution of the KS

    distances with the semi-parametric and NIG. As a result, there is a strong indication for an

    improved in-sample fit if the model can account for the skewness and kurtosis in the data,

    which is not surprising.

    It has been well understood that the sample mean is a poor estimator of the population

    mean, e.g. Jorion (1985), and estimation of the portfolio return is a challenging task. Hence,

    we follow Jagannathan and Ma (2003) and Martellini and Ziemann (2010) and neutralize

    ¹⁴ One difficulty with the example in Figure 2 is that the true data generating process is unknown and, thus, the distance between the approximation and the true model is also unknown; the Kolmogorov-Smirnov statistic measures the distance between the model and the sample. However, we performed simulation studies not presented in the paper but available upon request in which the true model is known (Student t with five degrees of freedom and NIG with realistic parameter values) and we found that n = 4 leads to the best improvement in the distance between the approximation and the true model for several measures of similarity – the uniform distance, the Anderson-Darling measure, and the Kullback-Leibler divergence. Also, the model order n = 4 leads to virtually the same approximation benefits regardless of whether the parameters in (1) are estimated through the projection method as in (10) or maximum likelihood. As a result, we choose to work with the projection method because of its computational simplicity.

    the impact of the expected return by centering the generated scenarios.

    In total, there are four different cases for portfolio construction and each case includes

    multiple modeling assumptions. In order to analyze the out-of-sample performance of the

    constructed portfolios, we compare the certainty equivalents for an investment under each

    of these models. In this context, a model is assumed to be a combination of factor structure

    (or no such structure) and some distributional assumption (through a semi-parametric or

    a parametric hypothesis). To carry out a comparison, we follow Ang and Bekaert (2002)

    and compute the economic loss or benefit that results from using a specific model instead

    of another one. Since we directly maximize the certainty equivalent for each model, it is

    straightforward to adopt the method of Ang and Bekaert (2002).

    We find the amount of wealth w̄ required to compensate an investor for using model i

    instead of model j for a T -period horizon for a given basket of stocks k in the following way.

    Denote the certainty equivalent of model i for basket k by $\mu^{(i,k)}_{0,T}$. Then, $\bar{w}^{(k)}$ is given by

    $$\bar{w}^{(k)} = \frac{\mu^{(i,k)}_{0,T}}{\mu^{(j,k)}_{0,T}}. \qquad (13)$$

    Therefore, the compensation in cents per dollar of wealth for each basket k is given by

    $w^{(k)} = 100 \times \left((\bar{w}^{(k)})^{12} - 1\right)$, which is the annual required payment. We refer to $w^{(k)}$ as

    monetary utility gain and abbreviate it as MUG. The collection of all MUGs across all baskets

    is the MUG distribution. Using the certainty equivalent for comparing the performance of

    asset allocation under different scenarios is along the lines of Campbell and Viceira (1999),

    Campbell and Viceira (2001), and Kandel and Stambaugh (1996).
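The MUG calculation in (13) and the annualization step can be sketched numerically as follows. The sketch assumes that the certainty equivalents are monthly gross quantities (values near one) supplied per basket; the function name and the example inputs are hypothetical and do not come from the paper's data.

```python
import numpy as np

def mug(ce_i, ce_j, periods_per_year=12):
    """Annualized monetary utility gain in cents per dollar, following (13)."""
    w_bar = np.asarray(ce_i, dtype=float) / np.asarray(ce_j, dtype=float)
    return 100.0 * (w_bar ** periods_per_year - 1.0)

# Hypothetical certainty equivalents for three baskets under models i and j.
ce_model_i = np.array([1.0040, 1.0035, 1.0050])
ce_model_j = np.array([1.0030, 1.0036, 1.0040])
mug_distribution = mug(ce_model_i, ce_model_j)
average_mug = float(mug_distribution.mean())
```

A positive entry means model i delivers the higher certainty equivalent for that basket; summary statistics of `mug_distribution` correspond to the columns reported in the tables.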

    5 Empirical analysis

    In this section, our main objective is to investigate empirically the economic value of

    incorporating information about skewness and kurtosis of stock returns for investors with

    GDA preferences. Because the traditional Taylor series method is not feasible, we use the

    semi-parametric method as our main tool. As a robustness check, we also use the NIG

    distribution which can take into account skewness and kurtosis but is, in contrast, a fully

    specified parametric model.

    As a first step, we establish a common benchmark for the MUG calculation in (13) for

    the entire parameter space of GDA preferences. The case θ = 0 which corresponds to the

    classical power utility is easy because the standard global minimum variance (GMV) is the

    second-order approximation and can be used as a benchmark. The general case is, however,

    non-trivial because a second-order approximation cannot be derived in the same way.

    To resolve this problem, we calculate (13) with model i corresponding to the multivariate

    normal distribution for the stock returns and model j corresponding to the GMV portfolio

    computed with the same covariance matrix through the sample estimator. These two models

    are equivalent for power utility at any risk-aversion level γ but it is unclear if the same

    statement holds for the more general GDA case.

    Table 1 provides the corresponding MUG statistics. Panel A of Table 1 shows the

    MUG statistics of an investor with power utility preferences. Hence, the multivariate

    normal (MV) is essentially the minimum variance problem. Panel B addresses the MUG

    statistics for GDA preferences. The data in Table 1 suggest that the difference in out-of-sample

    performance between MV and GMV is not significant. For example, the mean of MUGs for an investor

    with power utility (GDA) preferences ranges from −0.010% (−0.026%) to −0.003% (−0.009%)

    per annum when using MV instead of GMV. As a consequence, we adopt the GMV portfolio

    as a benchmark for the entire parameter space of GDA preferences. Therefore, any out-

    of-sample benefits of including information about higher-order moments or other modeling

    assumptions should translate into a positive average MUG relative to the GMV case for any

    choice of parameters.

    Insert Table 1 about here

    As a second step, we establish that the single-factor semi-parametric model has a similar

    performance to the single-factor higher-order moments model as in Martellini and Ziemann

    (2010) for a power utility investor. This justifies its application in assessing the value added

    of incorporating higher-order moments for GDA preferences where the classical higher-order

    moments approximation is not feasible. Thus, in order to carry out the comparison, we

    assume the factor structure in (5) for the stock returns in which the distribution of the

    factor returns and residuals is modeled through the semi-parametric technique. We refer to

    this case as SF semi-parametric. Furthermore, as a robustness check, we also use the normal

    inverse Gaussian (NIG) as a parametric distribution of the factor returns and residuals in

    (5) and we abbreviate this case by SF NIG.

    Table 2 shows the out-of-sample MUG statistics for the SF semi-parametric and SF

    NIG against the higher-order moments approximation when the sample estimator is used

    in the case of power utility with the risk aversion parameter γ = 10. A comparison of

    the mean MUGs reveals that by using SF semi-parametric (SF NIG) instead of the

    higher-order moments approximation, the average MUG ranges

    from 2.049% (1.678%) for 10 stocks to 5.245% (5.044%) for 30 stocks. According to Ang and Bekaert (2002) and

    Martellini and Ziemann (2010), who use the same level of risk aversion and investment

    horizon, these results are highly economically significant. Another inference from Table 2 is

    that by increasing the number of stocks in the portfolio, there are higher benefits in adopting

    the SF semi-parametric or SF NIG model. In fact, the results in Table 2 are very similar to the

    corresponding comparison in Table 4 of Martellini and Ziemann (2010) which indicates that

    the SF semi-parametric model works similarly to the single-factor higher-order moments

    model in Martellini and Ziemann (2010).

    Insert Table 2 about here

    We proceed with the main objective of the study which we carry out in two steps. Firstly,

    we calculate the MUGs of the SF semi-parametric model relative to GMV in which the

    covariance matrix is estimated with the sample estimator for a GDA investor with parameters

    θ = 1, 2, δ = 0.96, 0.98, 1, 1.02, 1.04, and γ = 1, 5, 10, 15. This set of parameters is considered

    also in Dahlquist et al. (2017) and Routledge and Zin (2010). Secondly, we calculate the

    MUGs of the SF semi-parametric model relative to a single-factor Normal model for the

    same parameter set.¹⁵ We also repeat the same steps for the SF NIG model to check the

    robustness of the conclusions.

    Figures 4 and 5 display the average monetary utility gain (MUG) of the SF semi-parametric

    and SF NIG models computed relative to the GMV portfolio for different choices

    of the GDA parameters. The average is calculated across 100 baskets of stocks of size 10, 20,

    and 30, respectively. The plots reveal that by increasing the level of risk aversion from γ = 1

    to γ = 15, the average MUG increases across all parameter combinations. This increase

    is strongly pronounced for 30 stocks, for which the average MUG rises from 3% (4%)

    to 10% (12%) when θ = 1 (θ = 2) for the SF semi-parametric model. The same conclusion holds

    for the SF NIG model. Increasing θ also leads to higher risk aversion and also improves the

    average MUGs.

    ¹⁵ The single-factor Normal model in the case of power utility is equivalent to GMV in which the covariance matrix is estimated through the same single-factor model.

    Insert Figures 4 and 5 about here

    Figure 6 provides the same comparison but limited to the power utility function which arises

    when θ = 0. Both plots confirm the same trends with respect to changes in γ, number of

    stocks, and also θ when compared to the corresponding cases in Figures 4 and 5.

    Insert Figure 6 about here

    5.1 Costs of ignoring skewness and kurtosis

    The empirical analysis confirms that there is significant economic value in working with a

    structural model capable of accounting for the skewness and kurtosis present in the data.

    There are two possible sources of value added – the structural assumption (single factor

    versus sample estimators) and the capability to recognize the presence of non-Gaussianity in the

    data. In this section, we identify the main source of value added by computing the MUGs of

    SF semi-parametric relative to the SF Normal model. In essence, we compare out-of-sample

    two investors with the same GDA preferences (the same set of input parameters) who both

    agree on the factor structure of the stock returns. However, one of them ignores the skewness

    and kurtosis in the data and behaves as if the multivariate return distribution is Gaussian.

    Figure 7 provides the corresponding average MUGs for the same combinations of param-

    eters. Clearly, the out-of-sample benefits of incorporating information about higher-order

    moments are from 0% to about 0.03% in the best case with no clear trends with respect to

    γ or θ. The economic significance of these numbers is marginal.

    The SF NIG case presented in Figure 8 is slightly different in that the average MUGs are

    negative from about -0.3% to about 0%. That is, the GDA investor who behaves as if the

    multivariate return distribution is Gaussian is better off by a small margin for any parameter

    combination.

    Our conclusions in the special case of power utility are consistent with those of Martellini

    and Ziemann (2010) and Das and Uppal (2004). In the more general case of GDA preferences,

    our conclusions indicate that the substantial in-sample cost of ignoring skewness reported in

    Dahlquist et al. (2017) does not materialize out-of-sample for a pure equity investor.

    6 Conclusion

    Generalized disappointment aversion (GDA) preferences suggested in Routledge and Zin

    (2010) generalize power expected utility by penalizing outcomes below a threshold related to

    the certainty equivalent. The model has been applied to portfolio theory in Dahlquist et al.

    (2017) where a fund separation theorem is derived in a single factor model setting with full

    parametric specifications for the factor returns and residuals. The authors conclude that for

    a GDA investor there is a high marginal utility loss of ignoring skewness in contrast to a

    power utility investor. That is, there is a significant marginal utility loss if a GDA investor

    incorrectly employs the Gaussian distribution to make decisions.

    An out-of-sample analysis of this question is non-trivial. The methodology developed

    by Martellini and Ziemann (2010) and Ang and Bekaert (2002) relies on a Taylor series

    approximation to study the effects of ignoring skewness and kurtosis or the marginal impact

    of imposing a structural hypothesis. This technique, however, is not applicable for GDA

    preferences.

    In this paper, we develop a novel approach which relies on a non-parametric improvement

    of a given base model, in this case – the Gaussian distribution. To assess the out-of-sample

    cost of ignoring skewness and kurtosis or a factor structure, we combine the new semi-parametric

    method with a single factor model and we compute the monetary utility gains

    (MUGs) relative to a global minimum variance (GMV) investor and to an investor making

    decisions through a single factor model with normally distributed factor returns and resid-

    uals. We find that the out-of-sample average MUGs of the single factor semi-parametric

    specification relative to GMV are economically significant for all parameter combinations.

    However, the most important source of value added is the structural assumption rather than

    higher-order moments.

    As a result, for a GDA investor the cost of ignoring skewness and kurtosis as measured

    by the out-of-sample average MUGs appears to be negligible, up to 0.03%, which is similar to

    the values reported by Martellini and Ziemann (2010) and Das and Uppal (2004) for power

    utility investors. For GDA investors, however, our results imply that the cost of ignoring skewness

    reported in Dahlquist et al. (2017) does not materialize out-of-sample.

    Table 1: Monetary utility gains statistics of multivariate normal optimal portfolios (mean-variance) relative to global minimum variance (GMV).

    Monetary utility gain (MUG) statistics of multivariate normal optimal portfolios computed with the Monte Carlo method relative to the global minimum variance portfolio with the same covariance matrix. At the end of April each year by using the prior 60 months, optimal portfolio weights are obtained for two assumptions: (M1) a multivariate normal distribution is the data generating process for the return of stocks, and; (M2) the GMV. These weights are then applied to form portfolios that are held until the next estimation (end of April of next year). The MUG is defined as the annual payment that an investor requires in order to be indifferent between a portfolio based upon the M1 and a portfolio based upon M2. The statistics are based on 100 random portfolios of 20 stocks with annual rebalancing. The power utility function corresponds to GDA utility with θ = 0. Std denotes the standard deviation and Med the median. Each portfolio consists of eligible stocks listed on the NYSE, the NYSE Amex Equities, and NASDAQ stock exchanges. The historical data are from January 1975 through December 2017. Panel A reports the statistics for different values of the risk-aversion parameter, i.e., γ, for the power utility function and Panel B includes results for different values of the loss threshold δ for GDA with γ = 10 and θ = 1.

                  Mean     Std      Min      5%       Median   95%      Max
    Panel A: Power utility
    γ = 1        -0.003    0.063   -0.182   -0.114   -0.004    0.104    0.147
    γ = 5        -0.006    0.068   -0.220   -0.127   -0.012    0.107    0.145
    γ = 10       -0.010    0.098   -0.259   -0.162   -0.012    0.132    0.433
    γ = 15       -0.009    0.213   -0.651   -0.261   -0.018    0.291    1.354
    Panel B: γ = 10, θ = 1
    δ = 0.96     -0.024    0.186   -0.518   -0.277   -0.030    0.231    0.945
    δ = 0.98     -0.026    0.183   -0.565   -0.275   -0.035    0.262    0.909
    δ = 1        -0.014    0.146   -0.289   -0.213   -0.030    0.215    0.680
    δ = 1.02     -0.016    0.113   -0.319   -0.177   -0.018    0.127    0.525
    δ = 1.04     -0.009    0.110   -0.286   -0.159   -0.013    0.146    0.555

    Table 2: Monetary utility gains statistics for standard higher moments (non-parametric), L-moments (semi-parametric), and normal inverse Gaussian (parametric, NIG).

    Monetary utility gain (MUG) statistics of semi-parametric with two L-moments (SF Normal) against global minimum variance (GMV). MUG statistics of semi-parametric with four L-moments (SF semi-parametric) against first four sample moments (non-parametric). MUG statistics of a parametric case, where the distribution of the factor and residuals follow normal inverse Gaussian (SF NIG) against first four sample moments (non-parametric). At the end of April each year by using the prior 60 months, optimal portfolio weights are obtained for each scenario. These weights are then applied to form portfolios that are held until the next estimation (end of April of next year). The MUG is defined as the annual payment that an investor requires in order to be indifferent between a portfolio based upon two scenarios. The statistics are based on 100 random portfolios of 10, 20, and 30 stocks with annual rebalancing. The power utility function corresponds to GDA utility with θ = 0. Std denotes the standard deviation and Med the median. Each portfolio consists of eligible stocks listed on the NYSE, the NYSE Amex Equities, and NASDAQ stock exchanges. The historical data are from January 1975 through December 2017. The investor preference follows a power utility where the risk aversion parameter γ = 10.

                          Mean     Std      Min       5%       Median   95%      Max
    Panel A: N = 10
    SF Semi-parametric    2.049    1.949    -1.398   -0.236    1.810    4.853   14.850
    SF NIG                1.678    2.369   -12.823   -0.611    1.621    4.386   13.446
    Panel B: N = 20
    SF Semi-parametric    3.347    1.639    -0.675    1.134    3.281    6.222    9.962
    SF NIG                3.196    1.670    -0.672    0.926    3.065    5.804    9.882
    Panel C: N = 30
    SF Semi-parametric    5.245    1.904     1.160    2.236    5.195    8.595   10.045
    SF NIG                5.044    2.077    -3.708    1.683    4.975    8.398    9.770

    [Figure 1 diagram: on one side, random variables with finite variance (X, Z = X2, X3, X4, ..., Xn); on the other, random variables with support in [0, 1] (V, U(0, 1), V3, V4, ..., Vn) inside the sets L3, L4, ..., Ln. G(X) maps X to V, the projections of V give V3, V4, ..., Vn, and G−1 maps U, V3, V4, ..., Vn back.]

    Figure 1: The distribution of X is approximated by the distribution G of Z, both X and Z are assumed to have finite variance. The assumed model G transforms X into V = G(X) with support in [0, 1] which is approximated through a projection Vn on the set Ln. The projection is then transformed back into the original space as an improved model for X, Xn = G−1(Vn). The notation U(0, 1) stands for the uniform distribution in [0, 1] which is the only member of the set L2.

    [Figure 2 panels: a histogram of the sample (horizontal axis from −0.04 to 0.04, vertical axis: Frequency) with legend entries Data, Semi-parametric n = 3, Semi-parametric n = 4, Semi-parametric n = 5, and Normal fit; and box-plots titled "Normal Distribution" with horizontal axis "Number of L-moments" and vertical axis "KS Distance".]

    Figure 2: A histogram of the S&P500 daily returns in 2014-2016 compared to the fitted G, Gaussian (left), and the corresponding approximations for different n and bootstrapped Kolmogorov-Smirnov distance (right) between the sample and the corresponding approximations for n = 2, . . . , 10.

    [Figure 3 panels: a time series from 1980 to 2020 with series Normal, Semi-parametric, and NIG (left); box-plots for Normal, Semi-parametric, and NIG (right).]

    Figure 3: The average Kolmogorov-Smirnov (KS) distance of all stocks at each rebalancing (left) and boxplots of the average Kolmogorov-Smirnov distance of all stocks in which the averaging is across time (right). The semi-parametric model uses four L-moments and the NIG parameters are estimated by maximum likelihood. The parameter estimation and the calculation of the KS distance is performed with the previous 60 monthly returns.

    [Figure 4 panels: 10, 20, and 30 stocks, each for θ = 1 and θ = 2; horizontal axis: δ from 0.96 to 1.04; vertical axis: Avg MUG relative to GMV (in %); legend: γ = 1, 5, 10, 15.]

    Figure 4: The average out-of-sample monetary utility gain (MUG) of the single factor semi-parametric model with n = 4 computed relative to the global minimum variance (GMV) portfolio for different choices of the generalized disappointment aversion (GDA) parameters. The average is calculated across 100 baskets of stocks of size 10, 20, and 30, respectively. γ is the risk aversion parameter, θ is the sensitivity to the disappointment aversion, and δ is the level of disappointment aversion.

    [Figure 5 panels: 10, 20, and 30 stocks, each for θ = 1 and θ = 2; horizontal axis: δ from 0.96 to 1.04; vertical axis: Avg MUG relative to GMV (in %); legend: γ = 1, 5, 10, 15.]

    Figure 5: The average out-of-sample monetary utility gain (MUG) of the single factor NIG model computed relative to the global minimum variance (GMV) portfolio for different choices of the generalized disappointment aversion (GDA) parameters. The average is calculated across 100 baskets of stocks of size 10, 20, and 30, respectively. γ is the risk aversion parameter, θ is the sensitivity to the disappointment aversion, and δ is the level of disappointment aversion.

    [Figure 6 panels: "Semi-parametric, θ = 0" and "NIG, θ = 0"; horizontal axis: Number of stocks (10, 20, 30); vertical axis: Avg MUG relative to GMV (in %); legend: γ = 1, 5, 10, 15.]

    Figure 6: The average out-of-sample monetary utility gain (MUG) of the single factor semi-parametric model (left) and single factor normal inverse Gaussian model (right) computed relative to global minimum variance (GMV) for power utility for different choices of the risk aversion. The average is calculated across 100 baskets of stocks of size 10, 20, and 30, respectively. γ is the risk aversion parameter and θ is the sensitivity to the disappointment aversion.


  • [Figure 7 panels: 10 stocks, θ = 1 and θ = 2; 20 stocks, θ = 1 and θ = 2 (horizontal axis: δ from 0.96 to 1.04; vertical axis: average MUG relative to SF Normal (in %), from -0.01 to 0.03); bottom panel, θ = 0 (horizontal axis: number of stocks, 10, 20, 30; vertical axis labeled average MUG relative to GMV (in %), from -0.005 to 0.025). One line per γ = 1, 5, 10, 15.]

    Figure 7: The average out-of-sample MUG of the single factor semi-parametric model with n = 4 computed relative to the single factor Normal model for different choices of the GDA parameters and the power utility case with θ = 0 (bottom). The average is calculated across 100 baskets of stocks of size 10 and 20, respectively.


  • [Figure 8 panels: 10 stocks, θ = 1 and θ = 2; 20 stocks, θ = 1 and θ = 2 (horizontal axis: δ from 0.96 to 1.04; vertical axis: average MUG relative to SF Normal (in %), from -0.35 to 0); bottom panel, θ = 0 (horizontal axis: number of stocks, 10, 20, 30; vertical axis labeled average MUG relative to GMV (in %), from -0.15 to 0). One line per γ = 1, 5, 10, 15.]

    Figure 8: The average out-of-sample MUG of the single factor NIG model computed relative to the single factor Normal model for different choices of the GDA parameters and the power utility case with θ = 0 (bottom). The average is calculated across 100 baskets of stocks of size 10 and 20, respectively.


  • A Technical Appendix

    A.1 GDA preferences and Taylor approximation

    A standard approach is to compute the Taylor expansion of the utility function $u(x)$ around the point $x = EX$ and then to integrate the expansion with respect to the distribution of $X$. Thus, on the left-hand side we get $Eu(X)$ and on the right-hand side we get a sum of terms of the type $u^{(k)}(EX)\,E(X - EX)^k / k!$. This approach is sensible only if $u(x)$ is sufficiently smooth. Otherwise, the integrated expansion may not coincide with $Eu(X)$.
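As an illustration of the smooth case, the following minimal sketch compares the truncated moment expansion with the exact expected utility for power utility $u(x) = x^{1-\gamma}/(1-\gamma)$. The three-point distribution, the value $\gamma = 5$, and all function names are assumptions chosen for the example; they are not taken from the paper.

```python
from math import factorial

def power_utility_derivative(x, gamma, k):
    """k-th derivative of u(x) = x**(1 - gamma) / (1 - gamma), for gamma != 1."""
    coef = 1.0
    for i in range(k):
        coef *= (1 - gamma - i)
    return coef * x ** (1 - gamma - k) / (1 - gamma)

def central_moment(values, probs, k):
    """E(X - EX)^k for a discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mean) ** k for x, p in zip(values, probs))

def moment_expansion(values, probs, gamma, order):
    """Sum of u^(k)(EX) * E(X - EX)^k / k! for k = 0..order."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(
        power_utility_derivative(mean, gamma, k)
        * central_moment(values, probs, k) / factorial(k)
        for k in range(order + 1)
    )

def exact_expected_utility(values, probs, gamma):
    return sum(p * x ** (1 - gamma) / (1 - gamma) for x, p in zip(values, probs))

# Hypothetical gross-return distribution, well inside the radius of convergence.
values, probs, gamma = [0.9, 1.0, 1.1], [0.25, 0.5, 0.25], 5.0
exact = exact_expected_utility(values, probs, gamma)
errors = {K: abs(moment_expansion(values, probs, gamma, K) - exact) for K in (2, 4, 6)}
```

In this example the absolute error shrinks as the truncation order increases, which is the behavior the standard approach relies on when $u$ is smooth.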

    The expression in (1) is harder to deal with in this fashion because the certainty equivalent is implicitly defined through the equation. However, one can attempt to approach the problem differently in the following way. The right-hand side of equation (1) can be restated as

    $$E v(W, m) = E\left( W^{1-\gamma} - \theta \left[ \left( \delta^{1-\gamma} m^{1-\gamma} - W^{1-\gamma} \right) I_{\{W \le \delta m\}} \right] \right),$$

    in which $v(x, m)$ is a function of $x$ and $m$. The certainty equivalent is then defined as $\mu : \mu^{1-\gamma} = E v(W, \mu)$. Similarly to the standard method, we can try to approximate $v(x, m)$ for any fixed $m$ through a Taylor expansion at $x = EW$. If for each fixed $m$ the series expansion of $v$ converges for all $x$ in the support of $W$, then the expansion can be integrated to obtain an approximation for $E v(W, m)$. Unfortunately, the indicator function makes the function $v$ non-smooth; therefore, the Taylor series expansion does not converge for all $x$ in the support of $W$.¹⁶ The series expansion, then, cannot be integrated to obtain an approximation for $E v(W, m)$.
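Because the certainty equivalent is defined only implicitly, it has to be found numerically. The sketch below solves $\mu^{1-\gamma} = E v(W, \mu)$ for a discrete distribution of $W$ by bisection, after dividing the equation by $1-\gamma$ so that the root-finding function is increasing in $m$. The bisection scheme, the example distribution, and the function names are assumptions made for illustration and are not the authors' procedure.

```python
from math import log

def utility(x, gamma):
    """Power utility x**(1 - gamma) / (1 - gamma); log utility at gamma == 1."""
    return log(x) if gamma == 1 else x ** (1 - gamma) / (1 - gamma)

def gda_gap(m, outcomes, probs, gamma, theta, delta):
    """u(m) - E u(W) + theta * E[(u(delta*m) - u(W)) * 1{W <= delta*m}].

    The gap is increasing in m, and its root is the certainty equivalent.
    """
    u_ref = utility(delta * m, gamma)
    expected_u = sum(p * utility(w, gamma) for w, p in zip(outcomes, probs))
    penalty = sum(p * (u_ref - utility(w, gamma))
                  for w, p in zip(outcomes, probs) if w <= delta * m)
    return utility(m, gamma) - expected_u + theta * penalty

def gda_certainty_equivalent(outcomes, probs, gamma, theta, delta, iters=200):
    lo = min(outcomes) / max(delta, 1.0)  # no outcome disappoints here, gap <= 0
    hi = max(outcomes)                    # gap >= 0 at the largest outcome
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gda_gap(mid, outcomes, probs, gamma, theta, delta) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical terminal wealth distribution.
outcomes, probs, gamma = [0.8, 1.0, 1.3], [0.2, 0.5, 0.3], 5.0
ce_power = gda_certainty_equivalent(outcomes, probs, gamma, theta=0.0, delta=1.0)
ce_gda = gda_certainty_equivalent(outcomes, probs, gamma, theta=2.0, delta=1.0)
```

With $\theta = 0$ the routine reproduces the power utility certainty equivalent, and with $\theta > 0$ the certainty equivalent is lower because outcomes below $\delta\mu$ are penalized.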

    ¹⁶ Unless the distribution of $W$ is such that $P(W \le \delta m) = 0$ or $P(W \le \delta m) = 1$. In such a case, however, GDA preferences are uninteresting because either there is no penalty at all (no outcomes disappoint) and we are back to power utility, or all outcomes are penalized.

    A.2 Proofs

    Proof of Proposition 1. The inclusion $L_m \subset L_n$, $m \le n$, is obvious from the definition. A sequence of L-moments uniquely identifies the quantile function by the Legendre-Fourier series expansion in (7) and completeness of $L^2([0, 1])$, and therefore $L_\infty = X$.

    To prove that (9) holds, first notice that (9) is a Bernstein polynomial with non-decreasing coefficients and therefore $Q_Y(u)$ is an increasing function in $[0, 1]$ (see, for example, Lorentz (1986)) and is, therefore, a quantile function of a r.v. Because it is a polynomial of order $n - 1$, the coefficients of order larger than $n$ in the Legendre-Fourier expansion in (7) are zeros.

    In the opposite direction, choose $Y \in L_n$. Its quantile function is then a polynomial of order $n - 1$ and can be expressed in a Bernstein form, which is (9). The set $L_1$ contains all r.v.s with constant quantile functions and $L_2$ contains all r.v.s with linear quantile functions. Finally, $Q_Y(0) = y_1$ and $Q_Y(1) = y_n$. □
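The monotonicity argument can be checked numerically. The sketch below evaluates a polynomial in Bernstein form, $Q_Y(u) = \sum_{j=1}^{n} y_j b^{n-1}_{j-1}(u)$, on a grid and confirms that non-decreasing coefficients give a non-decreasing function with endpoints $y_1$ and $y_n$. The coefficient vector and the function name are hypothetical.

```python
from math import comb

def bernstein_quantile(y, u):
    """Q_Y(u) = sum_j y_j * b^{n-1}_{j-1}(u) for coefficients y = (y_1, ..., y_n)."""
    n = len(y)
    return sum(y[j] * comb(n - 1, j) * u ** j * (1 - u) ** (n - 1 - j)
               for j in range(n))

y = [-1.5, -0.4, 0.0, 0.3, 2.0]            # non-decreasing coefficients (example)
grid = [i / 200 for i in range(201)]       # u = 0, 0.005, ..., 1
q = [bernstein_quantile(y, u) for u in grid]
is_non_decreasing = all(b >= a for a, b in zip(q, q[1:]))
```

A grid check of this kind is only a sanity test; the proof above is what guarantees monotonicity for every $u$ in $[0, 1]$.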

    Lemma 1. Suppose that $X$ has a density $f_X$ and that $G$ has a density $g$. Then,

    $$\sup_{x \in \mathbb{R}} |F_X(x) - F_{X_n}(x)| \le \left( 1 + \sup_x \frac{f_X(x)}{g(x)} \right) \sup_{0 \le u \le 1} |Q_V(u) - Q_{V_n}(u)|, \tag{14}$$

    where $V = G(X)$ and $V_n = G(X_n)$.

    Proof. Suppose $Y$ and $Z$ are r.v.s and $Y$ has a density $F'_Y$. We take advantage of the following inequality,

    $$\rho(Y, Z) \le \left( 1 + \sup_x F'_Y(x) \right) L(Y, Z), \tag{15}$$

    in which $\rho(Y, Z) = \sup_{x \in \mathbb{R}} |F_Y(x) - F_Z(x)|$ denotes the Kolmogorov metric and $L(Y, Z)$ denotes the Lévy metric with the following definition in terms of the inverse cdfs,

    $$L(Y, Z) = \inf\{ \epsilon > 0 : Q_Y(t - \epsilon) - \epsilon \le Q_Z(t),\ Q_Z(t - \epsilon) - \epsilon \le Q_Y(t),\ \forall\, \epsilon \le t \le 1 \}; \tag{16}$$

    see Zolotarev (1997, Chapter 1.5) and Rachev et al. (2013, Chapter 4.2). From (16) it follows that

    $$Q_Y(t - \epsilon) - Q_Z(t - \epsilon) \ge \epsilon \quad \text{and} \quad Q_Z(t - \epsilon) - Q_Y(t - \epsilon) \ge \epsilon, \quad \forall\, \epsilon \le t \le 1,$$

    and therefore $L(Y, Z) = \epsilon \le \sup_{0 \le u \le 1} |Q_Y(u) - Q_Z(u)|$. Combining this inequality with (15) yields

    $$\sup_{x \in \mathbb{R}} |F_Y(x) - F_Z(x)| \le \left( 1 + \sup_x F'_Y(x) \right) \sup_{0 \le u \le 1} |Q_Y(u) - Q_Z(u)|.$$

    Because $\rho(Y, Z)$ is invariant under monotonic transformations, we have the following equality: $\rho(X, X_n) = \rho(V, V_n)$. We apply the inequality above to $\rho(V, V_n)$ and we obtain

    $$\sup_x |F_X(x) - F_{X_n}(x)| \le \left( 1 + \sup_x F'_V(x) \right) \sup_{0 \le u \le 1} |Q_V(u) - Q_{V_n}(u)|.$$

    The result then follows by noticing that $F'_V(x) = dF_X(G^{-1}(x))/dx = f_X(G^{-1}(x))/g(G^{-1}(x))$.
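A small numerical illustration of the inequality between the Kolmogorov metric and the uniform distance between quantile functions may be useful. The sketch assumes two uniform distributions, $Y \sim U(0, 1)$ (density bounded by 1) and $Z \sim U(0, 1.2)$, chosen only because their cdfs and quantile functions are available in closed form; the grids and function names are likewise hypothetical.

```python
def cdf_uniform(x, a, b):
    """Cdf of the uniform distribution on [a, b]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def quantile_uniform(u, a, b):
    """Quantile function of the uniform distribution on [a, b]."""
    return a + u * (b - a)

# Y ~ U(0, 1), so sup_x F'_Y(x) = 1; Z ~ U(0, 1.2).
xs = [i / 1000 * 1.5 - 0.1 for i in range(1001)]   # grid on [-0.1, 1.4]
us = [i / 1000 for i in range(1001)]               # grid on [0, 1]
kolmogorov = max(abs(cdf_uniform(x, 0, 1) - cdf_uniform(x, 0, 1.2)) for x in xs)
quantile_gap = max(abs(quantile_uniform(u, 0, 1) - quantile_uniform(u, 0, 1.2))
                   for u in us)
bound = (1 + 1.0) * quantile_gap                   # (1 + sup F'_Y) * sup |Q_Y - Q_Z|
```

Here the Kolmogorov distance is about 1/6, the quantile gap is 0.2, and the bound of 0.4 holds with room to spare.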

    Proof of Theorem 1. Choose a r.v. $Y \in L_n$. The quantile function is a polynomial and can be written in a Legendre form using a Legendre-Bernstein basis transformation,

    $$Q_Y(u) = \sum_{l=1}^{n} \tilde{a}_l P^*_{l-1}(u),$$

    in which $\tilde{a} = (\tilde{a}_1, \ldots, \tilde{a}_n)$ is computed as $\tilde{a} = By$, where the matrix elements of $B$ are given in (11); see Farouki (2000). The absolute difference of the quantile functions of $V$ and $Y$ can be bounded in the following way:

    $$\begin{aligned}
    |Q_V(u) - Q_Y(u)| &\le \left| \sum_{l=1}^{\infty} a_l P^*_{l-1}(u) - \sum_{l=1}^{n} \tilde{a}_l P^*_{l-1}(u) \right| \\
    &\le \sum_{l=1}^{n} |a_l - \tilde{a}_l|\,|P^*_{l-1}(u)| + \sum_{l=n+1}^{\infty} |a_l| \\
    &\le \sum_{l=1}^{n} |a_l - \tilde{a}_l| + \sum_{l=n+1}^{\infty} |a_l| = S_1 + S_2,
    \end{aligned}$$

    where the last inequality follows from the property $|P^*_k(u)| \le 1$, $\forall k$. We deal with the two terms separately.

    Wang and Xiang (2012) derive an upper bound for $S_2$ under the assumptions in the current proposition,

    $$S_2 \le \frac{C_{k+1}}{k \left( n - \frac{3}{2} \right) \cdots \left( n - \frac{2k+1}{2} \right)} \sqrt{\frac{\pi}{2(n - k - 2)}},$$

    where $C_{k+1} = 2 \int_0^1 \frac{|F_V^{(k+1)}(u)|}{\sqrt{u(1-u)}}\,du$. We use the substitution $u = (x + 1)/2$, $x \in [-1, 1]$, to derive $C_{k+1}$ from the expression in Wang and Xiang (2012) in order to translate it for the shifted Legendre polynomials.

    To derive an upper bound for the first term, we use the definitions of $a_l$ and $\tilde{a}_l$,

    $$S_1 = \sum_{l=1}^{n} \left| (2l - 1) \int_0^1 Q_V(u) P^*_{l-1}(u)\,du - (2l - 1) \sum_{j=1}^{n} y_j \int_0^1 b^{n-1}_{j-1}(u) P^*_{l-1}(u)\,du \right|,$$

    where the second term is row $l$ of $B$ multiplied by $y$ and $b^m_k(u) = \binom{m}{k} u^k (1 - u)^{m-k}$ is the Bernstein basis function. As a result,

    $$\begin{aligned}
    S_1 &\le \sum_{l=1}^{n} \int_0^1 \left| Q_V(u) - \sum_{j=1}^{n} y_j b^{n-1}_{j-1}(u) \right| (2l - 1) \left| P^*_{l-1}(u) \right| du \\
    &\le \sup_u \left| Q_V(u) - \sum_{j=1}^{n} y_j b^{n-1}_{j-1}(u) \right| \sum_{l=1}^{n} (2l - 1) \int_0^1 \left| P^*_{l-1}(u) \right| du \\
    &\le \sum_{l=1}^{n} \sqrt{2l - 1} \times \sup_u \left| Q_V(u) - \sum_{j=1}^{n} y_j b^{n-1}_{j-1}(u) \right|,
    \end{aligned}$$

    where the last inequality follows from the inequalities between $L^1$ and $L^2$ norms, $\int_0^1 |P^*_{l-1}(u)|\,du \le \|P^*_{l-1}\|_{L^2} = 1/\sqrt{2l - 1}$.
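The norm inequality used in the last step can be verified numerically for the first few shifted Legendre polynomials. The sketch below builds $P^*_r(u) = P_r(2u - 1)$ from the standard three-term recurrence and approximates the integrals on $[0, 1]$ with a midpoint rule; the quadrature rule, the number of cells, and the function names are assumptions made for the example.

```python
from math import sqrt

def shifted_legendre(r, u):
    """P*_r(u) = P_r(2u - 1) via the three-term (Bonnet) recurrence."""
    x = 2.0 * u - 1.0
    p_prev, p = 1.0, x
    if r == 0:
        return p_prev
    for k in range(1, r):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def midpoint_integral(f, m=20000):
    """Midpoint rule on [0, 1] with m cells."""
    h = 1.0 / m
    return h * sum(f((i + 0.5) * h) for i in range(m))

rows = []
for l in range(1, 7):
    l1_norm = midpoint_integral(lambda u: abs(shifted_legendre(l - 1, u)))
    l2_norm = sqrt(midpoint_integral(lambda u: shifted_legendre(l - 1, u) ** 2))
    rows.append((l, l1_norm, l2_norm, 1.0 / sqrt(2 * l - 1)))
```

Each row holds $l$, the $L^1$ norm, the $L^2$ norm, and the closed-form value $1/\sqrt{2l - 1}$, so the two claims in the text can be compared column by column.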

    The bound derived in the last inequality holds for all non-decreasing $y$. We can, therefore, choose $y$ so that the approximating polynomial is the best monotone polynomial approximation in the uniform metric. A result in Leviatan (1988) restated in our notation shows that if $Q_V$ has $k$ continuous derivatives, then there exist non-decreasing polynomials $p_n$ of degree not exceeding $n$ such that

    $$\sup_u |Q_V(u) - p_n(u)| \le C n^{-k} \omega_k(1/n),$$

    where $\omega_k$ is the modulus of continuity of $Q_V^{(k)}$ and $C$ is an absolute constant. This choice of $y$ minimizes the upper bound but it is not necessarily the choice that minimizes $S_1$. Therefore, we minimize the left-hand side independently,

    $$\min_{y_i \le y_j} \|a - By\|_1 \le \sum_{l=1}^{n} \sqrt{2l - 1} \times C n^{-k} \omega_k(1/n).$$

    The r.v. $Y$ determined by the optimization problem is in fact $V_n$. As a result,

    $$|Q_V(u) - Q_{V_n}(u)| \le \sum_{l=1}^{n} \sqrt{2l - 1} \times C n^{-k} \omega_k(1/n) + \frac{C_{k+1}}{k \left( n - \frac{1}{2} \right) \cdots \left( n - \frac{2k-1}{2} \right)} \sqrt{\frac{\pi}{2(n - k - 1)}}.$$

    Finally, the upper bound in the proposition follows from (14).

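    The explicit representation is derived directly. Set $\tilde{\beta}_{i,j} = \beta_{i,j}/(2i - 1)$. Then,

    $$\begin{aligned}
    \tilde{\beta}_{i,j} &= \int_0^1 b^{n-1}_{j-1}(u) P^*_{i-1}(u)\,du = \sum_{l=0}^{i-1} (-1)^{i-1+l} \binom{i-1}{l} \int_0^1 b^{n-1}_{j-1}(u)\, b^{i-1}_{l}(u)\,du \\
    &= \binom{n-1}{j-1} \sum_{l=0}^{i-1} (-1)^{i-1+l} \binom{i-1}{l}^2 \int_0^1 u^{l+j-1} (1 - u)^{n-j+i-1-l}\,du \\
    &= \binom{n-1}{j-1} \sum_{l=0}^{i-1} (-1)^{i-1+l} \binom{i-1}{l}^2 \frac{(l + j - 1)!\,(n - j + i - 1 - l)!}{(n + i - 1)!} \\
    &= \binom{n-1}{j-1} \sum_{l=0}^{i-1} (-1)^{i-1+l} \frac{\binom{i-1}{l}^2}{\binom{n+i-1}{j+l}} \frac{1}{j + l},
    \end{aligned}$$

    where the first equality follows from the following representation (see Farouki (2012)),

    $$P^*_r(u) = \sum_{k=0}^{r} (-1)^{r+k} \binom{r}{k} b^r_k(u), \quad \text{where } b^r_k(u) = \binom{r}{k} u^k (1 - u)^{r-k}.$$

    Other expressions are also available; see Farouki (2000). □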
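The closed-form expression translates directly into code. The sketch below assumes that $\beta_{i,j} = (2i - 1)\tilde{\beta}_{i,j}$ are the elements of the matrix $B$ in $\tilde{a} = By$, builds $B$ in exact rational arithmetic, and checks that the Bernstein form and the resulting shifted Legendre form agree at a test point. The coefficient vector, the test point, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernstein_to_legendre_matrix(n):
    """Entries beta_{i,j} = (2i - 1) * beta_tilde_{i,j}, i, j = 1..n, as exact fractions."""
    B = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, n + 1):
            s = sum(
                Fraction((-1) ** (i - 1 + l) * comb(i - 1, l) ** 2,
                         comb(n + i - 1, j + l) * (j + l))
                for l in range(i)
            )
            row.append((2 * i - 1) * comb(n - 1, j - 1) * s)
        B.append(row)
    return B

def shifted_legendre(r, u):
    """P*_r(u) = P_r(2u - 1) via the three-term recurrence (exact for Fraction u)."""
    x = 2 * u - 1
    p_prev, p = 1, x
    if r == 0:
        return p_prev
    for k in range(1, r):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / Fraction(k + 1)
    return p

def bernstein_eval(y, u):
    """sum_j y_j * b^{n-1}_{j-1}(u) for coefficients y = (y_1, ..., y_n)."""
    n = len(y)
    return sum(y[j] * comb(n - 1, j) * u ** j * (1 - u) ** (n - 1 - j)
               for j in range(n))

y = [Fraction(-1), Fraction(-1, 5), Fraction(1, 10), Fraction(1, 2), Fraction(2)]
B = bernstein_to_legendre_matrix(len(y))
a_tilde = [sum(b * yj for b, yj in zip(row, y)) for row in B]   # a_tilde = B y
u = Fraction(3, 7)
lhs = bernstein_eval(y, u)                                       # Bernstein form
rhs = sum(a * shifted_legendre(l, u) for l, a in enumerate(a_tilde))  # Legendre form
```

For $n = 2$ the matrix reduces to rows $(1/2, 1/2)$ and $(-1/2, 1/2)$, which matches writing $y_1(1 - u) + y_2 u$ as $(y_1 + y_2)/2 + ((y_2 - y_1)/2)(2u - 1)$.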


  • References

    Ang, A., and G. Bekaert. 2002. International asset allocation with regime shifts. The Review of Financial Studies 15:1137–1187.

    Ang, A., G. Bekaert, and J. Liu. 2005. Why stocks may disappoint. Journal of Financial Economics 76:471–508.

    Campbell, J. Y., and L. Viceira. 2001. Who should buy long-term bonds? American Economic Review 91:99–127.

    Campbell, J. Y., and L. M. Viceira. 1999. Consumption and portfolio decisions when expected returns are time varying. The Quarterly Journal of Economics 114:433–495.

    Carr, P., H. Geman, D. B. Madan, and M. Yor. 2003. Stochastic volatility for Lévy processes. Mathematical Finance 13:345–382.

    Dahlquist, M., A. Farago, and R. Tedongap. 2017. Asymmetries and Portfolio Choice. Review of Financial Studies 30:667–702.

    Das, S. R., and R. Uppal. 2004. Systemic risk and international portfolio choice. The Journal of Finance 59:2809–2834.

    Eberlein, E. 2001. Application of generalized hyperbolic Lévy motions to finance. In Lévy processes, pp. 319–336. Springer.

    Farouki, R. 2000. Legendre-Bernstein basis transformations. Journal of Computational and Applied Mathematics 119(1-2):145–160.

    Farouki, R. 2012. The Bernstein polynomial basis: A centennial perspective. Computer Aided Geometric Design 29(6):379–419.

    Gul, F. 1991. A theory of disappointment aversion. Econometrica pp. 667–686.

    Hosking, J. R. M. 1990. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B 52:105–124.

    Hosking, J. R. M. 2005. Regional Frequency Analysis: An Approach Based on L-moments. Cambridge University Press.

    Jagannathan, R., and T. Ma. 2003. Risk reduction in large portfolios: Why imposing the wrong constraints helps. Journal of Finance 58:1651–1683.

    Jorion, P. 1985. International portfolio diversification with estimation risk. Journal of Business pp. 259–278.

    Kandel, S., and R. F. Stambaugh. 1996. On the predictability of stock returns: an asset-allocation perspective. The Journal of Finance 51:385–424.

    Lebedev, N. N. 1972. Special functions and their applications. Dover.


    Leviatan, D. 1988. Monotone and comonotone polynomial approximations revisited. Journal of Approximation Theory 53:1–16.

    Lorentz, G. 1986. Bernstein polynomials. Chelsea.

    Markowitz, H. 1952. Portfolio Selection. The Journal of Finance 7:77–91.

    Martellini, L., and V. Ziemann. 2010. Improved Estimates of Higher-Order Comoments and Implications for Portfolio Selection. Review of Financial Studies 23:1467–1502.

    Mittnik, S., and S. Rachev. 2000. Stable Paretian Models in Finance. Wiley.

    Owen, J., and R. Rabinovitch. 1983. On the class of elliptical distributions and their applications to the theory of portfolio choice. The Journal of Finance 38:745–752.

    Rachev, S., L. Klebanov, S. Stoyanov, and F. Fabozzi. 2013. The Methods of Distances in the Theory of Probability and Statistics. Springer.

    Routledge, B., and S. Zin. 2010. Generalized Disappointment Aversion and Asset Prices. The Journal of Finance 65:1303–1332.

    Shalit, H., and S. Yitzhaki. 1984. Mean-Gini, portfolio theory, and the pricing of risky assets. Journal of Finance 39:1449–1468.

    Sharpe, W. F. 1963. A simplified model for portfolio analysis. Management Science 9:277–293.

    Sillito, G. P. 1951. Interrelations between certain linear systematic statistics of samples from any continuous population. Biometrika 38:377–382.

    Wang, H., and S. Xiang. 2012. On the convergence rate of Legendre approximations. Mathematics of Computation 81(278):861–877.

    Zolotarev, V. M. 1997. Modern theory of summation of random variables. Brill Academic Publishers.
