High-dimensional risk-constrained dynamic asset allocation via Markov stochastic dual dynamic programming

Davi Valladão, Industrial Engineering Department, PUC-Rio, [email protected]

Thuener Silva, Industrial Engineering Department, PUC-Rio

Marcus Poggi, Informatics Department, PUC-Rio

January 18, 2017

Abstract

Dynamic portfolio optimization has a vast literature exploring different simplifications by virtue of computational tractability of the problem. Previous works provide solution methods considering unrealistic assumptions, such as no transactional costs, a small number of assets, specific choices of utility functions and oversimplified price dynamics. Other more realistic strategies use heuristic solution approaches to obtain suitable investment policies. In this work, we propose a high-dimensional risk-constrained dynamic asset allocation model and efficiently solve it using the stochastic dual dynamic programming algorithm. We consider multiple assets, transactional costs and a Markov factor model for asset returns. We impose one-period conditional value-at-risk (CVaR) constraints, arguing that it is reasonable to assume that an investor knows how much he is willing to lose in a given period. In contrast to dynamic risk measures as the objective function, our time-consistent model has relatively complete recourse and a straightforward lower bound, considering a maximization problem. We present empirical results for an illustrative 3-asset model comparing the optimal policy with selected benchmarks and solve a realistic 100-asset model guaranteeing a sufficiently small optimality gap with high probability. To the best of our knowledge, this is the first systematic approach for solving realistic high-dimensional dynamic stochastic asset allocation problems.


1 Introduction

Portfolio selection is a complex dynamic decision process under uncertainty that must account for risk aversion, price dynamics, transaction costs and allocation constraints. Mathematically represented as a stochastic dynamic programming problem, asset allocation suffers from the curse of dimensionality, particularly when transaction costs are not negligible. Although the simplifying assumptions of previous works make dynamic asset allocation models more intuitive and computationally tractable, the resulting policies are unreliable in practice.

The dynamic portfolio optimization literature begins with Mossin (1968), Samuelson (1969), and Merton (1969, 1971), who show that a no-transaction-cost problem is tractable and efficiently solvable. In a discrete-time framework, Constantinides (1979) addresses dynamic portfolio selection with proportional transaction costs, but only for a two-asset problem. Davis and Norman (1990) and Shreve and Soner (1994) can be considered continuous-time extensions of the work of Constantinides (1979), whereas Muthuraman (2007) numerically solved a one-asset problem with proportional transaction costs. Brown and Smith (2011) used dual bounds proposed by Brown et al. (2010) to assess the solution quality of proposed heuristic (non-optimal) allocation strategies for a multi-asset (up to ten assets) model with transaction costs. More recently, Bick et al. (2013) proposed a numerical procedure to obtain heuristic allocation strategies as modifications of the closed-form solution of an idealized problem, and a performance bound to measure its quality. The works of Brown and Smith (2011) and Bick et al. (2013) reiterate the high computational burden of obtaining the actual optimal allocation strategy and the importance of developing an efficient solution algorithm for this problem. To the best of our knowledge, no previous work has efficiently solved a realistic high-dimensional dynamic asset allocation problem with multiple assets and transaction costs.

The stochastic dynamic programming literature, on the other hand, deals with considerably more complex and large-scale problems in a discrete-time setting. Decomposition methods with a full tree representation, such as progressive hedging, see Rockafellar and Wets (1991), and L-shaped, see Birge and Louveaux (2011), solve medium-sized problems. Additionally, sampling-based decomposition methods, such as stochastic dual dynamic programming (SDDP) by Pereira and Pinto (1991), abridged nested decomposition (AND) by Donohue and Birge (2006) and the convergent cutting-plane and partial-sampling algorithm (CUPPS) by Chen and Powell (1999), successfully solve large-scale problems.

Particularly important to our work, the SDDP algorithm can efficiently solve a large-scale stochastic dynamic problem in the context of the operation planning of hydrothermal power systems. The SDDP overcomes the curse of dimensionality assuming temporal (stage-wise) independence of the stochastic process. For a maximization problem, stage-wise independence guarantees a unique concave value function for each time stage regardless of the stochastic process realization. Augmenting the state space with a past realization of the stochastic process, the SDDP framework also accommodates a linear temporal dependence in the right-hand side of a linear constraint (Shapiro et al., 2013).


Furthermore, the SDDP framework is also suitable for a discrete-state Markov-dependent stochastic process, as in Mo et al. (2001), Philpott and de Matos (2012) and Bruno et al. (2016), which is particularly helpful when solving a stochastic dynamic asset allocation problem.

Finally, risk aversion is an important issue in dynamic stochastic portfolio optimization. In the portfolio literature, most works maximize the expectation of some parametric concave utility representing the investor's attitude toward risk. The importance of utility theory notwithstanding, it is not intuitive to determine a risk-aversion parameter or even which utility function suitably represents the investor's risk aversion. The latest approaches in the stochastic dynamic programming literature introduce risk aversion via time-consistent dynamic risk measures. The objective function is a recursive formulation of a one-period coherent risk measure, generally the convex combination of expectation and conditional value-at-risk (CV@R) (Philpott and de Matos, 2012; Shapiro, Tekaya, da Costa, and Soares, 2013). The recursive model ensures time-consistent policies (Shapiro, 2009) and has a suitable economic interpretation as a certainty equivalent (Rudloff, Street, and Valladao, 2014). In practical applications, however, a decision maker must define the relative weights between expectation and CV@R to represent his risk aversion, which is a non-intuitive user-defined risk-aversion parameter.

In this work, we develop and efficiently solve a large-scale multi-asset stochastic dynamic asset allocation model motivated by the actual decision process in the financial market. Hedge funds hire managers to propose trading strategies that maximize expected returns, while risk departments impose constraints on strategies with a high level of risk. We focus our developments on risk-constrained models, arguing that it is reasonable to assume that an investor knows how much he is willing to lose in a given period. Our approach assumes a discrete-state Markov process to capture the dynamics of common factors of asset returns and imposes one-period CV@R constraints, ensuring relatively complete recourse and a computationally tractable, time-consistent risk-averse model.

Our main contribution is to develop and solve a dynamic stochastic asset allocation model in a discrete-time finite horizon that has the following characteristics:

• Realism: multiple assets (high-dimensional space), transactional costs and Markov-dependent asset returns;

• Time consistency: expected return maximization with one-period CV@R constraints guarantees that the planned decisions are actually going to be implemented in the future;

• Intuitive risk aversion: risk aversion controlled by a user-defined CV@R limit, intuitively interpreted as the maximum percentage loss allowed in one period;

• Flexibility: straightforward extensions to incorporate operational restrictions such as holdings and turnover constraints;


• Computational tractability: solution obtained via stochastic dual dynamic programming with a discrete-state Markov chain.

2 Theoretical Background

Let us assume a filtered probability space (Ω, F, P), where F = F_T and F_0 ⊆ . . . ⊆ F_T, and a generic definition of a time-consistent dynamic stochastic programming model for asset allocation. At a given time (stage) t, an investor wishes to construct a portfolio u_t = (u_{0,t}, . . . , u_{N,t})^⊤, determining the allocation in a risk-free asset^1 u_{0,t} and in risky ones u_{i,t} for i ∈ A = {1, . . . , N}, each of which has an uncertain return r_{i,t}, with the vector notation r_t = (r_{0,t}, . . . , r_{N,t})^⊤. At a given time t, we define the feasible set of asset allocations U_t(u_{t−1}) as a function of the previous allocation u_{t−1}. We also define the terminal wealth W_T(u_{T−1}) as a function of the last allocation and, for every t = 1, . . . , T − 1, a one-period preference functional ψ_t : L^∞(F_{t+1}) → L^∞(F_t). Following Rudloff, Street, and Valladao (2014) and Shapiro (2009), we ensure time-consistent policies by defining the recursive setting

\max_{u_1 \in U_1(u_0)} \psi_1\Big[ \max_{u_2 \in U_2(u_1)} \psi_2\Big[ \cdots \max_{u_{T-1} \in U_{T-1}(u_{T-2})} \psi_{T-1}\big[ W_T(u_{T-1}) \big] \Big] \Big]    (1)

where x_t and r_t are F_t-measurable. This generic formulation accounts for different types of models, including maximization of expected utility, minimization of a time-consistent dynamic risk measure and a risk-neutral objective function with one-period risk constraints. Previous works show two modeling issues that pose a significant trade-off between realism and computational tractability: transaction costs and time dependence of asset returns. For the purpose of gaining intuition and keeping the paper self-contained, we review previous results for simplified models: Subsection 2.1 depicts a myopic portfolio model with no transaction costs, while Subsection 2.2 shows that SDDP efficiently solves a stage-wise independent model with transaction costs. Finally, in Subsection 2.3, we discuss available alternatives to accommodate temporal dependence in the SDDP framework.

2.1 No transaction cost model

Let us define the current wealth W_t = (1 + r_t)^⊤ u_{t−1} as the accrued value of the previous allocation^2 and use it to reallocate the new portfolio, 1^⊤ u_t = W_t. Hence, in (1), the feasible set would be

U_t(u_{t-1}) = \{ u_t \in \mathbb{R}^{N+1}_{+} \mid 1^\top u_t = (1 + r_t)^\top u_{t-1} \}

and the terminal wealth W_T(u_{T−1}) = (1 + r_T)^⊤ u_{T−1}. Now, assuming that short selling is not allowed, let us define the model with the dynamic programming equations for t = T, . . . , 0

^1 The risk-free asset is indexed by i = 0, and it has null returns for all t = 2, . . . , T, i.e., r_{0,t}(ω) = 0, ∀ω ∈ Ω.

^2 Where 1 is a column vector of ones with appropriate dimension.


Q_t(W_t) = \max_{u_t \ge 0} \big\{ \psi_t\big[ Q_{t+1}\big( (1 + r_{t+1})^\top u_t \big) \big] \;\big|\; 1^\top u_t = W_t \big\}    (2)

where Q_T(W_T) = W_T. Traditional dynamic programming techniques efficiently solve no-transaction-cost portfolio problems when asset returns follow a low-dimensional factor model. For instance, (2) is computationally tractable if asset returns follow a one-factor model log(1 + r_{t+1}) = a_r + b_r z_t + ε_{t+1}, where the scalar factor follows a simple autoregressive process z_{t+1} = a_z + b_z z_t + η_{t+1} (Brown and Smith, 2011).

Following Blomvall and Shapiro (2006), if we consider a log-utility u(w) = log(w) and define ψ_{T−1}[·] = E[u(·) | F_{T−1}] and ψ_t[·] = E[· | F_t], ∀1 ≤ t ≤ T − 2, we obtain a myopic problem whose current decision u_t can be obtained as the solution of

\max_{u_t \ge 0} \big\{ \psi_t\big[ (1 + r_{t+1})^\top u_t \big] \;\big|\; 1^\top u_t = W_t \big\}    (3)

a tractable two-stage problem only concerning financial gains of the following period t + 1. The additional assumption of stage-wise independent returns also leads to a myopic problem if one chooses a power utility u(w) = w^η/η with 0 < η ≤ 1, and for an alternative objective function based on a time-consistent dynamic risk measure, i.e., defining ψ_t[·] = −ρ_t[·], where ρ_t is a conditional coherent risk measure; see Cheridito et al. (2006).

2.2 Transaction costs and stage-wise independent model

Under transaction costs, dynamic asset allocation is no longer myopic but is still computationally tractable under stage-wise independence of asset returns. Although unreliable as an actual trading strategy, one can solve the stage-wise independent model using the stochastic dual dynamic programming (SDDP) algorithm (Kozmík and Morton, 2014).

Let us assume stage-wise independence and reformulate the problem with an augmented state space for the value function Q_t(x_t), where x_t is a vector that represents the amount of money in each asset right before the buying and selling decisions at time t. Note that we could easily extend our framework to any convex transactional cost function as in Brown and Smith (2011); however, for simplicity, we assume proportional costs. We respectively denote buying and selling (dropping) decision variables by b_{i,t}, d_{i,t}, ∀i ∈ A, t ∈ {1, . . . , T − 1}, with the vector notation b_t = (b_{1,t}, . . . , b_{N,t})^⊤ and d_t = (d_{1,t}, . . . , d_{N,t})^⊤. Then, we define the dynamic equations

Q_t(x_t) = \max_{u_t, b_t, d_t} \; \psi_t\big[ Q_{t+1}\big( R_{t+1} u_t \big) \big]    (4)
s.t.  u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}    (5)
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A    (6)
      u_t, b_t, d_t \ge 0,    (7)

where Q_T(x_T) = 1^⊤ x_T, R_{t+1} = diag(1 + r_{t+1}), c = c · 1, and c is the transaction cost proportion.


Figure 1: 95% V@R and CV@R of an uncertain wealth.

As before, the objective function (4) is a risk-adjusted terminal wealth. Constraint (5) ensures that the allocation in the risk-free asset (cash) before (x_{0,t}) and after (u_{0,t}) the buying and selling decisions correctly accounts for the cash balance and transaction costs. Constraint (6) accounts for the reallocation of each risky asset, and (7) imposes non-negativity of buying and selling decisions and the short selling prohibition.

In most works, ψ_t is based on the conditional value-at-risk (CVaR). Considering a random gain W, we define

\mathrm{CVaR}_\alpha[W] = \inf_{z \in \mathbb{R}} \left\{ z + \frac{\mathbb{E}\big[ \big( (-W) - z \big)^+ \big]}{1 - \alpha} \right\},

with confidence level α ∈ (0, 1), where x^+ = max(x, 0). The concept is illustrated in Figure 1.

In particular, Philpott and de Matos (2012) and Shapiro (2011) propose the time-consistent objective

\psi_t[\cdot] = (1 - \lambda) \cdot \mathbb{E}[\cdot \mid \mathcal{F}_t] - \lambda \cdot \mathrm{CVaR}_\alpha[\cdot \mid \mathcal{F}_t],    (8)

and solve the stage-wise independent problem with the SDDP algorithm. This recursive objective function does not allow for a straightforward computation of the lower bound for a maximization problem. Although Kozmík and Morton (2014) propose a lower bound computation methodology based on importance sampling, most works use ad hoc stopping criteria for solving such problems. Notwithstanding the intuitive certainty-equivalent interpretation of this recursive objective function by Rudloff, Street, and Valladao (2014), it is not clear how to determine λ to represent the preferences of an investor. Hence, this is a major obstacle for practical implementation.

To the best of our knowledge, no work in the literature has addressed a high-dimensional dynamic asset allocation problem with time-dependent asset returns.


2.3 Stochastic dual dynamic programming and temporal dependence

The SDDP and other sampling-based methods assume stage-wise independence to avoid having multiple value functions per stage. Considering stage-wise independence, SDDP allows for cut sharing and consequently approximates only one value function at each time stage. For a generic time dependence, a decomposition algorithm, such as the L-shaped method, requires a different value function for each realization of the stochastic process, which significantly increases the computational burden. Figure 2 illustrates an SDDP value function that considers stage-wise independence, while Figure 3 illustrates differing value functions considering a generic time dependence. For the latter, one would need the full representation of the tree to use Benders-based decomposition algorithms, such as L-shaped, to solve medium-sized problems. SDDP handles large-scale problems, while other methods based on the full representation of the tree become computationally intractable.

𝒙"𝒙"#$𝒙"#%

𝑄"(𝒙")

Figure 2: Illustrative SDDPvalue function consideringstage-wise independence.

𝒙"𝒙"#$𝒙"#%

𝑄"(𝒙", 𝜔$)

𝑄"(𝒙", 𝜔%)

𝑄"(𝒙", 𝜔+)

𝑄"(𝒙", 𝜔,)

Figure 3: Illustrative value functionsconsidering generic time dependence.

The SDDP framework accommodates some specific types of time dependence, such as a linear autoregressive process for some optimization problems and, more generally, a discrete-state Markov process. For instance, in the hydrothermal model, the state space includes past river inflows, changing the right-hand side accordingly to model a linear time dependence as in (Infanger and Morton, 1996; Pereira and Pinto, 1985). For a more generic class of problems, Mo et al. (2001) and Philpott and de Matos (2012) modify the model using Markov chains, preserving convexity and consequently ensuring optimality.

In the dynamic asset allocation problem, the stochastic returns are not right-hand-side coefficients and have a nonlinear dependence (Cont, 2001; Morettin and Toloi, 2006). For a generic time dependence, full tree representation multistage models as in Valladao et al. (2014) may solve a medium-sized dynamic portfolio optimization, but they are not computationally tractable for a large-scale problem including transaction costs, several time stages and multiple assets. In our work, we assume a discrete-state Markov process where, given a state, the probability distribution does not depend on past asset returns. In this case, we have a different value function for each time stage and for each Markov state of the system. In the following section, we propose a sequence of risk-constrained models building up complexity and realism with the following assumptions: (i) no transaction costs and stage-wise independence; (ii) no transaction costs and low-dimensional time-dependent factors; (iii) transaction costs and stage-wise independence; and (iv) transaction costs and Markov time-dependent asset returns. For the latter, we adapt SDDP to account for a discrete-state Markov model and solve a realistic dynamic asset allocation problem within reasonable computation time.

3 Risk-Constrained Stochastic Dynamic Asset Allocation Models

In this section, we propose a class of asset allocation models motivated by the actual decision process in financial markets. Hedge funds hire managers to propose trading strategies that maximize expected returns, while risk departments impose constraints on strategies with a high level of risk. We focus our developments on risk-constrained models, arguing that it is reasonable to assume that an investor knows how much he is willing to lose in a given period.

Let us start by assuming no transaction costs and time (stage-wise) independence. For t = 0, . . . , T − 1, we define the dynamic programming equations

Q_t(W_t) = \max_{u_t} \; \mathbb{E}\big[ Q_{t+1}\big( (1 + r_{t+1})^\top u_t \big) \big]    (9)
s.t.  \rho\big[ r_{t+1}^\top u_t \big] \le \gamma W_t    (10)
      1^\top u_t = W_t    (11)
      u_t \ge 0,    (12)

where Q_T(W_T) = W_T. The risk constraint (10) limits the maximum percentage loss to γ, determined by the investor, considering a coherent risk measure ρ : L^∞(F) → ℝ. Similar to Artzner et al. (1999), the risk measure respects the following properties:

• Monotonicity: ρ(X) ≤ ρ(Y ), for all X,Y ∈ L∞(F), such that X ≥ Y ;

• Translation invariance: ρ(X + m) = ρ(X) − m, for all X ∈ L^∞(F) and m ∈ ℝ (verified for the CV@R right after this list);

• Positive homogeneity: ρ(λX) = λ ρ(X), for all X ∈ L∞(F), λ ≥ 0;

• Subadditivity: ρ(X + Y ) ≤ ρ(X) + ρ(Y ), for all X,Y ∈ L∞(F).
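As a quick consistency check, the CV@R defined in Subsection 2.2 satisfies, for instance, translation invariance directly from its definition (a standard argument, obtained by substituting z' = z + m):

\[
\mathrm{CVaR}_\alpha[W + m]
  = \inf_{z \in \mathbb{R}} \left\{ z + \frac{\mathbb{E}\big[(-(W+m) - z)^+\big]}{1-\alpha} \right\}
  = \inf_{z' \in \mathbb{R}} \left\{ z' - m + \frac{\mathbb{E}\big[(-W - z')^+\big]}{1-\alpha} \right\}
  = \mathrm{CVaR}_\alpha[W] - m .
\]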

Given some mild assumptions, the risk-constrained problem with no transaction costs (9–12) has relatively complete recourse. Indeed, a full allocation in the risk-free asset ensures feasibility regarding the risk constraint for a non-negative γ.

Proposition 1. If P(ω ∈ Ω | r_{t+1}(ω) < −1) = 0 and P(ω ∈ Ω | r_{0,t+1}(ω) = 0) = 1, ∀t ∈ {0, . . . , T − 1}, then (9–12) has relatively complete recourse for γ ≥ 0.

Proof of Proposition 1. Assuming P(ω ∈ Ω | r_{t+1}(ω) < −1) = 0 and the short selling prohibition u_t ≥ 0, then P(ω ∈ Ω | W_{t+1}(ω) ≥ 0) = P(ω ∈ Ω | (1 + r_{t+1}(ω))^⊤ u_t ≥ 0) = 1. Given a feasible (non-negative) current wealth, W_t ≥ 0, this model has relatively complete recourse for any γ ≥ 0 since a risk-free allocation, u_t^{rf} = (W_t, 0, . . . , 0)^⊤, is always feasible. Indeed, for γ ≥ 0, ρ_t[r_{t+1}^⊤ u_t^{rf}] = 0 ≤ γ W_t, given that P(ω ∈ Ω | r_{0,t}(ω) = 0) = 1.

Additionally, (9–12) has a myopic solution because the first-stage decision also solves its one-period counterpart problem.

Proposition 2. The problem (9–12) is myopic, i.e., it has the same solution as

\bar{Q}_t(W_t) = \max_{u_t} \; \mathbb{E}\big[ (1 + r_{t+1})^\top u_t \big]    (13)
s.t.  \rho\big[ r_{t+1}^\top u_t \big] \le \gamma W_t
      1^\top u_t = W_t
      u_t \ge 0.

Proof of Proposition 2. By definition, \bar{Q}_T(W_T) = Q_T(W_T) = W_T, and consequently, \bar{Q}_T(W_T) = W_T \cdot \bar{Q}_T(1). Employing the inductive hypothesis Q_{t+1}(W_{t+1}) = W_{t+1} \prod_{\tau=t+1}^{T} \bar{Q}_\tau(1) in (9–12), we would have

Q_t(W_t) = \left( \prod_{\tau=t+1}^{T} \bar{Q}_\tau(1) \right) \cdot \max_{u_t} \; \mathbb{E}\big[ (1 + r_{t+1})^\top u_t \big]
s.t.  \rho\big[ r_{t+1}^\top u_t \big] \le \gamma W_t
      1^\top u_t = W_t
      u_t \ge 0.

Note that if the inductive hypothesis is true, the solution of the two-stage problem (13) is also the solution of the original problem (9–12).

To prove that the inductive hypothesis holds, we divide all constraints by the positive constant W_t and redefine the decision variables as percentage allocations \tilde{u}_t = u_t / W_t to obtain the equivalent problem

Q_t(W_t) = W_t \left( \prod_{\tau=t+1}^{T} \bar{Q}_\tau(1) \right) \cdot \max_{\tilde{u}_t} \; \mathbb{E}\big[ (1 + r_{t+1})^\top \tilde{u}_t \big]
s.t.  \rho\big[ r_{t+1}^\top \tilde{u}_t \big] \le \gamma
      1^\top \tilde{u}_t = 1
      \tilde{u}_t \ge 0.

Then, we have that Q_t(W_t) = W_t \prod_{\tau=t}^{T} \bar{Q}_\tau(1).

Now, let us take one step toward a less unrealistic asset allocation model. For simplicity, we assume proportional transaction costs, even though we could consider a more general convex transaction cost function. As in Subsection 2.2, we define the state space as a vector with the amount of money allocated in each asset right before the buying and selling decisions at time t. We redefine the risk-constrained problem with transaction costs and stage-wise independent returns for t = 0, . . . , T − 1,

Q_t(x_t) = \max_{u_t, b_t, d_t} \; \mathbb{E}\big[ Q_{t+1}\big( R_{t+1} u_t \big) \big]    (14)
s.t.  \rho\big[ r_{t+1}^\top u_t \big] + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)    (15)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}    (16)
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A    (17)
      u_t, b_t, d_t \ge 0,    (18)

where Q_T(x_T) = 1^⊤ x_T, R_{t+1} = diag(1 + r_{t+1}) and c = c · 1. Note that (15) constrains incremental losses between times t and t + 1, which is equivalent to \rho\big[ -c^\top (b_t + d_t) + r_{t+1}^\top u_t \big] \le \gamma \big( 1^\top x_t \big), since ρ is translation invariant.

As in the previous subsection, (14–18) has relatively complete recourse when the maximum percentage loss is at least the transaction cost proportion.

Proposition 3. If P(ω ∈ Ω | r_{t+1}(ω) < −1) = 0 and P(ω ∈ Ω | r_{0,t+1}(ω) = 0) = 1, ∀t ∈ {0, . . . , T − 1}, problem (14–18) has relatively complete recourse for γ ≥ c.

Proof of Proposition 3. Assuming P(ω ∈ Ω | r_{t+1}(ω) < −1) = 0 and the short selling prohibition u_t ≥ 0, then P(ω ∈ Ω | R_{t+1}(ω) u_t ≥ 0) = 1, given that R_{t+1} = diag(1 + r_{t+1}). First, we show that there exist b_t, d_t such that 1^⊤(b_t + d_t) ≤ 1^⊤ x_t and consequently c^⊤(b_t + d_t) ≤ c^⊤ x_t. If all risky assets are sold, i.e., b_t = 0 and d_{i,t} = x_{i,t}, ∀i ∈ A, then

1^\top (b_t + d_t) = \sum_{i \in A} x_{i,t} \le \sum_{i \in A \cup \{0\}} x_{i,t} = 1^\top x_t.

Given that we sell all risky assets, \rho\big[ r_{t+1}^\top u_t \big] = 0. Then,

\rho\big[ r_{t+1}^\top u_t \big] + c^\top (b_t + d_t) = c^\top (b_t + d_t) \le c \big( 1^\top x_t \big) \le \gamma \big( 1^\top x_t \big),

if γ ≥ c.

Finally, our last step toward a realistic model incorporates price dynamics by assuming that asset returns follow a discrete-state Markov model. Let us assume a Markov asset return with discrete state space K, where the multivariate probability distribution of r_t given the Markov state K_t at time t does not depend on past returns, i.e., r_{t−1}, . . . , r_1. The Markov state K_t could be interpreted as the financial market situation. For instance, in Figure 4, we have three Markov states (bull, neutral or bear market) with three differing probability distributions of asset returns.

Figure 4: Illustrative Markov asset return model.
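To make the Markov return model concrete, the sketch below simulates monthly returns whose distribution switches with an unobserved market state, in the spirit of Figure 4. The three-state transition matrix and the per-state return distributions for two risky assets are hypothetical placeholders, not the estimates used later in Section 4.

using Distributions, LinearAlgebra, Random

# Hypothetical 3-state Markov chain (bull, neutral, bear) for two risky assets.
P = [0.7 0.2 0.1;          # transition matrix: row j holds P(K_{t+1} = k | K_t = j)
     0.2 0.6 0.2;
     0.1 0.3 0.6]
state_dists = [MvNormal([0.02, 0.015],   Diagonal([0.03, 0.04].^2)),  # bull
               MvNormal([0.005, 0.004],  Diagonal([0.04, 0.05].^2)),  # neutral
               MvNormal([-0.02, -0.025], Diagonal([0.06, 0.07].^2))]  # bear

# Simulate T periods of returns; given the current state, returns do not depend on the past.
function simulate(P, state_dists, T; k0 = 1, rng = Random.default_rng())
    k, R = k0, zeros(T, length(state_dists[1]))
    for t in 1:T
        k = rand(rng, Categorical(P[k, :]))   # draw the next Markov state
        R[t, :] = rand(rng, state_dists[k])   # draw returns from that state's distribution
    end
    return R
end

R = simulate(P, state_dists, 12)   # one year of simulated monthly returns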

For notational simplicity, we denote the state transition probability as

P_{k|j} = \frac{ P(\omega \in \Omega \mid K_t(\omega) = j, \; K_{t+1}(\omega) = k) }{ P(\omega \in \Omega \mid K_t(\omega) = j) },

and, for a given time t = 0, . . . , T − 1 and j ∈ K, we define a different value function for a given state K_t = j via the dynamic programming equations

Q^j_t(x_t) = \max_{u_t, b_t, d_t} \; \sum_{k \in K} \mathbb{E}\big[ Q^k_{t+1}\big( R_{t+1} u_t \big) \,\big|\, K_{t+1} = k \big] \, P_{k|j}    (19)
s.t.  \rho\big[ r_{t+1}^\top u_t \mid K_t = j \big] + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A
      u_t, b_t, d_t \ge 0,

where Q_T(x_T) = 1^⊤ x_T, R_{t+1} = diag(1 + r_{t+1}), and c = c · 1. The objective function is a weighted average of the stage t + 1 value functions, as illustrated in Figure 5. As a compromise between stage-wise independence, illustrated in Figure 2, and a generic time-dependent model, illustrated in Figure 3, the Markov model remains computationally tractable for a reasonably small number of Markov states and introduces the effect of price dynamics in the optimal asset allocation policy.

Figure 5: Illustrative weighted average of value functions for the Markovian asset return model.

For a practical and efficient implementation of the SDDP solution algorithm, let us develop a problem equivalent to (19) that explicitly considers intermediate earnings in the objective function. This model mitigates the effects of a poor approximation of the future value function in early iterations of the SDDP solution algorithm. Considering a sequence of decisions, we recursively represent the terminal wealth W_T = 1^⊤ x_T at a given time t as

1^\top x_T = 1^\top x_t + \sum_{\tau = t}^{T-1} \big( -c^\top (b_\tau + d_\tau) + r_{\tau+1}^\top u_\tau \big).

One can reformulate (19) accordingly using the equivalent dynamic equations

V^j_t(x_t) = \max_{u_t, b_t, d_t} \; -c^\top (b_t + d_t) + \sum_{k \in K} \mathbb{E}\big[ r_{t+1}^\top u_t + V^k_{t+1}\big( R_{t+1} u_t \big) \,\big|\, K_{t+1} = k \big] \, P_{k|j}
s.t.  \rho\big[ r_{t+1}^\top u_t \mid K_t = j \big] + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A
      u_t, b_t, d_t \ge 0.

Note that we neglect 1^⊤ x_t in the objective function since it is a constant at time t. Additionally, we assume ρ to be the CV@R with confidence level α conditioned on the current Markov state K_t = j,

\rho[W \mid K_t = j] = \inf_{z \in \mathbb{R}} \left\{ z + \frac{ \mathbb{E}\big[ \big( (-W) - z \big)^+ \,\big|\, K_t = j \big] }{ 1 - \alpha } \right\},

which is a coherent risk measure with a suitable economic interpretation, easily represented in a linear stochastic programming problem.

For computational tractability, let us assume a discrete number of asset return scenarios and associated probabilities for each given Markov state. Given K_{t+1} = k, we denote for each scenario s ∈ S_k the asset return vector r_{t+1}(s) = (r_{0,t+1}(s), . . . , r_{N,t+1}(s))^⊤ and the matrix R_{t+1}(s) = diag(1 + r_{t+1}(s)). Moreover, we denote

p_{s|k} = \frac{ P(\omega \in \Omega \mid r_{t+1}(\omega) = r_{t+1}(s)) }{ P(\omega \in \Omega \mid K_{t+1}(\omega) = k) }

as the conditional return probability and define, for all s ∈ S_k, auxiliary decision variables y_s to represent the CV@R in the deterministic equivalent linear dynamic stochastic programming problem

V^j_t(x_t) = \max_{z, y, u_t, b_t, d_t} \; -c^\top (b_t + d_t) + \sum_{k \in K} \sum_{s \in S_k} \big( r_{t+1}(s)^\top u_t + V^k_{t+1}\big( R_{t+1}(s) \, u_t \big) \big) \, P_{k|j} \, p_{s|k}
s.t.  z + \frac{ \sum_{k \in K} \sum_{s \in S_k} y_s \, P_{k|j} \, p_{s|k} }{ 1 - \alpha } + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A
      r_{t+1}(s)^\top u_t + z + y_s \ge 0, \quad \forall k \in K, \; s \in S_k
      y_s \ge 0, \quad \forall k \in K, \; s \in S_k
      u_t, b_t, d_t \ge 0.    (20)

We argue that the dynamic asset allocation model (20) is: (i) realistic, since it considers multiple assets, transactional costs and Markov-dependent asset returns; (ii) time consistent, since all planned decisions are actually going to be implemented in the future; (iii) intuitive, since its risk-aversion parameter can be interpreted as the maximum percentage loss allowed in one period; (iv) flexible, since additional constraints such as holdings and turnover restrictions can be incorporated by a set of linear inequalities; and (v) computationally tractable, since it can be efficiently solved by the MSDDP algorithm.
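For illustration, the following JuMP sketch writes a single stage-t subproblem in the spirit of (20) under strong simplifying assumptions: only one Markov state, the future value function V^k_{t+1} replaced by zero, hypothetical scenario returns, and the open-source solver GLPK in place of the CPLEX solver used in Section 4. It is therefore only the CV@R-constrained one-period core of the model, not the full MSDDP recursion.

using JuMP, GLPK

S, N = 100, 3                       # scenarios and risky assets (hypothetical)
r  = 0.01 .+ 0.05 .* randn(S, N)    # hypothetical risky returns r_{t+1}(s); risk-free return = 0
p  = fill(1 / S, S)                 # scenario probabilities p_{s|k}
c, γ, α = 0.005, 0.05, 0.95         # cost rate, loss limit, CV@R level
x  = vcat(1.0, zeros(N))            # incoming holdings x_t (all cash)

m = Model(GLPK.Optimizer)
@variable(m, u[1:N+1] >= 0)         # post-trade holdings (index 1 = risk-free asset)
@variable(m, b[1:N] >= 0)           # buys
@variable(m, d[1:N] >= 0)           # sells
@variable(m, y[1:S] >= 0)           # CV@R auxiliary variables
@variable(m, z)

# Cash balance and reallocation of each risky asset, as in (16)-(17).
@constraint(m, u[1] + (1 + c) * sum(b) - (1 - c) * sum(d) == x[1])
@constraint(m, [i in 1:N], u[i+1] - b[i] + d[i] == x[i+1])

# One-period CV@R constraint (15) in the linearized form used in (20).
@constraint(m, [s in 1:S], sum(r[s, i] * u[i+1] for i in 1:N) + z + y[s] >= 0)
@constraint(m, z + sum(p[s] * y[s] for s in 1:S) / (1 - α) + c * (sum(b) + sum(d)) <= γ * sum(x))

# Expected next-period gain net of transaction costs (future value function omitted).
@objective(m, Max, -c * (sum(b) + sum(d)) +
                   sum(p[s] * r[s, i] * u[i+1] for s in 1:S, i in 1:N))

optimize!(m)
value.(u)                           # optimal post-trade allocation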

4 Computational Experiments

In this section, our objective is to compare the performance of the proposed risk-constrained time-consistent dynamic asset allocation model against selected benchmark strategies considering in-sample and out-of-sample tests. We use a discrete-state Markov stochastic dual dynamic programming (MSDDP) algorithm to solve problems considering transactional costs and time dependence. We assume factor models for asset returns and argue that our framework is fairly general even when factor dynamics are not theoretically defined by a discrete Markov chain. For low-dimensional time-dependent factors, we can estimate a Markov chain that represents either a given stochastic process or historical data.


For all case studies, we estimate, using simulations or historical data, a factor model for asset returns while representing the factor dynamics by a Gaussian mixture in the hidden Markov model (HMM) scheme^3. Ryden et al. (1998) show that modeling a financial time series as a Gaussian mixture according to the states of an unobserved Markov chain, as illustrated in Figure 6, reproduces most of the stylized facts for daily return series demonstrated by Granger and Ding (1994). Thus, the HMM is a natural and fairly general methodology for modeling financial time series (Elliott and Siu, 2014; Elliott and Van der Hoek, 1997; Hassan and Nath, 2005; Mamon and Elliott, 2007, 2014).

^3 To fit the HMM in the MSDDP framework, we assume, as an approximation, that the most probable Markov state is actually observable.

Figure 6: An illustrative example of a Gaussian mixture model.

More objectively, we divide our experiments into two case studies: in the first case study, we approximate a known one-factor model by a Markov chain, solve a 3-asset problem and empirically test its in-sample performance. In the second case study, we consider a 5-factor model (Fama and French, 2015, 2016) with time dependence to solve a 100-asset problem, ensuring with 90 percent probability an optimality gap smaller than or equal to 1 percent.

4.1 Illustrative 3-asset case study

Following Brown and Smith (2011), we consider a problem with three risky assets where uncertain returns follow a factor model with its single factor as an autoregressive process. More objectively,

log(1 + r_{t+1}) = a_r + b_r z_t + ε_{t+1}   and   z_{t+1} = a_z + b_z z_t + η_{t+1},    (21)

where (ε_t, η_t) are independent and identically distributed multivariate normal random vectors with zero mean and covariance matrix Σ_{εη}. With no transactional costs, the stochastic dynamic program

Q_t(W_t, z_t) = \max_{u_t} \; \mathbb{E}\big[ Q_{t+1}\big( (1 + r_{t+1})^\top u_t, \, z_{t+1} \big) \,\big|\, z_t \big]
s.t.  \rho\big[ r_{t+1}^\top u_t \mid z_t \big] \le \gamma W_t
      1^\top u_t = W_t
      u_t \ge 0

can be accurately approximated by its sample average approximation (SAA) counterpart and efficiently solved using traditional dynamic programming techniques due to a low-dimensional state space. We use positive homogeneity, i.e., Q_t(W_t, z_t) = W_t · Q_t(1, z_t), and simulate S realizations of (ε_t, η_t), denoted by (ε_t(s), η_t(s)), ∀s = 1, . . . , S, to solve the deterministic equivalent SAA problem

Q_t(W_t, z_t) = \max_{u_t} \; S^{-1} \sum_{s=1}^{S} \big( 1 + r_{t+1}(s) \big)^\top u_t \cdot Q_{t+1}\big( 1, z_{t+1}(s) \big)    (22)
s.t.  z + \big( S(1 - \alpha) \big)^{-1} \sum_{s=1}^{S} y_s \le \gamma W_t
      r_{t+1}(s)^\top u_t + z + y_s \ge 0, \quad \forall s = 1, . . . , S
      y_s \ge 0, \quad \forall s = 1, . . . , S
      1^\top u_t = W_t
      u_t \ge 0,

where log(1 + r_{t+1}(s)) = a_r + b_r z_t + ε_{t+1}(s) and z_{t+1}(s) = a_z + b_z z_t + η_{t+1}(s).

To evaluate how many samples are necessary to accurately approximate the true underlying problem, we solve problem (22) via stochastic dynamic programming (SDP), obtaining different policies for S ∈ {250, 500, 750, 1000}. Given a fixed risk-aversion parameter γ ∈ {0.01, 0.02, 0.05, 0.08, 0.10}, we evaluate these policies with 1000 out-of-sample paths randomly generated following the original stochastic process (21). For each pair of policies with different S, we compute the expected return difference estimator and perform a pairwise t-test with the null hypothesis that the difference is zero; see Table 1.

Table 1: Expected return difference p-values for pairwise SAA SDP policies (S)

γ       250 vs 500   500 vs 750   750 vs 1000
0.02    0.1074       0.0431       0.1311
0.05    0.3719       0.5330       0.7812
0.08    0.5280       0.9794       0.7572
0.10    0.2958       0.7183       0.2603
0.20    0.5846       0.2985       0.3856


For instance, we observe that for γ = 0.02, the difference of returns of the policies generated with S = 750 and S = 1000 has a p-value of 13.11 percent. Then, we cannot reject the hypothesis that these two policies have the same expected return, so we would choose S = 750 because it is computationally less expensive. Henceforth, we assume 750 samples, which is the highest S that accurately approximates the true problem for different values of γ.
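For illustration, the sketch below mimics this pairwise comparison on simulated data: it generates common out-of-sample paths of the factor model (21) with placeholder coefficients, evaluates two fixed (hypothetical) allocation rules on the same paths, and applies a paired t-test to the per-path return differences. The coefficients and allocation rules are illustrative stand-ins, not the SAA SDP policies behind Table 1.

using Distributions, HypothesisTests, LinearAlgebra, Random

# Placeholder coefficients of model (21) for 3 risky assets (Table 4 holds the actual ones).
a_r, b_r = [0.005, 0.006, 0.007], [0.003, 0.005, 0.006]
a_z, b_z = 0.0, 0.97
noise = MvNormal(zeros(4), Diagonal([0.05, 0.07, 0.09, 0.23].^2))  # joint (ε, η) draw

# Simulate T monthly risky returns along one path of (21).
function simulate_returns(T, rng)
    z, R = 0.0, zeros(T, 3)
    for t in 1:T
        w = rand(rng, noise)
        R[t, :] = exp.(a_r .+ b_r .* z .+ w[1:3]) .- 1.0
        z = a_z + b_z * z + w[4]
    end
    return R
end

# Terminal return of a fixed percentage allocation u (remainder in cash, risk-free return = 0).
terminal_return(u, R) = prod(1.0 .+ R * u) - 1.0

rng = Random.MersenneTwister(1)
u1, u2 = [0.2, 0.2, 0.2], [0.3, 0.2, 0.1]          # two hypothetical policies
diffs = map(1:1000) do _
    R = simulate_returns(12, rng)                   # same path for both policies
    terminal_return(u1, R) - terminal_return(u2, R)
end
pvalue(OneSampleTTest(diffs))   # H0: zero expected return difference between the policies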

Considering now transaction costs, traditional dynamic programming techniques become computationally intractable due to the high-dimensional state space. To efficiently solve the problem with MSDDP, we need to approximate the factor dynamics by a Markov chain and formulate the problem as in (20). For this purpose, we need to determine the number of Markov states required for an accurate approximation. For the no-transactional-cost case, we know the optimal policy obtained by (22), and we use its optimal expected return as a benchmark to compare with MSDDP policies with an increasing number of Markov states and different values of γ. We evaluate the expected return difference estimator using 1000 out-of-sample paths and perform a pairwise t-test with zero difference as the null hypothesis; see the p-values in Table 2. For instance, for γ = 0.02 and K = 4, the p-value is 67.09 percent, and we cannot reject the hypothesis that the MSDDP expected return equals the optimal one. Thus, we assume K = 4, which is the highest K recommended by the t-test in Table 2 for a significance level of 5 percent.

Table 2: p-values for the difference of expected return associated with SDP vs K-state Markov SDDP

γ       K = 1    K = 2    K = 3    K = 4
0.02    0.0685   0.0463   0.0481   0.6709
0.05    0.3806   0.0447   0.3015   0.6479
0.08    0.0653   0.0423   0.5115   0.4520
0.10    0.1906   0.2112   0.3641   0.4628
0.20    0.5927   0.5927   0.5927   0.5927

For comparison purposes, we devise heuristic investment strategies as in Brown and Smith (2011) to compare with the MSDDP solution. For instance, a one-step strategy (OSS) that solves

H_t(x_t, z_t) = \max_{u_t, b_t, d_t} \; \mathbb{E}\big[ (1 + r_{t+1}(z_t))^\top u_t \,\big|\, z_t \big]    (23)
s.t.  \rho\big[ r_{t+1}(z_t)^\top u_t \mid z_t \big] + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A
      u_t, b_t, d_t \ge 0,

would only optimize the next-period return, considering time dependence and transaction costs but ignoring the value function. A modified one-step strategy (MOSS) that solves

H_t(x_t, z_t) = \max_{u_t, b_t, d_t} \; \mathbb{E}\big[ (1 + r_{t+1}(z_t))^\top u_t \cdot Q_{t+1}\big( 1, z_{t+1}(z_t) \big) \,\big|\, z_t \big]    (24)
s.t.  \rho\big[ r_{t+1}(z_t)^\top u_t \mid z_t \big] + c^\top (b_t + d_t) \le \gamma \big( 1^\top x_t \big)
      u_{0,t} + (1 + c)^\top b_t - (1 - c)^\top d_t = x_{0,t}
      u_{i,t} - b_{i,t} + d_{i,t} = x_{i,t}, \quad \forall i \in A
      u_t, b_t, d_t \ge 0,

would be similar to the one-step strategy but with the simplified value function of problem (22), which disregards future transaction costs.

To compare the MSDDP solution^4 with these alternative heuristic approaches (OSS and MOSS), we simulate all strategies using 1000 return scenarios generated from (21). We present the results for γ ∈ {0.05, 0.1, 0.2, 0.3} and transaction cost c ∈ {0.005, 0.013, 0.02} in Table 3.

Table 3: Monthly expected return for different methods, γ and transactional costs

Trans. cost   γ      MSDDP    OSS      MOSS
0.005         0.05   0.0465   0.0464   0.0464
              0.1    0.0964   0.0962   0.0961
              0.2    0.1297   0.1297   0.1297
              0.3    0.1297   0.1297   0.1297
0.013         0.05   0.0418   0.0158   0.0154
              0.1    0.0886   0.0319   0.0321
              0.2    0.1207   0.0436   0.0436
              0.3    0.1207   0.0443   0.0438
0.02          0.05   0.0384   0.0003   0.0003
              0.1    0.0817   0.0004   0.0005
              0.2    0.1131   0.0007   0.0007
              0.3    0.1131   0.0008   0.0006

Note that for a small transaction cost rate, the expected return is almost the same for MSDDP and the benchmark strategies. As transactional costs increase, the MSDDP expected return significantly outperforms the selected benchmarks for all risk-aversion levels γ. For better visualization of the results, we plot risk-return curves (γ vs. expected return) for all strategies and differing transactional costs in Figure 7.

The influence of the transactional costs is explicit in Figure 7. When the cost is low, the results are very similar, but for the highest values, the difference between the approaches becomes very large. Indeed, for c = 0.02, the OSS and MOSS benchmarks allocate everything in the risk-free asset because of the negative effect of the costs.

4Here, we assume K = 4 Markov states and S = 750 for the sample average approximation.


Figure 7: Risk-return curves (monthly expected return vs. γ) for MSDDP and the benchmark strategies (OSS and MOSS), for transaction costs 0.005, 0.013 and 0.02.


Figure 8: Convergence of the MSDDP upper and lower bounds (objective value vs. iteration) for the 100-asset problem.

Note that the heuristic strategies barely invest in risky assets when transactional costs are significantly high, while in the MSDDP, we observe only a small shift in the efficient frontier.

4.2 Realistic 100-asset and 5-factor case study

Now, let us consider a realistic case study with 100 assets, transactional costs and a factor model for asset returns. We use Kenneth French's monthly data for 100 portfolios formed on book-to-market and size, and 5 factors as in Fama and French (2015, 2016). We estimate a Markov chain with 3 discrete states to represent the factor dynamics and assume the estimated stochastic process is the true one; see Appendix B for details.

We solve problem (20) for T = 6, S = 1000 and K = 3 with a probabilistic guarantee of 90 percent that the optimality gap is smaller than 1 percent. We present the results for γ = 0.06 and c = 0.02 in Figure 8; the computational time was approximately 1 hour and 40 minutes. We solve this instance using a single core of an Intel i7-5820K 3.3 GHz with 64 GB of memory. The implementation was written in Julia (Bezanson et al., 2012) using JuMP (Lubin and Dunning, 2015) and CPLEX (v12.5.1) to solve the linear programming problems.

Moreover, we observe in Figure 9 how the percentage allocation changes as the MSDDP converges. Note that in the first iteration, the value function is not considered in the allocation, and consequently, it starts with 100 percent in the risk-free asset, which is the first asset. As the approximation of the value function improves, the allocation changes, converging to a 5-asset portfolio.


Figure 9: Percentage allocation by MSDDP iteration for the 100-asset problem (portfolios held: RF, ME1 OP5, ME1 OP7, ME5 OP9, ME7 OP4).

5 Conclusions and Future Works

In this work, we proposed a realistic dynamic asset allocation model that considers multiple assets, transactional costs and a Markov factor model for asset returns. We introduced risk aversion using an intuitive user-defined parameter that limits the maximum percentage loss in one period. The model maximizes the expected portfolio return with one-period CV@R constraints, guaranteeing a time-consistent optimal policy, i.e., planned decisions are actually going to be implemented. We extended the literature results on myopic policies to our risk-constrained model when we assume no transactional costs and stage-wise (time) independent asset returns. Moreover, we proved that the model has relatively complete recourse when the maximum percentage loss limit is greater than or equal to the transactional cost rate.

We efficiently solved the proposed model using Markov chain stochastic dual dynamic programming (MSDDP) for an illustrative 3-asset problem and a high-dimensional 100-asset dynamic portfolio problem. In the illustrative 3-asset problem, we compared our optimal policy to selected (heuristic) benchmarks used in the literature. For these cases, we observed that these heuristics are close to the optimal policy when transactional costs are small but perform poorly otherwise. Indeed, we observed that, at a certain level of transactional costs, one-period portfolio optimization does not invest in risky assets, while the optimal dynamic policy suggests significant allocation in the risky assets. To show the computational tractability and scalability of our approach, we solved a high-dimensional 100-asset model with 5 factors following a Markov chain with 3 discrete states. We provided convergence graphs that illustrate a probabilistic guarantee of optimality, i.e., we ensure with 90 percent probability that the optimality gap is smaller than or equal to 1 percent. We also illustrated how the first-stage percentage asset allocation changes as the MSDDP converges. Indeed, the allocation in the first iterations of the algorithm is close to the static (one-period) counterpart and converges to the optimal solution of the dynamic problem.

To the best of our knowledge, this is the first systematic approach for optimizing a high-dimensional dynamic stochastic asset allocation. We argue that our framework is flexible enough to approximately represent a relevant set of dynamic portfolio models and operational constraints on holdings and turnovers. In addition, we believe that time-consistent risk-constrained policies are better suited for practical applications since their risk-aversion parametrization is direct and intuitive for all investors capable of determining a maximum percentage loss allowed in a given period.

References

Artzner P, Delbaen F, Eber JM, Heath D (1999) Coherent measures of risk. Mathematical Finance 9(3):203–228.

Bezanson J, Karpinski S, Shah V, Edelman A (2012) Julia: A fast dynamic language for technical computing. Lang.NEXT, URL http://julialang.org/images/lang.next.pdf.

Bick B, Kraft H, Munk C (2013) Solving Constrained Consumption-Investment Problems by Simulation of Artificial Market Strategies. Management Science 59(2):485–503.

Birge JR, Louveaux F (2011) Introduction to Stochastic Programming (Springer Science & Business Media).

Blomvall J, Shapiro A (2006) Solving multistage asset investment problems by the sample average approximation method. Mathematical Programming 108(2-3):571–595.

Brown DB, Smith JE (2011) Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Science 57(10):1752–1770.

Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Operations Research 58(4-part-1):785–801.

Bruno S, Ahmed S, Shapiro A, Street A (2016) Risk neutral and risk averse approaches to multistage renewable investment planning under uncertainty. European Journal of Operational Research 250(3):979–989.

Chen ZL, Powell WB (1999) Convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse. Journal of Optimization Theory and Applications 102(3):497–524.

Cheridito P, Delbaen F, Kupper M, et al. (2006) Dynamic monetary risk measures for bounded discrete-time processes. Electronic Journal of Probability 11(3):57–106.

Constantinides GM (1979) Multiperiod consumption and investment behavior with convex transactions costs. Management Science 25(11):1127–1137.

Cont R (2001) Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance 1(2):223–236.

Davis MH, Norman AR (1990) Portfolio selection with transaction costs. Mathematics of Operations Research 15(4):676–713.

Donohue CJ, Birge JR (2006) The abridged nested decomposition method for multistage stochastic linear programs with relatively complete recourse. Algorithmic Operations Research 1(1).

Elliott RJ, Siu TK (2014) Strategic asset allocation under a fractional hidden Markov model. Methodology and Computing in Applied Probability 16(3):609–626.

Elliott RJ, Van der Hoek J (1997) An application of hidden Markov models to asset allocation problems. Finance and Stochastics 1(3):229–238.

Fama EF, French KR (2015) A five-factor asset pricing model. Journal of Financial Economics 116(1):1–22.

Fama EF, French KR (2016) Dissecting Anomalies with a Five-Factor Model. Review of Financial Studies 29(1):69–103.

Granger CW, Ding Z (1994) Stylized facts on the temporal and distributional properties of daily data from speculative markets. UCSD Department of Economics Discussion Paper 94–19.

Hassan MR, Nath B (2005) Stock market forecasting using hidden Markov model: a new approach. Intelligent Systems Design and Applications, 2005. ISDA'05. Proceedings. 5th International Conference on, 192–196 (IEEE).

Infanger G, Morton DP (1996) Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming 75(2):241–256.

Kozmík V, Morton DP (2014) Evaluating policies in risk-averse multi-stage stochastic programming. Mathematical Programming 1–26.

Lubin M, Dunning I (2015) Computing in operations research using Julia. INFORMS Journal on Computing 27(2):238–248, URL http://dx.doi.org/10.1287/ijoc.2014.0623.

Mamon RS, Elliott RJ, eds. (2007) Hidden Markov Models in Finance, volume 104 of International Series in Operations Research & Management Science (Boston, MA: Springer US).

Mamon RS, Elliott RJ, eds. (2014) Hidden Markov Models in Finance, volume 209 of International Series in Operations Research & Management Science (Boston, MA: Springer US).

Merton RC (1969) Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics 247–257.

Merton RC (1971) Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory 3(4):373–413.

Mo B, Gjelsvik A, Grundt A (2001) Integrated risk management of hydro power scheduling and contract management. Power Systems, IEEE Transactions on 16(2):216–221.

Morettin PA, Toloi C (2006) Análise de séries temporais (Blucher).

Mossin J (1968) Optimal multiperiod portfolio policies. The Journal of Business 41(2):215–229.

Muthuraman K (2007) A computational scheme for optimal investment–consumption with proportional transaction costs. Journal of Economic Dynamics and Control 31(4):1132–1159.

Pereira M, Pinto L (1985) Stochastic optimization of a multireservoir hydroelectric system: a decomposition approach. Water Resources Research 21(6):779–792.

Pereira MVF, Pinto LMVG (1991) Multi-stage stochastic optimization applied to energy planning. Math. Program. 52(2):359–375.

Philpott AB, de Matos VL (2012) Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European Journal of Operational Research 218(2):470–483.

Rockafellar RT, Wets RJB (1991) Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research 16(1):119–147.

Rudloff B, Street A, Valladao DM (2014) Time consistency and risk averse dynamic decision models: Definition, interpretation and practical consequences. European Journal of Operational Research 234(3):743–750.

Ryden T, Terasvirta T, Asbrink S (1998) Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics 13(3):217–244.

Samuelson PA (1969) Lifetime portfolio selection by dynamic stochastic programming. The Review of Economics and Statistics 239–246.

Shapiro A (2009) On a time consistency concept in risk averse multistage stochastic programming. Operations Research Letters 37(3):143–147.

Shapiro A (2011) Analysis of stochastic dual dynamic programming method. European Journal of Operational Research 209(1):63–72, URL http://EconPapers.repec.org/RePEc:eee:ejores:v:209:y:2011:i:1:p:63-72.

Shapiro A, Tekaya W, da Costa JP, Soares MP (2013) Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research 224(2):375–391.

Shreve SE, Soner HM (1994) Optimal investment and consumption with transaction costs. The Annals of Applied Probability 609–692.

Valladao DM, Veiga A, Veiga G (2014) A multistage linear stochastic programming model for optimal corporate debt management. European Journal of Operational Research 237(1):303–311.

A Markov chained Stochastic Dual Dynamic Programming

To better understand the Markov chained SDDP, we explain the main parts of the algorithm in pseudo-code.

Algorithm 1 Markov chained Stochastic Dual Dynamic Programming

Require: k_ini, x_ini and {V^k_t}_{t=1,...,T; k=1,...,K} (initial future value function approximations)
Initialize: GAP ← 100, i ← 1, x_1 ← x_ini
while GAP > 1 and i < max do
    Sample scenario: {r_t, j_t}_{t=1}^T
    [v, {x_t}_{t=1}^T] ← Forward
    {V^j_t}_{t=1,...,T; j=1,...,K} ← Backward
    [Upper bound update] UB ← V^{k_ini}_1(x_1) evaluated with the updated V^k_2
    [Lower bound update]
    for s_f = 1 → S_LB do
        Sample scenario: {r_t, j_t}_{t=1}^T
        [LB[s_f], {x_t}_{t=1}^T] ← Forward
        if s_f ≥ S_IniLB then
            LB_c ← mean(LB) − 1.28 · std(LB) / sqrt(s_f)
            GAP ← abs(100 · (UB − LB_c) / UB)
        end if
    end for
    i ← i + 1
end while

It is important to notice that the GAP is constructed using the lower limit of the lower-bound confidence interval. This suggests that, with confidence α_LB, the SAA problem GAP is smaller than or equal to 1%.
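For illustration, the sketch below reproduces this bound bookkeeping with placeholder numbers: the lower bound is the one-sided confidence lower limit on the sampled forward-pass values (the 1.28 factor is approximately the 90 percent one-sided normal quantile), and the GAP compares it to the deterministic upper bound.

using Statistics

# Stopping test of Algorithm 1: confidence lower limit on the sampled
# forward-pass objective values versus the upper bound UB, GAP in percent.
function optimality_gap(LB::Vector{Float64}, UB::Float64; zα = 1.28)
    LBc = mean(LB) - zα * std(LB) / sqrt(length(LB))
    return abs(100 * (UB - LBc) / UB)
end

LB = 0.030 .+ 0.004 .* randn(500)   # hypothetical forward-pass objective values
UB = 0.0315                         # hypothetical upper bound from the first-stage problem
optimality_gap(LB, UB)              # stop when this drops below 1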

Algorithm 2 Forward

x_1 ← x_ini
for t = 1 → T − 1 do
    u_t ← arg max of V^{j_t}_t(x_t) using the current approximation of V^k_{t+1}
    x_{t+1} ← R_{t+1} · u_t
end for
v ← V^{j_1}_1(x_1)
return v, {x_t}_{t=1}^T


Algorithm 3 Backward

for t = T − 1 → 1 do
    for j = 1 → K do
        [v*_t, u_t, π^r_t, π^c_t, π^a_t]_j ← solve V^j_t(x_t)
        for k = 1 → K do
            V^k_{t−1} ← addcuts([v*_t, u_t, π^r_t, π^c_t, π^a_t]_j)
        end for
    end for
end for

where π^r, π^c and π^a are, respectively, the dual variables of the first, second and third constraints of problem (20).
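For illustration, the sketch below shows the kind of cut bookkeeping that addcuts performs: each backward-pass solve yields an optimal value and a subgradient with respect to the incoming state, and the (concave) value function is outer-approximated by the minimum of the resulting affine majorants. The struct and field names are illustrative, not the paper's implementation; assembling λ from the duals π^r, π^c, π^a follows the problem-specific constraint coefficients of (20).

using LinearAlgebra

# Illustrative cut pool for one value function approximation: a cut stores the
# optimal value v*, the trial state x̂ of the backward pass, and a subgradient λ,
# so the approximation evaluates as min over cuts of v* + λ'(x - x̂).
struct Cut
    v::Float64            # optimal value v* at the trial point
    xhat::Vector{Float64} # trial state (incoming holdings)
    λ::Vector{Float64}    # subgradient with respect to the incoming state
end

addcut!(pool::Vector{Cut}, v, xhat, λ) = push!(pool, Cut(v, xhat, λ))

# Evaluate the current outer approximation at a state x.
evaluate(pool::Vector{Cut}, x) =
    isempty(pool) ? Inf : minimum(c.v + dot(c.λ, x - c.xhat) for c in pool)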

B Details of case studies

B.1 Illustrative 3-asset case study

The main goal of this experiment was to compare the alternative SDP approach with MSDDP. The regression coefficients a_r and b_r are the same as in Brown and Smith (2011), as shown in Table 4.

Table 4: 3-asset regression model

                                Large-cap   Mid-cap    Small-cap   Dividend Yield
Mean (a_r, a_z)                 0.0053      0.0067     0.0072      0.0000
Regression coeff. (b_r, b_z)    0.0028      0.0049     0.0061      0.9700

Covariance (Σ_εη)
                  Large-cap   Mid-cap    Small-cap   Dividend Yield
Large-cap         0.002894    0.003532   0.00391     −0.000115
Mid-cap                       0.004886   0.005712    −0.000144
Small-cap                                0.007259    −0.000163
Dividend Yield                                        0.052900

The risk-free rate is r_f = 0.00042.

B.2 Realistic 100-asset and 5-factor case study

As it is a factor model, the Markov states are only estimated for the factors. To construct the return regressions, the vectors a_r and b_r are obtained using linear regression. The estimates of the factor probability distributions conditioned on each Markov state are given by

f_1 ∼ N(μ_1, Σ_1), with
μ_1 = (0.0232, 0.0073, −0.0135, −0.0145, −0.0024)^⊤ and Σ_1 (lower triangle):
       ·
    −0.0004   0.0039
    −0.0015  −0.0008   0.0033
    −0.0001  −0.0017   0.0016   0.0029
    −0.0005   0.0004   0.0014  −0.0002   0.0014

f_2 ∼ N(μ_2, Σ_2), with
μ_2 = (0.0070, 0.0021, 0.0024, 0.0024, 0.0017)^⊤ and Σ_2 (lower triangle):
     0.0013
     0.0003   0.0006
    −0.0002   0.0000   0.0005
     0.0000  −0.0001  −0.0001   0.0002
    −0.0001   0.0000   0.0003  −0.0001   0.0003

f_3 ∼ N(μ_3, Σ_3), with
μ_3 = (−0.0375, −0.0013, 0.0195, 0.0125, 0.0174)^⊤ and Σ_3 (lower triangle):
     0.0062
     0.0024   0.0023
     0.0000   0.0005   0.0022
    −0.0007  −0.0001   0.0006   0.0013
    −0.0009  −0.0001   0.0012   0.0007   0.0012

where log(1 + r_{t+1}) = a_r + b_r F_t + ε_t and F_t = f_k if K_t = k.
