
92 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2010 1556-603X/10/$26.00©2010IEEE

Rubén Ruiz-Torrubiano and Alberto Suárez Universidad Autónoma de Madrid, SPAIN

Digital Object Identifier 10.1109/MCI.2010.936308

Abstract: A novel memetic algorithm that combines evolutionary algorithms, quadratic programming, and specially devised pruning heuristics is proposed for the selection of cardinality-constrained optimal portfolios. The framework used is the standard Markowitz mean-variance formulation for portfolio optimization with constraints of practical interest, such as minimum and maximum investments per asset and/or on groups of assets. Imposing limits on the number of different assets that can be included in the investment transforms portfolio selection into an NP-complete mixed-integer quadratic optimization problem that is difficult to solve by standard methods. An implementation of the algorithm that employs a genetic algorithm with a set representation, an appropriately defined mutation operator and Random Assortment Recombination for crossover (RAR-GA) is compared with implementations using Simulated Annealing (SA) and various Estimation of Distribution Algorithms (EDAs). An empirical investigation of the performance of the portfolios selected with these different methods using financial data shows that RAR-GA and SA are superior to the implementations with EDAs in terms of both accuracy and efficiency. The use of pruning heuristics that effectively reduce the dimensionality of the problem by identifying and eliminating from the universe of investment those assets that are not expected to appear in the optimal portfolio leads to significant improvements in performance and makes EDAs competitive with RAR-GA and SA.

I. Introduction

The selection of optimal investment portfolios is a problem of great interest in the area of quantitative finance and has attracted much attention in the scientific community [1] [2] [3] [4] [5] [6] [7]. The problem is of practical importance for investors, whose objective is to allocate capital in an optimal manner among assets while respecting some investment restrictions. Several models and methods for solving this problem have been proposed, mostly within the classical framework developed by H. Markowitz in 1952 [8]. In the work of Markowitz, which sets the foundations of modern portfolio optimization theory, the returns of the products which compose the portfolio are uncertain and are therefore modeled as random variables. The profit of the investment is the expected portfolio return. The risk measure is the variance of the portfolio returns. Therefore, portfolio selection is a multiobjective optimization task with two opposing goals: the maximization of profit and the minimization of risk. Since these cannot be simultaneously achieved, the problem is to identify the efficient frontier, which is defined as the set of solutions that are Pareto optimal. A portfolio is in the efficient frontier if, for a given risk, no other portfolio with a larger expected return can be built by modifying the investment weights. By duality, this same portfolio is the one that minimizes risk for that value of expected return.

Without further constraints the identification of portfolios in the efficient frontier is a quadratic optimization problem that can be readily solved by standard numerical techniques [9]. A difficulty with this approach is that the portfolios resulting from this unconstrained optimization typically invest small amounts in large numbers of products to take advantage of the benefits of diversification and reduce the overall risk. This type of investment strategy can be difficult to implement in practice: portfolios composed of a large number of assets are difficult to manage and may incur high transaction costs. To address this shortcoming, several restrictions can be imposed on the allocation of capital among assets. One can limit the total number of assets in the final portfolio or impose lower and upper bounds on the proportion of capital invested in each product. These constraints make the problem difficult to solve by standard optimization techniques. In fact, the problem becomes NP-complete [10], [7]. Nonetheless, heuristic optimization methods can be applied to identify near-optimal solutions at a reasonable computational cost. Several general optimization techniques have been proposed to solve this problem: evolutionary algorithms [11] [12] [1], simulated annealing [13] [3], and tabu search [14] [6], among others.

Optimization problems that share some of the challenging characteristics of the portfolio selection problem (multiple objectives, uncertainty and complex constraints) have been the object of intense recent research efforts. In particular, there have been important contributions in the application of evolutionary multiobjective optimization methods to the solution of real-world problems requiring expensive computations [15], [16], [17]. Memetic algorithms have been identified as one of the most promising approaches to complex optimization problems [18], [19]. Their strength lies in the combination of population-based metaheuristic search, which imparts generality to this methodology, with periods of local search, whose objective is to provide a mechanism for the individual improvement of the candidate solutions. In this manner, it is possible to introduce in the evolutionary process domain-specific knowledge that can help identify high-quality solutions with a reduced computational effort.

In this work, we apply these ideas to construct a variety of portfolio optimization algorithms. Each algorithm combines three components: a metaheuristic (a genetic algorithm with a RAR crossover operator [7] [20], simulated annealing, or one of various estimation of distribution algorithms [21]), quadratic programming [9], and specially designed pruning heuristics that effectively reduce the size of the search space. The problem of finding the optimal subset of assets for investment is addressed by the metaheuristic technique. The continuous optimization problem of finding the optimal weights given this subset is solved using a standard quadratic optimization algorithm. Different variants of EDAs are investigated in this work, including models in which all the variables are assumed to be statistically independent [22] [23], and models in which more complex interactions between the variables are allowed [24]. The evaluation and comparison of the different methods are carried out on publicly available benchmark data from the OR-Library [25], which includes data from assets included in major world stock indexes. The experiments performed show that efficient and accurate solutions are obtained when special pruning heuristics are applied. Pruning attempts to identify, and eliminate from the universe of assets available for investment, those products that are unlikely to appear in the optimal portfolio. The use of pruning heuristics not only improves the results of the hybrid methods based on EDAs, but also enhances the efficiency of SA and of genetic algorithms with a set representation (RAR-GA) [7].

The presentation is organized as follows: Section II introduces the Markowitz mean-variance framework for portfolio selection. In Section III the hybrid optimization strategies used to solve the problem are described. Section IV presents the results of an empirical comparison of the methods proposed and Section V summarizes the conclusions of the current work.

II. Optimal Portfolio Selection

A. The Markowitz Mean-Variance Model

The work published by H. Markowitz in 1952 [8] is considered the foundation of modern portfolio theory. It provides a mathematical framework for optimal portfolio selection. The problem consists of allocating a fixed amount of capital among different assets, whose evolution is subject to uncertainty, in such a way that the expected return of the investment is maximized and the risk associated with it is minimized. The approach adopted by Markowitz is to model the asset returns as a random vector whose distribution can be fully characterized by a vector of expected returns and a covariance matrix. If the returns of the assets were deterministic (zero variance) the optimal solution would be to invest only in the asset with the highest expected return. In the presence of uncertainty this solution need not be optimal. The investor may prefer to settle for a lower expected return provided that the uncertainty in the investment is reduced as well. This establishes a rational basis for understanding the potential benefits of diversification.

Let $N$ be the number of assets in $U$, the universe of products available for investment. Consider the prices of each market product at time instant $t$, $\{S_i(t)\}_{i=1}^{N}$. The Markowitz framework considers the evolution of the portfolio during the period $[t, t+\Delta t]$. The composition of the portfolio ($\{c_i\}_{i=1}^{N}$, where $c_i$ is the number of units of product $i$) is determined at the beginning of this period and held constant during the interval considered. The evolution of the value of the portfolio during this period is

$$P(\tau) = \sum_{i=1}^{N} c_i S_i(\tau), \quad \tau \in [t, t+\Delta t). \tag{1}$$

Since the composition of the portfolio is fixed, the changes in portfolio value are due to changes in the market value of the assets of which it is composed (i.e., market capitalization). The return of the portfolio in the interval $[t, t+\Delta t)$ is

$$r_P = \frac{P(t+\Delta t) - P(t)}{P(t)} = \sum_{i=1}^{N} w_i r_i = \mathbf{w}^T \cdot \mathbf{r},$$

where $\mathbf{r}^T = (r_1, r_2, \ldots, r_N)$ is the transposed vector of asset returns

$$r_i = \frac{S_i(t+\Delta t) - S_i(t)}{S_i(t)}, \tag{2}$$

and $\mathbf{w}^T = (w_1, w_2, \ldots, w_N)$ is the transposed vector of asset weights

$$w_i = \frac{c_i S_i(t)}{\sum_{j=1}^{N} c_j S_j(t)}; \quad \sum_{i=1}^{N} w_i = 1; \quad 0 \le w_i \le 1, \tag{3}$$

which determines the fraction of capital invested in each asset. Thus, the return of the portfolio is a convex combination of the returns of the assets included in the portfolio. Following Markowitz, we make the assumption that the distribution of the returns is completely determined by the vector of means $\hat{\mathbf{r}}$ and the covariance matrix $\boldsymbol{\Sigma}$. In this work these parameters are assumed to be known and given as input to the model. Their estimation from historical and market data is itself an active research field [26], [27].

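The expected value and the variance of the portfolio returns can be expressed in matrix form as

$$E(r_P) = \sum_{i=1}^{N} w_i \hat{r}_i = \mathbf{w}^T \cdot \hat{\mathbf{r}} \tag{4}$$

$$\mathrm{Var}(r_P) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_{ij} = \mathbf{w}^T \cdot \boldsymbol{\Sigma} \cdot \mathbf{w}, \tag{5}$$

where $\hat{r}_i$ is the expected return of asset $i$ and $\sigma_{ij}$ is the element of the covariance matrix $\boldsymbol{\Sigma}$ for assets $i$ and $j$.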
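As a small numerical illustration of Eqs. (4) and (5), the following sketch evaluates the expected return and the variance of a portfolio from its weights. The function name `portfolio_mean_and_variance`, the argument names and the two-asset figures are assumptions made for this example; they do not come from the article.

```python
def portfolio_mean_and_variance(w, r_hat, sigma):
    """Return (E[r_P], Var[r_P]) as in Eqs. (4) and (5).

    w      -- list of portfolio weights (assumed to sum to 1)
    r_hat  -- list of expected asset returns
    sigma  -- covariance matrix as a list of rows
    """
    n = len(w)
    mean = sum(w[i] * r_hat[i] for i in range(n))            # w^T . r_hat
    var = sum(w[i] * w[j] * sigma[i][j]                      # w^T . Sigma . w
              for i in range(n) for j in range(n))
    return mean, var


# Hypothetical two-asset universe with uncorrelated returns (illustrative numbers).
w = [0.5, 0.5]
r_hat = [0.10, 0.04]
sigma = [[0.04, 0.00],
         [0.00, 0.01]]
mean, var = portfolio_mean_and_variance(w, r_hat, sigma)
```

With these illustrative inputs the equally weighted portfolio has a variance well below the average of the two individual variances, which is the diversification effect discussed earlier.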

The portfolio selection problem can be formulated as the solution of two different optimization problems that have common solutions. In the first formulation, the goal is to minimize a measure of risk for a fixed target value $R^*$ of the expected return

$$
\begin{aligned}
\min_{\mathbf{w}} \quad & \mathrm{Var}(r_P) = \mathbf{w}^T \cdot \boldsymbol{\Sigma} \cdot \mathbf{w} \\
\text{s.t.} \quad & \mathbf{w}^T \cdot \hat{\mathbf{r}} = R^* \\
& \sum_{i=1}^{N} w_i = 1, \quad w_i \ge 0, \quad i = 1, \ldots, N,
\end{aligned}
\tag{6}
$$

where $\hat{\mathbf{r}}$ is the vector of expected asset returns and $\boldsymbol{\Sigma}$ is the covariance matrix. The dual of this convex optimization problem [28] consists in maximizing the expected return while the variance of the portfolio returns is held constant at a specified value $(\sigma^2)^*$, which is determined by the risk profile of the investor. The solution involves optimizing a linear function with linear and quadratic constraints:

$$
\begin{aligned}
\max_{\mathbf{w}} \quad & E(r_P) = \mathbf{w}^T \cdot \hat{\mathbf{r}} \\
\text{s.t.} \quad & \mathbf{w}^T \cdot \boldsymbol{\Sigma} \cdot \mathbf{w} = (\sigma^2)^* \\
& \sum_{i=1}^{N} w_i = 1, \quad w_i \ge 0, \quad i = 1, \ldots, N.
\end{aligned}
\tag{7}
$$

There are several efficient quadratic optimization techniques that can be used to minimize a quadratic form subject to linear equality and inequality constraints [9]. For this reason the first formulation has been chosen. The set of portfolios that are optimal (i.e., that minimize the risk) for fixed values of the expected return $R^*$, where $R^*$ is allowed to vary in the interval $[R_{min}, R_{max}]$, is the efficient frontier. Each point in the efficient frontier is said to be Pareto efficient. Points on the efficient frontier correspond to minimum-risk portfolios for a given expected return, or, alternatively, portfolios that have the largest expected return from a family of portfolios with equal risk.
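To make the first formulation concrete, here is a minimal sketch that computes a minimum-variance portfolio for a target return by solving the linear optimality (KKT) system of the equality-constrained problem. This is a deliberate simplification and not the solver used in the article: the non-negativity constraint on the weights is not enforced, so the result is only valid when the returned weights happen to be non-negative, and a proper quadratic programming routine is needed in general. The helper names `solve_linear_system` and `min_variance_weights` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def min_variance_weights(sigma, r_hat, target):
    """Minimize w^T Sigma w subject to w^T r_hat = target and sum(w) = 1.

    Assumption: the constraint w >= 0 of Eq. (6) is NOT enforced here.
    """
    n = len(r_hat)
    # Stationarity rows: 2 Sigma w + lambda_1 r_hat + lambda_2 1 = 0
    kkt = [[2.0 * sigma[i][j] for j in range(n)] + [r_hat[i], 1.0]
           for i in range(n)]
    kkt.append(list(r_hat) + [0.0, 0.0])               # target-return constraint
    kkt.append([1.0] * n + [0.0, 0.0])                 # budget constraint
    rhs = [0.0] * n + [target, 1.0]
    return solve_linear_system(kkt, rhs)[:n]           # drop the multipliers
```

Sweeping `target` over an interval of attainable returns and recording the resulting variance would trace an approximation of the efficient frontier, again under the stated simplification.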

B. Constraints in Portfolio Selection Problems

There are several ways of refining the standard Markowitz model to incorporate constraints that are commonly used in real-world portfolio selection problems. These restrictions are a consequence of market rules and conditions for investment or simply reflect different investor profiles and preferences. For instance, constraints can be included to control the amount of desired diversification, i.e., minimum and maximum bounds on the investment on an individual asset or on a group of assets. An investor may also want to limit the maximum number of assets included in her portfolio, either to simplify the management of the portfolio or to reduce transaction costs.

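In this work, we assume that the investor constructs a portfolio from scratch and that there are no transaction costs. The constrained optimization problem is

$$\min_{\mathbf{z},\mathbf{w}} \ \mathrm{Var}(r_P) = \mathbf{w}_{[z]}^T \cdot \boldsymbol{\Sigma}_{[z,z]} \cdot \mathbf{w}_{[z]} \tag{8}$$

$$\text{s.t.} \quad \mathbf{w}_{[z]}^T \cdot \hat{\mathbf{r}}_{[z]} = R^* \tag{9}$$

$$\mathbf{a}_{[z]} \le \mathbf{w}_{[z]} \le \mathbf{b}_{[z]}, \quad \mathbf{a}_{[z]} \ge \mathbf{0}, \quad \mathbf{b}_{[z]} \ge \mathbf{0} \tag{10}$$

$$\mathbf{l} \le \mathbf{A}_{[z]} \cdot \mathbf{w}_{[z]} \le \mathbf{u} \tag{11}$$

$$\mathbf{z}^T \cdot \mathbf{1} \le K \tag{12}$$

$$\mathbf{w}^T \cdot \mathbf{1} = 1, \quad \mathbf{w} \ge \mathbf{0}. \tag{13}$$

The elements of the binary vector $\mathbf{z}$ specify whether product $i$ is included in the final portfolio ($z_i = 1$) or not ($z_i = 0$). The column vector $\mathbf{w}_{[z]}$ is obtained by removing from $\mathbf{w}$ the components $i$ for which $z_i = 0$; the vector of expected returns $\hat{\mathbf{r}}_{[z]}$ and the bound vectors $\mathbf{a}_{[z]}$ and $\mathbf{b}_{[z]}$ are restricted in the same way. Similarly, matrix $\mathbf{A}_{[z]}$ is obtained by eliminating the $i$-th column of $\mathbf{A}$ whenever $z_i = 0$. Finally, $\boldsymbol{\Sigma}_{[z,z]}$ is obtained by removing from $\boldsymbol{\Sigma}$ the rows and columns for which the corresponding indicator is zero ($z_i = 0$). The symbols $\mathbf{0}$ and $\mathbf{1}$ denote vectors whose entries are all equal to 0 or to 1, respectively.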
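The restriction of the problem data to the selected assets can be sketched as follows. The function names `restrict_to_selection` and `satisfies_cardinality` are hypothetical, and the data are plain Python lists chosen only for illustration.

```python
def restrict_to_selection(z, sigma, r_hat):
    """Keep only the entries of sigma and r_hat whose indicator z_i equals 1.

    Returns (idx, sigma_zz, r_z): the selected indices, the covariance
    sub-matrix Sigma[z,z] and the sub-vector of expected returns r_hat[z].
    """
    idx = [i for i, zi in enumerate(z) if zi == 1]
    sigma_zz = [[sigma[i][j] for j in idx] for i in idx]   # drop rows and columns
    r_z = [r_hat[i] for i in idx]                          # drop components
    return idx, sigma_zz, r_z


def satisfies_cardinality(z, k_max):
    """Cardinality constraint (12): at most k_max assets are selected."""
    return sum(z) <= k_max
```

For a universe of four assets and `z = [1, 0, 1, 0]`, the reduced problem only involves assets 0 and 2 (0-based indices), so the continuous optimization runs on a 2 x 2 covariance matrix.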

Minimum and maximum investment constraints, which set a lower and an upper bound on the investment of each asset in the portfolio, are captured by (10). Vectors $\mathbf{a}$ and $\mathbf{b}$ are $N \times 1$ column vectors with the lower and upper bounds on the portfolio weights, respectively. The inequality (11) reflects the $M$ concentration of capital constraints. The $m$-th row of the $M \times N$ matrix $\mathbf{A}$ is the vector of coefficients of the linear combination that defines the constraint. The $M \times 1$ column vectors $\mathbf{l}$ and $\mathbf{u}$ correspond to the lower and upper bounds of the $M$ linear restrictions, respectively. Concentration of capital constraints can be used, for instance, to limit the amount of capital invested in a group of assets, so that investor preferences on certain asset classes can be formally expressed. Since these constraints are linear, they do not increase the difficulty of the problem, which can still be efficiently solved by quadratic programming. Expression (12) corresponds to the cardinality constraint, which sets a bound on the number of assets that can be included in the final portfolio. This constraint transforms the problem into a mixed-integer quadratic programming problem, which is no longer convex. Finally, equation (13) is a budget constraint that ensures that the whole amount of capital is invested in the portfolio. Note that the portfolio selection problem with the cardinality constraint as an inequality (12) can be solved by selecting the best of the optimal solutions of a collection of problems that use equality cardinality constraints $\sum_{i=1}^{N} z_i = k$, for $k = 1, 2, \ldots, K$.

C. Previous Work

Several authors have addressed the problem of selecting optimal portfolios with different approaches. Branch-and-Bound techniques were used in [2] to solve the problem exactly. Despite the fact that these techniques improved the efficiency of the search, the time needed to find the global optimum still grows exponentially with the number of assets available for investment. Genetic Algorithms have also been used to address this problem. In [1], the performance of GAs was compared to Simulated Annealing (SA) [13] and to Tabu Search [14]. In this work the best results were obtained by pooling the results of the different heuristics. In [3] Simulated Annealing was used to explore the space of real-valued asset weights. Tabu Search was employed in [6]. This work focused on the design of appropriate neighborhood operators to improve the efficiency of the search. In [29], a hybrid portfolio selection and forecasting approach was designed by combining ARX, Grey systems and rough set theory. The authors predicted stock market trends and, making use of these predictions, selected an investment portfolio based on automatically generated investment rules. Several factors reflecting company wealth information were also considered in their approach. In [30] [31] Multi-Objective Evolutionary Algorithms (MOEAs) were used to address the problem. These algorithms employed a hybrid encoding instead of a pure continuous one. Heuristic operators were applied to repair infeasible individuals generated in the course of the evolution. The impact of Lamarckism and of local search improvements was also analyzed in this work. The authors concluded that the hybrid encoding used improves the overall performance of the algorithm.

A hybrid optimization approach to the problem of tracking a stock index by partial replication is also proposed in [32], [33], [34]. Index tracking is closely related to the selection of optimal portfolios. The goal is to build a tracking portfolio by investing in a subset of the assets that are included in the index. In these articles, the discrete optimization problem of calculating the optimal subset of assets to be included in the portfolio, and the problem of determining the weight of the selected assets by minimizing the tracking error, were handled separately. The subset selection problem was solved using a genetic algorithm with different mutation and crossover operators that are specially designed to ensure that the cardinality constraints are satisfied by the chromosomes generated by their application. In particular, the set representation and the RAR crossover operators introduced in [32] are used in the current investigation. A comparison between genetic algorithms that use a standard binary representation, where the violations of the cardinality constraint are either penalized or avoided by repairing infeasible individuals, and a genetic algorithm based on set representations of fixed cardinality, as in [32], was performed in [7], [35]. These articles showed that using penalty terms in the cost function is not an effective method for handling infeasible individuals when a binary encoding is used. In contrast, algorithms in which the generated individuals always remain feasible, either by repair techniques or by using specially designed operators that preserve the cardinality of the candidate solutions, generally perform well. In the current approach, we extend and improve this work by considering additional metaheuristics and by exploiting the advantages of pruning techniques that effectively reduce the size of the universe of assets where the search is conducted.

III. A Hybrid Optimization Approach for Portfolio Selection

When the cardinality constraint is relaxed, the problem of optimal portfolio selection involves minimizing a quadratic form with linear restrictions. This optimization problem can be efficiently solved by standard quadratic programming algorithms. Therefore, a possible approach to solving the portfolio selection problem with cardinality constraints is to enumerate all possible subsets of $K$ or fewer assets and, for each of these subsets, to solve the associated quadratic program that considers only the assets in the subset for the optimization. Of course, such an exhaustive enumeration scheme is not practical in this context: for problems of moderate size, say $N = 100$ and $K = 10$, assuming that each quadratic optimization takes 1 ms, we would need more than 500 years to find the optimum. Instead of this exact method based on exhaustive search, we propose to handle the combinatorial part of the problem by an optimization metaheuristic. Each of the candidate solutions to the optimization problem corresponds to selecting a subset of assets of the appropriate cardinality. The quadratic solver is then used to find the optimal weights for a portfolio that invests only in the assets included in this subset. In this work, three metaheuristic approaches to solve the combinatorial part of the problem are compared: genetic algorithms (GAs), simulated annealing (SA) and estimation of distribution algorithms (EDAs).
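The running-time estimate for exhaustive enumeration can be checked with a few lines of arithmetic. The sketch below counts the subsets for $N = 100$ and $K = 10$ and converts the number of quadratic programs into years at 1 ms each; the variable names and the length assumed for a year (365.25 days) are choices made for this example.

```python
import math

N, K = 100, 10
SECONDS_PER_QP = 1e-3                      # 1 ms per quadratic program, as in the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumed length of a year

# Subsets of exactly K assets, and subsets of K or fewer assets.
subsets_exactly_k = math.comb(N, K)
subsets_up_to_k = sum(math.comb(N, k) for k in range(1, K + 1))

years_exactly_k = subsets_exactly_k * SECONDS_PER_QP / SECONDS_PER_YEAR
years_up_to_k = subsets_up_to_k * SECONDS_PER_QP / SECONDS_PER_YEAR
```

Subsets of exactly ten assets alone already require more than 500 years, in agreement with the figure quoted above; including the smaller subsets only increases the total.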

A. Genetic Algorithms

The pseudocode for a general genetic algorithm is presented in Figure 1. In this pseudocode, $P$ denotes the size of the population, $n_P$ the number of individuals selected for reproduction, $p_C$ the probability of performing crossover and $p_M$ the probability of mutation. In [7] a genetic algorithm [12] with a set representation and specially designed crossover and mutation operators (RAR-GA) [32] was used to efficiently solve the problem. In this representation, an individual is encoded as a set that contains the labels of the products included in the portfolio. For instance, the set $\{1, 2, 5\}$ represents the optimal portfolio where only investments in products labeled 1, 2 and 5 are considered. A mutation operator that preserves the cardinality of the solution is defined as the interchange of a product in the portfolio with another product not included in the portfolio. The Random Assortment Recombination (RAR [20]) operator is used for crossover and preserves the cardinality of the solution as well. RAR involves a positive integer parameter, $w$, that controls the amount of common information from the parents that is retained by the offspring. This operator makes use of six sets: Set $A$ is the intersection set, which contains products that appear in both parents. Set $B$ includes the products not present in any parent. Sets $C$ and $D$ contain the products present in only one parent. Set $E$ is initially empty ($E = \emptyset$). An additional set $G$ is then created with $w$ copies of the products from $A$ and $B$ and one copy of the products in $C$ and $D$. A child chromosome is generated by extracting one product from $G$ in each iteration. Consider the case that product $g$ is extracted from $G$: if it originally comes from $A$ or $C$ and $g \notin E$, then it is added to the child. If $g \in B$ or $g \in D$, then it is included in set $E$. The process is terminated when the child has the required cardinality or when $G = \emptyset$. If the latter happens, then the child is completed with assets not yet included, which are selected at random. By this last step, the operator is able to produce implicit mutations in the child with a small but non-zero probability.
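The description of RAR above can be turned into a short sketch. The code below is one possible reading of the text, not the authors' implementation: each entry of $G$ is stored as a (product, source) pair so that a product drawn "from $C$" (include) can be told apart from the same product drawn "from $D$" (exclude), and a product that has already been added to the child is kept even if it is excluded later. The function name `rar_crossover` and its arguments are hypothetical.

```python
import random


def rar_crossover(parent1, parent2, universe, w, rng):
    """Random Assortment Recombination for fixed-cardinality subsets (sketch).

    parent1, parent2 -- sets of asset labels with the same cardinality
    universe         -- set with all asset labels
    w                -- positive integer weight given to common information
    rng              -- random.Random instance
    """
    k = len(parent1)
    a = parent1 & parent2                    # A: present in both parents
    b = universe - (parent1 | parent2)       # B: present in neither parent
    cd = parent1 ^ parent2                   # C and D: present in exactly one parent

    # Bag G: w copies of the tokens from A and B, one copy of those from C and D.
    bag = [(p, "A") for p in sorted(a) for _ in range(w)]
    bag += [(p, "B") for p in sorted(b) for _ in range(w)]
    bag += [(p, "C") for p in sorted(cd)] + [(p, "D") for p in sorted(cd)]
    rng.shuffle(bag)

    child, excluded = set(), set()           # excluded plays the role of set E
    while bag and len(child) < k:
        product, source = bag.pop()
        if source in ("A", "C"):
            if product not in excluded:
                child.add(product)
        else:                                # source is B or D
            excluded.add(product)

    # If G ran out first, complete the child with random assets not yet included.
    remaining = sorted(universe - child)
    rng.shuffle(remaining)
    child.update(remaining[: k - len(child)])
    return child
```

When both parents are identical, sets $C$ and $D$ are empty and the child reproduces the parents exactly, which is a convenient sanity check for any implementation of the operator.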

The fitness function of an individual is the negative of the risk associated with the optimal portfolio that invests only in the products included in the chromosome of the individual. To implement selection, the individuals in the population are ranked from lower to higher risk. In the crossover step, the parents are selected in two binary tournaments. The worst individual of the previous population is then replaced by the offspring generated by the RAR operator.
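The selection and replacement scheme just described can be sketched as a single steady-state step. The names `binary_tournament` and `steady_state_step` are hypothetical, `risk` stands in for the quadratic programming evaluation of a subset, and `crossover` is any operator that returns a subset of the same cardinality; none of these signatures come from the article.

```python
import random


def binary_tournament(population, risk, rng):
    """Draw two distinct individuals at random and return the one with lower risk."""
    first, second = rng.sample(population, 2)
    return first if risk(first) <= risk(second) else second


def steady_state_step(population, risk, crossover, rng):
    """One generation: two tournaments, one offspring, replace the worst individual."""
    parent1 = binary_tournament(population, risk, rng)
    parent2 = binary_tournament(population, risk, rng)
    child = crossover(parent1, parent2)
    # The worst individual is the one with the highest risk (lowest fitness).
    worst = max(range(len(population)), key=lambda i: risk(population[i]))
    population[worst] = child
    return population
```

Since every call to `risk` would involve solving a quadratic program in the real setting, caching the values per subset is a natural refinement that is omitted here for brevity.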

B. Simulated Annealing

The simulated annealing (SA) metaheuristic is an optimization technique inspired by the field of thermodynamics. It was first introduced in [13], where it was applied to the design of circuits and to the traveling salesman problem [36]. The main idea is to mimic the process of slowly cooling a molten solid to form a crystal, which corresponds to the lowest energy configuration at zero temperature. If the system is cooled too fast, the state reached may not correspond to a regular lattice, characteristic of a crystalline solid, but to an amorphous condensed phase (a glass), which is a metastable local minimum of the free energy. To find the global energy minimum, the cooling process must be sufficiently slow, so that the system is close to equilibrium at all stages in the process. A high temperature is used initially so that the configuration space can be effectively explored. As the temperature is decreased, the search focuses on regions of the configuration space with lower values of the free energy until an optimum is reached. At high temperatures most transitions are accepted and the search is mainly stochastic. Lowering the temperature favors configuration changes that decrease the energy. At zero temperature, the search proceeds by steepest descent and the only transitions accepted are those that lower the total energy of the system.

Based on this analogy, a general optimization algorithm can

be designed. SA performs a biased local neighborhood search

which can in principle avoid getting trapped in local optima.

For sufficiently slow annealing schedules, assuming a connected

search space, it asymptotically converges to the global optimum.

Suppose we are given a function $f(\mathbf{z})$, with $\mathbf{z} \in Z$ (which can be a discrete or a continuous space), and a neighborhood operator $\mathcal{N}: Z \to \mathcal{P}(Z)$, where $\mathcal{P}(Z)$ denotes the power set of $Z$. The algorithm is detailed in Figure 2. In this algorithm the parameter $T_0$ denotes the initial temperature and $L$ the number of annealing generations. Variable $\mathbf{z}^{(k)}$ encodes the candidate configuration after the current annealing step, and $\mathbf{z}'$ denotes a configuration in the neighborhood of the current candidate solution, $\mathcal{N}(\mathbf{z}^{(k)})$. Simulated annealing proceeds from a given state by considering transitions to neighboring configurations. If the change in configuration leads to a decrease of the value of the objective function, the change is accepted. If the proposed change increases the value of the objective function, there is a nonzero probability of accepting the new configuration. This probability is large at high temperatures and becomes smaller as the temperature is decreased. In the limit of infinite temperature, all proposals are accepted and the search is purely stochastic. In the limit of zero temperature only changes that lower the value of the objective function are accepted and the search becomes greedy. The function $P_{accept}$ gives the probability of accepting a transition which does not reduce the energy. It is a function of the current state, $\mathbf{z}^{(k)}$, of the randomly generated configuration $\mathbf{z}'$ in the neighborhood of $\mathbf{z}^{(k)}$ and of the temperature:

$$P_{accept}(\mathbf{z}', \mathbf{z}^{(k)}; T_k) = \min\left(1, \exp\left(-\frac{f(\mathbf{z}') - f(\mathbf{z}^{(k)})}{T_k}\right)\right).$$

The operator annealingSchedule takes as argument the temperature in a given epoch $T_k$ and returns the temperature $T_{k+1}$ for the following epoch. A common choice is the geometrical schedule [13]: $T_k = \alpha T_{k-1}$, $\alpha \in (0, 1)$. In principle, the closer $\alpha$ is to 1, the better the final solution is, at the expense of longer execution times. Small values of $\alpha$ correspond to fast annealing, which typically leads to suboptimal local energy minima.

FIGURE 1 Pseudocode for the genetic algorithm (GA).

Simulated annealing can be used to solve the combinatorial part of the portfolio selection problem with cardinality constraints. Candidate solutions are represented by sets of fixed cardinality. The neighborhood operator is defined as the interchange of one product in the portfolio with another one not included in it. Note that this is exactly the same as the mutation operator used in the RAR-GA approach. This makes it possible to determine the differential contribution of the RAR crossover operator in the identification of high-quality solutions. The free energy function is the risk of the optimal portfolio that invests only in the products included in the set that defines the candidate solution. This function is calculated by quadratic programming in the same way as the fitness value in the RAR-GA approach.
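A minimal sketch of simulated annealing over subsets of fixed cardinality is given below, under several assumptions: `risk` is a caller-supplied function that stands in for the quadratic programming evaluation, the acceptance rule is the usual min(1, exp(-(f_new - f_old)/T)) form, the schedule is geometric, and the number of proposals per temperature (`steps_per_epoch`) is a parameter introduced only for this example. All function and parameter names are hypothetical, and the subset size `k` is assumed to be smaller than the universe.

```python
import math
import random


def p_accept(f_new, f_old, temperature):
    """Acceptance probability min(1, exp(-(f_new - f_old) / T))."""
    if f_new <= f_old:
        return 1.0
    return math.exp(-(f_new - f_old) / temperature)


def swap_neighbor(subset, universe, rng):
    """Exchange one product in the subset for one outside it (cardinality is kept)."""
    inside = rng.choice(sorted(subset))
    outside = rng.choice(sorted(universe - subset))
    return (subset - {inside}) | {outside}


def simulated_annealing(risk, universe, k, t0, alpha, n_epochs, steps_per_epoch, rng):
    """Search for a low-risk subset of k assets; returns (best_subset, best_risk)."""
    current = frozenset(rng.sample(sorted(universe), k))
    f_current = risk(current)
    best, f_best = current, f_current
    temperature = t0
    for _ in range(n_epochs):
        for _ in range(steps_per_epoch):
            candidate = frozenset(swap_neighbor(current, universe, rng))
            f_candidate = risk(candidate)
            if rng.random() < p_accept(f_candidate, f_current, temperature):
                current, f_current = candidate, f_candidate
                if f_current < f_best:
                    best, f_best = current, f_current
        temperature *= alpha                 # geometric schedule T_k = alpha * T_{k-1}
    return best, f_best
```

Keeping track of the best subset visited, rather than returning the final state, is a common practical choice; it guarantees that the reported risk is never worse than that of the initial random subset.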

C. Estimation of Distribution Algorithms

Estimation of distribution algorithms (EDAs) are a class of evolutionary algorithms in which diversity is generated by a probabilistic sampling scheme [21]. These algorithms optimize in either continuous or discrete domains by extracting individuals from a probability distribution, which is modified in each generation to obtain progressively better individuals as the evolution proceeds. Candidate solutions are represented by chromosomes. At each stage in the evolutionary process, a probability distribution is used to characterize the genotype of the generation in a statistical manner. A population of individuals is obtained by generating random samples from this distribution. Selection is used to identify the subset of individuals in the sample that have the highest fitness. This subset is then used to define a new probability distribution for the genotype. Finally, new individuals are sampled from this distribution and another generation begins. This process is repeated until some convergence criteria are met. The pseudocode for a generic EDA is detailed in Figure 3. In this pseudocode, $P$ denotes the size of the population and $M \le P$ the number of individuals which are used to update the probability distribution. The symbol $\mathbf{z}^{(k,i)}$ is used to represent the chromosome in the $i$-th position of the population in generation $k$. The sets $D_k$ and $D_k^{Se}$ denote the population in generation $k$ and the subset of the population selected to update the probability distribution, respectively.

Numerous studies have shown that EDAs can be competitive with, and even outperform, GAs in many different domains, especially in optimization tasks where the interrelation between variables is complex or unknown [21] [23] [37]. One of the objectives of this work is to determine whether EDAs are also effective in a complex combinatorial optimization task like portfolio selection with cardinality constraints. As will be illustrated by the experiments performed, algorithms of the EDA type have difficulties with problems where the search takes place in a high-dimensional space. To address this limitation, we propose to reduce the dimensionality of the space of candidate solutions using pruning techniques that discard assets whose probability of being selected in the optimal solution is small.

As with GAs and SA, EDAs can be used to solve the combinatorial optimization part of the portfolio selection problem. Unless otherwise stated, we assume that binary variables $z_i \in \{0, 1\}$ are used and $p(z_i) = p(z_i = 1)$. The fitness function $F: B^N \to \mathbb{R}$ measures the quality of a given candidate solution $\mathbf{z}$ as $F(\mathbf{z})$, with $B = \{0, 1\}$. Different families of EDAs have been proposed in the literature [21]. They mainly differ in the form assumed for the probability distribution of the genotype. We now describe the particular EDAs considered in this research.

FIGURE 2 Pseudocode for simulated annealing (SA).

FIGURE 3 Pseudocode for an estimation of distribution algorithm (EDA).

1) Univariate Marginal Distribution Algorithm (UMDA, [22]): This algorithm assumes that the distribution of the binary variables can be factorized as the product of one-dimensional marginals. In generation $k+1$ the marginal probability $p_i^{k+1}(z_i)$ is the empirical frequency estimate using the best $M$ individuals from the previous generation

$$p_i^{k+1}(z_i) = \frac{1}{M} \sum_{m=1}^{M} z_i^{(k,\lambda(m))}, \tag{14}$$

where $z_i^{(k,p)} \in \{0, 1\}$ denotes the value of the $i$-th gene in the individual with index $p$ in generation $k$, and $\lambda(m)$ is a function which returns the index of the individual in the $m$-th position, assuming that the individuals are ranked according to their fitness value.

This algorithm uses a binary representation, in which an

individual is encoded as a bit string of length $N$, $z = \{z_1, z_2, \ldots, z_N\}$. The value $z_i = 1$ means that product $i$ is present in the portfolio. A value $z_i = 0$ indicates the absence

of that product from the portfolio. For instance, the string

01101 encodes an optimal portfolio that invests only in the

assets 2, 3 and 5 in a universe composed of 5 assets. The fitness

of this individual is calculated by finding the optimal portfolio

that considers for investment only the products specified in

the chromosome.

An advantage of UMDA is that it assumes a very simple statistical model in which the sampling of individuals and the estimation of distribution parameters can be efficiently implemented. However, the simplicity of the model also means

that the algorithm is at a disadvantage in problems where the

variables are strongly correlated.
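As an illustration of the UMDA estimate in Eq. (14), the following minimal Python sketch computes the marginals from the best $M$ individuals of a toy population. The function names `umda_update` and `toy_fitness`, the toy data, and the convention that larger fitness is better are assumptions made only for this example; in the portfolio problem the fitness would come from the quadratic optimization restricted to the selected assets.

```python
def umda_update(population, fitness, M):
    """Marginals p_i as the mean of gene i over the best M individuals (Eq. 14)."""
    # Rank individuals from best to worst; larger fitness is assumed to be better.
    ranked = sorted(population, key=fitness, reverse=True)
    best = ranked[:M]
    n_genes = len(best[0])
    return [sum(ind[i] for ind in best) / M for i in range(n_genes)]


def toy_fitness(z):
    # Hypothetical stand-in for the portfolio fitness: number of selected assets.
    return sum(z)


population = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
marginals = umda_update(population, toy_fitness, M=2)
print(marginals)  # [1.0, 0.5, 0.0, 1.0]
```

New individuals for the next generation would then be sampled gene by gene from these marginals.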

2) Population-Based Incremental Learning (PBIL) [23]: The PBIL algorithm also makes the assumption that the variables involved in the optimization process are statistically independent. The difference with UMDA is that at a given generation the marginal probabilities $p_i(z_i)$ are computed by combining an estimate obtained from the best individuals in the current population, with a weight $\alpha \in (0, 1]$, and the previous estimation of the probabilities with a weight $(1 - \alpha)$. The same binary encoding scheme as in UMDA is used. The equation for the distribution update is:

$$p_i^{k+1}(z_i) = (1 - \alpha)\, p_i^{k}(z_i) + \alpha \frac{1}{M} \sum_{m=1}^{M} z_i^{(k\lambda(m))}. \qquad (15)$$

This estimate is the same as the one used in UMDA when $\alpha = 1$.

The main advantages of PBIL are its simplicity and its efficiency. The parameter $0 < \alpha \le 1$ is a smoothing parameter. Small values of $\alpha$ reduce spurious oscillations in the evolution of the fitness function and typically yield better solutions. However, low values of $\alpha$ also entail a slower convergence.
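The PBIL update of Eq. (15) can be sketched in a few lines of Python. The function name `pbil_update` and the toy numbers are assumptions for illustration only; the list `best` is taken to hold the $M$ best individuals of the current generation, already selected.

```python
def pbil_update(p_prev, best_individuals, alpha):
    """Blend the previous marginals with the empirical frequencies (Eq. 15)."""
    M = len(best_individuals)
    # Empirical frequency of gene i among the best individuals (the UMDA estimate).
    freq = [sum(ind[i] for ind in best_individuals) / M for i in range(len(p_prev))]
    return [(1 - alpha) * p + alpha * f for p, f in zip(p_prev, freq)]


p_prev = [0.5, 0.5, 0.5]
best = [[1, 0, 1], [1, 0, 0]]

p_smooth = pbil_update(p_prev, best, alpha=0.5)
p_umda = pbil_update(p_prev, best, alpha=1.0)
print(p_smooth)  # [0.75, 0.25, 0.5]
print(p_umda)    # [1.0, 0.0, 0.5]
```

With `alpha=1.0` the previous marginals are discarded and the result coincides with the UMDA estimate, as stated in the text.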

3) PBILc [38]: This algorithm is a continuous version of PBIL, in which every allele in the chromosome can take arbitrary real values, $z_i^{(kp)} \in \mathbb{R}$. The joint distribution of the variables is assumed to be a multidimensional Gaussian distribution with a diagonal covariance matrix:

$$p(z) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(z_i - \mu_i)^2}{2\sigma_i^2}\right). \qquad (16)$$

The update rule is similar to that for the discrete version, but includes terms that bring the population mean $\mu^{k+1}$ closer to the best two individuals and away from the worst one:

$$\mu^{k+1} = (1 - \alpha)\,\mu^{k} + \alpha \left(z^{(k\lambda(1))} + z^{(k\lambda(2))} - z^{(k\lambda(P))}\right). \qquad (17)$$

The adaptation of this real-valued chromosome representation to a discrete solution space is made using a straightforward transformation. First, the chromosome values are sorted in non-increasing order

$$z_{\lambda(1)}^{(kp)} \ge z_{\lambda(2)}^{(kp)} \ge \cdots \ge z_{\lambda(N)}^{(kp)}. \qquad (18)$$

The binary vector $z^{(kp)} \equiv 0$ is initialized to zero. Then, for each $i = 1, \ldots, K$, we set $z_{\lambda(i)}^{(kp)} = 1$. This vector is then evaluated in the same way as in UMDA or PBIL.
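The continuous-to-discrete transformation of Eq. (18) amounts to marking the $K$ largest chromosome values. A minimal Python sketch, in which the function name `top_k_binarize` and the sample values are assumptions made for illustration:

```python
def top_k_binarize(values, K):
    """Binary vector with ones at the positions of the K largest values."""
    # Indices sorted by chromosome value in non-increasing order (Eq. 18).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    z = [0] * len(values)  # binary vector initialized to zero
    for i in order[:K]:
        z[i] = 1
    return z


chromosome = [0.2, 1.5, -0.3, 0.9, 0.1]
selected = top_k_binarize(chromosome, K=2)
print(selected)  # [0, 1, 0, 1, 0]
```

By construction the resulting binary vector always contains exactly $K$ ones (when $K \le N$), so the decoded candidate respects the cardinality bound.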

4) Estimation of Multivariate Normal Algorithm (EMNA) [24]: Some EDAs allow for correlations among the variables in the model distribution. In particular, EMNA assumes that the joint probability distribution is a multivariate Gaussian $N(\mu, \Sigma)$, with an arbitrary covariance matrix $\Sigma$

$$p(z) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(z - \mu)^T \cdot \Sigma^{-1} \cdot (z - \mu)\right). \qquad (19)$$

In each generation $k+1$, the vector of means $\mu^{k+1}$ and the covariance matrix $\Sigma^{k+1}$ are estimated by maximum likelihood:

$$\mu_i^{k+1} = \frac{1}{M} \sum_{m=1}^{M} z_i^{(k\lambda(m))} \qquad (20)$$

$$\sigma_{ij}^{k+1} = \frac{1}{M} \sum_{m=1}^{M} \left(z_i^{(k\lambda(m))} - \mu_i^{k+1}\right)\left(z_j^{(k\lambda(m))} - \mu_j^{k+1}\right), \qquad (21)$$

where only the fittest $M$ individuals of the previous generation are used to perform the estimation. This algorithm can be used to perform combinatorial optimization in the same way as PBILc, by means of the same continuous-to-discrete transformation.
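The maximum likelihood estimates of Eqs. (20) and (21) can be written directly from their definitions. The following Python sketch uses plain lists; the function name `emna_estimate` and the two-dimensional toy sample are assumptions for illustration, and the input is taken to be the $M$ fittest individuals, already selected.

```python
def emna_estimate(best):
    """Maximum likelihood mean vector and covariance matrix (Eqs. 20 and 21)."""
    M = len(best)
    N = len(best[0])
    mu = [sum(z[i] for z in best) / M for i in range(N)]
    # Note the 1/M normalization (maximum likelihood), not 1/(M - 1).
    sigma = [
        [sum((z[i] - mu[i]) * (z[j] - mu[j]) for z in best) / M for j in range(N)]
        for i in range(N)
    ]
    return mu, sigma


best = [[1.0, 2.0], [3.0, 6.0]]
mu, sigma = emna_estimate(best)
print(mu)     # [2.0, 4.0]
print(sigma)  # [[1.0, 2.0], [2.0, 4.0]]
```

Since the full covariance matrix has $N(N+1)/2$ free entries, its estimation needs more selected individuals than the univariate models, which is consistent with the larger selection fraction used for EMNA in the experiments.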

D. Pruning Heuristics

Pruning heuristics can be used to improve the efficiency of

the hybrid optimization algorithms that have been designed.

The goal is to identify, and then remove from the universe of assets available for investment, those products that are not expected to appear in the globally optimal portfolio. In this manner the computational effort required to find the solution is greatly reduced. To identify the assets that can be excluded, a relaxed version of the problem is solved, in which the cardinality and the lower and upper bound constraints are ignored.

Products that are not included in the optimal portfolio for

the relaxed optimization are also unlikely to be present in the

solution of the constrained problem. This pruning procedure

defines a reduced subuniverse for optimization $U_P \in \mathcal{P}(U)$ whose size is generally larger than the cardinality constraint

K. Therefore, the pruned problem is still a complex combi-

natorial optimization problem of the same type as the origi-

nal one, but in a reduced investment universe. There may be

cases in which, as a consequence of pruning, $|U_P| \le K$. If

this is the case, the constrained optimization involves only

products in UP and the solution can be obtained directly by

optimizing in this subuniverse.

Three pruning rules are investigated: weight pruning [35], iter-

ative pruning and greedy selection. Pruning improves the efficiency

of the portfolio selection process. By contrast, pruning can never improve the quality of the exact optimum, because it restricts the optimization to a subset of the original investment universe. Our objective is to determine to what extent these pruning techniques are detrimental to the quality of the solution obtained.

1) Weight pruning: Let $w^* = (w_1^*, w_2^*, \ldots, w_N^*)$ denote the vector of asset weights in the portfolio that is the solution of a relaxed optimization problem without cardinality and lower and upper bound constraints. Products whose weights in this optimal portfolio are below a given threshold, $P$, are removed. For a sufficiently small value of $P$, these products are eliminated because they are not likely to appear in the optimal solution to the constrained problem [35]. More formally, for each $i = 1, \ldots, N$, if $w_i^* < P_i$, then product $i$ is removed from the investment universe. That is, the $i$-th row and column are removed from $\Sigma$ and component $i$ is removed from $r$.

2) Iterative pruning: This heuristic is an iterative procedure

based on eliminating from the optimization universe a single

asset at a time. In each of a fixed number of iterations the

asset with the lowest weight in a relaxed optimal solution

including only the products present in the previous iteration

is removed. The amount of pruning is determined by the

parameter $T$: Only $K + T$ products are considered after pruning, where $K$ is the upper bound for the cardinality constraint. This pruning procedure is costlier than weight pruning. It requires $N - (K + T)$ quadratic optimizations, instead

of the single optimization needed for weight pruning. The

main disadvantage is that the external parameter T needs to

be fixed by the user. Obviously, the higher the value of T, the

lower the probability of removing an important product from

the optimization, but also the higher the computational cost.
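The weight pruning rule described above reduces to filtering the return vector and the covariance matrix by the relaxed weights. The Python sketch below assumes that the relaxed solution `w_star` has already been computed by a quadratic solver (not shown), and uses a single scalar `threshold` for all assets; the function name, the argument names and the toy data are assumptions for illustration only.

```python
def weight_prune(w_star, r, sigma, threshold):
    """Drop assets whose relaxed weight is below the threshold.

    Returns the kept indices, the reduced return vector and the reduced
    covariance matrix (rows and columns of removed assets deleted).
    """
    keep = [i for i, w in enumerate(w_star) if w >= threshold]
    r_pruned = [r[i] for i in keep]
    sigma_pruned = [[sigma[i][j] for j in keep] for i in keep]
    return keep, r_pruned, sigma_pruned


# Toy data: asset 1 has a negligible weight in the relaxed solution.
w_star = [0.6, 0.001, 0.399]
r = [0.01, 0.02, 0.03]
sigma = [
    [4.0, 1.0, 0.5],
    [1.0, 9.0, 2.0],
    [0.5, 2.0, 16.0],
]
keep, r_pruned, sigma_pruned = weight_prune(w_star, r, sigma, threshold=0.005)
print(keep)          # [0, 2]
print(sigma_pruned)  # [[4.0, 0.5], [0.5, 16.0]]
```

Iterative pruning could reuse the same filtering step, re-solving the relaxed problem after each removal of the lowest-weight asset until $K + T$ products remain.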

3) Greedy selection: The last heuristic considered proceeds incrementally by incorporating in each iteration the product that reduces the risk of the resulting optimal portfolio the most. The starting point for the algorithm is the optimal portfolio that invests in two assets and whose expected return is $R^*$. Note that this selection only requires computing the risk of $N(N-1)/2$ candidate solutions of the form

$$w_i r_i + (1 - w_i)\, r_j = R^*, \quad 0 \le w_i \le 1, \quad i, j = 1, 2, \ldots, N, \quad j < i. \qquad (22)$$

Then, the optimal portfolio with the two assets of the previous iteration and one additional asset is calculated. This is repeated until a portfolio that invests in $K + T$ assets is obtained. The parameter $T$, which determines the amount of pruning, is specified by the user. This algorithm is more expensive than both weight pruning and iterative pruning because it requires solving $N(N-1)/2 + \sum_{i=2}^{K+T+1} N(N-i)$ auxiliary quadratic optimization problems.
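The first step of greedy selection, Eq. (22), has a closed form: for each pair of assets the return constraint fixes the weight, so only the risk of each feasible pair needs to be evaluated. The Python sketch below assumes that risk is measured by the portfolio variance, as in the Markowitz framework used throughout; the function name `best_pair` and the toy data are assumptions for illustration.

```python
def best_pair(r, sigma, R_star):
    """Minimum-variance two-asset portfolio with expected return R_star.

    Returns (risk, i, j, w_i), or None if no pair can reach R_star
    with 0 <= w_i <= 1.
    """
    best = None
    N = len(r)
    for i in range(N):
        for j in range(i):  # j < i, so each pair is visited once
            if r[i] == r[j]:
                continue  # the return constraint does not determine w_i
            w = (R_star - r[j]) / (r[i] - r[j])  # solves w*r_i + (1-w)*r_j = R*
            if not 0.0 <= w <= 1.0:
                continue
            risk = (w * w * sigma[i][i]
                    + 2.0 * w * (1.0 - w) * sigma[i][j]
                    + (1.0 - w) ** 2 * sigma[j][j])
            if best is None or risk < best[0]:
                best = (risk, i, j, w)
    return best


r = [1.0, 2.0, 3.0]
sigma = [
    [1.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 9.0],
]
result = best_pair(r, sigma, R_star=2.0)
print(result)  # (2.5, 2, 0, 0.5)
```

In this toy case, mixing assets 0 and 2 in equal parts reaches the target return with variance 2.5, which is lower than holding asset 1 alone (variance 4.0). Later iterations, which add one asset at a time, require a quadratic optimization and are not covered by this sketch.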

IV. Experiments

In this section we analyze the performance of the different

portfolio selection methods in terms of optimality of the

selected portfolio and computational cost. Experiments are

performed on data from the OR-Library [25], which includes

a variety of benchmark problems in the field of Operations

Research. For portfolio selection, the data consist of expected returns and covariances of returns for assets included in five major world stock indexes: Hang Seng (Hong Kong, $N = 31$), DAX (Germany, $N = 85$), FTSE (United Kingdom, $N = 89$), Standard & Poor's (United States, $N = 98$) and Nikkei (Japan, $N = 225$). The weekly returns are estimated from

the series of stock prices from March 1992 to September

1997. Stocks with missing values are not considered. This is

the reason why, for instance, 89 products are considered for

the FTSE index whereas this index is actually composed of

100 assets. The optimizations are performed considering a

lower bound for the investment weights ($l_i = 0.01\ \forall i$) and with a maximum cardinality constraint of $K = 10$. For each

of the hybrid optimization methods the values of the hyper-

parameters, including the maximum number of generations

needed for convergence in the evolutionary algorithms, are

determined in exploratory experiments.

For the unconstrained problem, an efficient frontier of $N_P = 100$ points is calculated. These points correspond to optimal portfolios without cardinality or lower bound constraints, whose expected returns are evenly spaced between the largest and the smallest values that can be achieved by portfolios on the efficient frontier. For each of the values of the expected return considered, the minimum risk portfolios for various cardinality constraints ($K = 2, 3, \ldots, 10$) are also computed. To reduce the probability of getting trapped in a suboptimal solution, each point is computed by selecting the best of 5 executions of the hybrid optimization algorithm analyzed.

When cardinality constraints are considered, the optimiza-

tion problem is no longer convex. As a consequence, the solu-

tions identified by minimizing risk for a fixed value of the


expected return need no longer be Pareto optimal. That is, it

may be possible to build portfolios with the same risk and a higher expected return. This could introduce discontinuities in the shape of the efficient frontier [39]. The anomalies are especially pronounced for small values of $K$, the upper bound in the cardinality constraint. To see how this phenomenon can occur, consider a problem with $K = 2$. The

efficient frontier can be obtained by performing the union

among the efficient frontiers corresponding to all possible

pairs of assets. The union operation is not meant in the set-

theoretical sense, but in the portfolio-dominance sense: the

set of all non-dominated portfolios forms the final efficient

frontier. This operation removes any non-convex regions from

the efficient frontier (i.e. all dominated portfolios) and, as a

consequence, discontinuities can appear. These regions corre-

spond to portfolios that no rational investor would choose,

because there is at least one portfolio with a higher value of

expected return for the same risk. Nevertheless, for the pur-

pose of comparing the performance of the different algo-

rithms it is useful to consider all the portfolios identified by

the optimization algorithms, even if they do not belong to

the efficient frontier: For a given value of the expected return, portfolios with a more stringent constraint (that is, with a smaller value of $K$) necessarily have a risk that is at least as high, independently of whether they are Pareto optimal or not. In particular, better solutions of the constrained problem will be closer to the unconstrained efficient frontier, even if they are not part of the efficient frontier of the constrained problem.
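The union "in the portfolio-dominance sense" mentioned above can be made concrete with a small filter that keeps only non-dominated (risk, return) points. This Python sketch is an illustration under assumed names (`non_dominated`, `candidates`) and toy data; it is not taken from the experiments.

```python
def non_dominated(portfolios):
    """Keep the (risk, expected_return) pairs that no other pair dominates.

    q dominates p if q has risk <= p's and return >= p's, with at least
    one of the two inequalities strict.
    """
    front = []
    for p in portfolios:
        dominated = any(
            q[0] <= p[0] and q[1] >= p[1] and (q[0] < p[0] or q[1] > p[1])
            for q in portfolios
        )
        if not dominated:
            front.append(p)
    return front


# Minimum-risk points pooled from the frontiers of several asset pairs.
candidates = [(1.0, 1.0), (2.0, 3.0), (2.5, 2.0), (3.0, 4.0)]
front = non_dominated(candidates)
print(front)  # [(1.0, 1.0), (2.0, 3.0), (3.0, 4.0)]
```

The point (2.5, 2.0) is discarded because (2.0, 3.0) offers a higher return at a lower risk. Removing such points is what can leave gaps in the cardinality-constrained frontier.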

Figures 4, 5 and 6 show a detailed comparison between

the minimum risk portfolios obtained with different values

for the cardinality constraint $K = 2, \ldots, 10$, and the unconstrained efficient frontier in the Nikkei index problem using a RAR-GA algorithm with parameter $w = 1$ for the RAR

crossover operator. In general, the higher the value of K the

closer the solutions of the cardinality-constrained and the

unconstrained problems are.

For the GA, a steady state substitution scheme is used in each generation. No elitism is used, since elitism generally leads to premature convergence to local optima. The probabilities of crossover and mutation are $p_C = 1.0$ and $p_M = 0.01$, respectively. The populations evolved are composed of 100 individuals.

[Figure 4 plot. Axes: Return and Risk. Legend: Unconstrained, K = 2, K = 3, K = 4, K = 5, K = 6, K = 7, K = 8, K = 9, K = 10.]

FIGURE 4 Comparison of constrained and unconstrained minimum risk portfolios for different values of the expected return in the Nikkei index problem.

[Figure 5 plot. Axes: Return and Risk. Legend: K = 5, K = 6, K = 7, K = 8, K = 9, K = 10.]

FIGURE 5 Detailed comparison of minimum risk portfolios for different values of the expected return in the Nikkei index problem.

FIGURE 6 A more detailed comparison of minimum risk portfolios for different values of the expected return in the Nikkei index problem.

[Figure 6 plot. Axes: Return and Risk. Legend: K = 5, K = 6, K = 7, K = 8, K = 9, K = 10.]


For SA, a geometric cooling scheme with coefficient $\alpha = 0.9$ is used. Following the recommendations in [3], the initial temperature is set to a value such that $\chi_0$, the probability of acceptance of new configurations, is relatively high. In our experiments the value $\chi_0 = 0.8$ is chosen. The average increase $\Delta$ in the objective function value is calculated for $L = 300$ iterations and the initial temperature $T_0$ is then estimated by

$$T_0 = -\frac{\Delta}{\ln \chi_0}. \qquad (23)$$
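Eq. (23) follows from requiring that a move with the average increase be accepted with the target probability under the usual Metropolis rule, i.e. that the exponential of minus the increase divided by the temperature equals the target. The short Python sketch below checks this numerically; the function name `initial_temperature` and the value used for the average increase are assumptions for illustration, and only the acceptance probability 0.8 comes from the text.

```python
import math


def initial_temperature(mean_increase, accept_prob):
    """Initial temperature from Eq. (23): T0 = -mean_increase / ln(accept_prob)."""
    return -mean_increase / math.log(accept_prob)


mean_increase = 0.002  # hypothetical average increase over the trial moves
T0 = initial_temperature(mean_increase, accept_prob=0.8)

# At temperature T0, an uphill move of average size is accepted with
# probability exp(-mean_increase / T0), which recovers the target value.
recovered = math.exp(-mean_increase / T0)
print(T0 > 0, round(recovered, 6))  # True 0.8
```

Because the logarithm of a probability below one is negative, the minus sign in Eq. (23) makes the temperature positive.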

For the EDA family, population sizes are generally higher

than in the case of the GA. This is because larger samples are

needed for an accurate estimation of the distribution parameters

in EDA. Thus, the population size is set to 300 individuals. The

fraction of individuals considered to estimate the parameters is

10% of the population in all cases except for EMNA, where

larger samples (15% of the original population) are needed for

the accurate estimation of the distribution parameters.

The quality of the solutions is measured in terms of the relative distances between the unconstrained solutions (without cardinality or bound constraints) and the corresponding constrained solution with the same expected return

$$D = \frac{1}{N_P} \sum_{i=1}^{N_P} \frac{\sigma_{c,i} - \sigma_i^*}{\sigma_i^*}, \qquad (24)$$

where $\sigma_{c,i}$ is the risk of the best known solution in the constrained problem for the $i$-th frontier point, and $\sigma_i^*$ is the optimal risk for the unconstrained problem, obtained by quadratic optimization. All the minimum risk portfolios are considered, independently of whether they belong to the efficient frontier or not. In this manner, the results for different algorithms and cardinality constraints can be meaningfully compared. Note that every term in $D$ is greater than or equal to zero because the unconstrained problem is a relaxation of the constrained one.
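The distance measure of Eq. (24) is a plain average of relative risk gaps. A minimal Python sketch, with the function name `average_relative_distance` and the two-point toy frontier chosen only for illustration:

```python
def average_relative_distance(constrained_risk, unconstrained_risk):
    """Average relative gap between constrained and unconstrained risks (Eq. 24)."""
    n_points = len(unconstrained_risk)
    total = sum(
        (s_c - s_opt) / s_opt
        for s_c, s_opt in zip(constrained_risk, unconstrained_risk)
    )
    return total / n_points


# Two frontier points: the first is matched exactly, the second is 50% riskier.
D = average_relative_distance([2.0, 3.0], [2.0, 2.0])
print(D)  # 0.25
```

A value of zero would mean that the cardinality-constrained solutions coincide with the unconstrained frontier at every sampled return level.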

Tables 1–3 report the best values of $D$ obtained by the different heuristics in the columns labeled Best D. The success rate, which measures the fraction of trials in which the best known solution is reached, is displayed in the columns labeled Success rate. These tables also show the time (in seconds) required to calculate the whole set of solutions and the number of quadratic optimizations performed in the column whose header is Optimizations. The experiments are carried out on an AMD Turion 64 768 MHz with 1 GB of RAM. The best known result in each case is highlighted in boldface in every table.

A. Results Without Pruning

Table 1 summarizes the results of the RAR-GA approach with $w = 1$, SA and the different variations of EDA. A larger value of the weight parameter $w$ in the RAR operator would increase the amount of common information from the parents that is inherited by the offspring. Values $w > 1$ generally lead to

TABLE 1 Results for the RAR-GA, SA and EDA approaches.

ALGORITHM        INDEX      BEST D        SUCCESS RATE  TIME (S)  OPTIMIZATIONS
RAR-GA (w = 1)   HANG SENG  0.00321150    1.00          1200.8    2.77 × 10^7
RAR-GA (w = 1)   DAX        2.53162860    1.00          3178.5    6.14 × 10^7
RAR-GA (w = 1)   FTSE       1.92150019    0.95          6384.6    12.02 × 10^7
RAR-GA (w = 1)   S&P        4.69373181    0.99          6575.6    12.34 × 10^7
RAR-GA (w = 1)   NIKKEI     0.20197748    1.00          9893.3    14.17 × 10^7
SA               HANG SENG  0.00321150    1.00          1499.9    3.87 × 10^7
SA               DAX        2.53162860    0.98          2877.3    7.63 × 10^7
SA               FTSE       1.92205745    0.92          3610.4    8.87 × 10^7
SA               S&P        4.69373181    0.91          3567.8    9.54 × 10^7
SA               NIKKEI     0.20197748    0.95          4274.5    9.25 × 10^7
UMDA (EDA)       HANG SENG  0.00321150    1.00          1021.6    2.83 × 10^7
UMDA (EDA)       DAX        5.05951251    0.43          2450.2    4.69 × 10^7
UMDA (EDA)       FTSE       3.55519141    0.33          2483.5    4.90 × 10^7
UMDA (EDA)       S&P        8.16812536    0.35          2701.3    5.17 × 10^7
UMDA (EDA)       NIKKEI     6.97027831    0.25          2530.4    3.70 × 10^7
PBIL (EDA)       HANG SENG  0.00321150    1.00          2292.8    5.55 × 10^7
PBIL (EDA)       DAX        2.53162860    0.94          4489.1    7.70 × 10^7
PBIL (EDA)       FTSE       1.92208910    0.85          4782.3    8.06 × 10^7
PBIL (EDA)       S&P        4.69570006    0.88          5100.2    8.28 × 10^7
PBIL (EDA)       NIKKEI     0.30164777    0.43          7486.5    8.21 × 10^7
PBILC (EDA)      HANG SENG  0.00321150    1.00          3095.7    5.58 × 10^7
PBILC (EDA)      DAX        2.53162860    0.80          8122.6    7.71 × 10^7
PBILC (EDA)      FTSE       1.92670560    0.65          8735.1    8.02 × 10^7
PBILC (EDA)      S&P        4.70389481    0.67          9659.7    8.27 × 10^7
PBILC (EDA)      NIKKEI     0.36541190    0.32          17569.5   7.39 × 10^7
EMNA (EDA)       HANG SENG  0.00321150    1.00          9892.6    16.72 × 10^7
EMNA (EDA)       DAX        2.56612811    0.48          38667.6   22.37 × 10^7
EMNA (EDA)       FTSE       2.00176394    0.41          42505.9   23.15 × 10^7
EMNA (EDA)       S&P        4.91456924    0.35          49267.4   23.57 × 10^7
EMNA (EDA)       NIKKEI     743.55201383  0.26          68726.9   9.08 × 10^7


premature convergence due to insufficient genetic variability in

the population [7].

The results show that both RAR-GA and SA outperform

the EDA approaches, which obtain good results only in the

simpler Hang Seng and DAX problems. In the other problems

the results obtained by EDA are rather poor. The reason is

that EDA approaches are more sensitive to the dimensionality

of the problem than GAs and SA. The estimation of the prob-

ability distribution becomes more difficult as the dimension

of the search space increases. Among the different variations

of EDAs the best results are obtained with PBIL. GA and

SA scale better with the dimensionality of the problem than

EDAs, and therefore find better solutions in larger problems.

In a difficult problem like the FTSE, RAR-GA outper-

forms SA. This improvement can be explained by the effec-

tiveness of the RAR crossover operator in identifying small

well-diversified groups of assets that tend to be selected in

near-optimal portfolios.

B. Results with Pruning

A way to improve the efficiency of the algorithms proposed is

to reduce the dimensionality of the optimization space before

the search is performed. The universe of products is prepro-

cessed so that assets with a low probability of appearing in the

optimal solution are removed. The reduced problem can be

solved using either exhaustive search or the metaheuristics

described in the previous section.

The performance of the proposed heuristics strongly depends on the values of their parameters: the threshold $P$ for the weight pruning heuristic, and the number of additional products $T$ for the iterative pruning and greedy selection heuristics. The parameter $P$ for the weight pruning heuristic is set to $l_i/2$. The reason for this choice is that products whose weights in an unconstrained optimization are under their lower bound $l_i$ have a lower probability of being included in the final solution. The parameter $T$ is chosen after performing exploratory tests with several values of $T$ in the range $T \in [0, 12]$. After pruning, an exhaustive search is performed in the pruned universe. The results for these experiments are shown in Figure 7.

In general, for low values of $T$ ($T \le 3$) the performance of iterative pruning is better than that of greedy selection. Above this value, the results for the Hang Seng (7(a)) and Nikkei (7(e)) index problems do not show significant differences. For the other problems, the quality of the results obtained by both heuristics is similar for values above $T = 8$. For the DAX (7(b)) and the FTSE (7(c)) problems greedy selection is slightly better for the range $3 < T < 8$, while the opposite is true for the S&P problem (7(d)). Therefore the value $T = 8$ is chosen for the final experiments.

Tables 2, 3, and 4 display the results of the experiments

using exhaustive search on the pruned optimization problem.

In these experiments, pruning is first applied to eliminate assets

that are not likely to appear in the globally optimal solution.

Then, the exact optimal solution of the reduced problem is

obtained by exhaustive search. This establishes a benchmark

with which to compare the effectiveness of the proposed meta-

heuristics in the reduced problem.

The results of using metaheuristic optimization instead of

exhaustive search in the reduced optimization universe using

the weight pruning heuristic, iterative pruning, and greedy

selection are displayed in Tables 5, 6, and 7, respectively. These

tables include a column labeled Speed-up factor that displays

the improvements in the times of computation achieved. Speed-

up factors are calculated as the ratio of execution times without

and with pruning.
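As a worked instance of this definition, the speed-up reported for RAR-GA on the Hang Seng problem with weight pruning can be recomputed from the execution times listed in Tables 1 and 5. The short Python sketch below does this; the function name `speed_up` is an assumption for illustration, while the two times are the tabulated values.

```python
def speed_up(time_without_pruning, time_with_pruning):
    """Ratio of execution times without and with pruning."""
    return time_without_pruning / time_with_pruning


# RAR-GA, Hang Seng: 1200.8 s without pruning (Table 1), 3.81 s with weight pruning (Table 5).
factor = speed_up(1200.8, 3.81)
print(round(factor, 2))  # 315.17
```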

The best overall results are obtained with weight pruning.

Weight pruning significantly reduces the computational cost

of all metaheuristic techniques. The improvements in efficien-

cy are fairly large, especially in the case of the Hang Seng

index. In this case, many portfolios that are optimal solutions

of the unconstrained problem also satisfy the cardinality constraint, and no combinatorial search is needed. The speed-up factors range from $\approx 9$ (UMDA, Nikkei) to $\approx 650$ (SA, Hang Seng). Regarding the quality of the solutions, in the

case of the RAR-GA approach, the best solution is also

reached (i.e., no product appearing in a best known solution

is pruned), except in the case of the FTSE index, where a sin-

gle product that is included in the best known solution is

actually pruned. Nonetheless, in this case, the difference

between the best known result and the result obtained after

pruning is almost negligible. The success rates remain of the

same order in all cases.

The improvements in performance are particularly signifi-

cant in the EDA-based approaches. When the investment

universe is pruned, PBIL achieves the same quality as RAR-

GA, although the execution times are higher. Even the

TABLE 3 Results for iterative pruning and exhaustive search.

INDEX      D           TIME (S)  OPTIMIZATIONS
HANG SENG  0.00321150  1.05      2.29 × 10^4
DAX        2.53162860  270.05    5.49 × 10^6
FTSE       1.92152413  265.8     5.56 × 10^6
S&P        4.71092502  300.72    6.35 × 10^6
NIKKEI     0.20197748  115.5     4.27 × 10^5

TABLE 2 Results for weight pruning and exhaustive search.

INDEX      D           TIME (S)  OPTIMIZATIONS
DAX        2.53162860  3643.6    6.69 × 10^7
FTSE       1.92152413  16136.5   2.90 × 10^8
S&P        4.69373181  79746.2   1.24 × 10^9
NIKKEI     0.20197748  47.8      4.17 × 10^5

TABLE 4 Results for greedy selection and exhaustive search.

INDEX      D           TIME (S)  OPTIMIZATIONS
DAX        2.53162860  211.55    4.62 × 10^6
FTSE       1.92152413  163.4     3.67 × 10^6
S&P        4.72021851  175.1     4.01 × 10^6
NIKKEI     0.20197748  102.4     3.28 × 10^6


worst-performing algorithm of the EDA family, EMNA,

identifies fairly good solutions with a reasonable computa-

tional cost when pruning is used: For the Nikkei index, the

best known solution is reached with a 99% success rate and a

lower execution time. Pruning also significantly improves the

quality of the solutions obtained with UMDA.

Iterative pruning eliminates products which should be pres-

ent in the optimal solution, as shown in Table 3 for the S&P

problem. This entails a deterioration in the quality of the final

solution that can be reached using metaheuristic approaches, as

shown in Table 6. As in the previous case RAR-GA achieves

the best possible results. The execution times are lower than

those achieved with the weight pruning heuristic, because on

average, weight pruning retains more products than iterative

pruning for some problems. The quality of the solutions

obtained using the EDA approaches is generally improved,

except for the S&P problem in cases where the best known

solution is obtained without pruning.

Greedy selection obtains worse results than iterative and

weight pruning, as shown in Table 4. The execution times

shown in Table 3 are typically higher than those of weight pruning and

of the same order as iterative pruning. The number of quadratic

optimizations is always higher than in the case of the other

pruning heuristics. In general, the same observations as in the

FIGURE 7 Results of experiments using exhaustive search and iterative and greedy selection. Above the chosen value T = 8 the results for both heuristics are very similar. (a) Hang Seng, (b) DAX, (c) FTSE, (d) S&P, and (e) Nikkei.

[Figure 7 plots, panels (a) to (e). Axes: Pruning Parameter (T) and Average Relative Distance (D). Legend: Iterative Pruning, Greedy Pruning.]

TABLE 5 Results for the RAR-GA, SA and EDA approaches with weight pruning.

ALGORITHM        INDEX      BEST D      SUCCESS RATE  TIME (S)  SPEED-UP  OPTIMIZATIONS
RAR-GA (w = 1)   HANG SENG  0.00321150  1.00          3.81      315.17    4.92 × 10^4
RAR-GA (w = 1)   DAX        2.53162860  1.00          142.5     22.31     1.96 × 10^6
RAR-GA (w = 1)   FTSE       1.92152413  0.94          211.5     30.19     2.88 × 10^6
RAR-GA (w = 1)   S&P        4.69373181  0.99          205.8     31.95     2.76 × 10^6
RAR-GA (w = 1)   NIKKEI     0.20197748  1.00          241.5     40.97     2.75 × 10^6
SA               HANG SENG  0.00321150  1.00          2.31      649.31    3.85 × 10^4
SA               DAX        2.53162860  0.96          17.3      166.32    3.17 × 10^5
SA               FTSE       1.92270249  0.88          22.2      162.63    4.33 × 10^5
SA               S&P        4.69759052  0.85          21.9      162.91    4.03 × 10^5
SA               NIKKEI     0.20197748  0.99          47.0      90.95     2.61 × 10^5
UMDA (EDA)       HANG SENG  0.00321150  1.00          24.0      42.57     5.19 × 10^5
UMDA (EDA)       DAX        2.53162860  0.97          211.4     11.59     4.02 × 10^6
UMDA (EDA)       FTSE       1.92639746  0.84          243.9     10.18     4.47 × 10^6
UMDA (EDA)       S&P        4.71355872  0.77          237.9     11.35     4.28 × 10^6
UMDA (EDA)       NIKKEI     0.20197748  0.99          274.9     9.20      3.43 × 10^6
PBIL (EDA)       HANG SENG  0.00321150  1.00          22.9      100.12    4.92 × 10^5
PBIL (EDA)       DAX        2.53162860  0.99          203.0     22.11     3.81 × 10^6
PBIL (EDA)       FTSE       1.92152413  0.91          237.4     20.14     4.24 × 10^6
PBIL (EDA)       S&P        4.69373181  0.93          226.8     22.49     4.06 × 10^6
PBIL (EDA)       NIKKEI     0.20197748  1.00          263.1     28.45     3.26 × 10^6
PBILC (EDA)      HANG SENG  0.00321150  1.00          19.1      162.08    3.72 × 10^5
PBILC (EDA)      DAX        2.53162860  0.99          164.9     49.26     2.88 × 10^6
PBILC (EDA)      FTSE       1.92396198  0.87          190.8     45.78     3.21 × 10^6
PBILC (EDA)      S&P        4.69373181  0.87          196.2     49.23     3.07 × 10^6
PBILC (EDA)      NIKKEI     0.20197748  1.00          183.7     95.64     2.46 × 10^6
EMNA (EDA)       HANG SENG  0.00321150  1.00          28.2      350.80    6.20 × 10^5
EMNA (EDA)       DAX        2.53517771  0.71          259.1     149.24    4.81 × 10^6
EMNA (EDA)       FTSE       1.94102813  0.60          302.6     140.47    5.35 × 10^6
EMNA (EDA)       S&P        4.74273727  0.55          311.8     158.01    5.12 × 10^6
EMNA (EDA)       NIKKEI     0.20197748  0.99          232.4     295.73    4.11 × 10^6

TABLE 6 Results for the RAR-GA, SA and EDA approaches with iterative pruning.

ALGORITHM        INDEX      BEST D      SUCCESS RATE  TIME (S)  SPEED-UP  OPTIMIZATIONS
RAR-GA (w = 1)   HANG SENG  0.00321150  1.00          3.9       307.90    4.92 × 10^4
RAR-GA (w = 1)   DAX        2.53162860  1.00          54.5      58.32     6.97 × 10^5
RAR-GA (w = 1)   FTSE       1.92152413  0.94          61.3      104.15    7.76 × 10^5
RAR-GA (w = 1)   S&P        4.71092502  0.97          61.1      107.62    7.42 × 10^5
RAR-GA (w = 1)   NIKKEI     0.20197748  1.00          154.1     64.20     6.03 × 10^5
SA               HANG SENG  0.00321150  1.00          2.3       652.13    3.74 × 10^4
SA               DAX        2.53162860  0.97          19.4      148.31    3.18 × 10^5
SA               FTSE       1.92152413  0.88          24.7      146.17    4.21 × 10^5
SA               S&P        4.71092502  0.93          25.6      139.37    4.05 × 10^5
SA               NIKKEI     0.20197748  0.99          111.1     38.47     2.71 × 10^5
UMDA (EDA)       HANG SENG  0.00321150  1.00          24.2      42.21     5.19 × 10^5
UMDA (EDA)       DAX        2.53162860  0.98          212.0     11.56     4.02 × 10^6
UMDA (EDA)       FTSE       1.92352385  0.90          243.6     10.19     4.47 × 10^6
UMDA (EDA)       S&P        4.71154380  0.95          239.7     11.27     4.28 × 10^6
UMDA (EDA)       NIKKEI     0.20197748  1.00          339.0     7.46      3.44 × 10^6
PBIL (EDA)       HANG SENG  0.00321150  1.00          22.9      100.12    4.92 × 10^5
PBIL (EDA)       DAX        2.53162860  1.00          203.5     22.06     3.81 × 10^6
PBIL (EDA)       FTSE       1.92152413  0.93          232.7     20.55     4.25 × 10^6
PBIL (EDA)       S&P        4.71092502  0.98          225.8     22.59     4.06 × 10^6
PBIL (EDA)       NIKKEI     0.20197748  1.00          325.9     22.97     3.27 × 10^6
PBILC (EDA)      HANG SENG  0.00321150  1.00          19.1      162.08    3.73 × 10^5
PBILC (EDA)      DAX        2.53162860  1.00          160.4     50.64     2.88 × 10^6
PBILC (EDA)      FTSE       1.92410411  0.89          181.3     48.18     3.21 × 10^6
PBILC (EDA)      S&P        4.71092501  0.96          176.3     54.79     3.07 × 10^6
PBILC (EDA)      NIKKEI     0.20197748  1.00          235.8     74.51     2.47 × 10^6
EMNA (EDA)       HANG SENG  0.00321150  1.00          28.2      350.80    6.20 × 10^5
EMNA (EDA)       DAX        2.53276344  0.85          243.4     158.86    4.81 × 10^6
EMNA (EDA)       FTSE       1.92598749  0.85          273.5     155.41    5.35 × 10^6
EMNA (EDA)       S&P        4.71092502  0.87          266.4     184.94    5.12 × 10^6
EMNA (EDA)       NIKKEI     0.20197748  0.98          294.8     233.13    4.12 × 10^6


previous case can be made. RAR-GA achieves the best possible

performance. Greedy selection also improves the performance

of algorithms in the EDA family. Nonetheless, the quality of

the solutions obtained is generally lower than in the case of

RAR-GA or SA.

Exhaustive search in the pruned space is in some cases

more efficient than the metaheuristic techniques. In particu-

lar, the results displayed in Tables 2–4 show that for the Hang

Seng and the Nikkei problems exhaustive search in the

pruned spaces has lower execution times than the GA, SA or

the variants of EDA. In these problems, pruning is very effec-

tive in reducing the dimensionality of the investment universe,

which renders exhaustive search feasible. By contrast, the sto-

chastic nature of the metaheuristic methods implies some

inefficiencies in the exploration, which are more noticeable in

smaller search spaces, such as the reduced investment universe

obtained after pruning.

V. Conclusions

In this investigation several methods for efficiently solving

the portfolio selection problem with cardinality constraints

are presented. The algorithms investigated include simulated

annealing (SA) and several members of the estimation of

distribution algorithms (EDA) family. These are compared

with a genetic algorithm (RAR-GA) that uses a set

representation and specially designed mutation and crossover

operators that preserve the cardinality of the solution candi-

dates. Additionally, preprocessing techniques are introduced

that significantly improve the quality and the efficiency of

the proposed algorithms. They proceed by eliminating prod-

ucts for which the probability of being included in the final

solution is small.

The method that uses SA to solve the combinatorial part

of the optimization problem identifies portfolios that are as

good as those obtained by RAR-GA at a similar or lower

computational cost. In contrast, EDA algorithms on their own

are not competitive and perform poorly. The algorithms of

the EDA family have difficulties with high-dimensional prob-

lems: When the number of products in the universe of the

optimization increases, sampling and estimation of the proba-

bility distributions that are evolved becomes increasingly dif-

ficult. A possible solution to this shortcoming is to identify

the assets that are not likely to be included in the optimal

portfolio and remove them from the investment universe.

These assets are identified by solving a relaxed optimization

problem. The efficiency of the search performed by the meta-

heuristic methods in the reduced space is significantly

improved. In particular, the EDA algorithms (especially PBIL)

become competitive with SA and RAR-GA. Pruning can in

fact be so effective that, in some cases (e.g. for the Hang Seng

and the Nikkei problems), the exact solution of the reduced

optimization problem by exhaustive search in the pruned

space can be found at a lower computational cost than using

GA, SA or EDAs.

TABLE 7 Results for the RAR-GA, SA and EDA approaches with greedy selection

ALGORITHM       INDEX      BEST D      SUCCESS RATE  TIME (S)  SPEED-UP FACTOR  OPTIMIZATIONS
RAR-GA (W = 1)  HANG SENG  0.00321150  1.00          8.1       148.25           1.30 × 10^5
RAR-GA (W = 1)  DAX        2.53162860  1.00          67.4      47.16            1.17 × 10^6
RAR-GA (W = 1)  FTSE       1.92152412  0.95          75.2      84.90            1.29 × 10^6
RAR-GA (W = 1)  S&P        4.72021851  1.00          75.7      86.86            1.35 × 10^6
RAR-GA (W = 1)  NIKKEI     0.20197748  1.00          144.3     68.56            3.46 × 10^6
SA              HANG SENG  0.00321150  1.00          6.5       230.75           1.18 × 10^5
SA              DAX        2.53162860  0.98          32.4      88.81            7.84 × 10^5
SA              FTSE       1.92152413  0.89          39.0      95.57            9.35 × 10^5
SA              S&P        4.72060631  0.91          42.0      84.95            1.04 × 10^6
SA              NIKKEI     0.20197748  0.98          102.5     41.70            3.13 × 10^6
UMDA (EDA)      HANG SENG  0.00321150  1.00          28.2      32.23            6.00 × 10^5
UMDA (EDA)      DAX        2.53162860  0.97          225.6     10.86            4.49 × 10^6
UMDA (EDA)      FTSE       1.92336941  0.91          257.9     9.63             4.99 × 10^6
UMDA (EDA)      S&P        4.72257456  0.96          252.6     10.69            4.89 × 10^3
UMDA (EDA)      NIKKEI     0.20197748  1.00          329.4     7.68             6.30 × 10^6
PBIL (EDA)      HANG SENG  0.00321150  1.00          27.2      84.29            5.73 × 10^5
PBIL (EDA)      DAX        2.53162860  1.00          216.3     20.75            4.29 × 10^6
PBIL (EDA)      FTSE       1.92234549  0.95          250.9     19.06            4.76 × 10^6
PBIL (EDA)      S&P        4.72021851  0.99          241.5     21.12            4.68 × 10^6
PBIL (EDA)      NIKKEI     0.20197748  1.00          317.6     23.57            6.12 × 10^6
PBILC (EDA)     HANG SENG  0.00321150  1.00          23.2      133.44           4.53 × 10^5
PBILC (EDA)     DAX        2.53162860  1.00          171.9     47.25            3.36 × 10^6
PBILC (EDA)     FTSE       1.92340339  0.89          193.5     45.14            3.72 × 10^6
PBILC (EDA)     S&P        4.72021851  0.96          188.3     51.30            3.69 × 10^6
PBILC (EDA)     NIKKEI     0.20197748  1.00          224.4     78.29            5.33 × 10^6
EMNA (EDA)      HANG SENG  0.00321150  1.00          32.4      305.32           7.01 × 10^5
EMNA (EDA)      DAX        2.53162860  0.88          254.7     151.82           5.28 × 10^6
EMNA (EDA)      FTSE       1.92289840  0.88          248.2     171.26           5.87 × 10^6
EMNA (EDA)      S&P        4.72021851  0.88          276.4     178.25           5.73 × 10^6
EMNA (EDA)      NIKKEI     0.20197748  0.98          286.2     240.13           6.97 × 10^6
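To illustrate why exhaustive search can become affordable once the universe has been pruned, the following Python sketch counts the number of asset subsets of a fixed cardinality before and after pruning, and then enumerates the reduced space. The sizes (31 assets reduced to 15, cardinality 10), the helper `exhaustive_best` and the toy penalty used as a cost are all illustrative assumptions; in a real setting the cost of a subset would come from optimizing the portfolio weights restricted to that subset.

```python
from itertools import combinations
from math import comb


def exhaustive_best(assets, k, cost):
    """Evaluate every k-subset of assets and return the one with the lowest cost."""
    return min(combinations(assets, k), key=cost)


# Hypothetical sizes: a 31-asset universe pruned to 15 candidates, cardinality 10.
n_full, n_pruned, k = 31, 15, 10
print(comb(n_full, k))    # number of subsets before pruning
print(comb(n_pruned, k))  # number of subsets after pruning

# Toy cost (an assumption): the sum of made-up per-asset penalties.
penalty = {a: (a * 7) % 11 for a in range(n_pruned)}
best = exhaustive_best(range(n_pruned), k, lambda s: sum(penalty[a] for a in s))
print(best)
```

With these illustrative sizes the number of candidate subsets drops from tens of millions to a few thousand, which is the regime in which complete enumeration can compete with a stochastic search.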

In summary, the conclusions of this research are:

❏ Hybrid methods based on a set-encoding for the candidate

solutions (RAR-GA and SA) are effective and efficient

methods for solving the portfolio selection problem with

cardinality constraints. The performance of these algo-

rithms does not deteriorate with the size of the problems

investigated. Both methods have been used to identify

near-optimal portfolios even in problems with hundreds

of assets.

❏ EDA approaches, which are based on estimating the distribution

of a population of individuals, perform poorly when

the universe of assets available for investment is very large,

due to the curse of dimensionality: As the number of variables

in the problem increases, the estimation and sampling of the

probability distributions is not accurate enough to provide

reliable guidance to the search.

❏ The efficiency and accuracy of the different hybrid

approaches to portfolio selection with cardinality constraints

considered can be dramatically improved when pruning

techniques are used to reduce the number of variables in

the problem. These pruning heuristics work by eliminating

products that have zero or low weights in optimal solutions

to relaxed versions of the problem that can be solved

numerically in an exact manner.
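The pruning idea summarized in the last item can be sketched as follows. This is a hedged illustration, not the procedure used in the experiments: it assumes the relaxed problem is the mean-variance trade-off without the cardinality constraint (weights non-negative and summing to one), solves it approximately by projected gradient descent rather than by an exact quadratic programming solver, and keeps the assets whose relaxed weight exceeds a threshold. The names `relaxed_weights`, `prune`, `risk_aversion` and `threshold`, and the 4-asset data, are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last index satisfying the test fixes the shift
    return [max(x - theta, 0.0) for x in v]


def relaxed_weights(cov, mean, risk_aversion=0.5, step=1.0, iters=2000):
    """Approximately minimize risk_aversion * w'Cw - (1 - risk_aversion) * m'w
    over the simplex, i.e. with the cardinality constraint dropped."""
    n = len(mean)
    w = [1.0 / n] * n
    for _ in range(iters):
        grad = [
            2.0 * risk_aversion * sum(cov[i][j] * w[j] for j in range(n))
            - (1.0 - risk_aversion) * mean[i]
            for i in range(n)
        ]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


def prune(weights, threshold=1e-3):
    """Keep only the assets whose relaxed weight exceeds the threshold."""
    return [i for i, w in enumerate(weights) if w > threshold]


# Hypothetical 4-asset universe with a diagonal covariance matrix.
cov = [[0.04, 0, 0, 0], [0, 0.05, 0, 0], [0, 0, 0.30, 0], [0, 0, 0, 0.40]]
mean = [0.10, 0.09, 0.01, 0.005]
w = relaxed_weights(cov, mean)
kept = prune(w)
print([round(x, 4) for x in w], kept)
```

In this toy instance the two high-risk, low-return assets receive zero weight in the relaxed solution and are removed, so a subsequent combinatorial search would only have to consider the two remaining assets. The step size is tuned to this small example and would need adjusting for other covariance matrices.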

Acknowledgment

This research was supported by Ministerio de Educación y

Ciencia (Spain) under project TIN2007-66862-C02-02. Rubén

Ruiz-Torrubiano acknowledges financial support from the

Universidad Autónoma de Madrid under an FPU grant.

References

[1] T. J. Chang, N. Meade, J. E. Beasley, and Y. M. Sharaiha, “Heuristics for cardinality constrained portfolio optimisation,” Comput. Oper. Res., vol. 27, pp. 1271–1302, 2000.

[2] D. Bienstock, “Computational study of a family of mixed-integer quadratic programming problems,” in Integer Programming and Combinatorial Optimization: 4th International IPCO Conference (Lecture Notes in Computer Science 920), E. Balas and J. Clausen, Eds. Berlin, Germany: Springer-Verlag, 1995.

[3] Y. Crama and M. Schyns, “Simulated annealing for complex portfolio selection problems,” Groupe d’Etude des Mathematiques du Management et de l’Economie 9911, Université de Liège, Tech. Rep., 1999.

[4] M. Gilli and E. Këllezi, “A global optimization heuristic for portfolio choice with VaR and expected shortfall,” in Computational Methods in Decision-Making, Economics and Finance. Norwell, MA: Kluwer, 2001, pp. 167–183.

[5] M. Lobo, M. Fazel, and S. Boyd, “Portfolio optimization with linear and fixed transaction costs,” Ann. Oper. Res. (Special Issue on Financial Optimization), vol. 152, no. 1, pp. 376–394, July 2007.

[6] A. Schaerf, “Local search techniques for constrained portfolio selection problems,” Comput. Econ., vol. 20, pp. 177–190, 2002.

[7] R. Moral-Escudero, R. Ruiz-Torrubiano, and A. Suárez, “Selection of optimal investment portfolios with cardinality constraints,” in Proc. IEEE World Congr. Evolutionary Computation, 2006, pp. 2382–2388.

[8] H. Markowitz, “Portfolio selection,” J. Finance, vol. 7, pp. 77–91, 1952.

[9] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright, “Inertia-controlling methods for general quadratic programming,” SIAM Rev., vol. 33, pp. 1–36, 1991.

[10] Y. Tabata and E. Takeda, “Bicriteria optimization problem of designing an index fund,” J. Oper. Res. Soc., vol. 46, no. 8, pp. 1023–1032, 1995.

[11] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag, 1996.

[12] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[13] S. Kirkpatrick, C. D. Gelatt, and J. M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–679, 1983.

[14] F. Glover, “Future paths for integer programming and links to artificial intelligence,” Comput. Oper. Res., vol. 13, pp. 533–549, 1986.

[15] C. C. Coello, “Evolutionary multi-objective optimization: A historical view of the field,” IEEE Comput. Intell. Mag., vol. 1, no. 1, pp. 28–36, Aug. 2006.

[16] G. Yen, “Evolutionary multiobjective optimization,” IEEE Comput. Intell. Mag., vol. 4, no. 3, p. 2, Aug. 2009.

[17] S. Jeong, S. Hasegawa, K. Shimoyama, and S. Obayashi, “Development and investigation of efficient GA/PSO-hybrid algorithm applicable to real world design optimization,” IEEE Comput. Intell. Mag., vol. 4, no. 3, pp. 36–44, Aug. 2009.

[18] P. Moscato, “Memetic algorithms: A short introduction,” in New Ideas in Optimization, D. Corne, Ed. New York: McGraw-Hill, 1999, pp. 219–234.

[19] B. Sendhoff, “Emergent technology technical committee,” IEEE Comput. Intell. Mag., vol. 1, no. 4, pp. 5–8, Nov. 2007.

[20] N. J. Radcliffe, “Genetic set recombination,” in Foundations of Genetic Algorithms. San Mateo, CA: Morgan Kaufmann, 1993.

[21] P. Larrañaga and J. A. Lozano, Eds., Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Norwell, MA: Kluwer, 2002.

[22] H. Muehlenbein, “The equation for response to selection and its use for prediction,” Evol. Comput., vol. 5, pp. 303–346, 1998.

[23] S. Baluja, “Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning,” Carnegie Mellon Univ., Tech. Rep. CMU-CS-94-163, 1994.

[24] P. Larrañaga, J. A. Lozano, and E. Bengoetxea, “Estimation of distribution algorithms based on multivariate Normal and Gaussian networks,” Dept. Comput. Sci. and Artif. Intell., Univ. Basque Country, Tech. Rep., 2001.

[25] J. E. Beasley, “OR-library: Distributing test problems by electronic mail,” J. Oper. Res. Soc., vol. 41, no. 11, pp. 1069–1072, 1990.

[26] T. Leonard and J. Hsu, “Bayesian inference for a covariance matrix,” Ann. Statist., no. 4, pp. 1669–1696, 1992.

[27] O. Ledoit and M. Wolf, “A well-conditioned estimator for large-dimensional covariance matrices,” J. Multivariate Anal., vol. 88, no. 2, pp. 365–411, 2004.

[28] R. Fletcher, Practical Methods of Optimization. New York: Wiley, 2000.

[29] K. Huang and C. Jane, “A hybrid model for stock market forecasting and portfolio selection based on ARX, Grey system and RS theories,” Expert Syst. Applicat., pp. 5387–5392, 2009.

[30] F. Streichert, H. Ulmer, and A. Zell, “Evaluating a hybrid encoding and three crossover operators on the constrained portfolio selection problem,” in Proc. Congr. Evolutionary Computation (CEC2004), 2004, vol. 1, pp. 932–939.

[31] F. Streichert and M. Tanaka-Yamawaki, “The effect of local search on the constrained portfolio selection problem,” in Proc. IEEE World Congr. Evolutionary Computation (CEC2006), Vancouver, Canada, July 16–21, 2006, pp. 2368–2374.

[32] J. Shapcott, “Index tracking: Genetic algorithms for investment portfolio selection,” Parallel Comput. Centre, Edinburgh, Tech. Rep. EPCC-SS92-24, 1992.

[33] R. Jeurissen and J. van den Berg, “Index tracking using a hybrid genetic algorithm,” in Proc. ICSC Congr. Computational Intelligence Methods and Applications 2005.

[34] R. Jeurissen and J. van den Berg, “Optimized index tracking using a hybrid genetic algorithm,” in Proc. IEEE World Congr. Evolutionary Computation (CEC2008), 2008, pp. 2327–2334.

[35] R. Ruiz-Torrubiano and A. Suárez, “Use of heuristic rules in evolutionary methods for the selection of optimal investment portfolios,” in Proc. IEEE World Congr. Evolutionary Computation CEC 2007, Singapore, Sept. 25–28, 2007.

[36] T. H. Cormen, C. H. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.

[37] S. Baluja and R. Caruana, “Removing the genetics from the standard genetic algorithm,” in Proceedings of the International Conference on Machine Learning, A. Prieditis and S. Russell, Eds. San Mateo, CA: Morgan-Kaufmann, 1995, pp. 38–46.

[38] M. Sebag and A. Ducoulombier, “Extending population-based incremental learning to continuous search spaces,” in Lecture Notes in Computer Science, 1998, vol. 1498, pp. 418–427.

[39] N. J. Jobst, M. D. Horniman, C. A. Lucas, and G. Mitra, “Computational aspects of alternative portfolio selection models in the presence of discrete asset choice constraints,” Quant. Finance, vol. 1, pp. 1–13, 2001.
