
Harnessing Algorithm Bias: A Study of Selection Strategies and Evaluation for Portfolios of Algorithms

    Mark Roberts

    October 16, 2006

Search algorithms are biased because many factors related to the algorithm design – such as representation, decision points, search control, memory usage, heuristic guidance, and stopping criteria – or the problem instance characteristics impact how a search algorithm performs. The variety of algorithms and their parameterized variants make it difficult to select the most efficient algorithm for a given problem instance. It seems natural to apply learning to the algorithm selection problem of allocating computational resources among a portfolio of algorithms that may have complementing (or competing) search technologies. Such selection is called the portfolio strategy.

This research exam studies the state of the art in portfolio strategy by examining five recent papers, listed below, from the AI community. The specific focus of the exam is to identify the key issues in the mechanism and evaluation of the portfolio strategy. But the discussion of these papers will also include a summary of each paper’s primary findings, its more specific context within its community, the implications for the specific community where the paper is published, and the implications for the larger AI community.

This set of papers is representative, but not exhaustive, of recent work in portfolios. All selected portfolios model the runtime behavior of various algorithms for combinatorial problems. The papers also represent the common learning methods applied in learning a portfolio strategy: case-based, Bayesian, statistical regression, and induction.

• Gebruers, C., Hnich, B., Bridge, D., and Freuder, E., 2005. Using CBR to Select Solution Strategies in Constraint Programming. In Proc. of the Sixth International Conference on Case Based Reasoning (ICCBR-05), Chicago, IL, August 23-26, 2005. Lecture Notes in Computer Science 3620, Springer, 2005, pp. 222-236.

• Carchrae, T., and Beck, J.C., 2005. Applying Machine Learning to Low Knowledge Control of Optimization Algorithms. Computational Intelligence, 21(4), pp. 372-387.

• Guo, H., and Hsu, W., 2004. A Learning-based Algorithm Selection Meta-reasoner for the Real-time MPE Problem. In Proc. of the 17th Australian Joint Conference on Artificial Intelligence (AUSAI 2004), Cairns, Australia, December 4-6, 2004. Lecture Notes in Computer Science 3339, Springer, 2004, pp. 307-318.

• Vrakas, D., Tsoumakas, G., Bassiliades, N., and Vlahavas, I., 2003. Learning Rules for Adaptive Planning. In Proc. of the 13th International Conference on Automated Planning and Scheduling, Trento, Italy, June 9-13, 2003, pp. 82-91.

• Horvitz, E., Ruan, Y., Gomes, C., Kautz, H., Selman, B., and Chickering, M., 2001. A Bayesian Approach to Tackling Hard Computational Problems. In Proc. of the 17th Conference on Uncertainty in AI, University of Washington, Seattle, Washington, USA, August 2-5, 2001. Morgan Kaufmann, 2001, pp. 235-244.

Contents

1 Introduction: Harnessing Search Algorithm Bias
  1.1 Diversifying Search Bias: A Portfolio Approach
2 An Abstract Portfolio Model
  2.1 Features
  2.2 Algorithms
  2.3 Selection
3 Selection Mechanisms: A Detailed Look
  3.1 Performance Models
  3.2 Parallel Portfolios
  3.3 Switching Portfolios
  3.4 Predictive Portfolios
    3.4.1 Static Predictive Portfolios
    3.4.2 Dynamic Predictive Portfolios
4 Evaluating Portfolio Performance: A Detailed Look
  4.1 Predictive Accuracy
  4.2 Runtime
  4.3 Solution Quality
  4.4 Hybrid Metrics
5 Crosscutting Themes
6 Closing: Beyond Bias
7 Acknowledgments


Bias and prejudice are attitudes to be kept in hand, not attitudes to be avoided.

    Charles Curtis

    1 Introduction: Harnessing Search Algorithm Bias

    Search algorithms are biased. Each algorithm has a specific way of “looking” at a problem

    that shapes how it can (or cannot) find solutions. Further, each problem instance has unique

    features that distinguish it from other instances. Both the algorithm design and the specific

    problem instance impact search algorithm performance. Most realistic problems – practical

    problems for which there are measurable societal or cost benefits to solving – are difficult

    because of either the size of the search space or the inability to find an efficient algorithm.

    The variety of algorithms and their parameterized variants make it difficult to select the most

    efficient algorithm for a given problem instance.

Bias can be added by incorporating problem-specific knowledge at the decision points of the search; the choice of representation, the transition operators, selecting which branch to explore next, allowing stochasticity, choosing a restart policy, choosing variable and/or value bindings, and selecting the appropriate heuristic or heuristic parameters are just a few examples. Other forms of dynamic knowledge can be incorporated into search as it runs. Metaheuristic methods allow search to learn between successive restarts of local search methods.

    For example, some methods learn the critical components of all solutions (as in the so-called

    backbone [SGS00]). Other examples include keeping an elitist list of the best solutions to

    encourage localized restarts (as in path relinking [GL00]), storing a list of recent moves to

    avoid redundant cycling (as in Tabu Search [Glo89]), or making randomized moves from the

    current position with decreasing probability (as in Simulated Annealing [KGV83]).

    Bias can also be added in how the search algorithm makes its decisions. Some decisions

    can be completely arbitrary with respect to problem information. A classic example of such

    non-information is the use of stochasticity as in Las Vegas and Monte Carlo algorithms. More

    recently, Gomes, Selman and Crato [GSC97] proved that stochastic algorithms can remove

    infinite moments in the runtime distributions of search on combinatorial problems.

    The bewildering array of algorithm design choices can be daunting to the uninitiated –

    and the above listing just touched on the most surface level design decisions. A change to a

    single decision can have dramatic impact on the search cost. Additionally, it is unclear how to

    understand potential interactions and dependencies between the varieties of knowledge. Each

    method adds a particular kind of bias to the search, and managing that bias to create a more

    efficient search remains a central question for researchers in Artificial Intelligence.

    1.1 Diversifying Search Bias: A Portfolio Approach

    Portfolios are a well-known way to hedge against significant performance differences among

    a set of choices given a finite resource such as time. Adding problem specific information to

    search introduces bias. Human practitioners frequently manage such bias by acquiring a kind

    of black art for applying AI algorithms to new problems. Human acquisition of this art is

    costly (often taking years of study) and often lacks scientific rigor because the knowledge is

    primarily learned by trial-and-error rather than guided by scientific discourse and deep insight.


It seems natural to apply learning to the algorithm selection problem of allocating computa-

    tional resources among a portfolio of algorithms that may have complementing (or competing)

    search technologies. Such selection is called the portfolio strategy. Algorithm portfolios can be

    classified as fitting one of three types. Parallel portfolios orchestrate two or more algorithms

on a set of parallel processors. Switching portfolios interleave a suite of algorithms on a single, serial processor. Predictive portfolios select the single most promising algorithm, which then runs to completion.

    Rice [Ric76] formulated what he called the “Algorithm Selection Problem.” He examines

    the key issues for learning a function approximation of algorithm performance in order to

select the best algorithm for a new problem input. This early work was a foundation for the abstract portfolio model we present in Section 2. Rice’s original formulation remained somewhat obscure for the next 20 years.

    Algorithm portfolios (re)surfaced in 1997 with the examination of restart strategies for

    stochastic search algorithms by Gomes and Selman [GS97] and Huberman, Lukose, and Hogg

    [HLH97] and also with the combination of complex algorithms for automated planning by

    Howe et al. [HDH+99]. Since that time, the algorithm portfolio approach has received wider

    attention across the sub-disciplines of Artificial Intelligence. The specific focus of this exam is

    to answer two questions as they relate to these more recent portfolios:

1. Given a set of algorithms and performance models of those algorithms, what selection mechanism will effectively choose between them?

    2. Given a portfolio, how does one evaluate its performance?

    To help frame the context and focus of the report, we present an abstract model of portfolio

    design that encompasses the literature in this study. In describing the model we point out

    how the features and algorithms are combined to make performance models. We then break

    into a discussion of how each portfolio addresses the two questions. Interspersed throughout

    the paper are related critical remarks and open questions; these remarks will be highlighted

    by slanted text. Before closing, we present some of the cross-cutting themes of research in

    portfolio selection and evaluation.

    2 An Abstract Portfolio Model

    Portfolio construction can be divided into three activities: First, a set of reliable features are

    selected for the performance models. Next, some form of (ad hoc or principled) model selection

    refines and combines the models for use by the portfolio. Last, the portfolio performs algorithm

    selection upon presentation of a new problem. An important auxiliary step is evaluating the

    output of the model, though it should be clear that evaluation occurs at each stage of the

    portfolio construction.

    There are important considerations to make in each of the three activities. To aid in

    understanding how these activities interrelate, we provide an abstract portfolio in Figure 1.

    Formally, we define a portfolio strategy as an ordered tuple 〈F ,A,S〉, where F is a set of

    features the portfolio uses to model algorithm performance (Figure 1 in purple), A is the set

of algorithms (Figure 1 in blue), and S is the selection mechanism (Figure 1 in red).


Figure 1: A diagram of an abstract portfolio strategy. The problem instances (in green) inform the features and performance models that are used. Features (in purple) may be subjected to feature selection before their use in the performance models. The algorithms (in blue) may provide some features and are used in the final stage of the selection mechanism. Finally, the selection mechanism is composed of the performance models, the model selection, and the algorithm selection. The selection mechanism works with the specific portfolio architecture (shown on the right).

The features and algorithms are combined into performance models that are used by the selection

    mechanism, which consists of those performance models, the model selection, and the algorithm

    selection. This model combines and extends the two existing models in the literature of Guo

    and Hsu [GH04] and Rice [Ric76]. In the following subsections, we briefly discuss the features

    and algorithms as well as highlight some important considerations as they relate to portfolio

    construction. Because it is a primary focus of the exam, we more thoroughly discuss the

    selection mechanisms in Section 3.

    2.1 Features

    The features in a portfolio are employed by the selection mechanism to predict algorithm

performance. Each feature f ∈ F is specified as an ordered tuple f = 〈fc, fd, ft〉, where fc denotes the feature extraction cost as problem size grows, fd indicates whether the feature is

    a static or dynamic feature, and ft indicates what the feature measures (problem distribution,

    problem instance structure, algorithm progress, or the heuristic in use).
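
To make the tuple concrete, a minimal Python sketch of how a feature record could be encoded is shown below; the field names and the example feature are ours and are not drawn from any of the surveyed portfolios.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Feature:
        name: str       # identifier for the feature (illustrative only)
        cost: str       # f_c: extraction cost class ("constant", "linear", "polynomial", "np-probe")
        dynamic: bool   # f_d: True for a dynamic (runtime) feature, False for a static one
        measures: str   # f_t: "distribution", "instance", "algorithm", or "heuristic"

    # A simple static instance feature, roughly f = <constant, static, instance>.
    num_goals = Feature("num_goals", cost="constant", dynamic=False, measures="instance")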

    We can broadly classify feature cost according to classical complexity analysis as problem

    size increases. Static features are easily extracted from the problem definition. Examples of

    simple static features include the number of objects in the problem or the number of goals

    in the problem; such features are usually constant or linear. Other features might extract

information from the structural relationships in the problem instance; such features are usually

    polynomial. A structural feature in SAT would be the average degree of the connectedness of

    clauses and variables as used by Nudelman et al. [NLBH+04].

    A more detailed discussion of all 128 features across all portfolios is outside the scope of

    this paper as it would necessitate a discussion of the problem domains (some features can

only make sense in the context of the domain). Table 1 shows the aggregate counts of feature complexity by portfolio, and Table 2 shows a summary of the primary feature types used across the portfolios.

Table 1: A summary of the feature complexity (fc) for the 128 features of the portfolios represented. Per-portfolio counts: Carchrae and Beck (2005): 1; Gebruers et al. (2005): 12; Guerri and Milano (2004): 4; Guo and Hsu (2004): 2 + 4 = 6; Horvitz et al. (2001): 43*; Kautz et al. (2002): 2; Leyton-Brown et al. (2003): 25*; Vrakas et al. (2004): 22 + 13 = 35. Column totals: 15 constant, 28 linear, 85 polynomial, no NP-probe features, for 128 in total. The values with an asterisk indicate worst-case bounds for all features because the authors were vague in their descriptions. The portfolios by Gomes and Selman [GS01] and Fink [Fin04] did not use features for their runtime models.

Table 2: A summary of the feature types (ft: distribution, instance, algorithm, or heuristic) for the 128 features of the portfolios represented. Per-portfolio counts: Carchrae and Beck (2005): 1; Gebruers et al. (2005): 12; Guerri and Milano (2004): 4; Guo and Hsu (2004): 6; Horvitz et al. (2001): 43; Kautz et al. (2002): 1 + 1 = 2; Leyton-Brown et al. (2003): 22 + 3 = 25; Vrakas et al. (2004): 22 + 4 + 9 = 35; 128 in total. The per-type breakdown for Horvitz et al. (2001) is an estimate because the authors provided vague descriptions. The portfolios by Gomes and Selman [GS01] and Fink [Fin04] did not use features for their runtime models.

The final features used in a portfolio are usually a subset of the original

    features. The authors often begin with a wide selection of features and then refine the list

    as the portfolio models mature. For this reason, many features of the final models are of the

    flavor f = 〈fc = polynomial, fd = static, ft = structural〉.

    A key question for feature extraction is how to balance model accuracy with feature ex-

    traction cost. Clearly the model accuracy depends upon the kinds of features used to build the

    model. As stated above, most authors select efficient (low-cost) features to build their models.

    It is a common assumption that low-cost features are sufficient to build accurate models. For

    this reason, there is a bias against using NP-probe features as evidenced in Table 1.

    A promising future direction is to use more sophisticated probing features. Nudelman et

    al. [NLBH+04] and Hutter et al. [HHHLB06] both present some features that are extracted

    from algorithms that attempt to reduce the problem to its combinatorial core. Sometimes

    these simplifying steps actually solve the problem. More sophisticated features may cost more

    to extract, so they really should only be used when necessary. Leyton-Brown et al. [LBNA+03]

    propose “smart feature computation” that partitions the features according to their extraction

    cost. Models are constructed for each feature partition, and more costly features are avoided

    until the portfolio needs more accurate predictive models.
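
A rough sketch of how such cost-partitioned feature computation could be organized is given below; the confidence-threshold stopping rule is our reading of the idea, not the exact mechanism of Leyton-Brown et al.

    def predict_with_cost_partitions(instance, stages, confidence_threshold=0.9):
        """stages: list of (extract, model) pairs ordered by increasing feature cost, where
        extract(instance) returns a dict of features and model(features) returns
        (prediction, confidence)."""
        features, prediction = {}, None
        for extract, model in stages:
            features.update(extract(instance))          # pay only for this partition
            prediction, confidence = model(features)
            if confidence >= confidence_threshold:
                break                                   # cheap features were accurate enough
        return prediction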

    Another key issue is the degree to which features should be problem specific in order to

    obtain accurate models. Some features in the literature are so problem specific as to preclude

    their use in any other problem. Such features might advance specific knowledge of the given

    problem but fail to provide clues guiding better insight into producing more predictive models

for other problems. Problem-specific features also imply that a domain expert has recommended suitable features. So the problem of building good algorithms has turned into the problem of building good algorithm models; domain expertise is still needed but has just transferred to a

    different application. Carchrae and Beck [Car05] propose using “low-knowledge” features for

    their models in an effort to eliminate the transfer of expertise from algorithm design to algo-

rithm modeling. Their features are designed to be problem-independent, and they only measure an algorithm’s progress relative to other algorithms.

    2.2 Algorithms

    Each algorithm a ∈ A is also defined as a tuple, a = 〈as, ac, at, an〉, where as specifies the

    algorithm setup (static or parameterized), ac denotes whether the algorithm is complete (exact)

    or incomplete (approximate), at indicates the primary search type (sampling, local search,

    search, stochastic, etc.), and an denotes the algorithm’s common name (Gibbs, ant colony,

    Tabu, etc.). If the algorithm is parameterized, then a number after as indicates the number of

    possible parameter combinations; for example parameterized(3) means there are three variants

    of the same algorithm.
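
As with features, the algorithm tuple can be pictured as a small record; a hypothetical Python sketch follows, with field names of our own choosing.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Algorithm:
        name: str                    # a_n: common name, e.g. "Tabu Search"
        complete: bool               # a_c: True for exact (E), False for approximate (A)
        search_type: str             # a_t: "sampling", "search", "local search", "stochastic", "hybrid"
        parameter_variants: int = 1  # a_s: 1 means static; k > 1 means parameterized(k)

    tabu = Algorithm("Tabu Search", complete=False, search_type="local search")
    wa_star = Algorithm("Weighted A*", complete=False, search_type="search", parameter_variants=864)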

    Table 3 summarizes the kinds of algorithms used for each portfolio. Many of the algorithms

    are designed for specific problems, but there are some common themes we can draw from the

    list. One design is to pair a complete algorithm with an approximate algorithm. This choice

    makes sense if some instances are easily solved but other instances are intractable. A perfect

    example of this is the clause-to-variable ratio in SAT: some instances will be quickly found

Portfolio                    as          ac  at      an
Carchrae and Beck (2005)     static      E   LS      Tabu Search
                             static      E   search  Constructive using texture
                             static      E   search  Limited Discrepancy Search
Fink (2005)                  param(3)    A   search  Partial Order with Backward Chaining
Gebruers et al. (2005)       param(12)   A   search  Constraint Satisfaction Problem (CSP)
Gomes and Selman (2001)      static      E   SS      randomized backtracking CSP
                             static      E   SS      randomized B&B MIP
Guerri and Milano (2004)     static      E   search  IP-based Branch and Bound
                             static      A   search  IP-based with Linear Relaxation
                             static      A   hybrid  IP-based with Linear Relaxation plus CSP
Guo and Hsu (2004)           static      E   search  Clique Tree Propagation
                             static      A   sampl   Stochastic Sampling: Gibbs
                             static      A   sampl   Stochastic Sampling: Forward Sampling
                             static      A   hybrid  Ant Colony Optimization
                             static      A   LS      Hill-climbing with Restarts
                             static      A   LS      Tabu Search
Horvitz et al. (2001)        static      A   SS      Stochastic SATz with Noise
                             static      A   SS      Stochastic CSP with Alldiff constraints
Kautz et al. (2002)          static      A   SS      Stochastic CSP
                             static      A   SS      SATz-Rand
Leyton-Brown et al. (2003)   static      E   search  CPLEX by ILOG
                             static      E   search  Branch and Bound
                             static      E   search  Branch and Bound with a non-LP heuristic
Vrakas et al. (2004)         param(864)  A   search  Weighted A*

Table 3: The algorithm types used in each portfolio (left column) from the central papers. Column as denotes the algorithm setup as static or parameterized (param). Column ac denotes the algorithm completeness as exact (E) or approximate (A). Column at denotes the algorithm search technique as sampling (sampl), search, hybrid, local search (LS), stochastic search (SS), or stochastic local search (SLS). Column an denotes the common name of the algorithm.

    to have a contradiction or solution using backtracking search while problem instances near

    the phase transition of this parameter could take very long to solve and might be best solved

    by stochastic or approximate methods. Another design pairs algorithms of a similar type

    (stochastic or exact). A third design uses algorithms with markedly different search techniques

    (for example, Carchrae and Beck [Car05]).

    Leyton-Brown et al. [LBNA+03] use the metaphor of “Boosting” (from ensemble learning)

    to elucidate some key issues in selecting algorithms and designing new algorithms. The authors

    present some general techniques to perform per-instance algorithm selection and discuss how to

    evaluate their new approach. A main contribution of the paper is positing that algorithm design

    should produce algorithms that focus on solving those problems which are most difficult for

    the current set of algorithms. This is similar to the boosting paradigm, where each successive

    learner receives a training set increasingly biased toward including misclassified instances. New

    algorithms for a portfolio should focus on the most difficult problem instances.

The kinds of algorithms available to the portfolio correlate with the problem and the

    purpose of the portfolio. Some problems have a long research lineage and may have many

    algorithms. There may even be favored algorithms for solving particular instance families. In

Authors                     Portfolio type       Model                   Selection
Gomes and Selman (2001)     Parallel             Agg: Statistical
Guo and Hsu (2004)          Predictive Static    Hyb: Decision Tree
Vrakas et al. (2004)        Predictive Static    Agg: Rule-based system
Gebruers et al. (2005)      Predictive Static    Agg: CBR
Vrakas et al. (2004)        Predictive Static    Agg: kNN
Guerri and Milano (2004)    Predictive Static    Agg: Decision Tree
Carchrae and Beck (2005)    Predictive Dynamic   Agg: Bayesian
Fink (2005)                 Predictive Dynamic   Ind: Statistical        Best gain
Gomes and Selman (2001)     Switching            Agg: Statistical        Balance mean/variance
Horvitz et al. (2001)       Switching            Ind: Bayesian           Weighted accuracy/probability
Kautz et al. (2002)         Switching            Ind: Bayesian           Expected runtime
Carchrae and Beck (2005)    Switching            Ind: Cost Improvement   Reinforcement Learning

Table 4: Selection mechanisms with their corresponding models.

    such a case, it might be valuable to add only the algorithms that resolve a particular family

    of instances that no other algorithm can solve. On the opposite end of the spectrum, a new

    problem may only have a few algorithms and there may be scant conventional wisdom about

    solving the problem. In this case, the portfolio may highlight the difficult instances and drive

    further insight into algorithm design. The portfolio then becomes a vehicle for gaining insight

    into the dependencies between algorithms and the problems.

    2.3 Selection

    The selection mechanism is the centerpiece of the portfolio. It combines the features and

    algorithms to decide on an appropriate allocation of resources for the portfolio. We devote the

    entire next section to our discussion of Selection.

    3 Selection Mechanisms: A Detailed Look

    Selection mechanisms are situated to provide the kind of decision needed according to the

    portfolio architecture. There is a single selection mechanism in each portfolio that is composed

    of a set of performance models, a model selection process, and an algorithm selection process.

    In a parallel portfolio, the selection mechanism “recommends” a distribution of algorithms

    across multiple processors. In a switching portfolio, the selection controls a set of serially

    running algorithms by dynamically (re)allocating runtime as more information arrives. In a

    predictive portfolio, the selection mechanism simply predicts the best algorithm that will run

    to completion. The key distinction between predictive and switching portfolios is whether a

    single prediction is made or whether multiple iterations of the decision are made. Predictive

    selection mechanisms can be further split by the inclusion of runtime features (a dynamic

    model) or only structural features (a static model). Table 4 shows the variety of available

    selection mechanisms along with the papers that use them. Some authors compare multiple

    selection mechanisms in their papers, so they are represented more than once in the table.


3.1 Performance Models

    Though it is true that ultimately the algorithms themselves are “selected,” the selection mech-

    anism must use some information to actually choose the algorithms. Thus it is more accurate

to say that algorithm selection is performed on a set of performance models that sit be-

    tween the algorithms and the selection decision. These models relate algorithm performance

    to specific problems or problem distributions and are typically built from one or more features

    delineating algorithm performance. As mentioned above, sometimes features are segmented

    by extraction cost to create increasingly more accurate, but also more costly, models. Other

    times, the models use the lowest cost features that yield reasonably accurate performance.

    Model selection is an important design decision in portfolio construction. Figure 1 shows

    the two kinds of model selection commonly used in the literature. Initial models in the litera-

    ture used explicit models of each algorithm; we denote these as M(A1)..M(An); individualized

    models are used by the portfolios with “Ind:” in the Model column of Table 4. The selection

mechanism needs to choose the algorithm with the most probable success or earliest runtime

    as predicted by the models. A common method is to rank the algorithms according to the

    prediction rate of their performance model and probabilistically select the best ranking. For

    example, Gebruers et al. [GHBF05, GGHM04] use the prediction rate and execution time. Sim-

    ilarly, Carchrae and Beck [Car05] weight the algorithms based on their previous performance.

    A related method explicitly models the random variable associated with the runtime of the

    algorithm. These statistical models are used by Gomes and Selman [GS01] and Fink [Fin04].

Model selection for these statistical models is still done using ranking or a weighted random procedure.
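
A minimal sketch of this kind of model selection over individual models is shown below; the prediction-rate scores and the greedy/weighted-random split are illustrative assumptions rather than the procedure of any one paper.

    import random

    def select_algorithm(scores, greedy=False, rng=random):
        """scores: dict mapping algorithm name -> predicted success rate from its model M(A_i)."""
        if greedy:                                    # pure ranking: take the top-scoring model
            return max(scores, key=scores.get)
        names = list(scores)
        total = sum(scores.values())
        weights = [scores[n] / total for n in names]  # weighted-random alternative
        return rng.choices(names, weights=weights, k=1)[0]

    print(select_algorithm({"A1": 0.80, "A2": 0.15, "A3": 0.05}))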

    An alternative model selection uses an aggregate predictive model, denoted M(A1..An).

    The aggregate model is a single model that selects the best algorithm based on past learning.

    The selection mechanism for an aggregate model selection focuses on distinguishing between

    algorithms by implicitly modeling the differences in performance. For example, Guo and

    Hsu [GH04] use a decision tree to choose among multiple approximate algorithms based on a

    set of features. Aggregate model selection is used by the portfolios denoted with “Agg:” in

    the Model column of Table 4.

    The individualized models predict various values depending on their use by the selection

    mechanism. Some models perform regression on runtime as a continuous value, some classify

    time as a discretely binned value, and others broadly characterize the predicted runtime as

    binary (short or long). At an even higher level of abstraction, some models simply predict

    whether the algorithm will succeed or fail. Portfolios are not bound to a single modeling type.

    One portfolio by Guo and Hsu [GH04, GH02] uses both types of modeling (aggregate and

    individual). The portfolio first uses an individual model to predict whether the exact algorithm

    will succeed. If the answer is “no” then an aggregate model selects the best approximate model

    based on problem instance features.

    Two important considerations for building performance models are minimizing model cost

as well as minimizing model complexity. Each model takes time to extract features, collect runtime data, learn from the data, and predict new values. For example, dynamic features are more costly than structural features because algorithms must run to extract the features for both

    learning and prediction. Throughout the literature, model costs remain somewhat obscured


by the metrics and discussions in these papers. Carchrae and Beck [Car05] produce effective

    control based on low-knowledge features, but they exclude a discussion of model construction

    time. Fink [Fin04] clearly identifies the model cost for his portfolio, as does Leyton-Brown

et al. [LBNA+03]. Most authors are biased toward efficient features or toward incorporating a feature cost analysis that favors cheaper features. It doesn’t make sense to spend the effort learning if the learning time would be better spent on searching. But the trade-off is not

    always clear.

    A related problem to model cost is model complexity. If the performance of two mod-

    els is equivalent then the simpler model should be preferred over the complex; this is fre-

    quently referred to in the scientific community as Occam’s Razor. The portfolio by Horvitz et

    al. [HRG+01] incorporates model accuracy in the selection process – as the accuracy increases,

    so does the confidence in the prediction. Leyton-Brown et al. [LBNA+03] partition features

    to bias model construction toward lower complexity. Carchrae and Beck [Car05] examine low-

    cost models of runtime selection under the conjecture that high-cost models actually impair

    technology transfer. But they only evaluate this conjecture by building low-cost models. A

more complete analysis needs to examine when it is appropriate to prefer more complex models that outperform the low-cost ones.

Though we are mostly focused in this paper on portfolios that learn per-instance performance models, learning problem distributions is also an important part of some portfolios.

    It is fairly well known that problem instances within a problem class can be distinct. A clas-

    sic example of differences in problem instances is the phase transition in SAT. Cheeseman,

    Kanefsky, and Taylor [CKT91] show a phase transition in solvability as the ratio of clauses to

variables is varied. Another example is the job-correlated and machine-correlated instances of the flow shop problem studied by Watson et al. [WBWH02]. In such cases, modeling each dis-

    tinct distribution might lead to more accurate predictions than a single model could produce.

    Distribution modeling is a useful approach if some prior information about the problem distri-

    butions is already known. This issue is addressed by Horvitz et al. [HRG+01], Leyton-Brown

    et al. [LBNA+03], and Gomes and Selman [GS01].

    In the next three subsections, we take a closer look at the three kinds of portfolios and

    the selection mechanisms they use. We begin by examining parallel portfolios that distribute

    algorithms on a set of parallel processors.

    3.2 Parallel Portfolios

An early work on portfolios by Gomes and Selman [GS97, GS01] examined a portfolio that allocates two algorithms across 2, 5, 10, and 20 processors. The two algorithms are chosen

    because one dominates early in search and the other dominates later in search. They find that

    the best portfolio mix is highly dependent on the runtime distributions of the algorithms on

    specific problems. A key contribution of this portfolio is to characterize algorithm runtimes

in terms of their mean and variance (also called risk) and to show the trade-off associated with

    these two metrics. Huberman, Lukose, and Hogg [HLH97] explored this trade-off as well.

    Parallel portfolios have received less recent attention since they were introduced in the

    literature. This type of portfolio architecture will become increasingly important as computer

    architectures become more parallel and as more work extends our knowledge of stochastic algo-


rithms. A recent book by Hoos and Stutzle [HS04] brings together many aspects of stochastic local search algorithms, but little work has applied these advances to portfolio

    learning. For now, the parallel portfolio design remains an open question that needs more

    exploration.

    3.3 Switching Portfolios

    A switching portfolio uses predictive modeling to allocate resources (computation time) to two

    or more algorithms over a series of iterations; portfolios that perform restarts are included in

    this type. One view is that the algorithms are competing with one another for time. The

    iterative nature of the portfolio adds another layer of learning to the selection process that

    must balance exploitation of known information and exploration of new states. The competitive

    paradigm of a switching portfolio requires that algorithms can be compared on equal footing.

    An early model for switching portfolios is given in Howe et al. [HDH+99]. This portfolio uses

    individual algorithm models based on linear regression to predict the runtime for a problem.

The algorithms are then ranked according to the ratio PA(p)/lA(p), where PA(p) is the probability of algorithm A solving problem p and lA(p) is the predicted runtime of the planner solving p.

    Simon and Kadane [SK75] prove that this ranking minimizes the expected length of time of

    running a set of algorithms in series until one succeeds.
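
A small sketch of this ordering is given below; the probabilities and runtimes are made-up values, and the function simply sorts by the PA(p)/lA(p) ratio described above.

    def rank_by_success_per_time(predictions):
        """predictions: list of (name, p_success, predicted_runtime); higher ratio runs first."""
        return sorted(predictions, key=lambda t: t[1] / t[2], reverse=True)

    schedule = rank_by_success_per_time([
        ("planner_a", 0.90, 120.0),
        ("planner_b", 0.60, 15.0),    # cheap to try first even though less reliable
        ("planner_c", 0.95, 600.0),
    ])
    print([name for name, _, _ in schedule])   # order in which to run the algorithms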

    Gomes and Selman [GS97, GS01] use a selection mechanism that calculates the trade-off

    associated with the expected runtime and variance (also called risk) of running a combination

    of algorithms on a serial processor. They analytically determine the best restart strategy that

    minimizes both metrics. In contrast to the parallel portfolio mentioned in Section 3.2, this

    serial portfolio reveals the impact of context switching on a serial process; as the number of

    simultaneous algorithms running on the processor increases, the mean runtime also increases.

    This is an expected behavior that results from running a “parallel” process on a serial processor.

    However, the experiment also highlights the need for determining an architecture dependent

    strategy.

    Horvitz et al. [HRG+01] model two algorithms with individual Bayesian Network models

    as described by Chickering, Heckerman, and Meek [CHM97]. They allocate time to the algo-

    rithms based on a weighted sum of model accuracy and probability of success. The goal is to

    minimize the expected number of choice points, which correlates linearly with search cost. The

    networks are trained to predict short or long runs. Kautz et al. [KHR+02] extend the work of

    Horvitz et al. [HRG+01] by identifying a class of provably optimal dynamic restart policies. In

    particular, they examine the case of dynamic restarts for dependent runs. They also empiri-

cally evaluate their policies as compared to another optimal policy described by Luby, Sinclair, and Zuckerman [LSZ93]. We explore this optimal policy below in our discussion of evaluation

    (Section 4).

    Carchrae and Beck [Car05] present a control method that measures improvement of three

    algorithms by their cost of improvement per second (a single, low-knowledge feature). For

N iterations and a time limit per iteration t_i, the total time limit for the portfolio is T = Σ_{1≤i≤N} t_i. In each iteration, the three algorithms are randomly ordered and allocated time proportional to their weights, which are adjusted based on the progress in previous iterations: w_{i+1}(a) = α p_i(a) + (1 − α) w_i(a), where a is the algorithm, α is the learning rate, and p_i(a) is the current (normalized) value of the performance of the algorithm for iteration i. Algorithm

    weights are uniformly initialized to 1/|A|, and no weight updating is done until the second

    iteration.
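
The update can be sketched in a few lines; the learning rate, progress values, and per-iteration budget below are illustrative, not the settings used by Carchrae and Beck.

    def update_weights(weights, progress, alpha=0.5):
        """w_{i+1}(a) = alpha * p_i(a) + (1 - alpha) * w_i(a); progress values sum to 1."""
        return {a: alpha * progress[a] + (1 - alpha) * w for a, w in weights.items()}

    algorithms = ["tabu", "texture", "lds"]
    weights = {a: 1.0 / len(algorithms) for a in algorithms}   # uniform initialization 1/|A|
    progress = {"tabu": 0.7, "texture": 0.2, "lds": 0.1}       # normalized cost improvement per second
    weights = update_weights(weights, progress)
    t_i = 100.0
    allocation = {a: t_i * w for a, w in weights.items()}      # time shares for the next iteration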

    An important issue for switching portfolios is when one should stop an algorithm. As Fink

    notes, “for most artificial-intelligence search algorithms, the probability of solving a problem

    within the next second usually declines with the passage of search time” [Fin04, 10]. But

    choosing an effective stopping (or restart) strategy is challenging because it might also be the

    case that the algorithm is close to a solution but has not actually located it. Restart strategies

remain an open area of research in algorithms; we discuss this more fully in Section 4.

    The switching portfolio by Carchrae and Beck [Car05] allows the transfer of the best solution

between algorithms. Clearly, such transfer is possible only if the algorithms share a similar

    representation or if there is a well-defined mapping between differing representations. It is well

    known that algorithm progress depends on the connectedness of the search space - that is, the

    way in which the algorithm can move from one solution (or solution distribution) to another.

    Changing algorithms may mean changing representations as well as search operators; this

    change implies modifying the connectedness and solution adjacency of the search landscape.

    The literature characterizing the connectedness of the search space could help extend switching

    portfolios and is an open area for research. There may be features of each landscape that could

    “inform” the performance models. It might also be possible to characterize solution adjacency

so that a portfolio could change representations. An open question in this area is knowing when

    such a change would be beneficial.

    3.4 Predictive Portfolios

A predictive portfolio makes a one-time decision between two or more algorithms. A distinction

    can be made between models that do not use runtime features (static models) and those that

    do (dynamic models).

    3.4.1 Static Predictive Portfolios

    Case-based systems provide a straightforward way to model performance and guide selection. A

predictive portfolio by Gebruers et al. [GHBF05] selects among 12 variants of a parameterized

    depth first search algorithm. Given a new instance, the case-based model finds the closest

    matching previously learned case and predicts the performance using that case.

    Vrakas et al. [VTBV05, TVBV04] present another case-based system that selects the best

    configuration for a new problem instance based on the nearness of that instance to previously

    seen cases. The selection is based on a weighted combination of the seven closest neighbors

    (k-nearest neighbors (kNN) where k = 7). The portfolio is intended as a baseline portfolio

    for evaluation, but it can actually be considered another selection mechanism. The authors

    use batch training to populate the tables, but they do point out that the method could be

    sequentially constructed as each instance arrives.
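
A minimal sketch of this kind of distance-weighted kNN selection appears below; the distance metric and vote weighting are our assumptions, not the exact scheme of Vrakas et al.

    import math

    def knn_select(instance, cases, k=7):
        """cases: list of (feature_vector, best_configuration) pairs from training.
        Returns the configuration preferred by a distance-weighted vote of the k nearest cases."""
        nearest = sorted(cases, key=lambda c: math.dist(instance, c[0]))[:k]
        votes = {}
        for features, config in nearest:
            votes[config] = votes.get(config, 0.0) + 1.0 / (1e-9 + math.dist(instance, features))
        return max(votes, key=votes.get)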

    Two disadvantages of instance learning are the storage and retrieval of the previously

    learned knowledge as well as the issue of dimensionality. The storage size and retrieval time

are both linear in the number of training samples because every stored case must be compared to the new sample using a distance metric. For small samples the storage is not so much of an issue,

    but a large number of instances with many features can lead to an instance database that is

    overwhelming. An example of this is the system by Vrakas et al. [VTBV05] where they need to

    store values for 35 features for 864 different variants of their algorithm for every single training

    instance. Clearly representation of the instances is a major design decision.

Another issue is how well the space is covered by the instances, which relates to the so-called curse of dimensionality. Each additional feature has implications for effective retrieval.

Reducing dimensionality by feature selection or making the space more manageable by weighting

    the instances are two ways to cope with the instance space. Mitchell [Mit97, 297] discusses

    this topic in relation to instance-based learning.

    Inductive models are well-suited to generalizing discrete decisions such as whether an algo-

    rithm will succeed (a binary decision) or which algorithm might be best (an |A|-ary decision).

    Guo and Hsu stage selection as two decision trees selecting between an exact algorithm and

incomplete algorithms. First, a single decision tree learns to predict whether the exact al-

    gorithm will succeed given several polynomial features. If the answer is negative, then a second

    decision tree selects the best approximate search algorithm from five variants that operate with

    distinct search mechanisms. Guerri and Milano [GM04] construct a decision tree model that

selects between integer programming and hybrid constraint programming approaches. The

authors refine the 25 features of Leyton-Brown et al. [LBNS02] down to 4 essential features.
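
A rough sketch of such a staged selection is shown below using scikit-learn decision trees; the toy feature vectors and labels are invented for illustration and do not reflect the actual features or training data of Guo and Hsu.

    from sklearn.tree import DecisionTreeClassifier

    # Toy data: two cheap instance features per problem (values invented).
    X = [[0.2, 10], [0.9, 200], [0.4, 50], [0.8, 300]]
    y_exact_succeeds = [1, 0, 1, 0]                          # stage 1: will the exact solver finish?
    y_best_approximate = ["gibbs", "tabu", "gibbs", "ant"]   # stage 2: best approximate solver

    exact_model = DecisionTreeClassifier().fit(X, y_exact_succeeds)
    approx_model = DecisionTreeClassifier().fit(X, y_best_approximate)

    def select(features):
        if exact_model.predict([features])[0] == 1:
            return "exact"
        return approx_model.predict([features])[0]

    print(select([0.85, 250]))   # likely routed to an approximate solver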

    Rule-based systems are another inductive model similar to decision trees; in fact, many

    rule-based algorithms begin by building a decision tree then converting the tree into a set

    of rules. Vrakas et al. [VTBV05] present a rule-based system for automated planning that

    chooses the best parameters among 864 variants of a parameterized planning system. The

    rules are constructed by a rule-based system that induces classification rules by boosting the

    rule learner. The classifier predicts “good” and “bad” runs.

    Inductive models provide a way to overcome the storage limitations of case-based systems

    mentioned above. Since the models learn general performance patterns, they are a reasonable

    way to model performance. However, there are some issues with learning inductive models.

First, the most common models assume a discrete distribution over the target values; in the case

    of runtime this introduces potential model errors. Leyton-Brown et al. [LBNA+03] discuss

    classification/regression issues related to predicting runtime. They point out that the error

    metric for classification may not appropriately penalize errors. For example, most classifica-

    tion algorithms will equally penalize off-by-one-bin errors and off-by-many-bins errors. But

appropriate regression models will penalize large errors more than small ones. Even

    with this valid criticism, it seems possible to build reasonably accurate models that can be

    used as “weak learners” in a portfolio to identify the more difficult problems.

    3.4.2 Dynamic Predictive Portfolios

    Carchrae and Beck evaluate a predictive portfolio based on a Bayesian model to predict the

    best algorithm out of three. The portfolio is given a time limit (in seconds) T = |A|tp + tr,

    where tp indicates how long to run each algorithm to collect informative runtime features and

    tr indicates how long to run the selected algorithm. For example, if |A| = 3, T = 1200, and

tp = 90, then tr = 1200 − 3(90) = 930; the authors empirically identified that the best tp is 90 seconds.

    Fink [Fin04, Fin98] explores the application of statistical selection among three search

    algorithms for the PRODIGY planner. Similar to Carchrae and Beck [Car05], this portfolio

    uses a model that is trained on-line. Model selection is done by choosing the algorithm with

the highest payoff as measured by the difference between a reward and the predicted runtime (see

    Section 4.2).

    One key question in dynamic portfolios is how long one should run a predictive phase, tp,

    of the algorithms. This is a critical question because one may be faced with an overall time

    limit as in the portfolio above. The only method tried in the literature was to empirically

    evaluate the best choice. But it seems reasonable to consider that one might automatically

    learn a value for tp. For example, one might use either the statistical methods of Horvitz et

    al. or the optimal bounds provided by Luby et al. to arrive at a value for tp by assuming the

    runs are capped at a value that is some fraction of the total time T .

    4 Evaluating Portfolio Performance: A Detailed Look

    There are three ways that current portfolios are assessed: 1) using prediction accuracy of the

    aggregate model, 2) using runtime, or 3) using solution quality. Comparison is usually done to

    an optimum. In the case of prediction accuracy, the best is implied to be 100 percent accuracy.

    For solution quality and runtime, comparison can be made to an empirical best solution from

    any algorithm, or to the best given by a straightforward approach, or to a provable optimum

    given specific assumptions.

    Some authors create hybrid metrics by combining runtime and solution quality. There can

    be ambiguity if time is critical in one domain but solution quality is imperative in another, and

    these two metrics may even be competing for some problem domains. Consider a scheduling

    domain where one can arrive at a (vastly) suboptimal solution by relaxing constraints (such as

the minimum makespan) and then further refine the solution at the cost of runtime. It might make sense to combine both metrics to allow for a portfolio that finds the best solutions

    as quickly as possible.

    Designating a clear winner is not always easy and the difference between algorithm perfor-

    mances might be so slight as to have negligible impact on the overall portfolio performance.

    For example, some problem instances may be easily solved by two of the three algorithmic

    approaches. In such a case, it might be preferable to allow a “band of success” within which

    algorithms are recognized as reasonably competitive. In this fashion, the key issue is avoiding

    (all of) the inefficient algorithms and preferring (any of) the efficient algorithms. To accom-

    plish the “band of success” some authors allow a deviation – which is often a factor – about the

    optimum around which a portfolio is said to be competitive. In fact, Gebruers et al. [GHBF05]

    organize their results with the factor above optimum as the dependent variable in their plots,

    which show that an algorithm’s relative performance changes as the requirement of choosing

    the very best algorithm is relaxed.


4.1 Predictive Accuracy

    Guerri and Milano [GM04] also use prediction accuracy to evaluate their system which uses a

    decision tree to select between two algorithms. Guo and Hsu also evaluate the exact models

    using classification accuracy.

    Horvitz et al. [HRG+01] compare their results in two ways depending on what is being

    measured. For models that attempt to predict the runtime (short, long) of two different

    algorithms, they compare the classification accuracy of the model against the marginal model

    and they also compute the average log score for comparison.

    4.2 Runtime

    Gebruers et al. [GHBF05, GGHM04] evaluate their method using two metrics: prediction rate

    and total (summed) execution time. They convincingly argue that prediction rate alone is

    misleading because: “there would be little utility to a technique that picked the best strategy

    for 90% of the instances if these were ones where solving time was short but which failed to

    pick the best strategy for the remaining 10% of instances if these were ones where the solving

    time exceeded the total for the other 90% [GHBF05, 7].” They compare the performance of

    the winner-take-all, random, and weighted random algorithms.

    The dynamic measure of gain given by Fink [Fin04] is intended to capture the amount of

    progress that the selection algorithm provides. A more complete description of the metric is

    provided in later work by Fink [Fin03]. Gain is defined as the difference between a reward (R)

    and predicted runtime (time) of an algorithm. The intuition is that the selection mechanism

    is paying for runtime and that this runtime counts against any reward. For example, the gain

    will be R − time if the algorithm is successful, but it will be −time if the algorithm hits the

    time bound (because R = 0). Clearly, it is preferable to select algorithms for which R > time.
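
The gain computation can be written directly from this description; the reward and time values below are placeholders.

    def gain(reward, runtime, solved, time_bound):
        """Fink-style gain: reward minus runtime on success, minus the (capped) runtime otherwise."""
        if solved:
            return reward - runtime
        return -min(runtime, time_bound)   # R = 0 when the time bound is hit

    print(gain(reward=100.0, runtime=12.5, solved=True, time_bound=60.0))    #  87.5
    print(gain(reward=100.0, runtime=60.0, solved=False, time_bound=60.0))   # -60.0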

    Gomes and Selman [GS01] compare portfolios using a combined measure of expected run-

    time and variance, also called risk. The combined metric provides more information than mean

    runtime alone and gives a sense of the trade-off between runtime and risk.

For the switching portfolio evaluation, Horvitz et al. [HRG+01] show that a selection mechanism incorporating model accuracy will return a lower expected runtime than the optimal “no-information” bound given by Luby et al. [LSZ93].

    The earlier work of Luby et al. [LSZ93] on Las Vegas algorithms proved an optimal bound

    of a universal strategy for the expected runtime under the two assumptions: 1) that little or

no information is known about the runtime of algorithm A, and 2) that we are only running

    algorithm A on a specific problem p ∈ P until we obtain a single answer [LSZ93]. They present

    what they call an optimal universal strategy, Suniv, characterized as:

Suniv = (1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 16, 1, . . .).

    If we denote the expected runtime of an algorithm A on a problem p ∈ P as lA(p), then they

    prove that the expected runtime of Suniv is O(lA(p) lg(lA(p))) – that is, the runtime of the

    universal policy is within a logarithmic factor of the optimal policy that could be obtained

    with exact knowledge of the runtime distribution for algorithm A.
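
For reference, the universal sequence can be generated with the standard recursion below; this is a textbook formulation of Suniv rather than code from Luby et al.

    def luby(i):
        """i-th term (1-indexed) of the universal restart sequence Suniv."""
        k = 1
        while (1 << k) - 1 < i:                # find the smallest k with i <= 2^k - 1
            k += 1
        if i == (1 << k) - 1:
            return 1 << (k - 1)                # end of a block: 2^(k-1)
        return luby(i - (1 << (k - 1)) + 1)    # otherwise repeat the earlier prefix

    print([luby(i) for i in range(1, 16)])     # [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]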


More extensive comparison to Luby’s optimal bounds is also done by Kautz et al. [KHR+02].

    The authors point out that the two key assumptions of the optimal bounds are typically vio-

    lated. They show that it is possible to achieve dynamic runtime bounds – bounds that include

    information about the runtime or problem distribution – that are more accurate than the Uni-

    versal log-optimal policy and the fixed optimal policy. They also add information by allowing

    a decision tree to predict the distribution from which the problem came prior to calculating

    the bounds. All bound methods dominate the median runtime distributions, and the more

informed dynamic bounds dominate those of Luby, Sinclair, and Zuckerman [LSZ93].

    4.3 Solution Quality

    Carchrae and Beck [Car05] compare all portfolios (predictive and switching) to the best single

winner-take-all algorithm using two metrics that characterize how much worse an algorithm’s solutions are than the best found and how often it finds the best solution. The Mean Relative Error (MRE) states the average degree to which an algorithm underperformed the best. Given a set of independent

    runs, R, on a set, K, of instances, let c(a, k, r) denote the best solution found by a particular

    algorithm a in instance k during run r. Also let c∗(k) indicate the best solution found by any

    algorithm for instance k. Then,

MRE(a, K, R) = (1/|R|) (1/|K|) Σ_{r∈R, k∈K} (c(a, k, r) − c∗(k)) / c∗(k).

    Another metric, the Mean Fraction Best (MFB), describes how often an algorithm found

    the best solution. Given a set of independent runs, R, on a set, K, of instances, let best(a,K, r)

    be the set of solutions for k ∈ K where c∗(k) = c(a, k, r) during run r. Then,

MFB(a, K, R) = (1/|R|) Σ_{r∈R} |best(a, K, r)| / |K|.
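A direct transcription of the two metrics in code, assuming the run data are stored in a dictionary keyed by (algorithm, instance, run) and that smaller costs are better (the data layout is hypothetical):

    def mre(cost, a, instances, runs, best):
        # Mean Relative Error: average relative gap to the best known solution c*(k).
        total = sum((cost[(a, k, r)] - best[k]) / best[k]
                    for r in runs for k in instances)
        return total / (len(runs) * len(instances))

    def mfb(cost, a, instances, runs, best):
        # Mean Fraction Best: average fraction of instances on which a matches c*(k).
        fractions = [sum(1 for k in instances if cost[(a, k, r)] == best[k]) / len(instances)
                     for r in runs]
        return sum(fractions) / len(runs)

    instances, runs = ["k1", "k2"], [0]
    cost = {("a", "k1", 0): 10.0, ("a", "k2", 0): 12.0}
    best = {"k1": 10.0, "k2": 10.0}
    print(mre(cost, "a", instances, runs, best))  # 0.1: a is 20% worse on k2, best on k1
    print(mfb(cost, "a", instances, runs, best))  # 0.5: a matches the best on half the instances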

    Results for each algorithm on the training problems show that no algorithm dominates any

    other across all problems. The results highlight two important details. One, a winner-take-all

    approach will under-perform a learned model that is able to identify the best algorithm for an

    instance. Two, the results partly justify spending the effort trying to learn a selection model

provided it beats the winner-take-all strategy.

Guo and Hsu [GH04] evaluate the approximate model by grouping the data into three sets (large, medium, small) according to the structure of the problem and then measuring the overall MPE score; MPE is similar to a log-likelihood in that larger values are preferable. Evaluation according to MPE reveals that the approximate model selection outperformed all other single algorithms except multi-start hill climbing on the medium problems. On a set of 13 realistic problems, the exact algorithm selector correctly predicts the exact algorithm for 11 of them; the remaining two problems appear to be predicted correctly by the approximate algorithm selector, yielding 100% accuracy overall. Incidentally, the three-fold division of the problems

    after the learning is a curious evaluation step because it could have easily been included as a

    feature for the approximate model.


4.4 Hybrid Metrics

    One hybrid metric by Vrakas et al. [VTBV05] relates the normalized solution length, S, and

runtime, T, of the parameterized search engine. Formally, for the ith problem and jth search configuration, they calculate the normalized solution length as S^norm_ij = S^min_i / S_ij and the normalized solution time as T^norm_ij = T^min_i / T_ij. If the search algorithm timed out at 60 seconds, then S^norm_ij and T^norm_ij are set to zero. The evaluation metric, M_ij, combines these values as a weighted sum of the two normalized metrics: M_ij = w_s S^norm_ij + w_t T^norm_ij. The authors fail

    to clearly state the weight values, but it appears they weight them equally. With this metric,

the authors compare the performance of the rule-configured planner against a kNN algorithm

    (k = 7), the best average configuration and the best per-instance “oracle” configuration. In

    earlier work, Vrakas et al. [VTBV03] also compared the parameterized planner to the best five

static configurations, though the earlier planner has fewer parameters.
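A sketch of the metric for a single problem/configuration pair, assuming equal weights since the paper does not state them explicitly:

    def hybrid_metric(s_min, s_ij, t_min, t_ij, timed_out=False, w_s=0.5, w_t=0.5):
        # Timed-out runs contribute zero; otherwise combine the normalized
        # solution length and solution time as a weighted sum.
        if timed_out:
            return 0.0
        s_norm = s_min / s_ij   # 1.0 when this configuration found the shortest plan
        t_norm = t_min / t_ij   # 1.0 when this configuration was the fastest
        return w_s * s_norm + w_t * t_norm

    print(hybrid_metric(s_min=20, s_ij=25, t_min=1.0, t_ij=4.0))  # 0.5*0.8 + 0.5*0.25 = 0.525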

    Discussion of portfolio evaluation in the literature often focuses on evaluating the portfolio

    after it is constructed. This view is very limited because evaluation – critical evaluation –

    actually happens at almost every stage of portfolio construction: in the feature design and

    selection, the algorithm design and selection, the problem selection and instance creation, the

    model creation and selection, and finally in the algorithm selection. Much of the literature

    is focused on providing a descriptive model of the design decisions. As a start, descriptive

models are a reasonable way to proceed into a new area of research. However, such models fail

    to advance our understanding of why the decisions are interesting. A more informative model

    would justify why such design decisions are important or relevant by producing predictive

    models of the algorithms and portfolios.

An illustrative example might help explain how a predictive model can advance research further than a descriptive model can. We will select a recent portfolio paper that

    had many positive contributions to portfolio research; the criticism we apply to this one paper

    could be easily found in other papers. Carchrae and Beck [Car05] describe a portfolio that uses

    low-knowledge (low-cost) features to model algorithm performance. They implicitly assume,

    but do not actually evaluate, that high-cost features are unnecessary to build reasonably ac-

    curate models. It would be somewhat straightforward to begin with a clear statement of this

    assumption as a research hypothesis, build low-cost and high-cost models, then statistically

    evaluate the hypothesis. A clear methodology allows subsequent researchers to build on the

knowledge. If only low-cost features are needed to build reasonable models, it would be worthwhile for any portfolio research paper to confirm that hypothesis explicitly.
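One hypothetical way to carry out such a test: train both models, record their per-instance prediction errors, and apply a paired non-parametric test (the numbers below are invented for illustration):

    from scipy import stats

    # Per-instance errors of a model built from low-cost features versus one
    # built from high-cost features, paired by problem instance (invented data).
    low_cost_errors = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.10, 0.13]
    high_cost_errors = [0.11, 0.07, 0.14, 0.10, 0.10, 0.12, 0.09, 0.12]

    # A small p-value would indicate that the extra (high-cost) features help;
    # a large one fails to reject the hypothesis that cheap features suffice.
    statistic, p_value = stats.wilcoxon(low_cost_errors, high_cost_errors)
    print(statistic, p_value)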

    There is an appalling lack of evaluation metrics in the literature, and many of the existing

    metrics lack measures of statistical significance. This isn’t to say that there are not good models

to follow. More recent work, particularly that of Fink [Fin04] and Beck and Freuder [BF04],

    incorporates measures of significance into the evaluation; other portfolio analyses rely on the

    reader to infer statistical significance. The lack of variety in the metrics could result from

    portfolios only recently gaining attention. But there are certainly a variety of existing metrics

    that would be easy to borrow or modify from other areas of research. For example, Witten

    and Frank [WF05] discuss the different measurements one can use to evaluate classification

    algorithms in Machine Learning.

    The hybrid metrics seem like an important step in this direction, especially as humans and


computers work together in more integrated environments and algorithms are required to

    balance multiple objectives in a problem. Portfolios are one way to humanize machine intelli-

    gence through approaches that incorporate mixed-initiative and multi-objective evaluation as

    well as approaches that have an anytime quality, where the algorithm(s) produce a feasible

    solution quickly but then improve on the solution as search progresses. One can picture a

    portfolio that uses different search algorithms depending on the specific evaluative needs of

the user. The array of literature on such hybrid approaches and their evaluation metrics could (and probably should) be brought to bear in portfolio research.

    One could get creative and incorporate metrics from other fields as well. For example,

    information retrieval metrics may be applicable to highlighting performance differences in

    terms of solution quality. One could rate the “precision” of an algorithm by how well it locates

    good solutions on problem distributions and how well it “recalls” those solutions on multiple

    runs.
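As a purely illustrative sketch of that analogy, one might count a run as “relevant” when it comes within some tolerance of the best known solution; the thresholds and data below are invented:

    def precision_recall(cost, best, tol=0.01):
        # cost maps (instance, run) to the solution cost achieved; best maps
        # each instance to the best known cost (minimization).
        hits = [(k, r) for (k, r), c in cost.items() if c <= (1 + tol) * best[k]]
        precision = len(hits) / len(cost)               # fraction of runs that hit the best
        recall = len({k for k, _ in hits}) / len(best)  # fraction of instances ever hit
        return precision, recall

    cost = {("k1", 0): 10.0, ("k1", 1): 10.4, ("k2", 0): 13.0, ("k2", 1): 12.1}
    best = {"k1": 10.0, "k2": 12.0}
    print(precision_recall(cost, best))  # (0.5, 1.0)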

    5 Crosscutting Themes

    Some themes impact all portfolios regardless of the selection mechanism or the evaluation

    metric. Hooker’s [Hoo95] criticism of “algorithmic track meets” as a poor methodology for

advancing the state of the art in algorithms could also be applied to portfolios. The competitive

    testing paradigm provides a useful starting point for our discussion of the crosscutting themes.

    Selecting the “right” or “best” algorithm for a problem is challenging. As mentioned in

    the introduction, there is still a great deal of black art to algorithm design and portfolio

construction. A portfolio seems like a straightforward way to automate the art form. But stopping there does no more to advance our understanding than training a new generation of researchers to compete in the track meet with more efficient algorithms.

    Fortunately, the portfolio approach has merits beyond increasing the robustness or reducing

the search cost of algorithms by harnessing algorithm bias. We can actually deconstruct the

    portfolio’s constituent parts to derive deeper insight into identifying the dependencies between

    the domains, heuristics, algorithms, and runtime dynamics. Leyton-Brown, Nudelman, and

    Shoham [LBNS02] take promising steps in this direction, but there is a lack of such effort

    in most portfolio research. A frequent approach seems to be of the very kind that Hooker

criticized: an algorithmic track meet where the portfolio is simply another algorithm that

    does better. To advance knowledge about heuristic search, we need to look deeper into the

    portfolio. As Leyton-Brown et al. [LBNA+03] point out:

    Since the purpose of designing new algorithms is to reduce the time that it will take

    to solve problems, designers should aim to produce new algorithms that complement

    an existing portfolio (p. 9).

    An excellent way to complement an existing approach is to identify the complement, hypothe-

    size that the complement will add to the portfolio, verify that the complementing algorithm did

    (or did not) actually enhance the portfolio, and finally explain why the change was observed.

As a side note, one can get carried away with a vision of an ever more robust portfolio. It might at first appear that the portfolio approach is headed toward creating the kind of


General Problem Solver that inspired early AI researchers. But worst-case complexity results

    given by Guo [Guo03] as well as No Free Lunch proofs by Wolpert and Macready [WM95]

preclude algorithm selection from being a panacea.

    Clearly another issue that one must keep in mind for any meta-learning method is whether

the performance boost justifies the effort. Beck and Freuder [BF04] examine the best-possible gain for improving a search algorithm and find that some cases may not warrant the pro-

    grammatic effort it would take to produce better performance. Recent advances in stochas-

    tic algorithms and restart strategies deal with heavy-tailed phenomena in runtime very well.

    Though it might not be nearly as creative to simply add a random number generator to an

    algorithm rather than create a whole portfolio, it could be the very thing that is needed. Clear

analysis and hypothesis-driven insight lay an appropriate foundation for determining the value

    of harnessing algorithm bias with a portfolio approach.

    There seems to be little effort to move this technology toward application in the business

world. In part, this could be due to the novelty of the approach; it appears that portfolios still need

    some answers to the above questions before the technology is completely applicable. Then

    again, perhaps this is a naive view. Successful portfolios are leading the way in understanding

    the rich substructure of problems. In tandem with research on algorithm design, portfolios may

well lead to greater insight into the inner workings of various search processes. The design of an effective portfolio allows for pragmatic solutions that go hand in hand with scientific research. A

    portfolio also abstracts away algorithms in a manner similar to the Strategy design pattern

    canonized by Gamma et al. [GHJV95]. Under such a pattern, the inherent modularity of a

    portfolio allows speedier adoption of new algorithms into an existing piece of software.
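A minimal sketch of that modularity in the spirit of the Strategy pattern (class and method names are invented for illustration):

    from abc import ABC, abstractmethod

    class Solver(ABC):
        # Each algorithm hides its own search machinery behind a common interface.
        @abstractmethod
        def solve(self, problem, time_limit):
            ...

    class LocalSearchSolver(Solver):
        def solve(self, problem, time_limit):
            return f"local-search solution to {problem}"

    class SystematicSolver(Solver):
        def solve(self, problem, time_limit):
            return f"systematic solution to {problem}"

    class Portfolio:
        # The selection strategy is itself pluggable: it maps (solvers, problem)
        # to the solver that should be run.
        def __init__(self, solvers, select):
            self.solvers, self.select = solvers, select

        def solve(self, problem, time_limit):
            return self.select(self.solvers, problem).solve(problem, time_limit)

    # Trivial selection strategy: always run the first solver.
    portfolio = Portfolio([LocalSearchSolver(), SystematicSolver()], lambda s, p: s[0])
    print(portfolio.solve("jobshop-01", time_limit=60))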

    The issue of time limits surfaces in many areas of portfolio construction and application:

    when to quit altogether, when to restart, and how long to run an algorithm for prediction. In

    setting an appropriate time length for individual algorithms, many researchers find that the

    best algorithm changes over time. Presumably this has to do with when in the search process

    a particular algorithm does better. The total time of the portfolio is closely related to the time

    limit of its individual algorithms as well as any on-line modeling costs. It isn’t always clear

    how much time it took to construct the models or whether this time should even be reported;

    it is certainly impressive when the portfolio, including modeling time, has a better average

    time than the best single algorithm. More work needs to be done in exploring the issue of time

    limits for portfolios.

Two relatively unexplored areas of research in portfolios are how much information to transfer between algorithms and how much learning the algorithms themselves might perform as they

    search. Solution transfer is rare because many algorithms use a unique solution representation

    unshared by other algorithms; translation between solutions is impossible or undefined. Only

one portfolio used solution transfer, but other portfolios may have benefited from such an

    approach. Adding solution transfer makes a portfolio appear similar to algorithms that perform

    global search then refine the solution with some form of local search. Solution transfer also

begins to look like some forms of memory that adaptive algorithms use, which raises the question of whether to simply include adaptive algorithms in the portfolio. Certainly it is possible

    to consider constructing portfolios using meta search techniques. But it is not clear how a

    portfolio might best interact with complex algorithms that incorporate their own learning and


branching techniques. The issue is only cursorily discussed in the portfolio literature, but

    should probably get more attention.

    6 Closing: Beyond Bias

    We began our discussion by pointing out that selecting an appropriate algorithm is a sort of

    black art. We then looked at a way to harness algorithm bias through a portfolio approach.

    We examined several portfolios from the literature to get a sense of the different ways that

    researchers have harnessed algorithm bias; in particular, we focused on two questions related

    to portfolio strategy:

1. Given a set of algorithms and performance models of those algorithms, what selection mech-

    anism will effectively choose between them?

    2. Given a portfolio, how does one evaluate its performance?

    With respect to the selection mechanisms, we identified three primary types (parallel,

    switching, and predictive) that closely followed the portfolio architectures. With respect to

    evaluation, we showed that authors rely on measures of runtime or solution quality to determine

    if the portfolio outperformed individual algorithms.

    Unfortunately, we haven’t really solved the issue of the “black art” that began our venture

    into this topic. Portfolios are a way to harness the bias of algorithms, and portfolios can

    be used to drive deeper insight into understanding algorithm performance. But they are not

often used that way. More frequently, research on the portfolio approach leaves more questions

    unanswered than answered because the “art” is performed by a machine rather than a human.


7 Acknowledgments

    In June, July and August, the author of this report worked with Dr. Adele Howe to refine

    the scope and focus of the exam. The student asked for and received feedback about three

    revisions of the proposal from Dr. Howe.

    September 5: Mark discussed his approach to the exam with Dr. Howe; Landon Flom

    was present for this meeting. Mark expressed a particular concern about the scope of written

    portion. Dr. Howe reminded the student that the scope of this project was limited to 10

    papers and that the tone should be similar to that of a literature review in a dissertation; she

    referred Mark to Dr. Laura Barbulescu’s dissertation as a possible model. She also suggested

    that Mark spend some time putting together a list of dependencies for the exam.

    September 7: Mark sent a proposed time-line for completing the exam by mid-October.

    Dr. Howe responded that it might be a little ambitious, but otherwise approved the proposal.

    September 19: Mark showed Dr. Howe, Landon Flom and Christie XX a copy of his

    proposed outline for the paper. Dr. Howe offered some suggestions about the outline.

    September 26: Mark showed Dr. Howe his 4 page revised outline and introduction. Dr.

    Howe expressed that the introduction began a little far afield from the primary contents of the

outline.

    October 4: Mark emailed his 6000 word draft. In a meeting, Dr. Howe reviewed the overall

    structure of the document (the table of contents) and made some critique of the abstract

    portfolio model; we focused our discussion on improving the figure.

    October 8: Mark emailed a revised 7000 word draft that incorporated the suggestions from

    Dr. Howe. She reviewed the document in a meeting on October 10 to determine if the work

    was close enough to be ready to set a date for the oral presentation. We set the date of October

    31, 2006.

    October 11: To obtain peer-level feedback about the document, Mark also emailed the

    draft to Monte Lunacek, Keith Bush, Arif Albarak, Crystal Redman, Andrew Sutton, Andy

    Curtis, and Scott Lunberg. He received direct feedback from Monte Lunacek, Andrew Sutton,

    and Andy Curtis. All printed copies of the reviewers’ notes are available to the committee

    upon request.

    October 24: Mark intends to show his slides to Dr. Howe to get feedback.

    October 25: Mark plans to present a mock defense to the Graduate Student Association.


References

[BF04] J.C. Beck and E.C. Freuder. Simple rules for low-knowledge algorithm selection. In Jean-Charles Régin and Michel Rueher, editors, Proc. of the First Conference on the Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR 2004), Lecture Notes in Computer Science 3011, pages 50–64, Nice, France, April 20-22 2004. Springer.

[Car05] T. Carchrae and J.C. Beck. Applying machine learning to low knowledge control of optimization algorithms. Computational Intelligence, 21(4):373–387, 2005.

[CHM97] David Maxwell Chickering, David Heckerman, and Christopher Meek. A Bayesian approach to learning Bayesian networks with local structure. In Proc. of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 80–89, Providence, RI, August 1997. Morgan Kaufmann.

[CKT91] P. Cheeseman, B. Kanefsky, and W.M. Taylor. Where the really hard problems are. In John Mylopoulos and Raymond Reiter, editors, Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), pages 331–337, Sydney, Australia, August 24-30 1991.

[Fin98] Eugene Fink. How to solve it automatically: Selection among problem-solving methods. In Reid G. Simmons, Manuela M. Veloso, and Stephen Smith, editors, Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems (AIPS-98), pages 128–136, Pittsburgh, Pennsylvania, 1998. AAAI.

[Fin03] Eugene Fink. Changes of Problem Representation: Theory and Experiments. Studies in Fuzziness and Soft Computing. Springer-Verlag, 2003.

[Fin04] Eugene Fink. Automatic evaluation and selection of problem-solving methods: Theory and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 16(2):73–105, 2004.

[GGHM04] Cormac Gebruers, Alessio Guerri, Brahim Hnich, and Michela Milano. Making choices using structure at the instance level within a case based reasoning framework. In Jean-Charles Régin and Michel Rueher, editors, CPAIOR, volume 3011 of Lecture Notes in Computer Science, pages 380–386. Springer, 2004.

[GH02] H. Guo and W. Hsu. A survey of algorithms for real-time Bayesian network inference. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), Edmonton, Alberta, Canada, July 28 - August 1 2002. AAAI Press.

[GH04] Haipeng Guo and William H. Hsu. A learning-based algorithm selection meta-reasoner for the real-time MPE problem. In Proc. of the 17th Australian Conference on Artificial Intelligence (AUS-AI 2004), pages 307–318, Cairns, Australia, December 4-6 2004.

[GHBF05] Cormac Gebruers, Brahim Hnich, Derek G. Bridge, and Eugene C. Freuder. Using CBR to select solution strategies in constraint programming. In Héctor Muñoz-Avila and Francesco Ricci, editors, Proc. of the Sixth Conference on Case-Based Reasoning (ICCBR-05), volume 3620 of Lecture Notes in Computer Science, pages 222–236, Chicago, IL, August 23-26 2005. Springer.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[GL00] Fred Glover and Manuel Laguna. Fundamentals of scatter search and path relinking. Control and Cybernetics, 29:653–684, 2000.

[Glo89] F. Glover. Tabu search – Part I. ORSA Journal on Computing, 1(3):190–206, 1989.

[GM04] A. Guerri and M. Milano. Learning techniques for automatic algorithm portfolio selection. In Proc. of the 16th Biennial European Conference on Artificial Intelligence (ECAI 2004), pages 475–479, Valencia, Spain, August 2004.

[GS97] C.P. Gomes and B. Selman. Algorithm portfolio design: Theory vs. practice. In Dan Geiger and Prakash P. Shenoy, editors, Proc. of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 190–197, Providence, RI, August 1-3 1997. Morgan Kaufmann.

[GS01] Carla Gomes and Bart Selman. Algorithm portfolios. Artificial Intelligence Journal, 126:43–62, 2001.

[GSC97] Carla P. Gomes, Bart Selman, and Nuno Crato. Heavy-tailed distributions in combinatorial search. In Gert Smolka, editor, Proc. of the Third International Conference on the Principles and Practice of Constraint Programming (CP97), Lecture Notes in Computer Science 1330, pages 121–135, Linz, Austria, October 29 - November 1 1997. Springer.

[Guo03] H. Guo. Algorithm Selection for Sorting and Probabilistic Inference: A Machine Learning-Based Approach. PhD thesis, Department of Computing and Information Sciences, Kansas State University, August 2003.

[HDH+99] Adele Howe, Eric Dahlman, C. Hansen, A. von Mayrhauser, and M. Scheetz. Exploiting competitive planner performance. In Proceedings of the Fifth European Conference on Planning (ECP-99), Durham, England, September 1999.

[HHHLB06] Frank Hutter, Youssef Hamadi, Holger H. Hoos, and Kevin Leyton-Brown. Performance prediction and automated tuning of randomized and parametric algorithms. In Principles and Practice of Constraint Programming (CP-06), to appear, 2006.

[HLH97] B. A. Huberman, R. Lukose, and T. Hogg. An economics approach to hard computational problems. Science, 275(5296):51–54, January 3 1997.

[Hoo95] J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33–42, 1995.

[HRG+01] Eric Horvitz, Yongshao Ruan, Carla P. Gomes, Henry Kautz, Bart Selman, and David Maxwell Chickering. A Bayesian approach to tackling hard computational problems. In Proc. of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 235–244, Seattle, WA, August 2001. Morgan Kaufmann Publishers.

[HS04] Holger Hoos and Thomas Stuetzle. Stochastic Local Search. Morgan Kaufmann, San Francisco, CA, 2004.

[KGV83] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[KHR+02] Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes, and Bart Selman. Dynamic restart policies. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta, July 2002. AAAI Press.

[LBNA+03] Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham. Boosting as a metaphor for algorithm design. In Francesca Rossi, editor, Proc. of the 9th International Conference on the Principles and Practice of Constraint Programming (CP 2003), Lecture Notes in Computer Science 2833, Kinsale, Ireland, September 29 - October 3 2003. Springer.

[LBNS02] Kevin Leyton-Brown, Eugene Nudelman, and Yoav Shoham. Learning the empirical hardness of optimization problems: the case of combinatorial auctions. In Pascal Van Hentenryck, editor, Proc. of the 8th International Conference on the Principles and Practice of Constraint Programming (CP 2002), volume 2470 of Lecture Notes in Computer Science, Ithaca, NY, September 9-13 2002. Springer.

[LSZ93] M. Luby, A. Sinclair, and D. Zuckerman. Optimal speedup of Las Vegas algorithms. Information Processing Letters, pages 173–180, 1993.

[Mit97] Tom Mitchell. Machine Learning. McGraw-Hill, 1997.

[NLBH+04] Eugene Nudelman, Kevin Leyton-Brown, Holger Hoos, Alex Devkar, and Yoav Shoham. Understanding random SAT: Beyond the clauses-to-variables ratio. In Mark Wallace, editor, Proc. of the 10th International Conference on the Principles and Practice of Constraint Programming (CP 2004), Lecture Notes in Computer Science 3258, Toronto, Canada, September 27 - October 1 2004. Springer.

[Ric76] J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.

[SGS00] J. Singer, I.P. Gent, and A. Smaill. Backbone fragility and the local search cost peak. Journal of Artificial Intelligence Research, 12:235–270, 2000.

[SK75] H.A. Simon and J.B. Kadane. Optimal problem-solving search: All-or-none solutions. Artificial Intelligence, 6:235–247, 1975.

[TVBV04] G. Tsoumakas, D. Vrakas, N. Bassiliades, and I. Vlahavas. Using the k nearest problems for adaptive multicriteria planning. In George A. Vouros and Themis Panayiotopoulos, editors, Proc. of the Methods and Applications of Artificial Intelligence in the Third Hellenic Conference on AI (SETN 2004), Lecture Notes in Computer Science 3025, Samos, Greece, May 5-8 2004. Springer.

[VTBV03] D. Vrakas, G. Tsoumakas, N. Bassiliades, and I. Vlahavas. Learning rules for adaptive planning. In Proceedings of the 13th International Conference on Automated Planning and Scheduling (ICAPS03), pages 82–91, Trento, Italy, June 9-13 2003.

[VTBV05] D. Vrakas, G. Tsoumakas, N. Bassiliades, and I. Vlahavas. Intelligent Techniques for Planning, chapter Machine Learning for Adaptive Planning, pages 90–120. Idea Group Publishing, 2005.

[WBWH02] J.P. Watson, L. Barbulescu, L.D. Whitley, and A.E. Howe. Contrasting structured and random permutation flow-shop scheduling problems: Search space topology and algorithm performance. INFORMS Journal on Computing, 14(2):98–123, 2002.

[WF05] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.

[WM95] David H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical Report SFI-TR-02-010, Santa Fe Institute, Santa Fe, NM, July 1995.
