
Harnessing Algorithm Bias: A Study of Selection Strategies and Evaluation for Portfolios of Algorithms

    Mark Roberts

    October 16, 2006

Search algorithms are biased because many factors related to the algorithm design – such as representation, decision points, search control, memory usage, heuristic guidance, and stopping criteria – or the problem instance characteristics impact how a search algorithm performs. The variety of algorithms and their parameterized variants make it difficult to select the most efficient algorithm for a given problem instance. It seems natural to apply learning to the algorithm selection problem of allocating computational resources among a portfolio of algorithms that may have complementing (or competing) search technologies. Such selection is called the portfolio strategy.

This research exam studies the state of the art in portfolio strategy by examining five recent papers, listed below, from the AI community. The specific focus of the exam is to identify the key issues in the mechanism and evaluation of the portfolio strategy. But the discussion of these papers will also include a summary of each paper’s primary findings, its more specific context within its community, the implications for the specific community where the paper is published, and the implications for the larger AI community.

This set of papers is representative, but not exhaustive, of recent work in portfolios. All selected portfolios model the runtime behavior of various algorithms for combinatorial problems. The papers also represent the common learning methods applied in learning a portfolio strategy: case-based, Bayesian, statistical regression, and induction.

• Gebruers, C., Hnich, B., Bridge, D., and Freuder, E., 2005. Using CBR to Select Solution Strategies in Constraint Programming. In Proc. of the Sixth International Conference on Case Based Reasoning (ICCBR-05), Chicago, IL, August 23-26, 2005. Lecture Notes in Computer Science 3620, Springer, 2005, pp. 222-236.

• Carchrae, T., and Beck, J.C., 2005. Applying Machine Learning to Low Knowledge Control of Optimization Algorithms. Computational Intelligence, 21(4), pp. 372-387.

• Guo, H., and Hsu, W., 2004. A Learning-based Algorithm Selection Meta-reasoner for the Real-time MPE Problem. In Proc. of the 17th Australian Joint Conference on Artificial Intelligence (AUSAI 2004), Cairns, Australia, December 4-6, 2004. Lecture Notes in Computer Science 3339, Springer, 2004, pp. 307-318.

• Vrakas, D., Tsoumakas, G., Bassiliades, N., and Vlahavas, I., 2003. Learning Rules for Adaptive Planning. In Proc. of the 13th International Conference on Automated Planning and Scheduling, Trento, Italy, June 9-13, 2003, pp. 82-91.

• Horvitz, E., Ruan, Y., Gomes, C., Kautz, H., Selman, B., and Chickering, M., 2001. A Bayesian Approach to Tackling Hard Computational Problems. In Proc. of the 17th Conference on Uncertainty in AI, University of Washington, Seattle, Washington, USA, August 2-5, 2001. Morgan Kaufmann, 2001, pp. 235-244.

Contents

1 Introduction: Harnessing Search Algorithm Bias
  1.1 Diversifying Search Bias: A Portfolio Approach
2 An Abstract Portfolio Model
  2.1 Features
  2.2 Algorithms
  2.3 Selection
3 Selection Mechanisms: A Detailed Look
  3.1 Performance Models
  3.2 Parallel Portfolios
  3.3 Switching Portfolios
  3.4 Predictive Portfolios
    3.4.1 Static Predictive Portfolios
    3.4.2 Dynamic Predictive Portfolios
4 Evaluating Portfolio Performance: A Detailed Look
  4.1 Predictive Accuracy
  4.2 Runtime
  4.3 Solution Quality
  4.4 Hybrid Metrics
5 Crosscutting Themes
6 Closing: Beyond Bias
7 Acknowledgments


Bias and prejudice are attitudes to be kept in hand, not attitudes to be avoided.

    Charles Curtis

    1 Introduction: Harnessing Search Algorithm Bias

    Search algorithms are biased. Each algorithm has a specific way of “looking” at a problem

    that shapes how it can (or cannot) find solutions. Further, each problem instance has unique

    features that distinguish it from other instances. Both the algorithm design and the specific

    problem instance impact search algorithm performance. Most realistic problems – practical

    problems for which there are measurable societal or cost benefits to solving – are difficult

    because of either the size of the search space or the inability to find an efficient algorithm.

    The variety of algorithms and their parameterized variants make it difficult to select the most

    efficient algorithm for a given problem instance.

Bias can be added by incorporating problem-specific knowledge at the decision points of the search; the choice of representation, the transition operators, selecting which branch to explore next, allowing stochasticity, choosing a restart policy, choosing variable and/or value bindings, and selecting the appropriate heuristic or heuristic parameters are just a few examples. Other forms of dynamic knowledge can be incorporated into search as it runs. Metaheuristic methods allow search to learn between successive restarts of local search methods.

    For example, some methods learn the critical components of all solutions (as in the so-called

    backbone [SGS00]). Other examples include keeping an elitist list of the best solutions to

    encourage localized restarts (as in path relinking [GL00]), storing a list of recent moves to

    avoid redundant cycling (as in Tabu Search [Glo89]), or making randomized moves from the

    current position with decreasing probability (as in Simulated Annealing [KGV83]).

    Bias can also be added in how the search algorithm makes its decisions. Some decisions

    can be completely arbitrary with respect to problem information. A classic example of such

    non-information is the use of stochasticity as in Las Vegas and Monte Carlo algorithms. More

    recently, Gomes, Selman and Crato [GSC97] proved that stochastic algorithms can remove

    infinite moments in the runtime distributions of search on combinatorial problems.

    The bewildering array of algorithm design choices can be daunting to the uninitiated –

    and the above listing just touched on the most surface level design decisions. A change to a

    single decision can have dramatic impact on the search cost. Additionally, it is unclear how to

    understand potential interactions and dependencies between the varieties of knowledge. Each

    method adds a particular kind of bias to the search, and managing that bias to create a more

    efficient search remains a central question for researchers in Artificial Intelligence.

    1.1 Diversifying Search Bias: A Portfolio Approach

    Portfolios are a well-known way to hedge against significant performance differences among

    a set of choices given a finite resource such as time. Adding problem specific information to

    search introduces bias. Human practitioners frequently manage such bias by acquiring a kind

    of black art for applying AI algorithms to new problems. Human acquisition of this art is

    costly (often taking years of study) and often lacks scientific rigor because the knowledge is

    primarily learned by trial-and-error rather than guided by scientific discourse and deep insight.


It seems natural to apply learning to the algorithm selection problem of allocating computa-

    tional resources among a portfolio of algorithms that may have complementing (or competing)

    search technologies. Such selection is called the portfolio strategy. Algorithm portfolios can be

    classified as fitting one of three types. Parallel portfolios orchestrate two or more algorithms

on a set of parallel processors. Switching portfolios interleave a suite of algorithms on a single, serial processor. Predictive portfolios select the single most promising algorithm, which then runs to completion.

    Rice [Ric76] formulated what he called the “Algorithm Selection Problem.” He examines

    the key issues for learning a function approximation of algorithm performance in order to

select the best algorithm for a new problem input. This early work was a foundation for the abstract portfolio model we present in Section 2. Rice’s original formulation remained somewhat obscure for the next 20 years.

    Algorithm portfolios (re)surfaced in 1997 with the examination of restart strategies for

    stochastic search algorithms by Gomes and Selman [GS97] and Huberman, Lukose, and Hogg

    [HLH97] and also with the combination of complex algorithms for automated planning by

    Howe et al. [HDH+99]. Since that time, the algorithm portfolio approach has received wider

    attention across the sub-disciplines of Artificial Intelligence. The specific focus of this exam is

    to answer two questions as they relate to these more recent portfolios:

1. Given a set of algorithms and performance models of those algorithms, what selection mechanism will effectively choose between them?

    2. Given a portfolio, how does one evaluate its performance?

    To help frame the context and focus of the report, we present an abstract model of portfolio

    design that encompasses the literature in this study. In describing the model we point out

    how the features and algorithms are combined to make performance models. We then break

    into a discussion of how each portfolio addresses the two questions. Interspersed throughout

    the paper are related critical remarks and open questions; these remarks will be highlighted

    by slanted text. Before closing, we present some of the cross-cutting themes of research in

    portfolio selection and evaluation.

    2 An Abstract Portfolio Model

    Portfolio construction can be divided into three activities: First, a set of reliable features are

    selected for the performance models. Next, some form of (ad hoc or principled) model selection

    refines and combines the models for use by the portfolio. Last, the portfolio performs algorithm

    selection upon presentation of a new problem. An important auxiliary step is evaluating the

    output of the model, though it should be clear that evaluation occurs at each stage of the

    portfolio construction.

    There are important considerations to make in each of the three activities. To aid in

    understanding how these activities interrelate, we provide an abstract portfolio in Figure 1.

    Formally, we define a portfolio strategy as an ordered tuple 〈F ,A,S〉, where F is a set of

    features the portfolio uses to model algorithm performance (Figure 1 in purple), A is the set

of algorithms (Figure 1 in blue), and S is the selection mechanism (Figure 1 in red).


Figure 1: A diagram of an abstract portfolio strategy. The problem instances (in green) inform the features and performance models that are used. Features (in purple) may be subjected to feature selection before their use in the performance models. The algorithms (in blue) may provide some features and are used in the final stage of the selection mechanism. Finally, the selection mechanism is composed of the performance models, the model selection, and the algorithm selection. The selection mechanism works with the specific portfolio architecture (shown on the right).

The features and algorithms are combined into performance models that are used by the selection

    mechanism, which consists of those performance models, the model selection, and the algorithm

    selection. This model combines and extends the two existing models in the literature of Guo

    and Hsu [GH04] and Rice [Ric76]. In the following subsections, we briefly discuss the features

    and algorithms as well as highlight some important considerations as they relate to portfolio

    construction. Because it is a primary focus of the exam, we more thoroughly discuss the

    selection mechanisms in Section 3.

    2.1 Features

    The features in a portfolio are employed by the selection mechanism to predict algorithm

performance. Each feature f ∈ F is specified as an ordered tuple f = 〈fc, fd, ft〉, where fc denotes the feature extraction cost as problem size grows, fd indicates whether the feature is

    a static or dynamic feature, and ft indicates what the feature measures (problem distribution,

    problem instance structure, algorithm progress, or the heuristic in use).
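
To make the tuple concrete, a minimal Python sketch of how a feature record could be encoded is shown below; the field names and the example feature are ours and are not drawn from any of the surveyed portfolios.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Feature:
        name: str       # identifier for the feature (illustrative only)
        cost: str       # f_c: extraction cost class ("constant", "linear", "polynomial", "np-probe")
        dynamic: bool   # f_d: True for a dynamic (runtime) feature, False for a static one
        measures: str   # f_t: "distribution", "instance", "algorithm", or "heuristic"

    # A simple static instance feature, roughly f = <constant, static, instance>.
    num_goals = Feature("num_goals", cost="constant", dynamic=False, measures="instance")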

    We can broadly classify feature cost according to classical complexity analysis as problem

    size increases. Static features are easily extracted from the problem definition. Examples of

    simple static features include the number of objects in the problem or the number of goals

    in the problem; such features are usually constant or linear. Other features might extract

information from the structural relationships in the problem instance; such features are usually

    polynomial. A structural feature in SAT would be the average degree of the connectedness of

    clauses and variables as used by Nudelman et al. [NLBH+04].

    A more detailed discussion of all 128 features across all portfolios is outside the scope of

    this paper as it would necessitate a discussion of the problem domains (some features can

only make sense in the context of the domain). Table 1 shows the aggregate counts of feature complexity by portfolio, and Table 2 shows a summary of the primary feature types used across the portfolios.

Table 1: A summary of the feature complexity (fc) for the 128 features of the portfolios represented. Per-portfolio counts: Carchrae and Beck (2005): 1; Gebruers et al. (2005): 12; Guerri and Milano (2004): 4; Guo and Hsu (2004): 2 + 4 = 6; Horvitz et al. (2001): 43*; Kautz et al. (2002): 2; Leyton-Brown et al. (2003): 25*; Vrakas et al. (2004): 22 + 13 = 35. Column totals: 15 constant, 28 linear, 85 polynomial, no NP-probe features, for 128 in total. The values with an asterisk indicate worst-case bounds for all features because the authors were vague in their descriptions. The portfolios by Gomes and Selman [GS01] and Fink [Fin04] did not use features for their runtime models.

Table 2: A summary of the feature types (ft: distribution, instance, algorithm, or heuristic) for the 128 features of the portfolios represented. Per-portfolio counts: Carchrae and Beck (2005): 1; Gebruers et al. (2005): 12; Guerri and Milano (2004): 4; Guo and Hsu (2004): 6; Horvitz et al. (2001): 43; Kautz et al. (2002): 1 + 1 = 2; Leyton-Brown et al. (2003): 22 + 3 = 25; Vrakas et al. (2004): 22 + 4 + 9 = 35; 128 in total. The per-type breakdown for Horvitz et al. (2001) is an estimate because the authors provided vague descriptions. The portfolios by Gomes and Selman [GS01] and Fink [Fin04] did not use features for their runtime models.

The final features used in a portfolio are usually a subset of the original

    features. The authors often begin with a wide selection of features and then refine the list

    as the portfolio models mature. For this reason, many features of the final models are of the

    flavor f = 〈fc = polynomial, fd = static, ft = structural〉.

    A key question for feature extraction is how to balance model accuracy with feature ex-

    traction cost. Clearly the model accuracy depends upon the kinds of features used to build the

    model. As stated above, most authors select efficient (low-cost) features to build their models.

    It is a common assumption that low-cost features are sufficient to build accurate models. For

    this reason, there is a bias against using NP-probe features as evidenced in Table 1.

    A promising future direction is to use more sophisticated probing features. Nudelman et

    al. [NLBH+04] and Hutter et al. [HHHLB06] both present some features that are extracted

    from algorithms that attempt to reduce the problem to its combinatorial core. Sometimes

    these simplifying steps actually solve the problem. More sophisticated features may cost more

    to extract, so they really should only be used when necessary. Leyton-Brown et al. [LBNA+03]

    propose “smart feature computation” that partitions the features according to their extraction

    cost. Models are constructed for each feature partition, and more costly features are avoided

    until the portfolio needs more accurate predictive models.
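
A rough sketch of how such cost-partitioned feature computation could be organized is given below; the confidence-threshold stopping rule is our reading of the idea, not the exact mechanism of Leyton-Brown et al.

    def predict_with_cost_partitions(instance, stages, confidence_threshold=0.9):
        """stages: list of (extract, model) pairs ordered by increasing feature cost, where
        extract(instance) returns a dict of features and model(features) returns
        (prediction, confidence)."""
        features, prediction = {}, None
        for extract, model in stages:
            features.update(extract(instance))          # pay only for this partition
            prediction, confidence = model(features)
            if confidence >= confidence_threshold:
                break                                   # cheap features were accurate enough
        return prediction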

    Another key issue is the degree to which features should be problem specific in order to

    obtain accurate models. Some features in the literature are so problem specific as to preclude

    their use in any other problem. Such features might advance specific knowledge of the given

    problem but fail to provide clues guiding better insight into producing more predictive models

for other problems. Problem-specific features also imply that a domain expert has recommended suitable features. So the problem of building good algorithms has turned into the problem of building good algorithm models; domain expertise is still needed but has just transferred to a

    different application. Carchrae and Beck [Car05] propose using “low-knowledge” features for

    their models in an effort to eliminate the transfer of expertise from algorithm design to algo-

rithm modeling. Their features are designed to be problem-independent, and they only measure an algorithm’s progress relative to other algorithms.

    2.2 Algorithms

    Each algorithm a ∈ A is also defined as a tuple, a = 〈as, ac, at, an〉, where as specifies the

    algorithm setup (static or parameterized), ac denotes whether the algorithm is complete (exact)

    or incomplete (approximate), at indicates the primary search type (sampling, local search,

    search, stochastic, etc.), and an denotes the algorithm’s common name (Gibbs, ant colony,

    Tabu, etc.). If the algorithm is parameterized, then a number after as indicates the number of

    possible parameter combinations; for example parameterized(3) means there are three variants

    of the same algorithm.
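
As with features, the algorithm tuple can be pictured as a small record; a hypothetical Python sketch follows, with field names of our own choosing.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Algorithm:
        name: str                    # a_n: common name, e.g. "Tabu Search"
        complete: bool               # a_c: True for exact (E), False for approximate (A)
        search_type: str             # a_t: "sampling", "search", "local search", "stochastic", "hybrid"
        parameter_variants: int = 1  # a_s: 1 means static; k > 1 means parameterized(k)

    tabu = Algorithm("Tabu Search", complete=False, search_type="local search")
    wa_star = Algorithm("Weighted A*", complete=False, search_type="search", parameter_variants=864)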

    Table 3 summarizes the kinds of algorithms used for each portfolio. Many of the algorithms

    are designed for specific problems, but there are some common themes we can draw from the

    list. One design is to pair a complete algorithm with an approximate algorithm. This choice

    makes sense if some instances are easily solved but other instances are intractable. A perfect

    example of this is the clause-to-variable ratio in SAT: some instances will be quickly found

Portfolio                    as          ac  at      an
Carchrae and Beck (2005)     static      E   LS      Tabu Search
                             static      E   search  Constructive using texture
                             static      E   search  Limited Discrepancy Search
Fink (2005)                  param(3)    A   search  Partial Order with Backward Chaining
Gebruers et al. (2005)       param(12)   A   search  Constraint Satisfaction Problem (CSP)
Gomes and Selman (2001)      static      E   SS      randomized backtracking CSP
                             static      E   SS      randomized B&B MIP
Guerri and Milano (2004)     static      E   search  IP-based Branch and Bound
                             static      A   search  IP-based with Linear Relaxation
                             static      A   hybrid  IP-based with Linear Relaxation plus CSP
Guo and Hsu (2004)           static      E   search  Clique Tree Propagation
                             static      A   sampl   Stochastic Sampling: Gibbs
                             static      A   sampl   Stochastic Sampling: Forward Sampling
                             static      A   hybrid  Ant Colony Optimization
                             static      A   LS      Hill-climbing with Restarts
                             static      A   LS      Tabu Search
Horvitz et al. (2001)        static      A   SS      Stochastic SATz with Noise
                             static      A   SS      Stochastic CSP with Alldiff constraints
Kautz et al. (2002)          static      A   SS      Stochastic CSP
                             static      A   SS      SATz-Rand
Leyton-Brown et al. (2003)   static      E   search  CPLEX by ILOG
                             static      E   search  Branch and Bound
                             static      E   search  Branch and Bound with a non-LP heuristic
Vrakas et al. (2004)         param(864)  A   search  Weighted A*

Table 3: The algorithm types used in each portfolio (left column) from the central papers. Column as denotes the algorithm setup as static or parameterized (param). Column ac denotes the algorithm completeness as exact (E) or approximate (A). Column at denotes the algorithm search technique as sampling (sampl), search, hybrid, local search (LS), stochastic search (SS), or stochastic local search (SLS). Column an denotes the common name of the algorithm.

    to have a contradiction or solution using backtracking search while problem instances near

    the phase transition of this parameter could take very long to solve and might be best solved

    by stochastic or approximate methods. Another design pairs algorithms of a similar type

    (stochastic or exact). A third design uses algorithms with markedly different search techniques

    (for example, Carchrae and Beck [Car05]).

    Leyton-Brown et al. [LBNA+03] use the metaphor of “Boosting” (from ensemble learning)

    to elucidate some key issues in selecting algorithms and designing new algorithms. The authors

    present some general techniques to perform per-instance algorithm selection and discuss how to

    evaluate their new approach. A main contribution of the paper is positing that algorithm design

    should produce algorithms that focus on solving those problems which are most difficult for

    the current set of algorithms. This is similar to the boosting paradigm, where each successive

    learner receives a training set increasingly biased toward including misclassified instances. New

    algorithms for a portfolio should focus on the most difficult problem instances.

The kinds of algorithms available to the portfolio correlate with the problem and the

    purpose of the portfolio. Some problems have a long research lineage and may have many

    algorithms. There may even be favored algorithms for solving particular instance families. In

Authors                     Portfolio type       Model                   Selection
Gomes and Selman (2001)     Parallel             Agg: Statistical
Guo and Hsu (2004)          Predictive Static    Hyb: Decision Tree
Vrakas et al. (2004)        Predictive Static    Agg: Rule-based system
Gebruers et al. (2005)      Predictive Static    Agg: CBR
Vrakas et al. (2004)        Predictive Static    Agg: kNN
Guerri and Milano (2004)    Predictive Static    Agg: Decision Tree
Carchrae and Beck (2005)    Predictive Dynamic   Agg: Bayesian
Fink (2005)                 Predictive Dynamic   Ind: Statistical        Best gain
Gomes and Selman (2001)     Switching            Agg: Statistical        Balance mean/variance
Horvitz et al. (2001)       Switching            Ind: Bayesian           Weighted accuracy/probability
Kautz et al. (2002)         Switching            Ind: Bayesian           Expected runtime
Carchrae and Beck (2005)    Switching            Ind: Cost Improvement   Reinforcement Learning

Table 4: Selection mechanisms with their corresponding models.

    such a case, it might be valuable to add only the algorithms that resolve a particular family

    of instances that no other algorithm can solve. On the opposite end of the spectrum, a new

    problem may only have a few algorithms and there may be scant conventional wisdom about

    solving the problem. In this case, the portfolio may highlight the difficult instances and drive

    further insight into algorithm design. The portfolio then becomes a vehicle for gaining insight

    into the dependencies between algorithms and the problems.

    2.3 Selection

    The selection mechanism is the centerpiece of the portfolio. It combines the features and

    algorithms to decide on an appropriate allocation of resources for the portfolio. We devote the

    entire next section to our discussion of Selection.

    3 Selection Mechanisms: A Detailed Look

    Selection mechanisms are situated to provide the kind of decision needed according to the

    portfolio architecture. There is a single selection mechanism in each portfolio that is composed

    of a set of performance models, a model selection process, and an algorithm selection process.

    In a parallel portfolio, the selection mechanism “recommends” a distribution of algorithms

    across multiple processors. In a switching portfolio, the selection controls a set of serially

    running algorithms by dynamically (re)allocating runtime as more information arrives. In a

    predictive portfolio, the selection mechanism simply predicts the best algorithm that will run

    to completion. The key distinction between predictive and switching portfolios is whether a

    single prediction is made or whether multiple iterations of the decision are made. Predictive

    selection mechanisms can be further split by the inclusion of runtime features (a dynamic

    model) or only structural features (a static model). Table 4 shows the variety of available

    selection mechanisms along with the papers that use them. Some authors compare multiple

    selection mechanisms in their papers, so they are represented more than once in the table.


3.1 Performance Models

    Though it is true that ultimately the algorithms themselves are “selected,” the selection mech-

    anism must use some information to actually choose the algorithms. Thus it is more accurate

to say that algorithm selection is performed on a set of performance models that sit be-

    tween the algorithms and the selection decision. These models relate algorithm performance

    to specific problems or problem distributions and are typically built from one or more features

    delineating algorithm performance. As mentioned above, sometimes features are segmented

    by extraction cost to create increasingly more accurate, but also more costly, models. Other

    times, the models use the lowest cost features that yield reasonably accurate performance.

    Model selection is an important design decision in portfolio construction. Figure 1 shows

    the two kinds of model selection commonly used in the literature. Initial models in the litera-

    ture used explicit models of each algorithm; we denote these as M(A1)..M(An); individualized

    models are used by the portfolios with “Ind:” in the Model column of Table 4. The selection

mechanism needs to choose the algorithm with the most probable success or earliest runtime

    as predicted by the models. A common method is to rank the algorithms according to the

    prediction rate of their performance model and probabilistically select the best ranking. For

    example, Gebruers et al. [GHBF05, GGHM04] use the prediction rate and execution time. Sim-

    ilarly, Carchrae and Beck [Car05] weight the algorithms based on their previous performance.

    A related method explicitly models the random variable associated with the runtime of the

    algorithm. These statistical models are used by Gomes and Selman [GS01] and Fink [Fin04].

Model selection for these statistical models is still done using ranking or a weighted random procedure.
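
A minimal sketch of this kind of model selection over individual models is shown below; the prediction-rate scores and the greedy/weighted-random split are illustrative assumptions rather than the procedure of any one paper.

    import random

    def select_algorithm(scores, greedy=False, rng=random):
        """scores: dict mapping algorithm name -> predicted success rate from its model M(A_i)."""
        if greedy:                                    # pure ranking: take the top-scoring model
            return max(scores, key=scores.get)
        names = list(scores)
        total = sum(scores.values())
        weights = [scores[n] / total for n in names]  # weighted-random alternative
        return rng.choices(names, weights=weights, k=1)[0]

    print(select_algorithm({"A1": 0.80, "A2": 0.15, "A3": 0.05}))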

    An alternative model selection uses an aggregate predictive model, denoted M(A1..An).

    The aggregate model is a single model that selects the best algorithm based on past learning.

    The selection mechanism for an aggregate model selection focuses on distinguishing between

    algorithms by implicitly modeling the differences in performance. For example, Guo and

    Hsu [GH04] use a decision tree to choose among multiple approximate algorithms based on a

    set of features. Aggregate model selection is used by the portfolios denoted with “Agg:” in

    the Model column of Table 4.

    The individualized models predict various values depending on their use by the selection

    mechanism. Some models perform regression on runtime as a continuous value, some classify

    time as a discretely binned value, and others broadly characterize the predicted runtime as

    binary (short or long). At an even higher level of abstraction, some models simply predict

    whether the algorithm will succeed or fail. Portfolios are not bound to a single modeling type.

    One portfolio by Guo and Hsu [GH04, GH02] uses both types of modeling (aggregate and

    individual). The portfolio first uses an individual model to predict whether the exact algorithm

    will succeed. If the answer is “no” then an aggregate model selects the best approximate model

    based on problem instance features.

    Two important considerations for building performance models are minimizing model cost

as well as minimizing model complexity. Each model takes time to extract features, collect runtime data, learn from the data, and predict new values. For example, dynamic features are more costly than structural features because algorithms must run to extract the features for both

    learning and prediction. Throughout the literature, model costs remain somewhat obscured


by the metrics and discussions in these papers. Carchrae and Beck [Car05] produce effective

    control based on low-knowledge features, but they exclude a discussion of model construction

    time. Fink [Fin04] clearly identifies the model cost for his portfolio, as does Leyton-Brown

et al. [LBNA+03]. Most authors are biased toward efficient features or toward incorporating a feature cost analysis that favors cheaper features. It doesn’t make sense to spend the effort learning if the learning time would be better spent on searching. But the trade-off is not

    always clear.

    A related problem to model cost is model complexity. If the performance of two mod-

    els is equivalent then the simpler model should be preferred over the complex; this is fre-

    quently referred to in the scientific community as Occam’s Razor. The portfolio by Horvitz et

    al. [HRG+01] incorporates model accuracy in the selection process – as the accuracy increases,

    so does the confidence in the prediction. Leyton-Brown et al. [LBNA+03] partition features

    to bias model construction toward lower complexity. Carchrae and Beck [Car05] examine low-

    cost models of runtime selection under the conjecture that high-cost models actually impair

    technology transfer. But they only evaluate this conjecture by building low-cost models. A

more complete analysis needs to examine when it is appropriate to prefer more complex models that outperform the low-cost ones.

Though we are mostly focused in this paper on portfolios that learn per-instance performance models, learning problem distributions is also an important part of some portfolios.

    It is fairly well known that problem instances within a problem class can be distinct. A clas-

    sic example of differences in problem instances is the phase transition in SAT. Cheeseman,

    Kanefsky, and Taylor [CKT91] show a phase transition in solvability as the ratio of clauses to

variables is varied. Another example is the job-correlated and machine-correlated instances of the flow shop problem studied by Watson et al. [WBWH02]. In such cases, modeling each dis-

    tinct distribution might lead to more accurate predictions than a single model could produce.

    Distribution modeling is a useful approach if some prior information about the problem distri-

    butions is already known. This issue is addressed by Horvitz et al. [HRG+01], Leyton-Brown

    et al. [LBNA+03], and Gomes and Selman [GS01].

    In the next three subsections, we take a closer look at the three kinds of portfolios and

    the selection mechanisms they use. We begin by examining parallel portfolios that distribute

    algorithms on a set of parallel processors.

    3.2 Parallel Portfolios

An early work on portfolios by Gomes and Selman [GS97, GS01] examined a portfolio that allocates two algorithms across 2, 5, 10, and 20 processors. The two algorithms are chosen

    because one dominates early in search and the other dominates later in search. They find that

    the best portfolio mix is highly dependent on the runtime distributions of the algorithms on

    specific problems. A key contribution of this portfolio is to characterize algorithm runtimes

in terms of their mean and variance (also called risk) and to show the trade-off associated with

    these two metrics. Huberman, Lukose, and Hogg [HLH97] explored this trade-off as well.

    Parallel portfolios have received less recent attention since they were introduced in the

    literature. This type of portfolio architecture will become increasingly important as computer

    architectures become more parallel and as more work extends our knowledge of stochastic algo-


rithms. A recent book by Hoos and Stutzle [HS04] brings together many aspects of stochastic local search algorithms, but little work has applied these advances to portfolio

    learning. For now, the parallel portfolio design remains an open question that needs more

    exploration.

    3.3 Switching Portfolios

    A switching portfolio uses predictive modeling to allocate resources (computation time) to two

    or more algorithms over a series of iterations; portfolios that perform restarts are included in

    this type. One view is that the algorithms are competing with one another for time. The

    iterative nature of the portfolio adds another layer of learning to the selection process that

    must balance exploitation of known information and exploration of new states. The competitive

    paradigm of a switching portfolio requires that algorithms can be compared on equal footing.

    An early model for switching portfolios is given in Howe et al. [HDH+99]. This portfolio uses

    individual algorithm models based on linear regression to predict the runtime for a problem.

The algorithms are then ranked according to the ratio PA(p)/lA(p), where PA(p) is the probability of algorithm A solving problem p and lA(p) is the predicted runtime of the planner solving p.

    Simon and Kadane [SK75] prove that this ranking minimizes the expected length of time of

    running a set of algorithms in series until one succeeds.
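
A small sketch of this ordering is given below; the probabilities and runtimes are made-up values, and the function simply sorts by the PA(p)/lA(p) ratio described above.

    def rank_by_success_per_time(predictions):
        """predictions: list of (name, p_success, predicted_runtime); higher ratio runs first."""
        return sorted(predictions, key=lambda t: t[1] / t[2], reverse=True)

    schedule = rank_by_success_per_time([
        ("planner_a", 0.90, 120.0),
        ("planner_b", 0.60, 15.0),    # cheap to try first even though less reliable
        ("planner_c", 0.95, 600.0),
    ])
    print([name for name, _, _ in schedule])   # order in which to run the algorithms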

    Gomes and Selman [GS97, GS01] use a selection mechanism that calculates the trade-off

    associated with the expected runtime and variance (also called risk) of running a combination

    of algorithms on a serial processor. They analytically determine the best restart strategy that

    minimizes both metrics. In contrast to the parallel portfolio mentioned in Section 3.2, this

    serial portfolio reveals the impact of context switching on a serial process; as the number of

    simultaneous algorithms running on the processor increases, the mean runtime also increases.

    This is an expected behavior that results from running a “parallel” process on a serial processor.

    However, the experiment also highlights the need for determining an architecture dependent

    strategy.

    Horvitz et al. [HRG+01] model two algorithms with individual Bayesian Network models

    as described by Chickering, Heckerman, and Meek [CHM97]. They allocate time to the algo-

    rithms based on a weighted sum of model accuracy and probability of success. The goal is to

    minimize the expected number of choice points, which correlates linearly with search cost. The

    networks are trained to predict short or long runs. Kautz et al. [KHR+02] extend the work of

    Horvitz et al. [HRG+01] by identifying a class of provably optimal dynamic restart policies. In

    particular, they examine the case of dynamic restarts for dependent runs. They also empiri-

cally evaluate their policies as compared to another optimal policy described by Luby, Sinclair, and Zuckerman [LSZ93]. We explore this optimal policy below in our discussion of evaluation

    (Section 4).

    Carchrae and Beck [Car05] present a control method that measures improvement of three

    algorithms by their cost of improvement per second (a single, low-knowledge feature). For

N iterations and a time limit per iteration t_i, the total time limit for the portfolio is T = Σ_{1≤i≤N} t_i. In each iteration, the three algorithms are randomly ordered and allocated time proportional to their weights, which are adjusted based on the progress in previous iterations: w_{i+1}(a) = α p_i(a) + (1 − α) w_i(a), where a is the algorithm, α is the learning rate, and p_i(a) is the current (normalized) value of the performance of the algorithm for iteration i. Algorithm

    weights are uniformly initialized to 1/|A|, and no weight updating is done until the second

    iteration.
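
The update can be sketched in a few lines; the learning rate, progress values, and per-iteration budget below are illustrative, not the settings used by Carchrae and Beck.

    def update_weights(weights, progress, alpha=0.5):
        """w_{i+1}(a) = alpha * p_i(a) + (1 - alpha) * w_i(a); progress values sum to 1."""
        return {a: alpha * progress[a] + (1 - alpha) * w for a, w in weights.items()}

    algorithms = ["tabu", "texture", "lds"]
    weights = {a: 1.0 / len(algorithms) for a in algorithms}   # uniform initialization 1/|A|
    progress = {"tabu": 0.7, "texture": 0.2, "lds": 0.1}       # normalized cost improvement per second
    weights = update_weights(weights, progress)
    t_i = 100.0
    allocation = {a: t_i * w for a, w in weights.items()}      # time shares for the next iteration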

    An important issue for switching portfolios is when one should stop an algorithm. As Fink

    notes, “for most artificial-intelligence search algorithms, the probability of solving a problem

    within the next second usually declines with the passage of search time” [Fin04, 10]. But

    choosing an effective stopping (or restart) strategy is challenging because it might also be the

    case that the algorithm is close to a solution but has not actually located it. Restart strategies

remain an open area of research in algorithms; we discuss this more fully in Section 4.

    The switching portfolio by Carchrae and Beck [Car05] allows the transfer of the best solution

between algorithms. Clearly, such transfer is possible only if the algorithms share a similar

    representation or if there is a well-defined mapping between differing representations. It is well

    known that algorithm progress depends on the connectedness of the search space - that is, the

    way in which the algorithm can move from one solution (or solution distribution) to another.

    Changing algorithms may mean changing representations as well as search operators; this

    change implies modifying the connectedness and solution adjacency of the search landscape.

    The literature characterizing the connectedness of the search space could help extend switching

    portfolios and is an open area for research. There may be features of each landscape that could

    “inform” the performance models. It might also be possible to characterize solution adjacency

so that a portfolio could change representations. An open question in this area is knowing when

    such a change would be beneficial.

    3.4 Predictive Portfolios

A predictive portfolio makes a one-time decision between two or more algorithms. A distinction

    can be made between models that do not use runtime features (static models) and those that

    do (dynamic models).

    3.4.1 Static Predictive Portfolios

    Case-based systems provide a straightforward way to model performance and guide selection. A

predictive portfolio by Gebruers et al. [GHBF05] selects among 12 variants of a parameterized

    depth first search algorithm. Given a new instance, the case-based model finds the closest

    matching previously learned case and predicts the performance using that case.

    Vrakas et al. [VTBV05, TVBV04] present another case-based system that selects the best

    configuration for a new problem instance based on the nearness of that instance to previously

    seen cases. The selection is based on a weighted combination of the seven closest neighbors

    (k-nearest neighbors (kNN) where k = 7). The portfolio is intended as a baseline portfolio

    for evaluation, but it can actually be considered another selection mechanism. The authors

    use batch training to populate the tables, but they do point out that the method could be

    sequentially constructed as each instance arrives.
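
A minimal sketch of this kind of distance-weighted kNN selection appears below; the distance metric and vote weighting are our assumptions, not the exact scheme of Vrakas et al.

    import math

    def knn_select(instance, cases, k=7):
        """cases: list of (feature_vector, best_configuration) pairs from training.
        Returns the configuration preferred by a distance-weighted vote of the k nearest cases."""
        nearest = sorted(cases, key=lambda c: math.dist(instance, c[0]))[:k]
        votes = {}
        for features, config in nearest:
            votes[config] = votes.get(config, 0.0) + 1.0 / (1e-9 + math.dist(instance, features))
        return max(votes, key=votes.get)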

    Two disadvantages of instance learning are the storage and retrieval of the previously

    learned knowledge as well as the issue of dimensionality. The storage size and retrieval time

are both linear in the number of training samples because every stored case must be compared to the new sample using a distance metric. For small samples the storage is not so much of an issue,

    but a large number of instances with many features can lead to an instance database that is

    overwhelming. An example of this is the system by Vrakas et al. [VTBV05] where they need to

    store values for 35 features for 864 different variants of their algorithm for every single training

    instance. Clearly representation of the instances is a major design decision.

Another issue is how well the space is covered by the instances, which relates to the so-called curse of dimensionality. Each additional feature has implications for effective retrieval.

Reducing dimensionality by feature selection or making the space more manageable by weighting

    the instances are two ways to cope with the instance space. Mitchell [Mit97, 297] discusses

    this topic in relation to instance-based learning.

    Inductive models are well-suited to generalizing discrete decisions such as whether an algo-

    rithm will succeed (a binary decision) or which algorithm might be best (an |A|-ary decision).

    Guo and Hsu stage selection as two decision trees selecting between an exact algorithm and

incomplete algorithms. First, a single decision tree learns to predict whether the exact al-

    gorithm will succeed given several polynomial features. If the answer is negative, then a second

    decision tree selects the best approximate search algorithm from five variants that operate with

    distinct search mechanisms. Guerri and Milano [GM04] construct a decision tree model that

selects between integer programming and hybrid constraint programming approaches. The

authors refine the 25 features of Leyton-Brown et al. [LBNS02] down to 4 essential features.
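
A rough sketch of such a staged selection is shown below using scikit-learn decision trees; the toy feature vectors and labels are invented for illustration and do not reflect the actual features or training data of Guo and Hsu.

    from sklearn.tree import DecisionTreeClassifier

    # Toy data: two cheap instance features per problem (values invented).
    X = [[0.2, 10], [0.9, 200], [0.4, 50], [0.8, 300]]
    y_exact_succeeds = [1, 0, 1, 0]                          # stage 1: will the exact solver finish?
    y_best_approximate = ["gibbs", "tabu", "gibbs", "ant"]   # stage 2: best approximate solver

    exact_model = DecisionTreeClassifier().fit(X, y_exact_succeeds)
    approx_model = DecisionTreeClassifier().fit(X, y_best_approximate)

    def select(features):
        if exact_model.predict([features])[0] == 1:
            return "exact"
        return approx_model.predict([features])[0]

    print(select([0.85, 250]))   # likely routed to an approximate solver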

    Rule-based systems are another inductive model similar to decision trees; in fact, many

    rule-based algorithms begin by building a decision tree then converting the tree into a set

    of rules. Vrakas et al. [VTBV05] present a rule-based system for automated planning that

    chooses the best parameters among 864 variants of a parameterized planning system. The

    rules are constructed by a rule-based system that induces classification rules by boosting the

    rule learner. The classifier predicts “good” and “bad” runs.

    Inductive models provide a way to overcome the storage limitations of case-based systems

    mentioned above. Since the models learn general performance patterns, they are a reasonable

    way to model performance. However, there are some issues with learning inductive models.

First, the most common models assume a discrete distribution over the target values; in the case

    of runtime this introduces potential model errors. Leyton-Brown et al. [LBNA+03] discuss

    classification/regression issues related to predicting runtime. They point out that the error

    metric for classification may not appropriately penalize errors. For example, most classifica-

    tion algorithms will equally penalize off-by-one-bin errors and off-by-many-bins errors. But

appropriate regression models will penalize large errors more than small ones. Even

    with this valid criticism, it seems possible to build reasonably accurate models that can be

    used as “weak learners” in a portfolio to identify the more difficult problems.

    3.4.2 Dynamic Predictive Portfolios

    Carchrae and Beck evaluate a predictive portfolio based on a Bayesian model to predict the

    best algorithm out of three. The portfolio is given a time limit (in seconds) T = |A|tp + tr,

    where tp indicates how long to run each algorithm to collect informative runtime features and

    tr indicates how long to run the selected algorithm. For example, if |A| = 3, T = 1200, and

tp = 90, then tr = 1200 − 3(90) = 930; the authors empirically identified that the best tp is 90 seconds.

    Fink [Fin04, Fin98] explores the application of statistical selection among three search

    algorithms for the PRODIGY planner. Similar to Carchrae and Beck [Car05], this portfolio

    uses a model that is trained on-line. Model selection is done by choosing the algorithm with

the highest payoff as measured by the difference between a reward and the predicted runtime (see

    Section 4.2).

    One key question in dynamic portfolios is how long one should run a predictive phase, tp,

    of the algorithms. This is a critical question because one may be faced with an overall time

    limit as in the portfolio above. The only method tried in the literature was to empirically

    evaluate the best choice. But it seems reasonable to consider that one might automatically

    learn a value for tp. For example, one might use either the statistical methods of Horvitz et

    al. or the optimal bounds provided by Luby et al. to arrive at a value for tp by assuming the

    runs are capped at a value that is some fraction of the total time T .

    4 Evaluating Portfolio Performance: A Detailed Look

    There are three ways that current portfolios are assessed: 1) using prediction accuracy of the

    aggregate model, 2) using runtime, or 3) using solution quality. Comparison is usually done to

    an optimum. In the case of prediction accuracy, the best is implied to be 100 percent accuracy.

    For solution quality and runtime, comparison can be made to an empirical best solution from

    any algorithm, or to the best given by a straightforward approach, or to a provable optimum

    given specific assumptions.

    Some authors create hybrid metrics by combining runtime and solution quality. There can

    be ambiguity if time is critical in one domain but solution quality is imperative in another, and

    these two metrics may even be competing for some problem domains. Consider a scheduling

    domain where one can arrive at a (vastly) suboptimal solution by relaxing constraints (such as

the minimum makespan) and then further refine the solution at the cost of runtime. It might make sense to combine both metrics to allow for a portfolio that finds the best solutions

    as quickly as possible.

    Designating a clear winner is not always easy and the difference between algorithm perfor-

    mances might be so slight as to have negligible impact on the overall portfolio performance.

    For example, some problem instances may be easily solved by two of the three algorithmic

    approaches. In such a case, it might be preferable to allow a “band of success” within which

    algorithms are recognized as reasonably competitive. In this fashion, the key issue is avoiding

    (all of) the inefficient algorithms and preferring (any of) the efficient algorithms. To accom-

    plish the “band of success” some authors allow a deviation – which is often a factor – about the

    optimum around which a portfolio is said to be competitive. In fact, Gebruers et al. [GHBF05]

    organize their results with the factor above optimum as the dependent variable in their plots,

    which show that an algorithm’s relative performance changes as the requirement of choosing

    the very best algorithm is relaxed.


4.1 Predictive Accuracy

    Guerri and Milano [GM04] also use prediction accuracy to evaluate their system which uses a

    decision tree to select between two algorithms. Guo and Hsu also evaluate the exact models

    using classification accuracy.

    Horvitz et al. [HRG+01] compare their results in two ways depending on what is being

    measured. For models that attempt to predict the runtime (short, long) of two different

    algorithms, they compare the classification accuracy of the model against the marginal model

    and they also compute the average log score for comparison.

    4.2 Runtime

    Gebruers et al. [GHBF05, GGHM04] evaluate their method using two metrics: prediction rate

    and total (summed) execution time. They convincingly argue that prediction rate alone is

    misleading because: “there would be little utility to a technique that picked the best strategy

    for 90% of the instances if these were ones where solving time was short but which failed to

    pick the best strategy for the remaining 10% of instances if these were ones where the solving

    time exceeded the total for the other 90% [GHBF05, 7].” They compare the performance of

    the winner-take-all, random, and weighted random algorithms.

    The dynamic measure of gain given by Fink [Fin04] is intended to capture the amount of

    progress that the selection algorithm provides. A more complete description of the metric is

    provided in later work by Fink [Fin03]. Gain is defined as the difference between a reward (R)

    and predicted runtime (time) of an algorithm. The intuition is that the selection mechanism

    is paying for runtime and that this runtime counts against any reward. For example, the gain

    will be R − time if the algorithm is successful, but it will be −time if the algorithm hits the

    time bound (because R = 0). Clearly, it is preferable to select algorithms for which R > time.
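
The gain computation can be written directly from this description; the reward and time values below are placeholders.

    def gain(reward, runtime, solved, time_bound):
        """Fink-style gain: reward minus runtime on success, minus the (capped) runtime otherwise."""
        if solved:
            return reward - runtime
        return -min(runtime, time_bound)   # R = 0 when the time bound is hit

    print(gain(reward=100.0, runtime=12.5, solved=True, time_bound=60.0))    #  87.5
    print(gain(reward=100.0, runtime=60.0, solved=False, time_bound=60.0))   # -60.0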

    Gomes and Selman [GS01] compare portfolios using a combined measure of expected run-

    time and variance, also called risk. The combined metric provides more information than mean

    runtime alone and gives a sense of the trade-off between runtime and risk.

For the switching portfolio evaluation, Horvitz et al. [HRG+01] show that a selection mechanism incorporating model accuracy will return a lower expected runtime than the optimal “no-information” bound given by Luby et al. [LSZ93].

    The earlier work of Luby et al. [LSZ93] on Las Vegas algorithms proved an optimal bound

    of a universal strategy for the expected runtime under the two assumptions: 1) that little or

no information is known about the runtime of algorithm A, and 2) that we are only running

    algorithm A on a specific problem p ∈ P until we obtain a single answer [LSZ93]. They present

    what they call an optimal universal strategy, Suniv, characterized as:

Suniv = (1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 16, 1, . . .).

    If we denote the expected runtime of an algorithm A on a problem p ∈ P as lA(p), then they

    prove that the expected runtime of Suniv is O(lA(p) lg(lA(p))) – that is, the runtime of the

    universal policy is within a logarithmic factor of the optimal policy that could be obtained

    with exact knowledge of the runtime distribution for algorithm A.
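
For reference, the universal sequence can be generated with the standard recursion below; this is a textbook formulation of Suniv rather than code from Luby et al.

    def luby(i):
        """i-th term (1-indexed) of the universal restart sequence Suniv."""
        k = 1
        while (1 << k) - 1 < i:                # find the smallest k with i <= 2^k - 1
            k += 1
        if i == (1 << k) - 1:
            return 1 << (k - 1)                # end of a block: 2^(k-1)
        return luby(i - (1 << (k - 1)) + 1)    # otherwise repeat the earlier prefix

    print([luby(i) for i in range(1, 16)])     # [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]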


More extensive comparison to Luby’s optimal bounds is also done by Kautz et al. [KHR+02].

    The authors point out that the two key assumptions of the optimal bounds are typically vio-

    lated. They show that it is possible to achieve dynamic runtime bounds – bounds that include

    information about the runtime or problem distribution – that are more accurate than the Uni-

    versal log-optimal policy and the fixed optimal policy. They also add information by allowing

    a decision tree to predict the distribution from which the problem came prior to calculating

    the bounds. All bound methods dominate the median runtime distributions, and the more

informed dynamic bounds dominate those of Luby, Sinclair, and Zuckerman [LSZ93].

    4.3 Solution Quality

    Carchrae and Beck [Car05] compare all portfolios (predictive and switching) to the best single

winner-take-all algorithm using two metrics that characterize how much worse an algorithm’s solutions are than the best found and how often it finds the best solution. The Mean Relative Error (MRE) states the average degree to which an algorithm underperformed the best. Given a set of independent

    runs, R, on a set, K, of instances, let c(a, k, r) denote the best solution found by a particular

    algorithm a in instance k during run r. Also let c∗(k) indicate the best solution found by any

    algorithm for instance k. Then,

MRE(a, K, R) = (1/|R|) (1/|K|) Σ_{r∈R, k∈K} (c(a, k, r) − c∗(k)) / c∗(k).

    Another metric, the Mean Fraction Best (MFB), describes how often an algorithm found

    the best solution. Given a set of independent runs, R, on a set, K, of instances, let best(a,K, r)

    be the set of solutions for k ∈ K where c∗(k) = c(a, k, r) during run r. Then,

MFB(a, K, R) = (1/|R|) Σ_{r∈R} |best(a, K, r)| / |K|.
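A direct transcription of the two metrics in code, assuming the run data are stored in a dictionary keyed by (algorithm, instance, run) and that smaller costs are better (the data layout is hypothetical):

    def mre(cost, a, instances, runs, best):
        # Mean Relative Error: average relative gap to the best known solution c*(k).
        total = sum((cost[(a, k, r)] - best[k]) / best[k]
                    for r in runs for k in instances)
        return total / (len(runs) * len(instances))

    def mfb(cost, a, instances, runs, best):
        # Mean Fraction Best: average fraction of instances on which a matches c*(k).
        fractions = [sum(1 for k in instances if cost[(a, k, r)] == best[k]) / len(instances)
                     for r in runs]
        return sum(fractions) / len(runs)

    instances, runs = ["k1", "k2"], [0]
    cost = {("a", "k1", 0): 10.0, ("a", "k2", 0): 12.0}
    best = {"k1": 10.0, "k2": 10.0}
    print(mre(cost, "a", instances, runs, best))  # 0.1: a is 20% worse on k2, best on k1
    print(mfb(cost, "a", instances, runs, best))  # 0.5: a matches the best on half the instances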

    Results for each algorithm on the training problems show that no algorithm dominates any

    other across all problems. The results highlight two important details. One, a winner-take-all

    approach will under-perform a learned model that is able to identify the best algorithm for an

    instance. Two, the results partly justify spending the effort trying to learn a selection model

provided it beats the winner-take-all strategy.

Guo and Hsu [GH04] evaluate the approximate model by grouping the data into three sets (large, medium, small) according to the structure of the problem and then measuring the overall MPE score; MPE is similar to a log-likelihood in that larger values are preferable. Evaluation according to MPE reveals that the approximate model selection outperformed all other single algorithms except multi-start hill climbing on the medium problems. On a set of 13 realistic problems, the exact algorithm selector correctly predicts the exact algorithm for 11 of them; the remaining two problems appear to be predicted correctly by the approximate algorithm selector, yielding 100% accuracy overall. Incidentally, the three-fold division of the problems

    after the learning is a curious evaluation step because it could have easily been included as a

    feature for the approximate model.


4.4 Hybrid Metrics

    One hybrid metric by Vrakas et al. [VTBV05] relates the normalized solution length, S, and

runtime, T, of the parameterized search engine. Formally, for the ith problem and jth search configuration, they calculate the normalized solution length as S^norm_ij = S^min_i / S_ij and the normalized solution time as T^norm_ij = T^min_i / T_ij. If the search algorithm timed out at 60 seconds, then S^norm_ij and T^norm_ij are set to zero. The evaluation metric, M_ij, combines these values as a weighted sum of the two normalized metrics: M_ij = w_s S^norm_ij + w_t T^norm_ij. The authors fail

    to clearly state the weight values, but it appears they weight them equally. With this metric,

the authors compare the performance of the rule-configured planner against a kNN algorithm

    (k = 7), the best average configuration and the best per-instance “oracle” configuration. In

    earlier work, Vrakas et al. [VTBV03] also compared the parameterized planner to the best five

static configurations, though the earlier planner has fewer parameters.
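A sketch of the metric for a single problem/configuration pair, assuming equal weights since the paper does not state them explicitly:

    def hybrid_metric(s_min, s_ij, t_min, t_ij, timed_out=False, w_s=0.5, w_t=0.5):
        # Timed-out runs contribute zero; otherwise combine the normalized
        # solution length and solution time as a weighted sum.
        if timed_out:
            return 0.0
        s_norm = s_min / s_ij   # 1.0 when this configuration found the shortest plan
        t_norm = t_min / t_ij   # 1.0 when this configuration was the fastest
        return w_s * s_norm + w_t * t_norm

    print(hybrid_metric(s_min=20, s_ij=25, t_min=1.0, t_ij=4.0))  # 0.5*0.8 + 0.5*0.25 = 0.525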

    Discussion of portfolio evaluation in the literature often focuses on evaluating the portfolio

    after it is constructed. This view is very limited because evaluation – critical evaluation –

    actually happens at almost every stage of portfolio construction: in the feature design and

    selection, the algorithm design and selection, the problem selection and instance creation, the

    model creation and selection, and finally in the algorithm selection. Much of the literature

    is focused on providing a descriptive model of the design decisions. As a start, descriptive

models are a reasonable way to proceed into a new area of research. However, such models fail

    to advance our understanding of why the decisions are interesting. A more informative model

    would justify why such design decisions are important or relevant by producing predictive

    models of the algorithms and portfolios.

An illustrative example might help explain how a predictive model can advance research further than a descriptive model can. We will select a recent portfolio paper that

    had many positive contributions to portfolio research; the criticism we apply to this one paper

    could be easily found in other papers. Carchrae and Beck [Car05] describe a portfolio that uses

    low-knowledge (low-cost) features to model algorithm performance. They implicitly assume,

    but do not actually evaluate, that high-cost features are unnecessary to build reasonably ac-

    curate models. It would be somewhat straightforward to begin with a clear statement of this

    assumption as a research hypothesis, build low-cost and high-cost models, then statistically

    evaluate the hypothesis. A clear methodology allows subsequent researchers to build on the

knowledge. If only low-cost features are needed to build reasonable models, it would be worthwhile for any portfolio research paper to confirm that hypothesis explicitly.
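One hypothetical way to carry out such a test: train both models, record their per-instance prediction errors, and apply a paired non-parametric test (the numbers below are invented for illustration):

    from scipy import stats

    # Per-instance errors of a model built from low-cost features versus one
    # built from high-cost features, paired by problem instance (invented data).
    low_cost_errors = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.10, 0.13]
    high_cost_errors = [0.11, 0.07, 0.14, 0.10, 0.10, 0.12, 0.09, 0.12]

    # A small p-value would indicate that the extra (high-cost) features help;
    # a large one fails to reject the hypothesis that cheap features suffice.
    statistic, p_value = stats.wilcoxon(low_cost_errors, high_cost_errors)
    print(statistic, p_value)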

    There is an appalling lack of evaluation metrics in the literature, and many of the existing

    metrics lack measures of statistical significance. This isn’t to say that there are not good models

to follow. More recent work, particularly that of Fink [Fin04] and Beck and Freuder [BF04],

    incorporates measures of significance into the evaluation; other portfolio analyses rely on the

    reader to infer statistical significance. The lack of variety in the metrics could result from

    portfolios only recently gaining attention. But there are certainly a variety of existing metrics

    that would be easy to borrow or modify from other areas of research. For example, Witten

    and Frank [WF05] discuss the different measurements one can use to evaluate classification

    algorithms in Machine Learning.

    The hybrid metrics seem like an important step in this direction, especially as humans and


computers work together in more integrated environments and algorithms are required to

    balance multiple objectives in a problem. Portfolios are one way to humanize machine intelli-

    gence through approaches that incorporate mixed-initiative and multi-objective evaluation as

    well as approaches that have an anytime quality, where the algorithm(s) produce a feasible

    solution quickly but then improve on the solution as search progresses. One can picture a

    portfolio that uses different search algorithms depending on the specific evaluative needs of

the user. The array of literature on such hybrid approaches and their evaluation metrics could (and probably should) be brought to bear in portfolio research.

    One could get creative and incorporate metrics from other fields as well. For example,

    information retrieval metrics may be applicable to highlighting performance differences in

    terms of solution quality. One could rate the “precision” of an algorithm by how well it locates

    good solutions on problem distributions and how well it “recalls” those solutions on multiple

    runs.
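As a purely illustrative sketch of that analogy, one might count a run as “relevant” when it comes within some tolerance of the best known solution; the thresholds and data below are invented:

    def precision_recall(cost, best, tol=0.01):
        # cost maps (instance, run) to the solution cost achieved; best maps
        # each instance to the best known cost (minimization).
        hits = [(k, r) for (k, r), c in cost.items() if c <= (1 + tol) * best[k]]
        precision = len(hits) / len(cost)               # fraction of runs that hit the best
        recall = len({k for k, _ in hits}) / len(best)  # fraction of instances ever hit
        return precision, recall

    cost = {("k1", 0): 10.0, ("k1", 1): 10.4, ("k2", 0): 13.0, ("k2", 1): 12.1}
    best = {"k1": 10.0, "k2": 12.0}
    print(precision_recall(cost, best))  # (0.5, 1.0)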

    5 Crosscutting Themes

    Some themes impact all portfolios regardless of the selection mechanism or the evaluation

    metric. Hooker’s [Hoo95] criticism of “algorithmic track meets” as a poor methodology for

advancing the state of the art in algorithms could also be applied to portfolios. The competitive

    testing paradigm provides a useful starting point for our discussion of the crosscutting themes.

    Selecting the “right” or “best” algorithm for a problem is challenging. As mentioned in

    the introduction, there is still a great deal of black art to algorithm design and portfolio

construction. A portfolio seems like a straightforward way to automate the art form. But stopping there does no more to advance our understanding than training a new generation of researchers to compete in the track meet with more efficient algorithms.

    Fortunately, the portfolio approach has merits beyond increasing the robustness or reducing

the search cost of algorithms by harnessing algorithm bias. We can actually deconstruct the

    portfolio’s constituent parts to derive deeper insight into identifying the dependencies between

    the domains, heuristics, algorithms, and runtime dynamics. Leyton-Brown, Nudelman, and

    Shoham [LBNS02] take promising steps in this direction, but there is a lack of such effort

    in most portfolio research. A frequent approach seems to be of the very kind that Hooker

criticized: an algorithmic track meet where the portfolio is simply another algorithm that

    does better. To advance knowledge about heuristic search, we need to look deeper into the

    portfolio. As Leyton-Brown et al. [LBNA+03] point out:

    Since the purpose of designing new algorithms is to reduce the time that it will take

    to solve problems, designers should aim to produce new algorithms that complement

    an existing portfolio (p. 9).

    An excellent way to complement an existing approach is to identify the complement, hypothe-

    size that the complement will add to the portfolio, verify that the complementing algorithm did

    (or did not) actually enhance the portfolio, and finally explain why the change was observed.

As a side note, one can get carried away with a vision of an ever more robust portfolio. It might at first appear that the portfolio approach is headed toward creating the kind of


General Problem Solver that inspired early AI researchers. But worst-case complexity results

    given by Guo [Guo03] as well as No Free Lunch proofs by Wolpert and Macready [WM95]

preclude algorithm selection from being a panacea.

    Clearly another issue that one must keep in mind for any meta-learning method is whether

the performance boost justifies the effort. Beck and Freuder [BF04] examine the best-possible gain for improving a search algorithm and find that some cases may not warrant the pro-

    grammatic effort it would take to produce better performance. Recent advances in stochas-

    tic algorithms and restart strategies deal with heavy-tailed phenomena in runtime very well.

    Though it might not be nearly as creative to simply add a random number generator to an

    algorithm rather than create a whole portfolio, it could be the very thing that is needed. Clear

analysis and hypothesis-driven insight lay an appropriate foundation for determining the value

    of harnessing algorithm bias with a portfolio approach.

    There seems to be little effort to move this technology toward application in the business

world. In part, this could be due to the novelty of the approach; it appears that portfolios still need

    some answers to the above questions before the technology is completely applicable. Then

    again, perhaps this is a naive view. Successful portfolios are leading the way in understanding

    the rich substructure of problems. In tandem with research on algorithm design, portfolios may

well lead to greater insight into the inner workings of various search processes. The design of an effective portfolio allows for pragmatic solutions that go hand in hand with scientific research. A

    portfolio also abstracts away algorithms in a manner similar to the Strategy design pattern

    canonized by Gamma et al. [GHJV95]. Under such a pattern, the inherent modularity of a

    portfolio allows speedier adoption of new algorithms into an existing piece of software.
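A minimal sketch of that modularity in the spirit of the Strategy pattern (class and method names are invented for illustration):

    from abc import ABC, abstractmethod

    class Solver(ABC):
        # Each algorithm hides its own search machinery behind a common interface.
        @abstractmethod
        def solve(self, problem, time_limit):
            ...

    class LocalSearchSolver(Solver):
        def solve(self, problem, time_limit):
            return f"local-search solution to {problem}"

    class SystematicSolver(Solver):
        def solve(self, problem, time_limit):
            return f"systematic solution to {problem}"

    class Portfolio:
        # The selection strategy is itself pluggable: it maps (solvers, problem)
        # to the solver that should be run.
        def __init__(self, solvers, select):
            self.solvers, self.select = solvers, select

        def solve(self, problem, time_limit):
            return self.select(self.solvers, problem).solve(problem, time_limit)

    # Trivial selection strategy: always run the first solver.
    portfolio = Portfolio([LocalSearchSolver(), SystematicSolver()], lambda s, p: s[0])
    print(portfolio.solve("jobshop-01", time_limit=60))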

    The issue of time limits surfaces in many areas of portfolio construction and application:

    when to quit altogether, when to restart, and how long to run an algorithm for prediction. In

    setting an appropriate time length for individual algorithms, many researchers find that the

    best algorithm changes over time. Presumably this has to do with when in the search process

    a particular algorithm does better. The total time of the portfolio is closely related to the time

    limit of its individual algorithms as well as any on-line modeling costs. It isn’t always clear

    how much time it took to construct the models or whether this time should even be reported;

    it is certainly impressive when the portfolio, including modeling time, has a better average

    time than the best single algorithm. More work needs to be done in exploring the issue of time

    limits for portfolios.

Two relatively unexplored areas of research in portfolios are how much information to transfer between algorithms and how much learning the algorithms themselves might perform as they

    search. Solution transfer is rare because many algorithms use a unique solution representation

    unshared by other algorithms; translation between solutions is impossible or undefined. Only

one portfolio used solution transfer, but other portfolios may have benefited from such an

    approach. Adding solution transfer makes a portfolio appear similar to algorithms that perform

    global search then refine the solution with some form of local search. Solution transfer also

begins to look like some forms of memory that adaptive algorithms use, which raises the question of whether to simply include adaptive algorithms in the portfolio. Certainly it is possible

    to consider constructing portfolios using meta search techniques. But it is not clear how a

    portfolio might best interact with complex algorithms that incorporate their own learning and


branching techniques. The issue is only cursorily discussed in the portfolio literature, but

    should probably get more attention.

    6 Closing: Beyond Bias

    We began our discussion by pointing out that selecting an appropriate algorithm is a sort of

    black art. We then looked at a way to harness algorithm bias through a portfolio approach.

    We examined several portfolios from the literature to get a sense of the different ways that

    researchers have harnessed algorithm bias; in particular, we focused on two questions related

    to portfolio strategy:

1. Given a set of algorithms and performance models of those algorithms, what selection mech-

    anism will effectively choose between them?

    2. Given a portfolio, how does one evaluate its performance?

    With respect to the selection mechanisms, we identified three primary types (parallel,

    switching, and predictive) that closely followed the portfolio architectures. With respect to

    evaluation, we showed that authors rely on measures of runtime or solution quality to determine

    if the portfolio outperformed individual algorithms.

    Unfortunately, we haven’t really solved the issue of the “black art” that began our venture

    into this topic. Portfolios are a way to harness the bias of algorithms, and portfolios can

    be used to drive deeper insight into understanding algorithm performance. But they are not

often used that way. More frequently, research on the portfolio approach leaves more questions

    unanswered than answered because the “art” is performed by a machine rather than a human.


7 Acknowledgments

    In June, July and August, the author of this report worked with Dr. Adele Howe to refine

    the scope and focus of the exam. The student asked for and received feedback about three

    revisions of the proposal from Dr. Howe.

    September 5: Mark discussed his approach to the exam with Dr. Howe; Landon Flom

    was present for this meeting. Mark expressed a particular concern about the scope of written

    portion. Dr. Howe reminded the student that the scope of this project was limited to 10

    papers and that the tone should be similar to that of a literature review in a dissertation; she

    referred Mark to Dr. Laura Barbulescu’s dissertation as a possible model. She also suggested

    that Mark spend some time putting together a list of dependencies for the exam.

    September 7: Mark sent a proposed time-line for completing the exam by mid-October.

    Dr. Howe responded that it might be a little ambitious, but otherwise approved the proposal.

    September 19: Mark showed Dr. Howe, Landon Flom and Christie XX a copy of his

    proposed outline for the paper. Dr. Howe offered some suggestions about the outline.

    September 26: Mark showed Dr. Howe his 4 page revised outline and introduction. Dr.

    Howe expressed that the introduction began a little far afield from the primary contents of the

outline.

    October 4: Mark emailed his 6000 word draft. In a meeting, Dr. Howe reviewed the overall

    structure of the document (the table of contents) and made some critique of the abstract

    portfolio model; we focused our discussion on improving the figure.

    October 8: Mark emailed a revised 7000 word draft that incorporated the suggestions from

    Dr. Howe. She reviewed the document in a meeting on October 10 to determine if the work

    was close enough to be ready to set a date for the oral presentation. We set the date of October

    31, 2006.

    October 11: To obtain peer-level feedback about the document, Mark also emailed the

    draft to Monte Lunacek, Keith Bush, Arif Albarak, Crystal Redman, Andrew Sutton, Andy

    Curtis, and Scott Lunberg. He received direct feedback from Monte Lunacek, Andrew Sutton,

    and Andy Curtis. All printed copies of the reviewers’ notes are available to the committee

    upon request.

    October 24: Mark intends to show his slides to Dr. Howe to get feedback.

    October 25: Mark plans to present a mock defense to the Graduate Student Association.


References

[BF04] J.C. Beck and E.C. Freuder. Simple rules for low-knowledge algorithm selection. In Jean-Charles Régin and Michel Rueher, editors, Proc. of the First Conference on the Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR 2004), Lecture Notes in Computer Science 3011, pages 50–64, Nice, France, April 20-22 2004. Springer.

[Car05] T. Carchrae and J.C. Beck. Applying machine learning to low knowledge control of optimization algorithms. Computational Intelligence, 21(4):373–387, 2005.

[CHM97] David Maxwell Chickering, David Heckerman, and Christopher Meek. A Bayesian approach to learning Bayesian networks with local structure. In Proc. of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 80–89, Providence, RI, August 1997. Morgan Kaufmann.

[CKT91] P. Cheeseman, B. Kanefsky, and W.M. Taylor. Where the really hard problems are. In John Mylopoulos and Raymond Reiter, editors, Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), pages 331–337, Sydney, Australia, August 24-30 1991.

[Fin98] Eugene Fink. How to solve it automatically: Selection among problem-solving methods. In Reid G. Simmons, Manuela M. Veloso, and Stephen Smith, editors, Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems (AIPS-98), pages 128–136, Pittsburgh, Pennsylvania, 1998. AAAI.

[Fin03] Eugene Fink. Changes of Problem Representation: Theory and Experiments. Studies in Fuzziness and Soft Computing. Springer-Verlag, 2003.

[Fin04] Eugene Fink. Automatic evaluation and selection of problem-solving methods: Theory and experiments. Journal of Experimental and Theoretical Artificial Intelligence, 16(2):73–105, 2004.

[GGHM04] Cormac Gebruers, Alessio Guerri, Brahim Hnich, and Michela Milano. Making choices using structure at the instance level within a case based reasoning framework. In Jean-Charles Régin and Michel Rueher, editors, CPAIOR, volume 3011 of Lecture Notes in Computer Science, pages 380–386. Springer, 2004.

[GH02] H. Guo and W. Hsu. A survey of algorithms for real-time Bayesian network inference. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), Edmonton, Alberta, Canada, July 28 - August 1 2002. AAAI Press.

[GH04] Haipeng Guo and William H. Hsu. A learning-based algorithm selection meta-reasoner for the real-time MPE problem. In Proc. of the 17th Australian Conference on Artificial Intelligence (AUS-AI 2004), pages 307–318, Cairns, Australia, December 4-6 2004.

[GHBF05] Cormac Gebruers, Brahim Hnich, Derek G. Bridge, and Eugene C. Freuder. Using CBR to select solution strategies in constraint programming. In Héctor Muñoz-Avila and Francesco Ricci, editors, Proc. of the Sixth Conference on Case-Based Reasoning (ICCBR-05), volume 3620 of Lecture Notes in Computer Science, pages 222–236, Chicago, IL, August 23-26 2005. Springer.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[GL00] Fred Glover and Manuel Laguna. Fundamentals of scatter search and path relinking. Control and Cybernetics, 29:653–684, 2000.

[Glo89] F. Glover. Tabu search – Part I. ORSA Journal on Computing, 1(3):190–206, 1989.

[GM04] A. Guerri and M. Milano. Learning techniques for automatic algorithm portfolio selection. In Proc. of the 16th Biennial European Conference on Artificial Intelligence (ECAI 2004), pages 475–479, Valencia, Spain, August 2004.

[GS97] C.P. Gomes and B. Selman. Algorithm portfolio design: Theory vs. practice. In Dan Geiger and Prakash P. Shenoy, editors, Proc. of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 190–197, Providence, RI, August 1-3 1997. Morgan Kaufmann.

[GS01] Carla Gomes and Bart Selman. Algorithm portfolios. Artificial Intelligence Journal, 126:43–62, 2001.

[GSC97] Carla P. Gomes, Bart Selman, and Nuno Crato. Heavy-tailed distributions in combinatorial search. In Gert Smolka, editor, Proc. of the Third International Conference on the Principles and Practice of Constraint Programming (CP97), Lecture Notes in Computer Science 1330, pages 121–135, Linz, Austria, October 29 - November 1 1997. Springer.

[Guo03] H. Guo. Algorithm Selection for Sorting and Probabilistic Inference: A Machine Learning-Based Approach. PhD thesis, Department of Computing and Information Sciences, Kansas State University, August 2003.

[HDH+99] Adele Howe, Eric Dahlman, C. Hansen, A. von Mayrhauser, and M. Scheetz. Exploiting competitive planner performance. In Proceedings of the Fifth European Conference on Planning (ECP-99), Durham, England, September 1999.

[HHHLB06] Frank Hutter, Youssef Hamadi, Holger H. Hoos, and Kevin Leyton-Brown. Performance prediction and automated tuning of randomized and parametric algorithms. In Principles and Practice of Constraint Programming (CP-06), to appear, 2006.

[HLH97] B. A. Huberman, R. Lukose, and T. Hogg. An economics approach to hard computational problems. Science, 275(5296):51–54, January 3 1997.

[Hoo95] J. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33–42, 1995.

[HRG+01] Eric Horvitz, Yongshao Ruan, Carla P. Gomes, Henry Kautz, Bart Selman, and David Maxwell Chickering. A Bayesian approach to tackling hard computational problems. In Proc. of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 235–244, Seattle, WA, August 2001. Morgan Kaufmann Publishers.

[HS04] Holger Hoos and Thomas Stuetzle. Stochastic Local Search. Morgan Kaufmann, San Francisco, CA, 2004.

[KGV83] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[KHR+02] Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes, and Bart Selman. Dynamic restart policies. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta, July 2002. AAAI Press.

[LBNA+03] Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham. Boosting as a metaphor for algorithm design. In Francesca Rossi, editor, Proc. of the 9th International Conference on the Principles and Practice of Constraint Programming (CP 2003), Lecture Notes in Computer Science 2833, Kinsale, Ireland, September 29 - October 3 2003. Springer.

[LBNS02] Kevin Leyton-Brown, Eugene Nudelman, and Yoav Shoham. Learning the empirical hardness of optimization problems: the case of combinatorial auctions. In Pascal Van Hentenryck, editor, Proc. of the 8th International Conference on the Principles and Practice of Constraint Programming (CP 2002), volume 2470 of Lecture Notes in Computer Science, Ithaca, NY, September 9-13 2002. Springer.

[LSZ93] M. Luby, A. Sinclair, and D. Zuckerman. Optimal speedup of Las Vegas algorithms. Information Processing Letters, pages 173–180, 1993.

[Mit97] Tom Mitchell. Machine Learning. McGraw-Hill, 1997.

[NLBH+04] Eugene Nudelman, Kevin Leyton-Brown, Holger Hoos, Alex Devkar, and Yoav Shoham. Understanding random SAT: Beyond the clauses-to-variables ratio. In Mark Wallace, editor, Proc. of the 10th International Conference on the Principles and Practice of Constraint Programming (CP 2004), Lecture Notes in Computer Science 3258, Toronto, Canada, September 27 - October 1 2004. Springer.

[Ric76] J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.

[SGS00] J. Singer, I.P. Gent, and A. Smaill. Backbone fragility and the local search cost peak. Journal of Artificial Intelligence Research, 12:235–270, 2000.

[SK75] H.A. Simon and J.B. Kadane. Optimal problem-solving search: All-or-none solutions. Artificial Intelligence, 6:235–247, 1975.

[TVBV04] G. Tsoumakas, D. Vrakas, N. Bassiliades, and I. Vlahavas. Using the k nearest problems for adaptive multicriteria planning. In George A. Vouros and Themis Panayiotopoulos, editors, Proc. of the Methods and Applications of Artificial Intelligence in the Third Hellenic Conference on AI (SETN 2004), Lecture Notes in Computer Science 3025, Samos, Greece, May 5-8 2004. Springer.

[VTBV03] D. Vrakas, G. Tsoumakas, N. Bassiliades, and I. Vlahavas. Learning rules for adaptive planning. In Proceedings of the 13th International Conference on Automated Planning and Scheduling (ICAPS03), pages 82–91, Trento, Italy, June 9-13 2003.

[VTBV05] D. Vrakas, G. Tsoumakas, N. Bassiliades, and I. Vlahavas. Intelligent Techniques for Planning, chapter Machine Learning for Adaptive Planning, pages 90–120. Idea Group Publishing, 2005.

[WBWH02] J.P. Watson, L. Barbulescu, L.D. Whitley, and A.E. Howe. Contrasting structured and random permutation flow-shop scheduling problems: Search space topology and algorithm performance. INFORMS Journal on Computing, 14(2):98–123, 2002.

[WF05] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.

[WM95] David H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical Report SFI-TR-02-010, Santa Fe Institute, Santa Fe, NM, July 1995.
