
European Journal of Operational Research 107 (1998) 507-529


    Theory and Methodology

    Multi-attribute decision making:

    A simulation comparison of select methods

Stelios H. Zanakis a,*, Anthony Solomon b, Nicole Wishart a, Sandipa Dublish c

a Decision Sciences and Information Systems Department, College of Business Administration, Florida International University, Miami, FL 33199, USA
b Decision & Information Science Department, Oakland University, Rochester, MI 48309, USA
c Marketing Department, Fairleigh Dickinson University, Teaneck, NJ 07666, USA

* Corresponding author. Fax: +1-305-348-4126; e-mail: [email protected].

Received 7 August 1996; accepted 18 February 1997

    Abstract

Several methods have been proposed for solving multi-attribute decision making (MADM) problems. A major criticism of MADM is that different techniques may yield different results when applied to the same problem. The problem considered in this study consists of a decision matrix input of N criteria weights and ratings of L alternatives on each criterion. The comparative performance of some methods has been investigated in a few, mostly field, studies. In this simulation experiment we investigate the performance of eight methods: ELECTRE, TOPSIS, Multiplicative Exponential Weighting (MEW), Simple Additive Weighting (SAW), and four versions of AHP (original vs. geometric scale and right eigenvector vs. mean transformation solution). Simulation parameters are the number of alternatives, the number of criteria and their distribution. The solutions are analyzed using twelve measures of similarity of performance. Similarities and differences in the behavior of these methods are investigated. Dissimilarities in weights produced by these methods become stronger in problems with few alternatives; however, the corresponding final rankings of the alternatives vary across methods more in problems with many alternatives. Although less significant, the distribution of criterion weights affects the methods differently. In general, all AHP versions behave similarly and closer to SAW than the other methods. ELECTRE is the least similar to SAW (except for closer matching of the top-ranked alternative), followed by MEW. TOPSIS behaves closer to AHP and differently from ELECTRE and MEW, except for problems with few criteria. A similar rank-reversal experiment produced the following performance order of methods: SAW and MEW (best), followed by TOPSIS, AHPs and ELECTRE. It should be noted that the ELECTRE version used was adapted to the common MADM problem and therefore did not take advantage of the method's capabilities in handling problems with ordinal or imprecise information. © 1998 Elsevier Science B.V.

Keywords: Multiple criteria analysis; Decision theory; Utility theory; Simulation

1. Introduction

Multiple criteria decision making (MCDM) refers to making decisions in the presence of multiple, usually conflicting, criteria. MCDM problems are commonly categorized as continuous or discrete, depending on the domain of alternatives. Hwang and Yoon (1981) classify them as (i) Multiple Attribute Decision Making (MADM), with a discrete, usually limited, number of prespecified alternatives, requiring inter- and intra-attribute comparisons, involving

0377-2217/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII S0377-2217(97)00147-1


implicit or explicit tradeoffs; and (ii) Multiple Objective Decision Making (MODM), with decision variable values to be determined in a continuous or integer domain, of infinite or large number of choices, to best satisfy the DM's constraints, preferences or priorities. MADM methods have also been used for combining good MODM solutions based on DM preferences (Kok, 1986; Kok and Lootsma, 1985).

In this paper we focus on MADM, which is used in a finite selection or choice problem. In the literature, the term MCDM is often used to indicate MADM, and sometimes MODM, methods. To avoid any ambiguity we will henceforth use the term MADM when referring to a discrete MCDM problem. Methods involving only the ranking of discrete alternatives with equal criteria weights, like voting choices, will not be examined in this paper.

Churchman et al. (1957) were among the earlier academicians to look at the MADM problem formally, using a simple additive weighting method. Over the years different behavioral scientists, operational researchers and decision theorists have proposed a variety of methods describing how a DM might arrive at a preference judgment when choosing among multiple attribute alternatives. For a survey of MCDM methods and applications see Stewart (1992) and Zanakis et al. (1995).

Gershon and Duckstein (1983) state that the major criticism of MADM methods is that different techniques yield different results when applied to the same problem, apparently under the same assumptions and by a single DM. Comparing 23 cardinal and 9 qualitative aggregation methods, Voogd (1983) found that, at least 40% of the time, each technique produced a different result from any other technique. The inconsistency in such results occurs because:
(a) the techniques use weights differently in their calculations;
(b) algorithms differ in their approach to selecting the best solution;
(c) many algorithms attempt to scale the objectives, which affects the weights already chosen;
(d) some algorithms introduce additional parameters that affect which solution will be chosen.
This is compounded by the inherent differences in experimental conditions and human information processing between DMs, even under similar preferences. Other researchers have argued the opposite; namely that, given a type of problem, the solutions obtained by different MADM methods are essentially the same (Belton, 1986; Timmermans et al., 1989; Karni et al., 1990; Goicoechea et al., 1992; Olson et al., 1995). Schoemaker and Waid (1982) found that different additive utility models produce generally different weights, but predicted equally well on the average. Practitioners seem to prefer simple and transparent methods, which, however, are unlikely to represent the weight trade-offs that users are willing to make (Hobbs et al., 1992).

The wide variety of available techniques, of varying complexity and possibly varying solutions, confuses potential users. Several MADM methods may appear to be suitable for a particular decision problem. Hence the user faces the task of selecting the most appropriate method from among several alternative feasible methods.

The need for comparing MCDM methods and the importance of the selection problem were probably first recognized by MacCrimmon (1973), who suggested a taxonomy of MCDM methods. More recently several authors have outlined procedures for the selection of an appropriate MCDM method, such as Ozernoy (1992), Hwang and Yoon (1981), Hobbs (1986) and Ozernoy (1987). These classifications are primarily driven by the input requirements of the method (the type of information that the DM must provide and the form in which it must be provided). Very often these classifications serve more as a tool for elimination rather than selection of the right method. The use of expert systems has also been advocated for selecting MCDM methods (Jelassi and Ozernoy, 1988).

Our literature search revealed that a limited amount of work has been done in terms of comparing and integrating the different methods. Denpontin et al. (1983) developed a comprehensive catalogue of the different methods, but concluded that it was difficult to fit the methods into a classification schema, since decision studies varied so much in the quantity, quality and precision of information. Many authors stress the validity of the method as the key criterion for choosing it. Validity implies that the method is likely to yield choices that accurately reflect the values of the user (Hobbs et al., 1992). However


there is no absolute, objective standard of validity, as preferences can be contradictory when articulated in different ways. Researchers often measure validity by checking how well a given method predicts the unaided decisions made independently of the judgments used to fit the model (Schoemaker and Waid, 1982; Currim and Sarin, 1984). Decision scientists question the applicability of this criterion, particularly in complex problems that will cause users to adopt less rational heuristics and to be inconsistent. Studies in decision making have shown that the efficiency of a decision made has an inverted U-shaped relationship with the amount of information provided (Kok, 1986; Gemunden and Hauschildt, 1985).

Researchers who have attempted the task of comparing the different MADM methods have used either real life cases or formulated a real-life-like problem and presented it to a selected group of users (Currim and Sarin, 1984; Gemunden and Hauschildt, 1985; Belton, 1986; Roy and Bouyssou, 1986; Hobbs, 1986; Buchanan and Daellenbach, 1987; Lockett and Stratford, 1987; Stillwell et al., 1987; Karni et al., 1990; Stewart, 1992; Goicoechea et al., 1992). Such field experiments are valuable tools for comparing MADM methods, based on user reactions. If properly designed, they assess the impact of human information processing and judgmental decision making, beyond the nature of the methods employed. Users may compare these methods along different dimensions, such as perceived simplicity, trustworthiness, robustness and quality. However, field studies have the following limitations and disadvantages:

(a) The sample size and range of problems studied is very limited.
(b) The subjects are often students, rather than real decision makers.
(c) The way the information is elicited may influence the results more than the model used (Olson et al., 1995).
(d) The learning effect biases outcomes, especially when a subject employs various methods sequentially (Kok, 1986).
(e) Inherent human differences led Hobbs et al. (1992) to conclude that decisions can be as or more sensitive to the method used as to which person applies it. However, in a similar study, Goicoechea et al. (1992) concluded that rankings are not affected significantly by the choice of decision maker or by which of these methods is used. The fact that judgments were elicited from working professionals in one study and graduate students in the other may partially explain the discrepancy.
(f) It is impossible or difficult to answer questions like:
1. Which method is more appropriate for what type of problem?
2. What are the advantages/disadvantages of using one method over another?
3. Does a decision change when using different methods? If yes, why and to what extent?

The above limitations may be overcome via simulation. However, since simulations cannot capture human idiosyncrasies, their findings should supplement rather than substitute those of the field experiments. We have found only three simulation studies, comparing solely AHP type methods.

Zahedi (1986) generated symmetric and asymmetric AHP matrices of size 6 and 22 from uniform, gamma and lognormal distributions, with a multiplicative error term. Criteria weights were derived using six methods: right eigenvalue, row and column geometric means, harmonic mean, simple row average, and row average of columns normalized first by their sum (called the mean transformation method). The accuracy of the corresponding weight and rank estimators was evaluated using MAE, MSE, variance and Theil's coefficient. She concluded that, when the input matrix is symmetric, the mean transformation method outperformed all other methods in accuracy, rank preservation and robustness toward error distribution. Differences between methods were noticeable only under a gamma error distribution, where the eigenvalue method did poorly, while the row geometric mean exhibited better rank preservation with large-size matrices. All methods performed equally well (except simple row average) and much better when errors had a uniform rather than lognormal distribution.

Takeda et al. (1987) conducted an AHP simulation study, with multiplicative random errors, to evaluate different eigen-weight vectors. They advocate using their graded eigenvector method over Saaty's simpler right eigenvector approach.


Triantaphyllou and Mann (1989) simulated random AHP matrices of 3-21 criteria and alternatives. Each problem was solved using four methods: the weighted sum model (WSM), the weighted product model (WPM), right-eigenvector AHP, and AHP revised by normalizing each column by the maximum rather than the sum of its elements, according to Belton and Gear's (1984) suggestion for reducing rank reversals. Solutions were compared against the WSM benchmark and the rate of change in the best alternative when a nonoptimal alternative is replaced by a worse one. They concluded that the revised AHP appears to perform closest to the WSM; that AHP tends to behave like WSM as the number of alternatives increases; and that the rate of change does not depend on the number of criteria.

The first two studies are limited to a single AHP matrix; i.e. different methods for deriving weights only for the criteria, or only for the alternatives under a single criterion - not simultaneously for the entire MADM problem. And all three are limited to variants of the AHP. A further limitation of the third study is that it employs only two measures of performance: the percentage contradiction between a method's rankings and WSM's, and the rate of rank reversal of top priority. There is clearly a need for a simulation study comparing also other MADM type methods, using various measures of performance. Our work in that regard is explained in the next section. The MADM problem under consideration is depicted by the following DM matrix of preferences for L alternatives rated on N criteria:

$$
\begin{array}{c|cccccc}
 & \multicolumn{6}{c}{\text{Criterion}} \\
\text{Alternative} & c_1 & c_2 & \cdots & c_j & \cdots & c_N \\
\hline
1 & r_{11} & r_{12} & \cdots & r_{1j} & \cdots & r_{1N} \\
2 & r_{21} & r_{22} & \cdots & r_{2j} & \cdots & r_{2N} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & r_{i1} & r_{i2} & \cdots & r_{ij} & \cdots & r_{iN} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
L & r_{L1} & r_{L2} & \cdots & r_{Lj} & \cdots & r_{LN}
\end{array}
$$

where $c_j$ is the importance (weight) of the $j$th criterion and $r_{ij}$ is the rating of the $i$th alternative on the $j$th criterion. As is commonly done, we will assume that the latter are column normalized, so that they also add to one. Different MADM methods will be examined for eliciting these judgments and aggregating them into an overall score $S_i$ for each alternative. Then, the overall evaluation (weight) of each alternative will be $W_i = S_i / \sum_k S_k$, leading to a final ranking of all alternatives. The development of a cardinal measure of overall preference of alternatives ($S_i$) has been criticized by advocates of outranking methods as not reliably portraying true or incomplete preferences. Such methods establish measures of outranking relationships among pairs of alternatives, leading to a complete or partial ordering of alternatives.
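To make this setup concrete, here is a minimal Python sketch (ours, not the authors' code) of the decision matrix input, the column normalization, and the aggregation of a method's scores $S_i$ into overall weights $W_i$; the sizes and random data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 5, 4                        # alternatives, criteria (arbitrary sizes)
R = rng.uniform(size=(L, N))       # raw ratings r_ij
c = rng.uniform(size=N)
c = c / c.sum()                    # criterion weights c_j, summing to one

R = R / R.sum(axis=0)              # column-normalize: each criterion column sums to one

S = R @ c                          # a method's scores S_i (SAW used as the example)
W = S / S.sum()                    # overall evaluation W_i = S_i / sum_k S_k
ranks = (-W).argsort().argsort() + 1   # rank 1 = highest overall weight
print(W.round(3), ranks)
```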

    2. Methods compared

Of the many MADM methods available we have chosen the following five for comparison in our research, when applied to solve the same problem with the decision matrix information stated earlier:
1. Simple Additive Weighting (SAW): $S_i = \sum_j c_j r_{ij}$.
2. Multiplicative Exponent Weighting (MEW): $S_i = \prod_j r_{ij}^{c_j}$.
3. Analytic Hierarchy Process (AHP) - four versions.
4. ELECTRE.
5. TOPSIS (Technique for Order Preference by Similarity to the Ideal Solution).
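The first two scoring rules can be stated directly in code. A short sketch (our illustration operating on the column-normalized inputs above, not the authors' implementation):

```python
import numpy as np

def saw_scores(R: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Simple Additive Weighting: S_i = sum_j c_j * r_ij."""
    return R @ c

def mew_scores(R: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Multiplicative Exponent Weighting: S_i = prod_j r_ij ** c_j."""
    return np.prod(R ** c, axis=1)
```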

The rationale for this selection has been that most of these are among the more popular and widely used methods, and that each method reflects a different approach to solving MADM problems. SAW's simplicity makes it very popular with practitioners (Hobbs et al., 1992; Zanakis et al., 1995). MEW is a theoretically attractive contrast against SAW. However, it has not been applied often, because practitioners find its mathematical concept unattractive, in spite of its scale-invariance property (it depends only on the ratios of the ratings of alternatives). TOPSIS (Hwang and Yoon, 1981) is an exception in that it is not widely used; we have included it because it is unique in the way it approaches the problem, and it is intuitively appealing and easy to understand. Its fundamental premise is that the best alternative, say the $i$th, should have the shortest Euclidean distance $S_i^+ = [\sum_j (r_{ij} - r_j^+)^2]^{1/2}$ from the ideal solution ($r_j^+$, made up of the best value for each attribute regardless of alternative) and


the farthest distance $S_i^- = [\sum_j (r_{ij} - r_j^-)^2]^{1/2}$ from the negative-ideal solution ($r_j^-$, made up of the worst value for each attribute). The alternative with the highest relative closeness measure $S_i^-/(S_i^+ + S_i^-)$ is preferred.
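A sketch of this closeness computation (ours; it assumes all criteria are benefit criteria and, as in the standard TOPSIS formulation, weights the normalized ratings by $c_j$ before taking distances - the formulas above omit the weighting step):

```python
import numpy as np

def topsis_closeness(R: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Relative closeness S-/(S+ + S-) for each alternative; highest is best."""
    V = R * c                                  # weight the normalized ratings
    ideal = V.max(axis=0)                      # best value per criterion
    anti = V.min(axis=0)                       # worst value per criterion
    s_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))   # distance to ideal
    s_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))   # distance to negative-ideal
    return s_minus / (s_plus + s_minus)
```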


[...] and U.S. Army Corps of Engineers evaluate AHP, ELECTRE, SAW and other methods on water supply planning studies. Their results were contradictory; the first found perceived differences across methods and users, while the latter study did not. Finally, Gomes (1989) compared ELECTRE to his method TODIM (a combination of direct rating, AHP weighting and dominance ordering rules) on a transportation problem and concluded that both methods produced essentially the same ranking of alternatives. The above findings highlight our motivation and justification for undertaking this simulation study. Our major objective was to conduct an extensive numerical comparison of several MCDA methods, contrasted in several field studies, when applied to a common problem (a decision matrix of explicitly rated alternatives and criteria weights), and to determine when and how their solutions differ.

3. Simulation experiment

According to Hobbs et al. (1992) a good experiment should satisfy the following conditions:
(a) It compares methods that are widely used, represent divergent philosophies of decision making, or are claimed to represent important methodological improvements.
(b) It addresses the questions of appropriateness, ease of use and validity.
(c) It is well controlled, uses large samples and is replicable.
(d) It compares methods across a variety of problems.
(e) The problems involved are realistic.
Our simulation experiment satisfies all conditions except the second one.

Computer simulation was used to compare the MADM methods. The reason for using simulation is that it is a flexible and versatile method which allows us to generate a range of problems and replicate them several times. This provides a vast database of results from which we can study the patterns of solutions produced by the different methods.

The following parameters were chosen for our simulation:
1. Number of criteria N: 5, 10, 15, 20.
2. Number of alternatives L: 3, 5, 7, 9.
3. Ratings of alternatives $r_{ij}$: randomly generated from a uniform distribution in 0-1.
4. Weights of criteria $c_j$: set all equal (1/N), randomly generated from a uniform distribution in 0-1 (std. dev. 1/12), or drawn from a beta U-shaped distribution in 0-1 (std. dev. 1/24).
5. Number of replications: 100 for each combination, thus producing 4 criteria levels × 4 alternative levels × 3 weight distributions × 100 replications = 4800 problems, resulting in a total of 38,400 solutions across the eight approaches - four methods plus AHP with four versions.
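A minimal sketch of this experimental grid (our reconstruction; the U-shaped weight distribution is assumed here to be Beta(0.5, 0.5), a shape the text does not specify):

```python
import numpy as np

rng = np.random.default_rng(42)
CRITERIA = [5, 10, 15, 20]
ALTERNATIVES = [3, 5, 7, 9]
WEIGHT_SCHEMES = ["equal", "uniform", "beta_u"]
REPLICATIONS = 100

def make_problem(L, N, scheme):
    R = rng.uniform(size=(L, N))              # ratings r_ij ~ U(0, 1)
    if scheme == "equal":
        c = np.full(N, 1.0 / N)               # all criteria weights equal
    elif scheme == "uniform":
        c = rng.uniform(size=N)               # uniformly distributed weights
    else:
        c = rng.beta(0.5, 0.5, size=N)        # U-shaped beta (assumed shape)
    return R / R.sum(axis=0), c / c.sum()     # both normalized to sum to one

problems = [make_problem(L, N, v)
            for N in CRITERIA for L in ALTERNATIVES
            for v in WEIGHT_SCHEMES for _ in range(REPLICATIONS)]
print(len(problems))  # 4 * 4 * 3 * 100 = 4800 problems
```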

An explanation of these choices is in order. The range for the number of criteria and alternatives is typical of those found in many applications. This is representative of a typical MADM problem, where a few alternatives are evaluated on the basis of a wide set of criteria, as explained below. Many empirical studies on the size of the evoked set in the consumer and industrial market context have shown that the number of intensely discussed alternatives does not exceed 4-5 (Gemunden and Hauschildt, 1985). In practice a simple check-list of desirable features will rule out unacceptable alternatives early, thus leaving only a small number for consideration. The number of criteria, though, can be considerably higher. Three distributions for weights were assumed: no distribution, i.e. all weights equal to 1/N (the class of problems where criteria are replaced by judges or voters of equal impact); a uniform distribution, which may reflect an unbiased, indecisive or uninformed user; and a U-shaped distribution, which may typify a biased user, strongly favoring some issues while rigidly opposing others. Under group pressure, similar situations may not arise often in openly supporting pet projects. For this reason, and in order to keep this simulation size manageable, we considered only one distribution (uniform) for the ratings under each criterion.

Additional care was taken during the data generation phase. The ratio of any two criteria weights or alternative ratings should not be extremely high or


extremely low; this avoids pathological cases or scale-induced imbalances between methods, whose performance then deteriorates (Zahedi, 1986). After some experimentation, this limit was set at 75 (and 1/75), one step beyond the maximum $e^4$ of the geometric AHP scale. Symmetric reciprocal matrices were obtained from these ratio entries for the AHP methods. No alternative was kept if it dominated all others on every criterion, or if it was dominated by another alternative on all criteria. For each criterion, all weights were normalized to add up to one. A similar normalization was applied to the final weights of the alternatives over all criteria in each problem. The AHP pairwise comparisons $a_{ij}$ ($> 1$) were generated by selecting the closest original (Saaty) or geometric scale value to the ratio $c_j/c_i$ for two criteria, and $r_{ik}/r_{lk}$ for two alternative ratings under criterion $k$, and then filling the symmetric entries using the reciprocal ratio condition $a_{ji} = 1/a_{ij}$.
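A sketch of this matrix construction (ours; the geometric scale grid $e^0, e^{0.5}, \ldots, e^4$ is our assumption, anchored only by the stated maximum of $e^4$):

```python
import numpy as np

SAATY = np.arange(1.0, 10.0)                  # original 1-9 scale
GEOMETRIC = np.exp(np.arange(0.0, 4.5, 0.5))  # assumed geometric grid, up to e^4

def pairwise_matrix(w: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reciprocal AHP comparison matrix from a weight (or rating) vector w."""
    n = len(w)
    A = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ratio = max(w[i], w[j]) / min(w[i], w[j])   # ratio >= 1
            a = scale[np.abs(scale - ratio).argmin()]   # snap to closest scale value
            if w[i] >= w[j]:
                A[i, j], A[j, i] = a, 1.0 / a           # fill reciprocal entries
            else:
                A[j, i], A[i, j] = a, 1.0 / a
    return A
```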

The reason for selecting SAW as the benchmark is that its simplicity makes it extremely popular in practice. For each method, the following measures of similarity were computed on its final evaluation (weights or ranks) against those of the SAW method, averaged over all alternatives in the problem:

1. Mean squared error of weights (MSEW) and the same for ranks (MSER).
2. Mean absolute error of weights (MAEW) and the same for ranks (MAER).
3. Theil's coefficient U for weights (UW) and the same for ranks (UR).
4. Kendall's correlation tau for weights (KWC).
5. Spearman's correlation for ranks (SRC).
6. Weighted rank crossing 1 (WRC1).
7. Weighted rank crossing 2 (WRC2).
8. Top rank matched count (TOP).
9. Number of ranks matched, as % of the number of alternatives L (MATCH%).

The generated data were also altered subsequently to simulate rank reversal conditions, when a nonoptimal new alternative is introduced. This is a primary criticism of AHP and has created a long and intense controversy among researchers (Belton and Gear, 1984; Saaty, 1984; Saaty, 1990; Dyer, 1990; Harker and Vargas, 1990; Stewart, 1992). This experimentation was applied to each method's solution and initial problem, say of L alternatives, as follows: (i) a new alternative is introduced in the problem by randomly generating N ratings, one for each criterion, from the uniform distribution; (ii) the ranks of the L + 1 alternatives in the new problem are determined; (iii) if the new (L + 1)th alternative gets the first rank, it is rejected and another alternative is generated as in step (i); (iv) if the new alternative gets any other rank, the new rank order of the old alternatives is determined after removing the new alternative's rank. Thus an original array of ranks and a new array of ranks are produced for each problem and method. These two rank arrays are used in computing the rank reversal measures.
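A sketch of steps (i)-(iv) for one problem and method (ours; `score` stands for any of the scoring rules sketched earlier, and re-normalizing the augmented ratings matrix before re-scoring is our assumption, which the text leaves open):

```python
import numpy as np

rng = np.random.default_rng(7)

def to_ranks(w):
    return (-w).argsort().argsort() + 1      # rank 1 = best

def rank_reversal_arrays(R, c, score):
    """Original ranks of the L alternatives, and their ranks recomputed
    after a new nonoptimal alternative is added (steps (i)-(iv))."""
    old_ranks = to_ranks(score(R, c))
    while True:
        new_alt = rng.uniform(size=(1, R.shape[1]))   # (i) random ratings
        R_new = np.vstack([R, new_alt])
        R_new = R_new / R_new.sum(axis=0)             # re-normalize columns (assumption)
        ranks = to_ranks(score(R_new, c))             # (ii) rank the L+1 alternatives
        if ranks[-1] != 1:                            # (iii) reject a top-ranked newcomer
            break
    surviving = ranks[:-1]                            # (iv) drop the new alternative
    new_ranks = surviving.argsort().argsort() + 1     # compress back to ranks 1..L
    return old_ranks, new_ranks
```

The two returned arrays feed the rank reversal measures (TOP, MATCH%, etc.) described below.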


The reason for looking at measures of both final weights and ranks is that methods may produce different final weights for the alternatives, yet result in the same or a different rank order of the alternatives. Our last four measures capture this rank disagreement (crossings of rank order); two of them give more weight to differences in the higher ranks:

$$\mathrm{WRC} = \sum_i w_i \,\lvert R_{i,\mathrm{SAW}} - R_{i,\mathrm{METH}} \rvert,$$

where $R_{i,\mathrm{SAW}}$ and $R_{i,\mathrm{METH}}$ denote the ranks of alternative $i$ under SAW and under the compared method, and the rank weights $w_i$ place more weight on the higher (better) ranks; WRC1 and WRC2 differ in the weighting scheme used.
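A sketch of several of these similarity measures for one problem (ours, not the authors' code; the WRC and Theil's U computations are omitted, since their exact weighting is not fully recoverable from the text):

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def to_ranks(w):
    """Rank 1 = largest weight."""
    return (-w).argsort().argsort() + 1

def similarity_measures(w_saw, w_meth):
    r_saw, r_meth = to_ranks(w_saw), to_ranks(w_meth)
    return {
        "MSEW": np.mean((w_saw - w_meth) ** 2),
        "MAEW": np.mean(np.abs(w_saw - w_meth)),
        "MSER": np.mean((r_saw - r_meth) ** 2),
        "MAER": np.mean(np.abs(r_saw - r_meth)),
        "KWC": kendalltau(w_saw, w_meth).correlation,     # Kendall tau on weights
        "SRC": spearmanr(r_saw, r_meth).correlation,      # Spearman on ranks
        "TOP": float(r_saw.argmin() == r_meth.argmin()),  # same top alternative?
        "MATCH%": float(np.mean(r_saw == r_meth)),        # share of matched ranks
    }
```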


For the rank reversal experiment, similar measures were computed on each method's own before-and-after ranks: whether the top rank was retained after the introduction of a new nonoptimal alternative (TOP); and the total number of ranks not altered, as a percent of the number of alternatives (MATCH%), for that problem.

Here we would like to clarify that the efficiency of a method is not merely a function of the theory supporting it or of how rigorous it is mathematically. Other aspects are also very important: its ease of use, user understanding of and faith in the results, and method reliability (consistency) vs. validity. These are important and have been tackled by some authors (Buchanan and Daellenbach, 1987; Hobbs et al., 1992; Stewart, 1992). Such issues cannot be studied in a simulation experiment.

4. Analysis of experimental results

The simulation results were analyzed using the SAS package. Each measure of performance was analyzed via parametric ANOVA and a nonparametric test (Kruskal-Wallis). The results are summarized in Tables 1 and 3. The nonparametric tests reveal that N, L and distribution type affect all performance measures at the 95% confidence level, except distribution type for KWC, SRC, MSER, UR, and marginally for MAER, WRC1 and WRC2. According to the parametric ANOVA, the number of alternatives, the number of criteria and the method, as well as most of their interactions, significantly affect all measures of performance. However, the distribution type, and a few of its interactions, do not significantly influence four performance measures at the 95% level, namely KWC and UR (as was the case with the nonparametric tests), SRC and MSER.

Table 5 portrays the average performance measure for each method, along with Tukey's studentized range test of mean differences. Performance measures on weights are not given for ELECTRE, since it only rank-orders the alternatives. The four AHP methods produce indistinguishable results on all measures, and they were always closer to SAW than the other three methods. The only exception is the TOP result for ELECTRE, indicating that it matched the top-ranked alternative produced by SAW 90% of the time, vs. 82% for the AHPs. Any differences among the four AHP version results are affected more by the scale (original vs. geometric) than by the solution approach (eigenvector vs. mean transformation).

Table 1
Summary of ANOVA significance levels for factors and interactions

(The table reports the P-values of the factors L, V, METH and N, and of all their two-, three- and four-way interactions, for each of the eleven performance measures KWC, MATCH, WRC1, WRC2, SRC, MSER, MAER, MSEW, MAEW, UW and UR; the large majority of entries equal 0.0001.)

-: Indicates not significant result (P-value > 0.10).
L: Number of alternatives.
N: Number of criteria.
V: Type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: Method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.


Table 2
Summary of ANOVA significance levels for factors and interactions - rank reversal experiment

(The table reports the P-values of the factors L, V, METH and N, and of their interactions, for the performance measures MATCH, WRC1, WRC2, SRC, MSER and MAER; most entries equal 0.0001.)

-: Indicates not significant result (P-value > 0.10).
L: Number of alternatives.
N: Number of criteria.
V: Type of distribution = 1 equal weights; 2 uniform; 3 beta U.
METH: Method = 1 Simple Additive Weighting (SAW); 2 AHP with original scale using eigenvector; 3 AHP with geometric scale using eigenvector; 4 AHP with original scale using mean transformation; 5 AHP with geometric scale using mean transformation; 6 Multiplicative Exponential Weighting; 7 TOPSIS; 8 ELECTRE.

The latter contradicts Zahedi's (1986) study that examined single AHP matrices, possibly due to the aggregating effect of looking at criteria and alternatives together. The MAEW for each AHP version was only about 0.008, implying weights about ±0.8% away from those of SAW on the average. The most dissimilar method to SAW is ELECTRE, followed by MEW, and TOPSIS to a lesser extent. More specifically, the MEW method produces significantly different results from all AHP versions on all measures. MEW and ELECTRE behave similarly in SRC and MSER, but differ according to MAER, UR, WRC1 and WRC2. TOPSIS differs from ELECTRE and MEW on all measures, and agrees with AHP only on SRC and UR (only for the original scale). The rank-order results of all methods mostly agree with those of SAW, as indicated by their high correlations (all SRC > 0.80). In light of the prior comments, SRC gives a stronger impression

Table 3
Summary of Kruskal-Wallis nonparametric ANOVA significance levels

              SRC     MSER    MAER    UR      WRC1    WRC2    MAEW    MSEW    UW      KWC     MATCH
Alternatives  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria      0.0003  0.0006  0.0004  0.0004  0.0010  0.0005  0.0001  0.0001  0.0001  0.0002  0.0001
Distribution  0.0473  0.0151  0.0177  0.0518  0.0260  0.0464  0.0001  0.0001  0.0001  0.1021  0.0234
Method        0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001


Table 4
Summary of Kruskal-Wallis nonparametric ANOVA significance levels - rank reversal experiment

              SRC     MSER    MAER    WRC1    WRC2    MATCH
Alternatives  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Criteria      0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Distribution  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
Method        0.0001  0.0001  0.0001  0.0001  0.0001  0.0001

of similarity than actually exists. For the large sample sizes involved, SRC should be below approximately 0.04 to imply no correlation, or above 0.96 to imply perfect rank agreement, neither of which is the case here. SRC results sometimes contradicted those of the other rank performance measures. In those cases we lean towards the latter, since SRC does not consider rank importance, unlike our measures WRC1 and WRC2 (the former giving larger values than the latter by design). Comparing SRC to WRC1 or WRC2, one may observe that although TOPSIS and the four AHPs have similar SRC, the higher WRC values imply that TOPSIS differs from the AHPs more in higher-ranked than in lower-ranked alternatives. Similarly, ELECTRE differs from MEW more in higher-ranked alternatives than in lower ones. An interesting finding is that although ELECTRE matches the SAW top rank more often (90%) than the other methods, its match of all SAW ranks (MATCH%) is far smaller than that of any other method. Many graphs were also drawn to further identify parameter value impacts, mean differences and important interactions. However, space limitations prevent showing all of them.

Effect of number of alternatives (L): As the number of alternatives L increases, all methods tend to produce overall weights closer to SAW's (especially TOPSIS). This is reflected in higher correlations KWC (except for the insensitive method MEW) and SRC, higher Theil's UW (only for AHPs), and lower

Fig. 1. KWC by number of alternatives.


Table 5
Average performance measures by method and Tukey's test on differences

Methods                  SRC              WRC1             WRC2
                         Mean    Tukey    Mean    Tukey    Mean    Tukey
AHP, Original, eigen     0.8967  A        0.3621  D        0.3253  D
AHP, Geometric, eigen    0.8992  A        0.3507  D        0.3142  D
AHP, Original, MTM       0.8969  A        0.3626  D        0.3258  D
AHP, Geometric, MTM      0.8992  A        0.3500  D        0.3138  D
MEW                      0.8045  B        0.6278  B        0.5726  B
TOPSIS                   0.8921  A        0.4047  C        0.3723  C
ELECTRE                  0.8078  B        0.7267  A        0.6861  A

Methods                  KWC              MSEW             MAEW
                         Mean    Tukey    Mean    Tukey    Mean    Tukey
AHP, Original, eigen     0.8257  A        0.00017 B        0.0085  C
AHP, Geometric, eigen    0.8280  A        0.00019 B        0.0087  C
AHP, Original, MTM       0.8257  A        0.00017 B        0.0084  C
AHP, Geometric, MTM      0.8271  A        0.00019 B        0.0087  C
MEW                      0.7329  C        0.00074 A        0.0194  A
TOPSIS                   0.7764  B        0.00077 A        0.0158  B
ELECTRE                  -       -        -       -        -       -

Methods                  MSER             MAER             UW
                         Mean    Tukey    Mean    Tukey    Mean    Tukey
AHP, Original, eigen     0.4972  C        0.3590  D        0.023   C
AHP, Geometric, eigen    0.4784  C        0.3481  D        0.0236  C
AHP, Original, MTM       0.4974  C        0.3592  D        0.0232  C
AHP, Geometric, MTM      0.4779  C        0.3474  D        0.0235  C
MEW                      1.1820  A        0.6376  B        0.0565  A
TOPSIS                   0.6747  B        0.4093  C        0.0416  B
ELECTRE                  1.2132  A        0.7250  A        -       -

Methods                  UR               TOP              MATCH
                         Mean    Tukey    Mean    Tukey    Mean    Tukey
AHP, Original, eigen     0.0663  CD       0.8215  B        0.6910  A
AHP, Geometric, eigen    0.0647  D        0.8246  B        0.6966  A
AHP, Original, MTM       0.0663  CD       0.8206  B        0.6908  A
AHP, Geometric, MTM      0.0646  D        0.8254  B        0.6950  A
MEW                      0.1055  B        0.7548  C        0.5671  C
TOPSIS                   0.0690  C        0.7549  C        0.6343  B
ELECTRE                  0.1168  A        0.9035  A        0.3537  D

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

MSEW and MAEW. However, when the number of alternatives is large, rank discrepancies are amplified (to a lesser extent for TOPSIS), as evidenced by higher rank performance measures MAER, MSER, WRC1, WRC2 and, to some extent, UR. In contrast to the clear rank results of MATCH%, WRC1 and WRC2, SRC produces mixed results as L increases; this further demonstrates its inability to account for different rank importance. ELECTRE matched the SAW top (all) ranked alternatives more (less) often than


Fig. 2. MAEW by number of alternatives.

any other method, resulting in larger WRCs, regardless of the number of alternatives. The change in L affects each AHP version in the same way. See Figs. 1-6.

Effect of number of criteria (N): Most performance measures (MAER, MSER, SRC, KWC, UR, WRC1, WRC2) for most methods changed slightly with N, but significantly according to ANOVA. This

Fig. 3. MAER by number of alternatives.


Fig. 4. TOP by number of alternatives.

is because MEW and the four AHPs are hardly sensitive to changes in N (no change in KWC or in any rank performance measure). As the number of criteria N increases, the methods (especially ELECTRE, but not TOPSIS) tend to produce rankings of the alternatives different from those of SAW, as documented by higher MAER, MSER, UR, WRC1, WRC2 and lower SRC; and, to some extent, different

Fig. 5. MATCH by number of alternatives.


Fig. 6. WRC1 by number of alternatives.

weights of alternatives, as implied by a somewhat smaller KWC. However, differences in the final weights for alternatives were larger in problems with fewer criteria, as shown by increased MAEW, MSEW, UW and lower KWC. TOPSIS behaved differently from the other methods, more so in its final rankings than in its final weights. TOPSIS rankings differ from those of SAW and the AHPs when N is large (= 20) and, to a lesser extent, when N is small (= 5), where it behaved more like ELECTRE

Fig. 7. MAEW by number of criteria.


Fig. 8. MAER by number of criteria.

and MEW. This is evident from its increased MAER, MSER, UR, WRC1, WRC2 and reduced TOP, MATCH% and SRC. Again, ELECTRE matched the SAW top (all) ranked alternatives more (less) often than any other method, resulting in larger WRCs, regardless of the number of criteria. The change in N affects each AHP version in the same way. See Figs. 7-11.

Effect of distribution of criteria weights (V): It does not significantly affect several weight measures

Fig. 9. TOP by number of criteria.


Fig. 10. MATCH by number of criteria.

(UW, MAEW, MSEW - except for TOPSIS), while the effect is mixed according to the rank measures. As expected, equal criteria weights (V = 1) reduce alternative weight differences between methods. Surprisingly, however, final weight dissimilarities between methods were higher under the uniform than under the beta

Fig. 11. WRC1 by number of criteria.


Fig. 12. MAEW by criterion weight distribution.

distribution. In the case of AHP, the uniform distribution differentiates its final rankings and weights from SAW slightly more when using the original scale rather than the geometric scale. TOPSIS final rankings differ from those of SAW most (least) under the beta (equal constant) distribution. The ELECTRE and MEW methods differentiate their final rankings most (least) under the equal constant (uniform) distribution. See Figs. 12-15.

4.1. Rank reversal results

Similar analyses were performed on the rank reversal experimental results. Here each method's results

Fig. 13. MAER by criterion weight distribution.


Fig. 14. TOP by criterion weight distribution.

were compared to its own (not SAW's), before and after the introduction of a new (not best) alternative. The major findings are summarized in Tables 2, 4 and 6. The parametric and non-parametric ANOVAs reveal that all factors (number of alternatives, number of criteria, distribution and method), and most of their interactions, are highly significant (Tables 2 and 4).

    Fig. 15. WRCl by criterion weight distribution.


Table 6
Average performance measures by method and Tukey's test on differences - rank reversal experiment

Methods                  SRC              WRC1             WRC2
                         Mean    Tukey    Mean    Tukey    Mean    Tukey
SAW                      1.0     A        0       D        0       D
AHP, Original, eigen     0.9530  C        0.1532  B        0.1361  B
AHP, Geometric, eigen    0.9499  C        0.1595  B        0.1421  B
AHP, Original, MTM       0.9560  C        0.1520  B        0.1351  B
AHP, Geometric, MTM      0.9511  C        0.1610  B        0.1446  B
MEW                      1.0     A        0       D        0       D
TOPSIS                   0.9692  B        0.1116  C        0.097   C
ELECTRE                  0.9356  D        0.2138  A        0.1996  A

Methods                  MSER     MAER     TOP      MATCH
                         Mean     Mean     Mean     Mean
SAW                      0        0        1.0      1.0
AHP, Original, eigen     0.1752   0.1522   0.9258   0.8584
AHP, Geometric, eigen    0.1854   0.1581   0.9235   0.8544
AHP, Original, MTM       0.1740   0.1515   0.9258   0.8590
AHP, Geometric, MTM      0.1820   0.1568   0.9165   0.8551
MEW                      0        0        1.0      1.0
TOPSIS                   0.1379   0.1104   0.9531   0.9005
ELECTRE                  0.3479   0.2347   0.4402   0.7501

Note: The same letter (A, B, C, D) indicates no significant average difference between methods, based on Tukey's test. Letter order A to D is from largest to smallest average value.

Fig. 16. Rank reversal MAER by number of alternatives.


Fig. 17. Rank reversal MATCH by number of alternatives.

As summarized in Table 6, the MEW and SAW methods did not produce any rank reversals, which was expected. The next best method was TOPSIS, followed by the four AHPs, according to all rank reversal performance measures (larger TOP, MATCH%, SRC, and smaller MSER, MAER, WRC1 and WRC2). The rank reversal performance of each AHP version was statistically not different

Fig. 18. Rank reversal MAER by number of criteria.


from that of the other three AHPs. ELECTRE exhibited the worst rank reversal performance of all the methods in this experiment, and more so in TOP than in all ranks (MATCH%). The last finding should be interpreted with caution, since it does not reflect ELECTRE's versatile capabilities when used directly by a human; it is only indicative of its restrictive ability to discriminate among several alternatives, based on prespecified threshold parameters.

Effect of number of alternatives (L) on rank reversal: In general, more rank reversals occur in problems with more alternatives. This is evident from lower MATCH% and higher MAER, WRC1 and WRC2 among the AHPs. That increase was a little faster for the AHP with original scale and MTM solution. The MTM AHP has a slight advantage over the eigenvector AHP when there are not many alternatives. Reversals of the top rank occur more often in problems with more alternatives for the AHPs, but with fewer alternatives for ELECTRE. TOPSIS top rank reversals seem to be insensitive to L. See Figs. 16 and 17.

Effect of number of criteria (N) on rank reversal: The number of rank reversals was influenced less by the number of criteria than by the number of alternatives. For all AHP versions, rank reversals for the top (all) ranks remained at about 9% (14%) of L, regardless of the number of criteria. However, the geometric scale in AHP seems to reduce rank reversals when the number of criteria is small, as documented by smaller MAER and higher MATCH%. According to the SRC criterion, rank reversals for TOPSIS and the AHPs with original scale are not sensitive to N. Interestingly enough, TOPSIS exhibits its worst rank reversals when N is small, while ELECTRE does the same when N is large. See Fig. 18.

Effect of distribution of criteria weights (V) on rank reversal: In general, more rank reversals were observed under constant weights, and fewer under uniformly distributed weights. This was negligible for TOPSIS, but most pronounced for ELECTRE. See Fig. 19.

5. Conclusion and recommendations

This simulation experiment evaluated eight MADM methods (including four variants of AHP) under different numbers of alternatives (L), criteria (N) and weight distributions. The final results are affected by these three factors in that order. In general, as the number of alternatives increases, the methods tend to produce similar final weights, but dissimilar rankings, and more rank reversals (fewer top rank reversals for ELECTRE). The number of criteria had little effect on the AHPs, MEW and ELECTRE. TOPSIS rankings differ from those of SAW more when N is

Fig. 19. Rank reversal MAER by criterion weight distribution.


large, when it also exhibits its fewest rank reversals. ELECTRE produces more rank reversals in problems with many criteria.

The distribution of criteria weights affects fewer performance measures than does the number of alternatives or the number of criteria. However, it affects the methods examined differently. Equal criterion weights reduce final weight differences between methods, differentiate further the rankings produced by ELECTRE and MEW, and produce more rank reversals than the other distributions. Surprisingly, however, final weight dissimilarities between methods were higher under the uniform than under the beta distribution, while the latter produced the fewest rank reversals. A uniform distribution of criteria weights differentiates the AHP final rankings from SAW more when using the original scale rather than the geometric scale. Finally, a beta distribution of criterion weights affects TOPSIS most, whose final rankings then differ even more from those of SAW.

In general, all AHP versions behave similarly and closer to SAW than the other methods. ELECTRE is the least similar to SAW (except for best matching the top-ranked alternative), followed by the MEW method. TOPSIS behaves closer to AHP and differently from ELECTRE and MEW, except for problems with few criteria. In terms of rank reversals, the four AHP versions were uniformly worse than TOPSIS, but more robust than ELECTRE.

    lated beyond the type of MA DM problem considered

    in this study; namely a decision matrix input of N

    criteria weigh ts and explicit ratings of L alternatives

    on each criterion. Theref ore, metho d variations capa-

    ble of handling different problems were not consid-

    ered in this simulation. This standardization ham-

    pers ELECTRE more than any of the other methods.

    It unavoidably did not consider the variety of fea-

    tures of the many versions of this method developed

    to handle different problem types. It did not take

    advantage of the metho ds capabilities in handling

    problems with ordinal or imprecise information. Even

    in the form used here, ELECT RE may produce

    different results for different thresho lds of concor-

    dance and discordance indexes (wh ich of course

    leaves op en the question on which index sho uld th e

    user select). Finally, any MA DM metho d cannot be

Finally, no MADM method can be considered a tool for discovering an 'objective truth'. Such models should function within a DSS context, helping the user learn more about the problem and its solutions in order to reach the ultimate decision. Such insight-gaining methods are better termed decision aids than decision-making methods, and they should not be applied as single-pass techniques without a posteriori robustness analysis. A sensitivity (robustness) analysis is essential for any MADM method, but it is clearly beyond the scope of this simulation experiment.
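Although such an analysis is outside the present experiment, a basic a posteriori check is easy to script. The hypothetical sketch below perturbs the criterion weights repeatedly and reports how often the top-ranked alternative (scored with SAW purely for illustration) survives; the perturbation scheme and function name are our own assumptions.

    import numpy as np

    def top_choice_stability(ratings, weights, trials=1000, noise=0.10, seed=1):
        # Fraction of random weight perturbations that preserve the SAW winner.
        rng = np.random.default_rng(seed)
        norm = ratings / ratings.max(axis=0)       # column-maximum normalization
        base_top = np.argmax(norm @ weights)
        hits = 0
        for _ in range(trials):
            w = weights * rng.uniform(1 - noise, 1 + noise, size=weights.size)
            w /= w.sum()                           # renormalize to sum to 1
            hits += int(np.argmax(norm @ w) == base_top)
        return hits / trials

A stability value close to 1 suggests the recommendation is robust to imprecision in the elicited weights; low values signal that the choice itself, not merely the ranking, is fragile.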

The detailed findings of this simulation study can provide useful insights to researchers and practitioners of MADM. A user's interest in evaluating alternatives may lie in one or more of the final outputs, namely the alternatives' weights, rankings or rank reversals. This experiment reveals when a user's results are likely to be practically the same regardless of the subset of methods employed, and when, and by how much, the solutions may differ, thus guiding the user in selecting an appropriate method. SAW was selected as the basis against which to compare the other methods because its simplicity makes it the method most often used by practitioners. Some researchers even argue that SAW should be the standard for comparisons because it gives 'the most acceptable results for the majority of single-dimensional problems' (Triantaphyllou and Mann, 1989).
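For readers unfamiliar with the baseline, the short sketch below scores one small decision matrix with SAW and, for contrast, with MEW; the illustrative numbers and the column-maximum normalization are assumptions of the sketch, not data from the experiment.

    import numpy as np

    def saw(ratings, weights):
        return (ratings / ratings.max(axis=0)) @ weights          # weighted sum

    def mew(ratings, weights):
        norm = ratings / ratings.max(axis=0)
        return np.prod(norm ** weights, axis=1)                   # weighted product

    ratings = np.array([[7.0, 9.0, 6.0],        # rows: alternatives; columns: criteria
                        [8.0, 6.0, 7.0],
                        [9.0, 5.0, 8.0]])
    weights = np.array([0.5, 0.3, 0.2])
    print("SAW order:", np.argsort(-saw(ratings, weights)))       # best first
    print("MEW order:", np.argsort(-mew(ratings, weights)))       # may or may not agree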

    References

    Belton, V., 1986. A comparison of the analytic hierarchy process

    and a simple multi-attribute value function. European Journal

of Operational Research 26, 7-21.

    Belton, V., Gear, T., 1984. The legitimacy of rank reversal - A

    comment. Omega 13, 143-144.

    Buchanan, J.T., Daellenbach, H.G., 1987. A comparative evalua-

    tion of interactive solution methods for multiple objective

    decision models. European Journal of Operational Research

    29, 353-359.

Churchman, C.W., Ackoff, R.L., Arnoff, E.L., 1957. Introduction

    to Operations Research. Wiley, New York.

Currim, I.S., Sarin, R.K., 1984. A comparative evaluation of

    multiattribute consumer preference models. Management Sci-

    ence 30, 543-561.

Despontin, M., Moscarola, J., Spronk, J., 1983. A user oriented listing of MCDM. Revue Belge de Recherche Operationelle 23, 3-11.

Dyer, J., 1990. Remarks on the analytic hierarchy process. Management Science 36, 249-258.


Dyer, J., Fishburn, P., Steuer, R., Wallenius, J., Zionts, S., 1992.

    Multiple criteria decision making, multiattribute utility theory:

    The next ten years. Management Science 38, 645-654.

    Gemunden, H.G., Hauschildt, J., 1985. Number of alternatives

    and efficiency in different types of top-management decisions.

European Journal of Operational Research 22, 178-190.

    Gershon, M.E., Duckstein, L., 1983. Multiobjective approaches to

    river basin planning. Journal of Water Resource Planning 109,

    13-28.

    Goicoechea, A., Stakhiv, E.Z., Li, F., 1992. Experimental evalua-

    tion of multiple criteria decision making models for applica-

    tion to water resources planning. Water Resources Bulletin 28,

89-102.

Gomes, L.F.A.M., 1989. Comparing two methods for multicriteria

    ranking of urban transportation system alternatives. Journal of

    Advanced Transportation 23, 217-219.

Harker, P.T., Vargas, L.G., 1990. Reply to 'Remarks on the analytic hierarchy process' by J.S. Dyer. Management Science 36, 269-273.

Hobbs, B.F., 1986. What can we learn from experiments in multiobjective decision analysis? IEEE Transactions on Systems, Man, and Cybernetics 16, 384-394.

Hobbs, B.F., Chankong, V., Hamadeh, W., Stakhiv, E., 1992. Does choice of multicriteria method matter? An experiment in water resource planning. Water Resources Research 28, 1767-1779.

Hwang, C.L., Yoon, K.L., 1981. Multiple Attribute Decision Mak-

    ing: Methods and Applications. Springer-Verlag, New York.

Jelassi, M.T.J., Ozernoy, V.M., 1988. A framework for building an expert system for MCDM models selection. In: Lockett, A.G., Islei, G. (Eds.), Improving Decision Making in Organizations. Springer-Verlag, New York, pp. 553-562.

    Karni, R., Sanchez, P., Tummala, V., 1990. A comparative study

    of multiattribute decision making methodologies. Theory and

    Decision 29, 203-222.

    Kok, M., 1986. The interface with decision makers and some

    experimental results in interactive multiple objective program-

    ming methods. European Journal of Operational Research 26,

96-107.

    Kok, M., Lootsma, F.A., 1985. Pairwise-comparison methods in

    multiple objective programming, with applications in a long-

    term energy-planning model. European Journal of Operational

    Research 22, 44-55.

    Lockett, G., Stratford, M., 1987. Ranking of research projects:

    Experiments with two methods. Omega 15, 395-400.

Legrady, K., Lootsma, F.A., Meisner, J., Schellemans, F., 1984. Multicriteria decision analysis to aid budget allocation. In: Grauer, M., Wierzbicki, A.P. (Eds.), Interactive Decision Analysis. Springer-Verlag, pp. 164-174.

    Lootsma, F.A., 1990. The French and American school in multi-

    criteria decision analysis. Recherche Operationelle 24, 263-

    285.

    MacCrimmon, K.R., 1973. An overview of multiple objective

decision making. In: Cochrane, J.L., Zeleny, M. (Eds.), Multi-

    ple Criteria Decision Making. University of South Carolina

    Press, Columbia.

    Olson, D.L., Moshkovich, H.M., Schellenberger, R., Mechitov,

A.I., 1995. Consistency and accuracy in decision aids: Experi-

    ments with four multiattribute systems. Decision Sciences 26,

    723-748.

Ozernoy, V.M., 1987. A framework for choosing the most appropriate discrete alternative MCDM in decision support and expert systems. In: Sawaragi, Y., et al. (Eds.), Toward Interactive and Intelligent Decision Support Systems. Springer-Verlag, Heidelberg, pp. 56-64.

Ozernoy, V.M., 1992. Choosing the best multiple criteria decision-making method. INFOR 30, 159-171.

Pomerol, J., 1993. Multicriteria DSS: State of the art and problems. Central European Journal for Operations Research and Economics 2, 197-212.

    Roy, B., Bouyssou, D., 1986. Comparison of two decision-aid

    models applied to a nuclear power plant siting example.

    European Journal of Operational Research 25, 200-215.

Saaty, T.L., 1984. The legitimacy of rank reversal. Omega 12,

    513-516.

Saaty, T.L., 1990. An exposition of the AHP in reply to the paper 'Remarks on the analytic hierarchy process'. Management Science 36, 259-268.

Schoemaker, P.J., Waid, C.C., 1982. An experimental comparison of different approaches to determining weights in additive utility models. Management Science 28, 182-196.

    Stewart, T.J., 1992. A critical survey on the status of multiple

criteria decision making theory and practice. Omega 20,

    569-586.

    Stillwell, W., Winterfeldt, D., John, R., 1987. Comparing hierar-

    chical and nonhierarchical weighting methods for eliciting

    multiattribute value models. Management Science 33, 442-

    450.

    Takeda, E., Cogger, K.O., Yu, P.L., 1987. Estimating criterion

    weights using eigenvectors: A comparative study. European

    Journal of Operational Research 29, 360-369.

Timmermans, D., Vlek, C., Hendrickx, L., 1989. An experimental study of the effectiveness of computer-programmed decision support. In: Lockett, A.G., Islei, G. (Eds.), Improving Deci-

    sion Making in Organizations. Springer-Verlag, Heidelberg,

    pp. 13-23.

    Triantaphyllou, E., Mann, S.H., 1989. An examination of the

    effectiveness of multi-dimensional decision-making methods:

    A decision-making paradox. Decision Support Systems 5,

    303-312.

    Voogd, H., 1983. Multicriteria Evaluation for Urban and Regional

    Planning. Pion, London.

    Zahedi, F., 1986. A simulation study of estimation methods in the

    analytic hierarchy process. Socio-Economic Planning Sciences

    20, 347-354.

    Zanakis, S., Mandakovic, T., Gupta, S., Sahay, S., Hong, S.,

    1995. A review of program evaluation and fund allocation

    methods within the service and government sectors. Socio-

    Economic Planning Sciences 29, 59-79.