
Research Article

Decision-Making Support for the Evaluation of Clustering Algorithms Based on MCDM

Wenshuai Wu,1 Zeshui Xu,1 Gang Kou,2 and Yong Shi3,4

1Business School, Sichuan University, Chengdu, Sichuan 610065, China
2School of Business Administration, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China
3CAS Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
4Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China

Correspondence should be addressed to Zeshui Xu; xuzeshui@263.net

Received 28 October 2019; Revised 18 February 2020; Accepted 5 March 2020; Published 5 May 2020

Academic Editor: Dimitri Volchenkov

Copyright © 2020 Wenshuai Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In many disciplines, the evaluation of algorithms for processing massive data is a challenging research issue. However, different algorithms can produce different or even conflicting evaluation performance, and this phenomenon has not been fully investigated. The motivation of this paper is to propose a solution scheme for the evaluation of clustering algorithms that reconciles different or even conflicting evaluation performance. The goal of this research is to propose and develop a model, called decision-making support for the evaluation of clustering algorithms (DMSECA), which evaluates clustering algorithms by merging expert wisdom in order to reconcile differences in their evaluation performance for information fusion during a complex decision-making process. The proposed model is tested and verified by an experimental study using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. The proposed model can generate a list of algorithm priorities to produce an optimal ranking scheme, which can satisfy the decision preferences of all the participants. The results indicate that our model is an effective tool for selecting the most appropriate clustering algorithms for given data sets. Furthermore, it can reconcile different or even conflicting evaluation performance to reach a group agreement in a complex decision-making environment.

1. Introduction

Clustering is widely applied in the initial stage of big data analysis to divide large data sets into smaller sections, so that the data can be comprehended and mastered easily in successive analytic operations [1–3]. The processing of massive data relies on the selection of an appropriate clustering algorithm, and the evaluation of clustering algorithms remains an active and significant issue in many subjects, such as fuzzy sets, genomics, data mining, computer science, machine learning, business intelligence, and financial analysis [1, 4–6]. Computer scientists, economists, political scientists, bioinformatics specialists, sociologists, and many other groups routinely debate the potential costs and benefits of analyzing these data to support decision-making [7]. However, the decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of the systems involved [8–10].

Clustering algorithms, which are unsupervised pattern-learning algorithms requiring no a priori information, partition the original data space into smaller sections with high intergroup dissimilarities and intragroup similarities. Clustering can be used to process various types of massive data to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms sometimes produce different data partitions. In some situations, different algorithms produce different or even conflicting results. Therefore, the evaluation of clustering algorithms remains a significant task and a challenging problem.

Several validity measures for assessing clustering algorithms have been presented successively, such as the Xie–Beni (XB)

Hindawi Complexity, Volume 2020, Article ID 9602526, 17 pages. https://doi.org/10.1155/2020/9602526

index [12], the I-index [13], the CS index [14, 15], Dunn's index [16, 17], and the Davies–Bouldin (DB) index [18, 19]. These validity measures are often divided into the three categories of external, relative, and internal measures [20–22]. External measures compare the partitions produced by clustering algorithms with a given data partition [20, 22]. Relative measures compare partitions produced by the same clustering algorithm with discrepant data subsets or diverse parameters [22]. Internal measures depend on computing properties of the resulting clusters [22]. Brun et al. [20] stated that relative and internal measures fail to predict and locate the errors produced by clustering algorithms, and that external measures for evaluating clustering results perform more effectively. Therefore, in our empirical research, we select external measures to evaluate and measure the performance of clustering algorithms.

The No Free Lunch (NFL) theorem states that no single model or algorithm can achieve the best performance for every domain problem [23–25]. It suggests that the evaluation of clustering algorithms is complicated and challenging. Moreover, different clustering algorithms may produce different or conflicting partitions. This paper therefore focuses on the evaluation of clustering algorithms with the aim of reconciling different or even conflicting evaluation performance. The reconciliation of these differences or conflicts is an important problem that has not been fully investigated. In addition, the evaluation of clustering algorithms usually involves multiple criteria, so it can be modeled as an MCDM problem. Based on MCDM, this paper proposes a model, called decision-making support for the evaluation of clustering algorithms (DMSECA), to evaluate and measure the performance of clustering algorithms and, further, to reconcile the differences or even conflicts among their evaluation performance during a complex decision-making process.

The proposed model consists of three steps. First, we apply the six most influential clustering algorithms to task modeling on 20 UCI data sets with a total of 18,310 instances and 313 attributes. Second, based on nine external measures, we employ four commonly used MCDM approaches to rank the performance of the clustering algorithms over the 20 UCI data sets; each MCDM method is randomly assigned to five UCI data sets. Third, based on the eighty-twenty rule, we propose a decision-making support model that generates a list of algorithm priorities to identify the best clustering algorithm across the 20 UCI data sets for secondary mining and knowledge discovery.

The contributions of this article are threefold. First, our proposed DMSECA model can identify the best clustering algorithms for given data sets via a generated list of algorithm priorities during a complex decision-making process. Second, the proposed model can reconcile differences or even conflicts to achieve agreement in clustering algorithm evaluation. Third, based on the eighty-twenty rule, expert wisdom is merged into a decision-making support model that carries out secondary knowledge discovery for information fusion in a complex decision-making environment.

The rest of this article is organized as follows. Section 2 reviews the related work. Section 3 describes some preliminaries, such as clustering algorithms, MCDM methods, and external measures. Section 4 proposes our model, which merges expert wisdom to reconcile disagreements among the clustering algorithms. Section 5 presents the data sets, provides the experimental design, shows the empirical results, and discusses the significance of this work. Section 6 summarizes this article.

2. Related Work

Cluster analysis aims to classify elements into categories on the basis of their similarity [26]. In recent years, many clustering algorithms have been proposed [26–29]. Density peak clustering was published by Rodriguez and Laio in Science [26]. In view of the low objectivity and accuracy caused by man-made factors, a density fragment clustering method without peaks was proposed based on density peak clustering [30]. Jiang et al. [28] developed the GDPC algorithm with an alternative decision graph, based on gravitation theory and nearby distance, to identify centroids and anomalies accurately. To overcome the defects of the original DPC in detecting anomalies and hub nodes, Jiang et al. [29] proposed an improved recognition method based on the halo node for the density peak clustering algorithm (halo DPC). The proposed halo DPC improves the ability to deal with varying densities, irregular shapes, the number of clusters, and outlier and hub node detection [29].

Clustering ensembles have become increasingly popular in recent years; they consolidate several base clustering methods into a probably better and more robust one. Alizadeh et al. [31] presented a novel optimization-based method for the combination of cluster ensembles. Parvin and Minaei-Bidgoli [32] proposed a weighted locally adaptive clustering (WLAC) algorithm, which is based on the LAC algorithm. Considering that some features carry more information than others in a data set, Parvin and Minaei-Bidgoli [27] proposed a fuzzy weighted locally adaptive clustering (FWLAC) algorithm, which is capable of handling imbalanced clustering. Abbasi et al. [33] proposed a criterion to assess the association between a cluster and a partition, called the edited normalized mutual information (ENMI) criterion. Mojarad et al. [34] presented a clustering ensemble method named RCEIFBC with a new aggregation function that takes into account two similarity criteria: (a) cluster-cluster similarity and (b) object-cluster similarity. Mojarad et al. [35] proposed an ensemble aggregator, or consensus function, called the robust clustering ensemble based on sampling and cluster clustering (RCESCC) algorithm, in order to obtain better clustering results. Rashidi et al. [36] proposed a new clustering ensemble approach that uses a weighting strategy to perform consensus clustering by exploiting the cluster uncertainty concept. Bagherinia et al. [37] proposed a novel fuzzy clustering ensemble framework based on a new fuzzy diversity measure and a fuzzy quality measure to find the base clusterings with the best performance. In a clustering ensemble, multiple clustering outputs can be combined to


produce better results, in terms of consistency, robustness, and performance, than the basic individual clustering methods.

The evaluation of clustering algorithms is an active issue in fields such as machine learning, data mining, artificial intelligence, databases, and pattern recognition [11]. In a typical clustering scenario, three fundamental questions must be addressed: (i) identifying an effective clustering algorithm suitable for a given data set, (ii) determining how many clusters are present in the data, and (iii) evaluating the clustering [38]. This article focuses on the first question.

Several validity measures have been proposed to evaluate clustering algorithms. Yeung et al. [39] pointed out that the figure of merit (FOM) is used on microarray data, where different biological groups represent the clusters. Halkidi et al. [40] presented the Rand statistic to measure the proportion of pairs of vectors. Roth et al. [41] presented a stability measure to evaluate partitioning validity and to choose the number of clusters. Chou et al. [14] presented the CS cluster relative measure to assess clusters with different sizes and densities. Zalik [42] presented the CO cluster-validity measure, based on compactness and overlapping measures, to estimate the quality of partitions. Chou et al. [43] presented an area measure to evaluate the initial cluster number based on information about cluster areas. Wani and Riyaz [44] presented a new compactness measure that uses a novel penalty function to describe the typical behavior of a cluster. Azhagiri and Rajesh [45] proposed a novel approach to measure cluster quality and detect intrusions using an intrusion unearthing and probability clomp algorithm.

The validity measures are often divided into the types of internal, relative, and external measures [20, 21, 24, 25, 46]. Internal measures are based on computing properties of the resulting clusters; these measures do not require additional information about the data [20, 25, 47]. Relative measures are based on comparing partitions produced by the same clustering algorithm with different data subsets or different parameters; they do not demand additional information either [20, 25, 39]. External measures compare the partitions produced by clustering algorithms with a given data partition [20, 25, 48]. These correspond to a kind of error measurement, so they can be expected to correlate better with the true error [20]. The results of Brun et al. [20] indicate that external measures for evaluating clustering results are more accurate than internal or relative measures. Thus, external measures are selected to assess the performance of clustering algorithms.

In addition, the evaluation of clustering algorithms involves more than one criterion; thus, it can be solved by MCDM methods. This differs from previous approaches. For example, Dudoit and Fridlyand [49] proposed a prediction-based resampling method to evaluate the number of clusters, and Sugar and James [50] chose the number of clusters by an information-theoretical approach. Peng et al. [51] developed an MCDM-based method to select the number of clusters. Peng et al. [52] also developed a framework to select the appropriate clustering algorithm and to further choose the number of clusters. Meyer and Olteanu [53] indicated that clustering in the field of multicriteria decision aid (MCDA)

has seen a few adaptations of methods from data analysis, most of them, however, using concepts native to that field, such as the notions of similarity and distance measures. Besides, Chen et al. [54] pointed out that the clustering problem is one of the well-known MCDA problems and that the existing versions of the K-means clustering algorithm are only used for partitioning the data into several clusters that have no priority relations; therefore, Chen et al. [54] proposed a complete ordered clustering algorithm, called the ordered K-means clustering algorithm, which considers the preference degree between any two alternatives. Mahdiraji et al. [55] presented a marketing strategy evaluation based on big data analysis using a clustering-MCDM approach. This paper takes a new perspective by proposing a DMSECA model based on the MCDM method, merging expert wisdom by using the eighty-twenty rule, to select the best clustering algorithms for given data sets during a complex decision process. Furthermore, our proposed DMSECA model can reconcile different or even conflicting evaluation performance to reach a group agreement for information fusion in a complex decision-making environment.

The eighty-twenty rule was proposed by Pareto [56], who researched the wealth distribution in different countries. The rule is based on the observation that, in most countries, about 80% of the wealth is controlled by about 20% of the people, which Pareto called a "predictable imbalance" [57]. The eighty-twenty rule has been extended to many fields, such as sociology and quality control [58]. In this work, the eighty-twenty rule is used to focus the analysis on the most important positions of the rankings in relation to the number of observations, exploiting this predictable imbalance: the truth is often in a few hands, and the views of about 20% of the participants can represent the more satisfactory rankings in the opinion of all participants.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of the systems involved [8–10]. In this paper, the proposed DMSECA model, based on MCDM methods and the eighty-twenty rule, presents a new perspective by merging expert wisdom to evaluate the most appropriate clustering algorithm for given data sets, and it can reconcile individual differences or conflicts to achieve group agreement among clustering algorithm evaluations in a complex decision-making environment.

3. Preliminaries

This section presents some elementary and preparatory knowledge. It first introduces several clustering algorithms in Section 3.1; then, the classic MCDM methods are presented in Section 3.2; finally, the performance measures of clustering algorithms are described in Section 3.3.

3.1. Clustering Algorithms. Clustering is a popular unsupervised learning technique. It aims to divide large data sets into smaller sections so that objects in the same cluster are only slightly distinct, whereas objects in different clusters are only slightly similar [21]. Clustering algorithms, based on similarity


criteria, can group patterns, where groups are sets of similar patterns [54, 59, 60]. Clustering algorithms are widely applied in many research fields, such as genomics, image segmentation, document retrieval, sociology, bioinformatics, psychology, business intelligence, and financial analysis [61–64].

Clustering algorithms usually fall into four classes: partitioning methods, hierarchical methods, density-based methods, and model-based methods [65]. Several classic clustering algorithms have been proposed and reported, such as the K-means algorithm [66], the k-medoid algorithm [67], expectation maximization (EM) [68], and frequent pattern-based clustering [65]. In this paper, the six most influential clustering algorithms are selected for the empirical study. These are the KM algorithm, the EM algorithm, filtered clustering (FC), the farthest-first (FF) algorithm, make-density-based clustering (MD), and hierarchical clustering (HC). These clustering algorithms can be implemented in WEKA [69].

The KM algorithm, a partitioning method, takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high and the intercluster similarity is low. Cluster similarity is measured by the mean value of the objects in a cluster, which can be viewed as the centroid or center of gravity of the cluster [65].
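The assign-then-recompute loop of the KM idea can be sketched in a few lines of plain Python (an illustrative sketch only, not WEKA's implementation; the function name, the 2-D point representation, and the fixed iteration budget are our own choices):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points: assign each point to its nearest
    centroid, then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # initial centers: random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:                          # keep old centroid if a cluster empties
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, clusters
```

On two well-separated blobs this converges to one cluster per blob within a handful of iterations.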

The EM algorithm, which can be considered an extension of the KM algorithm, is an iterative method for finding the maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables [70]. The KM algorithm assigns each object to a single cluster. In the EM algorithm, by contrast, each object is assigned to each cluster according to a weight representing its probability of membership; in other words, there are no strict boundaries between the clusters. Thus, new means can be computed based on the weighted measures [68].

The FC applied in this work can be implemented in WEKA [69]. Like the clusterer, the structure of the filter is based exclusively on the training data, and test instances are processed by the filter without changing their structure.

The FF algorithm is a fast, greedy, and simple approximation algorithm for the k-center problem [67]. A first point is selected as a cluster center, and the second center is greedily selected as the point farthest from the first. Each remaining center is determined by greedily selecting the point farthest from the set of already chosen centers, and the remaining points are then added to the cluster whose center is closest [66, 71].
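The greedy selection just described can be sketched as follows (our own minimal Python illustration of farthest-first traversal; the function names and the choice of the first point as the starting center are assumptions):

```python
def farthest_first_centers(points, k):
    """Greedy approximation for the k-center problem: start from an
    arbitrary point, then repeatedly add the point farthest from the
    set of centers chosen so far."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [points[0]]
    while len(centers) < k:
        # the next center is the point whose nearest chosen center is farthest
        farthest = max(points, key=lambda p: min(d2(p, c) for c in centers))
        centers.append(farthest)
    return centers

def assign(points, centers):
    """Attach every point to its closest center (the final clustering step)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda i: d2(p, centers[i]))
            for p in points]
```

This greedy traversal is known to be a 2-approximation for k-center, which is why FF is attractive as a fast baseline.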

The MD algorithm is a density-based method. The general idea is to continue growing a given cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain a minimum number of points [65]. The HC algorithm is a method of cluster analysis that seeks to build a hierarchy of clusters, which can create a hierarchical decomposition of the given data sets [66, 72].

3.2. MCDM Methods. The MCDM methods, which were developed in the 1970s, are a complete set of decision analysis technologies that have evolved into an important research field of operations research [73, 74]. The International Society on MCDM defines MCDM as the study of methods and procedures concerning multiple conflicting criteria that can be formally incorporated into the management planning process [73]. In an MCDM problem, the evaluation criteria are assumed to be independent [75, 76]. MCDM methods aim to assist decision-makers (DMs) in identifying an optimal solution from a number of alternatives by synthesizing objective measurements and value judgments [77, 78]. In this section, four classic MCDM methods, the weighted sum method (WSM), grey relational analysis (GRA), TOPSIS, and PROMETHEE II, are introduced.

3.2.1. WSM. WSM [79] is a well-known MCDM method for evaluating finite alternatives in terms of finite decision criteria when all the data are expressed in the same unit [80, 81]. The benefit-to-cost-ratio and benefit-minus-cost approaches [82] can be applied to problems involving both benefit and cost criteria. In this paper, the cost criteria are first transformed into benefit criteria. Besides, there is a nominal-the-better (NB) case: the closer the value is to the objective value, the better.

The calculation steps of WSM are as follows. First, assume n criteria, including benefit criteria and cost criteria, and m alternatives. The cost criteria are first converted to benefit criteria in the following standardization process:

(1) The larger-the-better (LB): a larger objective value is better, that is, the benefit criteria, and it can be standardized as
\[ x_{ij}' = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \tag{1} \]

(2) The smaller-the-better (SB): a smaller objective value is better, that is, the cost criteria, and it can be standardized as
\[ x_{ij}' = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \tag{2} \]

(3) The nominal-the-better (NB): a value closer to the objective value \(x_{ob}\) is better, and it can be standardized as
\[ x_{ij}' = 1 - \frac{\left| x_{ij} - x_{ob} \right|}{\max\left\{ \max_i x_{ij} - x_{ob},\; x_{ob} - \min_i x_{ij} \right\}}. \tag{3} \]

Finally, the total benefit of each alternative can be calculated as
\[ A_i = \sum_{j=1}^{n} w_j x_{ij}', \quad 1 \le i \le m,\; 1 \le j \le n. \tag{4} \]

A larger WSM value indicates a better alternative.
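The standardize-then-sum procedure of equations (1)–(4) can be sketched in Python (our own illustration; the matrix layout with rows as alternatives and columns as criteria, and the `benefit` flag vector marking LB versus SB criteria, are assumptions; the NB case is omitted for brevity):

```python
def wsm_scores(X, weights, benefit):
    """Weighted sum model: standardize each criterion column to [0, 1]
    (larger-the-better for benefit criteria, smaller-the-better for cost
    criteria), then take the weighted sum per alternative (equation (4))."""
    m, n = len(X), len(X[0])
    cols = list(zip(*X))
    std = []
    for j, col in enumerate(cols):
        lo, hi = min(col), max(col)
        rng = (hi - lo) or 1.0                       # guard a constant column
        if benefit[j]:
            std.append([(v - lo) / rng for v in col])   # equation (1)
        else:
            std.append([(hi - v) / rng for v in col])   # equation (2)
    return [sum(weights[j] * std[j][i] for j in range(n)) for i in range(m)]
```

For two benefit criteria the alternative that dominates on both columns receives the highest score, as expected.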


3.2.2. GRA. GRA is a basic MCDM method of quantitative research and qualitative analysis for system analysis [83]. Based on the grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83, 85–88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.

Assume that the initial matrix is R:
\[ R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \quad 1 \le i \le m,\; 1 \le j \le n. \tag{5} \]

(1) Standardize the initial matrix:
\[ R' = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' \\ x_{21}' & x_{22}' & \cdots & x_{2n}' \\ \vdots & \vdots & & \vdots \\ x_{m1}' & x_{m2}' & \cdots & x_{mn}' \end{bmatrix}, \quad 1 \le i \le m,\; 1 \le j \le n. \tag{6} \]

(2) Generate the reference sequence \(x_0'\):
\[ x_0' = \left( x_0'(1), x_0'(2), \ldots, x_0'(n) \right), \tag{7} \]
where \(x_0'(j)\) is the largest standardized value of the jth factor.

(3) Calculate the differences \(\Delta_{0i}(j)\) between the reference series and the alternative series:
\[ \Delta_{0i}(j) = \left| x_0'(j) - x_{ij}' \right|, \]
\[ \Delta = \begin{bmatrix} \Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\ \Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\ \vdots & \vdots & & \vdots \\ \Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n) \end{bmatrix}, \quad 1 \le i \le m,\; 1 \le j \le n. \tag{8} \]

(4) Calculate the grey coefficient \(r_{0i}(j)\):
\[ r_{0i}(j) = \frac{\min_i \min_j \Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}{\Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}, \tag{9} \]
where \(\delta\) is a distinguishing coefficient. The value of \(\delta\) is generally set to 0.5 to provide good stability.

(5) Calculate the grey relational degree \(b_i\):
\[ b_i = \frac{1}{n} \sum_{j=1}^{n} r_{0i}(j). \tag{10} \]

(6) Finally, standardize the grey relational degree to obtain \(\beta_i\):
\[ \beta_i = \frac{b_i}{\sum_{i=1}^{m} b_i}. \tag{11} \]

3.2.3. TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods for ranking alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS finds the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can then be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93].

(1) The decision matrix A is standardized:
\[ a_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}}, \quad 1 \le i \le m,\; 1 \le j \le n. \tag{12} \]

(2) The weighted standardized decision matrix is computed:
\[ D = \left( a_{ij} \cdot w_j \right), \quad 1 \le i \le m,\; 1 \le j \le n, \tag{13} \]
where the \(w_j\) are the criteria weights and \(\sum_{j=1}^{n} w_j = 1\).

(3) The PIS \(V^*\) and the NIS \(V^-\) are calculated:
\[ V^* = \left\{ v_1^*, v_2^*, \ldots, v_n^* \right\} = \left\{ \left( \max_i v_{ij} \mid j \in J \right), \left( \min_i v_{ij} \mid j \in J' \right) \right\}, \]
\[ V^- = \left\{ v_1^-, v_2^-, \ldots, v_n^- \right\} = \left\{ \left( \min_i v_{ij} \mid j \in J \right), \left( \max_i v_{ij} \mid j \in J' \right) \right\}, \tag{14} \]
where J indexes the benefit criteria and J' indexes the cost criteria.

(4) The distances of each alternative from the PIS and the NIS are determined:
\[ S_i^+ = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^* \right)^2}, \quad 1 \le i \le m, \]
\[ S_i^- = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^- \right)^2}, \quad 1 \le i \le m. \tag{15} \]

(5) The relative closeness to the ideal solution is obtained:
\[ Y_i = \frac{S_i^-}{S_i^+ + S_i^-}, \quad 1 \le i \le m, \tag{16} \]
where, when \(Y_i\) is closer to 1, the alternative is closer to the ideal solution.

(6) The preference order is ranked. A larger relative closeness indicates a better alternative.
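The six TOPSIS steps can be condensed into one short function (our own sketch; rows are alternatives, columns are criteria, and the `benefit` flag vector marking benefit versus cost columns is an assumption):

```python
import math

def topsis(X, weights, benefit):
    """TOPSIS: vector-normalize each column (eq. (12)), weight it (eq. (13)),
    take the ideal best/worst value per column (eq. (14)), and score each
    alternative by its relative closeness to the ideal (eqs. (15)-(16))."""
    m, n = len(X), len(X[0])
    norms = [math.sqrt(sum(X[i][j] ** 2 for i in range(m))) or 1.0
             for j in range(n)]
    V = [[weights[j] * X[i][j] / norms[j] for j in range(n)] for i in range(m)]
    v_pos = [max(V[i][j] for i in range(m)) if benefit[j]
             else min(V[i][j] for i in range(m)) for j in range(n)]
    v_neg = [min(V[i][j] for i in range(m)) if benefit[j]
             else max(V[i][j] for i in range(m)) for j in range(n)]
    closeness = []
    for i in range(m):
        s_pos = math.sqrt(sum((V[i][j] - v_pos[j]) ** 2 for j in range(n)))
        s_neg = math.sqrt(sum((V[i][j] - v_neg[j]) ** 2 for j in range(n)))
        closeness.append(s_neg / (s_pos + s_neg) if s_pos + s_neg else 1.0)
    return closeness
```

An alternative that dominates on every benefit criterion coincides with the PIS, so its closeness is exactly 1.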

3.2.4. PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and "valued outranking relations" to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95, 96]. In the PROMETHEE II method, a


positive outranking flow reveals the degree to which the chosen alternative outranks all other alternatives, whereas a negative outranking flow reveals the degree to which the chosen alternative is outranked by all other alternatives [51, 97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows.

(1) Normalize the decision matrix R:
\[ R_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}}, \quad 1 \le i \le n,\; 1 \le j \le m. \tag{17} \]

(2) Define the aggregated preference indices. Let \(a, b \in A\), and
\[ \pi(a, b) = \sum_{j=1}^{k} p_j(a, b)\, w_j, \]
\[ \pi(b, a) = \sum_{j=1}^{k} p_j(b, a)\, w_j, \tag{18} \]
where A is a finite set of alternatives \(\{a_1, a_2, \ldots, a_n\}\), k is the number of criteria, \(w_j\) is the weight of criterion j, and \(\sum_{j=1}^{k} w_j = 1\). \(\pi(a, b)\) represents the degree to which a is preferred to b over all criteria, and \(\pi(b, a)\) represents the degree to which b is preferred to a over all criteria; \(p_j(a, b)\) and \(p_j(b, a)\) are the preference functions of the alternatives a and b.

(3) Calculate π(a, b) and π(b, a) for each pair of alternatives. In general, there are six types of preference functions; DMs must select one type of preference function, and the corresponding parameter value, for each criterion [51, 98].

(4) Determine the positive and negative outranking flows. The positive outranking flow is determined by

φ⁺(a) = (1/(n − 1)) Σ_{x∈A} π(a, x),   (19)

and the negative outranking flow is determined by

φ⁻(a) = (1/(n − 1)) Σ_{x∈A} π(x, a).   (20)

(5) Calculate the net outranking flow:

φ(a) = φ⁺(a) − φ⁻(a).   (21)

(6) Determine the ranking according to the net outranking flow: the larger φ(a) is, the more appropriate the alternative.
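Steps (1)–(6) of PROMETHEE II can be sketched as follows. This is a minimal illustration that fixes the "usual" preference function, P(d) = 1 if d > 0 and 0 otherwise, for every criterion; in practice the DMs would choose one of the six preference-function types per criterion, as step (3) notes. The function name `promethee2` is ours.

```python
import numpy as np

def promethee2(decision, weights):
    """Minimal PROMETHEE II sketch with the 'usual' preference function.

    decision: (n alternatives) x (k criteria) matrix, benefit criteria.
    Returns the net outranking flow phi(a) of each alternative (eq. (21)).
    """
    decision = np.asarray(decision, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = decision.shape[0]
    # Aggregated preference indices pi(a, b) (eq. (18)).
    pi = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                pref = (decision[a] > decision[b]).astype(float)  # usual P_j
                pi[a, b] = (pref * w).sum()
    phi_pos = pi.sum(axis=1) / (n - 1)  # positive outranking flow (eq. (19))
    phi_neg = pi.sum(axis=0) / (n - 1)  # negative outranking flow (eq. (20))
    return phi_pos - phi_neg            # net outranking flow (eq. (21))
```

By construction the net flows sum to zero, so a positive φ(a) marks an alternative that, on balance, outranks the others.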

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine external clustering measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are generated by a computer with an Intel Core i5-3210M CPU at 2.50 GHz and 8 GB of memory. Before introducing the external measures, the contingency table is described.

3.3.1. The Contingency Table. Given a data set D with n objects, suppose we have a partition P = {P₁, P₂, ..., P_k} produced by some clustering method, where ∪_{i=1}^k P_i = D and P_i ∩ P_j = ∅ for 1 ≤ i ≠ j ≤ k. According to the preassigned class labels, we can create another partition C = {C₁, C₂, ..., C_k}, where ∪_{i=1}^k C_i = D and C_i ∩ C_j = ∅ for 1 ≤ i ≠ j ≤ k. Let n_ij denote the number of objects in cluster P_i carrying the label of class C_j. Then, the data information relating the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the nine external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, can measure the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of the objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

E = −Σ_i (n_i/n) Σ_j (n_ij/n_i) log(n_ij/n_i).   (22)

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class holding the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

P = Σ_i (n_i/n) max_j (n_ij/n_i).   (23)

A higher purity value usually represents more effective clustering.
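Entropy and purity (equations (22) and (23)) can both be computed directly from the contingency table of Section 3.3.1. A minimal sketch, with the log base left natural since the text does not fix it; the function name is ours.

```python
import numpy as np

def entropy_purity(contingency):
    """Entropy (eq. (22)) and purity (eq. (23)) from a contingency table.

    contingency[i][j] = n_ij, the number of objects in cluster P_i
    carrying class label C_j.
    """
    nij = np.asarray(contingency, dtype=float)
    ni = nij.sum(axis=1)           # cluster sizes n_i
    n = nij.sum()                  # total number of objects
    p = nij / ni[:, None]          # within-cluster label proportions n_ij / n_i
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)  # 0 log 0 := 0
    entropy = -((ni / n) * terms.sum(axis=1)).sum()
    purity = ((ni / n) * p.max(axis=1)).sum()
    return entropy, purity
```

For a clustering whose clusters each contain a single class label, the entropy is 0 and the purity is 1, matching the behavior described in the text.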

(3) F-Measure. The F-measure (FM) is the harmonic mean of precision and recall. It is commonly regarded as a measure of clustering accuracy [103]. The calculation of FM is inspired by the information-retrieval metric, as follows:

F-measure = (2 × precision × recall) / (precision + recall),
precision = n_ij/n_j,  recall = n_ij/n_i.   (24)


A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It can evaluate a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) α(C_j): the number of objects correctly assigned to class C_j.
(2) β(C_j): the number of objects incorrectly assigned to class C_j.

The MAP measure is computed as follows:

MAP = Σ_j α(C_j) / Σ_j [α(C_j) + β(C_j)].   (25)

A higher MAP value indicates more accurate clustering.

(5) Mirkin Metric. The Mirkin metric (MM) assumes the null value for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

M = Σ_i n_i² + Σ_j n_j² − 2 Σ_i Σ_j n_ij².   (26)

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that a partition C is a clustering structure of the data set and that P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: both points belong to the same cluster of the clustering structure C and to the same group of the partition P.
(ii) SD: the points belong to the same cluster of C and to different groups of P.
(iii) DS: the points belong to different clusters of C and to the same group of P.
(iv) DD: the points belong to different clusters of C and to different groups of P.

Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can then be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

R = (a + d)/M.   (27)

A higher value of RI indicates a more accurate clustering result.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

J = a/(a + b + c).   (28)

A higher value of JC indicates a more accurate clustering result.

(8) Fowlkes and Mallows Index. The Fowlkes and Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

FMI = sqrt( (a/(a + b)) · (a/(a + c)) ).   (29)

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110]. The measure of ARI is computed as

ARI = (a − (a + c)(a + b)/M) / ( ((a + c) + (a + b))/2 − (a + c)(a + b)/M ).   (30)

A higher value of ARI indicates more accurate clustering.
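The four pair-counting indices (equations (27)–(30)) all follow from the SS/SD/DS/DD counts a, b, c, and d defined above. A brute-force sketch over all object pairs, O(n²) in the number of objects; the function name is ours.

```python
from itertools import combinations

def pair_counts(labels_p, labels_c):
    """Count SS, SD, DS, DD pairs (a, b, c, d) between a clustering P and
    the true partition C, then evaluate RI, JC, FMI, and ARI
    (eqs. (27)-(30))."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_p)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1      # SS
        elif same_c:
            b += 1      # SD
        elif same_p:
            c += 1      # DS
        else:
            d += 1      # DD
    m = a + b + c + d                       # maximum number of pairs M
    ri = (a + d) / m                        # Rand index (eq. (27))
    jc = a / (a + b + c)                    # Jaccard coefficient (eq. (28))
    fmi = ((a / (a + b)) * (a / (a + c))) ** 0.5  # FMI (eq. (29))
    exp = (a + c) * (a + b) / m             # chance-corrected term
    ari = (a - exp) / (((a + c) + (a + b)) / 2 - exp)  # ARI (eq. (30))
    return ri, jc, fmi, ari
```

For identical partitions, all four indices equal 1; as the partitions diverge, the ARI can drop below 0, unlike the other three.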

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112–114]. It can elicit a corresponding priority vector from pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some known problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as proposed by Costa and Vansnick [117]. However, AHP remains a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps:

Table 1: Contingency table.

Partition P \ C    C₁     C₂     ...    C_k    Σ
P₁                 n₁₁    n₁₂    ...    n₁ₖ    n₁
P₂                 n₂₁    n₂₂    ...    n₂ₖ    n₂
...                ...    ...    ...    ...    ...
P_k                nₖ₁    nₖ₂    ...    nₖₖ    nₖ
Σ                  n₁     n₂     ...    nₖ     n


(1) Establish a hierarchical structure: a complex problem can be structured into a hierarchy comprising the goal level, the criteria level, and the alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts by determining the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps. They are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this step, we apply the six most influential clustering algorithms, namely EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously; hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, using the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs, and the pairwise comparison values are derived from their scores. We randomly assign each MCDM method to five UCI data sets. Applying more than one MCDM method to analyze and evaluate the performance of clustering algorithms is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences, or even conflicts, in the evaluation performance of the clustering algorithms across the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives, in a lower position and an upper position, respectively.

It is well known that the eighty-twenty rule states that, in most situations, eighty percent of the results originate in twenty percent of the activity [58]. The rule can be credited to Vilfredo Pareto [56], who observed that, in most countries, eighty percent of the wealth is usually controlled by twenty percent of the people [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. Thus, the eighty-twenty rule can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations, for predictable imbalance: the twenty percent of people who create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories. The top 1/5 of the alternatives is marked in an upper position, which represents the more satisfactory rankings in the opinion of all individual participants involved in the algorithm evaluation process; the bottom 1/5 is marked in a lower position, which represents the more dissatisfactory rankings in the opinion of all individual participants. The element marking the upper position is calculated as follows:

x = n × (1/5),   (31)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position classifies the ranking: the first and second positions hold the alternatives in the upper position, which are considered the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marking the lower position is calculated as

x = n × (4/5),   (32)

where n is the number of alternatives. For instance, if n = 7, then 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position classifies the ranking: the sixth and seventh positions hold the alternatives in the lower position, which are considered collectively the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and in the upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, f_i, which represents the most satisfactory ranking in the opinions of all individual participants, can be determined as

f_i = b_i − d_i,   (33)

where a higher value of f_i implies a higher priority.
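Steps 1–3 of the decision-making support model (equations (31)–(33)) can be sketched as follows. Ceiling rounding reproduces the rounding in the paper's examples (1.2 ≈ 2 and 4.8 ≈ 5 for n = 6); the function name is ours.

```python
import math

def priority_scores(rankings):
    """Sketch of Steps 1-3 of the decision-making support model.

    rankings: one ranking list per data set, each listing the algorithm
    names from best (first) to worst (last).
    Returns {algorithm: f_i = b_i - d_i}; a higher f_i means a higher
    priority (eq. (33)).
    """
    n = len(rankings[0])
    upper_cut = math.ceil(n / 5)      # eq. (31): top 1/5 marks the upper set
    lower_cut = math.ceil(4 * n / 5)  # eq. (32): the lower set starts here
    b = {alg: 0 for alg in rankings[0]}  # upper-position scores
    d = {alg: 0 for alg in rankings[0]}  # lower-position scores
    for ranking in rankings:
        for pos, alg in enumerate(ranking, start=1):
            if pos <= upper_cut:
                b[alg] += upper_cut - pos + 1  # first position scores highest
            elif pos >= lower_cut:
                d[alg] += pos - lower_cut + 1  # last position scores highest
    return {alg: b[alg] - d[alg] for alg in b}
```

With n = 6, as in the experiment, the first and second positions score 2 and 1 toward b_i, and the fifth and sixth positions score 1 and 2 toward d_i, matching the score row of Table 10.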

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences, or even conflicts, in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are used for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose recorded characteristics include data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the data information of these data sets. Together, they comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as the life sciences, business, the physical sciences, the social sciences, and CS/engineering. The data sets have a variety of data structures: their sizes range from 100 to 4,601 instances, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows.

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of the clustering algorithms, used to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare the target data sets. Data preprocessing deletes the class labels of the original data sets.
Step 2: obtain clustering solutions. Obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by the three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to five of the UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. Consensus on the different, or even conflicting, individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.

5.3. Experimental Results. This section presents the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures for the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4, with the optimal result of each external measure highlighted in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets, computed by WSM, TOPSIS, GRA, and PROMETHEE II, are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as purity, entropy, FM, and RI, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are aggregated by the weighted arithmetic mean, a widely used aggregation algorithm in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.

The results in Tables 5–8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile the individual disagreements. This model includes the following three steps.

In Step 1, two sets of alternatives are marked, in a lower position and an upper position, respectively. According to equations (31) and (32), for the upper position we have n = 6 and thus x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position classifies the ranking: the first and second positions hold the alternatives in the upper position. Similarly, for the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking: the fifth and sixth positions hold the alternatives in the lower position. The two sets of alternatives in the lower position and upper position are marked as shown in Table 9, based on Tables 5–8.

In Step 2, the sets of alternatives in the lower and upper positions are graded, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, b_i, are totaled; similarly, the scores of the alternatives in the lower position, d_i, are totaled. The results for the 20 UCI data sets are presented in Table 10.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different, or even conflicting, individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithm for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set                        No.  Area               Instances  Attributes  Classes
Liver Disorders                 1    Life sciences      345        7           2
ZOO                             2    Life sciences      101        17          2
Pima Indians Diabetes           3    Life sciences      768        8           2
Wholesale Customers             4    Business           440        8           2
Haberman's Survival             5    Life sciences      306        3           2
Wine                            6    Physical sciences  178        13          3
Balance Scale                   7    Social sciences    625        4           3
Breast Tissue                   8    Life sciences      106        10          6
Ecoli                           9    Life sciences      336        8           8
Fertility                       10   Life sciences      100        10          2
Ionosphere                      11   Physical sciences  351        34          2
Iris                            12   Life sciences      150        4           3
Teaching Assistant Evaluation   13   Other              151        5           3
Blood Transfusion               14   Business           748        5           2
Spambase                        15   CS/Engineering     4601       57          2
Page Blocks Classification      16   CS/Engineering     5473       10          5
Sonar                           17   Physical sciences  208        60          2
Contraceptive Method Choice     18   Life sciences      1473       9           3
Dermatology                     19   Life sciences      366        33          6
Yeast Data                      20   Life sciences      1484       8           10
Total                                                   18310      313         70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

     Purity  En      F-m     Rand    ARI     Jaccard FMI     MAP     MM
EM   0.9003  0.0331  0.1109  0.5897  0.0001  0.5689  0.7411  0.9003  0.4839
FF   0.6638  0.0506  0.3859  0.8091  0.0011  0.7705  0.8747  0.6638  0.3089
FC   0.9117  0.0296  0.0999  0.5954  0.0001  0.5774  0.7484  0.9117  0.4818
HC   0.6439  0.0356  0.4020  0.8177  0.0012  0.7785  0.8819  0.6439  0.2982
MD   0.8746  0.0408  0.1339  0.5783  0.0001  0.5502  0.7250  0.8746  0.4877
KM   0.9117  0.0299  0.0994  0.5983  0.0001  0.5791  0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

     Purity  En      F-m     Rand    ARI     Jaccard FMI     MAP     MM
EM   0.1748  0.1670  0.1579  0.1589  0.1666  0.1596  0.1619  0.1748  0.1608
FF   0.1514  0.1655  0.1833  0.1816  0.1667  0.1803  0.1757  0.1514  0.1778
FC   0.1761  0.1672  0.1570  0.1595  0.1666  0.1604  0.1627  0.1761  0.1610
HC   0.1495  0.1668  0.1850  0.1825  0.1667  0.1812  0.1765  0.1495  0.1788
MD   0.1721  0.1663  0.1598  0.1578  0.1666  0.1578  0.1604  0.1721  0.1605
KM   0.1761  0.1672  0.1570  0.1597  0.1666  0.1606  0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets.

     ZOO           Balance Scale  Teaching Assistant Evaluation  Spambase      Yeast Data
     Value   Rank  Value   Rank   Value   Rank                   Value   Rank  Value   Rank
EM   0.1677  2     0.1701  1      0.1547  6                      0.1650  6     0.1719  2
FF   0.1653  5     0.1651  3      0.1684  4                      0.1652  4     0.1790  1
FC   0.1677  2     0.1648  5      0.1727  1                      0.1695  1     0.1644  5
HC   0.1638  6     0.1701  1      0.1595  5                      0.1652  4     0.1560  3
MD   0.1676  4     0.1650  4      0.1721  3                      0.1656  3     0.1645  4
KM   0.1679  1     0.1648  5      0.1727  1                      0.1695  1     0.1643  6

Table 6: Rankings of TOPSIS for the five assigned UCI data sets.

     Pima Indians Diabetes  Wholesale Customers  Wine          Ecoli         Ionosphere
     Value   Rank           Value   Rank         Value   Rank  Value   Rank  Value   Rank
EM   0.0866  6              0.1792  4            0.1859  5     0.1991  2     0.1797  3
FF   0.1102  5              0.1019  6            0.0661  6     0.3061  1     0.1427  5
FC   0.2019  1              0.2053  1            0.1870  1     0.1315  5     0.1858  2
HC   0.2019  1              0.1028  5            0.1870  1     0.0962  6     0.1406  6
MD   0.1974  4              0.2055  1            0.1870  1     0.1335  4     0.1646  4
KM   0.2019  1              0.2053  1            0.1870  1     0.1336  3     0.1865  1

Table 7: Rankings of GRA for the five assigned UCI data sets.

     Breast Tissue  Fertility     Iris          Contraceptive Method Choice  Dermatology
     Value   Rank   Value   Rank  Value   Rank  Value   Rank                 Value   Rank
EM   0.1672  4      0.1379  4     0.1325  6     0.1850  3                    0.1771  3
FF   0.1378  6      0.2142  2     0.1712  2     0.1366  5                    0.1643  5
FC   0.1804  3      0.1362  6     0.1712  2     0.1857  1                    0.1811  1
HC   0.1499  5      0.2321  1     0.1825  1     0.1229  6                    0.1214  6
MD   0.1819  2      0.1416  3     0.1712  2     0.1842  4                    0.1750  4
KM   0.1828  1      0.1379  4     0.1712  2     0.1857  1                    0.1811  1


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare the results with those generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each algorithm attains each position can be determined according to Tables 5–8. For example, for ranking 1 (the first position), the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of PROMETHEE II for the five assigned UCI data sets.

     Liver Disorders  Haberman's Survival  Blood Transfusion Service Center  Page Blocks Classification  Sonar
     Value   Rank     Value   Rank         Value   Rank                      Value   Rank                Value   Rank
EM   0.1654  5        0.1133  6            0.1088  6                         0.1252  6                   0.1644  3
FF   0.1688  1        0.1766  4            0.1815  4                         0.1867  5                   0.1618  4
FC   0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5
HC   0.1645  6        0.1780  1            0.1906  1                         0.2371  1                   0.1749  2
MD   0.1679  2        0.1762  5            0.1380  5                         0.1685  2                   0.1770  1
KM   0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5

Table 9: Rankings of the four MCDM methods for all 20 UCI data sets.

Rank  ZOO  Balance Scale  Teaching Assistant Evaluation  Spambase  Yeast Data  Pima Indians Diabetes  Wholesale Customers
1     KM   EM             FC                             FC        FF          KM                     FC
2     FC   HC             KM                             KM        EM          FC                     KM
3     EM   FF             MD                             MD        HC          HC                     MD
4     MD   MD             FF                             FF        MD          MD                     EM
5     FF   FC             HC                             HC        FC          FF                     HC
6     HC   KM             EM                             EM        KM          EM                     FF

Rank  Wine  Ecoli  Ionosphere  Breast Tissue  Fertility  Iris  Contraceptive Method Choice
1     FC    FF     KM          KM             HC         HC    KM
2     KM    EM     FC          MD             FF         KM    FC
3     MD    KM     EM          FC             MD         FC    EM
4     HC    MD     MD          EM             KM         FF    MD
5     EM    FC     FF          HC             EM         MD    FF
6     FF    HC     HC          FF             FC         EM    HC

Rank  Dermatology  Liver Disorders  Haberman's Survival  Blood Transfusion Service  Page Blocks Classification  Sonar
1     FC           FF               KM                   KM                         HC                          MD
2     KM           MD               FC                   FC                         MD                          HC
3     EM           FC               HC                   HC                         KM                          EM
4     MD           KM               FF                   FF                         FC                          FF
5     FF           EM               MD                   MD                         FF                          FC
6     HC           HC               EM                   EM                         EM                          KM

Table 10: Priority of each alternative.

Position  1st  2nd  b_i   5th  6th  d_i   f_i   Ranking
Score     2    1          1    2
EM        1    2    4     3    7    17    -13   6
FF        3    1    7     6    3    12    -5    4
FC        5    6    16    4    1    6     10    2
HC        3    2    8     4    6    16    -8    5
MD        1    3    5     3    0    3     2     3
KM        7    6    20    0    3    6     14    1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

Algorithm \ Ranking  1    2    3    4    5    6
EM                   1    3    4    3    2    7
FF                   3    2    1    5    6    3
FC                   9    3    3    0    4    1
HC                   8    1    1    1    3    6
MD                   3    4    3    8    2    0
KM                   12   1    3    1    2    1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we therefore establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms remain 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58], and which implies that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithm for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is used to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model merging expert wisdom for secondary knowledge discovery is proposed based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system comprising a data space, method space, model space, and knowledge space will be further developed. It will handle many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and to improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and a total of 20 data sets originated from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

Complexity 13

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," Frontiers in Computational Intelligence, Springer, Cham, Switzerland, pp. 73–101, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?," The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



index [12], the I-index [13], the CS index [14, 15], Dunn's index [16, 17], and the Davies–Bouldin (DB) index [18, 19]. These validity measures are often divided into three categories: external, relative, and internal measures [20-22]. External measures compare the partitions produced by clustering algorithms with a given data partition [20, 22]. Relative measures compare partitions produced by the same clustering algorithm with discrepant data subsets or diverse parameters [22]. Internal measures depend on computing the properties of the resulting clusters [22]. Brun et al. [20] stated that relative and internal measures fail to predict and locate errors produced by clustering algorithms, whereas external measures for evaluating clustering results perform more effectively. Therefore, in our empirical research, we select external measures to evaluate and measure the performance of clustering algorithms.
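As a concrete instance of an external measure, the Rand index [106] counts the instance pairs on which a produced partition and a given reference partition agree. The following is a minimal sketch in Python; the function name and the toy label vectors are illustrative, not taken from the paper:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of instance pairs on which two partitions agree:
    both place the pair in the same cluster, or both separate it."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical groupings agree on every pair, regardless of label names.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because it compares pair co-memberships rather than raw labels, the measure is invariant to how clusters are numbered, which is exactly what an external comparison against a given data partition requires.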

The No Free Lunch (NFL) theorem states that there exists no single model or algorithm that can achieve the best performance for every domain problem [23-25]. It suggests that the evaluation of clustering algorithms is complicated and challenging. Moreover, different clustering algorithms may produce different or even conflicting partitions. The motivation of this paper centers on the evaluation of clustering algorithms to reconcile different or even conflicting evaluation performance; the reconciliation of these differences or conflicts is an important problem that has not been fully investigated. In addition, the evaluation of clustering algorithms usually involves multiple criteria, so it can be modeled as an MCDM problem. Based on MCDM, this paper therefore proposes a model, called decision-making support for evaluation of clustering algorithms (DMSECA), to evaluate and measure the performance of clustering algorithms and, further, to reconcile differences or even conflicts among their evaluation performance during a complex decision-making process.

The proposed model consists of three steps. First, we apply the six most influential clustering algorithms for task modeling on 20 UCI data sets with a total of 18,310 instances and 313 attributes. Second, based on nine external measures, we employ four commonly used MCDM approaches to rank the performance of the clustering algorithms over the 20 UCI data sets; each MCDM method is randomly assigned to five of the UCI data sets. Third, based on the eighty-twenty rule, we propose a decision-making support model that generates a list of algorithm priorities to identify the best clustering algorithm among the 20 UCI data sets for secondary mining and knowledge discovery.
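The second step can be illustrated with one of the four MCDM methods used here, TOPSIS [90], which ranks alternatives by their relative closeness to an ideal solution. The sketch below is a minimal pure-Python version; the score matrix, the equal weights, and the assumption that every measure is a benefit criterion are illustrative choices, not the paper's exact setup:

```python
import math

def topsis_rank(scores, weights):
    """Rank alternatives (rows of `scores`) over benefit criteria (columns).
    Returns row indices ordered best first."""
    m = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in scores]
    ideal = [max(col) for col in zip(*v)]   # positive ideal solution
    anti = [min(col) for col in zip(*v)]    # negative ideal solution
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    closeness = [dist(r, anti) / (dist(r, ideal) + dist(r, anti)) for r in v]
    return sorted(range(len(scores)), key=lambda i: -closeness[i])

# Three hypothetical clustering algorithms scored on two external measures.
print(topsis_rank([[0.9, 0.8], [0.5, 0.6], [0.2, 0.3]], [0.5, 0.5]))  # [0, 1, 2]
```

In the evaluation setting described above, each row would hold one clustering algorithm's scores on the nine external measures for a given data set, and the returned order is that MCDM method's ranking for the data set.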

The contributions of this article are threefold. First, our proposed DMSECA model can identify the best clustering algorithms for the given data sets via a generated list of algorithm priorities during a complex decision-making process. Second, the proposed model can reconcile differences or even conflicts to achieve agreement in clustering algorithm evaluation. Third, based on the eighty-twenty rule, expert wisdom is merged to build a decision-making support model that carries out secondary knowledge discovery for information fusion in a complex decision-making environment.

The rest of this article is organized as follows. Section 2 reviews the related work. Section 3 describes some preliminaries, such as clustering algorithms, MCDM methods, and external measures. Section 4 proposes our model, which merges expert wisdom to reconcile disagreements among the clustering algorithms. Section 5 presents the data sets, provides the experimental design, shows the empirical results, and discusses the significance of this work. Section 6 summarizes this article.

2 Related Work

Cluster analysis aims to classify elements into categories on the basis of their similarity [26]. In recent years, many clustering algorithms have been proposed [26-29]. Density peak clustering was published by Rodriguez and Laio in Science [26]. In view of its low objectivity and accuracy caused by man-made factors, a density fragment clustering method without peaks was proposed based on density peak clustering [30]. Jiang et al. [28] developed the GDPC algorithm with an alternative decision graph, based on gravitation theory and nearby distance, to identify centroids and anomalies accurately. To overcome the defect of the original DPC in detecting anomalies and hub nodes, Jiang et al. [29] proposed an improved recognition method on the halo node for the density peak clustering algorithm (halo DPC), which improves the ability to deal with varying densities, irregular shapes, the number of clusters, and outlier and hub node detection [29].

Clustering ensembles have become increasingly popular in recent years by consolidating several base clustering methods into a probably better and more robust one. Alizadeh et al. [31] presented a novel optimization-based method for the combination of cluster ensembles. Parvin and Minaei-Bidgoli [32] proposed a weighted locally adaptive clustering (WLAC) algorithm, which is based on the LAC algorithm. Considering that some features carry more information than others in a data set, Parvin and Minaei-Bidgoli [27] proposed a fuzzy weighted locally adaptive clustering (FWLAC) algorithm, which is capable of handling imbalanced clustering. Abbasi et al. [33] proposed a criterion to assess the association between a cluster and a partition, called the edited normalized mutual information (ENMI) criterion. Mojarad et al. [34] presented a clustering ensemble method named RCEIFBC with a new aggregation function that takes into account two similarity criteria: (a) the cluster-cluster similarity and (b) the object-cluster similarity. Mojarad et al. [35] proposed an ensemble aggregator, or consensus function, called the robust clustering ensemble based on sampling and cluster clustering (RCESCC) algorithm, in order to obtain better clustering results. Rashidi et al. [36] proposed a new clustering ensemble approach that uses a weighting strategy to perform consensus clustering by exploiting the cluster uncertainty concept. Bagherinia et al. [37] proposed a novel fuzzy clustering ensemble framework based on a new fuzzy diversity measure and a fuzzy quality measure to find the base clusterings with the best performance. In a clustering ensemble, multiple clustering outputs can be combined to produce better results in terms of consistency, robustness, and performance than the basic individual clustering methods.

The evaluation of clustering algorithms is an active issue in fields such as machine learning, data mining, artificial intelligence, databases, and pattern recognition [11]. In a typical clustering scenario, three fundamental questions must be addressed: (i) identifying an effective clustering algorithm suitable for a given data set, (ii) determining how many clusters are present in the data, and (iii) evaluating the clustering [38]. This article focuses on the first problem.

Several validity measures have been proposed to evaluate clustering algorithms. Yeung et al. [39] pointed out that the figure of merit (FOM) is used on microarray data, where different biological groups represent the clusters. Halkidi et al. [40] presented the Rand statistic to measure the proportion of pairs of vectors. Roth et al. [41] presented a stability measure to evaluate partitioning validity and to choose the number of clusters. Chou et al. [14] presented the CS relative cluster measure to assess clusters with different sizes and densities. Zalik [42] presented the CO cluster-validity measure, based on compactness and overlapping measures, to estimate the quality of partitions. Chou et al. [43] presented an area measure to evaluate the initial cluster number based on the information of cluster areas. Wani and Riyaz [44] presented a new compactness measure using a novel penalty function to describe the typical behavior of a cluster. Azhagiri and Rajesh [45] proposed a novel approach to measure the quality of a cluster that can find intrusions using the intrusion unearthing and probability clomp algorithm.

Validity measures are often divided into internal, relative, and external measures [20, 21, 24, 25, 46]. Internal measures are based on computing properties of the resulting clusters, and they do not include additional information on the data [20, 25, 47]. Relative measures are based on the comparison of partitions produced by the same clustering algorithm with different data subsets or different parameters, and they do not demand additional information either [20, 25, 39]. External measures compare the partitions produced by clustering algorithms with a given data partition [20, 25, 48]. These correspond to a kind of error measurement, so they can be supposed to offer improved correlation to the true error [20]. The results of Brun et al. [20] indicate that external measures for evaluating clustering results are more accurate than internal or relative measures. Thus, external measures are selected to assess the performance of clustering algorithms.

In addition, the evaluation of clustering algorithms involves more than one criterion. Thus, it can be solved by MCDM methods. This differs from previous approaches. For example, Dudoit and Fridlyand [49] proposed a prediction-based resampling method to evaluate the number of clusters, and Sugar and James [50] chose the number of clusters by an information-theoretical approach. Peng et al. [51] developed an MCDM-based method to select the number of clusters. Peng et al. [52] also developed a framework to select the appropriate clustering algorithm and to further choose the number of clusters. Meyer and Olteanu [53] indicated that clustering in the field of multicriteria decision aid (MCDA) has seen a few adaptations of methods from data analysis, most of them, however, using concepts native to that field, such as the notions of similarity and distance measures. Besides, Chen et al. [54] pointed out that the clustering problem is one of the well-known MCDA problems and that the existing versions of the K-means clustering algorithm are only used for partitioning the data into several clusters that do not have priority relations; therefore, Chen et al. [54] proposed a complete ordered clustering algorithm, called the ordered K-means clustering algorithm, which considers the preference degree between any two alternatives. Mahdiraji et al. [55] presented marketing strategies evaluation based on big data analysis by a clustering-MCDM approach. This paper takes a new perspective by proposing a DMSECA model based on the MCDM method, merging expert wisdom by using the eighty-twenty rule to select the best clustering algorithms for the given data sets during a complex decision process. Furthermore, our proposed DMSECA model can reconcile different or even conflicting evaluation performance to reach a group agreement for information fusion in a complex decision-making environment.

The eighty-twenty rule was proposed by Pareto [56], who researched the wealth distribution in different countries. The rule is based on the observation that, in most countries, about 80% of the wealth is controlled by about 20% of the people, which is called a "predictable imbalance" by Pareto [57]. The eighty-twenty rule has been expanded to many fields, such as sociology and quality control [58]. In this work, the eighty-twenty rule is used to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The truth is often in a few hands: the views of about 20% of the people represent more satisfactory rankings in the opinion of all participants.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. In this paper, the proposed DMSECA model, based on MCDM methods and the eighty-twenty rule, presents a new perspective by merging expert wisdom to evaluate the most appropriate clustering algorithm for the given data sets, and the proposed model can reconcile individual differences or conflicts to achieve group agreements among clustering algorithm evaluations in a complex decision-making environment.

3. Preliminaries

This section presents some elementary and preparatory knowledge. It first introduces several clustering algorithms in Section 3.1; then, the classic MCDM methods are presented in Section 3.2; finally, the performance measures of clustering algorithms are described in Section 3.3.

3.1. Clustering Algorithms. Clustering is a popular unsupervised learning technique. It aims to divide large data sets into smaller sections so that objects in the same cluster are highly similar, whereas objects in different clusters are highly dissimilar [21]. Clustering algorithms based on similarity criteria can group patterns, where groups are sets of similar patterns [54, 59, 60]. Clustering algorithms are widely applied in many research fields, such as genomics, image segmentation, document retrieval, sociology, bioinformatics, psychology, business intelligence, and financial analysis [61–64].

Complexity 3

Clustering algorithms usually fall into four classes: partitioning methods, hierarchical methods, density-based methods, and model-based methods [65]. Several classic clustering algorithms have been proposed and reported, such as the K-means algorithm [66], the k-medoid algorithm [67], expectation maximization (EM) [68], and frequent pattern-based clustering [65]. In this paper, the six most influential clustering algorithms are selected for the empirical study. These are the KM algorithm, the EM algorithm, filtered clustering (FC), the farthest-first (FF) algorithm, make-density-based clustering (MD), and hierarchical clustering (HC). These clustering algorithms can be implemented in WEKA [69].

The KM algorithm, a partitioning method, takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high and the intercluster similarity is low. The cluster similarity can be measured by the mean value of the objects in a cluster, which can be viewed as the centroid or center of gravity of the cluster [65].

The EM algorithm, which is considered an extension of the KM algorithm, is an iterative method to find the maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables [70]. The KM algorithm assigns each object to a single cluster. In the EM algorithm, by contrast, each object is assigned to each cluster according to a weight representing its probability of membership. In other words, there are no strict boundaries between the clusters. Thus, new means can be computed based on the weighted measures [68].

The FC applied in this work can be implemented in WEKA [69]. Like the clusterer, the structure of the filter is based exclusively on the training data, and test instances will be processed by the filter without changing their structure.

The FF algorithm is a fast, greedy, and simple approximation algorithm for the k-center problem [67], in which k points are selected as cluster centers: after the first center is chosen, the second center is greedily selected as the point farthest from the first. Each remaining center is determined by greedily selecting the point farthest from the set of chosen centers, and the remaining points are added to the cluster whose center is the closest [66, 71].

The MD algorithm is a density-based method. The general idea is to continue growing the given cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain a minimum number of points [65]. The HC algorithm is a method of cluster analysis that seeks to build a hierarchy of clusters, which can create a hierarchical decomposition of the given data sets [66, 72].
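The paper runs these six algorithms in WEKA. Purely as an illustrative sketch (not the paper's setup), the following Python snippet runs scikit-learn analogues of four of them on synthetic data; `GaussianMixture` stands in for EM and `DBSCAN` for the density-based MD, while FF and FC have no direct scikit-learn counterpart.

```python
# Illustrative only: scikit-learn analogues of four of the six algorithms
# (the paper itself uses WEKA; FF and FC have no direct counterpart here).
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

models = {
    "KM": KMeans(n_clusters=3, n_init=10, random_state=0),
    "EM": GaussianMixture(n_components=3, random_state=0),  # EM-fitted mixture
    "HC": AgglomerativeClustering(n_clusters=3),            # hierarchical
    "MD": DBSCAN(eps=1.5),                                  # density-based stand-in
}
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, "->", len(set(lab) - {-1}), "clusters")
```

The cluster labels produced here would then feed the external measures described in Section 3.3.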

3.2. MCDM Methods. The MCDM methods, which were developed in the 1970s, are a complete set of decision analysis technologies that have evolved as an important research field of operations research [73, 74]. The International Society on MCDM defines MCDM as the study of methods and procedures concerning multiple conflicting criteria, which can be formally incorporated into the management planning process [73]. In an MCDM problem, the evaluation criteria are assumed to be independent [75, 76]. MCDM methods aim to assist decision-makers (DMs) in identifying an optimal solution from a number of alternatives by synthesizing objective measurements and value judgments [77, 78]. In this section, four classic MCDM methods, the weighted sum method (WSM), grey relational analysis (GRA), TOPSIS, and PROMETHEE II, are introduced as follows.

3.2.1. WSM. WSM [79] is a well-known MCDM method for evaluating finite alternatives in terms of finite decision criteria when all the data are expressed in the same unit [80, 81]. The benefit-to-cost-ratio and benefit-minus-cost approaches [82] can be applied to problems involving both benefit and cost criteria. In this paper, the cost criteria are first transformed into benefit criteria. Besides, there is a nominal-the-better (NB) case: the closer the value is to the objective value, the better.

The calculation steps of WSM are as follows. First, assume n criteria, including benefit criteria and cost criteria, and m alternatives. The cost criteria are first converted to benefit criteria in the following standardization process:

(1) The larger-the-better (LB): a larger objective value is better, that is, the benefit criteria, and it can be standardized as

$$x_{ij}' = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \quad (1)$$

(2) The smaller-the-better (SB): a smaller objective value is better, that is, the cost criteria, and it can be standardized as

$$x_{ij}' = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \quad (2)$$

(3) The nominal-the-better (NB): the closer to the objective value $x_{ob}$, the better, and it can be standardized as

$$x_{ij}' = 1 - \frac{\left| x_{ij} - x_{ob} \right|}{\max\left\{ \max_i x_{ij} - x_{ob},\ x_{ob} - \min_i x_{ij} \right\}}. \quad (3)$$

Finally, the total benefit of all the alternatives can be calculated as

$$A_i = \sum_{j=1}^{n} w_j x_{ij}', \quad 1 \le i \le m,\ 1 \le j \le n. \quad (4)$$

A larger WSM value indicates a better alternative.
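The steps above can be sketched in a few lines of Python; the matrix and weights below are made-up illustration values, not the paper's data.

```python
# Minimal WSM sketch following equations (1) and (4); all criteria here are
# benefit criteria, so only the larger-the-better standardization is needed.
import numpy as np

def wsm_scores(X, w):
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xn = (X - lo) / (hi - lo)      # larger-the-better standardization, eq. (1)
    return Xn @ w                  # A_i = sum_j w_j * x'_ij, eq. (4)

X = np.array([[0.8, 0.6, 0.9],
              [0.5, 0.9, 0.7],
              [0.9, 0.4, 0.6]])    # 3 alternatives x 3 benefit criteria
w = np.array([0.5, 0.3, 0.2])      # criteria weights summing to 1
scores = wsm_scores(X, w)
ranking = np.argsort(-scores)      # larger WSM value = better alternative
print(scores, ranking)
```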


3.2.2. GRA. GRA is a basic MCDM method of quantitative research and qualitative analysis for system analysis [83]. Based on the grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83, 85–88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.

Assume that the initial matrix is R:

$$R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (5)$$

(1) Standardize the initial matrix:

$$R' = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' \\ x_{21}' & x_{22}' & \cdots & x_{2n}' \\ \vdots & \vdots & & \vdots \\ x_{m1}' & x_{m2}' & \cdots & x_{mn}' \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (6)$$

(2) Generate the reference sequence $x_0'$:

$$x_0' = \left( x_0'(1), x_0'(2), \ldots, x_0'(n) \right), \quad (7)$$

where $x_0'(j)$ is the largest standardized value of the $j$th factor.

(3) Calculate the differences $\Delta_{0i}(j)$ between the reference series and the alternative series:

$$\Delta_{0i}(j) = \left| x_0'(j) - x_{ij}' \right|,$$

$$\Delta = \begin{bmatrix} \Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\ \Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\ \vdots & \vdots & & \vdots \\ \Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n) \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (8)$$

(4) Calculate the grey coefficient $r_{0i}(j)$:

$$r_{0i}(j) = \frac{\min_i \min_j \Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}{\Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}, \quad (9)$$

where $\delta$ is a distinguishing coefficient. The value of $\delta$ is generally set to 0.5 to provide good stability.

(5) Calculate the value of the grey relational degree $b_i$:

$$b_i = \frac{1}{n} \sum_{j=1}^{n} r_{0i}(j). \quad (10)$$

(6) Finally, standardize the value of the grey relational degree to obtain $\beta_i$:

$$\beta_i = \frac{b_i}{\sum_{i=1}^{n} b_i}. \quad (11)$$
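The six steps can be sketched compactly; the input matrix is invented for illustration, and the customary $\delta = 0.5$ is used.

```python
# Sketch of GRA steps (1)-(6); beta gives each alternative's normalized
# grey relational degree (larger = closer to the ideal reference series).
import numpy as np

def gra_degrees(X, delta=0.5):
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # step (1)
    ref = Xn.max(axis=0)                                        # step (2): x0'
    D = np.abs(ref - Xn)                                        # step (3)
    r = (D.min() + delta * D.max()) / (D + delta * D.max())     # step (4)
    b = r.mean(axis=1)                                          # step (5)
    return b / b.sum()                                          # step (6)

X = np.array([[0.8, 0.6, 0.9],
              [0.5, 0.9, 0.7],
              [0.9, 0.4, 0.6]])
beta = gra_degrees(X)
print(beta.round(4), beta.argmax())
```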

3.2.3. TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods to rank alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS can find the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93].

(1) The decision matrix is standardized:

$$a_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (12)$$

(2) The weighted standardized decision matrix is computed:

$$D = \left( a_{ij} \cdot w_j \right), \quad 1 \le i \le m,\ 1 \le j \le n, \quad (13)$$

where the $w_j$ are the criteria weights and $\sum_{j=1}^{n} w_j = 1$.

(3) The PIS $V^*$ and the NIS $V^-$ are calculated:

$$V^* = \left\{ v_1^*, v_2^*, \ldots, v_n^* \right\} = \left\{ \left( \max_i v_{ij} \mid j \in J \right), \left( \min_i v_{ij} \mid j \in J' \right) \right\},$$

$$V^- = \left\{ v_1^-, v_2^-, \ldots, v_n^- \right\} = \left\{ \left( \min_i v_{ij} \mid j \in J \right), \left( \max_i v_{ij} \mid j \in J' \right) \right\}, \quad (14)$$

where $J$ denotes the benefit criteria and $J'$ the cost criteria.

(4) The distances of each alternative from the PIS and the NIS are determined:

$$S_i^+ = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^* \right)^2}, \quad 1 \le i \le m,$$

$$S_i^- = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^- \right)^2}, \quad 1 \le i \le m. \quad (15)$$

(5) The relative closeness to the ideal solution is obtained:

$$Y_i = \frac{S_i^-}{S_i^+ + S_i^-}, \quad 1 \le i \le m, \quad (16)$$

where the closer $Y_i$ is to 1, the closer the alternative is to the ideal solution.

(6) The preference order is ranked.

A larger relative closeness indicates a better alternative.
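The six steps can be sketched as follows, restricted to benefit criteria for brevity; the decision matrix and weights are invented for illustration.

```python
# TOPSIS sketch over benefit criteria only (so PIS = column max, NIS = min).
import numpy as np

def topsis(X, w):
    A = X / np.sqrt((X ** 2).sum(axis=0))            # step (1): normalization
    V = A * w                                        # step (2): weighting
    pis, nis = V.max(axis=0), V.min(axis=0)          # step (3): PIS and NIS
    s_plus = np.sqrt(((V - pis) ** 2).sum(axis=1))   # step (4): distances
    s_minus = np.sqrt(((V - nis) ** 2).sum(axis=1))
    return s_minus / (s_plus + s_minus)              # step (5): closeness

X = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 8.0]])
w = np.array([0.5, 0.3, 0.2])
closeness = topsis(X, w)
print(closeness.round(4), closeness.argmax())        # step (6): rank by closeness
```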

3.2.4. PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and "valued outranking relations" to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95, 96]. In the PROMETHEE II method, a positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51, 97]. Based on the positive outranking flows and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows.

(1) Normalize the decision matrix R:

$$R_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}}, \quad 1 \le i \le n,\ 1 \le j \le m. \quad (17)$$

(2) Define the aggregated preference indices. Let $a, b \in A$, and

$$\pi(a, b) = \sum_{j=1}^{k} p_j(a, b) w_j,$$
$$\pi(b, a) = \sum_{j=1}^{k} p_j(b, a) w_j, \quad (18)$$

where $A$ is a finite set of alternatives $\{a_1, a_2, \ldots, a_n\}$, $k$ is the number of criteria such that $1 \le k \le m$, $w_j$ is the weight of criterion $j$, and $\sum_{j=1}^{k} w_j = 1$. $\pi(a, b)$ represents how $a$ is preferred to $b$ over all criteria, and $\pi(b, a)$ represents how $b$ is preferred to $a$ over all criteria; $p_j(a, b)$ and $p_j(b, a)$ are the preference functions of the alternatives $a$ and $b$.

(3) Calculate $\pi(a, b)$ and $\pi(b, a)$ for each pair of alternatives. In general, there are six types of preference functions. DMs must select one type of preference function and the corresponding parameter value for each criterion [51, 98].

(4) Determine the positive outranking flow and the negative outranking flow. The positive outranking flow is determined by

$$\phi^+(a) = \frac{1}{n - 1} \sum_{x \in A} \pi(a, x), \quad (19)$$

and the negative outranking flow is determined by

$$\phi^-(a) = \frac{1}{n - 1} \sum_{x \in A} \pi(x, a). \quad (20)$$

(5) Calculate the net outranking flow:

$$\phi(a) = \phi^+(a) - \phi^-(a). \quad (21)$$

(6) Determine the ranking according to the net outranking flow.

A larger $\phi(a)$ indicates a more appropriate alternative.
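A sketch of the steps, assuming the "usual" (strict-dominance) preference function, which is only one of the six types mentioned above; the data are invented benefit criteria.

```python
# PROMETHEE II sketch with the "usual" preference function p = 1 if d > 0 else 0.
import numpy as np

def promethee_ii(X, w):
    n = len(X)
    d = X[:, None, :] - X[None, :, :]     # pairwise differences per criterion
    p = (d > 0).astype(float)             # usual preference function p_j(a, b)
    pi = (p * w).sum(axis=2)              # aggregated indices pi(a, b), eq. (18)
    phi_plus = pi.sum(axis=1) / (n - 1)   # positive flow, eq. (19)
    phi_minus = pi.sum(axis=0) / (n - 1)  # negative flow, eq. (20)
    return phi_plus - phi_minus           # net flow, eq. (21)

X = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 8.0]])
w = np.array([0.5, 0.3, 0.2])
phi = promethee_ii(X, w)
print(phi.round(4), phi.argmax())
```

Note that the net flows always sum to zero, which is a useful sanity check on any implementation.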

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine external clustering measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are generated on a computer with an Intel Core i5-3210M CPU at 2.50 GHz and 8 GB of memory. Before introducing the external measures, the contingency table is described.

3.3.1. The Contingency Table. Given a data set $D$ with $n$ objects, suppose we have a partition $P = \{P_1, P_2, \ldots, P_k\}$ produced by some clustering method, where $\cup_{i=1}^{k} P_i = D$ and $P_i \cap P_j = \emptyset$ for $1 \le i \ne j \le k$. According to the preassigned class labels, we can create another partition $C = \{C_1, C_2, \ldots, C_k\}$, where $\cup_{i=1}^{k} C_i = D$ and $C_i \cap C_j = \emptyset$ for $1 \le i \ne j \le k$. Let $n_{ij}$ denote the number of objects in cluster $P_i$ with the label of class $C_j$. Then, the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the nine external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, can measure the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

$$E = -\sum_{i} \frac{n_i}{n} \sum_{j} \frac{n_{ij}}{n_i} \log\left( \frac{n_{ij}}{n_i} \right). \quad (22)$$

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

$$P = \sum_{i} \frac{n_i}{n} \max_j \left( \frac{n_{ij}}{n_i} \right). \quad (23)$$

A higher purity value usually represents more effective clustering.
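Given a contingency table $n_{ij}$, entropy and purity follow directly from equations (22) and (23); the counts below are invented for illustration.

```python
# Entropy (22) and purity (23) from a made-up contingency table n[i, j]:
# objects in cluster P_i carrying class label C_j.
import numpy as np

n = np.array([[40,  2,  3],
              [ 5, 30,  5],
              [ 0,  4, 11]])
ni = n.sum(axis=1)        # cluster sizes n_i
N = n.sum()               # total number of objects

p = n / ni[:, None]
with np.errstate(divide="ignore", invalid="ignore"):
    terms = np.where(p > 0, p * np.log(p), 0.0)   # convention: 0 log 0 = 0
entropy = -(ni / N * terms.sum(axis=1)).sum()     # zero iff clusters are pure

purity = n.max(axis=1).sum() / N   # fraction of objects in majority classes
print(round(entropy, 4), purity)
```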

(3) F-Measure. The F-measure (FM) is the harmonic mean of precision and recall. It is commonly considered as clustering accuracy [103]. The calculation of FM is inspired by the information-retrieval metric, as follows:

$$F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},$$
$$\text{precision} = \frac{n_{ij}}{n_j}, \qquad \text{recall} = \frac{n_{ij}}{n_i}. \quad (24)$$


A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It can obtain a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) $\alpha(C_j)$: the number of objects correctly assigned to class $C_j$.
(2) $\beta(C_j)$: the number of objects incorrectly assigned to class $C_j$.

The MAP measure is computed as follows:

$$\text{MAP} = \frac{\sum_j \alpha(C_j)}{\sum_j \left( \alpha(C_j) + \beta(C_j) \right)}. \quad (25)$$

A higher MAP value indicates more accurate clustering

(5) Mirkin Metric. The measure of the Mirkin metric (MM) assumes the null value for identical clusters and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

$$M = \sum_{i} n_i^2 + \sum_{j} n_j^2 - 2 \sum_{i} \sum_{j} n_{ij}^2. \quad (26)$$

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that a partition $C$ is a clustering structure of the data set and $P$ is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: if both points belong to the same cluster of the clustering structure $C$ and to the same group of the partition $P$.
(ii) SD: if the points belong to the same cluster of $C$ and to different groups of $P$.
(iii) DS: if the points belong to different clusters of $C$ and to the same group of $P$.
(iv) DD: if the points belong to different clusters of $C$ and to different groups of $P$.

Assume that $a$, $b$, $c$, and $d$ are the numbers of SS, SD, DS, and DD pairs, respectively, and that $M = a + b + c + d$, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between $C$ and $P$ can then be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

$$R = \frac{a + d}{M}. \quad (27)$$

A higher value of RI indicates a more accurate result of clustering.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

$$J = \frac{a}{a + b + c}. \quad (28)$$

A higher value of JC indicates a more accurate result of clustering.

(8) Fowlkes–Mallows Index. The Fowlkes–Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

$$\text{FMI} = \sqrt{\frac{a}{a + b} \cdot \frac{a}{a + c}}. \quad (29)$$

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the measure of RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110]. The measure of ARI is computed as

$$\text{ARI} = \frac{a - \left( (a + c)(a + b)/M \right)}{\left( (a + c) + (a + b) \right)/2 - \left( (a + c)(a + b)/M \right)}. \quad (30)$$

A higher value of ARI indicates more accurate clustering
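The pair counts $a$, $b$, $c$, $d$ and the four pair-based measures can be sketched as follows; the two labelings are toy examples, not the paper's data.

```python
# Count SS (a), SD (b), DS (c), DD (d) pairs, then apply equations (27)-(30).
from itertools import combinations

def pair_counts(labels_c, labels_p):
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1    # SS
        elif same_c:
            b += 1    # SD
        elif same_p:
            c += 1    # DS
        else:
            d += 1    # DD
    return a, b, c, d

C = [0, 0, 0, 1, 1, 1]    # clustering structure (class labels)
P = [0, 0, 1, 1, 1, 1]    # partition from a clustering method
a, b, c, d = pair_counts(C, P)
M = a + b + c + d
ri = (a + d) / M                                        # eq. (27)
jc = a / (a + b + c)                                    # eq. (28)
fmi = ((a / (a + b)) * (a / (a + c))) ** 0.5            # eq. (29)
exp_a = (a + c) * (a + b) / M                           # chance-expected term
ari = (a - exp_a) / (((a + c) + (a + b)) / 2 - exp_a)   # eq. (30)
print((a, b, c, d), round(ri, 4), round(jc, 4), round(fmi, 4), round(ari, 4))
```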

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and life sciences [112–114]. It can elicit a corresponding priority vector according to pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as pointed out by Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps.

Table 1: Contingency table.

                          Partition C
                 C1      C2      ...     Ck    |  Σ
  Partition P
    P1           n11     n12     ...     n1k   |  n1
    P2           n21     n22     ...     n2k   |  n2
    ...          ...     ...     ...     ...   |  ...
    Pk           nk1     nk2     ...     nkk   |  nk
    Σ            n·1     n·2     ...     n·k   |  n


(1) Establish a hierarchical structure: a complex problem can be established in such a structure, including the goal level, the criteria level, and the alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts to determine the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
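Steps (2)–(4) can be sketched with NumPy's eigendecomposition; the pairwise comparison matrix below is a made-up example on the 1–9 scale, not the experts' actual judgments.

```python
# AHP sketch: priority vector via the principal eigenvector, plus the CR test.
import numpy as np

# Saaty's random index values by matrix size, used in the consistency test.
SAATY_RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    vals, vecs = np.linalg.eig(A)
    k = vals.real.argmax()                 # principal eigenvalue lambda_max
    w = np.abs(vecs[:, k].real)
    w /= w.sum()                           # normalized priority vector
    n = A.shape[0]
    ci = (vals[k].real - n) / (n - 1)      # consistency index
    cr = ci / SAATY_RI[n]                  # consistency ratio; accept if < 0.1
    return w, cr

# Illustrative 3x3 pairwise comparison matrix (reciprocal by construction).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, cr = ahp_weights(A)
print(w.round(4), round(cr, 4))
```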

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of all individual participants. The decision-making process is extremely complex. This makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps. They are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously. Hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, based on the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, completed by the scores of the experts. We randomly assign each MCDM method to five UCI data sets. We apply more than one MCDM method to analyze and evaluate the performance of clustering algorithms, which is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences or even conflicts in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model based on the 80-20 rule are described as follows.

Step 1. Mark two sets of alternatives, one in a lower position and one in an upper position, respectively.

It is well known that the eighty-twenty rule states that eighty percent of the results originate in twenty percent of the activity in most situations [58]. The rule can be credited to Vilfredo Pareto [56], who observed that eighty percent of the wealth is usually controlled by twenty percent of the people in most countries [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. So the eighty-twenty rule can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The eighty-twenty rule indicates that the twenty percent of people who are creating eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from the twenty percent of people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position, which represents more satisfactory rankings in the opinion of all individual participants involved in the algorithm evaluation process. The bottom 1/5 is in a lower position, which represents more dissatisfactory rankings in the opinion of all individual participants. The element marked in the upper position is calculated as follows:

$$x = n \times \frac{1}{5}, \quad (31)$$

where $n$ is the number of alternatives. For instance, if $n = 7$, then $x = 7 \times 1/5 = 1.4 \approx 2$. Hence, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position, which are considered as the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marked in the lower position is calculated as

$$x = n \times \frac{4}{5}, \quad (32)$$

where $n$ is the number of alternatives. For instance, if $n = 7$, then $x = 7 \times 4/5 = 5.6 \approx 6$. Thus, the sixth position classifies the ranking, where the sixth and seventh positions in the lower position are considered collectively as the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and the upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, $f_i$, which represents the most satisfactory ranking in the opinions of all individual participants, can be determined as

$$f_i = b_i - d_i, \quad (33)$$

where a higher value of $f_i$ implies a higher priority.
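Steps 1–3 can be sketched end to end. The rankings below are invented (best to worst over six hypothetical algorithms), and the cutoffs interpret the rounding in the examples above as rounding up; both are assumptions for illustration.

```python
# DMSECA priority aggregation: mark upper/lower positions per eqs. (31)-(32),
# grade them per Step 2, and compute f_i = b_i - d_i per eq. (33).
import math
from collections import defaultdict

def dmseca_priorities(rankings):
    n = len(rankings[0])
    upper = math.ceil(n * 1 / 5)          # eq. (31): last upper position
    lower = math.ceil(n * 4 / 5)          # eq. (32): first lower position
    b, d = defaultdict(int), defaultdict(int)
    for r in rankings:
        for pos, alt in enumerate(r, start=1):
            if pos <= upper:
                b[alt] += upper - pos + 1     # Step 2: x, ..., 2, 1
            elif pos >= lower:
                d[alt] += pos - lower + 1     # Step 2: 1, 2, ..., x
    return {alt: b[alt] - d[alt] for alt in rankings[0]}   # Step 3: f_i

rankings = [
    ["KM", "EM", "FC", "FF", "MD", "HC"],
    ["EM", "KM", "MD", "FC", "HC", "FF"],
    ["KM", "MD", "EM", "FC", "FF", "HC"],
]
f = dmseca_priorities(rankings)
best = max(f, key=f.get)
print(f, "->", best)
```

With n = 6, the upper cutoff is ceil(6/5) = 2 and the lower region starts at ceil(24/5) = 5, so only the top two and bottom two positions of each ranking contribute to the priorities.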

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. The 20 data sets include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), the Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), the Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), the Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), the Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), the Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), the Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), the Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), the Blood Transfusion Service Center Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and the Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the data information of these data sets: data set characteristics, attribute characteristics, number of instances, number of attributes, and area. The data sets comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/engineering. They have a variety of data structures: their sizes range from 100 to 4,601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design can be implemented as follows:

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of clustering algorithms, to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.

Step 1: prepare target data sets; data preprocessing deletes the class labels of the original data sets.
Step 2: obtain clustering solutions; the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 are obtained by WEKA based on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures; in this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms; each MCDM method is randomly assigned to five UCI data sets, and the four MCDM methods are implemented in MATLAB 7.0 using the external measures as the input.
Step 6: achieve consensus; the consensus on different or even conflicting individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities; the list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.

Complexity 9

First, the values of the nine external measures for the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.
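The cost-to-benefit standardization step can be illustrated with a generic scheme of this kind. This is a sketch only: the paper's equations (1)–(3) are not reproduced here, and the exact transformation may differ. The sketch inverts each cost column so that larger is always better and then scales every column to sum to one, as the columns of Table 4 do.

```python
def standardize(matrix, is_benefit):
    """Column-wise standardization of a decision matrix (rows = algorithms,
    columns = external measures): min-max scale each column, invert cost
    criteria so larger is better, then normalize each column to sum to 1."""
    cols = list(zip(*matrix))
    out_cols = []
    for col, benefit in zip(cols, is_benefit):
        lo, hi = min(col), max(col)
        u = [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in col]
        if not benefit:           # cost criterion: smaller raw value is better
            u = [1.0 - v for v in u]
        s = sum(u)
        out_cols.append([v / s if s else 1.0 / len(u) for v in u])
    return [list(r) for r in zip(*out_cols)]
```

For example, with one benefit column (e.g., Purity) and one cost column (e.g., En), the algorithm with the smaller En value receives the larger standardized score.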

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are aggregated by the weighted arithmetic mean, which is a widely used aggregation algorithm in decision problems. The final index weights for the nine external measures, in the order given in Tables 3 and 4, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
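The AHP eigenvalue method mentioned above derives weights from a reciprocal pairwise-comparison matrix; a minimal sketch using power iteration is shown below. The comparison matrix here is hypothetical, for illustration only — the experts' actual judgment matrices are not reproduced in the text.

```python
def ahp_weights(pairwise, iters=100):
    """Principal-eigenvector weights of a reciprocal pairwise-comparison
    matrix, computed by power iteration (the AHP eigenvalue method)."""
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [v / s for v in w]  # renormalize so the weights sum to 1
    return w

# Hypothetical 3-criterion comparison matrix on the Saaty 1-9 scale.
M = [[1,   3,   5],
     [1/3, 1,   3],
     [1/5, 1/3, 1]]
weights = ahp_weights(M)
```

For a perfectly consistent matrix such as [[1, 2], [0.5, 1]], the method recovers the exact ratio weights (2/3, 1/3).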

The results in Tables 5–8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives, one in a lower position and one in an upper position. According to equations (31) and (32), for the upper position, we know that n = 6 and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position delimits the ranking, and the first and second positions hold the alternatives in the upper position. Similarly, for the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position delimits the ranking, and the fifth and sixth positions hold the alternatives in the lower position. The two sets of alternatives in the lower and upper positions are marked in boldface in Table 9, based on Tables 5–8.
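The two cut positions generalize to any number of alternatives n. A small helper consistent with the computation above (ceiling rounding, so that 1.2 → 2 and 4.8 → 5):

```python
import math

def position_cutoffs(n):
    """Eighty-twenty split of n ranked alternatives: positions 1..upper form
    the upper position, positions lower..n form the lower position."""
    upper = math.ceil(n / 5)       # top fifth of the ranking
    lower = math.ceil(4 * n / 5)   # start of the bottom fifth
    return upper, lower
```

With n = 6 this returns (2, 5), i.e., the first and second positions are the upper position and the fifth and sixth positions are the lower position, as in the text.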

In Step 2, grade the sets of alternatives in the lower and upper positions according to Step 2 in Section 4. The scores of the alternatives in the upper position, b_i, are totaled. Similarly, the scores of the alternatives in the lower position, d_i, are totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities.

Table 2: Data information of the 20 data sets.

Data set                       No.  Area               Instances  Attributes  Classes
Liver Disorders                1    Life sciences      345        7           2
ZOO                            2    Life sciences      101        17          2
Pima Indians Diabetes          3    Life sciences      768        8           2
Wholesale Customers            4    Business           440        8           2
Haberman's Survival            5    Life sciences      306        3           2
Wine                           6    Physical sciences  178        13          3
Balance Scale                  7    Social sciences    625        4           3
Breast Tissue                  8    Life sciences      106        10          6
Ecoli                          9    Life sciences      336        8           8
Fertility                      10   Life sciences      100        10          2
Ionosphere                     11   Physical sciences  351        34          2
Iris                           12   Life sciences      150        4           3
Teaching Assistant Evaluation  13   Other              151        5           3
Blood Transfusion              14   Business           748        5           2
Spambase                       15   CS/Engineering     4601       57          2
Page Blocks Classification     16   CS/Engineering     5473       10          5
Sonar                          17   Physical sciences  208        60          2
Contraceptive Method Choice    18   Life sciences      1473       9           3
Dermatology                    19   Life sciences      366        33          6
Yeast Data                     20   Life sciences      1484       8           10
Total                               18310              313        70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

      Purity   En       F-m      Rand     ARI      Jaccard  FM       MAP      M
EM    0.9003   0.0331   0.1109   0.5897   0.0001   0.5689   0.7411   0.9003   0.4839
FF    0.6638   0.0506   0.3859   0.8091   0.0011   0.7705   0.8747   0.6638   0.3089
FC    0.9117   0.0296   0.0999   0.5954   0.0001   0.5774   0.7484   0.9117   0.4818
HC    0.6439   0.0356   0.4020   0.8177   0.0012   0.7785   0.8819   0.6439   0.2982
MD    0.8746   0.0408   0.1339   0.5783   0.0001   0.5502   0.7250   0.8746   0.4877
KM    0.9117   0.0299   0.0994   0.5983   0.0001   0.5791   0.7502   0.9117   0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

      Purity   En       F-m      Rand     ARI      Jaccard  FM       MAP      M
EM    0.1748   0.1670   0.1579   0.1589   0.1666   0.1596   0.1619   0.1748   0.1608
FF    0.1514   0.1655   0.1833   0.1816   0.1667   0.1803   0.1757   0.1514   0.1778
FC    0.1761   0.1672   0.1570   0.1595   0.1666   0.1604   0.1627   0.1761   0.1610
HC    0.1495   0.1668   0.1850   0.1825   0.1667   0.1812   0.1765   0.1495   0.1788
MD    0.1721   0.1663   0.1598   0.1578   0.1666   0.1578   0.1604   0.1721   0.1605
KM    0.1761   0.1672   0.1570   0.1597   0.1666   0.1606   0.1628   0.1761   0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets.

      ZOO             Balance Scale   Teaching Assistant Evaluation   Spambase        Yeast Data
      Value    Rank   Value    Rank   Value    Rank                   Value    Rank   Value    Rank
EM    0.1677   2      0.1701   1      0.1547   6                      0.1650   6      0.1719   2
FF    0.1653   5      0.1651   3      0.1684   4                      0.1652   4      0.1790   1
FC    0.1677   2      0.1648   5      0.1727   1                      0.1695   1      0.1644   5
HC    0.1638   6      0.1701   1      0.1595   5                      0.1652   4      0.1560   3
MD    0.1676   4      0.1650   4      0.1721   3                      0.1656   3      0.1645   4
KM    0.1679   1      0.1648   5      0.1727   1                      0.1695   1      0.1643   6

Table 6: Rankings of TOPSIS for the five assigned UCI data sets.

      Pima Indians Diabetes   Wholesale Customers   Wine            Ecoli           Ionosphere
      Value    Rank           Value    Rank         Value    Rank   Value    Rank   Value    Rank
EM    0.0866   6              0.1792   4            0.1859   5      0.1991   2      0.1797   3
FF    0.1102   5              0.1019   6            0.0661   6      0.3061   1      0.1427   5
FC    0.2019   1              0.2053   1            0.1870   1      0.1315   5      0.1858   2
HC    0.2019   1              0.1028   5            0.1870   1      0.0962   6      0.1406   6
MD    0.1974   4              0.2055   1            0.1870   1      0.1335   4      0.1646   4
KM    0.2019   1              0.2053   1            0.1870   1      0.1336   3      0.1865   1
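As an illustration of how such rankings arise, a generic TOPSIS computation is sketched below. This is the textbook variant (vector normalization, benefit-oriented criteria, closeness to the ideal solution); it is a sketch of the method's mechanics and is not guaranteed to reproduce the exact values of Table 6.

```python
import math

def topsis(matrix, weights):
    """Closeness coefficients of TOPSIS for a decision matrix whose criteria
    are already benefit-oriented (larger is better). Rows = alternatives."""
    n_alt, n_cr = len(matrix), len(matrix[0])
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_alt))) for j in range(n_cr)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n_cr)] for i in range(n_alt)]
    ideal = [max(col) for col in zip(*v)]   # positive ideal solution
    anti = [min(col) for col in zip(*v)]    # negative ideal solution
    def dist(row, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref)))
    # Closeness coefficient: 1 = coincides with the ideal, 0 = with the anti-ideal.
    return [dist(r, anti) / (dist(r, ideal) + dist(r, anti)) for r in v]
```

Alternatives are then ranked by closeness coefficient in descending order.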

Table 7: Rankings of the GRA for the five assigned UCI data sets.

      Breast Tissue   Fertility       Iris            Contraceptive Method Choice   Dermatology
      Value    Rank   Value    Rank   Value    Rank   Value    Rank                 Value    Rank
EM    0.1672   4      0.1379   4      0.1325   6      0.1850   3                    0.1771   3
FF    0.1378   6      0.2142   2      0.1712   2      0.1366   5                    0.1643   5
FC    0.1804   3      0.1362   6      0.1712   2      0.1857   1                    0.1811   1
HC    0.1499   5      0.2321   1      0.1825   1      0.1229   6                    0.1214   6
MD    0.1819   2      0.1416   3      0.1712   2      0.1842   4                    0.1750   4
KM    0.1828   1      0.1379   4      0.1712   2      0.1857   1                    0.1811   1


The rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of each position ranking is determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, respectively, and the corresponding rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1 for EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets.

      Liver Disorders   Haberman's Survival   Blood Transfusion Service Center   Page Blocks Classification   Sonar
      Value    Rank     Value    Rank         Value    Rank                      Value    Rank                Value    Rank
EM    0.1654   5        0.1133   6            0.1088   6                         0.1252   6                   0.1644   3
FF    0.1688   1        0.1766   4            0.1815   4                         0.1867   5                   0.1618   4
FC    0.1667   3        0.1780   1            0.1906   1                         0.1413   3                   0.1609   5
HC    0.1645   6        0.1780   1            0.1906   1                         0.2371   1                   0.1749   2
MD    0.1679   2        0.1762   5            0.1380   5                         0.1685   2                   0.1770   1
KM    0.1667   3        0.1780   1            0.1906   1                         0.1413   3                   0.1609   5

Table 9: Rankings of the four MCDM methods for a total of 20 UCI data sets.

Rank  ZOO   Balance Scale   Teaching Assistant Evaluation   Spambase   Yeast Data   Pima Indians Diabetes   Wholesale Customers
1     KM    EM              FC                              FC         FF           KM                      FC
2     FC    HC              KM                              KM         EM           FC                      KM
3     EM    FF              MD                              MD         HC           HC                      MD
4     MD    MD              FF                              FF         MD           MD                      EM
5     FF    FC              HC                              HC         FC           FF                      HC
6     HC    KM              EM                              EM         KM           EM                      FF

Rank  Wine   Ecoli   Ionosphere   Breast Tissue   Fertility   Iris   Contraceptive Method Choice
1     FC     FF      KM           KM              HC          HC     KM
2     KM     EM      FC           MD              FF          KM     FC
3     MD     KM      EM           FC              MD          FC     EM
4     HC     MD      MD           EM              KM          FF     MD
5     EM     FC      FF           HC              EM          MD     FF
6     FF     HC      HC           FF              FC          EM     HC

Rank  Dermatology   Liver Disorders   Haberman's Survival   Blood Transfusion Service   Page Blocks Classification   Sonar
1     FC            FF                KM                    KM                          HC                           MD
2     KM            MD                FC                    FC                          MD                           HC
3     EM            FC                HC                    HC                          KM                           EM
4     MD            KM                FF                    FF                          FC                           FF
5     FF            EM                MD                    MD                          FF                           FC
6     HC            HC                EM                    EM                          EM                           KM

Table 10: Priority of each alternative.

Position   1st   2nd   b_i   5th   6th   d_i   f_i   Ranking
Score      2     1           1     2
EM         1     2     4     3     7     17    -13   6
FF         3     1     7     6     3     12    -5    4
FC         5     6     16    4     1     6     10    2
HC         3     2     8     4     6     16    -8    5
MD         1     3     5     3     0     3     2     3
KM         7     6     20    0     3     6     14    1
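The entries of Table 10 follow directly from the position counts and the scores in its header: 2 points for each 1st place and 1 for each 2nd place give the upper-position score b_i; 1 point for each 5th place and 2 for each 6th place give the lower-position score d_i; and the priority is f_i = b_i - d_i. A short sketch reproducing the table:

```python
# Position counts (1st, 2nd, 5th, 6th) for each algorithm, read from Table 10.
counts = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

def priority(first, second, fifth, sixth):
    """Upper-position score b, lower-position score d, and priority f = b - d."""
    b = 2 * first + 1 * second   # upper position: 1st counts double
    d = 1 * fifth + 2 * sixth    # lower position: 6th counts double
    return b, d, b - d

for name, c in counts.items():
    print(name, *priority(*c))
```

Running this recovers the b_i, d_i, and f_i columns of Table 10, with KM obtaining the largest priority (f = 14) and EM the smallest (f = -13).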

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

Ranking    1     2     3     4     5     6
EM         1     3     4     3     2     7
FF         3     2     1     5     6     3
FC         9     3     3     0     4     1
HC         8     1     1     1     3     6
MD         3     4     3     8     2     0
KM         12    1     3     1     2     1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.
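The tied rankings quoted above for ranking 1 (6, 4.5, 2, 3, 4.5, and 1) are consistent with ordering the algorithms by their rank-1 counts in Table 11 (1, 3, 9, 8, 3, and 12) and averaging the positions of tied counts. A small sketch of that tie-aware ranking:

```python
def rank_with_ties(scores):
    """Rank scores in descending order, assigning tied entries the average
    of the positions they jointly occupy (average ranking over ties)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2  # average of the tied positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Rank-1 counts for EM, FF, FC, HC, MD, KM from Table 11.
print(rank_with_ties([1, 3, 9, 8, 3, 12]))  # [6.0, 4.5, 2.0, 3.0, 4.5, 1.0]
```

FF and MD tie with three first places each and share positions 4 and 5, hence the averaged rank 4.5 for both.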

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked as the upper position and the bottom 1/5 is marked as the lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets to allow a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of the clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for a predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be further developed, which can deal with many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and all 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and Its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.


[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the Union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 44, pp. 223–270, 1908.

[108] E B Fowlkes and C L Mallows ldquoA method for comparingtwo hierarchical clusteringsrdquo Journal of the American Sta-tistical Association vol 78 no 383 pp 553ndash569 1983

[109] L Hubert and P Arabie ldquoComparing partitionsrdquo Journal ofClassification vol 2 no 1 pp 193ndash218 1985

[110] D Badescu A Boc A Banire Diallo and V MakarenkovldquoDetecting genomic regions associated with a disease usingvariability functions and Adjusted Rand Indexrdquo BMC Bio-informatics vol 12 no S-9 pp 1ndash10 2011

[111] T L Saaty 9e Analytic Hierarchy Process McGraw-HillNew York NY USA 1980

[112] WWu G Kou Y Peng and D Ergu ldquoImproved ahp-groupdecision making for investment strategy selectionrdquo Tech-nological and Economic Development of Economy vol 18no 2 pp 299ndash316 2012

[113] S Tyagi S Agrawal K Yang and H Ying ldquoAn extended Fuzzy-AHP approach to rank the influences of socialization-

16 Complexity

externalization-combination-internalization modes on the de-velopment phaserdquo Applied Soft Computing vol 52 pp 505ndash5182017

[114] I Takahashi ldquoAHP applied to binary and ternary compar-isonsrdquo Journal of the Operations Research Society of Japanvol 33 no 3 pp 199ndash206 2017

[115] C-S Yu ldquoA GP-AHP method for solving group decision-making fuzzy AHP problemsrdquo Computers amp OperationsResearch vol 29 no 14 pp 1969ndash2001 2002

[116] M Kamal and A H Al-Subhi ldquoApplication of the AHP inproject managementrdquo International Journal of ProjectManagement vol 19 no 1 pp 19ndash27 2001

[117] C A Bana e Costa and J C Vansnick ldquoA critical analysis ofthe eigenvalue method used to derive priorities in AHPrdquoEuropean Journal of Operational Research vol 187 no 3pp 1422ndash1428 2008

[118] T Ertay D Ruan and U Tuzkaya ldquoIntegrating data en-velopment analysis and analytic hierarchy for the facilitylayout design in manufacturing systemsrdquo Information Sci-ences vol 176 no 3 pp 237ndash262 2006

[119] M Dagdeviren S Yavuz and N Kılınccedil ldquoWeapon selectionusing the AHP and TOPSIS methods under fuzzy envi-ronmentrdquo Expert Systems with Applications vol 36 no 4pp 8143ndash8151 2009

[120] M P Amiri ldquoProject selection for oil-fields development byusing the AHP and fuzzy TOPSIS methodsrdquo Expert Systemswith Applications vol 37 no 9 pp 6218ndash6224 2010

[121] X Yu S Guo J Guo and X Huang ldquoRank B2C e-commercewebsites in e-alliance based on AHP and fuzzy TOPSISrdquoExpert Systems with Applications vol 38 no 4 pp 3550ndash3557 2011

[122] Y Peng G Kou G Wang W Wu and Y Shi ldquoEnsemble ofsoftware defect predictors an AHP-based evaluationmethodrdquo International Journal of Information Technology ampDecision Making vol 10 no 1 pp 187ndash206 2011

[123] P Domingos ldquoToward knowledge-rich data miningrdquo DataMining and Knowledge Discovery vol 15 no 1 pp 21ndash282007

[124] G Kou and W Wu ldquoMulti-criteria decision analysis foremergency medical service assessmentrdquo Annals of Opera-tions Research vol 223 no 1 pp 239ndash254 2014

[125] A Frank and A Asuncion UCI Machine Learning Reposi-tory University of California School of Information andComputer Science Irvine CA USA 2010 httparchiveicsucieduml

Complexity 17


produce better results in terms of consistency, robustness, and performance than the basic individual clustering methods.

The evaluation of clustering algorithms is an active issue in fields such as machine learning, data mining, artificial intelligence, databases, and pattern recognition [11]. In a typical clustering scenario, three fundamental questions must be addressed: (i) identifying an effective clustering algorithm suitable for a given data set; (ii) determining how many clusters are present in the data; and (iii) evaluating the clustering [38]. This article focuses on the first problem.

Several validity measures have been proposed to evaluate clustering algorithms. Yeung et al. [39] pointed out that the figure of merit (FOM) is used on microarray data, where different biological groups represent the clusters. Halkidi et al. [40] presented the Rand statistic to measure the proportion of pairs of vectors. Roth et al. [41] presented a stability measure to evaluate partitioning validity and to choose the number of clusters. Chou et al. [14] presented a CS cluster relative measure to assess clusters with different sizes and densities. Zalik [42] presented a CO cluster-validity measure, based on compactness and overlapping measures, to estimate the quality of partitions. Chou et al. [43] presented an area measure to evaluate the initial cluster number based on the information of cluster areas. Wani and Riyaz [44] presented a new compactness measure using a novel penalty function to describe the typical behavior of a cluster. Azhagiri and Rajesh [45] proposed a novel approach that measures cluster quality and detects intrusions using an intrusion unearthing and probability clomp algorithm.

Validity measures are often divided into three types: internal, relative, and external measures [20, 21, 24, 25, 46]. Internal measures are based on computing properties of the resulting clusters and do not require additional information on the data [20, 25, 47]. Relative measures are based on comparing the partitions produced by the same clustering algorithm with different data subsets or different parameters, and they likewise demand no additional information [20, 25, 39]. External measures compare the partitions produced by clustering algorithms with a given data partition [20, 25, 48]. These correspond to a kind of error measurement, so they can be supposed to offer improved correlation to the true error [20]. The results of Burn et al. [20] indicate that external measures for evaluating clustering results are more accurate than internal or relative measures. Thus, external measures are selected here to assess the performance of clustering algorithms.

In addition, the evaluation of clustering algorithms involves more than one criterion; thus, it can be solved by MCDM methods. This differs from previous approaches. For example, Dudoit and Fridlyand [49] proposed a prediction-based resampling method to evaluate the number of clusters, and Sugar and James [50] chose the number of clusters by an information-theoretical approach. Peng et al. [51] developed an MCDM-based method to select the number of clusters. Peng et al. [52] also developed a framework to select the appropriate clustering algorithm and to further choose the number of clusters. Meyer and Olteanu [53] indicated that clustering in the field of multicriteria decision aid (MCDA) has seen a few adaptations of methods from data analysis, most of them, however, using concepts native to that field, such as the notions of similarity and distance measures. Besides, Chen et al. [54] pointed out that the clustering problem is one of the well-known MCDA problems and that the existing versions of the K-means clustering algorithm are only used for partitioning the data into several clusters without priority relations; therefore, Chen et al. [54] proposed a complete ordered clustering algorithm, called the ordered K-means clustering algorithm, which considers the preference degree between any two alternatives. Mahdiraji et al. [55] presented marketing strategies evaluation based on big data analysis by a clustering-MCDM approach. This paper takes a new perspective by proposing a DMSECA model based on the MCDM method, merging expert wisdom by using the eighty-twenty rule to select the best clustering algorithms for the given data sets during a complex decision process. Furthermore, our proposed DMSECA model can reconcile different or even conflicting evaluation performance to reach a group agreement for information fusion in a complex decision-making environment.

The eighty-twenty rule was proposed by Pareto [56], who researched the wealth distribution in different countries. The rule is based on the observation that, in most countries, about 80% of the wealth is controlled by about 20% of the people, which Pareto called a "predictable imbalance" [57]. The eighty-twenty rule has been extended to many fields, such as sociology and quality control [58]. In this work, the eighty-twenty rule is used to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The truth is often in a few hands: the views of about 20% of the people represent more satisfactory rankings in the opinion of all participants.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. In this paper, the proposed DMSECA model, based on MCDM methods and the eighty-twenty rule, presents a new perspective by merging expert wisdom to identify the most appropriate clustering algorithm for the given data sets, and the proposed model can reconcile individual differences or conflicts to achieve group agreements among clustering algorithm evaluations in a complex decision-making environment.

3. Preliminaries

This section presents some elementary and preparatory knowledge. It first introduces several clustering algorithms in Section 3.1; then, the classic MCDM methods are presented in Section 3.2; finally, the performance measures of clustering algorithms are described in Section 3.3.

3.1. Clustering Algorithms. Clustering is a popular unsupervised learning technique. It aims to divide large data sets into smaller sections so that objects in the same cluster are minimally distinct, whereas objects in different clusters are minimally similar [21]. Clustering algorithms, based on similarity criteria, can group patterns, where groups are sets of similar patterns [54, 59, 60]. Clustering algorithms are widely applied in many research fields, such as genomics, image segmentation, document retrieval, sociology, bioinformatics, psychology, business intelligence, and financial analysis [61–64].

Clustering algorithms usually fall into four classes: partitioning methods, hierarchical methods, density-based methods, and model-based methods [65]. Several classic clustering algorithms have been proposed and reported, such as the K-means algorithm [66], the k-medoid algorithm [67], expectation maximization (EM) [68], and frequent pattern-based clustering [65]. In this paper, the six most influential clustering algorithms are selected for the empirical study. These are the KM algorithm, EM algorithm, filtered clustering (FC), farthest-first (FF) algorithm, make-density-based clustering (MD), and hierarchical clustering (HC). These clustering algorithms can be implemented in WEKA [69].

The KM algorithm, a partitioning method, takes an input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high and the intercluster similarity is low. Cluster similarity is measured by the mean value of the objects in a cluster, which can be viewed as the centroid or center of gravity of the cluster [65].

The EM algorithm, which can be considered an extension of the KM algorithm, is an iterative method to find the maximum likelihood or maximum a posteriori estimates of parameters in statistical models where the model depends on unobserved latent variables [70]. The KM algorithm assigns each object to a single cluster. In the EM algorithm, by contrast, each object is assigned to each cluster according to a weight representing its probability of membership; in other words, there are no strict boundaries between the clusters, and new means can be computed based on the weighted measures [68].

The FC applied in this work can be implemented in WEKA [69]. Like the clusterer, the structure of the filter is based exclusively on the training data, and test instances will be processed by the filter without changing their structure.

The FF algorithm is a fast, greedy, and simple approximation algorithm for the k-center problem [67]. A first point is selected as a cluster center, and the second center is greedily selected as the point farthest from the first. Each remaining center is determined by greedily selecting the point farthest from the set of chosen centers, and the remaining points are added to the cluster whose center is closest [66, 71].

The MD algorithm is a density-based method. The general idea is to continue growing a given cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain a minimum number of points [65]. The HC algorithm is a method of cluster analysis that seeks to build a hierarchy of clusters, creating a hierarchical decomposition of the given data sets [66, 72].

3.2. MCDM Methods. The MCDM methods, which were developed in the 1970s, are a complete set of decision analysis technologies that have evolved into an important research field of operations research [73, 74]. The International Society on MCDM defines MCDM as the study of methods and procedures concerning multiple conflicting criteria that can be formally incorporated into the management planning process [73]. In an MCDM problem, the evaluation criteria are assumed to be independent [75, 76]. MCDM methods aim to assist decision-makers (DMs) in identifying an optimal solution from a number of alternatives by synthesizing objective measurements and value judgments [77, 78]. In this section, four classic MCDM methods, the weighted sum method (WSM), grey relational analysis (GRA), TOPSIS, and PROMETHEE II, are introduced as follows.

3.2.1. WSM. WSM [79] is a well-known MCDM method for evaluating finite alternatives in terms of finite decision criteria when all the data are expressed in the same unit [80, 81]. The benefit-to-cost-ratio and benefit-minus-cost approaches [82] can be applied to problems involving both benefit and cost criteria. In this paper, the cost criteria are first transformed into benefit criteria. In addition, there is the nominal-the-better (NB) case: the closer the value is to the objective value, the better.

The calculation steps of WSM are as follows. First, assume n criteria, including benefit criteria and cost criteria, and m alternatives. The cost criteria are converted into benefit criteria in the following standardization process.

(1) The larger-the-better (LB): a larger objective value is better, that is, the benefit criteria; it can be standardized as

$$x_{ij}' = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \tag{1}$$

(2) The smaller-the-better (SB): a smaller objective value is better, that is, the cost criteria; it can be standardized as

$$x_{ij}' = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \tag{2}$$

(3) The nominal-the-better (NB): a value closer to the objective value $x_{ob}$ is better; it can be standardized as

$$x_{ij}' = 1 - \frac{\left| x_{ij} - x_{ob} \right|}{\max\left\{ \max_i x_{ij} - x_{ob},\; x_{ob} - \min_i x_{ij} \right\}}. \tag{3}$$

Finally, the total benefit of each alternative is calculated as

$$A_i = \sum_{j=1}^{n} w_j x_{ij}', \quad 1 \le i \le m. \tag{4}$$

The larger the WSM value, the better the alternative.
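The standardization and aggregation steps above can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation); the function and argument names are ours, and a raw decision matrix with per-criterion benefit/cost flags is assumed:

```python
import numpy as np

def wsm_scores(X, weights, benefit):
    """Weighted sum model: standardize each criterion, then aggregate (eqs. (1), (2), (4)).

    X: (m alternatives x n criteria) raw decision matrix.
    benefit: one boolean per criterion; True = larger-the-better, False = smaller-the-better.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                     # guard against constant criteria
    std = np.where(benefit, (X - lo) / span, (hi - X) / span)  # LB / SB standardization
    return std @ np.asarray(weights, dtype=float)              # total benefit A_i
```

Alternatives are then ranked by descending score; the nominal-the-better case of equation (3) would need the objective value as an extra input and is omitted from this sketch.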


3.2.2. GRA. GRA is a basic MCDM method of quantitative research and qualitative analysis for system analysis [83]. Based on the grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83, 85–88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.

Assume that the initial matrix is

$$R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \tag{5}$$

(1) Standardize the initial matrix:

$$R' = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' \\ x_{21}' & x_{22}' & \cdots & x_{2n}' \\ \vdots & \vdots & & \vdots \\ x_{m1}' & x_{m2}' & \cdots & x_{mn}' \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \tag{6}$$

(2) Generate the reference sequence $x_0'$:

$$x_0' = \left( x_0'(1), x_0'(2), \ldots, x_0'(n) \right), \tag{7}$$

where $x_0'(j)$ is the largest standardized value of the $j$th factor.

(3) Calculate the differences $\Delta_{0i}(j)$ between the reference series and the alternative series:

$$\Delta_{0i}(j) = \left| x_0'(j) - x_{ij}' \right|,$$
$$\Delta = \begin{bmatrix} \Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\ \Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\ \vdots & \vdots & & \vdots \\ \Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n) \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \tag{8}$$

(4) Calculate the grey coefficient $r_{0i}(j)$:

$$r_{0i}(j) = \frac{\min_i \min_j \Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}{\Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}, \tag{9}$$

where $\delta$ is the distinguishing coefficient; its value is generally set to 0.5 to provide good stability.

(5) Calculate the grey relational degree $b_i$:

$$b_i = \frac{1}{n} \sum_{j=1}^{n} r_{0i}(j). \tag{10}$$

(6) Finally, standardize the grey relational degree to obtain $\beta_i$:

$$\beta_i = \frac{b_i}{\sum_{i=1}^{m} b_i}. \tag{11}$$
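Steps (2)–(6) reduce to a few array operations once the matrix has been standardized. The following sketch is illustrative (not the authors' code) and assumes a benefit-standardized input matrix with values in [0, 1]:

```python
import numpy as np

def gra_degrees(X, delta=0.5):
    """Standardized grey relational degrees (eqs. (7)-(11)).

    X: (m alternatives x n criteria), already standardized so larger is better.
    delta: the distinguishing coefficient, 0.5 by default.
    """
    X = np.asarray(X, dtype=float)
    ref = X.max(axis=0)                                  # reference sequence x0'
    diff = np.abs(ref - X)                               # differences Delta_0i(j)
    dmin, dmax = diff.min(), diff.max()
    r = (dmin + delta * dmax) / (diff + delta * dmax)    # grey coefficients r_0i(j)
    b = r.mean(axis=1)                                   # grey relational degree b_i
    return b / b.sum()                                   # standardized beta_i
```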

3.2.3. TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods for ranking alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS finds the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can then be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93].

(1) The decision matrix is standardized:

$$a_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}}, \quad 1 \le i \le m,\ 1 \le j \le n. \tag{12}$$

(2) The weighted standardized decision matrix is computed:

$$D = \left( a_{ij} \cdot w_j \right), \quad 1 \le i \le m,\ 1 \le j \le n, \tag{13}$$

where the $w_j$ are the criteria weights and $\sum_{j=1}^{n} w_j = 1$.

(3) The PIS $V^*$ and the NIS $V^-$ are calculated:

$$V^* = \left\{ v_1^*, v_2^*, \ldots, v_n^* \right\} = \left\{ \left( \max_i v_{ij} \mid j \in J \right), \left( \min_i v_{ij} \mid j \in J' \right) \right\},$$
$$V^- = \left\{ v_1^-, v_2^-, \ldots, v_n^- \right\} = \left\{ \left( \min_i v_{ij} \mid j \in J \right), \left( \max_i v_{ij} \mid j \in J' \right) \right\}, \tag{14}$$

where $J$ is the set of benefit criteria and $J'$ is the set of cost criteria.

(4) The distances of each alternative from the PIS and the NIS are determined:

$$S_i^+ = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^* \right)^2}, \quad 1 \le i \le m,$$
$$S_i^- = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^- \right)^2}, \quad 1 \le i \le m. \tag{15}$$

(5) The relative closeness to the ideal solution is obtained:

$$Y_i = \frac{S_i^-}{S_i^+ + S_i^-}, \quad 1 \le i \le m. \tag{16}$$

The closer $Y_i$ is to 1, the closer the alternative is to the ideal solution.

(6) The preference order is ranked: the larger the relative closeness, the better the alternative.
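Steps (1)–(5) can be sketched as follows. This is an illustration under our own naming, assuming positive matrix entries and boolean flags marking benefit ($J$) versus cost ($J'$) criteria:

```python
import numpy as np

def topsis(X, weights, benefit):
    """TOPSIS relative closeness, following equations (12)-(16)."""
    X = np.asarray(X, dtype=float)
    A = X / np.sqrt((X ** 2).sum(axis=0))                    # vector normalization, eq. (12)
    D = A * np.asarray(weights, dtype=float)                 # weighted matrix, eq. (13)
    v_pos = np.where(benefit, D.max(axis=0), D.min(axis=0))  # PIS V*, eq. (14)
    v_neg = np.where(benefit, D.min(axis=0), D.max(axis=0))  # NIS V-
    s_pos = np.sqrt(((D - v_pos) ** 2).sum(axis=1))          # distances, eq. (15)
    s_neg = np.sqrt(((D - v_neg) ** 2).sum(axis=1))
    return s_neg / (s_pos + s_neg)                           # closeness Y_i, eq. (16)
```

The returned closeness values are sorted in descending order to obtain the preference order of step (6).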

3.2.4. PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and valued outranking relations to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95, 96]. In the PROMETHEE II method, a positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51, 97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows.

(1) Normalize the decision matrix $R$:

$$R_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}}, \quad 1 \le i \le n,\ 1 \le j \le m. \tag{17}$$

(2) Define the aggregated preference indices. Let $a, b \in A$, and

$$\pi(a, b) = \sum_{j=1}^{k} p_j(a, b)\, w_j,$$
$$\pi(b, a) = \sum_{j=1}^{k} p_j(b, a)\, w_j, \tag{18}$$

where $A$ is a finite set of alternatives $\{a_1, a_2, \ldots, a_n\}$, $k$ is the number of criteria ($1 \le k \le m$), $w_j$ is the weight of criterion $j$, and $\sum_{j=1}^{k} w_j = 1$. $\pi(a, b)$ represents the degree to which $a$ is preferred to $b$ over all criteria, and $\pi(b, a)$ the degree to which $b$ is preferred to $a$; $p_j(a, b)$ and $p_j(b, a)$ are the preference functions of the alternatives $a$ and $b$.

(3) Calculate $\pi(a, b)$ and $\pi(b, a)$ for each pair of alternatives. In general, there are six types of preference functions; DMs must select one type of preference function and the corresponding parameter value for each criterion [51, 98].

(4) Determine the positive and negative outranking flows. The positive outranking flow is determined by

$$\phi^+(a) = \frac{1}{n-1} \sum_{x \in A} \pi(a, x), \tag{19}$$

and the negative outranking flow is determined by

$$\phi^-(a) = \frac{1}{n-1} \sum_{x \in A} \pi(x, a). \tag{20}$$

(5) Calculate the net outranking flow:

$$\phi(a) = \phi^+(a) - \phi^-(a). \tag{21}$$

(6) Determine the ranking according to the net outranking flow: the larger $\phi(a)$, the more appropriate the alternative.
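A minimal sketch of steps (2)–(5) follows. It fixes the "usual" preference function ($p_j(a, b) = 1$ if $a$ strictly beats $b$ on criterion $j$, else 0), which is only one of the six preference-function types a DM may choose, and assumes the matrix is already oriented so that larger values are better:

```python
import numpy as np

def promethee_ii(X, weights):
    """Net outranking flows (eqs. (18)-(21)) with the usual preference function.

    X: (n alternatives x k criteria), larger is better on every criterion.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = np.asarray(weights, dtype=float)
    pi = np.zeros((n, n))                        # aggregated preference indices pi(a, b)
    for a in range(n):
        for b in range(n):
            if a != b:
                pi[a, b] = w[X[a] > X[b]].sum()  # sum of weights where a beats b
    phi_pos = pi.sum(axis=1) / (n - 1)           # positive flow, eq. (19)
    phi_neg = pi.sum(axis=0) / (n - 1)           # negative flow, eq. (20)
    return phi_pos - phi_neg                     # net flow, eq. (21)
```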

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine external clustering measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are generated on a computer with an Intel Core i5-3210M CPU at 2.50 GHz with 8 GB of memory. Before the external measures are introduced, the contingency table is described.

3.3.1. The Contingency Table. Given a data set $D$ with $n$ objects, suppose we have a partition $P = \{P_1, P_2, \ldots, P_k\}$ produced by some clustering method, where $\cup_{i=1}^{k} P_i = D$ and $P_i \cap P_j = \emptyset$ for $1 \le i \ne j \le k$. According to the preassigned class labels, we can create another partition $C = \{C_1, C_2, \ldots, C_k\}$, where $\cup_{i=1}^{k} C_i = D$ and $C_i \cap C_j = \emptyset$ for $1 \le i \ne j \le k$. Let $n_{ij}$ denote the number of objects in cluster $P_i$ with the label of class $C_j$. Then the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the nine external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, can measure the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

$$E = -\sum_i \frac{n_i}{n} \sum_j \frac{n_{ij}}{n_i} \log\left( \frac{n_{ij}}{n_i} \right). \tag{22}$$

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

$$P = \sum_i \frac{n_i}{n} \max_j\left( \frac{n_{ij}}{n_i} \right). \tag{23}$$

A higher purity value usually represents more effective clustering.
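Both measures follow directly from the contingency table of Section 3.3.1. A small illustrative sketch (not the authors' implementation):

```python
import numpy as np

def entropy_purity(contingency):
    """Entropy (eq. (22)) and purity (eq. (23)); contingency[i][j] = n_ij."""
    N = np.asarray(contingency, dtype=float)
    n = N.sum()
    n_i = N.sum(axis=1)                        # cluster sizes n_i
    p = N / n_i[:, None]                       # n_ij / n_i
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # 0 log 0 treated as 0
    entropy = -(n_i / n) @ plogp.sum(axis=1)
    purity = (n_i / n) @ p.max(axis=1)
    return entropy, purity
```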

(3) F-Measure. The F-measure (FM) is the harmonic mean of precision and recall and is commonly considered a measure of clustering accuracy [103]. Its calculation is inspired by the information-retrieval metric:

$$F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \quad \text{precision} = \frac{n_{ij}}{n_j}, \quad \text{recall} = \frac{n_{ij}}{n_i}. \tag{24}$$

A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It obtains a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) $\alpha(C_j)$: the number of objects correctly assigned to class $C_j$;

(2) $\beta(C_j)$: the number of objects incorrectly assigned to class $C_j$.

The MAP measure is computed as

$$\text{MAP} = \frac{\sum_j \alpha(C_j)}{\sum_j \left( \alpha(C_j) + \beta(C_j) \right)}. \tag{25}$$

A higher MAP value indicates more accurate clustering.

(5) Mirkin Metric. The Mirkin metric (MM) assumes the value zero for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

$$M = \sum_i n_i^2 + \sum_j n_j^2 - 2 \sum_i \sum_j n_{ij}^2. \tag{26}$$

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that $C$ is a clustering structure of the data set and $P$ is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: both points belong to the same cluster of the clustering structure $C$ and to the same group of the partition $P$;

(ii) SD: the points belong to the same cluster of $C$ and to different groups of $P$;

(iii) DS: the points belong to different clusters of $C$ and to the same group of $P$;

(iv) DD: the points belong to different clusters of $C$ and to different groups of $P$.

Assume that $a$, $b$, $c$, and $d$ are the numbers of SS, SD, DS, and DD pairs, respectively, and let $M = a + b + c + d$ be the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between $C$ and $P$ can then be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as

$$R = \frac{a + d}{M}. \tag{27}$$

A higher value of RI indicates a more accurate clustering result.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as

$$J = \frac{a}{a + b + c}. \tag{28}$$

A higher value of JC indicates a more accurate clustering result.

(8) Fowlkes–Mallows Index. The Fowlkes–Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as

$$\mathrm{FMI} = \sqrt{\frac{a}{a + b} \cdot \frac{a}{a + c}}. \tag{29}$$

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. An ARI value close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110]. The measure of ARI is computed as

$$\mathrm{ARI} = \frac{a - \left( (a + c)(a + b)/M \right)}{\left( (a + c) + (a + b) \right)/2 - \left( (a + c)(a + b)/M \right)}. \tag{30}$$

A higher value of ARI indicates more accurate clustering.
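The four pair-counting measures can be computed directly from the SS/SD/DS/DD counts defined above. The following brute-force sketch (illustrative; names are ours) takes true class labels `C` and cluster labels `P`:

```python
from itertools import combinations

def pair_counts(C, P):
    """Count SS, SD, DS, DD pairs between class labels C and cluster labels P."""
    a = b = c = d = 0
    for i, j in combinations(range(len(C)), 2):
        same_C = C[i] == C[j]
        same_P = P[i] == P[j]
        if same_C and same_P:
            a += 1   # SS
        elif same_C:
            b += 1   # SD
        elif same_P:
            c += 1   # DS
        else:
            d += 1   # DD
    return a, b, c, d

def pair_measures(C, P):
    """RI, JC, FMI, and ARI per equations (27)-(30)."""
    a, b, c, d = pair_counts(C, P)
    M = a + b + c + d
    ri = (a + d) / M
    jc = a / (a + b + c)
    fmi = ((a / (a + b)) * (a / (a + c))) ** 0.5
    expected = (a + c) * (a + b) / M                     # chance-corrected term
    ari = (a - expected) / (((a + c) + (a + b)) / 2 - expected)
    return ri, jc, fmi, ari
```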

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112–114]. It can elicit a corresponding priority vector from pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as shown by Bana e Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps.

Table 1: Contingency table.

                         Partition C
               C_1     C_2     ...     C_k   |   Σ
Partition P
    P_1        n_11    n_12    ...     n_1k  |   n_{1·}
    P_2        n_21    n_22    ...     n_2k  |   n_{2·}
    ...        ...     ...     ...     ...   |   ...
    P_k        n_k1    n_k2    ...     n_kk  |   n_{k·}
    Σ          n_{·1}  n_{·2}  ...     n_{·k}|   n


(1) Establish a hierarchical structure: a complex problem can be structured into a hierarchy comprising the goal level, the criteria level, and the alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts to determine the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
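Steps (3) and (4) can be sketched with a principal-eigenvector computation. This is an illustrative sketch, not the authors' implementation; the random-index values are the commonly cited Saaty table, and the function name is ours:

```python
import numpy as np

# Saaty's random consistency indices for matrix orders 1-9 (commonly cited values)
RI_TABLE = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise):
    """Index weights via the principal eigenvector, plus the consistency ratio (CR)."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                     # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                    # normalized priority vector
    ci = (eigvals[k].real - n) / (n - 1)            # consistency index
    cr = ci / RI_TABLE[n] if RI_TABLE[n] else 0.0   # CR; acceptable if <= 0.1
    return w, cr
```

For a perfectly consistent reciprocal matrix, lambda_max equals n, so CI and CR are zero.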

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, which are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, namely, EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously; hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, with the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values from their scores. We randomly assign each MCDM method to five UCI data sets. Applying more than one MCDM method to analyze and evaluate the performance of clustering algorithms is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences or even conflicts in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, which is based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives in a lower position and an upper position, respectively.

It is well known that the eighty-twenty rule states that, in most situations, eighty percent of the results originate in twenty percent of the activity [58]. The rule can be credited to Vilfredo Pareto [56], who observed that, in most countries, eighty percent of the wealth is usually controlled by twenty percent of the people [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. So the eighty-twenty rule introduced in Section 5 can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The eighty-twenty rule indicates that the twenty percent of people who are creating eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from this twenty percent of people, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, which represents the more satisfactory rankings in the opinion of all individual participants involved in the algorithm evaluation process, and the bottom 1/5 is marked in a lower position, which represents the more dissatisfactory rankings. The element marked in the upper position is calculated as follows:

x = n × (1/5),  (31)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position, which are considered the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marked in the lower position is calculated as

x = n × (4/5),  (32)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position classifies the ranking, where the sixth and seventh positions are the alternatives in the lower position, which are considered collectively as the worst and most dissatisfactory alternatives.
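In code, the two cutoff positions of equations (31) and (32) can be obtained with a ceiling, which matches the rounding in the worked examples (1.4 to 2 and 5.6 to 6); the ceiling is our reading of the "≈" in the text, not something the paper states explicitly.

```python
import math

def position_cutoffs(n):
    """Return (upper, lower): the last position inside the upper set (top 1/5)
    and the first position inside the lower set (bottom 1/5) among n alternatives."""
    upper = math.ceil(n * 1 / 5)  # eq. (31)
    lower = math.ceil(n * 4 / 5)  # eq. (32)
    return upper, lower
```

Here `position_cutoffs(7)` gives `(2, 6)` and `position_cutoffs(6)` gives `(2, 5)`, reproducing the examples used in this section.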

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and the upper position, respectively.

8 Complexity

The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, …, and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, …, and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, f_i, which represents the most satisfactory ranking in the opinions of all individual participants, can be determined as

f_i = b_i - d_i,  (33)

where a higher value of f_i implies a higher priority.
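Steps 1–3 can be combined into one routine. The sketch below assumes each data set contributes one complete ranking from best to worst, uses a ceiling for the "≈" rounding in equations (31) and (32), and ignores ties (which, as in Tables 5–8, would need extra handling):

```python
import math

def dmseca_priorities(rankings):
    """rankings: one list per data set, ordering algorithm names from
    position 1 (best) to position n (worst). Returns {algorithm: f_i}."""
    n = len(rankings[0])
    upper = math.ceil(n * 1 / 5)          # size of the upper set, eq. (31)
    lower = n - math.ceil(n * 4 / 5) + 1  # size of the lower set, eq. (32)
    b = {a: 0 for a in rankings[0]}       # upper-position scores
    d = {a: 0 for a in rankings[0]}       # lower-position scores
    for r in rankings:
        for p in range(upper):            # position 1 scores `upper`, ..., down to 1
            b[r[p]] += upper - p
        for k in range(lower):            # last position scores `lower`, ..., down to 1
            d[r[n - 1 - k]] += lower - k
    return {a: b[a] - d[a] for a in b}    # f_i = b_i - d_i, eq. (33)
```

With two toy rankings of six algorithms, an algorithm that twice lands in the upper set accumulates a positive f_i, one that twice lands in the lower set a negative f_i, and one never marked stays at zero.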

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics (data set characteristics, attribute characteristics, number of instances, number of attributes, and area) are summarized in Table 2, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), the Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), the Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), the Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), the Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), the Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), the Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), the Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), the Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and the Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the information on these data sets. Together, they comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as the life sciences, business, the physical sciences, the social sciences, and CS/engineering. The data sets have a variety of data structures. Their sizes range from 100 to 4601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows:

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of the clustering algorithms, generating a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare the target data sets: data preprocessing to delete the class labels of the original data sets.
Step 2: obtain clustering solutions: obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to five of the UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. The consensus on different or even conflicting individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures for the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we have selected the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4, where the optimal result of each external measure is highlighted in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by weighted arithmetic mean aggregation, which is a widely used aggregation algorithm in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
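The weighted arithmetic mean used to aggregate the three experts' AHP weight vectors is a component-wise average when the experts are equally important. A minimal sketch (the two-expert, two-criterion vectors in the test are hypothetical):

```python
def aggregate_weights(expert_weights, importance=None):
    """Weighted arithmetic mean of per-expert criterion weight vectors."""
    k = len(expert_weights)
    if importance is None:
        importance = [1.0 / k] * k  # equal expert importance by default
    m = len(expert_weights[0])
    return [sum(importance[e] * expert_weights[e][j] for e in range(k))
            for j in range(m)]
```

Because each expert's vector sums to 1 and the importance coefficients sum to 1, the aggregated vector again sums to 1, as the nine final index weights above do.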

The results in Tables 5–8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms. They indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile the individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives in a lower position and an upper position, respectively. According to equations (31) and (32), for the upper position, we know that n = 6 and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position. Similarly, for the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking, where the fifth and sixth positions are those alternatives in the lower position. The two sets of alternatives in the lower position and the upper position can thus be marked; they are presented in boldface in Table 9, based on Tables 5–8.

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, b_i, can be totaled. Similarly, the scores of the alternatives in the lower position, d_i, can be totaled. The results for the 20 UCI data sets are presented in Table 10.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set                       No.  Area               Instances  Attributes  Classes
Liver Disorders                1    Life sciences      345        7           2
ZOO                            2    Life sciences      101        17          2
Pima Indians Diabetes          3    Life sciences      768        8           2
Wholesale Customers            4    Business           440        8           2
Haberman's Survival            5    Life sciences      306        3           2
Wine                           6    Physical sciences  178        13          3
Balance Scale                  7    Social sciences    625        4           3
Breast Tissue                  8    Life sciences      106        10          6
Ecoli                          9    Life sciences      336        8           8
Fertility                      10   Life sciences      100        10          2
Ionosphere                     11   Physical sciences  351        34          2
Iris                           12   Life sciences      150        4           3
Teaching Assistant Evaluation  13   Other              151        5           3
Blood Transfusion              14   Business           748        5           2
Spambase                       15   CS/Engineering     4601       57          2
Page Blocks Classification     16   CS/Engineering     5473       10          5
Sonar                          17   Physical sciences  208        60          2
Contraceptive Method Choice    18   Life sciences      1473       9           3
Dermatology                    19   Life sciences      366        33          6
Yeast Data                     20   Life sciences      1484       8           10
Total                                                  18310      313         70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.9003  0.0331  0.1109  0.5897  0.0001  0.5689   0.7411  0.9003  0.4839
FF    0.6638  0.0506  0.3859  0.8091  0.0011  0.7705   0.8747  0.6638  0.3089
FC    0.9117  0.0296  0.0999  0.5954  0.0001  0.5774   0.7484  0.9117  0.4818
HC    0.6439  0.0356  0.4020  0.8177  0.0012  0.7785   0.8819  0.6439  0.2982
MD    0.8746  0.0408  0.1339  0.5783  0.0001  0.5502   0.7250  0.8746  0.4877
KM    0.9117  0.0299  0.0994  0.5983  0.0001  0.5791   0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.1748  0.1670  0.1579  0.1589  0.1666  0.1596   0.1619  0.1748  0.1608
FF    0.1514  0.1655  0.1833  0.1816  0.1667  0.1803   0.1757  0.1514  0.1778
FC    0.1761  0.1672  0.1570  0.1595  0.1666  0.1604   0.1627  0.1761  0.1610
HC    0.1495  0.1668  0.1850  0.1825  0.1667  0.1812   0.1765  0.1495  0.1788
MD    0.1721  0.1663  0.1598  0.1578  0.1666  0.1578   0.1604  0.1721  0.1605
KM    0.1761  0.1672  0.1570  0.1597  0.1666  0.1606   0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets (value, with rank in parentheses).

      ZOO         Balance Scale  Teaching Assistant Evaluation  Spambase    Yeast Data
EM    0.1677 (2)  0.1701 (1)     0.1547 (6)                     0.1650 (6)  0.1719 (2)
FF    0.1653 (5)  0.1651 (3)     0.1684 (4)                     0.1652 (4)  0.1790 (1)
FC    0.1677 (2)  0.1648 (5)     0.1727 (1)                     0.1695 (1)  0.1644 (5)
HC    0.1638 (6)  0.1701 (1)     0.1595 (5)                     0.1652 (4)  0.1560 (3)
MD    0.1676 (4)  0.1650 (4)     0.1721 (3)                     0.1656 (3)  0.1645 (4)
KM    0.1679 (1)  0.1648 (5)     0.1727 (1)                     0.1695 (1)  0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets (value, with rank in parentheses).

      Pima Indians Diabetes  Wholesale Customers  Wine        Ecoli       Ionosphere
EM    0.0866 (6)             0.1792 (4)           0.1859 (5)  0.1991 (2)  0.1797 (3)
FF    0.1102 (5)             0.1019 (6)           0.0661 (6)  0.3061 (1)  0.1427 (5)
FC    0.2019 (1)             0.2053 (1)           0.1870 (1)  0.1315 (5)  0.1858 (2)
HC    0.2019 (1)             0.1028 (5)           0.1870 (1)  0.0962 (6)  0.1406 (6)
MD    0.1974 (4)             0.2055 (1)           0.1870 (1)  0.1335 (4)  0.1646 (4)
KM    0.2019 (1)             0.2053 (1)           0.1870 (1)  0.1336 (3)  0.1865 (1)

Table 7: Rankings of GRA for the five assigned UCI data sets (value, with rank in parentheses).

      Breast Tissue  Fertility   Iris        Contraceptive Method Choice  Dermatology
EM    0.1672 (4)     0.1379 (4)  0.1325 (6)  0.1850 (3)                   0.1771 (3)
FF    0.1378 (6)     0.2142 (2)  0.1712 (2)  0.1366 (5)                   0.1643 (5)
FC    0.1804 (3)     0.1362 (6)  0.1712 (2)  0.1857 (1)                   0.1811 (1)
HC    0.1499 (5)     0.2321 (1)  0.1825 (1)  0.1229 (6)                   0.1214 (6)
MD    0.1819 (2)     0.1416 (3)  0.1712 (2)  0.1842 (4)                   0.1750 (4)
KM    0.1828 (1)     0.1379 (4)  0.1712 (2)  0.1857 (1)                   0.1811 (1)


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, which correspond to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each clustering algorithm attains each ranking position can be determined according to Tables 5–8. For example, for ranking 1 (the upper position), the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of PROMETHEE II for the five assigned UCI data sets (value, with rank in parentheses).

      Liver Disorders  Haberman's Survival  Blood Transfusion Service Center  Page Blocks Classification  Sonar
EM    0.1654 (5)       0.1133 (6)           0.1088 (6)                        0.1252 (6)                  0.1644 (3)
FF    0.1688 (1)       0.1766 (4)           0.1815 (4)                        0.1867 (5)                  0.1618 (4)
FC    0.1667 (3)       0.1780 (1)           0.1906 (1)                        0.1413 (3)                  0.1609 (5)
HC    0.1645 (6)       0.1780 (1)           0.1906 (1)                        0.2371 (1)                  0.1749 (2)
MD    0.1679 (2)       0.1762 (5)           0.1380 (5)                        0.1685 (2)                  0.1770 (1)
KM    0.1667 (3)       0.1780 (1)           0.1906 (1)                        0.1413 (3)                  0.1609 (5)

Table 9: Rankings of the four MCDM methods for the 20 UCI data sets.

Rank  ZOO  Balance Scale  Teaching Assistant Evaluation  Spambase  Yeast Data  Pima Indians Diabetes  Wholesale Customers
1     KM   EM             FC                             FC        FF          KM                     FC
2     FC   HC             KM                             KM        EM          FC                     KM
3     EM   FF             MD                             MD        HC          HC                     MD
4     MD   MD             FF                             FF        MD          MD                     EM
5     FF   FC             HC                             HC        FC          FF                     HC
6     HC   KM             EM                             EM        KM          EM                     FF

Rank  Wine  Ecoli  Ionosphere  Breast Tissue  Fertility  Iris  Contraceptive Method Choice
1     FC    FF     KM          KM             HC         HC    KM
2     KM    EM     FC          MD             FF         KM    FC
3     MD    KM     EM          FC             MD         FC    EM
4     HC    MD     MD          EM             KM         FF    MD
5     EM    FC     FF          HC             EM         MD    FF
6     FF    HC     HC          FF             FC         EM    HC

Rank  Dermatology  Liver Disorders  Haberman's Survival  Blood Transfusion Service  Page Blocks Classification  Sonar
1     FC           FF               KM                   KM                         HC                          MD
2     KM           MD               FC                   FC                         MD                          HC
3     EM           FC               HC                   HC                         KM                          EM
4     MD           KM               FF                   FF                         FC                          FF
5     FF           EM               MD                   MD                         FF                          FC
6     HC           HC               EM                   EM                         EM                          KM

Table 10: Priority of each alternative.

Position  1st  2nd  b_i  5th  6th  d_i  f_i  Ranking
Score     2    1         1    2
EM        1    2    4    3    7    17   -13  6
FF        3    1    7    6    3    12   -5   4
FC        5    6    16   4    1    6    10   2
HC        3    2    8    4    6    16   -8   5
MD        1    3    5    3    0    3    2    3
KM        7    6    20   0    3    6    14   1
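As a check on the arithmetic, the b_i, d_i, and f_i columns of Table 10 follow directly from the position counts and the score row (2 for a 1st or 6th place, 1 for a 2nd or 5th place):

```python
# (#1st, #2nd, #5th, #6th) counts per algorithm, taken from Table 10
counts = {
    "EM": (1, 2, 3, 7), "FF": (3, 1, 6, 3), "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6), "MD": (1, 3, 3, 0), "KM": (7, 6, 0, 3),
}
priority = {}
for alg, (c1, c2, c5, c6) in counts.items():
    b = 2 * c1 + 1 * c2    # upper-position score b_i
    d = 1 * c5 + 2 * c6    # lower-position score d_i
    priority[alg] = b - d  # f_i = b_i - d_i
```

This reproduces the f_i column exactly: KM scores highest (14) and EM lowest (-13).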

Table 11: Statistical analysis of the rankings for all 20 UCI data sets.

           Ranking
Algorithm  1    2    3    4    5    6
EM         1    3    4    3    2    7
FF         3    2    1    5    6    3
FC         9    3    3    0    4    1
HC         8    1    1    1    3    6
MD         3    4    3    8    2    0
KM         12   1    3    1    2    1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who are creating eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from this twenty percent of people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of a variety of types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is conducted to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for the 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but can also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be further developed, which can deal with many more methods/models/algorithms, such as general clustering, the theory of subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework with improved universality of application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification (D), Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319–320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y Peng Y Zhang G Kou J Li and Y Shi ldquoMulti-criteriadecision making approach for cluster validationrdquo in Pro-ceedings of the International Conference on ComputationalScience pp 1283ndash1291 Omaha NE USA 2012

[53] P Meyer and A-L Olteanu ldquoFormalizing and solving theproblem of clustering in MCDArdquo European Journal ofOperational Research vol 227 no 3 pp 494ndash502 2013

[54] L Chen Z Xu H Wang and S Liu ldquoAn ordered clusteringalgorithm based on K-means and the PROMETHEEmethodrdquo International Journal of Machine Learning andCybernetics vol 9 no 6 pp 917ndash926 2018

[55] H AMahdiraji E Kazimieras Zavadskas A Kazeminia andA Abbasi Kamardi ldquoMarketing strategies evaluation basedon big data analysis a CLUSTERING-MCDM approachrdquoEconomic Research-Ekonomska Istrazivanja vol 32 no 1pp 2882ndash2898 2019

[56] V Pareto Cours drsquoEconomie Politique Droz GenevaSwitzerland 1896

[57] B Franz Pareto John Wiley amp Sons New York NY USA1936

[58] R Cirillo ldquoWas Vilfredo Pareto really a ldquoprecursorrdquo offascismrdquo 9e American Journal of Economics and Sociologyvol 42 no 2 pp 235ndash246 2006

[59] R Xu and D WunschII ldquoSurvey of clustering algorithmsrdquoIEEE Transactions on Neural Networks vol 16 no 3pp 645ndash678 2005

[60] J Wu J Chen H Xiong and M Xie ldquoExternal validationmeasures for K-means clustering a data distribution per-spectiverdquo Expert Systems with Applications vol 36 no 3pp 6050ndash6061 2009

[61] Z Wang Z Xu S Liu and J Tang ldquoA netting clusteringanalysis method under intuitionistic fuzzy environmentrdquoApplied Soft Computing vol 11 no 8 pp 5558ndash5564 2011

[62] S Askari ldquoA novel and fast MIMO fuzzy inference systembased on a class of fuzzy clustering algorithms with inter-pretability and complexity analysisrdquo Expert Systems withApplications vol 84 pp 301ndash322 2017

[63] Q Li M Guindani B J Reich H D Bondell andM Vannucci ldquoA Bayesian mixture model for clustering andselection of feature occurrence rates under mean con-straintsrdquo Statistical Analysis and DataMining9e ASADataScience Journal vol 10 no 6 pp 393ndash409 2017

[64] A K Paul and P C Shill ldquoNew automatic fuzzy relationalclustering algorithms using multi-objective NSGA-IIrdquo In-formation Sciences vol 448-449 pp 112ndash133 2018

[65] J Han and M Kamber Data Mining Concepts andTechniques Morgan Kaufmann San Francisco CA USA2nd edition 2006

[66] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann 2ndedition 2005

[67] D S Hochbaum and D B Shmoys ldquoA best possible heuristicfor thek-center problemrdquo Mathematics of Operations Re-search vol 10 no 2 pp 180ndash184 1985

[68] G Fayyad and T Krishnan 9e EM Algorithm andExtensions Wiley-Interscience Hoboken NJ USA Secondedition 2008

[69] MHall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquo0e WEKA data mining softwarerdquo ACMSIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash18 2009

[70] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via theEMAlgorithmrdquoJournal of the Royal Statistical Society Series B (Methodo-logical) vol 39 no 1 pp 1ndash22 1977

[71] S M Kumar ldquoAn optimized farthest first clustering algo-rithmrdquo in Proceedings of the 2013 Nirma University Inter-national Conference on Engineering pp 1ndash5 AhmedabadIndia November 2013

[72] S Dasgupta and P M Long ldquoPerformance guarantees forhierarchical clusteringrdquo Journal of Computer and SystemSciences vol 70 no 4 pp 351ndash363 2005

[73] Y Peng and Y Shi ldquoEditorial multiple criteria decisionmaking and operations researchrdquo Annals of OperationsResearch vol 197 no 1 pp 1ndash4 2012

[74] S Hamdan and A Cheaitou ldquoSupplier selection and orderallocation with green criteria anMCDM andmulti-objectiveoptimization approachrdquo Computers amp Operations Researchvol 81 pp 282ndash304 2017

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59–60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor, Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, "Grey-based PROMETHEE II with application to evaluation of source water protection strategies," Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



criteria, can group patterns, where groups are sets of similar patterns [54, 59, 60]. Clustering algorithms are widely applied in many research fields, such as genomics, image segmentation, document retrieval, sociology, bioinformatics, psychology, business intelligence, and financial analysis [61–64].

Clustering algorithms usually fall into four classes: partitioning methods, hierarchical methods, density-based methods, and model-based methods [65]. Several classic clustering algorithms have been proposed and reported, such as the K-means algorithm [66], the k-medoid algorithm [67], expectation maximization (EM) [68], and frequent pattern-based clustering [65]. In this paper, the six most influential clustering algorithms are selected for the empirical study. These are the KM algorithm, EM algorithm, filtered clustering (FC), farthest-first (FF) algorithm, make-density-based clustering (MD), and hierarchical clustering (HC). These clustering algorithms can all be implemented in WEKA [69].

The KM algorithm, a partitioning method, takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high and the intercluster similarity is low. Cluster similarity is measured by the mean value of the objects in a cluster, which can be viewed as the centroid or center of gravity of the cluster [65].
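To make the mechanics concrete, the assign-then-update loop that KM iterates can be sketched as follows. This is a minimal, dependency-free illustration; the toy data set and seed are hypothetical and not taken from the paper's experiments:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means sketch: assign each point to its nearest centroid,
    then recompute each centroid as the mean (center of gravity) of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k initial centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # nearest centroid by squared Euclidean distance
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, clusters

# two well-separated hypothetical groups
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, clusters = kmeans(data, k=2)
```

On this toy input the loop converges to two clusters of two points each, regardless of which two points seed the centroids.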

The EM algorithm, which can be considered an extension of the KM algorithm, is an iterative method for finding the maximum likelihood or maximum a posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables [70]. The KM algorithm assigns each object to exactly one cluster.

In the EM algorithm, by contrast, each object is assigned to each cluster according to a weight representing its probability of membership. In other words, there are no strict boundaries between the clusters. Thus, new means can be computed based on the weighted measures [68].

The FC algorithm applied in this work can be implemented in WEKA [69]. Like the clusterer itself, the structure of the filter is based exclusively on the training data, and test instances are processed by the filter without changing their structure.

The FF algorithm is a fast, greedy, and simple approximation algorithm for the k-center problem [67], in which a first point is selected as a cluster center and the second center is greedily selected as the point farthest from the first. Each remaining center is determined by greedily selecting the point farthest from the set of already chosen centers, and the remaining points are added to the cluster whose center is closest [66, 71].

The MD algorithm is a density-based method. The general idea is to continue growing a given cluster as long as the density (the number of objects or data points) in the neighborhood exceeds some threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain a minimum number of points [65]. The HC algorithm is a method of cluster analysis that seeks to build a hierarchy of clusters, which can create a hierarchical decomposition of the given data sets [66, 72].

3.2. MCDM Methods. The MCDM methods, which were developed in the 1970s, are a complete set of decision analysis technologies that have evolved into an important research field of operations research [73, 74]. The International Society on MCDM defines MCDM as the research of methods and procedures concerning multiple conflicting criteria, which can be formally incorporated into the management planning process [73]. In an MCDM problem, the evaluation criteria are assumed to be independent [75, 76]. MCDM methods aim to assist decision-makers (DMs) in identifying an optimal solution from a number of alternatives by synthesizing objective measurements and value judgments [77, 78]. In this section, four classic MCDM methods, the weighted sum method (WSM), grey relational analysis (GRA), TOPSIS, and PROMETHEE II, are introduced as follows.

3.2.1. WSM. The WSM [79] is a well-known MCDM method for evaluating finite alternatives in terms of finite decision criteria when all the data are expressed in the same unit [80, 81]. The benefit-to-cost-ratio and benefit-minus-cost approaches [82] can be applied to problems involving both benefit and cost criteria. In this paper, the cost criteria are first transformed into benefit criteria. In addition, there is the nominal-the-better (NB) case: the closer the value is to the objective value, the better.

The calculation steps of the WSM are as follows. First, assume n criteria, including benefit criteria and cost criteria, and m alternatives. The cost criteria are converted to benefit criteria in the following standardization process:

(1) The larger-the-better (LB): a larger objective value is better, that is, the benefit criteria, and it can be standardized as

$$x_{ij}' = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \quad (1)$$

(2) The smaller-the-better (SB): a smaller objective value is better, that is, the cost criteria, and it can be standardized as

$$x_{ij}' = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}. \quad (2)$$

(3) The nominal-the-better (NB): a value closer to the objective value $x_{ob}$ is better, and it can be standardized as

$$x_{ij}' = 1 - \frac{\left| x_{ij} - x_{ob} \right|}{\max\left\{ \max_i x_{ij} - x_{ob},\; x_{ob} - \min_i x_{ij} \right\}}. \quad (3)$$

Finally, the total benefit of each alternative can be calculated as

$$A_i = \sum_{j=1}^{n} w_j x_{ij}', \quad 1 \le i \le m,\ 1 \le j \le n. \quad (4)$$

The larger the WSM value, the better the alternative.
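The WSM steps above, equations (1), (2), and (4), can be sketched in a few lines. The decision matrix, weights, and criterion types below are hypothetical illustrations, not values from the paper:

```python
def wsm_scores(matrix, weights, is_benefit):
    """Weighted sum model: standardize each criterion to [0, 1]
    (larger-the-better for benefit columns, smaller-the-better for cost
    columns), then take the weighted sum per alternative (equation (4))."""
    m, n = len(matrix), len(matrix[0])
    cols = list(zip(*matrix))
    std = [[0.0] * n for _ in range(m)]
    for j in range(n):
        lo, hi = min(cols[j]), max(cols[j])
        span = hi - lo or 1.0  # guard against a constant column
        for i in range(m):
            x = matrix[i][j]
            std[i][j] = (x - lo) / span if is_benefit[j] else (hi - x) / span
    return [sum(w * x for w, x in zip(weights, row)) for row in std]

# rows: alternatives; columns: accuracy (benefit) and runtime (cost)
scores = wsm_scores(
    [[0.8, 3.0], [0.6, 1.0], [0.9, 2.0]],
    weights=[0.7, 0.3],
    is_benefit=[True, False],
)
best = max(range(len(scores)), key=scores.__getitem__)
```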

4 Complexity

3.2.2. GRA. GRA is a basic MCDM method of quantitative research and qualitative analysis for system analysis [83]. Based on the grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83, 85–88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.

Assume that the initial matrix is R:

$$R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (5)$$

(1) Standardize the initial matrix:

$$R' = \begin{bmatrix} x_{11}' & x_{12}' & \cdots & x_{1n}' \\ x_{21}' & x_{22}' & \cdots & x_{2n}' \\ \vdots & \vdots & & \vdots \\ x_{m1}' & x_{m2}' & \cdots & x_{mn}' \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (6)$$

(2) Generate the reference sequence $x_0'$:

$$x_0' = \left( x_0'(1), x_0'(2), \ldots, x_0'(n) \right), \quad (7)$$

where $x_0'(j)$ is the largest standardized value of the jth factor.

(3) Calculate the differences $\Delta_{0i}(j)$ between the reference series and the alternative series:

$$\Delta_{0i}(j) = \left| x_0'(j) - x_{ij}' \right|,$$

$$\Delta = \begin{bmatrix} \Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\ \Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\ \vdots & \vdots & & \vdots \\ \Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n) \end{bmatrix}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (8)$$

(4) Calculate the grey coefficient $r_{0i}(j)$:

$$r_{0i}(j) = \frac{\min_i \min_j \Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}{\Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}, \quad (9)$$

where $\delta$ is a distinguishing coefficient; its value is generally set to 0.5 to provide good stability.

(5) Calculate the grey relational degree $b_i$:

$$b_i = \frac{1}{n} \sum_{j=1}^{n} r_{0i}(j). \quad (10)$$

(6) Finally, standardize the grey relational degree to obtain $\beta_i$:

$$\beta_i = \frac{b_i}{\sum_{i=1}^{m} b_i}. \quad (11)$$
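Steps (2)-(6) of GRA can be sketched compactly, assuming the input matrix has already been standardized as in step (1). The example matrix is hypothetical, and the sketch assumes the matrix is not constant (so the maximum difference is nonzero):

```python
def gra_degrees(std_matrix, delta=0.5):
    """Grey relational analysis on an already-standardized matrix.
    The reference series takes the column-wise maximum (step 2), the grey
    coefficient uses distinguishing coefficient delta = 0.5 (step 4), and
    the degrees are averaged and normalized to sum to 1 (steps 5-6)."""
    m, n = len(std_matrix), len(std_matrix[0])
    ref = [max(col) for col in zip(*std_matrix)]                       # step 2
    diffs = [[abs(ref[j] - std_matrix[i][j]) for j in range(n)]
             for i in range(m)]                                        # step 3
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    coeff = [[(d_min + delta * d_max) / (d + delta * d_max) for d in row]
             for row in diffs]                                         # step 4
    b = [sum(row) / n for row in coeff]                                # step 5
    total = sum(b)
    return [bi / total for bi in b]                                    # step 6

betas = gra_degrees([[1.0, 0.5], [0.0, 1.0], [0.8, 0.0]])
```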

3.2.3. TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods for ranking alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS finds the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can then be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93]:

(1) The decision matrix A is standardized:

$$a_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (12)$$

(2) The weighted standardized decision matrix is computed:

$$D = (v_{ij}) = (a_{ij} \cdot w_j), \quad 1 \le i \le m,\ 1 \le j \le n, \quad (13)$$

where the $w_j$ are the criteria weights and $\sum_{j=1}^{n} w_j = 1$.

(3) The PIS $V^*$ and the NIS $V^-$ are calculated:

$$V^* = \left\{ v_1^*, v_2^*, \ldots, v_n^* \right\} = \left\{ \left( \max_i v_{ij} \mid j \in J \right), \left( \min_i v_{ij} \mid j \in J' \right) \right\},$$

$$V^- = \left\{ v_1^-, v_2^-, \ldots, v_n^- \right\} = \left\{ \left( \min_i v_{ij} \mid j \in J \right), \left( \max_i v_{ij} \mid j \in J' \right) \right\}, \quad (14)$$

where J is the set of benefit criteria and J' is the set of cost criteria.

(4) The distances of each alternative from the PIS and the NIS are determined:

$$S_i^+ = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^* \right)^2}, \qquad S_i^- = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^- \right)^2}, \quad 1 \le i \le m. \quad (15)$$

(5) The relative closeness to the ideal solution is obtained:

$$Y_i = \frac{S_i^-}{S_i^+ + S_i^-}, \quad 1 \le i \le m, \quad (16)$$

where the closer $Y_i$ is to 1, the closer the alternative is to the ideal solution.

(6) The preference order is ranked. The larger the relative closeness, the better the alternative.
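The six TOPSIS steps can be condensed into a short routine. The decision matrix, weights, and benefit/cost flags below are hypothetical illustrations:

```python
def topsis(matrix, weights, is_benefit):
    """TOPSIS sketch: vector-normalize (12), weight (13), find the
    positive/negative ideal solutions (14), compute distances (15), and
    return the relative closeness (16) for each alternative."""
    m, n = len(matrix), len(matrix[0])
    norms = [sum(matrix[i][j] ** 2 for i in range(m)) ** 0.5 for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    pis = [max(v[i][j] for i in range(m)) if is_benefit[j]
           else min(v[i][j] for i in range(m)) for j in range(n)]
    nis = [min(v[i][j] for i in range(m)) if is_benefit[j]
           else max(v[i][j] for i in range(m)) for j in range(n)]
    closeness = []
    for row in v:
        s_plus = sum((x - p) ** 2 for x, p in zip(row, pis)) ** 0.5
        s_minus = sum((x - q) ** 2 for x, q in zip(row, nis)) ** 0.5
        closeness.append(s_minus / (s_plus + s_minus))
    return closeness

# rows: alternatives; columns: accuracy (benefit) and runtime (cost)
scores = topsis([[0.9, 2.0], [0.7, 1.0], [0.8, 3.0]], [0.6, 0.4], [True, False])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```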

3.2.4. PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and "valued outranking relations" to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95, 96]. In the PROMETHEE II method, a positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51, 97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows:

(1) Normalize the decision matrix R:

$$R_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, \quad 1 \le i \le m,\ 1 \le j \le n. \quad (17)$$

(2) Define the aggregated preference indices. Let $a, b \in A$, and

$$\pi(a,b) = \sum_{j=1}^{k} p_j(a,b)\, w_j, \qquad \pi(b,a) = \sum_{j=1}^{k} p_j(b,a)\, w_j, \quad (18)$$

where A is a finite set of alternatives $\{a_1, a_2, \ldots, a_n\}$, k is the number of criteria, $w_j$ is the weight of criterion j, and $\sum_{j=1}^{k} w_j = 1$. $\pi(a,b)$ represents how a is preferred to b over all criteria, and $\pi(b,a)$ represents how b is preferred to a over all criteria; $p_j(a,b)$ and $p_j(b,a)$ are the preference functions of the alternatives a and b.

(3) Calculate $\pi(a,b)$ and $\pi(b,a)$ for each pair of alternatives. In general, there are six types of preference functions. DMs must select one type of preference function and the corresponding parameter value for each criterion [51, 98].

(4) Determine the positive and negative outranking flows. The positive outranking flow is determined by

$$\phi^+(a) = \frac{1}{n-1} \sum_{x \in A} \pi(a,x), \quad (19)$$

and the negative outranking flow is determined by

$$\phi^-(a) = \frac{1}{n-1} \sum_{x \in A} \pi(x,a). \quad (20)$$

(5) Calculate the net outranking flow:

$$\phi(a) = \phi^+(a) - \phi^-(a). \quad (21)$$

(6) Determine the ranking according to the net outranking flow. The larger $\phi(a)$ is, the more appropriate the alternative.
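A minimal sketch of PROMETHEE II, assuming the "usual" preference function (p_j(a, b) = 1 if a beats b on criterion j, else 0), which is just one of the six preference-function types mentioned in step (3). The normalized matrix and weights are hypothetical:

```python
def promethee_ii(matrix, weights):
    """PROMETHEE II sketch with the 'usual' preference function on an
    already-normalized benefit matrix; returns the net outranking flow
    phi (equation (21)) for each alternative."""
    m = len(matrix)

    def pi(a, b):  # aggregated preference of alternative a over b, eq. (18)
        return sum(w * (1.0 if xa > xb else 0.0)
                   for w, xa, xb in zip(weights, matrix[a], matrix[b]))

    phi = []
    for a in range(m):
        plus = sum(pi(a, x) for x in range(m) if x != a) / (m - 1)   # eq. (19)
        minus = sum(pi(x, a) for x in range(m) if x != a) / (m - 1)  # eq. (20)
        phi.append(plus - minus)                                     # eq. (21)
    return phi

phi = promethee_ii([[1.0, 0.4], [0.2, 1.0], [0.6, 0.0]], [0.5, 0.5])
best = max(range(len(phi)), key=phi.__getitem__)
```

Note that the net flows always sum to zero, since every positive flow of one alternative is a negative flow of another.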

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine clustering external measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are generated on a computer with an Intel Core i5-3210M CPU at 2.50 GHz with 8 GB of memory. Before the external measures are introduced, the contingency table is described.

3.3.1. The Contingency Table. Given a data set D with n objects, suppose we have a partition $P = \{P_1, P_2, \ldots, P_k\}$ produced by some clustering method, where $\cup_{i=1}^{k} P_i = D$ and $P_i \cap P_j = \emptyset$ for $1 \le i \ne j \le k$. According to the preassigned class labels, we can create another partition $C = \{C_1, C_2, \ldots, C_k\}$, where $\cup_{i=1}^{k} C_i = D$ and $C_i \cap C_j = \emptyset$ for $1 \le i \ne j \le k$. Let $n_{ij}$ denote the number of objects in cluster $P_i$ with the label of class $C_j$. Then the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the nine external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, measures the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

$$E = -\sum_i \frac{n_i}{n} \left( \sum_j \frac{n_{ij}}{n_i} \log \frac{n_{ij}}{n_i} \right). \quad (22)$$

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

$$P = \sum_i \frac{n_i}{n} \max_j \left( \frac{n_{ij}}{n_i} \right). \quad (23)$$

A higher purity value usually represents more effective clustering.
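Both entropy (equation (22)) and purity (equation (23)) can be read directly off the contingency table $n_{ij}$. A small sketch with a hypothetical 2-cluster, 2-class table:

```python
from math import log

def entropy_and_purity(contingency):
    """Entropy (22) and purity (23) from a contingency table n[i][j],
    where rows are clusters P_i and columns are true classes C_j."""
    n = sum(sum(row) for row in contingency)
    ent = 0.0
    pur = 0.0
    for row in contingency:
        ni = sum(row)
        if ni == 0:
            continue  # skip empty clusters
        ent -= (ni / n) * sum((nij / ni) * log(nij / ni) for nij in row if nij > 0)
        pur += (ni / n) * (max(row) / ni)
    return ent, pur

# hypothetical table: cluster 1 is pure, cluster 2 has one mislabeled object
ent, pur = entropy_and_purity([[10, 0], [1, 9]])
```

For a perfectly pure clustering such as `[[5, 0], [0, 5]]`, the function returns entropy 0 and purity 1, matching the definitions above.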

(3) F-Measure. The F-measure (FM) is the harmonic mean of precision and recall. It is commonly considered as clustering accuracy [103]. The calculation of FM is inspired by the information-retrieval metric, as follows:

$$F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \qquad \text{precision} = \frac{n_{ij}}{n_j}, \qquad \text{recall} = \frac{n_{ij}}{n_i}. \quad (24)$$


A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It obtains a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) $\alpha(C_j)$: the number of objects correctly assigned to class $C_j$.

(2) $\beta(C_j)$: the number of objects incorrectly assigned to class $C_j$.

The MAP measure is computed as follows:

$$\text{MAP} = \frac{\sum_j \alpha(C_j)}{\sum_j \left[ \alpha(C_j) + \beta(C_j) \right]}. \quad (25)$$

A higher MAP value indicates more accurate clustering.

(5) Mirkin Metric. The measure of the Mirkin metric (MM) assumes the value zero for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

$$M = \sum_i n_i^2 + \sum_j n_j^2 - 2 \sum_i \sum_j n_{ij}^2. \quad (26)$$

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that a partition C is a clustering structure of the data set and P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS, if both points belong to the same cluster of the clustering structure C and to the same group of the partition P.

(ii) SD, if the points belong to the same cluster of C but to different groups of P.

(iii) DS, if the points belong to different clusters of C but to the same group of P.

(iv) DD, if the points belong to different clusters of C and to different groups of P.

Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

$$R = \frac{a + d}{M}. \quad (27)$$

A higher value of RI indicates a more accurate clustering result.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

$$J = \frac{a}{a + b + c}. \quad (28)$$

A higher value of JC indicates a more accurate clustering result.

(8) Fowlkes–Mallows Index. The Fowlkes–Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

$$\text{FMI} = \sqrt{\frac{a}{a+b} \cdot \frac{a}{a+c}}. \quad (29)$$

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from -1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near -1 indicates almost complete discordance [110]. The measure of ARI is computed as

$$\text{ARI} = \frac{a - \left[ (a+c)(a+b)/M \right]}{\left[ (a+c)+(a+b) \right]/2 - \left[ (a+c)(a+b)/M \right]}. \quad (30)$$

A higher value of ARI indicates more accurate clustering.
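The four pair-counting measures, RI, JC, FMI, and ARI, all derive from the SS/SD/DS/DD counts a, b, c, and d defined above. A brute-force sketch over all point pairs; the label vectors are hypothetical:

```python
from itertools import combinations

def pair_counts(true_labels, cluster_labels):
    """Count SS (a), SD (b), DS (c), and DD (d) pairs between the true
    partition C and the clustering P."""
    a = b = c = d = 0
    for (t1, p1), (t2, p2) in combinations(zip(true_labels, cluster_labels), 2):
        same_c, same_p = t1 == t2, p1 == p2
        if same_c and same_p:
            a += 1
        elif same_c:
            b += 1
        elif same_p:
            c += 1
        else:
            d += 1
    return a, b, c, d

def pair_indices(true_labels, cluster_labels):
    """RI (27), JC (28), FMI (29), and ARI (30) from the pair counts."""
    a, b, c, d = pair_counts(true_labels, cluster_labels)
    M = a + b + c + d
    ri = (a + d) / M
    jc = a / (a + b + c)
    fmi = ((a / (a + b)) * (a / (a + c))) ** 0.5
    expected = (a + c) * (a + b) / M  # chance-corrected term in equation (30)
    ari = (a - expected) / (((a + c) + (a + b)) / 2 - expected)
    return ri, jc, fmi, ari

# a perfect clustering of four hypothetical points: all four indices reach 1
ri, jc, fmi, ari = pair_indices([0, 0, 1, 1], [0, 0, 1, 1])
```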

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112–114]. It can elicit a corresponding priority vector from pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as shown by Bana e Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps:

Table 1: Contingency table.

                          Partition C
                    C_1     C_2    ...    C_k  |  Σ
Partition P   P_1   n_11    n_12   ...    n_1k |  n_1.
              P_2   n_21    n_22   ...    n_2k |  n_2.
              ...   ...     ...           ...  |  ...
              P_k   n_k1    n_k2   ...    n_kk |  n_k.
              Σ     n_.1    n_.2   ...    n_.k |  n


(1) Establish a hierarchical structure: a complex problem is structured into a hierarchy including the goal level, criteria level, and alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts by determining the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
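Steps (3) and (4) can be sketched with power iteration in place of an exact eigensolver. The 3x3 pairwise comparison matrix below is a hypothetical expert scoring (criterion 1 judged 3 times as important as criterion 2 and 5 times as important as criterion 3); the random-index values are Saaty's published constants:

```python
def ahp_weights(pairwise, iters=100):
    """Approximate the AHP priority vector by power iteration on the
    pairwise comparison matrix, then compute the consistency ratio
    CR = CI / RI using Saaty's random index."""
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]  # normalize so the weights sum to 1
    # estimate the principal eigenvalue lambda_max, then CI and CR
    aw = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's RI values
    return w, ci / random_index

matrix = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
weights, cr = ahp_weights(matrix)
```

For this near-consistent matrix the CR comes out well below the 0.1 threshold of step (4), so the weights would be accepted.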

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123], owing to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously. Hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, using the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, which are completed by the scores of the experts. We randomly assign each MCDM method to five UCI data sets. We apply more than one MCDM method to analyze and evaluate the performance of the clustering algorithms, which is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences or even conflicts in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives in a lower position and an upper position, respectively.

It is well known that the eighty-twenty rule states that eighty percent of the results originate in twenty percent of the activity in most situations [58]. The rule can be credited to Vilfredo Pareto [56], who observed that eighty percent of the wealth is usually controlled by twenty percent of the people in most countries [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. So the eighty-twenty rule can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The eighty-twenty rule indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from that twenty percent of people, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, which represents the more satisfactory rankings from the opinion of all individual participants involved in the algorithm evaluation process, and the bottom 1/5 is marked in a lower position, which represents the more dissatisfactory rankings from the opinion of all individual participants. The element marked in the upper position is calculated as follows:

x = n × (1/5), (31)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position, which are considered as the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marked in the lower position is calculated as

x = n × (4/5), (32)

where n is the number of alternatives. For instance, if n = 7, then 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position classifies the ranking, where the sixth and seventh positions in the lower position are considered collectively as the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and the upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, …, and x to the last position. Finally, the score of each alternative in the lower position is totaled, denoted d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, …, and x to the first position. Finally, the score of each alternative in the upper position is totaled, denoted b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, fi, which represents the most satisfactory ranking from the opinions of all individual participants, can be determined as

fi = bi − di, (33)

where a higher value of fi implies a higher priority.
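Steps 1–3 can be sketched as follows. The helper is our own, with per-data-set rank lists as input; the rounding follows the paper's examples (1.4 ≈ 2 and 5.6 ≈ 6, i.e., rounding up).

```python
import math

def dmseca_priority(rankings, n):
    """Priority f_i = b_i - d_i for each algorithm (equations (31)-(33)).

    rankings: dict mapping algorithm -> list of rank positions (1 = best),
              one per data set; n: number of alternatives.
    """
    x = math.ceil(n * 1 / 5)     # upper-position cutoff, eq. (31)
    y = math.ceil(n * 4 / 5)     # lower-position cutoff, eq. (32)
    out = {}
    for alg, ranks in rankings.items():
        b = sum(x - r + 1 for r in ranks if r <= x)   # 1st scores x, ..., x-th scores 1
        d = sum(r - y + 1 for r in ranks if r >= y)   # y-th scores 1, ..., n-th scores n-y+1
        out[alg] = b - d                              # f_i, eq. (33)
    return out

# EM's 20 rank positions, read off Table 9 column by column:
em_ranks = [3, 1, 6, 6, 2, 6, 4, 5, 2, 3, 4, 5, 6, 3, 3, 5, 6, 6, 6, 3]
print(dmseca_priority({"EM": em_ranks}, n=6))   # {'EM': -13}, matching Table 10
```

With n = 6, the cutoffs are x = 2 and y = 5, so EM's one first place and two second places give b = 4, while its three fifth places and seven sixth places give d = 17, hence f = −13.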

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. This is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics include data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the data information of these data sets. These data sets comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/engineering. The data sets have a variety of data structures. Their sizes range from 100 to 5473, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms for the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design can be implemented as follows.

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of clustering algorithms, to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare target data sets: data preprocessing to delete the class labels of the original data sets.
Step 2: obtain clustering solutions: obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, which is scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to one group of five UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. The consensus on different or even conflicting individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.
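The paper implements the MCDM ranking of Step 5 in MATLAB 7.0. As a language-neutral illustration, a minimal TOPSIS sketch (benefit criteria only; the function name and toy data are ours):

```python
import math

def topsis(X, w):
    """TOPSIS closeness coefficients for a decision matrix X
    (rows = alternatives, columns = benefit criteria) and weights w."""
    m, k = len(X), len(X[0])
    norm = [math.sqrt(sum(X[i][j] ** 2 for i in range(m))) for j in range(k)]
    V = [[w[j] * X[i][j] / norm[j] for j in range(k)] for i in range(m)]
    ideal = [max(row[j] for row in V) for j in range(k)]    # positive ideal solution
    anti = [min(row[j] for row in V) for j in range(k)]     # negative ideal solution
    scores = []
    for row in V:
        d_pos = math.sqrt(sum((row[j] - ideal[j]) ** 2 for j in range(k)))
        d_neg = math.sqrt(sum((row[j] - anti[j]) ** 2 for j in range(k)))
        scores.append(d_neg / (d_pos + d_neg))              # higher = closer to ideal
    return scores

# Toy example: the third alternative dominates and gets closeness 1.0
scores = topsis([[1, 2], [2, 1], [2, 2]], [0.5, 0.5])
```

Ranking the closeness coefficients in descending order yields the per-data-set ranks reported in Tables 5–8; WSM, GRA, and PROMETHEE II would replace only the scoring step.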

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures of the 20 data sets can be obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we have selected the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by the weighted arithmetic mean for aggregation, which is a widely used aggregation algorithm in decision problems. The final index weights for the nine external measures, in the order given in Tables 3 and 4, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
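The weighted-arithmetic-mean aggregation of the three experts' weight vectors amounts to the following. The per-expert vectors here are hypothetical, since the paper reports only the aggregated weights; equal expert importance is assumed.

```python
# Hypothetical per-expert index weight vectors for the nine measures
# (the experts' individual AHP results are not reported in the paper).
expert_weights = [
    [0.20, 0.18, 0.04, 0.09, 0.05, 0.13, 0.12, 0.12, 0.07],
    [0.18, 0.19, 0.05, 0.09, 0.05, 0.12, 0.13, 0.11, 0.08],
    [0.19, 0.18, 0.04, 0.10, 0.04, 0.13, 0.12, 0.12, 0.08],
]
expert_importance = [1 / 3, 1 / 3, 1 / 3]   # equal expert weighting assumed

# Weighted arithmetic mean, criterion by criterion
final = [sum(imp * ws[j] for imp, ws in zip(expert_importance, expert_weights))
         for j in range(len(expert_weights[0]))]
```

Because each expert's vector sums to 1 and the expert importances sum to 1, the aggregated vector also sums to 1, as the reported final weights do.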

The results in Tables 5–8 do not enable us to identify and determine a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile the individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives in a lower position and an upper position, respectively. According to equations (31) and (32), in the upper position we know that n = 6, and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position. Similarly, in the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking, where the fifth and sixth positions are those alternatives in the lower position. The two sets of alternatives in the lower position and upper position can be marked, and they are presented in boldface in Table 9, based on Tables 5–8.

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, can be totaled. Similarly, the scores of the alternatives in the lower position, di, can be totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set                        No.   Area                Number of instances   Number of attributes   Number of classes
Liver Disorders                 1     Life sciences       345                   7                      2
ZOO                             2     Life sciences       101                   17                     2
Pima Indians Diabetes           3     Life sciences       768                   8                      2
Wholesale Customers             4     Business            440                   8                      2
Haberman's Survival             5     Life sciences       306                   3                      2
Wine                            6     Physical sciences   178                   13                     3
Balance Scale                   7     Social sciences     625                   4                      3
Breast Tissue                   8     Life sciences       106                   10                     6
Ecoli                           9     Life sciences       336                   8                      8
Fertility                       10    Life sciences       100                   10                     2
Ionosphere                      11    Physical sciences   351                   34                     2
Iris                            12    Life sciences       150                   4                      3
Teaching Assistant Evaluation   13    Other               151                   5                      3
Blood Transfusion               14    Business            748                   5                      2
Spambase                        15    CS/Engineering      4601                  57                     2
Page Blocks Classification      16    CS/Engineering      5473                  10                     5
Sonar                           17    Physical sciences   208                   60                     2
Contraceptive Method Choice     18    Life sciences       1473                  9                      3
Dermatology                     19    Life sciences       366                   33                     6
Yeast Data                      20    Life sciences       1484                  8                      10
Total                                                     18310                 313                    70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

       Purity   En       F-m      Rand     ARI      Jaccard  FM       MAP      ME
EM     0.9003   0.0331   0.1109   0.5897   0.0001   0.5689   0.7411   0.9003   0.4839
FF     0.6638   0.0506   0.3859   0.8091   0.0011   0.7705   0.8747   0.6638   0.3089
FC     0.9117   0.0296   0.0999   0.5954   0.0001   0.5774   0.7484   0.9117   0.4818
HC     0.6439   0.0356   0.4020   0.8177   0.0012   0.7785   0.8819   0.6439   0.2982
MD     0.8746   0.0408   0.1339   0.5783   0.0001   0.5502   0.7250   0.8746   0.4877
KM     0.9117   0.0299   0.0994   0.5983   0.0001   0.5791   0.7502   0.9117   0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

       Purity   En       F-m      Rand     ARI      Jaccard  FM       MAP      ME
EM     0.1748   0.1670   0.1579   0.1589   0.1666   0.1596   0.1619   0.1748   0.1608
FF     0.1514   0.1655   0.1833   0.1816   0.1667   0.1803   0.1757   0.1514   0.1778
FC     0.1761   0.1672   0.1570   0.1595   0.1666   0.1604   0.1627   0.1761   0.1610
HC     0.1495   0.1668   0.1850   0.1825   0.1667   0.1812   0.1765   0.1495   0.1788
MD     0.1721   0.1663   0.1598   0.1578   0.1666   0.1578   0.1604   0.1721   0.1605
KM     0.1761   0.1672   0.1570   0.1597   0.1666   0.1606   0.1628   0.1761   0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets. Data sets, in order: ZOO; Balance Scale; Teaching Assistant Evaluation; Spambase; Yeast Data. Entries are Value (Rank).

EM   0.1677 (2)   0.1701 (1)   0.1547 (6)   0.1650 (6)   0.1719 (2)
FF   0.1653 (5)   0.1651 (3)   0.1684 (4)   0.1652 (4)   0.1790 (1)
FC   0.1677 (2)   0.1648 (5)   0.1727 (1)   0.1695 (1)   0.1644 (5)
HC   0.1638 (6)   0.1701 (1)   0.1595 (5)   0.1652 (4)   0.1560 (3)
MD   0.1676 (4)   0.1650 (4)   0.1721 (3)   0.1656 (3)   0.1645 (4)
KM   0.1679 (1)   0.1648 (5)   0.1727 (1)   0.1695 (1)   0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets. Data sets, in order: Pima Indians Diabetes; Wholesale Customers; Wine; Ecoli; Ionosphere. Entries are Value (Rank).

EM   0.0866 (6)   0.1792 (4)   0.1859 (5)   0.1991 (2)   0.1797 (3)
FF   0.1102 (5)   0.1019 (6)   0.0661 (6)   0.3061 (1)   0.1427 (5)
FC   0.2019 (1)   0.2053 (1)   0.1870 (1)   0.1315 (5)   0.1858 (2)
HC   0.2019 (1)   0.1028 (5)   0.1870 (1)   0.0962 (6)   0.1406 (6)
MD   0.1974 (4)   0.2055 (1)   0.1870 (1)   0.1335 (4)   0.1646 (4)
KM   0.2019 (1)   0.2053 (1)   0.1870 (1)   0.1336 (3)   0.1865 (1)

Table 7: Rankings of GRA for the five assigned UCI data sets. Data sets, in order: Breast Tissue; Fertility; Iris; Contraceptive Method Choice; Dermatology. Entries are Value (Rank).

EM   0.1672 (4)   0.1379 (4)   0.1325 (6)   0.1850 (3)   0.1771 (3)
FF   0.1378 (6)   0.2142 (2)   0.1712 (2)   0.1366 (5)   0.1643 (5)
FC   0.1804 (3)   0.1362 (6)   0.1712 (2)   0.1857 (1)   0.1811 (1)
HC   0.1499 (5)   0.2321 (1)   0.1825 (1)   0.1229 (6)   0.1214 (6)
MD   0.1819 (2)   0.1416 (3)   0.1712 (2)   0.1842 (4)   0.1750 (4)
KM   0.1828 (1)   0.1379 (4)   0.1712 (2)   0.1857 (1)   0.1811 (1)


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, which correspond to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of each position ranking can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, respectively, and the rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of PROMETHEE II for the five assigned UCI data sets. Data sets, in order: Liver Disorders; Haberman's Survival; Blood Transfusion Service Center; Page Blocks Classification; Sonar. Entries are Value (Rank).

EM   0.1654 (5)   0.1133 (6)   0.1088 (6)   0.1252 (6)   0.1644 (3)
FF   0.1688 (1)   0.1766 (4)   0.1815 (4)   0.1867 (5)   0.1618 (4)
FC   0.1667 (3)   0.1780 (1)   0.1906 (1)   0.1413 (3)   0.1609 (5)
HC   0.1645 (6)   0.1780 (1)   0.1906 (1)   0.2371 (1)   0.1749 (2)
MD   0.1679 (2)   0.1762 (5)   0.1380 (5)   0.1685 (2)   0.1770 (1)
KM   0.1667 (3)   0.1780 (1)   0.1906 (1)   0.1413 (3)   0.1609 (5)

Table 9: Rankings of the four MCDM methods for all 20 UCI data sets.

Rank   ZOO   Balance Scale   Teaching Assistant Evaluation   Spambase   Yeast Data   Pima Indians Diabetes   Wholesale Customers
1      KM    EM              FC                              FC         FF           KM                      FC
2      FC    HC              KM                              KM         EM           FC                      KM
3      EM    FF              MD                              MD         HC           HC                      MD
4      MD    MD              FF                              FF         MD           MD                      EM
5      FF    FC              HC                              HC         FC           FF                      HC
6      HC    KM              EM                              EM         KM           EM                      FF

Rank   Wine   Ecoli   Ionosphere   Breast Tissue   Fertility   Iris   Contraceptive Method Choice
1      FC     FF      KM           KM              HC          HC     KM
2      KM     EM      FC           MD              FF          KM     FC
3      MD     KM      EM           FC              MD          FC     EM
4      HC     MD      MD           EM              KM          FF     MD
5      EM     FC      FF           HC              EM          MD     FF
6      FF     HC      HC           FF              FC          EM     HC

Rank   Dermatology   Liver Disorders   Haberman's Survival   Blood Transfusion Service Center   Page Blocks Classification   Sonar
1      FC            FF                KM                    KM                                 HC                           MD
2      KM            MD                FC                    FC                                 MD                           HC
3      EM            FC                HC                    HC                                 KM                           EM
4      MD            KM                FF                    FF                                 FC                           FF
5      FF            EM                MD                    MD                                 FF                           FC
6      HC            HC                EM                    EM                                 EM                           KM

Table 10: Priority of each alternative.

Algorithm   1st (score 2)   2nd (score 1)   bi    5th (score 1)   6th (score 2)   di    fi     Ranking
EM          1               2               4     3               7               17    −13    6
FF          3               1               7     6               3               12    −5     4
FC          5               6               16    4               1               6     10     2
HC          3               2               8     4               6               16    −8     5
MD          1               3               5     3               0               3     2      3
KM          7               6               20    0               3               6     14     1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

Algorithm   Ranking 1   2   3   4   5   6
EM          1           3   4   3   2   7
FF          3           2   1   5   6   3
FC          9           3   3   0   4   1
HC          8           1   1   1   3   6
MD          3           4   3   8   2   0
KM          12          1   3   1   2   1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, the effectiveness of our proposed model is examined and verified from an empirical perspective using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent of people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of a variety of types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that there exists no single algorithm or model that can achieve the best performance for a given domain problem [23–25]. Therefore, the focused question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms in selecting the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model has been designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involves the use of 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods, in order to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, to allow a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but can also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system, including a data space, method space, model space, and knowledge space, will be further developed, which can deal with many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework for improving the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.


[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, Springer, Cham, Switzerland, pp. 73–101, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on


clustering diversity and quality measuresrdquo Applied Intelli-gence vol 49 no 5 pp 1724ndash1747 2019

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istraživanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Économie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor, Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



3.2.2. GRA. GRA is a basic MCDM method of quantitative research and qualitative analysis for system analysis [83]. Based on the grey space, it can address inaccurate and incomplete information [84]. GRA has been widely applied in modeling, prediction, systems analysis, data processing, and decision-making [83, 85–88]. The principle is to analyze the similarity relationship between the reference series and the alternative series [89]. The detailed steps are as follows.

Assume that the initial matrix is R:

R = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \quad (1 \le i \le m,\ 1 \le j \le n). \quad (5)

(1) Standardize the initial matrix:

R' = \begin{bmatrix} x'_{11} & x'_{12} & \cdots & x'_{1n} \\ x'_{21} & x'_{22} & \cdots & x'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x'_{m1} & x'_{m2} & \cdots & x'_{mn} \end{bmatrix} \quad (1 \le i \le m,\ 1 \le j \le n). \quad (6)

(2) Generate the reference sequence x'_0:

x'_0 = \left( x'_0(1), x'_0(2), \ldots, x'_0(n) \right), \quad (7)

where x'_0(j) is the largest standardized value of the jth factor.

(3) Calculate the differences \Delta_{0i}(j) between the reference series and the alternative series:

\Delta_{0i}(j) = \left| x'_0(j) - x'_{ij} \right|,

\Delta = \begin{bmatrix} \Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\ \Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\ \vdots & \vdots & \ddots & \vdots \\ \Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n) \end{bmatrix} \quad (1 \le i \le m,\ 1 \le j \le n). \quad (8)

(4) Calculate the grey coefficient r_{0i}(j):

r_{0i}(j) = \frac{\min_i \min_j \Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}{\Delta_{0i}(j) + \delta \max_i \max_j \Delta_{0i}(j)}, \quad (9)

where \delta is a distinguishing coefficient; its value is generally set to 0.5 to provide good stability.

(5) Calculate the value of the grey relational degree b_i:

b_i = \frac{1}{n} \sum_{j=1}^{n} r_{0i}(j). \quad (10)

(6) Finally, standardize the value of the grey relational degree \beta_i:

\beta_i = \frac{b_i}{\sum_{i=1}^{m} b_i}. \quad (11)
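The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the method, not the authors' implementation; the function name is ours, all criteria are assumed to be benefit-type, and the conventional distinguishing coefficient δ = 0.5 is used:

```python
import numpy as np

def grey_relational_degree(X, delta=0.5):
    """Grey relational analysis for an m x n decision matrix X
    (benefit criteria assumed), following steps (5)-(11)."""
    X = np.asarray(X, dtype=float)
    # (1) Standardize each column to [0, 1].
    Xp = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # (2) Reference sequence: largest standardized value per criterion.
    x0 = Xp.max(axis=0)
    # (3) Absolute differences from the reference sequence.
    d = np.abs(x0 - Xp)
    # (4) Grey relational coefficients.
    r = (d.min() + delta * d.max()) / (d + delta * d.max())
    # (5) Grey relational degree: mean coefficient per alternative.
    b = r.mean(axis=1)
    # (6) Normalize the degrees so they sum to 1.
    return b / b.sum()
```

The normalized degrees β form a ranking of the alternatives; a larger β indicates a series closer to the reference series.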

3.2.3. TOPSIS. Developed by Hwang and Yoon [90], TOPSIS is one of the classic MCDM methods for ranking alternatives over multiple criteria. The principle is that the chosen alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS) [91]. TOPSIS finds the best alternative by minimizing the distance to the PIS and maximizing the distance to the NIS [92]. The alternatives can then be ranked by their relative closeness to the ideal solution. The calculation steps are as follows [93].

(1) The decision matrix A is standardized:

a_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}} \quad (1 \le i \le m,\ 1 \le j \le n). \quad (12)

(2) The weighted standardized decision matrix is computed:

D = \left( a_{ij} \cdot w_j \right) \quad (1 \le i \le m,\ 1 \le j \le n), \quad (13)

where the w_j are the criteria weights and \sum_{j=1}^{n} w_j = 1.

(3) The PIS V^* and the NIS V^- are calculated:

V^* = \{ v_1^*, v_2^*, \ldots, v_n^* \} = \left\{ \left( \max_i v_{ij} \mid j \in J \right), \left( \min_i v_{ij} \mid j \in J' \right) \right\},

V^- = \{ v_1^-, v_2^-, \ldots, v_n^- \} = \left\{ \left( \min_i v_{ij} \mid j \in J \right), \left( \max_i v_{ij} \mid j \in J' \right) \right\}, \quad (14)

where J is the set of benefit criteria and J' is the set of cost criteria.

(4) The distances of each alternative from the PIS and the NIS are determined:

S_i^+ = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^* \right)^2} \quad (1 \le i \le m),

S_i^- = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^- \right)^2} \quad (1 \le i \le m). \quad (15)

(5) The relative closeness to the ideal solution is obtained:

Y_i = \frac{S_i^-}{S_i^+ + S_i^-} \quad (1 \le i \le m), \quad (16)

where a value of Y_i closer to 1 indicates an alternative closer to the ideal solution.

(6) The preference order is ranked: a larger relative closeness indicates a better alternative.
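Steps (12)–(16) can be compressed into a short NumPy routine. This is a sketch under our own naming, not the paper's code; the `benefit` flags marking benefit versus cost criteria are our addition:

```python
import numpy as np

def topsis(X, w, benefit=None):
    """TOPSIS relative-closeness scores for an m x n matrix X with
    weights w (summing to 1); benefit[j] is True for benefit criteria."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if benefit is None:
        benefit = [True] * X.shape[1]
    # (1) Vector-normalize each column.
    A = X / np.sqrt((X ** 2).sum(axis=0))
    # (2) Weight the normalized matrix.
    D = A * w
    # (3) Positive and negative ideal solutions per criterion.
    pis = np.where(benefit, D.max(axis=0), D.min(axis=0))
    nis = np.where(benefit, D.min(axis=0), D.max(axis=0))
    # (4) Euclidean distances to PIS and NIS.
    s_plus = np.sqrt(((D - pis) ** 2).sum(axis=1))
    s_minus = np.sqrt(((D - nis) ** 2).sum(axis=1))
    # (5) Relative closeness; larger is better.
    return s_minus / (s_plus + s_minus)
```

An alternative that dominates on every criterion coincides with the PIS and receives a closeness of exactly 1.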

3.2.4. PROMETHEE II. PROMETHEE II, proposed by Brans in 1982, uses pairwise comparisons and "valued outranking relations" to select the best alternative [94]. PROMETHEE II can support DMs in reaching an agreement on feasible alternatives over multiple criteria from different perspectives [95, 96]. In the PROMETHEE II method, a positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51, 97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows.

(1) Normalize the decision matrix R:

R_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}} \quad (1 \le i \le n,\ 1 \le j \le m). \quad (17)

(2) Define the aggregated preference indices. Let a, b \in A, and

\pi(a, b) = \sum_{j=1}^{k} p_j(a, b) w_j,

\pi(b, a) = \sum_{j=1}^{k} p_j(b, a) w_j, \quad (18)

where A is a finite set of alternatives \{a_1, a_2, \ldots, a_n\}, k is the number of criteria such that 1 \le k \le m, w_j is the weight of criterion j, and \sum_{j=1}^{k} w_j = 1. \pi(a, b) represents how a is preferred to b over all criteria, and \pi(b, a) represents how b is preferred to a over all criteria; p_j(a, b) and p_j(b, a) are the preference functions of the alternatives a and b.

(3) Calculate \pi(a, b) and \pi(b, a) for each pair of alternatives. In general, there are six types of preference functions; DMs must select one type of preference function and the corresponding parameter value for each criterion [51, 98].

(4) Determine the positive outranking flow and the negative outranking flow. The positive outranking flow is determined by

\phi^+(a) = \frac{1}{n - 1} \sum_{x \in A} \pi(a, x), \quad (19)

and the negative outranking flow is determined by

\phi^-(a) = \frac{1}{n - 1} \sum_{x \in A} \pi(x, a). \quad (20)

(5) Calculate the net outranking flow:

\phi(a) = \phi^+(a) - \phi^-(a). \quad (21)

(6) Determine the ranking according to the net outranking flow: the larger \phi(a), the more appropriate the alternative.
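The flow computation can be illustrated compactly. The sketch below is ours, not the paper's implementation; of the six preference-function types it assumes the simplest, the "usual" criterion p_j(a, b) = 1 if a beats b on criterion j and 0 otherwise:

```python
import numpy as np

def promethee2(X, w):
    """PROMETHEE II net outranking flows for an m x n matrix X with
    weights w, using the 'usual' preference function."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    m = X.shape[0]
    # (2)-(3) Aggregated preference index pi(a, b) for every ordered pair.
    pi = np.zeros((m, m))
    for a in range(m):
        for b in range(m):
            if a != b:
                pi[a, b] = np.sum(w * (X[a] > X[b]))
    # (4) Positive (row-wise) and negative (column-wise) outranking flows.
    phi_plus = pi.sum(axis=1) / (m - 1)
    phi_minus = pi.sum(axis=0) / (m - 1)
    # (5) Net flow; larger means a more preferred alternative.
    return phi_plus - phi_minus
```

Because every preference counted for one alternative is counted against another, the net flows always sum to zero.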

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine clustering external measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are generated by a computer with an Intel Core i5-3210M CPU @ 2.50 GHz and 8 GB memory. Before introducing the external measures, the contingency table is described.

3.3.1. The Contingency Table. Given a data set D with n objects, suppose we have a partition P = {P_1, P_2, ..., P_k} produced by some clustering method, where ∪_{i=1}^{k} P_i = D and P_i ∩ P_j = ∅ for 1 ≤ i ≠ j ≤ k. According to the preassigned class labels, we can create another partition C = {C_1, C_2, ..., C_k}, where ∪_{i=1}^{k} C_i = D and C_i ∩ C_j = ∅ for 1 ≤ i ≠ j ≤ k. Let n_{ij} denote the number of objects in cluster P_i with the label of class C_j. Then the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, measures the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

E = -\sum_{i} \frac{n_i}{n} \left( \sum_{j} \frac{n_{ij}}{n_i} \log \frac{n_{ij}}{n_i} \right). \quad (22)

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

P = \sum_{i} \frac{n_i}{n} \max_{j} \left( \frac{n_{ij}}{n_i} \right). \quad (23)

A higher purity value usually represents more effective clustering.
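Both measures (22) and (23) are direct functions of the contingency table of Table 1. The following sketch (our own, with an assumed function name) computes them from a cluster-by-class count matrix:

```python
import numpy as np

def entropy_purity(N):
    """Entropy (22) and purity (23) from a contingency table N,
    where N[i][j] counts objects in cluster P_i with class label C_j."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    ni = N.sum(axis=1)                  # cluster sizes n_i
    p = N / ni[:, None]                 # per-cluster class distributions
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(N > 0, p * np.log(p), 0.0)  # 0 log 0 := 0
    entropy = -np.sum((ni / n) * plogp.sum(axis=1))
    purity = np.sum((ni / n) * p.max(axis=1))
    return entropy, purity
```

A perfect clustering (each cluster holding a single class) yields entropy 0 and purity 1; a clustering in which every cluster mixes the classes uniformly drives the entropy toward log of the class count.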

(3) F-Measure. The F-measure (FM) is the harmonic mean of precision and recall. It is commonly considered a measure of clustering accuracy [103]. The calculation of FM is inspired by the information-retrieval metric, as follows:

F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \quad \text{precision} = \frac{n_{ij}}{n_j}, \quad \text{recall} = \frac{n_{ij}}{n_i}. \quad (24)

A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It can evaluate a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) α(C_j): the number of objects correctly assigned to class C_j.

(2) β(C_j): the number of objects incorrectly assigned to class C_j.

The MAP measure is computed as follows:

\text{MAP} = \frac{\sum_{j} \alpha(C_j)}{\sum_{j} \left( \alpha(C_j) + \beta(C_j) \right)}. \quad (25)

A higher MAP value indicates more accurate clustering

(5) Mirkin Metric. The measure of the Mirkin metric (MM) assumes the null value for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

M = \sum_{i} n_i^2 + \sum_{j} n_j^2 - 2 \sum_{i} \sum_{j} n_{ij}^2. \quad (26)

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that a partition C is a clustering structure of the data set and P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: if both points belong to the same cluster of the clustering structure C and to the same group of the partition P.

(ii) SD: if the points belong to the same cluster of C and to different groups of P.

(iii) DS: if the points belong to different clusters of C and to the same group of P.

(iv) DD: if the points belong to different clusters of C and to different groups of P.

Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

R = \frac{a + d}{M}. \quad (27)

A higher value of RI indicates a more accurate result of clustering.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

J = \frac{a}{a + b + c}. \quad (28)

A higher value of JC indicates a more accurate result of clustering.

(8) Fowlkes and Mallows Index. The Fowlkes and Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

\text{FMI} = \sqrt{\frac{a}{a + b} \cdot \frac{a}{a + c}}. \quad (29)

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the measure of RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110]. The measure of ARI is computed as

\text{ARI} = \frac{a - \left( (a + b)(a + c) / M \right)}{\left( (a + b) + (a + c) \right) / 2 - \left( (a + b)(a + c) / M \right)}. \quad (30)

A higher value of ARI indicates more accurate clustering
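The four pair-counting indices (27)–(30) all derive from the same counts a, b, c, d. A minimal sketch (our own function names, quadratic pair enumeration for clarity rather than speed):

```python
from math import sqrt

def pair_counts(labels_true, labels_pred):
    """Count SS (a), SD (b), DS (c), and DD (d) pairs for two labelings."""
    a = b = c = d = 0
    n = len(labels_true)
    for i in range(n):
        for j in range(i + 1, n):
            same_c = labels_true[i] == labels_true[j]
            same_p = labels_pred[i] == labels_pred[j]
            if same_c and same_p:
                a += 1          # SS
            elif same_c:
                b += 1          # SD
            elif same_p:
                c += 1          # DS
            else:
                d += 1          # DD
    return a, b, c, d

def pair_indices(labels_true, labels_pred):
    """Rand index (27), Jaccard coefficient (28), FMI (29), and ARI (30)."""
    a, b, c, d = pair_counts(labels_true, labels_pred)
    M = a + b + c + d
    ri = (a + d) / M
    jc = a / (a + b + c)
    fmi = sqrt(a / (a + b) * a / (a + c))
    expected = (a + b) * (a + c) / M          # chance-expected agreement
    ari = (a - expected) / (((a + b) + (a + c)) / 2 - expected)
    return ri, jc, fmi, ari
```

Identical labelings give 1 for all four indices, while ARI alone is centered so that random agreement scores near 0.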

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112–114]. It can elicit a corresponding priority vector according to pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as shown by Bana e Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps:

Table 1: Contingency table.

              |            Partition C          |
Partition P   |  C_1    C_2    ...    C_k       |  Σ
  P_1         |  n_11   n_12   ...    n_1k      |  n_1
  P_2         |  n_21   n_22   ...    n_2k      |  n_2
  ...         |  ...    ...    ...    ...       |  ...
  P_k         |  n_k1   n_k2   ...    n_kk      |  n_k
  Σ           |  n_1    n_2    ...    n_k       |  n


(1) Establish a hierarchical structure: a complex problem can be organized in such a structure, including the goal level, criteria level, and alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts to determine the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate the index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
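Steps (3) and (4) can be sketched as follows. This is an illustrative implementation under our own naming, using the standard Saaty random-index table; it is not the authors' code:

```python
import numpy as np

def ahp_weights(P):
    """Index weights from a pairwise comparison matrix P via the
    principal eigenvector, plus the consistency ratio (CR)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    vals, vecs = np.linalg.eig(P)
    k = np.argmax(vals.real)                 # principal eigenvalue
    w = np.abs(vecs[:, k].real)
    w = w / w.sum()                          # normalized priority vector
    lam_max = vals[k].real
    ci = (lam_max - n) / (n - 1)             # consistency index
    # Saaty's random index for n = 1..9 (sketch; larger n not handled).
    RI = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]
    cr = ci / RI[n - 1] if RI[n - 1] else 0.0
    return w, cr
```

A perfectly consistent matrix (every entry p_ij = w_i / w_j) gives λ_max = n, hence CR = 0; judgments with CR above 0.1 would be sent back to the experts, as in step (4).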

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps. They are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously. Hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, based on the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, completed from the scores of the experts. We randomly assign each MCDM method to five UCI data sets. We apply more than one MCDM method to analyze and evaluate the performance of the clustering algorithms, which is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences, or even conflicts, in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.

Step 1 Mark two sets of alternatives in a lower position andan upper position respectively

It is well known that the eighty-twenty rule reports thateighty percent of the results originate in twenty percent ofthe activity in most situations [58] 0e rule can be creditedto Vilfredo Pareto [56] who observes that eighty percent ofthe wealth is usually controlled by twenty percent of thepeople in most countries [57] 0e implication is that it isbetter to be in the top of 20 than in the bottom of 80 Sothe eighty-twenty rule introduced in Section 5 can beapplied to focus on the analysis of the most importantpositions of the rankings in relation to the number of ob-servations for predictable imbalance 0e eighty-twenty ruleindicates that the twenty percent of people who are creatingeighty percent of the results which are highly leveraged Inthis research based on the expert wisdom originating fromthe twenty percent of people the set of alternatives isclassified into two categories where the top of 15 of thealternatives is marked in an upper position which representsmore satisfactory rankings from the opinion of all individualparticipants involved in the algorithm evaluation process0e bottom of 15 is in a lower position which representsmore dissatisfactory rankings from the opinion of all in-dividual participants 0e element marked in the upperposition is calculated as follows

x nlowast15

(31)

where n is the number of alternatives For instance if n 7then x 7 times 15 14 asymp 2 Hence the second positionclassifies the ranking where the first and second positionsare those alternatives in the upper position which areconsidered as the collective group idea of the most appro-priate and satisfactory alternatives

Similarly, the element marking the lower position is calculated as

x = n × 4/5,  (32)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position classifies the ranking: the sixth and seventh positions are the lower positions, whose alternatives are considered collectively the worst and most dissatisfactory.
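The two cutoffs and the round-up convention implied by the examples (1.4 ≈ 2 and 5.6 ≈ 6) can be sketched as follows; this is our own minimal illustration, and the function name is not from the paper.

```python
import math

def position_cutoffs(n):
    """Return (upper, lower) cutoffs for n alternatives.

    Positions 1..upper form the upper position (top 1/5 of the ranking);
    positions lower..n form the lower position (bottom 1/5).
    Rounding up matches the paper's examples (1.4 -> 2, 5.6 -> 6).
    """
    upper = math.ceil(n * 1 / 5)   # equation (31)
    lower = math.ceil(n * 4 / 5)   # equation (32)
    return upper, lower

# For n = 7 alternatives: positions 1-2 are upper, positions 6-7 are lower.
```

With n = 6, as in the experiment below, the cutoffs are 2 and 5, matching the values derived in Section 5.3.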

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position in the lower-position and upper-position sets, respectively.

8 Complexity

The score in the lower position can be calculated by assigning a value of 1 to the first position of the set, 2 to the second position, and so on, with the last position receiving the highest score. Finally, the score of each alternative in the lower position is totaled and marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position of the set, 2 to the penultimate position, and so on, with the first position receiving the highest score. Finally, the score of each alternative in the upper position is totaled and marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, fi, which represents the most satisfactory rankings from the opinions of all individual participants, can be determined as

fi = bi − di,  (33)

where a higher value of fi implies a higher priority.
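Taken together, Steps 1-3 can be sketched as a small routine (our own illustration, not the authors' implementation), assuming each individual ranking is a complete ordering of the alternatives from best to worst:

```python
import math

def dmseca_priorities(rankings):
    """Steps 1-3: mark, grade, and prioritize alternatives.

    `rankings` is a list of individual rankings; each ranking lists the
    alternative names from best (position 1) to worst (position n).
    Returns {alternative: f} with f = b - d, as in equation (33).
    """
    n = len(rankings[0])
    upper = math.ceil(n * 1 / 5)   # upper-position cutoff, equation (31)
    lower = math.ceil(n * 4 / 5)   # lower-position cutoff, equation (32)
    b = dict.fromkeys(rankings[0], 0)
    d = dict.fromkeys(rankings[0], 0)
    for order in rankings:
        for pos, alt in enumerate(order, start=1):
            if pos <= upper:           # first position gets the highest score
                b[alt] += upper - pos + 1
            elif pos >= lower:         # last position gets the highest score
                d[alt] += pos - lower + 1
    return {alt: b[alt] - d[alt] for alt in b}
```

For a single ranking of six alternatives, the best two gain 2 and 1 points, the worst two lose 1 and 2, and the middle two stay at zero.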

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics cover data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the information of these data sets. Together they comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/Engineering. The data sets have a variety of data structures: their sizes range from 100 to 4,601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The model is verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows.

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of the clustering algorithms, generating a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare the target data sets. Data preprocessing deletes the class labels of the original data sets.
Step 2: obtain clustering solutions. The clustering solutions of the six classic clustering algorithms introduced in Section 3.1 are obtained by WEKA on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to one group of five UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. Consensus on the different or even conflicting individual rankings of the evaluation performance of the clustering algorithms is achieved by the proposed three-step decision-making support model, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list reconciles individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.
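As an illustration of Step 5, the sketch below implements TOPSIS, one of the four MCDM methods, in Python (the paper's implementation is in MATLAB 7.0; this is our own simplified version, which assumes the decision matrix has already been standardized to benefit criteria):

```python
def topsis_rank(matrix, weights):
    """Score alternatives (rows) over benefit criteria (columns) by TOPSIS.

    Returns closeness coefficients in [0, 1]; higher is better.
    """
    # Weighted standardized decision matrix.
    v = [[w * x for w, x in zip(weights, row)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) for c in cols]   # positive ideal solution
    anti = [min(c) for c in cols]    # negative ideal solution
    scores = []
    for row in v:
        d_pos = sum((x - p) ** 2 for x, p in zip(row, ideal)) ** 0.5
        d_neg = sum((x - m) ** 2 for x, m in zip(row, anti)) ** 0.5
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

A division-by-zero guard would be needed if every alternative were identical on every criterion; the matrices in Tables 3 and 4 never are.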

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, comprising a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures on the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1-3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)-(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4, with the optimal result of each external measure highlighted in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures, which supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets, computed by WSM, TOPSIS, GRA, and PROMETHEE II, are presented in Tables 5-8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert, applied in WSM, TOPSIS, GRA, and PROMETHEE II, are obtained by AHP based on the eigenvalue method. The final index weights over the three experts are obtained by the weighted arithmetic mean for aggregation, which is a widely used aggregation operator in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
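As an illustration of how these weights enter the computation, WSM is simply a weighted sum of the standardized measures. The sketch below (our own; the column order follows Table 4) applies the stated final weights to the standardized Ionosphere values. In the experiment proper, Ionosphere was assigned to TOPSIS (Table 6), so this pass is purely illustrative:

```python
# Final AHP-aggregated weights for the nine measures, in the Table 4 order
# Purity, En, F-m, Rand, ARI, Jaccard, FM, MAP, MEM (they sum to ~1).
WEIGHTS = [0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, 0.0769]

# Standardized Ionosphere values from Table 4 (rows: algorithms).
TABLE4 = {
    "EM": [0.1748, 0.1670, 0.1579, 0.1589, 0.1666, 0.1596, 0.1619, 0.1748, 0.1608],
    "FF": [0.1514, 0.1655, 0.1833, 0.1816, 0.1667, 0.1803, 0.1757, 0.1514, 0.1778],
    "FC": [0.1761, 0.1672, 0.1570, 0.1595, 0.1666, 0.1604, 0.1627, 0.1761, 0.1610],
    "HC": [0.1495, 0.1668, 0.1850, 0.1825, 0.1667, 0.1812, 0.1765, 0.1495, 0.1788],
    "MD": [0.1721, 0.1663, 0.1598, 0.1578, 0.1666, 0.1578, 0.1604, 0.1721, 0.1605],
    "KM": [0.1761, 0.1672, 0.1570, 0.1597, 0.1666, 0.1606, 0.1628, 0.1761, 0.1611],
}

def wsm_scores(table, weights):
    """Weighted sum model: score each algorithm over benefit criteria."""
    return {alg: sum(w * v for w, v in zip(weights, row)) for alg, row in table.items()}

scores = wsm_scores(TABLE4, WEIGHTS)
# Since the weights sum to ~1, each score lies within the range of its row's values.
```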

The results in Tables 5-8 do not enable us to identify and determine a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model comprises the following three steps.

In Step 1, mark two sets of alternatives in a lower position and an upper position, respectively. According to equations (31) and (32), in the upper position we have n = 6, and thus x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position classifies the ranking: the first and second positions hold the alternatives in the upper position. Similarly, in the lower position we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking: the fifth and sixth positions hold the alternatives in the lower position. The two sets of alternatives in the lower and upper positions can thus be marked; they are presented in boldface in Table 9, based on Tables 5-8.

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, are totaled; similarly, the scores of the alternatives in the lower position, di, are totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.
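The bi, di, and fi columns of Table 10 follow directly from the position counts and the scores of Step 2; as a quick arithmetic check (our own sketch):

```python
# Position counts from Table 10: {alg: (count_1st, count_2nd, count_5th, count_6th)}
COUNTS = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

def priority(c1, c2, c5, c6):
    """b = 2*count_1st + 1*count_2nd; d = 1*count_5th + 2*count_6th; f = b - d."""
    b = 2 * c1 + 1 * c2
    d = 1 * c5 + 2 * c6
    return b - d

F = {alg: priority(*c) for alg, c in COUNTS.items()}
# Reproduces Table 10: EM -13, FF -5, FC 10, HC -8, MD 2, KM 14; KM ranks first.
```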

5.4. Discussion and Analysis. The results in Tables 5-8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities.

Table 2: Data information of the 20 data sets.

Data set | No. | Area | Instances | Attributes | Classes
Liver Disorders | 1 | Life sciences | 345 | 7 | 2
ZOO | 2 | Life sciences | 101 | 17 | 2
Pima Indians Diabetes | 3 | Life sciences | 768 | 8 | 2
Wholesale Customers | 4 | Business | 440 | 8 | 2
Haberman's Survival | 5 | Life sciences | 306 | 3 | 2
Wine | 6 | Physical sciences | 178 | 13 | 3
Balance Scale | 7 | Social sciences | 625 | 4 | 3
Breast Tissue | 8 | Life sciences | 106 | 10 | 6
Ecoli | 9 | Life sciences | 336 | 8 | 8
Fertility | 10 | Life sciences | 100 | 10 | 2
Ionosphere | 11 | Physical sciences | 351 | 34 | 2
Iris | 12 | Life sciences | 150 | 4 | 3
Teaching Assistant Evaluation | 13 | Other | 151 | 5 | 3
Blood Transfusion | 14 | Business | 748 | 5 | 2
Spambase | 15 | CS/Engineering | 4601 | 57 | 2
Page Blocks Classification | 16 | CS/Engineering | 5473 | 10 | 5
Sonar | 17 | Physical sciences | 208 | 60 | 2
Contraceptive Method Choice | 18 | Life sciences | 1473 | 9 | 3
Dermatology | 19 | Life sciences | 366 | 33 | 6
Yeast Data | 20 | Life sciences | 1484 | 8 | 10
Total | | | 18310 | 313 | 70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | MEM
EM | 0.9003 | 0.0331 | 0.1109 | 0.5897 | 0.0001 | 0.5689 | 0.7411 | 0.9003 | 0.4839
FF | 0.6638 | 0.0506 | 0.3859 | 0.8091 | 0.0011 | 0.7705 | 0.8747 | 0.6638 | 0.3089
FC | 0.9117 | 0.0296 | 0.0999 | 0.5954 | 0.0001 | 0.5774 | 0.7484 | 0.9117 | 0.4818
HC | 0.6439 | 0.0356 | 0.4020 | 0.8177 | 0.0012 | 0.7785 | 0.8819 | 0.6439 | 0.2982
MD | 0.8746 | 0.0408 | 0.1339 | 0.5783 | 0.0001 | 0.5502 | 0.7250 | 0.8746 | 0.4877
KM | 0.9117 | 0.0299 | 0.0994 | 0.5983 | 0.0001 | 0.5791 | 0.7502 | 0.9117 | 0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | MEM
EM | 0.1748 | 0.1670 | 0.1579 | 0.1589 | 0.1666 | 0.1596 | 0.1619 | 0.1748 | 0.1608
FF | 0.1514 | 0.1655 | 0.1833 | 0.1816 | 0.1667 | 0.1803 | 0.1757 | 0.1514 | 0.1778
FC | 0.1761 | 0.1672 | 0.1570 | 0.1595 | 0.1666 | 0.1604 | 0.1627 | 0.1761 | 0.1610
HC | 0.1495 | 0.1668 | 0.1850 | 0.1825 | 0.1667 | 0.1812 | 0.1765 | 0.1495 | 0.1788
MD | 0.1721 | 0.1663 | 0.1598 | 0.1578 | 0.1666 | 0.1578 | 0.1604 | 0.1721 | 0.1605
KM | 0.1761 | 0.1672 | 0.1570 | 0.1597 | 0.1666 | 0.1606 | 0.1628 | 0.1761 | 0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets; entries are value (rank).

Algorithm | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data
EM | 0.1677 (2) | 0.1701 (1) | 0.1547 (6) | 0.1650 (6) | 0.1719 (2)
FF | 0.1653 (5) | 0.1651 (3) | 0.1684 (4) | 0.1652 (4) | 0.1790 (1)
FC | 0.1677 (2) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1644 (5)
HC | 0.1638 (6) | 0.1701 (1) | 0.1595 (5) | 0.1652 (4) | 0.1560 (3)
MD | 0.1676 (4) | 0.1650 (4) | 0.1721 (3) | 0.1656 (3) | 0.1645 (4)
KM | 0.1679 (1) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets; entries are value (rank).

Algorithm | Pima Indians Diabetes | Wholesale Customers | Wine | Ecoli | Ionosphere
EM | 0.0866 (6) | 0.1792 (4) | 0.1859 (5) | 0.1991 (2) | 0.1797 (3)
FF | 0.1102 (5) | 0.1019 (6) | 0.0661 (6) | 0.3061 (1) | 0.1427 (5)
FC | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1315 (5) | 0.1858 (2)
HC | 0.2019 (1) | 0.1028 (5) | 0.1870 (1) | 0.0962 (6) | 0.1406 (6)
MD | 0.1974 (4) | 0.2055 (1) | 0.1870 (1) | 0.1335 (4) | 0.1646 (4)
KM | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1336 (3) | 0.1865 (1)

Table 7: Rankings of the GRA for the five assigned UCI data sets; entries are value (rank).

Algorithm | Breast Tissue | Fertility | Iris | Contraceptive Method Choice | Dermatology
EM | 0.1672 (4) | 0.1379 (4) | 0.1325 (6) | 0.1850 (3) | 0.1771 (3)
FF | 0.1378 (6) | 0.2142 (2) | 0.1712 (2) | 0.1366 (5) | 0.1643 (5)
FC | 0.1804 (3) | 0.1362 (6) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)
HC | 0.1499 (5) | 0.2321 (1) | 0.1825 (1) | 0.1229 (6) | 0.1214 (6)
MD | 0.1819 (2) | 0.1416 (3) | 0.1712 (2) | 0.1842 (4) | 0.1750 (4)
KM | 0.1828 (1) | 0.1379 (4) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)


The rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each algorithm attains each position is counted according to Tables 5-8. For example, for ranking 1 of the upper position, the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets; entries are value (rank).

Algorithm | Liver Disorders | Haberman's Survival | Blood Transfusion Service Center | Page Blocks Classification | Sonar
EM | 0.1654 (5) | 0.1133 (6) | 0.1088 (6) | 0.1252 (6) | 0.1644 (3)
FF | 0.1688 (1) | 0.1766 (4) | 0.1815 (4) | 0.1867 (5) | 0.1618 (4)
FC | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)
HC | 0.1645 (6) | 0.1780 (1) | 0.1906 (1) | 0.2371 (1) | 0.1749 (2)
MD | 0.1679 (2) | 0.1762 (5) | 0.1380 (5) | 0.1685 (2) | 0.1770 (1)
KM | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)

Table 9: Rankings of the four MCDM methods for a total of 20 UCI data sets.

Rank | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data | Pima Indians Diabetes | Wholesale Customers
1 | KM | EM | FC | FC | FF | KM | FC
2 | FC | HC | KM | KM | EM | FC | KM
3 | EM | FF | MD | MD | HC | HC | MD
4 | MD | MD | FF | FF | MD | MD | EM
5 | FF | FC | HC | HC | FC | FF | HC
6 | HC | KM | EM | EM | KM | EM | FF

Rank | Wine | Ecoli | Ionosphere | Breast Tissue | Fertility | Iris | Contraceptive Method Choice
1 | FC | FF | KM | KM | HC | HC | KM
2 | KM | EM | FC | MD | FF | KM | FC
3 | MD | KM | EM | FC | MD | FC | EM
4 | HC | MD | MD | EM | KM | FF | MD
5 | EM | FC | FF | HC | EM | MD | FF
6 | FF | HC | HC | FF | FC | EM | HC

Rank | Dermatology | Liver Disorders | Haberman's Survival | Blood Transfusion Service | Page Blocks Classification | Sonar
1 | FC | FF | KM | KM | HC | MD
2 | KM | MD | FC | FC | MD | HC
3 | EM | FC | HC | HC | KM | EM
4 | MD | KM | FF | FF | FC | FF
5 | FF | EM | MD | MD | FF | FC
6 | HC | HC | EM | EM | EM | KM

Table 10: Priority of each alternative. Columns 1st and 2nd count appearances in the upper positions (scoring 2 and 1 toward bi); columns 5th and 6th count appearances in the lower positions (scoring 1 and 2 toward di).

Algorithm | 1st | 2nd | bi | 5th | 6th | di | fi | Ranking
EM | 1 | 2 | 4 | 3 | 7 | 17 | -13 | 6
FF | 3 | 1 | 7 | 6 | 3 | 12 | -5 | 4
FC | 5 | 6 | 16 | 4 | 1 | 6 | 10 | 2
HC | 3 | 2 | 8 | 4 | 6 | 16 | -8 | 5
MD | 1 | 3 | 5 | 3 | 0 | 3 | 2 | 3
KM | 7 | 6 | 20 | 0 | 3 | 6 | 14 | 1

Table 11: Statistical analysis of rankings for all 20 UCI data sets (number of data sets on which each algorithm attains each ranking).

Algorithm | Ranking 1 | Ranking 2 | Ranking 3 | Ranking 4 | Ranking 5 | Ranking 6
EM | 1 | 3 | 4 | 3 | 2 | 7
FF | 3 | 2 | 1 | 5 | 6 | 3
FC | 9 | 3 | 3 | 0 | 4 | 1
HC | 8 | 1 | 1 | 1 | 3 | 6
MD | 3 | 4 | 3 | 8 | 2 | 0
KM | 12 | 1 | 3 | 1 | 2 | 1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.
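The tie-averaged rankings quoted above for the first-place counts (6, 4.5, 2, 3, 4.5, 1) follow directly from Table 11; a small sketch (our own) reproduces them:

```python
def average_ranks(counts):
    """Rank algorithms by count (descending), averaging tied positions."""
    algs = sorted(counts, key=counts.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(algs):
        # Find the block of algorithms tied on the same count.
        j = i
        while j < len(algs) and counts[algs[j]] == counts[algs[i]]:
            j += 1
        avg = (i + 1 + j) / 2          # average of positions i+1 .. j
        for a in algs[i:j]:
            ranks[a] = avg
        i = j
    return ranks

# First-place counts from Table 11.
FIRSTS = {"EM": 1, "FF": 3, "FC": 9, "HC": 8, "MD": 3, "KM": 12}
RANKS = average_ranks(FIRSTS)
# Reproduces the text: EM 6, FF 4.5, FC 2, HC 3, MD 4.5, KM 1.
```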

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, comprising a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked in an upper position, and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every domain problem [23-25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8-10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, comprising a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is conducted to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, allowing a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be further developed, which can deal with many more methods/models/algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775-3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and Its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168-177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562-576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001-3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935-946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383-399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10-23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43-48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259-289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841-847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650-1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205-220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103-122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32-57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73-101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224-227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224-234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807-824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264-323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111-1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715-732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492-1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87-112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345-355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181-208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311-1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97-120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567-2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341-1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724-1747, 2019.

[38] S Saha and S Bandyopadhyay ldquoSome connectivity basedcluster validity indicesrdquo Applied Soft Computing vol 12no 5 pp 1555ndash1565 2012

[39] K Y Yeung D R Haynor and W L Ruzzo ldquoValidatingclustering for gene expression datardquo Bioinformatics vol 17no 4 pp 309ndash318 2001

[40] M Halkidi Y Batistakis and M Vazirgiannis ldquoOn clus-tering validation techniquesrdquo Journal of Intelligent Infor-mation Systems vol 17 no 2-3 pp 107ndash145 2001

[41] V Roth M Braun T Lange and J M Buhmann Stability-Based Model Order Selection in Clustering with Applicationsto Gene Expression Data Springer Berlin Germany 2002

[42] K R Zalik ldquoCluster validity index for estimation of fuzzyclusters of different sizes and densitiesrdquo Pattern Recognitionvol 43 no 10 pp 3374ndash3390 2010

[43] C H Chou Y X Zhao and H P Tai ldquoVanishing-pointdetection based on a fuzzy clustering algorithm and newclustering validity measurerdquo Journal of Applied Science andEngineering vol 18 no 2 pp 105ndash116 2015

[44] M A Wani and R Riyaz ldquoA new cluster validity index usingmaximum cluster spread based compactness measurerdquo In-ternational Journal of Intelligent Computing and Cyberneticsvol 9 no 2 pp 179ndash204 2016

[45] M Azhagiri and A Rajesh ldquoA novel approach tomeasure thequality of cluster and finding intrusions using intrusionunearthing and probability clomp algorithmrdquo InternationalJournal of Information Technology vol 10 no 3 pp 329ndash337 2018

[46] F Azuaje ldquoA cluster validity framework for genome expressiondatardquo Bioinformatics vol 18 no 2 pp 319-320 2002

[47] R O Duda P E Hart and D G Stork Pattern Classifi-cation Wiley New York NY USA 2002

[48] E R Dougherty J Barrera M Brun et al ldquoInference fromclustering with application to gene-expression microarraysrdquoJournal of Computational Biology vol 9 no 1 pp 105ndash1262002

[49] S Dudoit and J Fridlyand ldquoA prediction-based resamplingmethod for estimating the number of clusters in a datasetrdquoGenome Biology vol 3 Article ID research00361 2002

[50] C A Sugar and GM James ldquoFinding the number of clustersin a datasetrdquo Journal of the American Statistical Associationvol 98 no 463 pp 750ndash763 2003

[51] Y Peng Y Zhang G Kou and Y Shi ldquoA multi-criteriadecision making approach for estimating the number ofclusters in a data setrdquo PLoS One vol 7 no 7 Article IDe41713 2012

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istraživanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

Complexity 15

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor: Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the Union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey system theory," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] P. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société vaudoise des sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



positive outranking flow reveals that the chosen alternative outranks all alternatives, whereas a negative outranking flow reveals that the chosen alternative is outranked by all alternatives [51, 97]. Based on the positive and negative outranking flows, the final alternative can be selected and determined by the net outranking flow [98]. The steps are as follows:

(1) Normalize the decision matrix R:

\[ R_{ij} = \frac{x_{ij} - \min x_{ij}}{\max x_{ij} - \min x_{ij}}, \quad 1 \le i \le n,\ 1 \le j \le m. \tag{17} \]

(2) Define the aggregated preference indices. Let a, b \in A, and

\[
\begin{cases}
\pi(a,b) = \sum_{j=1}^{k} p_j(a,b)\, w_j, \\
\pi(b,a) = \sum_{j=1}^{k} p_j(b,a)\, w_j,
\end{cases}
\tag{18}
\]

where A is a finite set of alternatives \{a_1, a_2, \ldots, a_n\}, k is the number of criteria such that 1 \le k \le m, w_j is the weight of criterion j, and \sum_{j=1}^{k} w_j = 1. \pi(a,b) represents how a is preferred to b over all criteria, and \pi(b,a) represents how b is preferred to a over all criteria; p_j(a,b) and p_j(b,a) are the preference functions of the alternatives a and b.

(3) Calculate \pi(a,b) and \pi(b,a) for each pair of alternatives. In general, there are six types of preference function; the DMs must select one type of preference function and the corresponding parameter value for each criterion [51, 98].

(4) Determine the positive outranking flow and negative outranking flow. The positive outranking flow is determined by

\[ \phi^{+}(a) = \frac{1}{n-1} \sum_{x \in A} \pi(a, x), \tag{19} \]

and the negative outranking flow is determined by

\[ \phi^{-}(a) = \frac{1}{n-1} \sum_{x \in A} \pi(x, a). \tag{20} \]

(5) Calculate the net outranking flow:

\[ \phi(a) = \phi^{+}(a) - \phi^{-}(a). \tag{21} \]

(6) Determine the ranking according to the net outranking flow: the larger \phi(a) is, the more appropriate the alternative.
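To make the steps concrete, the following Python sketch computes the net outranking flows. The "usual" 0/1 preference function and the example weights are assumptions for illustration, not the configuration used in the paper.

```python
import numpy as np

def promethee_ii(X, weights, benefit=None):
    """Rank alternatives (rows of X) by PROMETHEE II net outranking flow.

    Uses the 'usual' preference function p_j(a, b) = 1 if a beats b on
    criterion j, else 0 -- one of the six standard preference types.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if benefit is None:
        benefit = [True] * m
    # Step 1: min-max normalization; flip cost criteria to benefit form.
    R = np.empty_like(X)
    for j in range(m):
        col = X[:, j] if benefit[j] else -X[:, j]
        span = col.max() - col.min()
        R[:, j] = (col - col.min()) / span if span else 0.0
    # Steps 2-3: aggregated preference indices pi(a, b).
    pi = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                pi[a, b] = np.dot(weights, (R[a] > R[b]).astype(float))
    # Step 4: positive and negative outranking flows.
    phi_pos = pi.sum(axis=1) / (n - 1)
    phi_neg = pi.sum(axis=0) / (n - 1)
    # Steps 5-6: net flow; a larger phi means a more appropriate alternative.
    return phi_pos - phi_neg

scores = promethee_ii([[0.9, 0.2], [0.5, 0.5], [0.1, 0.9]], [0.6, 0.4])
ranking = scores.argsort()[::-1]  # indices ranked best to worst
```

Note that the net flows always sum to zero, so the ranking is driven purely by pairwise preference differences.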

3.3. Performance Measures. Burn et al. [20] proposed that external measures for evaluating clustering results are more effective than internal and relative measures. Accordingly, in this study, nine clustering external measures are selected for evaluation. These are entropy, purity, microaverage precision (MAP), Rand index (RI), adjusted Rand index (ARI), F-measure (FM), Fowlkes–Mallows index (FMI), Jaccard coefficient (JC), and Mirkin metric (MM). Among them, the measures of entropy and purity are widely applied as external measures in the fields of data mining and machine learning [99, 100]. The nine external measures are computed on a computer with an Intel Core i5-3210M CPU (2.50 GHz) and 8 GB of memory. Before introducing the external measures, the contingency table is described.

3.3.1. The Contingency Table. Given a data set D with n objects, suppose we have a partition P = \{P_1, P_2, \ldots, P_k\} produced by some clustering method, where \cup_{i=1}^{k} P_i = D and P_i \cap P_j = \emptyset for 1 \le i \ne j \le k. According to the preassigned class labels, we can create another partition C = \{C_1, C_2, \ldots, C_k\}, where \cup_{i=1}^{k} C_i = D and C_i \cap C_j = \emptyset for 1 \le i \ne j \le k. Let n_{ij} denote the number of objects in cluster P_i with the label of class C_j. Then the data information between the two partitions can be displayed in the form of a contingency table, as shown in Table 1 [65].

The following paragraphs define the external measures.

(1) Entropy. The measure of entropy, which originated in the information-retrieval community, can measure the variance of a probability distribution. If all clusters consist of objects with only a single class label, the entropy is zero; as the class labels of objects in a cluster become more varied, the entropy increases [101]. The measure of entropy is calculated as

\[ E = -\sum_{i} \frac{n_i}{n} \left( \sum_{j} \frac{n_{ij}}{n_i} \log \frac{n_{ij}}{n_i} \right). \tag{22} \]

A lower entropy value usually indicates more effective clustering.

(2) Purity. The measure of purity pays close attention to the representative class (the class with the majority of objects within each cluster) [102]. Purity is similar to entropy. It is calculated as

\[ P = \sum_{i} \frac{n_i}{n} \left( \max_{j} \frac{n_{ij}}{n_i} \right). \tag{23} \]

A higher purity value usually represents more effective clustering.
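Both measures follow directly from the contingency table. The function below is an illustrative NumPy sketch of equations (22) and (23); the name and the input layout (rows as clusters P_i, columns as classes C_j) are ours, not the paper's.

```python
import numpy as np

def entropy_purity(n_ij):
    """Compute clustering entropy (Eq. 22) and purity (Eq. 23) from a
    contingency table n_ij (rows: clusters P_i, columns: classes C_j)."""
    n_ij = np.asarray(n_ij, dtype=float)
    n_i = n_ij.sum(axis=1)          # cluster sizes
    n = n_ij.sum()                  # total number of objects
    p = n_ij / n_i[:, None]         # within-cluster class proportions
    # Treat 0 * log(0) as 0 for empty cells.
    logp = np.where(p > 0, np.log(np.where(p > 0, p, 1.0)), 0.0)
    entropy = -np.sum((n_i / n) * np.sum(p * logp, axis=1))
    purity = np.sum((n_i / n) * p.max(axis=1))
    return entropy, purity

# A perfectly pure clustering has zero entropy and purity 1.
e, p = entropy_purity([[5, 0], [0, 5]])
```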

(3) F-Measure. The F-measure (FM) is a harmonic mean of precision and recall. It is commonly considered as clustering accuracy [103]. The calculation of FM is inspired by the information-retrieval metric, as follows:

\[ \text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \qquad \text{precision} = \frac{n_{ij}}{n_j}, \qquad \text{recall} = \frac{n_{ij}}{n_i}. \tag{24} \]


A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It can obtain a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) \alpha(C_j): the number of objects correctly assigned to class C_j

(2) \beta(C_j): the number of objects incorrectly assigned to class C_j

The MAP measure is computed as follows:

\[ \text{MAP} = \frac{\sum_{j} \alpha(C_j)}{\sum_{j} \left( \alpha(C_j) + \beta(C_j) \right)}. \tag{25} \]

A higher MAP value indicates more accurate clustering.

(5) Mirkin Metric. The measure of the Mirkin metric (MM) assumes the null value for identical clusterings and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

\[ M = \sum_{i} n_i^2 + \sum_{j} n_j^2 - 2 \sum_{i} \sum_{j} n_{ij}^2. \tag{26} \]

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume a partition C is a clustering structure of the data set and P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: if both points belong to the same cluster of the clustering structure C and to the same group of the partition P

(ii) SD: if the points belong to the same cluster of C and to different groups of P

(iii) DS: if the points belong to different clusters of C and to the same group of P

(iv) DD: if the points belong to different clusters of C and to different groups of P

Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can be defined:

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

\[ R = \frac{a + d}{M}. \tag{27} \]

A higher value of RI indicates a more accurate result of clustering.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

\[ J = \frac{a}{a + b + c}. \tag{28} \]

A higher value of JC indicates a more accurate result of clustering.

(8) Fowlkes and Mallows Index. The Fowlkes–Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

\[ \text{FMI} = \sqrt{\frac{a}{a+b} \cdot \frac{a}{a+c}}. \tag{29} \]

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from -1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near -1 indicates almost complete discordance [110]. The measure of ARI is computed as

\[ \text{ARI} = \frac{a - (a+c)(a+b)/M}{\left( (a+c) + (a+b) \right)/2 - (a+c)(a+b)/M}. \tag{30} \]

A higher value of ARI indicates more accurate clustering.
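The four pair-counting indices defined above can be sketched as follows. The brute-force pair enumeration is for clarity only (contingency-table formulas are faster), and the function names are illustrative.

```python
from itertools import combinations

def pair_counts(labels_c, labels_p):
    """Count SS, SD, DS, DD pairs between a reference partition C and a
    clustering P, both given as flat label lists over the same objects."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c and same_p:
            a += 1          # SS pair
        elif same_c:
            b += 1          # SD pair
        elif same_p:
            c += 1          # DS pair
        else:
            d += 1          # DD pair
    return a, b, c, d

def pair_indices(labels_c, labels_p):
    """Rand index (27), Jaccard coefficient (28), FMI (29), and ARI (30)."""
    a, b, c, d = pair_counts(labels_c, labels_p)
    M = a + b + c + d
    ri = (a + d) / M
    jc = a / (a + b + c)
    fmi = (a / (a + b) * a / (a + c)) ** 0.5
    expected = (a + c) * (a + b) / M
    ari = (a - expected) / (((a + c) + (a + b)) / 2 - expected)
    return ri, jc, fmi, ari

# Identical partitions (up to label renaming) score perfectly on all four.
ri, jc, fmi, ari = pair_indices([0, 0, 1, 1], [1, 1, 0, 0])
```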

3.4. Index Weights. In this work, the index weights of the four MCDM methods can be calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and life sciences [112–114]. It can elicit a corresponding priority vector according to pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as shown by Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps:

Table 1: Contingency table.

                    Partition C
               C1     C2    ...    Ck   |  Σ
Partition P
    P1        n11    n12    ...   n1k   |  n1
    P2        n21    n22    ...   n2k   |  n2
    ...
    Pk        nk1    nk2    ...   nkk   |  nk
    Σ         n.1    n.2    ...   n.k   |  n


(1) Establish a hierarchical structure: a complex problem can be organized into such a structure, including the goal level, criteria level, and alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts to determine the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
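A minimal sketch of steps (2)–(4), assuming NumPy and Saaty's tabulated random index values; the example comparison matrix is illustrative, not one of the matrices elicited from the paper's experts.

```python
import numpy as np

# Random index values for the CR test, as tabulated by Saaty (n = 1..9).
SAATY_RI = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]

def ahp_weights(A):
    """Derive index weights from a pairwise comparison matrix A via the
    principal eigenvector, and return the consistency ratio (CR)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                             # normalize weights to sum to 1
    ci = (eigvals[k].real - n) / (n - 1)     # consistency index
    cr = ci / SAATY_RI[n - 1] if SAATY_RI[n - 1] else 0.0
    return w, cr

# Example: three criteria compared on Saaty's 1-9 scale (illustrative values).
A = [[1, 3, 5],
     [1/3, 1, 3],
     [1/5, 1/3, 1]]
w, cr = ahp_weights(A)   # a CR below 0.1 means the judgments are acceptable
```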

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of all individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously. Hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, using the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, completed by the scores of the experts. We randomly assign each MCDM method to five UCI data sets. We apply more than one MCDM method to analyze and evaluate the performance of clustering algorithms, which is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences or even conflicts in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives, in a lower position and an upper position, respectively.

It is well known that the eighty-twenty rule reports that eighty percent of the results originate in twenty percent of the activity in most situations [58]. The rule can be credited to Vilfredo Pareto [56], who observed that eighty percent of the wealth is usually controlled by twenty percent of the people in most countries [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. So the eighty-twenty rule, introduced in Section 5, can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations, for predictable imbalance. The eighty-twenty rule indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from that twenty percent of people, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked as the upper position, which represents the more satisfactory rankings from the opinion of all individual participants involved in the algorithm evaluation process, and the bottom 1/5 is the lower position, which represents the more dissatisfactory rankings from the opinion of all individual participants. The element marked in the upper position is calculated as follows:

\[ x = n \times \frac{1}{5}, \tag{31} \]

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position, which are considered as the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marked in the lower position is calculated as

\[ x = n \times \frac{4}{5}, \tag{32} \]

where n is the number of alternatives. For instance, if n = 7, then 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position classifies the ranking, where the sixth and seventh positions, in the lower position, are considered collectively as the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled, marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled, marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, f_i, which represents the most satisfactory rankings from the opinions of all individual participants, can be determined as

\[ f_i = b_i - d_i, \tag{33} \]

where a higher value of f_i implies a higher priority.
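Steps 1–3 can be sketched as follows. The ceiling rounding matches the examples in the text (7 × 1/5 = 1.4 ≈ 2, 7 × 4/5 = 5.6 ≈ 6), and all function, variable, and algorithm names are illustrative, not the paper's notation.

```python
import math
from collections import defaultdict

def dmseca_priorities(rankings):
    """Aggregate per-data-set rankings (each a list of alternatives ordered
    best to worst) into priorities f_i = b_i - d_i via the 80-20 rule."""
    b = defaultdict(int)  # upper-position scores
    d = defaultdict(int)  # lower-position scores
    for ranking in rankings:
        n = len(ranking)
        x = math.ceil(n * 1 / 5)          # size of the upper position
        # Step 2 (upper): scores x, x-1, ..., 1, so the best gets x.
        for score, alt in zip(range(x, 0, -1), ranking[:x]):
            b[alt] += score
        # Step 2 (lower): scores 1, 2, ..., so the worst gets the largest.
        low = math.ceil(n * 4 / 5) - 1    # index of the first lower position
        for score, alt in enumerate(ranking[low:], start=1):
            d[alt] += score
    # Step 3: priority of each marked alternative; higher means better.
    return {alt: b[alt] - d[alt] for alt in set(b) | set(d)}

rankings = [["EM", "KM", "HC", "FF", "FC", "MD"],
            ["KM", "EM", "FF", "HC", "MD", "FC"]]
priorities = dmseca_priorities(rankings)
```

Alternatives never marked in either position (here HC and FF) receive no score and implicitly carry priority 0.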

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. This is designed to test and verify our proposed DMSECA model for performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics include data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the data information of these data sets. These data sets comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/Engineering. The data sets have a variety of data structures: their sizes range from 100 to 4601 instances, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model can be verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design can be implemented as follows:

Input: 20 UCI data sets.
Output: rankings of evaluation performance of clustering algorithms, to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare the target data sets: data preprocessing to delete the class labels of the original data sets.
Step 2: obtain clustering solutions: obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of the nine external measures of each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, which is scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of evaluation performance of clustering algorithms. Each MCDM method is randomly assigned to five of the 20 UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. The consensus on different or even conflicting individual rankings of evaluation performance of clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of clustering algorithms.
Step 8: end.

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures of the 20 data sets can be obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we have selected the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria to benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets, computed by WSM, TOPSIS, GRA, and PROMETHEE II, are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as purity, entropy, FM, and RI, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by the weighted arithmetic mean for aggregation, which is a widely used aggregation method in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.

The results in Tables 5–8 do not enable us to identify and determine a regular pattern of evaluation performance of the clustering algorithms. The results indicate that various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives, in a lower position and an upper position, respectively. According to equations (31) and (32), in the upper position, we know that n = 6, and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position classifies the ranking, where the first and second positions are those alternatives in the upper position. Similarly, in the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking, where the fifth and sixth positions are those alternatives in the lower position. The two sets of alternatives in the lower and upper positions can be marked, and they are presented in boldface in Table 9, based on Tables 5–8.
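The rounding used above (1.2 ≈ 2 and 4.8 ≈ 5) is a ceiling: n/5 and 4n/5 are rounded up to the nearest integer, so for n = 6 alternatives the upper position covers ranks 1–2 and the lower position covers ranks 5–6. A one-line sketch of Step 1 (not the authors' code):

```python
import math

def position_cutoffs(n):
    # Eighty-twenty split of n ranked alternatives:
    # ranks 1..upper are the "upper position",
    # ranks lower..n are the "lower position".
    upper = math.ceil(n / 5)       # last rank counted as upper
    lower = math.ceil(4 * n / 5)   # first rank counted as lower
    return upper, lower

upper, lower = position_cutoffs(6)  # (2, 5), as in the text
```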

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, b_i, can be totaled; similarly, the scores of the alternatives in the lower position, d_i, can be totaled. The results for the 20 UCI data sets are presented in Table 10.

In Step 3, the priority of each alternative, f_i, is computed by equation (33), and the calculation results are reported in Table 10.
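Steps 2 and 3 can be reproduced directly from the position counts in Tables 5–8. The scoring read off Table 10 weights a first place 2 and a second place 1 for the upper score b_i, and a fifth place 1 and a sixth place 2 for the lower score d_i; the priority appears to be f_i = b_i − d_i, which is consistent with every row of Table 10. The snippet below is a sketch, not the authors' code.

```python
# Occurrences of each algorithm in the 1st, 2nd, 5th, and 6th positions
# over the 20 data sets (from Tables 5-8, as summarized in Table 10).
COUNTS = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

def priority(first, second, fifth, sixth):
    b = 2 * first + second   # upper-position score b_i
    d = fifth + 2 * sixth    # lower-position score d_i
    return b - d             # priority f_i = b_i - d_i

f = {alg: priority(*c) for alg, c in COUNTS.items()}
ranking = sorted(f, key=f.get, reverse=True)
# Reproduces Table 10: KM (14) > FC (10) > MD (2) > FF (-5) > HC (-8) > EM (-13)
```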

5.4. Discussion and Analysis

The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set | No. | Area | Instances | Attributes | Classes
Liver Disorders | 1 | Life sciences | 345 | 7 | 2
ZOO | 2 | Life sciences | 101 | 17 | 2
Pima Indians Diabetes | 3 | Life sciences | 768 | 8 | 2
Wholesale Customers | 4 | Business | 440 | 8 | 2
Haberman's Survival | 5 | Life sciences | 306 | 3 | 2
Wine | 6 | Physical sciences | 178 | 13 | 3
Balance Scale | 7 | Social sciences | 625 | 4 | 3
Breast Tissue | 8 | Life sciences | 106 | 10 | 6
Ecoli | 9 | Life sciences | 336 | 8 | 8
Fertility | 10 | Life sciences | 100 | 10 | 2
Ionosphere | 11 | Physical sciences | 351 | 34 | 2
Iris | 12 | Life sciences | 150 | 4 | 3
Teaching Assistant Evaluation | 13 | Other | 151 | 5 | 3
Blood Transfusion | 14 | Business | 748 | 5 | 2
Spambase | 15 | CS/Engineering | 4601 | 57 | 2
Page Blocks Classification | 16 | CS/Engineering | 5473 | 10 | 5
Sonar | 17 | Physical sciences | 208 | 60 | 2
Contraceptive Method Choice | 18 | Life sciences | 1473 | 9 | 3
Dermatology | 19 | Life sciences | 366 | 33 | 6
Yeast Data | 20 | Life sciences | 1484 | 8 | 10
Total | | | 18310 | 313 | 70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | ME
EM | 0.9003 | 0.0331 | 0.1109 | 0.5897 | 0.0001 | 0.5689 | 0.7411 | 0.9003 | 0.4839
FF | 0.6638 | 0.0506 | 0.3859 | 0.8091 | 0.0011 | 0.7705 | 0.8747 | 0.6638 | 0.3089
FC | 0.9117 | 0.0296 | 0.0999 | 0.5954 | 0.0001 | 0.5774 | 0.7484 | 0.9117 | 0.4818
HC | 0.6439 | 0.0356 | 0.4020 | 0.8177 | 0.0012 | 0.7785 | 0.8819 | 0.6439 | 0.2982
MD | 0.8746 | 0.0408 | 0.1339 | 0.5783 | 0.0001 | 0.5502 | 0.7250 | 0.8746 | 0.4877
KM | 0.9117 | 0.0299 | 0.0994 | 0.5983 | 0.0001 | 0.5791 | 0.7502 | 0.9117 | 0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | ME
EM | 0.1748 | 0.1670 | 0.1579 | 0.1589 | 0.1666 | 0.1596 | 0.1619 | 0.1748 | 0.1608
FF | 0.1514 | 0.1655 | 0.1833 | 0.1816 | 0.1667 | 0.1803 | 0.1757 | 0.1514 | 0.1778
FC | 0.1761 | 0.1672 | 0.1570 | 0.1595 | 0.1666 | 0.1604 | 0.1627 | 0.1761 | 0.1610
HC | 0.1495 | 0.1668 | 0.1850 | 0.1825 | 0.1667 | 0.1812 | 0.1765 | 0.1495 | 0.1788
MD | 0.1721 | 0.1663 | 0.1598 | 0.1578 | 0.1666 | 0.1578 | 0.1604 | 0.1721 | 0.1605
KM | 0.1761 | 0.1672 | 0.1570 | 0.1597 | 0.1666 | 0.1606 | 0.1628 | 0.1761 | 0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets. Each cell gives the value with the rank in parentheses.

Algorithm | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data
EM | 0.1677 (2) | 0.1701 (1) | 0.1547 (6) | 0.1650 (6) | 0.1719 (2)
FF | 0.1653 (5) | 0.1651 (3) | 0.1684 (4) | 0.1652 (4) | 0.1790 (1)
FC | 0.1677 (2) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1644 (5)
HC | 0.1638 (6) | 0.1701 (1) | 0.1595 (5) | 0.1652 (4) | 0.1560 (3)
MD | 0.1676 (4) | 0.1650 (4) | 0.1721 (3) | 0.1656 (3) | 0.1645 (4)
KM | 0.1679 (1) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets. Each cell gives the value with the rank in parentheses.

Algorithm | Pima Indians Diabetes | Wholesale Customers | Wine | Ecoli | Ionosphere
EM | 0.0866 (6) | 0.1792 (4) | 0.1859 (5) | 0.1991 (2) | 0.1797 (3)
FF | 0.1102 (5) | 0.1019 (6) | 0.0661 (6) | 0.3061 (1) | 0.1427 (5)
FC | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1315 (5) | 0.1858 (2)
HC | 0.2019 (1) | 0.1028 (5) | 0.1870 (1) | 0.0962 (6) | 0.1406 (6)
MD | 0.1974 (4) | 0.2055 (1) | 0.1870 (1) | 0.1335 (4) | 0.1646 (4)
KM | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1336 (3) | 0.1865 (1)

Table 7: Rankings of GRA for the five assigned UCI data sets. Each cell gives the value with the rank in parentheses.

Algorithm | Breast Tissue | Fertility | Iris | Contraceptive Method Choice | Dermatology
EM | 0.1672 (4) | 0.1379 (4) | 0.1325 (6) | 0.1850 (3) | 0.1771 (3)
FF | 0.1378 (6) | 0.2142 (2) | 0.1712 (2) | 0.1366 (5) | 0.1643 (5)
FC | 0.1804 (3) | 0.1362 (6) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)
HC | 0.1499 (5) | 0.2321 (1) | 0.1825 (1) | 0.1229 (6) | 0.1214 (6)
MD | 0.1819 (2) | 0.1416 (3) | 0.1712 (2) | 0.1842 (4) | 0.1750 (4)
KM | 0.1828 (1) | 0.1379 (4) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, which correspond to EM, FF, FC, HC, MD, and KM. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of each position ranking can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, respectively, and the rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of PROMETHEE II for the five assigned UCI data sets. Each cell gives the value with the rank in parentheses.

Algorithm | Liver Disorders | Haberman's Survival | Blood Transfusion Service Center | Page Blocks Classification | Sonar
EM | 0.1654 (5) | 0.1133 (6) | 0.1088 (6) | 0.1252 (6) | 0.1644 (3)
FF | 0.1688 (1) | 0.1766 (4) | 0.1815 (4) | 0.1867 (5) | 0.1618 (4)
FC | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)
HC | 0.1645 (6) | 0.1780 (1) | 0.1906 (1) | 0.2371 (1) | 0.1749 (2)
MD | 0.1679 (2) | 0.1762 (5) | 0.1380 (5) | 0.1685 (2) | 0.1770 (1)
KM | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)

Table 9: Rankings of the four MCDM methods for a total of 20 UCI data sets.

Rank | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data | Pima Indians Diabetes | Wholesale Customers
1 | KM | EM | FC | FC | FF | KM | FC
2 | FC | HC | KM | KM | EM | FC | KM
3 | EM | FF | MD | MD | HC | HC | MD
4 | MD | MD | FF | FF | MD | MD | EM
5 | FF | FC | HC | HC | FC | FF | HC
6 | HC | KM | EM | EM | KM | EM | FF

Rank | Wine | Ecoli | Ionosphere | Breast Tissue | Fertility | Iris | Contraceptive Method Choice
1 | FC | FF | KM | KM | HC | HC | KM
2 | KM | EM | FC | MD | FF | KM | FC
3 | MD | KM | EM | FC | MD | FC | EM
4 | HC | MD | MD | EM | KM | FF | MD
5 | EM | FC | FF | HC | EM | MD | FF
6 | FF | HC | HC | FF | FC | EM | HC

Rank | Dermatology | Liver Disorders | Haberman's Survival | Blood Transfusion Service Center | Page Blocks Classification | Sonar
1 | FC | FF | KM | KM | HC | MD
2 | KM | MD | FC | FC | MD | HC
3 | EM | FC | HC | HC | KM | EM
4 | MD | KM | FF | FF | FC | FF
5 | FF | EM | MD | MD | FF | FC
6 | HC | HC | EM | EM | EM | KM

Table 10: Priority of each alternative.

Algorithm | 1st (score 2) | 2nd (score 1) | b_i | 5th (score 1) | 6th (score 2) | d_i | f_i | Ranking
EM | 1 | 2 | 4 | 3 | 7 | 17 | -13 | 6
FF | 3 | 1 | 7 | 6 | 3 | 12 | -5 | 4
FC | 5 | 6 | 16 | 4 | 1 | 6 | 10 | 2
HC | 3 | 2 | 8 | 4 | 6 | 16 | -8 | 5
MD | 1 | 3 | 5 | 3 | 0 | 3 | 2 | 3
KM | 7 | 6 | 20 | 0 | 3 | 6 | 14 | 1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

Algorithm | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6
EM | 1 | 3 | 4 | 3 | 2 | 7
FF | 3 | 2 | 1 | 5 | 6 | 3
FC | 9 | 3 | 3 | 0 | 4 | 1
HC | 8 | 1 | 1 | 1 | 3 | 6
MD | 3 | 4 | 3 | 8 | 2 | 0
KM | 12 | 1 | 3 | 1 | 2 | 1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.
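The tie-averaged rankings quoted above (6, 4.5, 2, 3, 4.5, and 1 from the first-place counts in Table 11) follow the standard convention that tied values share the average of the rank positions they occupy. A small sketch, not the authors' code:

```python
def average_ranks(counts):
    # Rank counts in descending order; tied counts receive
    # the mean of the positions they would occupy.
    order = sorted(counts, reverse=True)
    ranks = []
    for c in counts:
        positions = [i + 1 for i, v in enumerate(order) if v == c]
        ranks.append(sum(positions) / len(positions))
    return ranks

# First-place counts for EM, FF, FC, HC, MD, KM (Table 11, rank-1 column).
print(average_ranks([1, 3, 9, 8, 3, 12]))  # [6.0, 4.5, 2.0, 3.0, 4.5, 1.0]
```

FF and MD tie with three first places each, so they share rank (4 + 5) / 2 = 4.5.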

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from the twenty percent of the people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every domain problem [23–25]. Therefore, the focused question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies show that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, together with six clustering algorithms, nine external measures, and four MCDM methods, to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for the 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for a predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be further developed, which can deal with many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and a total of 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.
[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.
[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.
[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.
[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.
[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.
[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.
[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.
[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.
[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.
[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.
[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.
[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.
[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.
[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.
[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.
[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.
[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.
[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.
[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.
[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.
[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.
[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.
[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.
[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.
[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.
[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification (D), Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.
[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.
[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.
[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.
[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.
[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.
[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.
[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.
[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.
[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.
[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.
[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.
[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.
[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.
[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.
[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.
[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.
[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.
[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.
[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.
[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.
[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.
[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.
[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.
[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istraživanja, vol. 32, no. 1, pp. 2882–2898, 2019.
[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.
[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.
[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?," The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.
[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.
[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.
[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.
[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.
[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.
[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.
[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.
[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.
[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.
[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.
[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.
[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.
[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.
[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.
[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.
[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B Wang and Y Shi ldquoError correction method in classifi-cation by using multiple-criteria and multiple-constraintlevels linear programmingrdquo International Journal of Com-puters Communications amp Control vol 7 no 5 pp 976ndash9892012

[77] J He Y Zhang Y Shi and G Huang ldquoDomain-drivenclassification based on multiple criteria and multiple con-straint-level programming for intelligent credit scoringrdquoIEEE Transactions on Knowledge and Data Engineeringvol 22 no 6 pp 826ndash838 2010

[78] Y Shi L Zhang Y Tian and X Li Intelligent Knowledge AStudy beyond Data Mining Springer Berlin Germany 2015

[79] L Zadeh ldquoOptimality and non-scalar-valued performancecriteriardquo IEEE Transactions on Automatic Control vol 8no 1 pp 59-60 1963

[80] P C FishburnAdditive Utilities with Incomplete Product SetApplications to Priorities and Assignments Operations Re-search Society of America (ORSA) Baltimore MD USA1967

[81] E Triantaphyllou Multi-Criteria Decision Making A Com-parative Study Kluwer Academic Publishers DordrechtNetherlands 2010

[82] E Triantaphyllou and K Baig ldquo0e impact of aggregatingbenefit and cost criteria in four MCDA methodsrdquo IEEETransactions on Engineering Management vol 52 no 2pp 213ndash226 2005

[83] J Deng ldquoControl problems of grey systemsrdquo Systems andControl Letters vol 1 pp 288ndash294 1982

[84] J DengGrey System Book Windsor Science and TechnologyInformation Services Albany NY USA 1988

[85] WWu G Kou and Y Peng ldquoGroup decision-making usingimproved multi-criteria decision making methods for creditrisk analysisrdquo Filomat vol 30 no 15 pp 4135ndash4150 2016

[86] WWu and Y Peng ldquoExtension of grey relational analysis forfacilitating group consensus to oil spill emergency man-agementrdquo Annals of Operations Research vol 238 no 1-2pp 615ndash635 2016

[87] D Liang A Kobina and W Quan ldquoGrey relational analysismethod for probabilistic linguistic multi-criteria group de-cision-making based on geometric bonferroni meanrdquo In-ternational Journal of Fuzzy Systems vol 20 no 7pp 2234ndash2244 2017

[88] E Onder and C Boz ldquoComparing macroeconomic per-formance of the union for the mediterranean countries usinggrey relational analysis and multi-dimensional scalingrdquoEuropean Scientific Journal vol 13 pp 285ndash299 2017

[89] J Deng ldquoIntroduction to grey theory systemrdquo9e Journal ofGrey System vol 1 no 1 pp 1ndash24 1989

[90] C L Hwang and K Yoon Multiple Attribute DecisionMaking Springer-Verlag Berlin Germany 1981

[91] G R Jahanshahloo F H Lotfi andM Izadikhah ldquoExtensionof the TOPSIS method for decision-making problems withfuzzy datardquo Applied Mathematics and Computation vol 181no 2 pp 1544ndash1551 2006

[92] S J Chen and C L Hwang Fuzzy Multiple Attribute De-cision Making Methods and Applications Springer-VerlagBerlin Germany 1992

[93] S Opricovic and G-H Tzeng ldquoCompromise solution byMCDM methods a comparative analysis of VIKOR andTOPSISrdquo European Journal of Operational Research vol 156no 2 pp 445ndash455 2004

[94] J P Brans and B Mareschal ldquoPROMETHEE methodsrdquo inMultiple Criteria Decision Analysis State of the Art SurveysJ Figueira V Mousseau and B Roy Eds pp 163ndash195Springer New York NY USA 2005

[95] C Hermans and J Erickson ldquoMulticriteria decision analysisoverview and implications for environmental decisionmakingrdquo Advances in the Economics of Environmental Re-sources vol 7 pp 213ndash228 2007

[96] H Kuang D M Kilgour and K W Hipel Grey-basedPROMETHEE II with Application to Evaluation of SourceWater Protection Strategies Information Sciences 2014

[97] J P Brans and B Mareschal ldquoHow to decide withPROMETHEErdquo 1994 httpwwwvisualdecisioncomPdfHow20to20use20prometheepdf

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] P. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

16 Complexity

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dağdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



A higher value of FM generally indicates more accurate clustering.

(4) Microaverage Precision. The MAP is usually applied in the information-retrieval community [104]. It obtains a clustering result by assigning all data objects in a given cluster to the most dominant class label and then evaluating the following quantities for each class [60]:

(1) α(Cj): the number of objects correctly assigned to class Cj

(2) β(Cj): the number of objects incorrectly assigned to class Cj

The MAP measure is computed as follows:

MAP = Σj α(Cj) / Σj [α(Cj) + β(Cj)]. (25)

A higher MAP value indicates more accurate clustering.
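As an illustration, the MAP computation can be sketched in a few lines of Python. This is a minimal sketch, not code from the paper; `clusters` and `classes` are hypothetical flat label lists.

```python
from collections import Counter

def microaverage_precision(clusters, classes):
    # Group object indices by cluster id.
    members = {}
    for i, c in enumerate(clusters):
        members.setdefault(c, []).append(i)
    # alpha(Cj): objects matching their cluster's dominant class label;
    # the remaining objects in each cluster count toward beta(Cj).
    correct = 0
    for idx in members.values():
        counts = Counter(classes[i] for i in idx)
        correct += counts.most_common(1)[0][1]
    return correct / len(classes)

# Two clusters of three objects each; four of the six objects match
# their cluster's dominant class, so MAP = 4/6.
print(microaverage_precision([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"]))
```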

(5) Mirkin Metric. The Mirkin metric (MM) assumes the null value for identical clusters and a positive value otherwise. It corresponds to the Hamming distance between the binary vector representations of each partition [105]. The measure of MM is computed as

M = Σi ni^2 + Σj nj^2 − 2 Σi Σj nij^2, (26)

where nij is the number of objects shared by the i-th group of one partition and the j-th cluster of the other (see Table 1), and ni and nj are the corresponding marginal totals.

A lower value of MM implies more accurate clustering.

In addition, given a data set, assume that C is a clustering structure of the data set and that P is a partition produced by some clustering method. We refer to a pair of points from the data set as follows:

(i) SS: if both points belong to the same cluster of the clustering structure C and to the same group of the partition P

(ii) SD: if the points belong to the same cluster of C and to different groups of P

(iii) DS: if the points belong to different clusters of C and to the same group of P

(iv) DD: if the points belong to different clusters of C and to different groups of P

Assume that a, b, c, and d are the numbers of SS, SD, DS, and DD pairs, respectively, and that M = a + b + c + d, which is the maximum number of pairs in the data set. The following indicators for measuring the degree of similarity between C and P can be defined.

(6) Rand Index. The RI is a measure of the similarity between two data clusterings in statistics and data clustering [106]. RI is computed as follows:

R = (a + d) / M. (27)

A higher value of RI indicates a more accurate clustering result.

(7) Jaccard Coefficient. The JC, also known as the Jaccard similarity coefficient (originally named the "coefficient de communauté" by Paul Jaccard), is a statistic applied to compare the similarity and diversity of sample sets [107]. JC is computed as follows:

J = a / (a + b + c). (28)

A higher value of JC indicates a more accurate clustering result.

(8) Fowlkes and Mallows Index. The Fowlkes and Mallows index (FMI) was proposed by Fowlkes and Mallows [108] as an alternative to the RI. The measure of FMI is computed as follows:

FMI = √[(a / (a + b)) · (a / (a + c))]. (29)

A higher value of FMI indicates more accurate clustering.

(9) Adjusted Rand Index. The adjusted Rand index (ARI) is the corrected-for-chance version of the RI [106]. It ranges from −1 to 1 and expresses the level of concordance between two bipartitions [109]. A value of ARI close to 1 indicates almost perfect concordance between the two compared bipartitions, whereas a value near −1 indicates almost complete discordance [110]. The measure of ARI is computed as

ARI = [a − (a + c)(a + b)/M] / {[(a + c) + (a + b)]/2 − (a + c)(a + b)/M}. (30)

A higher value of ARI indicates more accurate clustering.
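The four pair-counting indices above (RI, JC, FMI, and ARI) are all derived from the counts a, b, c, and d. The following Python sketch, illustrative rather than code from the paper, computes them for two partitions given as flat label lists:

```python
from itertools import combinations
from math import sqrt

def pair_counts(C, P):
    """Return (a, b, c, d): the numbers of SS, SD, DS, and DD point pairs."""
    a = b = c = d = 0
    for i, j in combinations(range(len(C)), 2):
        same_c, same_p = C[i] == C[j], P[i] == P[j]
        if same_c and same_p:
            a += 1            # SS
        elif same_c:
            b += 1            # SD
        elif same_p:
            c += 1            # DS
        else:
            d += 1            # DD
    return a, b, c, d

a, b, c, d = pair_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
M = a + b + c + d                                  # all n(n-1)/2 pairs
rand = (a + d) / M                                 # equation (27)
jaccard = a / (a + b + c)                          # equation (28)
fmi = sqrt((a / (a + b)) * (a / (a + c)))          # equation (29)
expected = (a + c) * (a + b) / M                   # chance-agreement term
ari = (a - expected) / (((a + c) + (a + b)) / 2 - expected)  # equation (30)
```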

3.4. Index Weights. In this work, the index weights of the four MCDM methods are calculated by AHP. The AHP method, proposed by Saaty [111], is a widely used tool for modeling unstructured problems by synthesizing subjective and objective information in many disciplines, such as politics, economics, biology, sociology, management science, and the life sciences [112–114]. It can elicit a corresponding priority vector from pair-by-pair comparison values [115] obtained from the scores of experts on an appropriate scale [116]. AHP has some problems; for example, the priority vector derived from the eigenvalue method can violate a condition of order preservation, as pointed out by Bana e Costa and Vansnick [117]. However, AHP is still a classic and important approach, especially in the fields of operations research and management science [118]. AHP has the following steps:

Table 1: Contingency table.

                      Partition C
              C1     C2     ...    Ck     Σ
Partition P
  P1          n11    n12    ...    n1k    n1.
  P2          n21    n22    ...    n2k    n2.
  ...
  Pk          nk1    nk2    ...    nkk    nk.
  Σ           n.1    n.2    ...    n.k    n


(1) Establish a hierarchical structure: a complex problem can be organized into such a structure, including the goal level, criteria level, and alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts to determine the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
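The four AHP steps can be sketched in pure Python. The comparison matrix below is a hypothetical example on the 1–9 scale, not the experts' judgments from this study; the priority vector is approximated by power iteration, which converges to the principal eigenvector used in step 3.

```python
def ahp_weights(A, iters=200):
    """Derive index weights from a pairwise comparison matrix A (step 3)
    and return them with the consistency ratio CR (step 4)."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):   # power iteration -> principal eigenvector
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n  # principal eigenvalue
    CI = (lam_max - n) / (n - 1)                       # consistency index
    RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}[n]       # Saaty's random index
    return w, CI / RI

# Hypothetical judgments for three criteria on the 1-9 scale (step 2).
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
weights, CR = ahp_weights(A)   # weights sum to 1; CR < 0.1 is acceptable
```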

4. The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] due to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, which are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously; hence, all categories are represented.

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, using as input the nine external measures computed in the first step. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to provide the pairwise comparison values. We randomly assign each MCDM method to five UCI data sets. Applying more than one MCDM method to analyze and evaluate the performance of the clustering algorithms is essential.

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences, or even conflicts, in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, which is based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives in a lower position and an upper position, respectively.

It is well known that the eighty-twenty rule states that eighty percent of the results originate in twenty percent of the activity in most situations [58]. The rule can be credited to Vilfredo Pareto [56], who observed that eighty percent of the wealth is usually controlled by twenty percent of the people in most countries [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. Thus, the eighty-twenty rule introduced in Section 5 can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance. The eighty-twenty rule indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom originating from that twenty percent of people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position, which represents the more satisfactory rankings from the opinions of all individual participants involved in the algorithm evaluation process. The bottom 1/5 is marked in a lower position, which represents the more dissatisfactory rankings from the opinions of all individual participants. The element marked in the upper position is calculated as follows:

x = n × (1/5), (31)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × 1/5 = 1.4 ≈ 2. Hence, the second position delimits the ranking, where the first and second positions hold those alternatives in the upper position, which are considered the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marked in the lower position is calculated as

x = n × (4/5), (32)

where n is the number of alternatives. For instance, if n = 7, then 7 × 4/5 = 5.6 ≈ 6. Thus, the sixth position delimits the ranking, where the sixth and seventh positions, in the lower positions, are considered collectively the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and the upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked di.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked bi.

Step 3. Generate the priority of each alternative.

The priority of each alternative, fi, which represents the most satisfactory rankings from the opinions of all individual participants, can be determined as

fi = bi − di, (33)

where a higher value of fi implies a higher priority.
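Steps 1–3 of the model can be sketched as follows. This is an illustrative Python sketch, not code from the paper: `rankings` is a hypothetical list of per-data-set rankings (best to worst), and the function returns fi for each algorithm.

```python
import math

def dmseca_priorities(rankings):
    """Mark the top n*1/5 and bottom n*1/5 positions of each ranking
    (Step 1), grade them (Step 2), and return fi = bi - di (Step 3)."""
    algorithms = rankings[0]
    n = len(algorithms)
    x = math.ceil(n / 5)              # marked positions per end (rounded up, as in the n = 7 example)
    b = dict.fromkeys(algorithms, 0)  # upper-position scores
    d = dict.fromkeys(algorithms, 0)  # lower-position scores
    for r in rankings:
        for k in range(x):
            b[r[k]] += x - k          # 1st marked position scores x, ..., down to 1
            d[r[n - 1 - k]] += x - k  # last position scores x, ..., down to 1
    return {alg: b[alg] - d[alg] for alg in algorithms}

# Two hypothetical six-algorithm rankings (n = 6, so x = 2).
rankings = [["KM", "FC", "EM", "MD", "FF", "HC"],
            ["EM", "HC", "FF", "MD", "FC", "KM"]]
priorities = dmseca_priorities(rankings)   # e.g. EM: bi = 2, di = 0, fi = 2
```

With x = 2, the grading matches Table 10: a first place scores 2 and a second place 1 toward bi, while a last place scores 2 and a penultimate place 1 toward di.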

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences, or even conflicts, in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, which are described by data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast), among others. Table 2 summarizes the data information of these data sets. These data sets comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as the life sciences, business, the physical sciences, the social sciences, and CS/engineering. The data sets have a variety of data structures: their sizes range from 100 to 4,601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model is verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows:

Input: 20 UCI data sets.

Output: rankings of the evaluation performance of the clustering algorithms, used to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.

Step 1: prepare the target data sets: data preprocessing to delete the class labels of the original data sets.

Step 2: obtain clustering solutions: obtain the clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.

Step 3: calculate the values of the nine external measures for each data set.

Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, which is scored by three invited and consulted experts.

Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to five UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.

Step 6: achieve consensus. The consensus on different, or even conflicting, individual rankings of the evaluation performance of the clustering algorithms can be achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.

Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.

Step 8: end.
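As a sketch of Step 5, WSM scores each algorithm as the weighted sum of its standardized external measures and ranks by score. The weights below are the final AHP index weights reported in Section 5.3, and the measure values are the standardized Ionosphere values from Table 4 (three of the six algorithms, for brevity); note that a WSM ranking on these values need not agree with the TOPSIS ranks in Table 6 — such conflicts across methods are exactly what motivates the model.

```python
# Final AHP index weights for the nine external measures (Section 5.3).
weights = [0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, 0.0769]

# Standardized benefit-criteria values from Table 4 (Ionosphere data set).
measures = {
    "EM": [0.1748, 0.1670, 0.1579, 0.1589, 0.1666, 0.1596, 0.1619, 0.1748, 0.1608],
    "HC": [0.1495, 0.1668, 0.1850, 0.1825, 0.1667, 0.1812, 0.1765, 0.1495, 0.1788],
    "KM": [0.1761, 0.1672, 0.1570, 0.1597, 0.1666, 0.1606, 0.1628, 0.1761, 0.1611],
}

# WSM score: weighted sum over the nine measures; a higher score is better.
scores = {alg: sum(w * v for w, v in zip(weights, vals))
          for alg, vals in measures.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```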

5.3. Experimental Results. This section presents the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures for the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we have selected the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights over the three experts are obtained by the weighted arithmetic mean, which is a widely used aggregation operator in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.

The results in Tables 5–8 do not enable us to identify and determine a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile the individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives in a lower position and an upper position, respectively. According to equations (31) and (32), in the upper position, we know that n = 6, and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position delimits the ranking, where the first and second positions hold those alternatives in the upper position. Similarly, in the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position delimits the ranking, where the fifth and sixth positions hold those alternatives in the lower position. The two sets of alternatives in the lower and upper positions can thus be marked, and they are presented in boldface in Table 9, based on Tables 5–8.

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, can be totaled. Similarly, the scores of the alternatives in the lower position, di, can be totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different, or even conflicting, individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set                       No.  Area               Instances  Attributes  Classes
Liver Disorders                1    Life sciences      345        7           2
ZOO                            2    Life sciences      101        17          2
Pima Indians Diabetes          3    Life sciences      768        8           2
Wholesale Customers            4    Business           440        8           2
Haberman's Survival            5    Life sciences      306        3           2
Wine                           6    Physical sciences  178        13          3
Balance Scale                  7    Social sciences    625        4           3
Breast Tissue                  8    Life sciences      106        10          6
Ecoli                          9    Life sciences      336        8           8
Fertility                      10   Life sciences      100        10          2
Ionosphere                     11   Physical sciences  351        34          2
Iris                           12   Life sciences      150        4           3
Teaching Assistant Evaluation  13   Other              151        5           3
Blood Transfusion              14   Business           748        5           2
Spambase                       15   CS/Engineering     4601       57          2
Page Blocks Classification     16   CS/Engineering     5473       10          5
Sonar                          17   Physical sciences  208        60          2
Contraceptive Method Choice    18   Life sciences      1473       9           3
Dermatology                    19   Life sciences      366        33          6
Yeast Data                     20   Life sciences      1484       8           10
Total                                                  18310      313         70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     MM
EM    0.9003  0.0331  0.1109  0.5897  0.0001  0.5689   0.7411  0.9003  0.4839
FF    0.6638  0.0506  0.3859  0.8091  0.0011  0.7705   0.8747  0.6638  0.3089
FC    0.9117  0.0296  0.0999  0.5954  0.0001  0.5774   0.7484  0.9117  0.4818
HC    0.6439  0.0356  0.4020  0.8177  0.0012  0.7785   0.8819  0.6439  0.2982
MD    0.8746  0.0408  0.1339  0.5783  0.0001  0.5502   0.7250  0.8746  0.4877
KM    0.9117  0.0299  0.0994  0.5983  0.0001  0.5791   0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     MM
EM    0.1748  0.1670  0.1579  0.1589  0.1666  0.1596   0.1619  0.1748  0.1608
FF    0.1514  0.1655  0.1833  0.1816  0.1667  0.1803   0.1757  0.1514  0.1778
FC    0.1761  0.1672  0.1570  0.1595  0.1666  0.1604   0.1627  0.1761  0.1610
HC    0.1495  0.1668  0.1850  0.1825  0.1667  0.1812   0.1765  0.1495  0.1788
MD    0.1721  0.1663  0.1598  0.1578  0.1666  0.1578   0.1604  0.1721  0.1605
KM    0.1761  0.1672  0.1570  0.1597  0.1666  0.1606   0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets.

      ZOO            Balance Scale  Teaching Assistant Evaluation  Spambase       Yeast Data
      Value   Rank   Value   Rank   Value   Rank                   Value   Rank   Value   Rank
EM    0.1677  2      0.1701  1      0.1547  6                      0.1650  6      0.1719  2
FF    0.1653  5      0.1651  3      0.1684  4                      0.1652  4      0.1790  1
FC    0.1677  2      0.1648  5      0.1727  1                      0.1695  1      0.1644  5
HC    0.1638  6      0.1701  1      0.1595  5                      0.1652  4      0.1560  3
MD    0.1676  4      0.1650  4      0.1721  3                      0.1656  3      0.1645  4
KM    0.1679  1      0.1648  5      0.1727  1                      0.1695  1      0.1643  6

Table 6: Rankings of TOPSIS for the five assigned UCI data sets.

      Pima Indians Diabetes  Wholesale Customers  Wine           Ecoli          Ionosphere
      Value   Rank           Value   Rank         Value   Rank   Value   Rank   Value   Rank
EM    0.0866  6              0.1792  4            0.1859  5      0.1991  2      0.1797  3
FF    0.1102  5              0.1019  6            0.0661  6      0.3061  1      0.1427  5
FC    0.2019  1              0.2053  1            0.1870  1      0.1315  5      0.1858  2
HC    0.2019  1              0.1028  5            0.1870  1      0.0962  6      0.1406  6
MD    0.1974  4              0.2055  1            0.1870  1      0.1335  4      0.1646  4
KM    0.2019  1              0.2053  1            0.1870  1      0.1336  3      0.1865  1

Table 7: Rankings of the GRA for the five assigned UCI data sets.

      Breast Tissue  Fertility      Iris           Contraceptive Method Choice  Dermatology
      Value   Rank   Value   Rank   Value   Rank   Value   Rank                 Value   Rank
EM    0.1672  4      0.1379  4      0.1325  6      0.1850  3                    0.1771  3
FF    0.1378  6      0.2142  2      0.1712  2      0.1366  5                    0.1643  5
FC    0.1804  3      0.1362  6      0.1712  2      0.1857  1                    0.1811  1
HC    0.1499  5      0.2321  1      0.1825  1      0.1229  6                    0.1214  6
MD    0.1819  2      0.1416  3      0.1712  2      0.1842  4                    0.1750  4
KM    0.1828  1      0.1379  4      0.1712  2      0.1857  1                    0.1811  1


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each algorithm attains each position can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets.

      Liver Disorders  Haberman's Survival  Blood Transfusion Service Center  Page Blocks Classification  Sonar
      Value   Rank     Value   Rank         Value   Rank                      Value   Rank                Value   Rank
EM    0.1654  5        0.1133  6            0.1088  6                         0.1252  6                   0.1644  3
FF    0.1688  1        0.1766  4            0.1815  4                         0.1867  5                   0.1618  4
FC    0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5
HC    0.1645  6        0.1780  1            0.1906  1                         0.2371  1                   0.1749  2
MD    0.1679  2        0.1762  5            0.1380  5                         0.1685  2                   0.1770  1
KM    0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5

Table 9: Rankings of four MCDM methods for a total of 20 UCI data sets.

Rank  ZOO  Balance Scale  Teaching Assistant Evaluation  Spambase  Yeast Data  Pima Indians Diabetes  Wholesale Customers
1     KM   EM             FC                             FC        FF          KM                     FC
2     FC   HC             KM                             KM        EM          FC                     KM
3     EM   FF             MD                             MD        HC         HC                     MD
4     MD   MD             FF                             FF        MD          MD                     EM
5     FF   FC             HC                             HC        FC          FF                     HC
6     HC   KM             EM                             EM        KM          EM                     FF

Rank  Wine  Ecoli  Ionosphere  Breast Tissue  Fertility  Iris  Contraceptive Method Choice
1     FC    FF     KM          KM             HC         HC    KM
2     KM    EM     FC          MD             FF         KM    FC
3     MD    KM     EM          FC             MD         FC    EM
4     HC    MD     MD          EM             KM         FF    MD
5     EM    FC     FF          HC             EM         MD    FF
6     FF    HC     HC          FF             FC         EM    HC

Rank  Dermatology  Liver Disorders  Haberman's Survival  Blood Transfusion Service  Page Blocks Classification  Sonar
1     FC           FF               KM                   KM                         HC                          MD
2     KM           MD               FC                   FC                         MD                          HC
3     EM           FC               HC                   HC                         KM                          EM
4     MD           KM               FF                   FF                         FC                          FF
5     FF           EM               MD                   MD                         FF                          FC
6     HC           HC               EM                   EM                         EM                          KM

Table 10: Priority of each alternative.

            Upper position           Lower position
Position    1st    2nd    bi        5th    6th    di        fi      Ranking
Score       2      1                1      2
EM          1      2      4         3      7      17        -13     6
FF          3      1      7         6      3      12        -5      4
FC          5      6      16        4      1      6         10      2
HC          3      2      8         4      6      16        -8      5
MD          1      3      5         3      0      3         2       3
KM          7      6      20        0      3      6         14      1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

            Ranking
Algorithm   1     2     3     4     5     6
EM          1     3     4     3     2     7
FF          3     2     1     5     6     3
FC          9     3     3     0     4     1
HC          8     1     1     1     3     6
MD          3     4     3     8     2     0
KM          12    1     3     1     2     1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, the effectiveness of our proposed model is examined and verified from an empirical perspective using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent of people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is used to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, allowing a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but can also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system, including a data space, method space, model space, and knowledge space, will be further developed. It will handle many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

Complexity 13

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," Frontiers in Computational Intelligence, Springer, Cham, Switzerland, pp. 73–101, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe vaudoise des sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



(1) Establish a hierarchical structure: a complex problem can be structured into a hierarchy comprising the goal level, criteria level, and alternative level [119, 120].

(2) Determine the pairwise comparison matrix: once the hierarchy is structured, the prioritization procedure starts by determining the relative importance of the criteria (index weights) within each level [119, 121, 122]. The pairwise comparison values are obtained from the scores of experts on a 1–9 scale [116].

(3) Calculate index weights: the index weights are usually calculated by the eigenvector method [120] proposed by Saaty [111].

(4) Test consistency: the value of 0.1 is generally considered the acceptable upper limit of the consistency ratio (CR). If the CR exceeds this value, the procedure must be repeated to improve consistency [119, 121].
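As a concrete illustration of steps (2)–(4), the sketch below derives index weights from a hypothetical 3 × 3 pairwise comparison matrix via the principal eigenvector and then checks the consistency ratio against the 0.1 threshold. The matrix entries are invented for illustration; they are not the experts' actual judgments from this study.

```python
import numpy as np

# Hypothetical pairwise comparison matrix on Saaty's 1-9 scale
# (illustrative values only, not the paper's expert scores).
A = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
])

# Step (3): index weights = normalized principal eigenvector.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()

# Step (4): CI = (lambda_max - n)/(n - 1); CR = CI / RI, where RI is
# Saaty's random index for a matrix of order n.
n = A.shape[0]
CI = (eigvals.real[k] - n) / (n - 1)
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}[n]
CR = CI / RI

print("index weights:", w.round(4))
print("CR acceptable (< 0.1):", CR < 0.1)
```

For this matrix the CR is well below 0.1, so the judgments would be accepted; an inconsistent matrix would have to be re-elicited from the experts.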

4 The Proposed Model

Clustering results can vary according to the evaluation method. Rankings can conflict even when abundant data are processed, and a large knowledge gap can exist between the evaluation results [123] owing to the anticipation, experience, and expertise of the individual participants. The decision-making process is extremely complex, which makes it difficult to make accurate and effective decisions [124]. As mentioned in Section 1, the proposed DMSECA model consists of three steps, which are as follows.

The first step usually involves modeling by clustering algorithms, which can be accomplished using one or more procedures selected from the categories of hierarchical, density-based, partitioning, and model-based methods [65]. In this section, we apply the six most influential clustering algorithms, including EM, the FF algorithm, FC, HC, MD, and KM, for task modeling by using WEKA 3.7 on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Each of these clustering algorithms belongs to one of the four categories of clustering algorithms mentioned previously; hence, all categories are represented.
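The paper performs this step in WEKA 3.7; purely as an illustrative stand-in, the sketch below runs two of the six algorithms, KM (k-means) and EM (a Gaussian mixture fitted by expectation-maximization), on the Wine data set (one of the 20 UCI sets, also bundled with scikit-learn) and scores each partition against the withheld class labels with the adjusted Rand index, one well-known external measure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Class labels y are withheld from the clusterers and used only to
# compute the external measure afterwards.
X, y = load_wine(return_X_y=True)

# KM: partitioning method; EM: model-based method (Gaussian mixture).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em = GaussianMixture(n_components=3, random_state=0).fit(X).predict(X)

print("KM adjusted Rand index:", adjusted_rand_score(y, km))
print("EM adjusted Rand index:", adjusted_rand_score(y, em))
```

The two algorithms typically yield different external-measure values on the same data, which is exactly the disagreement the later MCDM ranking step has to reconcile.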

In the second step, four commonly used MCDM methods (TOPSIS, WSM, GRA, and PROMETHEE II) are applied to rank the performance of the clustering algorithms over the 20 UCI data sets, taking the nine external measures computed in the first step as the input. These methods are highly suitable for the given data sets; unsuitable methods were not selected. For example, we did not select VIKOR because its denominator would be zero for the given data sets. The index weights are determined by AHP based on the eigenvalue method. Three experts from the field of MCDM are selected and consulted as the DMs to derive the pairwise comparison values, which are completed by the experts' scores. We randomly assign each MCDM method to five UCI data sets. Applying more than one MCDM method to analyze and evaluate the performance of the clustering algorithms is essential.
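As one hedged example of how such a ranking can be produced, the sketch below implements a standard TOPSIS procedure (vector normalization, weighting, distances to the ideal and anti-ideal solutions, closeness coefficient). The decision matrix of six alternatives and three benefit criteria is made up for illustration; it is not the paper's measured external-measure data.

```python
import numpy as np

def topsis(D, w):
    """Rank alternatives (rows) on benefit criteria (columns).
    D: decision matrix where larger is better in every column;
    w: criterion weights summing to 1. Returns closeness coefficients."""
    R = D / np.linalg.norm(D, axis=0)      # vector-normalize each column
    V = R * w                              # weighted normalized matrix
    ideal, anti = V.max(axis=0), V.min(axis=0)
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)         # larger = closer to the ideal

# Illustrative standardized scores for six clustering algorithms.
D = np.array([
    [0.85, 0.90, 0.80],   # e.g. EM
    [0.60, 0.55, 0.65],   # FF
    [0.70, 0.75, 0.72],   # FC
    [0.50, 0.45, 0.55],   # HC
    [0.65, 0.60, 0.58],   # MD
    [0.88, 0.82, 0.86],   # KM
])
w = np.array([0.5, 0.3, 0.2])
C = topsis(D, w)
print("ranking (row indices, best first):", np.argsort(-C))
```

WSM, GRA, and PROMETHEE II would consume the same weighted matrix but aggregate it differently, which is why their rankings can disagree and need the reconciliation step below.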

Finally, in the third step, we propose a decision-making support model to reconcile the individual differences, or even conflicts, in the evaluation performance of the clustering algorithms among the 20 UCI data sets. The proposed model can generate a list of algorithm priorities to select the most appropriate clustering algorithm for secondary mining and knowledge discovery. The detailed steps of the decision-making support model, based on the 80-20 rule, are described as follows.

Step 1. Mark two sets of alternatives, one in a lower position and one in an upper position, respectively.

It is well known that the eighty-twenty rule states that, in most situations, eighty percent of the results originate in twenty percent of the activity [58]. The rule can be credited to Vilfredo Pareto [56], who observed that, in most countries, eighty percent of the wealth is usually controlled by twenty percent of the people [57]. The implication is that it is better to be in the top 20% than in the bottom 80%. Hence, the eighty-twenty rule introduced in Section 5 can be applied to focus the analysis on the most important positions of the rankings in relation to the number of observations under a predictable imbalance, since the twenty percent of positions that create eighty percent of the results are highly leveraged. In this research, based on the expert wisdom embodied in this principle, the set of alternatives is classified into two categories: the top 1/5 of the alternatives is marked as being in an upper position, which represents the more satisfactory rankings in the opinion of all individual participants involved in the algorithm evaluation process, and the bottom 1/5 is in a lower position, which represents the more dissatisfactory rankings. The element marking the upper position is calculated as follows:

x = n × (1/5), (31)

where n is the number of alternatives. For instance, if n = 7, then x = 7 × (1/5) = 1.4 ≈ 2. Hence, the second position delimits the ranking: the first and second positions hold the alternatives in the upper position, which are considered the collective group idea of the most appropriate and satisfactory alternatives.

Similarly, the element marking the lower position is calculated as

x = n × (4/5), (32)

where n is the number of alternatives. For instance, if n = 7, then 7 × (4/5) = 5.6 ≈ 6. Thus, the sixth position delimits the ranking: the sixth and seventh positions are in the lower position and are considered collectively the worst and most dissatisfactory alternatives.

Step 2. Grade the sets of alternatives in the lower and upper positions, respectively.

A score is assigned to each position of the set of alternatives in the lower position and in the upper position, respectively.


The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled and marked d.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled and marked b.

Step 3. Generate the priority of each alternative.

The priority of each alternative, fi, which represents the most satisfactory rankings from the opinions of all individual participants, can be determined as

fi = bi − di, (33)

where a higher value of fi implies a higher priority.
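Under the interpretation above, Steps 1–3 can be sketched as follows. The ranking lists are invented for illustration, and ceiling rounding is assumed because it matches both worked examples in the text (1.4 ≈ 2 and 5.6 ≈ 6).

```python
import math

def dmseca_priorities(rankings):
    """Sketch of Steps 1-3 of the decision-making support model.
    Each ranking lists the alternatives from best (position 1)
    to worst (position n)."""
    n = len(rankings[0])
    x = math.ceil(n * 1 / 5)            # upper set: positions 1..x, eq. (31)
    y = math.ceil(n * 4 / 5)            # lower set: positions y..n, eq. (32)
    b = dict.fromkeys(rankings[0], 0)   # upper-position scores
    d = dict.fromkeys(rankings[0], 0)   # lower-position scores
    for r in rankings:
        for pos, a in enumerate(r, start=1):
            if pos <= x:                # first position scores x, x-th scores 1
                b[a] += x - pos + 1
            if pos >= y:                # y-th position scores 1, last scores most
                d[a] += pos - y + 1
    return {a: b[a] - d[a] for a in b}  # f_i = b_i - d_i, eq. (33)

# Hypothetical rankings of six algorithms from four MCDM methods.
rankings = [
    ["KM", "EM", "FC", "MD", "FF", "HC"],
    ["EM", "KM", "FC", "FF", "MD", "HC"],
    ["KM", "FC", "EM", "MD", "HC", "FF"],
    ["EM", "KM", "MD", "FC", "FF", "HC"],
]
print(dmseca_priorities(rankings))
```

With n = 6 the upper set covers positions 1–2 and the lower set positions 5–6, so an algorithm that is consistently near the top across the four methods accumulates a large positive priority, while one that is consistently near the bottom is driven negative.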

5 Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are applied for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics comprise data set characteristics, attribute characteristics, number of instances, number of attributes, and area, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Table 2 summarizes the data information of these data sets. Together they comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/Engineering. The data sets have a variety of data structures: their sizes range from 100 to 4,601, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model is verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to five UCI data sets. The experimental design is implemented as follows:

Input: 20 UCI data sets.
Output: rankings of the evaluation performance of the clustering algorithms, used to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among their evaluations.
Step 1: prepare the target data sets, i.e., data preprocessing to delete the class labels of the original data sets.
Step 2: obtain clustering solutions of the six classic clustering algorithms introduced in Section 3.1 by WEKA, based on the target data sets.
Step 3: calculate the values of the nine external measures for each data set.
Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by three invited and consulted experts.
Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to five UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.
Step 6: achieve consensus. The consensus on different, or even conflicting, individual rankings of the evaluation performance of the clustering algorithms is achieved by the proposed decision-making support model in the third step, which merges expert wisdom.
Step 7: generate a list of algorithm priorities. The list can reconcile individual disagreements among the evaluation performance of the clustering algorithms.
Step 8: end.

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of the clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.

Complexity 9

First, the values of the nine external measures for the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we have selected the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria to benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.
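The no-single-winner observation can be checked directly against the standardized values in Table 4. The snippet below (a sketch in Python rather than the MATLAB used in the paper) finds, for each external measure, the algorithm(s) attaining the maximum, and confirms that no algorithm is optimal on all nine measures:

```python
# Standardized Table 4 values for the Ionosphere data set
measures = ["Purity", "En", "F-m", "Rand", "ARI", "Jaccard", "FM", "MAP", "M"]
scores = {
    "EM": [0.1748, 0.1670, 0.1579, 0.1589, 0.1666, 0.1596, 0.1619, 0.1748, 0.1608],
    "FF": [0.1514, 0.1655, 0.1833, 0.1816, 0.1667, 0.1803, 0.1757, 0.1514, 0.1778],
    "FC": [0.1761, 0.1672, 0.1570, 0.1595, 0.1666, 0.1604, 0.1627, 0.1761, 0.1610],
    "HC": [0.1495, 0.1668, 0.1850, 0.1825, 0.1667, 0.1812, 0.1765, 0.1495, 0.1788],
    "MD": [0.1721, 0.1663, 0.1598, 0.1578, 0.1666, 0.1578, 0.1604, 0.1721, 0.1605],
    "KM": [0.1761, 0.1672, 0.1570, 0.1597, 0.1666, 0.1606, 0.1628, 0.1761, 0.1611],
}

# For each measure, collect every algorithm achieving the maximum (ties kept)
winners = {}
for j, m in enumerate(measures):
    best = max(s[j] for s in scores.values())
    winners[m] = {a for a, s in scores.items() if s[j] == best}

# No algorithm wins on every measure -> consistent with the NFL theorem
overall = set.intersection(*winners.values())
```

HC wins most of the measures on this data set (F-m, Rand, Jaccard, FM, M), while FC and KM tie on Purity, En, and MAP, so the per-measure winner sets have an empty intersection.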

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by weighted-arithmetic-mean aggregation, which is a widely used aggregation operator in decision problems. The final index weights for the nine external measures, in the order given in Tables 3 and 4, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
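To make the WSM mechanics concrete, the sketch below applies the reported final index weights to the standardized Ionosphere values from Table 4 (in Python rather than the paper's MATLAB). Note this pairing is illustrative only: in the actual experiment, Ionosphere was assigned to TOPSIS, not WSM.

```python
# Final index weights for the nine external measures (Section 5.3),
# in the Table 4 order: Purity, En, F-m, Rand, ARI, Jaccard, FM, MAP, M
weights = [0.1893, 0.1820, 0.0449, 0.0930, 0.0483,
           0.1264, 0.1234, 0.1159, 0.0769]

# Standardized Table 4 values for the Ionosphere data set
scores = {
    "EM": [0.1748, 0.1670, 0.1579, 0.1589, 0.1666, 0.1596, 0.1619, 0.1748, 0.1608],
    "FF": [0.1514, 0.1655, 0.1833, 0.1816, 0.1667, 0.1803, 0.1757, 0.1514, 0.1778],
    "FC": [0.1761, 0.1672, 0.1570, 0.1595, 0.1666, 0.1604, 0.1627, 0.1761, 0.1610],
    "HC": [0.1495, 0.1668, 0.1850, 0.1825, 0.1667, 0.1812, 0.1765, 0.1495, 0.1788],
    "MD": [0.1721, 0.1663, 0.1598, 0.1578, 0.1666, 0.1578, 0.1604, 0.1721, 0.1605],
    "KM": [0.1761, 0.1672, 0.1570, 0.1597, 0.1666, 0.1606, 0.1628, 0.1761, 0.1611],
}

# WSM score = weighted sum of the (benefit-oriented) measure values
wsm = {a: sum(w * x for w, x in zip(weights, s)) for a, s in scores.items()}
ranking = sorted(wsm, key=wsm.get, reverse=True)
```

Under WSM with these weights, HC comes out first on Ionosphere, whereas TOPSIS in Table 6 ranks KM first on the same data: a concrete instance of the method-dependent, conflicting rankings that the DMSECA model is designed to reconcile.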

The results in Tables 5–8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model consists of the following three steps.

In Step 1, mark two sets of alternatives, one in a lower position and one in an upper position. According to equations (31) and (32), for the upper position we know that n = 6, and then x = 6 × 1/5 = 1.2 ≈ 2. Thus, the second position divides the ranking, and the first and second positions are the alternatives in the upper position. Similarly, for the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position divides the ranking, and the fifth and sixth positions are the alternatives in the lower position. The two sets of alternatives in the lower and upper positions can thus be marked; they are presented in boldface in Table 9, based on Tables 5–8.
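The two cutoff positions can be computed mechanically. Rounding up (ceiling) is our reading of the "≈" in the text; it reproduces both reported values for n = 6:

```python
import math

def position_cutoffs(n):
    """Boundary ranks under the eighty-twenty rule for n alternatives:
    ranks 1..upper form the upper position, ranks lower..n the lower one."""
    upper = math.ceil(n / 5)       # e.g. 6 * 1/5 = 1.2 -> 2
    lower = math.ceil(4 * n / 5)   # e.g. 6 * 4/5 = 4.8 -> 5
    return upper, lower
```

For the six clustering algorithms this gives cutoffs (2, 5), i.e., the first and second positions are marked upper and the fifth and sixth positions are marked lower, matching the text.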

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, are totaled; similarly, the scores of the alternatives in the lower position, di, are totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.
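Reading the counts off Table 10, the priority of equation (33) appears to be fi = bi − di, since this reproduces every row of the table; the sketch below uses the 2/1 scores from the table's score row (2 points for a 1st or 6th place, 1 point for a 2nd or 5th place):

```python
# Counts read from Table 10: for each algorithm,
# (times ranked 1st, 2nd, 5th, 6th) over the 20 data sets
counts = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

results = {}
for alg, (c1, c2, c5, c6) in counts.items():
    b = 2 * c1 + 1 * c2        # upper-position score bi
    d = 1 * c5 + 2 * c6        # lower-position score di
    results[alg] = (b, d, b - d)   # priority fi = bi - di

# Final list of algorithm priorities, best first
ranking = sorted(results, key=lambda a: results[a][2], reverse=True)
```

This reproduces the Table 10 rankings exactly: KM (fi = 14) first, then FC, MD, FF, HC, and EM (fi = −13) last.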

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data sets                        No.   Area               Number of instances   Number of attributes   Number of classes
Liver Disorders                   1    Life sciences        345     7    2
ZOO                               2    Life sciences        101    17    2
Pima Indians Diabetes             3    Life sciences        768     8    2
Wholesale Customers               4    Business             440     8    2
Haberman's Survival               5    Life sciences        306     3    2
Wine                              6    Physical sciences    178    13    3
Balance Scale                     7    Social sciences      625     4    3
Breast Tissue                     8    Life sciences        106    10    6
Ecoli                             9    Life sciences        336     8    8
Fertility                        10    Life sciences        100    10    2
Ionosphere                       11    Physical sciences    351    34    2
Iris                             12    Life sciences        150     4    3
Teaching Assistant Evaluation    13    Other                151     5    3
Blood Transfusion                14    Business             748     5    2
Spambase                         15    CS/Engineering      4601    57    2
Page Blocks Classification       16    CS/Engineering      5473    10    5
Sonar                            17    Physical sciences    208    60    2
Contraceptive Method Choice      18    Life sciences       1473     9    3
Dermatology                      19    Life sciences        366    33    6
Yeast Data                       20    Life sciences       1484     8   10
Total                                                     18310   313   70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

      Purity   En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.9003   0.0331  0.1109  0.5897  0.0001  0.5689   0.7411  0.9003  0.4839
FF    0.6638   0.0506  0.3859  0.8091  0.0011  0.7705   0.8747  0.6638  0.3089
FC    0.9117   0.0296  0.0999  0.5954  0.0001  0.5774   0.7484  0.9117  0.4818
HC    0.6439   0.0356  0.4020  0.8177  0.0012  0.7785   0.8819  0.6439  0.2982
MD    0.8746   0.0408  0.1339  0.5783  0.0001  0.5502   0.7250  0.8746  0.4877
KM    0.9117   0.0299  0.0994  0.5983  0.0001  0.5791   0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

      Purity   En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.1748   0.1670  0.1579  0.1589  0.1666  0.1596   0.1619  0.1748  0.1608
FF    0.1514   0.1655  0.1833  0.1816  0.1667  0.1803   0.1757  0.1514  0.1778
FC    0.1761   0.1672  0.1570  0.1595  0.1666  0.1604   0.1627  0.1761  0.1610
HC    0.1495   0.1668  0.1850  0.1825  0.1667  0.1812   0.1765  0.1495  0.1788
MD    0.1721   0.1663  0.1598  0.1578  0.1666  0.1578   0.1604  0.1721  0.1605
KM    0.1761   0.1672  0.1570  0.1597  0.1666  0.1606   0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets.

      ZOO             Balance Scale   Teaching Assistant Evaluation   Spambase        Yeast Data
      Value   Rank    Value   Rank    Value   Rank    Value   Rank    Value   Rank
EM    0.1677    2     0.1701    1     0.1547    6     0.1650    6     0.1719    2
FF    0.1653    5     0.1651    3     0.1684    4     0.1652    4     0.1790    1
FC    0.1677    2     0.1648    5     0.1727    1     0.1695    1     0.1644    5
HC    0.1638    6     0.1701    1     0.1595    5     0.1652    4     0.1560    3
MD    0.1676    4     0.1650    4     0.1721    3     0.1656    3     0.1645    4
KM    0.1679    1     0.1648    5     0.1727    1     0.1695    1     0.1643    6

Table 6: Rankings of TOPSIS for the five assigned UCI data sets.

      Pima Indians Diabetes   Wholesale Customers   Wine            Ecoli           Ionosphere
      Value   Rank    Value   Rank    Value   Rank    Value   Rank    Value   Rank
EM    0.0866    6     0.1792    4     0.1859    5     0.1991    2     0.1797    3
FF    0.1102    5     0.1019    6     0.0661    6     0.3061    1     0.1427    5
FC    0.2019    1     0.2053    1     0.1870    1     0.1315    5     0.1858    2
HC    0.2019    1     0.1028    5     0.1870    1     0.0962    6     0.1406    6
MD    0.1974    4     0.2055    1     0.1870    1     0.1335    4     0.1646    4
KM    0.2019    1     0.2053    1     0.1870    1     0.1336    3     0.1865    1

Table 7: Rankings of the GRA for the five assigned UCI data sets.

      Breast Tissue   Fertility       Iris            Contraceptive Method Choice   Dermatology
      Value   Rank    Value   Rank    Value   Rank    Value   Rank    Value   Rank
EM    0.1672    4     0.1379    4     0.1325    6     0.1850    3     0.1771    3
FF    0.1378    6     0.2142    2     0.1712    2     0.1366    5     0.1643    5
FC    0.1804    3     0.1362    6     0.1712    2     0.1857    1     0.1811    1
HC    0.1499    5     0.2321    1     0.1825    1     0.1229    6     0.1214    6
MD    0.1819    2     0.1416    3     0.1712    2     0.1842    4     0.1750    4
KM    0.1828    1     0.1379    4     0.1712    2     0.1857    1     0.1811    1


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of each position ranking is determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, respectively, and the rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets.

      Liver Disorders   Haberman's Survival   Blood Transfusion Service Center   Page Blocks Classification   Sonar
      Value   Rank    Value   Rank    Value   Rank    Value   Rank    Value   Rank
EM    0.1654    5     0.1133    6     0.1088    6     0.1252    6     0.1644    3
FF    0.1688    1     0.1766    4     0.1815    4     0.1867    5     0.1618    4
FC    0.1667    3     0.1780    1     0.1906    1     0.1413    3     0.1609    5
HC    0.1645    6     0.1780    1     0.1906    1     0.2371    1     0.1749    2
MD    0.1679    2     0.1762    5     0.1380    5     0.1685    2     0.1770    1
KM    0.1667    3     0.1780    1     0.1906    1     0.1413    3     0.1609    5

Table 9: Rankings of four MCDM methods for a total of 20 UCI data sets.

Rank  ZOO   Balance Scale   Teaching Assistant Evaluation   Spambase   Yeast Data   Pima Indians Diabetes   Wholesale Customers
1     KM    EM    FC    FC    FF    KM    FC
2     FC    HC    KM    KM    EM    FC    KM
3     EM    FF    MD    MD    HC    HC    MD
4     MD    MD    FF    FF    MD    MD    EM
5     FF    FC    HC    HC    FC    FF    HC
6     HC    KM    EM    EM    KM    EM    FF

Rank  Wine   Ecoli   Ionosphere   Breast Tissue   Fertility   Iris   Contraceptive Method Choice
1     FC    FF    KM    KM    HC    HC    KM
2     KM    EM    FC    MD    FF    KM    FC
3     MD    KM    EM    FC    MD    FC    EM
4     HC    MD    MD    EM    KM    FF    MD
5     EM    FC    FF    HC    EM    MD    FF
6     FF    HC    HC    FF    FC    EM    HC

Rank  Dermatology   Liver Disorders   Haberman's Survival   Blood Transfusion Service   Page Blocks Classification   Sonar
1     FC    FF    KM    KM    HC    MD
2     KM    MD    FC    FC    MD    HC
3     EM    FC    HC    HC    KM    EM
4     MD    KM    FF    FF    FC    FF
5     FF    EM    MD    MD    FF    FC
6     HC    HC    EM    EM    EM    KM

Table 10: Priority of each alternative.

Position   1st   2nd   bi    5th   6th   di    fi    Ranking
Score       2     1           1     2
EM          1     2     4     3     7    17   -13       6
FF          3     1     7     6     3    12    -5       4
FC          5     6    16     4     1     6    10       2
HC          3     2     8     4     6    16    -8       5
MD          1     3     5     3     0     3     2       3
KM          7     6    20     0     3     6    14       1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

             Ranking
Algorithm    1     2     3     4     5     6
EM           1     3     4     3     2     7
FF           3     2     1     5     6     3
FC           9     3     3     0     4     1
HC           8     1     1     1     3     6
MD           3     4     3     8     2     0
KM          12     1     3     1     2     1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from twenty percent of the people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives are marked in an upper position and the bottom 1/5 are marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by a statistical analysis of the rankings for all 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for a predictable imbalance.

In future work, a decision support system, including a data space, method space, model space, and knowledge space, will be further developed. It will handle many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and Its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.


[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319–320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B Wang and Y Shi ldquoError correction method in classifi-cation by using multiple-criteria and multiple-constraintlevels linear programmingrdquo International Journal of Com-puters Communications amp Control vol 7 no 5 pp 976ndash9892012

[77] J He Y Zhang Y Shi and G Huang ldquoDomain-drivenclassification based on multiple criteria and multiple con-straint-level programming for intelligent credit scoringrdquoIEEE Transactions on Knowledge and Data Engineeringvol 22 no 6 pp 826ndash838 2010

[78] Y Shi L Zhang Y Tian and X Li Intelligent Knowledge AStudy beyond Data Mining Springer Berlin Germany 2015

[79] L Zadeh ldquoOptimality and non-scalar-valued performancecriteriardquo IEEE Transactions on Automatic Control vol 8no 1 pp 59-60 1963

[80] P C FishburnAdditive Utilities with Incomplete Product SetApplications to Priorities and Assignments Operations Re-search Society of America (ORSA) Baltimore MD USA1967

[81] E Triantaphyllou Multi-Criteria Decision Making A Com-parative Study Kluwer Academic Publishers DordrechtNetherlands 2010

[82] E Triantaphyllou and K Baig ldquo0e impact of aggregatingbenefit and cost criteria in four MCDA methodsrdquo IEEETransactions on Engineering Management vol 52 no 2pp 213ndash226 2005

[83] J Deng ldquoControl problems of grey systemsrdquo Systems andControl Letters vol 1 pp 288ndash294 1982

[84] J DengGrey System Book Windsor Science and TechnologyInformation Services Albany NY USA 1988

[85] WWu G Kou and Y Peng ldquoGroup decision-making usingimproved multi-criteria decision making methods for creditrisk analysisrdquo Filomat vol 30 no 15 pp 4135ndash4150 2016

[86] WWu and Y Peng ldquoExtension of grey relational analysis forfacilitating group consensus to oil spill emergency man-agementrdquo Annals of Operations Research vol 238 no 1-2pp 615ndash635 2016

[87] D Liang A Kobina and W Quan ldquoGrey relational analysismethod for probabilistic linguistic multi-criteria group de-cision-making based on geometric bonferroni meanrdquo In-ternational Journal of Fuzzy Systems vol 20 no 7pp 2234ndash2244 2017

[88] E Onder and C Boz ldquoComparing macroeconomic per-formance of the union for the mediterranean countries usinggrey relational analysis and multi-dimensional scalingrdquoEuropean Scientific Journal vol 13 pp 285ndash299 2017

[89] J Deng ldquoIntroduction to grey theory systemrdquo9e Journal ofGrey System vol 1 no 1 pp 1ndash24 1989

[90] C L Hwang and K Yoon Multiple Attribute DecisionMaking Springer-Verlag Berlin Germany 1981

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

16 Complexity

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



The score in the lower position can be calculated by assigning a value of 1 to the first position, 2 to the second position, ..., and x to the last position. Finally, the score of each alternative in the lower position is totaled, marked di.

Similarly, the score in the upper position can be calculated by assigning a value of 1 to the last position, 2 to the penultimate position, ..., and x to the first position. Finally, the score of each alternative in the upper position is totaled, marked bi.

Step 3. Generate the priority of each alternative.

The priority of each alternative, fi, which represents the most satisfactory ranking from the opinions of all individual participants, can be determined as

fi = bi − di, (33)

where a higher value of fi implies a higher priority.
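As an illustrative sketch (not the authors' code), the three steps above can be implemented directly from position counts. The `position_counts` layout, the ceiling-based rounding of x, and the per-position weights are our assumptions, chosen to match the Score row and counts of Table 10 for n = 6:

```python
import math

def priority_list(position_counts, n=6):
    """position_counts[alg][p-1]: number of data sets on which algorithm
    `alg` finished in position p (1-based) across the MCDM rankings."""
    k = math.ceil(n * 1 / 5)    # upper set: positions 1..k (1.2 -> 2 for n = 6)
    lo = math.ceil(n * 4 / 5)   # lower set: positions lo..n (4.8 -> 5 for n = 6)
    priority = {}
    for alg, cnt in position_counts.items():
        # upper score b: first position worth k, ..., position k worth 1
        b = sum(cnt[p - 1] * (k - p + 1) for p in range(1, k + 1))
        # lower score d: position lo worth 1, ..., last position worth n - lo + 1
        d = sum(cnt[p - 1] * (p - lo + 1) for p in range(lo, n + 1))
        priority[alg] = b - d   # fi = bi - di, equation (33)
    return priority

# Position counts taken from Table 10 (middle positions never score, so zeros suffice)
counts = {
    "EM": [1, 2, 0, 0, 3, 7], "FF": [3, 1, 0, 0, 6, 3],
    "FC": [5, 6, 0, 0, 4, 1], "HC": [3, 2, 0, 0, 4, 6],
    "MD": [1, 3, 0, 0, 3, 0], "KM": [7, 6, 0, 0, 0, 3],
}
f = priority_list(counts)
ranked = sorted(f, key=f.get, reverse=True)  # KM first, EM last
```

With these counts the sketch reproduces the priorities of Table 10 (KM = 14, FC = 10, MD = 2, FF = −5, HC = −8, EM = −13).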

5. Experimental Design and Results

We now present an experiment on 20 UCI data sets. It is designed to test and verify our proposed DMSECA model for the performance evaluation of clustering algorithms, in order to reconcile individual differences or even conflicts in the evaluation performance of clustering algorithms based on MCDM in a complex decision-making environment. The experimental data sets, experimental design, and experimental results are as follows.

5.1. Data Sets. A total of 20 data sets are used for the performance evaluation of clustering algorithms in the experiment. They originate from the UCI repository (http://archive.ics.uci.edu/ml) [125]. These 20 data sets, whose structures and characteristics (data set characteristics, attribute characteristics, number of instances, number of attributes, and area) are summarized in Table 2, include the Liver Disorders Data Set (http://archive.ics.uci.edu/ml/datasets/Liver+Disorders), Wine Data Set (http://archive.ics.uci.edu/ml/datasets/Wine), Teaching Assistant Evaluation Data Set (http://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation), Wholesale Customers Data Set (http://archive.ics.uci.edu/ml/datasets/Wholesale+customers), Haberman's Survival Data Set (http://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival), Balance Scale Data Set (http://archive.ics.uci.edu/ml/datasets/Balance+Scale), Contraceptive Method Choice Data Set (http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice), Page Blocks Classification Data Set (http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification), Breast Tissue Data Set (http://archive.ics.uci.edu/ml/datasets/Breast+Tissue), Blood Transfusion Data Set (http://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center), and Yeast Data Set (http://archive.ics.uci.edu/ml/datasets/Yeast). Together these data sets comprise a total of 18,310 instances and 313 attributes from a variety of disciplines, such as life sciences, business, physical sciences, social sciences, and CS/Engineering. The data sets have a variety of data structures: their sizes range from 100 to 4601 instances, the number of attributes from 3 to 60, and the number of classes from 2 to 10.

5.2. Experimental Design. In this section, the experimental design is described in detail to examine the feasibility and effectiveness of our proposed DMSECA model. The DMSECA model is verified by applying the four MCDM methods introduced in Section 3.2 to estimate the performance of the clustering algorithms on the 20 selected public-domain UCI machine learning data sets. Each MCDM method is randomly assigned to a group of five UCI data sets. The experimental design is implemented as follows.

Input: 20 UCI data sets.

Output: rankings of the evaluation performance of clustering algorithms, used to generate a list of algorithm priorities in order to select the best clustering algorithm and reconcile individual disagreements among the evaluations.

Step 1: prepare the target data sets. Data preprocessing deletes the class labels of the original data sets.

Step 2: obtain clustering solutions. The clustering solutions of the six classic clustering algorithms introduced in Section 3.1 are obtained by WEKA on the target data sets.

Step 3: calculate the values of the nine external measures for each data set.

Step 4: obtain the weights of the external measures. In this paper, the weights of the external measures are obtained by AHP based on the eigenvalue method, scored by three invited and consulted experts.

Step 5: use WSM, TOPSIS, PROMETHEE II, and GRA to generate rankings of the evaluation performance of the clustering algorithms. Each MCDM method is randomly assigned to one group of five UCI data sets. The four MCDM methods are implemented in MATLAB 7.0, using the external measures as the input.

Step 6: achieve consensus. Consensus on different or even conflicting individual rankings of the evaluation performance of the clustering algorithms is achieved by using the proposed decision-making support model in the third step, which merges expert wisdom.

Step 7: generate a list of algorithm priorities. The list reconciles individual disagreements among the evaluation performance of the clustering algorithms.

Step 8: end.
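The steps above can be sketched as a driver loop. Everything below is illustrative only: the clustering and MCDM internals are stand-ins (the paper uses WEKA and MATLAB), and the consensus step is a placeholder for the model of Section 4; all names are ours.

```python
import random

def run_dmseca(datasets, mcdm_methods, algorithms, seed=0):
    """Illustrative driver for Steps 1-7: each MCDM method is randomly
    assigned one group of five data sets, per-data-set rankings are
    collected, and a consensus priority list is produced."""
    rng = random.Random(seed)
    shuffled = datasets[:]
    rng.shuffle(shuffled)                              # random assignment of data sets
    group_size = len(datasets) // len(mcdm_methods)    # 20 / 4 = 5 in the paper
    rankings = {}
    for i, method in enumerate(mcdm_methods):
        for ds in shuffled[i * group_size:(i + 1) * group_size]:
            # stand-in: a real implementation would cluster `ds` (Steps 1-2),
            # compute the nine external measures (Steps 3-4), and rank the
            # algorithms with `method` (Step 5)
            rankings[ds] = rng.sample(algorithms, len(algorithms))
    # placeholder consensus (Steps 6-7): count first-place finishes;
    # Section 4 instead scores upper/lower positions and uses fi = bi - di
    firsts = {a: sum(r[0] == a for r in rankings.values()) for a in algorithms}
    return sorted(algorithms, key=lambda a: -firsts[a])
```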

5.3. Experimental Results. This section gives the results obtained by testing the proposed DMSECA model on the 20 UCI data sets, including a total of 18,310 instances and 313 attributes, to reconcile the individual differences or conflicts among the evaluation performance of clustering algorithms. The six clustering algorithms, nine external measures, and four MCDM methods are applied to illustrate and explain our model. The experimental results are as follows.


First, the values of the nine external measures on the 20 data sets are obtained using the six selected clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, which are provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4. We highlight the optimal result of each external measure in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures. This supports the NFL theorem.
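Equations (1)–(3) are defined earlier in the article. As a hedged illustration of the idea only (cost criteria inverted so that larger is better, then column-wise sum normalization; this is a common choice, not necessarily the paper's exact transform), such a standardization might look like:

```python
def standardize(matrix, is_cost):
    """matrix[i][j]: raw value of external measure j for algorithm i.
    Cost measures (e.g., entropy) are inverted so that every column becomes
    a benefit criterion; each column is then scaled to sum to 1."""
    n, m = len(matrix), len(matrix[0])
    # invert cost criteria (assumes strictly positive values)
    benefit = [[1.0 / matrix[i][j] if is_cost[j] else matrix[i][j]
                for j in range(m)] for i in range(n)]
    col = [sum(benefit[i][j] for i in range(n)) for j in range(m)]
    # column-wise sum normalization
    return [[benefit[i][j] / col[j] for j in range(m)] for i in range(n)]
```

After this step every column sums to 1 and a larger standardized value is always better, which is the property the MCDM methods in Step 5 rely on.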

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by the weighted arithmetic mean, which is a widely used aggregation operator in decision problems. The final index weights for the nine external measures, in the order given in Tables 4 and 5, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.

The results in Tables 5–8 do not enable us to identify and determine a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that the various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives, in a lower position and an upper position, respectively. According to equations (31) and (32), in the upper position, we know that n = 6 and then x = 6 × 1/5 = 1.2 ≈ 2. Thus the second position classifies the ranking, where the first and second positions are those alternatives in the upper position. Similarly, in the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence the fifth position classifies the ranking, where the fifth and sixth positions are those alternatives in the lower position. The two sets of alternatives in the lower position and upper position can thus be marked; they are presented in boldface in Table 9, based on Tables 5–8.

In Step 2, grade the sets of alternatives in the lower and upper positions, respectively, according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, are totaled. Similarly, the scores of the alternatives in the lower position, di, are totaled. The results are presented in Table 10 for the 20 UCI data sets.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data sets                       No.  Area               Instances  Attributes  Classes
Liver Disorders                 1    Life sciences      345        7           2
ZOO                             2    Life sciences      101        17          2
Pima Indians Diabetes           3    Life sciences      768        8           2
Wholesale Customers             4    Business           440        8           2
Haberman's Survival             5    Life sciences      306        3           2
Wine                            6    Physical sciences  178        13          3
Balance Scale                   7    Social sciences    625        4           3
Breast Tissue                   8    Life sciences      106        10          6
Ecoli                           9    Life sciences      336        8           8
Fertility                       10   Life sciences      100        10          2
Ionosphere                      11   Physical sciences  351        34          2
Iris                            12   Life sciences      150        4           3
Teaching Assistant Evaluation   13   Other              151        5           3
Blood Transfusion               14   Business           748        5           2
Spambase                        15   CS/Engineering     4601       57          2
Page Blocks Classification      16   CS/Engineering     5473       10          5
Sonar                           17   Physical sciences  208        60          2
Contraceptive Method Choice     18   Life sciences      1473       9           3
Dermatology                     19   Life sciences      366        33          6
Yeast Data                      20   Life sciences      1484       8           10
Total                                                   18310      313         70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.9003  0.0331  0.1109  0.5897  0.0001  0.5689   0.7411  0.9003  0.4839
FF    0.6638  0.0506  0.3859  0.8091  0.0011  0.7705   0.8747  0.6638  0.3089
FC    0.9117  0.0296  0.0999  0.5954  0.0001  0.5774   0.7484  0.9117  0.4818
HC    0.6439  0.0356  0.4020  0.8177  0.0012  0.7785   0.8819  0.6439  0.2982
MD    0.8746  0.0408  0.1339  0.5783  0.0001  0.5502   0.7250  0.8746  0.4877
KM    0.9117  0.0299  0.0994  0.5983  0.0001  0.5791   0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

      Purity  En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM    0.1748  0.1670  0.1579  0.1589  0.1666  0.1596   0.1619  0.1748  0.1608
FF    0.1514  0.1655  0.1833  0.1816  0.1667  0.1803   0.1757  0.1514  0.1778
FC    0.1761  0.1672  0.1570  0.1595  0.1666  0.1604   0.1627  0.1761  0.1610
HC    0.1495  0.1668  0.1850  0.1825  0.1667  0.1812   0.1765  0.1495  0.1788
MD    0.1721  0.1663  0.1598  0.1578  0.1666  0.1578   0.1604  0.1721  0.1605
KM    0.1761  0.1672  0.1570  0.1597  0.1666  0.1606   0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets (entries are value (rank)).

      ZOO         Balance Scale  Teaching Assistant Evaluation  Spambase    Yeast Data
EM    0.1677 (2)  0.1701 (1)     0.1547 (6)                     0.1650 (6)  0.1719 (2)
FF    0.1653 (5)  0.1651 (3)     0.1684 (4)                     0.1652 (4)  0.1790 (1)
FC    0.1677 (2)  0.1648 (5)     0.1727 (1)                     0.1695 (1)  0.1644 (5)
HC    0.1638 (6)  0.1701 (1)     0.1595 (5)                     0.1652 (4)  0.1560 (3)
MD    0.1676 (4)  0.1650 (4)     0.1721 (3)                     0.1656 (3)  0.1645 (4)
KM    0.1679 (1)  0.1648 (5)     0.1727 (1)                     0.1695 (1)  0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets (entries are value (rank)).

      Pima Indians Diabetes  Wholesale Customers  Wine        Ecoli       Ionosphere
EM    0.0866 (6)             0.1792 (4)           0.1859 (5)  0.1991 (2)  0.1797 (3)
FF    0.1102 (5)             0.1019 (6)           0.0661 (6)  0.3061 (1)  0.1427 (5)
FC    0.2019 (1)             0.2053 (1)           0.1870 (1)  0.1315 (5)  0.1858 (2)
HC    0.2019 (1)             0.1028 (5)           0.1870 (1)  0.0962 (6)  0.1406 (6)
MD    0.1974 (4)             0.2055 (1)           0.1870 (1)  0.1335 (4)  0.1646 (4)
KM    0.2019 (1)             0.2053 (1)           0.1870 (1)  0.1336 (3)  0.1865 (1)

Table 7: Rankings of the GRA for the five assigned UCI data sets (entries are value (rank)).

      Breast Tissue  Fertility   Iris        Contraceptive Method Choice  Dermatology
EM    0.1672 (4)     0.1379 (4)  0.1325 (6)  0.1850 (3)                   0.1771 (3)
FF    0.1378 (6)     0.2142 (2)  0.1712 (2)  0.1366 (5)                   0.1643 (5)
FC    0.1804 (3)     0.1362 (6)  0.1712 (2)  0.1857 (1)                   0.1811 (1)
HC    0.1499 (5)     0.2321 (1)  0.1825 (1)  0.1229 (6)                   0.1214 (6)
MD    0.1819 (2)     0.1416 (3)  0.1712 (2)  0.1842 (4)                   0.1750 (4)
KM    0.1828 (1)     0.1379 (4)  0.1712 (2)  0.1857 (1)                   0.1811 (1)


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. Thus the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare with the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each position ranking occurs can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the corresponding rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1 for EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets (entries are value (rank)).

      Liver Disorders  Haberman's Survival  Blood Transfusion Service Center  Page Blocks Classification  Sonar
EM    0.1654 (5)       0.1133 (6)           0.1088 (6)                        0.1252 (6)                  0.1644 (3)
FF    0.1688 (1)       0.1766 (4)           0.1815 (4)                        0.1867 (5)                  0.1618 (4)
FC    0.1667 (3)       0.1780 (1)           0.1906 (1)                        0.1413 (3)                  0.1609 (5)
HC    0.1645 (6)       0.1780 (1)           0.1906 (1)                        0.2371 (1)                  0.1749 (2)
MD    0.1679 (2)       0.1762 (5)           0.1380 (5)                        0.1685 (2)                  0.1770 (1)
KM    0.1667 (3)       0.1780 (1)           0.1906 (1)                        0.1413 (3)                  0.1609 (5)

Table 9: Rankings of four MCDM methods for a total of 20 UCI data sets.

Rank  ZOO  Balance Scale  Teaching Assistant Evaluation  Spambase  Yeast Data  Pima Indians Diabetes  Wholesale Customers
1     KM   EM             FC                             FC        FF          KM                     FC
2     FC   HC             KM                             KM        EM          FC                     KM
3     EM   FF             MD                             MD        HC          HC                     MD
4     MD   MD             FF                             FF        MD          MD                     EM
5     FF   FC             HC                             HC        FC          FF                     HC
6     HC   KM             EM                             EM        KM          EM                     FF

Rank  Wine  Ecoli  Ionosphere  Breast Tissue  Fertility  Iris  Contraceptive Method Choice
1     FC    FF     KM          KM             HC         HC    KM
2     KM    EM     FC          MD             FF         KM    FC
3     MD    KM     EM          FC             MD         FC    EM
4     HC    MD     MD          EM             KM         FF    MD
5     EM    FC     FF          HC             EM         MD    FF
6     FF    HC     HC          FF             FC         EM    HC

Rank  Dermatology  Liver Disorders  Haberman's Survival  Blood Transfusion Service  Page Blocks Classification  Sonar
1     FC           FF               KM                   KM                         HC                          MD
2     KM           MD               FC                   FC                         MD                          HC
3     EM           FC               HC                   HC                         KM                          EM
4     MD           KM               FF                   FF                         FC                          FF
5     FF           EM               MD                   MD                         FF                          FC
6     HC           HC               EM                   EM                         EM                          KM

Table 10: Priority of each alternative.

Position  1st  2nd  bi    5th  6th  di    fi    Ranking
Score     2    1          1    2
EM        1    2    4     3    7    17    -13   6
FF        3    1    7     6    3    12    -5    4
FC        5    6    16    4    1    6     10    2
HC        3    2    8     4    6    16    -8    5
MD        1    3    5     3    0    3     2     3
KM        7    6    20    0    3    6     14    1

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

Algorithm \ Ranking  1    2    3    4    5    6
EM                   1    3    4    3    2    7
FF                   3    2    1    5    6    3
FC                   9    3    3    0    4    1
HC                   8    1    1    1    3    6
MD                   3    4    3    8    2    0
KM                   12   1    3    1    2    1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are again 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.
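The tied ranks quoted above (6, 4.5, 2, 3, 4.5, and 1) follow from the first-position counts in Table 11. A small sketch (our own helper, using the standard average-rank convention for ties) reproduces them:

```python
def tied_ranks(counts):
    """Rank alternatives by count (larger count -> better, i.e. smaller,
    rank); tied alternatives share the average of their positions."""
    order = sorted(counts, key=counts.get, reverse=True)
    ranks, i = {}, 0
    while i < len(order):
        j = i
        # extend j over the run of alternatives tied with order[i]
        while j + 1 < len(order) and counts[order[j + 1]] == counts[order[i]]:
            j += 1
        for alg in order[i:j + 1]:
            ranks[alg] = (i + j + 2) / 2   # average of positions i+1 .. j+1
        i = j + 1
    return ranks

# First-place (ranking 1) counts from Table 11
ranks = tied_ranks({"EM": 1, "FF": 3, "FC": 9, "HC": 8, "MD": 3, "KM": 12})
```

Here FF and MD tie on 3 first places, so both receive the average rank 4.5, matching the values in the discussion.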

Therefore, in this paper, the effectiveness of our proposed model is examined and verified from an empirical perspective using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from the twenty percent of the people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that there exists no single algorithm or model that can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms, selecting the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods, in order to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but can also reconcile individual differences or even conflicts to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed that merges expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be developed, which can handle many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework that improves the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article; the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and Its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.


[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

14 Complexity

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the Union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey system theory," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, "Grey-based PROMETHEE II with application to evaluation of source water protection strategies," Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



First, the values of the nine external measures on the 20 data sets can be obtained using the selected six clustering algorithms. The process is implemented according to Steps 1–3 in Section 5.2. To facilitate understanding, we select the Ionosphere data set as an example to explain the computational process. The initial values of the nine external measures, provided in Table 3, are standardized by equations (1)–(3) to transform cost criteria into benefit criteria. The standardized data are presented in Table 4, with the optimal result of each external measure highlighted in boldface. It is clear that no clustering algorithm obtains the optimal results for all external measures, which supports the NFL theorem.
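Equations (1)–(3) are not reproduced in this excerpt, so the sketch below illustrates the general idea of the cost-to-benefit transformation with a hypothetical min-max scheme (the paper's actual standardization differs and will not reproduce Table 4 exactly): benefit criteria are rescaled so that larger is better, and cost criteria such as En (entropy, where smaller is better) are inverted first.

```python
# Hypothetical illustration of standardizing external measures; a stand-in
# for the paper's equations (1)-(3), which are not shown in this excerpt.
# Benefit criteria are rescaled to [0, 1]; cost criteria are inverted first
# so that larger standardized values are always better.

def standardize(column, is_cost=False):
    """Min-max standardize one measure across the six algorithms."""
    lo, hi = min(column), max(column)
    if hi == lo:                 # constant column carries no information
        return [1.0] * len(column)
    if is_cost:
        return [(hi - x) / (hi - lo) for x in column]  # invert cost criterion
    return [(x - lo) / (hi - lo) for x in column]

# Raw Purity (benefit) and En (cost) values for the Ionosphere data set
# (Table 3), in the order EM, FF, FC, HC, MD, KM.
purity = [0.9003, 0.6638, 0.9117, 0.6439, 0.8746, 0.9117]
en = [0.0331, 0.0506, 0.0296, 0.0356, 0.0408, 0.0299]

print(standardize(purity))            # FC and KM (maximum purity) map to 1.0
print(standardize(en, is_cost=True))  # FC (lowest entropy) maps to 1.0
```

Under this scheme, a higher standardized value always means better performance on that measure, regardless of whether the raw measure was a benefit or a cost criterion.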

Second, the rankings of the clustering algorithms on the 20 data sets computed by WSM, TOPSIS, GRA, and PROMETHEE II are presented in Tables 5–8, respectively. The four MCDM methods are implemented in MATLAB 7.0, using the external measures, such as Purity, En, FM, and Rand, as the input, based on Tables 3 and 4. Each group of five UCI data sets is processed by one of the four MCDM methods, which are randomly assigned. The measure weights of each expert applied in WSM, TOPSIS, GRA, and PROMETHEE II are obtained by AHP based on the eigenvalue method. The final index weights of the three experts are obtained by the weighted arithmetic mean, a widely used aggregation operator in decision problems. The final index weights for the nine external measures, in the order given in Tables 3 and 4, are 0.1893, 0.1820, 0.0449, 0.0930, 0.0483, 0.1264, 0.1234, 0.1159, and 0.0769, respectively.
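As an illustration of the simplest of the four methods, the weighted sum model (WSM) scores each algorithm by the weight-weighted sum of its standardized measure values. The sketch below applies the aggregated weights above to the standardized Ionosphere data (Table 4); note that this is illustrative only, since in the experiments Ionosphere was assigned to TOPSIS, not WSM.

```python
# Weighted sum model (WSM) sketch on the standardized Ionosphere data
# (Table 4) with the aggregated AHP index weights reported in the text.
# Illustrative only: Ionosphere was assigned to TOPSIS in Tables 5-8.

weights = [0.1893, 0.1820, 0.0449, 0.0930, 0.0483,
           0.1264, 0.1234, 0.1159, 0.0769]

# Measure order: Purity, En, F-m, Rand, ARI, Jaccard, FM, MAP, ME.
standardized = {
    "EM": [0.1748, 0.1670, 0.1579, 0.1589, 0.1666, 0.1596, 0.1619, 0.1748, 0.1608],
    "FF": [0.1514, 0.1655, 0.1833, 0.1816, 0.1667, 0.1803, 0.1757, 0.1514, 0.1778],
    "FC": [0.1761, 0.1672, 0.1570, 0.1595, 0.1666, 0.1604, 0.1627, 0.1761, 0.1610],
    "HC": [0.1495, 0.1668, 0.1850, 0.1825, 0.1667, 0.1812, 0.1765, 0.1495, 0.1788],
    "MD": [0.1721, 0.1663, 0.1598, 0.1578, 0.1666, 0.1578, 0.1604, 0.1721, 0.1605],
    "KM": [0.1761, 0.1672, 0.1570, 0.1597, 0.1666, 0.1606, 0.1628, 0.1761, 0.1611],
}

# WSM score = sum over measures of (weight * standardized value).
scores = {alg: sum(w * v for w, v in zip(weights, row))
          for alg, row in standardized.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # HC ranks first and MD last under WSM on this data set
```

That WSM ranks HC first on Ionosphere while TOPSIS (Table 6) ranks KM first is itself a small demonstration of the conflicting-rankings phenomenon the paper addresses.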

The results in Tables 5–8 do not enable us to identify a regular pattern in the evaluation performance of the clustering algorithms. The results indicate that various MCDM methods generate conflicting rankings. On the basis of these observed results, secondary mining and knowledge discovery are proposed to reconcile these disagreements.

Finally, a decision-making support model based on the eighty-twenty rule for secondary mining and knowledge discovery is applied to reconcile individual disagreements. This model includes the following three steps.

In Step 1, mark two sets of alternatives, one in a lower position and one in an upper position. According to equations (31) and (32), for the upper position we have n = 6 and thus x = 6 × 1/5 = 1.2 ≈ 2. Therefore, the second position classifies the ranking, and the first and second positions hold the alternatives in the upper position. Similarly, for the lower position, we have x = 6 × 4/5 = 4.8 ≈ 5. Hence, the fifth position classifies the ranking, and the fifth and sixth positions hold the alternatives in the lower position. The two sets of alternatives in the lower and upper positions are marked in boldface in Table 9, based on Tables 5–8.
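The Step 1 cutoffs can be sketched as follows, assuming (consistent with the worked numbers above, where 1.2 ≈ 2 and 4.8 ≈ 5) that the fractional positions are rounded up; equations (31) and (32) themselves are not reproduced in this excerpt.

```python
# Sketch of the eighty-twenty marking rule in Step 1 (rounding up is an
# assumption inferred from the worked example: 6 * 1/5 = 1.2 -> 2 and
# 6 * 4/5 = 4.8 -> 5).
import math

def position_cutoffs(n):
    """For n ranked alternatives, return the last upper position and the
    first lower position under the 80/20 split."""
    upper_end = math.ceil(n / 5)        # top 1/5 of the ranking
    lower_start = math.ceil(4 * n / 5)  # bottom 1/5 starts here
    return upper_end, lower_start

print(position_cutoffs(6))  # (2, 5): positions 1-2 upper, 5-6 lower
```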

In Step 2, grade the sets of alternatives in the lower and upper positions according to Step 2 in Section 4. The scores of the alternatives in the upper position, bi, are totaled; similarly, the scores of the alternatives in the lower position, di, are totaled. The results for the 20 UCI data sets are presented in Table 10.

In Step 3, the priority of each alternative is computed by equation (33), and the calculation results are reported in Table 10.
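Steps 2 and 3 can be reproduced from the position counts in Table 10, assuming (consistent with that table) that a 1st place scores 2 and a 2nd place scores 1 toward the upper score bi, a 5th place scores 1 and a 6th place scores 2 toward the lower score di, and that equation (33) computes the priority as fi = bi - di.

```python
# Reproduction of the Step 2-3 scoring, assuming equation (33) takes the
# form f_i = b_i - d_i (this form is inferred from, and consistent with,
# every row of Table 10).

# Position counts over the 20 data sets, read from Table 10:
# (number of 1st, 2nd, 5th, and 6th places per algorithm).
counts = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

priority = {}
for alg, (first, second, fifth, sixth) in counts.items():
    b = 2 * first + 1 * second   # upper-position score b_i
    d = 1 * fifth + 2 * sixth    # lower-position score d_i
    priority[alg] = b - d        # f_i

ranking = sorted(priority, key=priority.get, reverse=True)
print(priority)  # {'EM': -13, 'FF': -5, 'FC': 10, 'HC': -8, 'MD': 2, 'KM': 14}
print(ranking)   # ['KM', 'FC', 'MD', 'FF', 'HC', 'EM']
```

The resulting priorities and final ranking match the fi and Ranking columns of Table 10, with KM first and EM last.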

5.4. Discussion and Analysis. The results in Tables 5–8 indicate that different MCDM methods produce different or even conflicting individual rankings. Thus, it is difficult for DMs to identify the best clustering algorithms for the given data sets. Table 10 reports a list of algorithm priorities. The

Table 2: Data information of the 20 data sets.

Data set | No. | Area | Instances | Attributes | Classes
Liver Disorders | 1 | Life sciences | 345 | 7 | 2
ZOO | 2 | Life sciences | 101 | 17 | 2
Pima Indians Diabetes | 3 | Life sciences | 768 | 8 | 2
Wholesale Customers | 4 | Business | 440 | 8 | 2
Haberman's Survival | 5 | Life sciences | 306 | 3 | 2
Wine | 6 | Physical sciences | 178 | 13 | 3
Balance Scale | 7 | Social sciences | 625 | 4 | 3
Breast Tissue | 8 | Life sciences | 106 | 10 | 6
Ecoli | 9 | Life sciences | 336 | 8 | 8
Fertility | 10 | Life sciences | 100 | 10 | 2
Ionosphere | 11 | Physical sciences | 351 | 34 | 2
Iris | 12 | Life sciences | 150 | 4 | 3
Teaching Assistant Evaluation | 13 | Other | 151 | 5 | 3
Blood Transfusion | 14 | Business | 748 | 5 | 2
Spambase | 15 | CS/Engineering | 4601 | 57 | 2
Page Blocks Classification | 16 | CS/Engineering | 5473 | 10 | 5
Sonar | 17 | Physical sciences | 208 | 60 | 2
Contraceptive Method Choice | 18 | Life sciences | 1473 | 9 | 3
Dermatology | 19 | Life sciences | 366 | 33 | 6
Yeast Data | 20 | Life sciences | 1484 | 8 | 10
Total | | | 18310 | 313 | 70


Table 3: Initial values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | ME
EM | 0.9003 | 0.0331 | 0.1109 | 0.5897 | 0.0001 | 0.5689 | 0.7411 | 0.9003 | 0.4839
FF | 0.6638 | 0.0506 | 0.3859 | 0.8091 | 0.0011 | 0.7705 | 0.8747 | 0.6638 | 0.3089
FC | 0.9117 | 0.0296 | 0.0999 | 0.5954 | 0.0001 | 0.5774 | 0.7484 | 0.9117 | 0.4818
HC | 0.6439 | 0.0356 | 0.4020 | 0.8177 | 0.0012 | 0.7785 | 0.8819 | 0.6439 | 0.2982
MD | 0.8746 | 0.0408 | 0.1339 | 0.5783 | 0.0001 | 0.5502 | 0.7250 | 0.8746 | 0.4877
KM | 0.9117 | 0.0299 | 0.0994 | 0.5983 | 0.0001 | 0.5791 | 0.7502 | 0.9117 | 0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

Algorithm | Purity | En | F-m | Rand | ARI | Jaccard | FM | MAP | ME
EM | 0.1748 | 0.1670 | 0.1579 | 0.1589 | 0.1666 | 0.1596 | 0.1619 | 0.1748 | 0.1608
FF | 0.1514 | 0.1655 | 0.1833 | 0.1816 | 0.1667 | 0.1803 | 0.1757 | 0.1514 | 0.1778
FC | 0.1761 | 0.1672 | 0.1570 | 0.1595 | 0.1666 | 0.1604 | 0.1627 | 0.1761 | 0.1610
HC | 0.1495 | 0.1668 | 0.1850 | 0.1825 | 0.1667 | 0.1812 | 0.1765 | 0.1495 | 0.1788
MD | 0.1721 | 0.1663 | 0.1598 | 0.1578 | 0.1666 | 0.1578 | 0.1604 | 0.1721 | 0.1605
KM | 0.1761 | 0.1672 | 0.1570 | 0.1597 | 0.1666 | 0.1606 | 0.1628 | 0.1761 | 0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets. Each cell gives the value, with the rank in parentheses.

Algorithm | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data
EM | 0.1677 (2) | 0.1701 (1) | 0.1547 (6) | 0.1650 (6) | 0.1719 (2)
FF | 0.1653 (5) | 0.1651 (3) | 0.1684 (4) | 0.1652 (4) | 0.1790 (1)
FC | 0.1677 (2) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1644 (5)
HC | 0.1638 (6) | 0.1701 (1) | 0.1595 (5) | 0.1652 (4) | 0.1560 (3)
MD | 0.1676 (4) | 0.1650 (4) | 0.1721 (3) | 0.1656 (3) | 0.1645 (4)
KM | 0.1679 (1) | 0.1648 (5) | 0.1727 (1) | 0.1695 (1) | 0.1643 (6)

Table 6: Rankings of TOPSIS for the five assigned UCI data sets. Each cell gives the value, with the rank in parentheses.

Algorithm | Pima Indians Diabetes | Wholesale Customers | Wine | Ecoli | Ionosphere
EM | 0.0866 (6) | 0.1792 (4) | 0.1859 (5) | 0.1991 (2) | 0.1797 (3)
FF | 0.1102 (5) | 0.1019 (6) | 0.0661 (6) | 0.3061 (1) | 0.1427 (5)
FC | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1315 (5) | 0.1858 (2)
HC | 0.2019 (1) | 0.1028 (5) | 0.1870 (1) | 0.0962 (6) | 0.1406 (6)
MD | 0.1974 (4) | 0.2055 (1) | 0.1870 (1) | 0.1335 (4) | 0.1646 (4)
KM | 0.2019 (1) | 0.2053 (1) | 0.1870 (1) | 0.1336 (3) | 0.1865 (1)

Table 7: Rankings of the GRA for the five assigned UCI data sets. Each cell gives the value, with the rank in parentheses.

Algorithm | Breast Tissue | Fertility | Iris | Contraceptive Method Choice | Dermatology
EM | 0.1672 (4) | 0.1379 (4) | 0.1325 (6) | 0.1850 (3) | 0.1771 (3)
FF | 0.1378 (6) | 0.2142 (2) | 0.1712 (2) | 0.1366 (5) | 0.1643 (5)
FC | 0.1804 (3) | 0.1362 (6) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)
HC | 0.1499 (5) | 0.2321 (1) | 0.1825 (1) | 0.1229 (6) | 0.1214 (6)
MD | 0.1819 (2) | 0.1416 (3) | 0.1712 (2) | 0.1842 (4) | 0.1750 (4)
KM | 0.1828 (1) | 0.1379 (4) | 0.1712 (2) | 0.1857 (1) | 0.1811 (1)


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, corresponding to EM, FF, FC, HC, MD, and KM. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each algorithm attains each position can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, and the corresponding rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1 for EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets. Each cell gives the value, with the rank in parentheses.

Algorithm | Liver Disorders | Haberman's Survival | Blood Transfusion Service Center | Page Blocks Classification | Sonar
EM | 0.1654 (5) | 0.1133 (6) | 0.1088 (6) | 0.1252 (6) | 0.1644 (3)
FF | 0.1688 (1) | 0.1766 (4) | 0.1815 (4) | 0.1867 (5) | 0.1618 (4)
FC | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)
HC | 0.1645 (6) | 0.1780 (1) | 0.1906 (1) | 0.2371 (1) | 0.1749 (2)
MD | 0.1679 (2) | 0.1762 (5) | 0.1380 (5) | 0.1685 (2) | 0.1770 (1)
KM | 0.1667 (3) | 0.1780 (1) | 0.1906 (1) | 0.1413 (3) | 0.1609 (5)

Table 9: Rankings of four MCDM methods for a total of 20 UCI data sets.

Rank | ZOO | Balance Scale | Teaching Assistant Evaluation | Spambase | Yeast Data | Pima Indians Diabetes | Wholesale Customers
1 | KM | EM | FC | FC | FF | KM | FC
2 | FC | HC | KM | KM | EM | FC | KM
3 | EM | FF | MD | MD | HC | HC | MD
4 | MD | MD | FF | FF | MD | MD | EM
5 | FF | FC | HC | HC | FC | FF | HC
6 | HC | KM | EM | EM | KM | EM | FF

Rank | Wine | Ecoli | Ionosphere | Breast Tissue | Fertility | Iris | Contraceptive Method Choice
1 | FC | FF | KM | KM | HC | HC | KM
2 | KM | EM | FC | MD | FF | KM | FC
3 | MD | KM | EM | FC | MD | FC | EM
4 | HC | MD | MD | EM | KM | FF | MD
5 | EM | FC | FF | HC | EM | MD | FF
6 | FF | HC | HC | FF | FC | EM | HC

Rank | Dermatology | Liver Disorders | Haberman's Survival | Blood Transfusion Service | Page Blocks Classification | Sonar
1 | FC | FF | KM | KM | HC | MD
2 | KM | MD | FC | FC | MD | HC
3 | EM | FC | HC | HC | KM | EM
4 | MD | KM | FF | FF | FC | FF
5 | FF | EM | MD | MD | FF | FC
6 | HC | HC | EM | EM | EM | KM

Table 10: Priority of each alternative. A 1st place scores 2 and a 2nd place scores 1 toward bi; a 5th place scores 1 and a 6th place scores 2 toward di.

Algorithm | 1st | 2nd | bi | 5th | 6th | di | fi | Ranking
EM | 1 | 2 | 4 | 3 | 7 | 17 | -13 | 6
FF | 3 | 1 | 7 | 6 | 3 | 12 | -5 | 4
FC | 5 | 6 | 16 | 4 | 1 | 6 | 10 | 2
HC | 3 | 2 | 8 | 4 | 6 | 16 | -8 | 5
MD | 1 | 3 | 5 | 3 | 0 | 3 | 2 | 3
KM | 7 | 6 | 20 | 0 | 3 | 6 | 14 | 1

Table 11: Statistical analysis of rankings for all 20 UCI data sets (number of data sets on which each algorithm attains each ranking).

Algorithm | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6
EM | 1 | 3 | 4 | 3 | 2 | 7
FF | 3 | 2 | 1 | 5 | 6 | 3
FC | 9 | 3 | 3 | 0 | 4 | 1
HC | 8 | 1 | 1 | 1 | 3 | 6
MD | 3 | 4 | 3 | 8 | 2 | 0
KM | 12 | 1 | 3 | 1 | 2 | 1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, the effectiveness of our proposed model is examined and verified from an empirical perspective using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences in the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies show that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is carried out to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, which allows a comparison with the results generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.

In future work, a decision support system including a data space, a method space, a model space, and a knowledge space will be further developed. Such a system can accommodate many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and to improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and all 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C H Chou M C Su and E Lai ldquoA new cluster validitymeasure and its application to image compressionrdquo PatternAnalysis and Applications vol 7 pp 205ndash220 2004

[15] S Sriparna andM Ujjwal ldquoUse of symmetry and stability fordata clusteringrdquo Evolutionary Intelligence vol 3 no 3-4pp 103ndash122 2010

[16] J C Dunn ldquoA fuzzy relative of the ISODATA process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[17] S Mahallati J C Bezdek D Kumar M R Popovic andT A Valiante ldquoInterpreting cluster structure in waveformdata with visual assessment and Dunnrsquos indexrdquo Frontiers inComputational Intelligence Springer Cham Switzerlandpp 73ndash101 2017

[18] D L Davies and D W Bouldin ldquoA cluster separationmeasurerdquo IEEE Transactions on Pattern Analysis and Ma-chine Intelligence vol PAMI-1 no 2 pp 224ndash227 1979

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, “Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran,” Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., “Model-based evaluation of clustering validation measures,” Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering,” ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, “Relationships between results of an internal and external match load determining method in male singles badminton players,” Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, “An analytic hierarchy model for classification algorithms selection in credit risk analysis,” Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, “A meta-analysis on classification model performance in real-world datasets: an exploratory view,” Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, “A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm,” Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, “GDPC: gravitation-based density peaks clustering algorithm,” Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, “HaloDPC: an improved recognition method on halo node for density peak clustering algorithm,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017 (in Chinese).

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, “Optimizing fuzzy cluster ensemble in string representation,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, “A clustering ensemble framework based on elite selection of weighted clusters,” Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, “Clustering ensemble selection considering quality and diversity,” Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, “Consensus function based on clusters clustering and iterative fusion of base clusters,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, “A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters,” Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, “Diversity based cluster weighting in cluster ensemble: an information theory approach,” Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, “Elite fuzzy clustering ensemble based on clustering diversity and quality measures,” Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, “Some connectivity based cluster validity indices,” Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, “Validating clustering for gene expression data,” Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, “Cluster validity index for estimation of fuzzy clusters of different sizes and densities,” Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, “Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure,” Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, “A new cluster validity index using maximum cluster spread based compactness measure,” International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, “A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm,” International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, “A cluster validity framework for genome expression data,” Bioinformatics, vol. 18, no. 2, pp. 319–320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., “Inference from clustering with application to gene-expression microarrays,” Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, “A prediction-based resampling method for estimating the number of clusters in a dataset,” Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, “Finding the number of clusters in a dataset,” Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, “A multi-criteria decision making approach for estimating the number of clusters in a data set,” PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, “Multi-criteria decision making approach for cluster validation,” in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, “Formalizing and solving the problem of clustering in MCDA,” European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, “An ordered clustering algorithm based on K-means and the PROMETHEE method,” International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, “Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach,” Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d’Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, “Was Vilfredo Pareto really a ‘precursor’ of fascism?” The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, “External validation measures for K-means clustering: a data distribution perspective,” Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, “A netting clustering analysis method under intuitionistic fuzzy environment,” Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, “A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis,” Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, “A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, “New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II,” Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, “A best possible heuristic for the k-center problem,” Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, “An optimized farthest first clustering algorithm,” in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, “Performance guarantees for hierarchical clustering,” Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, “Editorial: multiple criteria decision making and operations research,” Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, “Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach,” Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, “Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships,” Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, “Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming,” International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, “Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, “Optimality and non-scalar-valued performance criteria,” IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59–60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, “The impact of aggregating benefit and cost criteria in four MCDA methods,” IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, “Control problems of grey systems,” Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System, Science and Technology Information Services, Windsor; Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, “Group decision-making using improved multi-criteria decision making methods for credit risk analysis,” Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, “Extension of grey relational analysis for facilitating group consensus to oil spill emergency management,” Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, “Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean,” International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, “Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling,” European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, “Introduction to grey system theory,” The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, “Extension of the TOPSIS method for decision-making problems with fuzzy data,” Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, “Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS,” European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, “PROMETHEE methods,” in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, “Multicriteria decision analysis: overview and implications for environmental decision making,” Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, “How to decide with PROMETHEE,” 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, “Note-A preference ranking organisation method,” Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, “Hierarchical clustering algorithms for document datasets,” Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, “Internal versus external cluster validation indexes,” International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, “Empirical and theoretical comparisons of selected criterion functions for document clustering,” Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, “Unsupervised document classification using sequential information maximization,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, “Nouvelles recherches sur la distribution florale,” Bulletin de la Societe vaudoise des sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, “A method for comparing two hierarchical clusterings,” Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, “Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index,” BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, “Improved AHP-group decision making for investment strategy selection,” Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, “An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase,” Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, “AHP applied to binary and ternary comparisons,” Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, “A GP-AHP method for solving group decision-making fuzzy AHP problems,” Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, “Application of the AHP in project management,” International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, “A critical analysis of the eigenvalue method used to derive priorities in AHP,” European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, “Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems,” Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, “Weapon selection using the AHP and TOPSIS methods under fuzzy environment,” Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, “Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods,” Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, “Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS,” Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, “Ensemble of software defect predictors: an AHP-based evaluation method,” International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, “Toward knowledge-rich data mining,” Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, “Multi-criteria decision analysis for emergency medical service assessment,” Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



Table 3: Initial values of the nine external measures for the Ionosphere data set.

     Purity   En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM   0.9003   0.0331  0.1109  0.5897  0.0001  0.5689   0.7411  0.9003  0.4839
FF   0.6638   0.0506  0.3859  0.8091  0.0011  0.7705   0.8747  0.6638  0.3089
FC   0.9117   0.0296  0.0999  0.5954  0.0001  0.5774   0.7484  0.9117  0.4818
HC   0.6439   0.0356  0.4020  0.8177  0.0012  0.7785   0.8819  0.6439  0.2982
MD   0.8746   0.0408  0.1339  0.5783  0.0001  0.5502   0.7250  0.8746  0.4877
KM   0.9117   0.0299  0.0994  0.5983  0.0001  0.5791   0.7502  0.9117  0.4807

Table 4: Standardized values of the nine external measures for the Ionosphere data set.

     Purity   En      F-m     Rand    ARI     Jaccard  FM      MAP     M
EM   0.1748   0.1670  0.1579  0.1589  0.1666  0.1596   0.1619  0.1748  0.1608
FF   0.1514   0.1655  0.1833  0.1816  0.1667  0.1803   0.1757  0.1514  0.1778
FC   0.1761   0.1672  0.1570  0.1595  0.1666  0.1604   0.1627  0.1761  0.1610
HC   0.1495   0.1668  0.1850  0.1825  0.1667  0.1812   0.1765  0.1495  0.1788
MD   0.1721   0.1663  0.1598  0.1578  0.1666  0.1578   0.1604  0.1721  0.1605
KM   0.1761   0.1672  0.1570  0.1597  0.1666  0.1606   0.1628  0.1761  0.1611

Table 5: Rankings of WSM for the five assigned UCI data sets.

     ZOO            Balance Scale  Teaching Assistant Evaluation  Spambase       Yeast Data
     Value   Rank   Value   Rank   Value   Rank                   Value   Rank   Value   Rank
EM   0.1677  2      0.1701  1      0.1547  6                      0.1650  6      0.1719  2
FF   0.1653  5      0.1651  3      0.1684  4                      0.1652  4      0.1790  1
FC   0.1677  2      0.1648  5      0.1727  1                      0.1695  1      0.1644  5
HC   0.1638  6      0.1701  1      0.1595  5                      0.1652  4      0.1560  3
MD   0.1676  4      0.1650  4      0.1721  3                      0.1656  3      0.1645  4
KM   0.1679  1      0.1648  5      0.1727  1                      0.1695  1      0.1643  6
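The WSM values in Table 5 are weighted sums of the standardized measure values for each algorithm. A minimal sketch of this kind of aggregation; the equal criterion weights and the sum-to-one column normalization here are illustrative assumptions, not necessarily the paper's exact configuration:

```python
def wsm_scores(matrix, weights):
    """Weighted Sum Model: normalize each criterion column to sum to one,
    then combine the normalized values with the criterion weights."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [sum(w * v / s for w, v, s in zip(weights, row, col_sums))
            for row in matrix]

# Rows are alternatives (e.g., clustering algorithms); columns are
# benefit-type criteria (e.g., external measures); weights sum to one.
scores = wsm_scores([[0.90, 0.80], [0.30, 0.20]], [0.5, 0.5])
# The first alternative dominates on both criteria, so it scores higher.
```

With sum-to-one column normalization and weights summing to one, the WSM scores of all alternatives themselves sum to one, which matches the scale of the values reported in Table 5.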

Table 6: Rankings of TOPSIS for the five assigned UCI data sets.

     Pima Indians Diabetes  Wholesale Customers  Wine           Ecoli          Ionosphere
     Value   Rank           Value   Rank         Value   Rank   Value   Rank   Value   Rank
EM   0.0866  6              0.1792  4            0.1859  5      0.1991  2      0.1797  3
FF   0.1102  5              0.1019  6            0.0661  6      0.3061  1      0.1427  5
FC   0.2019  1              0.2053  1            0.1870  1      0.1315  5      0.1858  2
HC   0.2019  1              0.1028  5            0.1870  1      0.0962  6      0.1406  6
MD   0.1974  4              0.2055  1            0.1870  1      0.1335  4      0.1646  4
KM   0.2019  1              0.2053  1            0.1870  1      0.1336  3      0.1865  1
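Table 6 ranks the algorithms by their TOPSIS closeness to the ideal solution. A generic sketch of the method; the vector normalization, equal weights, and all-benefit criteria below are illustrative, and the paper's exact variant may differ:

```python
import math

def topsis(matrix, weights, benefit):
    """Generic TOPSIS: vector-normalize each criterion column, apply the
    weights, then score each alternative by its relative closeness to the
    ideal best solution versus the ideal worst solution."""
    m = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[w * x / n for w, x, n in zip(weights, row, norms)] for row in matrix]
    best = [max(c) if b else min(c) for b, c in zip(benefit, zip(*v))]
    worst = [min(c) if b else max(c) for b, c in zip(benefit, zip(*v))]

    def dist(row, ref):
        return math.sqrt(sum((a - r) ** 2 for a, r in zip(row, ref)))

    return [dist(r, worst) / (dist(r, best) + dist(r, worst)) for r in v]

# Alternatives in rows; benefit=True marks "larger is better" criteria.
closeness = topsis([[0.9, 0.9], [0.5, 0.5], [0.1, 0.1]], [0.5, 0.5], [True, True])
# The dominating first alternative attains the maximum closeness of 1.0.
```

Closeness coefficients lie in [0, 1], and a higher value means a better alternative, which is how the "Value" columns of Table 6 are ordered into ranks.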

Table 7: Rankings of the GRA for the five assigned UCI data sets.

     Breast Tissue  Fertility      Iris           Contraceptive Method Choice  Dermatology
     Value   Rank   Value   Rank   Value   Rank   Value   Rank                 Value   Rank
EM   0.1672  4      0.1379  4      0.1325  6      0.1850  3                    0.1771  3
FF   0.1378  6      0.2142  2      0.1712  2      0.1366  5                    0.1643  5
FC   0.1804  3      0.1362  6      0.1712  2      0.1857  1                    0.1811  1
HC   0.1499  5      0.2321  1      0.1825  1      0.1229  6                    0.1214  6
MD   0.1819  2      0.1416  3      0.1712  2      0.1842  4                    0.1750  4
KM   0.1828  1      0.1379  4      0.1712  2      0.1857  1                    0.1811  1
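The grey relational analysis (GRA) values in Table 7 grade each algorithm by its closeness to an ideal reference series. A minimal sketch with the conventional distinguishing coefficient rho = 0.5; the column-wise-maximum reference series and equal criterion weights are assumptions here:

```python
def grey_relational_grades(matrix, rho=0.5):
    """Grey relational analysis: grade each alternative by averaging its
    grey relational coefficients against the ideal reference series
    (the column-wise best value, assuming benefit-type criteria)."""
    ref = [max(col) for col in zip(*matrix)]
    diffs = [[abs(x - r) for x, r in zip(row, ref)] for row in matrix]
    dmin = min(min(row) for row in diffs)
    dmax = max(max(row) for row in diffs)
    return [sum((dmin + rho * dmax) / (d + rho * dmax) for d in row) / len(row)
            for row in diffs]

# rho is the distinguishing coefficient, conventionally set to 0.5.
grades = grey_relational_grades([[0.9, 0.9], [0.1, 0.1]])
# The alternative equal to the reference series gets the maximum grade 1.0.
```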


rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, which correspond to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of times each algorithm attains each position can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the counts for the clustering algorithms are 1, 3, 9, 8, 3, and 12, and the resulting rankings are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets.

     Liver Disorders  Haberman's Survival  Blood Transfusion Service Center  Page Blocks Classification  Sonar
     Value   Rank     Value   Rank         Value   Rank                      Value   Rank                Value   Rank
EM   0.1654  5        0.1133  6            0.1088  6                         0.1252  6                   0.1644  3
FF   0.1688  1        0.1766  4            0.1815  4                         0.1867  5                   0.1618  4
FC   0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5
HC   0.1645  6        0.1780  1            0.1906  1                         0.2371  1                   0.1749  2
MD   0.1679  2        0.1762  5            0.1380  5                         0.1685  2                   0.1770  1
KM   0.1667  3        0.1780  1            0.1906  1                         0.1413  3                   0.1609  5
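PROMETHEE II, used in Table 8, ranks alternatives by net outranking flows obtained from pairwise comparisons. A sketch using the "usual" (strict-dominance) preference function with equal weights; the choice of preference function is an assumption here, not necessarily the paper's:

```python
def promethee2_net_flows(matrix, weights):
    """PROMETHEE II with the 'usual' preference function: an alternative
    earns a criterion's full weight whenever it strictly beats another
    alternative on that criterion; the net flow averages the degree of
    outranking minus the degree of being outranked over all rivals."""
    n = len(matrix)
    flows = []
    for i in range(n):
        net = 0.0
        for k in range(n):
            if i == k:
                continue
            net += sum(w for w, a, b in zip(weights, matrix[i], matrix[k]) if a > b)
            net -= sum(w for w, a, b in zip(weights, matrix[i], matrix[k]) if b > a)
        flows.append(net / (n - 1))
    return flows

flows = promethee2_net_flows([[0.9, 0.9], [0.5, 0.5], [0.1, 0.1]], [0.5, 0.5])
# Net flows lie in [-1, 1]; the dominating first alternative has flow 1.0.
```

Alternatives are then ranked by decreasing net flow, which is how the "Rank" columns of Table 8 would be derived from the underlying scores.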

Table 9: Rankings of four MCDM methods for a total of 20 UCI data sets.

Rank   ZOO   Balance Scale   Teaching Assistant Evaluation   Spambase   Yeast Data   Pima Indians Diabetes   Wholesale Customers
1      KM    EM              FC                              FC         FF           KM                      FC
2      FC    HC              KM                              KM         EM           FC                      KM
3      EM    FF              MD                              MD         HC           HC                      MD
4      MD    MD              FF                              FF         MD           MD                      EM
5      FF    FC              HC                              HC         FC           FF                      HC
6      HC    KM              EM                              EM         KM           EM                      FF

Rank   Wine   Ecoli   Ionosphere   Breast Tissue   Fertility   Iris   Contraceptive Method Choice
1      FC     FF      KM           KM              HC          HC     KM
2      KM     EM      FC           MD              FF          KM     FC
3      MD     KM      EM           FC              MD          FC     EM
4      HC     MD      MD           EM              KM          FF     MD
5      EM     FC      FF           HC              EM          MD     FF
6      FF     HC      HC           FF              FC          EM     HC

Rank   Dermatology   Liver Disorders   Haberman's Survival   Blood Transfusion Service   Page Blocks Classification   Sonar
1      FC            FF                KM                    KM                          HC                           MD
2      KM            MD                FC                    FC                          MD                           HC
3      EM            FC                HC                    HC                          KM                           EM
4      MD            KM                FF                    FF                          FC                           FF
5      FF            EM                MD                    MD                          FF                           FC
6      HC            HC                EM                    EM                          EM                           KM

Table 10: Priority of each alternative.

Position   1st   2nd   b_i   5th   6th   d_i   f_i   Ranking
Score       2     1           1     2
EM          1     2     4     3     7    17    -13    6
FF          3     1     7     6     3    12     -5    4
FC          5     6    16     4     1     6     10    2
HC          3     2     8     4     6    16     -8    5
MD          1     3     5     3     0     3      2    3
KM          7     6    20     0     3     6     14    1
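The scoring scheme in Table 10 weights each algorithm's 1st- and 2nd-place finishes (score 2 and 1) into an upper-position credit b_i, its 5th- and 6th-place finishes (score 1 and 2) into a lower-position penalty d_i, and ranks by the net priority f_i = b_i - d_i. A short sketch that reproduces the table's values from its own position counts:

```python
def priority_scores(position_counts, upper_w=(2, 1), lower_w=(1, 2)):
    """Compute f_i = b_i - d_i for each algorithm, where b_i credits
    1st/2nd-place finishes and d_i penalizes 5th/6th-place finishes,
    then rank the algorithms by f_i in descending order."""
    scores = {}
    for name, (n1, n2, n5, n6) in position_counts.items():
        b = upper_w[0] * n1 + upper_w[1] * n2   # upper-position credit b_i
        d = lower_w[0] * n5 + lower_w[1] * n6   # lower-position penalty d_i
        scores[name] = b - d
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking

# Position counts (1st, 2nd, 5th, 6th) as reported in Table 10.
counts = {"EM": (1, 2, 3, 7), "FF": (3, 1, 6, 3), "FC": (5, 6, 4, 1),
          "HC": (3, 2, 4, 6), "MD": (1, 3, 3, 0), "KM": (7, 6, 0, 3)}
scores, ranking = priority_scores(counts)
# scores  -> {'EM': -13, 'FF': -5, 'FC': 10, 'HC': -8, 'MD': 2, 'KM': 14}
# ranking -> ['KM', 'FC', 'MD', 'FF', 'HC', 'EM'], matching Table 10.
```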

Table 11: Statistical analysis of rankings for all 20 UCI data sets.

            Ranking
Algorithm   1     2     3     4     5     6
EM          1     3     4     3     2     7
FF          3     2     1     5     6     3
FC          9     3     3     0     4     1
HC          8     1     1     1     3     6
MD          3     4     3     8     2     0
KM          12    1     3     1     2     1


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, one in the lower and one in the upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are essentially the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, the effectiveness of our proposed model is examined and verified from an empirical perspective using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and implies that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent, the set of alternatives is classified into two categories: the top third of the alternatives (the 1st and 2nd positions) is marked as the upper position, and the bottom third (the 5th and 6th positions) is marked as the lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performances of clustering algorithms by producing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every domain problem [23–25]. Therefore, the key question becomes how to select the best clustering algorithm for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to evaluate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study using 20 UCI data sets, including a total of 18,310 instances and 313 attributes, together with six clustering algorithms, nine external measures, and four MCDM methods, is conducted to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by a statistical analysis of the rankings for all 20 UCI data sets, which allows the results to be compared with those generated by our proposed model. The results are essentially the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed that merges expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for a predictable imbalance.

In future work, a decision support system comprising a data space, method space, model space, and knowledge space will be further developed. It will accommodate many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article; the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

0is research was partially supported by Grants from Fundfor Less Developed Regions of the National Natural ScienceFoundation of China (71761014) the State Key Program ofthe National Natural Science Foundation of China(71532007 71932008 and 91546201) the General Pro-gram of the National Natural Science Foundation of China(71471149) Major Project of the National Social ScienceFoundation of China (15ZDB153) and the PostdoctoralScience Foundation Project of China (2016M592683)

References

[1] Z Xu J Chen and J Wu ldquoClustering algorithm forintuitionistic fuzzy setsrdquo Information Sciences vol 178no 19 pp 3775ndash3790 2008

[2] W Hang K S Choi and S Wang Synchronization Clus-tering Based on Central Force Optimization and its Extensionfor Large-Scale Data Sets Elsevier Science Publishers BVAmsterdam Netherlands 2017

Complexity 13

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C. Zhang, L. Hao, and L. Fan, "Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data," Cluster Computing, vol. 22, no. S2, pp. 3001–3010, 2018.

[7] X. Yang, Z. Xu, and H. Liao, "Correlation coefficients of hesitant multiplicative sets and their applications in decision making and clustering analysis," Applied Soft Computing, vol. 61, pp. 935–946, 2017.

[8] J. C. Ascough II, H. R. Maier, J. K. Ravalico, and M. W. Strudley, "Future research challenges for incorporation of uncertainty in environmental and ecological decision-making," Ecological Modelling, vol. 219, no. 3-4, pp. 383–399, 2008.

[9] Z. Xu and N. Zhao, "Information fusion for intuitionistic fuzzy decision making: an overview," Information Fusion, vol. 28, pp. 10–23, 2016.

[10] Z. Xu and H. Wang, "On the syntax and semantics of virtual linguistic terms for information fusion in decision making," Information Fusion, vol. 34, pp. 43–48, 2017.

[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, vol. 27, no. 2, pp. 259–289, 2013.

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?," The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, "Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies," Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note—A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval—SIGIR '02, Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe Vaudoise des Sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.


The rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, which correspond to EM, FF, FC, HC, MD, and KM, respectively. Thus, the best clustering algorithm for the given data sets is the KM algorithm. In addition, we conduct a statistical analysis of the rankings obtained for the 20 UCI data sets to compare the results generated by our proposed model. The analysis results are reported in Table 11.

In Table 11, the number of each position ranking can be determined according to Tables 5–8. For example, for ranking 1 of the upper position, the numbers of clustering algorithms are 1, 3, 9, 8, 3, and 12, respectively, and the rankings of the clustering algorithms are 6, 4.5, 2, 3, 4.5, and 1, corresponding to EM, FF, FC, HC, MD, and KM, respectively. However, the rankings of the lower positions are

Table 8: Rankings of the PROMETHEE II for the five assigned UCI data sets; each cell gives the net flow value and, in parentheses, the rank.

      Liver Disorders   Haberman's Survival   Blood Transfusion Service Center   Page Blocks Classification   Sonar
EM    0.1654 (5)        0.1133 (6)            0.1088 (6)                         0.1252 (6)                   0.1644 (3)
FF    0.1688 (1)        0.1766 (4)            0.1815 (4)                         0.1867 (5)                   0.1618 (4)
FC    0.1667 (3)        0.1780 (1)            0.1906 (1)                         0.1413 (3)                   0.1609 (5)
HC    0.1645 (6)        0.1780 (1)            0.1906 (1)                         0.2371 (1)                   0.1749 (2)
MD    0.1679 (2)        0.1762 (5)            0.1380 (5)                         0.1685 (2)                   0.1770 (1)
KM    0.1667 (3)        0.1780 (1)            0.1906 (1)                         0.1413 (3)                   0.1609 (5)
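The net flows behind rankings such as those in Table 8 come from the PROMETHEE II method. The following is a minimal, generic sketch of how a PROMETHEE II net flow can be computed; the equal criterion weights and the "usual" 0/1 preference function are assumptions made for illustration, not details taken from this paper, and the input values are made up.

```python
def promethee_ii_net_flows(scores, weights=None):
    """scores[i][k]: value of alternative i on criterion k (larger is better)."""
    n, m = len(scores), len(scores[0])
    w = weights or [1.0 / m] * m  # assumed equal weights unless given

    def pref(x, y):
        # "usual" preference function: full weight when strictly better on a criterion
        return sum(wk for wk, xk, yk in zip(w, x, y) if xk > yk)

    flows = []
    for i in range(n):
        # net flow = (outgoing preference) - (incoming preference), averaged
        phi = sum(pref(scores[i], scores[j]) - pref(scores[j], scores[i])
                  for j in range(n) if j != i)
        flows.append(phi / (n - 1))  # lies in [-1, 1]
    return flows

flows = promethee_ii_net_flows([[3, 2], [1, 4], [2, 1]])
# flows == [0.5, 0.0, -0.5]: the first alternative ranks best
```

Alternatives are then ranked by decreasing net flow, which is how the value/rank pairs in Table 8 relate to each other.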

Table 9: Rankings of the four MCDM methods for a total of 20 UCI data sets.

Rank  ZOO  Balance Scale  Teaching Assistant Evaluation  Spambase  Yeast Data  Pima Indians Diabetes  Wholesale Customers
1     KM   EM             FC                             FC        FF          KM                     FC
2     FC   HC             KM                             KM        EM         FC                     KM
3     EM   FF             MD                             MD        HC         HC                     MD
4     MD   MD             FF                             FF        MD         MD                     EM
5     FF   FC             HC                             HC        FC         FF                     HC
6     HC   KM             EM                             EM        KM         EM                     FF

Rank  Wine  Ecoli  Ionosphere  Breast Tissue  Fertility  Iris  Contraceptive Method Choice
1     FC    FF     KM          KM             HC         HC    KM
2     KM    EM     FC          MD             FF         KM    FC
3     MD    KM     EM          FC             MD         FC    EM
4     HC    MD     MD          EM             KM         FF    MD
5     EM    FC     FF          HC             EM         MD    FF
6     FF    HC     HC          FF             FC         EM    HC

Rank  Dermatology  Liver Disorders  Haberman's Survival  Blood Transfusion Service  Page Blocks Classification  Sonar
1     FC           FF               KM                   KM                         HC                          MD
2     KM           MD               FC                   FC                         MD                          HC
3     EM           FC               HC                   HC                         KM                          EM
4     MD           KM               FF                   FF                         FC                          FF
5     FF           EM               MD                   MD                         FF                          FC
6     HC           HC               EM                   EM                         EM                          KM

Table 10: Priority of each alternative; b_i is the weighted count of upper positions, d_i of lower positions, and f_i = b_i − d_i.

Position   1st   2nd   b_i   5th   6th   d_i   f_i   Ranking
Score      +2    +1          -1    -2
EM         1     2     4     3     7     17    -13   6
FF         3     1     7     6     3     12    -5    4
FC         5     6     16    4     1     6     10    2
HC         3     2     8     4     6     16    -8    5
MD         1     3     5     3     0     3     2     3
KM         7     6     20    0     3     6     14    1

Table 11: Statistical analysis of rankings for all 20 UCI data sets; each cell counts how often an algorithm attains the given ranking.

Algorithm   Ranking 1   2   3   4   5   6
EM          1           3   4   3   2   7
FF          3           2   1   5   6   3
FC          9           3   3   0   4   1
HC          8           1   1   1   3   6
MD          3           4   3   8   2   0
KM          12          1   3   1   2   1
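The statistics in Table 11 are obtained by tallying, for each algorithm, how often it attains each ranking position across the per-data-set rankings of Table 9. A minimal sketch of that tally, illustrated with only the last six data sets of Table 9 (Dermatology through Sonar) rather than all 20:

```python
from collections import Counter

# Per-data-set rankings (best to worst), taken from the last block of Table 9.
rankings_per_dataset = {
    "Dermatology":                ["FC", "KM", "EM", "MD", "FF", "HC"],
    "Liver Disorders":            ["FF", "MD", "FC", "KM", "EM", "HC"],
    "Haberman's Survival":        ["KM", "FC", "HC", "FF", "MD", "EM"],
    "Blood Transfusion Service":  ["KM", "FC", "HC", "FF", "MD", "EM"],
    "Page Blocks Classification": ["HC", "MD", "KM", "FC", "FF", "EM"],
    "Sonar":                      ["MD", "HC", "EM", "FF", "FC", "KM"],
}

# position_counts[alg][r] = number of data sets on which alg is ranked r (1-based)
position_counts = {alg: Counter() for alg in ["EM", "FF", "FC", "HC", "MD", "KM"]}
for order in rankings_per_dataset.values():
    for pos, alg in enumerate(order, start=1):
        position_counts[alg][pos] += 1

print(position_counts["KM"][1])  # KM is ranked first on 2 of these 6 data sets
```

Running the same tally over all 20 columns of Table 9 reproduces the counts in Table 11 (e.g., KM ranked first 12 times).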


ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives, in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.
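The priority scoring described above can be sketched directly from the counts in Table 10: each 1st- and 2nd-place finish earns +2 and +1 (the upper position), each 5th- and 6th-place finish costs −1 and −2 (the lower position), and the net score f_i = b_i − d_i determines the final ranking.

```python
# (1st, 2nd, 5th, 6th) position counts for each algorithm, from Table 10
counts = {
    "EM": (1, 2, 3, 7),
    "FF": (3, 1, 6, 3),
    "FC": (5, 6, 4, 1),
    "HC": (3, 2, 4, 6),
    "MD": (1, 3, 3, 0),
    "KM": (7, 6, 0, 3),
}

def net_score(c1, c2, c5, c6):
    b = 2 * c1 + 1 * c2   # benefit b_i from upper positions
    d = 1 * c5 + 2 * c6   # deficit d_i from lower positions
    return b - d          # f_i = b_i - d_i

scores = {alg: net_score(*c) for alg, c in counts.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'EM': -13, 'FF': -5, 'FC': 10, 'HC': -8, 'MD': 2, 'KM': 14}
print(ranking)  # ['KM', 'FC', 'MD', 'FF', 'HC', 'EM']
```

The resulting order reproduces the final rankings 6, 4, 2, 5, 3, and 1 for EM, FF, FC, HC, MD, and KM, with KM as the top-ranked algorithm.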

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which states that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who create eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from that twenty percent of the people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked as an upper position and the bottom 1/5 is marked as a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.

6. Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that there exists no single algorithm or model that can achieve the best performance for a given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study involving 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods is carried out to test and examine our proposed model.

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for all 20 UCI data sets, which allows the results to be compared with those generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for a predictable imbalance.

In future work, a decision support system including a data space, method space, model space, and knowledge space will be further developed, which can deal with many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework for improving the universality of the application.

Data Availability

0e data used to support the findings of this study are includedwithin the article and a total of 20 datasets are originated fromthe UCI repository (httparchiveicsucieduml)

Conflicts of Interest

0e authors declare that they have no conflicts of interest

Acknowledgments

0is research was partially supported by Grants from Fundfor Less Developed Regions of the National Natural ScienceFoundation of China (71761014) the State Key Program ofthe National Natural Science Foundation of China(71532007 71932008 and 91546201) the General Pro-gram of the National Natural Science Foundation of China(71471149) Major Project of the National Social ScienceFoundation of China (15ZDB153) and the PostdoctoralScience Foundation Project of China (2016M592683)

References

[1] Z Xu J Chen and J Wu ldquoClustering algorithm forintuitionistic fuzzy setsrdquo Information Sciences vol 178no 19 pp 3775ndash3790 2008

[2] W Hang K S Choi and S Wang Synchronization Clus-tering Based on Central Force Optimization and its Extensionfor Large-Scale Data Sets Elsevier Science Publishers BVAmsterdam Netherlands 2017

Complexity 13

[3] M Abavisani and V M Patel ldquoMulti-modal sparse and low-rank subspace clusteringrdquo Information Fusion vol 39pp 168ndash177 2018

[4] X Zhang and Z Xu ldquoHesitant fuzzy agglomerative hier-archical clustering algorithmsrdquo International Journal ofSystems Science vol 46 no 3 pp 562ndash576 2015

[5] Y Wang Z Sun and K Jia An Automatic Decoding Methodfor Morse Signal Based on Clustering Algorithm SpringerInternational Publishing Berlin Germany 2017

[6] C Zhang L Hao and L Fan ldquoOptimization and im-provement of data mining algorithm based on efficient in-cremental kernel fuzzy clustering for large datardquo ClusterComputing vol 22 no S2 pp 3001ndash3010 2018

[7] X Yang Z Xu and H Liao ldquoCorrelation coefficients ofhesitant multiplicative sets and their applications in decisionmaking and clustering analysisrdquo Applied Soft Computingvol 61 pp 935ndash946 2017

[8] J C Ascough II H R Maier J K Ravalico andM W Strudley ldquoFuture research challenges for incorpo-ration of uncertainty in environmental and ecological de-cision-makingrdquo Ecological Modelling vol 219 no 3-4pp 383ndash399 2008

[9] Z Xu and N Zhao ldquoInformation fusion for intuitionisticfuzzy decision making an overviewrdquo Information Fusionvol 28 pp 10ndash23 2016

[10] Z Xu and H Wang ldquoOn the syntax and semantics of virtuallinguistic terms for information fusion in decision makingrdquoInformation Fusion vol 34 pp 43ndash48 2017

[11] M C Naldi A C P L F Carvalho and R J G B CampelloldquoCluster ensemble selection based on relative validity in-dexesrdquo Data Mining and Knowledge Discovery vol 27 no 2pp 259ndash289 2013

[12] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[13] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650–1654, 2002.

[14] C. H. Chou, M. C. Su, and E. Lai, "A new cluster validity measure and its application to image compression," Pattern Analysis and Applications, vol. 7, pp. 205–220, 2004.

[15] S. Sriparna and M. Ujjwal, "Use of symmetry and stability for data clustering," Evolutionary Intelligence, vol. 3, no. 3-4, pp. 103–122, 2010.

[16] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.

[17] S. Mahallati, J. C. Bezdek, D. Kumar, M. R. Popovic, and T. A. Valiante, "Interpreting cluster structure in waveform data with visual assessment and Dunn's index," in Frontiers in Computational Intelligence, pp. 73–101, Springer, Cham, Switzerland, 2017.

[18] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

[19] V. Bolandi, A. Kadkhodaie, and R. Farzi, "Analyzing organic richness of source rocks from well log data by using SVM and ANN classifiers: a case study from the Kazhdumi formation, the Persian Gulf basin, offshore Iran," Journal of Petroleum Science and Engineering, vol. 151, pp. 224–234, 2017.

[20] M. Brun, C. Sima, J. Hua et al., "Model-based evaluation of clustering validation measures," Pattern Recognition, vol. 40, no. 3, pp. 807–824, 2007.

[21] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering," ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.

[22] Y. Abdullahi, B. Coetzee, and L. van den Berg, "Relationships between results of an internal and external match load determining method in male singles badminton players," Journal of Strength and Conditioning Research, vol. 33, no. 4, pp. 1111–1118, 2019.

[23] D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997.

[24] G. Kou and W. Wu, "An analytic hierarchy model for classification algorithms selection in credit risk analysis," Mathematical Problems in Engineering, vol. 2014, no. 1, Article ID 297563, 2014.

[25] D. G. Guillen and A. R. Espinosa, "A meta-analysis on classification model performance in real-world datasets: an exploratory view," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 715–732, 2018.

[26] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[27] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm," Pattern Analysis and Applications, vol. 18, no. 1, pp. 87–112, 2015.

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-o. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

14 Complexity

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istraživanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[84] J DengGrey System Book Windsor Science and TechnologyInformation Services Albany NY USA 1988

[85] WWu G Kou and Y Peng ldquoGroup decision-making usingimproved multi-criteria decision making methods for creditrisk analysisrdquo Filomat vol 30 no 15 pp 4135ndash4150 2016

[86] WWu and Y Peng ldquoExtension of grey relational analysis forfacilitating group consensus to oil spill emergency man-agementrdquo Annals of Operations Research vol 238 no 1-2pp 615ndash635 2016

[87] D Liang A Kobina and W Quan ldquoGrey relational analysismethod for probabilistic linguistic multi-criteria group de-cision-making based on geometric bonferroni meanrdquo In-ternational Journal of Fuzzy Systems vol 20 no 7pp 2234ndash2244 2017

[88] E Onder and C Boz ldquoComparing macroeconomic per-formance of the union for the mediterranean countries usinggrey relational analysis and multi-dimensional scalingrdquoEuropean Scientific Journal vol 13 pp 285ndash299 2017

[89] J. Deng, "Introduction to grey system theory," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, "Grey-based PROMETHEE II with application to evaluation of source water protection strategies," Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note: a preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société vaudoise des sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



ignored. When making decisions, the overall situation affected by the decision-making process should be considered to the maximum extent. In this work, we establish two sets of alternatives in the lower and upper positions. After the rankings of the lower position are fully considered, the rankings of the clustering algorithms are 6, 4, 2, 5, 3, and 1, respectively. These results are basically the same, which shows that our proposed model is feasible and effective.

Therefore, in this paper, from an empirical perspective, the effectiveness of our proposed model is examined and verified using six clustering algorithms, nine external measures, and four MCDM methods on 20 UCI data sets, including a total of 18,310 instances and 313 attributes. Moreover, our proposed model merges expert wisdom using the eighty-twenty rule, which reports that eighty percent of the results originate from twenty percent of the activity [58] and indicates that the twenty percent of people who are creating eighty percent of the results are highly leveraged. Thus, based on the expert wisdom originating from the twenty percent of the people, the set of alternatives is classified into two categories, where the top 1/5 of the alternatives is marked in an upper position and the bottom 1/5 is marked in a lower position. The empirical results also verify our proposed model and confirm its ability to reduce and reconcile individual differences among the performance of clustering algorithms by employing a list of algorithm priorities in a complex decision environment.
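The upper/lower position split described above can be sketched in a few lines. This is only an illustration of a fraction-based cutoff consistent with the eighty-twenty rule, not the paper's exact procedure, and the algorithm names below are hypothetical:

```python
# Illustrative sketch: split a list of alternatives, already ranked
# best-to-worst, into an "upper position" (top 1/5) and a "lower
# position" (bottom 1/5), following the eighty-twenty rule.

def split_positions(ranked, fraction=0.2):
    """Return (upper, lower) slices of a best-to-worst ranked list."""
    k = max(1, round(len(ranked) * fraction))  # at least one alternative per position
    return ranked[:k], ranked[-k:]

# Hypothetical ranking of six clustering algorithms, best first.
algorithms = ["A1", "A2", "A3", "A4", "A5", "A6"]
upper, lower = split_positions(algorithms)
print(upper, lower)  # ['A1'] ['A6']
```

For six alternatives, 1/5 rounds to a single algorithm in each position; larger alternative sets would mark proportionally more.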

6 Conclusions

Data clustering is widely applied in the initial stage of big data analysis. Clustering analysis can be used to examine massive data sets of various types to uncover unknown correlations, hidden patterns, and other potentially useful information. However, Naldi et al. [11] pointed out that different clustering algorithms may produce different data partitions. Furthermore, the NFL theorem states that no single algorithm or model can achieve the best performance for every given domain problem [23–25]. Therefore, the focal question becomes how to select the best clustering algorithms for the given data sets.

The decision-making process is extremely complex because of the competing interests of multiple stakeholders and the intricacy of systems [8–10]. This paper proposes a DMSECA model to estimate the performance of clustering algorithms and to select the most satisfactory clustering algorithm according to the decision preferences of all individual participants during a complex decision-making process. The proposed model is designed to reconcile individual disagreements in the evaluation performance of clustering algorithms. The studies have shown that the DMSECA model, which is based on the eighty-twenty rule, can generate a list of algorithm priorities and an optimal ranking scheme that is the most satisfactory according to the decision preferences of all individual participants involved in a complex decision-making problem. An experimental study uses 20 UCI data sets, including a total of 18,310 instances and 313 attributes, six clustering algorithms, nine external measures, and four MCDM methods to test and examine our proposed model.
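To make the idea of fusing several MCDM rankings into one priority list concrete, here is a deliberately simplified average-rank (Borda-style) aggregation. The method names and orderings are hypothetical, and the actual DMSECA aggregation is more elaborate than this sketch:

```python
# Minimal sketch of ranking fusion: each MCDM method ranks the same
# algorithms; the consensus orders algorithms by their mean rank position.

from statistics import mean

def consensus_ranking(rankings):
    """rankings: dict mapping method name -> list of algorithms, best first.
    Returns the algorithms sorted by mean rank position (lower is better)."""
    algos = rankings[next(iter(rankings))]
    avg = {a: mean(r.index(a) for r in rankings.values()) for a in algos}
    return sorted(algos, key=lambda a: avg[a])

# Hypothetical rankings from three MCDM methods.
mcdm = {
    "GRA":    ["KM", "EM", "FF"],
    "TOPSIS": ["EM", "KM", "FF"],
    "VIKOR":  ["KM", "FF", "EM"],
}
print(consensus_ranking(mcdm))  # ['KM', 'EM', 'FF']
```

Average rank is only one of many consensus functions; it illustrates how conflicting method-level rankings can still yield a single agreed priority list.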

The feasibility and effectiveness of the proposed model are illustrated and verified by carrying out a statistical analysis of the rankings for a total of 20 UCI data sets, allowing a comparison of these results with those generated by our proposed model. The results are basically the same as the rankings of the clustering algorithms produced by our proposed DMSECA model. The empirical results show that our proposed model can not only identify the best clustering algorithms for the given data sets but also reconcile individual differences, or even conflicts, to achieve group agreement on the evaluation performance of clustering algorithms in a complex decision-making environment. Finally, a decision-making support model is proposed by merging expert wisdom for secondary knowledge discovery based on the 80-20 rule, in order to focus the analysis on the most important positions of the rankings in relation to the number of observations for predictable imbalance.
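One common way to quantify that two ranking schemes are "basically the same" is Spearman's rank correlation. The sketch below is illustrative only: the alternative names are hypothetical, and this is not necessarily the statistical analysis used in the paper:

```python
# Spearman's rho for two complete rankings of the same items (no ties):
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference
# between the positions of item i in the two rankings.

def spearman(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((rank_a.index(x) - rank_b.index(x)) ** 2 for x in rank_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two hypothetical ranking schemes for six alternatives.
scheme_1 = ["A1", "A2", "A3", "A4", "A5", "A6"]
scheme_2 = ["A1", "A3", "A2", "A4", "A5", "A6"]
print(round(spearman(scheme_1, scheme_2), 3))  # 0.943
```

A rho close to 1 indicates near-identical rankings; identical rankings give exactly 1 and fully reversed rankings give -1.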

In future work, a decision support system, including data space, method space, model space, and knowledge space, will be further developed, which can deal with many more methods, models, and algorithms, such as general clustering theory, subspace clustering, fuzzy clustering, and density peak clustering, in order to form a robust and effective algorithm selection and evaluation framework and improve the universality of the application.

Data Availability

The data used to support the findings of this study are included within the article, and the 20 data sets originate from the UCI repository (http://archive.ics.uci.edu/ml).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by grants from the Fund for Less Developed Regions of the National Natural Science Foundation of China (71761014), the State Key Program of the National Natural Science Foundation of China (71532007, 71932008, and 91546201), the General Program of the National Natural Science Foundation of China (71471149), the Major Project of the National Social Science Foundation of China (15ZDB153), and the Postdoctoral Science Foundation Project of China (2016M592683).

References

[1] Z. Xu, J. Chen, and J. Wu, "Clustering algorithm for intuitionistic fuzzy sets," Information Sciences, vol. 178, no. 19, pp. 3775–3790, 2008.

[2] W. Hang, K. S. Choi, and S. Wang, Synchronization Clustering Based on Central Force Optimization and its Extension for Large-Scale Data Sets, Elsevier Science Publishers B.V., Amsterdam, Netherlands, 2017.

[3] M. Abavisani and V. M. Patel, "Multi-modal sparse and low-rank subspace clustering," Information Fusion, vol. 39, pp. 168–177, 2018.

[4] X. Zhang and Z. Xu, "Hesitant fuzzy agglomerative hierarchical clustering algorithms," International Journal of Systems Science, vol. 46, no. 3, pp. 562–576, 2015.

[5] Y. Wang, Z. Sun, and K. Jia, An Automatic Decoding Method for Morse Signal Based on Clustering Algorithm, Springer International Publishing, Berlin, Germany, 2017.

[6] C Zhang L Hao and L Fan ldquoOptimization and im-provement of data mining algorithm based on efficient in-cremental kernel fuzzy clustering for large datardquo ClusterComputing vol 22 no S2 pp 3001ndash3010 2018

[7] X Yang Z Xu and H Liao ldquoCorrelation coefficients ofhesitant multiplicative sets and their applications in decisionmaking and clustering analysisrdquo Applied Soft Computingvol 61 pp 935ndash946 2017

[8] J C Ascough II H R Maier J K Ravalico andM W Strudley ldquoFuture research challenges for incorpo-ration of uncertainty in environmental and ecological de-cision-makingrdquo Ecological Modelling vol 219 no 3-4pp 383ndash399 2008

[9] Z Xu and N Zhao ldquoInformation fusion for intuitionisticfuzzy decision making an overviewrdquo Information Fusionvol 28 pp 10ndash23 2016

[10] Z Xu and H Wang ldquoOn the syntax and semantics of virtuallinguistic terms for information fusion in decision makingrdquoInformation Fusion vol 34 pp 43ndash48 2017

[11] M C Naldi A C P L F Carvalho and R J G B CampelloldquoCluster ensemble selection based on relative validity in-dexesrdquo Data Mining and Knowledge Discovery vol 27 no 2pp 259ndash289 2013

[12] X L Xie and G Beni ldquoA validity measure for fuzzy clus-teringrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 13 no 8 pp 841ndash847 1991

[13] U Maulik and S Bandyopadhyay ldquoPerformance evaluationof some clustering algorithms and validity indicesrdquo IEEETransactions on Pattern Analysis and Machine Intelligencevol 24 no 12 pp 1650ndash1654 2002

[14] C H Chou M C Su and E Lai ldquoA new cluster validitymeasure and its application to image compressionrdquo PatternAnalysis and Applications vol 7 pp 205ndash220 2004

[15] S Sriparna andM Ujjwal ldquoUse of symmetry and stability fordata clusteringrdquo Evolutionary Intelligence vol 3 no 3-4pp 103ndash122 2010

[16] J C Dunn ldquoA fuzzy relative of the ISODATA process and itsuse in detecting compact well-separated clustersrdquo Journal ofCybernetics vol 3 no 3 pp 32ndash57 1973

[17] S Mahallati J C Bezdek D Kumar M R Popovic andT A Valiante ldquoInterpreting cluster structure in waveformdata with visual assessment and Dunnrsquos indexrdquo Frontiers inComputational Intelligence Springer Cham Switzerlandpp 73ndash101 2017

[18] D L Davies and D W Bouldin ldquoA cluster separationmeasurerdquo IEEE Transactions on Pattern Analysis and Ma-chine Intelligence vol PAMI-1 no 2 pp 224ndash227 1979

[19] V Bolandi A Kadkhodaie and R Farzi ldquoAnalyzing organicrichness of source rocks fromwell log data by using SVM andANN classifiers a case study from the Kazhdumi formationthe Persian Gulf basin offshore Iranrdquo Journal of PetroleumScience and Engineering vol 151 pp 224ndash234 2017

[20] M Brun C Sima J Hua et al ldquoModel-based evaluation ofclustering validation measuresrdquo Pattern Recognition vol 40no 3 pp 807ndash824 2007

[21] A K JainMNMurty and P J Flynn ldquoData clusteringrdquoACMComputing Surveys (CSUR) vol 31 no 3 pp 264ndash323 1999

[22] Y Abdullahi B Coetzee and L van den Berg ldquoRelationshipsbetween results of an internal and external match load de-termining method in male singles badminton playersrdquoJournal of Strength and Conditioning Research vol 33 no 4pp 1111ndash1118 2019

[23] D HWolpert andW GMacready ldquoNo free lunch theoremsfor optimizationrdquo IEEE Transactions on EvolutionaryComputation vol 1 no 1 pp 67ndash82 1997

[24] G Kou and W Wu ldquoAn analytic hierarchy model forclassification algorithms selection in credit risk analysisrdquoMathematical Problems in Engineering vol 2014 no 1Article ID 297563 2014

[25] D G Guillen and A R Espinosa ldquoA meta-analysis onclassification model performance in real-world datasets anexploratory viewrdquo Applied Artificial Intelligence vol 31no 9-10 pp 715ndash732 2018

[26] A Rodriguez and A Laio ldquoClustering by fast search and findof density peaksrdquo Science vol 344 no 6191 pp 1492ndash14962014

[27] H Parvin and B Minaei-Bidgoli ldquoA clustering ensembleframework based on selection of fuzzy weighted clusters in alocally adaptive clustering algorithmrdquo Pattern Analysis andApplications vol 18 no 1 pp 87ndash112 2015

[28] J. Jiang, D. Hao, Y. Chen, M. Parmar, and K. Li, "GDPC: gravitation-based density peaks clustering algorithm," Physica A: Statistical Mechanics and Its Applications, vol. 502, pp. 345–355, 2018.

[29] J. Jiang, W. Zhou, L. Wang, X. Tao, and K. Li, "HaloDPC: an improved recognition method on halo node for density peak clustering algorithm," International Journal of Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, 2019.

[30] X. Tao, Research and Improvement on Density Peak Clustering Algorithm and Application for Earthquake Classification, dissertation, Jilin University of Finance and Economics, Changchun, China, 2017, in Chinese.

[31] H. Alizadeh, B. Minaei-Bidgoli, and H. Parvin, "Optimizing fuzzy cluster ensemble in string representation," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 2, 2013.

[32] H. Parvin and B. Minaei-Bidgoli, "A clustering ensemble framework based on elite selection of weighted clusters," Advances in Data Analysis and Classification, vol. 7, no. 2, pp. 181–208, 2013.

[33] S.-O. Abbasi, S. Nejatian, H. Parvin, V. Rezaie, and K. Bagherifard, "Clustering ensemble selection considering quality and diversity," Artificial Intelligence Review, vol. 52, no. 2, pp. 1311–1340, 2019.

[34] M. Mojarad, H. Parvin, S. Nejatian, and V. Rezaie, "Consensus function based on clusters clustering and iterative fusion of base clusters," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 27, no. 1, pp. 97–120, 2019.

[35] M. Mojarad, S. Nejatian, H. Parvin, and M. Mohammadpoor, "A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters," Applied Intelligence, vol. 49, no. 7, pp. 2567–2581, 2019.

[36] F. Rashidi, S. Nejatian, H. Parvin, and V. Rezaie, "Diversity based cluster weighting in cluster ensemble: an information theory approach," Artificial Intelligence Review, vol. 52, no. 2, pp. 1341–1368, 2019.

[37] A. Bagherinia, B. Minaei-Bidgoli, M. Hossinzadeh, and H. Parvin, "Elite fuzzy clustering ensemble based on clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319–320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research0036.1, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istraživanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59–60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the Union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey system theory," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, "Grey-based PROMETHEE II with application to evaluation of source water protection strategies," Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] P. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Société Vaudoise des Sciences Naturelles, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.



[107] S Jaccard ldquoNouvelles recherches surla distribution floralerdquoBulletin de la Societe vaudoise des sciences vol 44pp 223ndash270 1908

[108] E B Fowlkes and C L Mallows ldquoA method for comparingtwo hierarchical clusteringsrdquo Journal of the American Sta-tistical Association vol 78 no 383 pp 553ndash569 1983

[109] L Hubert and P Arabie ldquoComparing partitionsrdquo Journal ofClassification vol 2 no 1 pp 193ndash218 1985

[110] D Badescu A Boc A Banire Diallo and V MakarenkovldquoDetecting genomic regions associated with a disease usingvariability functions and Adjusted Rand Indexrdquo BMC Bio-informatics vol 12 no S-9 pp 1ndash10 2011

[111] T L Saaty 9e Analytic Hierarchy Process McGraw-HillNew York NY USA 1980

[112] WWu G Kou Y Peng and D Ergu ldquoImproved ahp-groupdecision making for investment strategy selectionrdquo Tech-nological and Economic Development of Economy vol 18no 2 pp 299ndash316 2012

[113] S Tyagi S Agrawal K Yang and H Ying ldquoAn extended Fuzzy-AHP approach to rank the influences of socialization-

16 Complexity

externalization-combination-internalization modes on the de-velopment phaserdquo Applied Soft Computing vol 52 pp 505ndash5182017

[114] I Takahashi ldquoAHP applied to binary and ternary compar-isonsrdquo Journal of the Operations Research Society of Japanvol 33 no 3 pp 199ndash206 2017

[115] C-S Yu ldquoA GP-AHP method for solving group decision-making fuzzy AHP problemsrdquo Computers amp OperationsResearch vol 29 no 14 pp 1969ndash2001 2002

[116] M Kamal and A H Al-Subhi ldquoApplication of the AHP inproject managementrdquo International Journal of ProjectManagement vol 19 no 1 pp 19ndash27 2001

[117] C A Bana e Costa and J C Vansnick ldquoA critical analysis ofthe eigenvalue method used to derive priorities in AHPrdquoEuropean Journal of Operational Research vol 187 no 3pp 1422ndash1428 2008

[118] T Ertay D Ruan and U Tuzkaya ldquoIntegrating data en-velopment analysis and analytic hierarchy for the facilitylayout design in manufacturing systemsrdquo Information Sci-ences vol 176 no 3 pp 237ndash262 2006

[119] M Dagdeviren S Yavuz and N Kılınccedil ldquoWeapon selectionusing the AHP and TOPSIS methods under fuzzy envi-ronmentrdquo Expert Systems with Applications vol 36 no 4pp 8143ndash8151 2009

[120] M P Amiri ldquoProject selection for oil-fields development byusing the AHP and fuzzy TOPSIS methodsrdquo Expert Systemswith Applications vol 37 no 9 pp 6218ndash6224 2010

[121] X Yu S Guo J Guo and X Huang ldquoRank B2C e-commercewebsites in e-alliance based on AHP and fuzzy TOPSISrdquoExpert Systems with Applications vol 38 no 4 pp 3550ndash3557 2011

[122] Y Peng G Kou G Wang W Wu and Y Shi ldquoEnsemble ofsoftware defect predictors an AHP-based evaluationmethodrdquo International Journal of Information Technology ampDecision Making vol 10 no 1 pp 187ndash206 2011

[123] P Domingos ldquoToward knowledge-rich data miningrdquo DataMining and Knowledge Discovery vol 15 no 1 pp 21ndash282007

[124] G Kou and W Wu ldquoMulti-criteria decision analysis foremergency medical service assessmentrdquo Annals of Opera-tions Research vol 223 no 1 pp 239ndash254 2014

[125] A Frank and A Asuncion UCI Machine Learning Reposi-tory University of California School of Information andComputer Science Irvine CA USA 2010 httparchiveicsucieduml

Complexity 17

Page 15: Decision …downloads.hindawi.com/journals/complexity/2020/9602526.pdfcriteria,cangrouppatterns,wheregroupsaresetsofsimilar patterns[54,59,60].Clusteringalgorithms arewidelyap-plied

clustering diversity and quality measures," Applied Intelligence, vol. 49, no. 5, pp. 1724–1747, 2019.

[38] S. Saha and S. Bandyopadhyay, "Some connectivity based cluster validity indices," Applied Soft Computing, vol. 12, no. 5, pp. 1555–1565, 2012.

[39] K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, "Validating clustering for gene expression data," Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.

[40] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107–145, 2001.

[41] V. Roth, M. Braun, T. Lange, and J. M. Buhmann, Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data, Springer, Berlin, Germany, 2002.

[42] K. R. Zalik, "Cluster validity index for estimation of fuzzy clusters of different sizes and densities," Pattern Recognition, vol. 43, no. 10, pp. 3374–3390, 2010.

[43] C. H. Chou, Y. X. Zhao, and H. P. Tai, "Vanishing-point detection based on a fuzzy clustering algorithm and new clustering validity measure," Journal of Applied Science and Engineering, vol. 18, no. 2, pp. 105–116, 2015.

[44] M. A. Wani and R. Riyaz, "A new cluster validity index using maximum cluster spread based compactness measure," International Journal of Intelligent Computing and Cybernetics, vol. 9, no. 2, pp. 179–204, 2016.

[45] M. Azhagiri and A. Rajesh, "A novel approach to measure the quality of cluster and finding intrusions using intrusion unearthing and probability clomp algorithm," International Journal of Information Technology, vol. 10, no. 3, pp. 329–337, 2018.

[46] F. Azuaje, "A cluster validity framework for genome expression data," Bioinformatics, vol. 18, no. 2, pp. 319-320, 2002.

[47] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2002.

[48] E. R. Dougherty, J. Barrera, M. Brun et al., "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, vol. 9, no. 1, pp. 105–126, 2002.

[49] S. Dudoit and J. Fridlyand, "A prediction-based resampling method for estimating the number of clusters in a dataset," Genome Biology, vol. 3, Article ID research00361, 2002.

[50] C. A. Sugar and G. M. James, "Finding the number of clusters in a dataset," Journal of the American Statistical Association, vol. 98, no. 463, pp. 750–763, 2003.

[51] Y. Peng, Y. Zhang, G. Kou, and Y. Shi, "A multi-criteria decision making approach for estimating the number of clusters in a data set," PLoS One, vol. 7, no. 7, Article ID e41713, 2012.

[52] Y. Peng, Y. Zhang, G. Kou, J. Li, and Y. Shi, "Multi-criteria decision making approach for cluster validation," in Proceedings of the International Conference on Computational Science, pp. 1283–1291, Omaha, NE, USA, 2012.

[53] P. Meyer and A.-L. Olteanu, "Formalizing and solving the problem of clustering in MCDA," European Journal of Operational Research, vol. 227, no. 3, pp. 494–502, 2013.

[54] L. Chen, Z. Xu, H. Wang, and S. Liu, "An ordered clustering algorithm based on K-means and the PROMETHEE method," International Journal of Machine Learning and Cybernetics, vol. 9, no. 6, pp. 917–926, 2018.

[55] H. A. Mahdiraji, E. Kazimieras Zavadskas, A. Kazeminia, and A. Abbasi Kamardi, "Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach," Economic Research-Ekonomska Istrazivanja, vol. 32, no. 1, pp. 2882–2898, 2019.

[56] V. Pareto, Cours d'Economie Politique, Droz, Geneva, Switzerland, 1896.

[57] B. Franz, Pareto, John Wiley & Sons, New York, NY, USA, 1936.

[58] R. Cirillo, "Was Vilfredo Pareto really a 'precursor' of fascism?" The American Journal of Economics and Sociology, vol. 42, no. 2, pp. 235–246, 2006.

[59] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.

[60] J. Wu, J. Chen, H. Xiong, and M. Xie, "External validation measures for K-means clustering: a data distribution perspective," Expert Systems with Applications, vol. 36, no. 3, pp. 6050–6061, 2009.

[61] Z. Wang, Z. Xu, S. Liu, and J. Tang, "A netting clustering analysis method under intuitionistic fuzzy environment," Applied Soft Computing, vol. 11, no. 8, pp. 5558–5564, 2011.

[62] S. Askari, "A novel and fast MIMO fuzzy inference system based on a class of fuzzy clustering algorithms with interpretability and complexity analysis," Expert Systems with Applications, vol. 84, pp. 301–322, 2017.

[63] Q. Li, M. Guindani, B. J. Reich, H. D. Bondell, and M. Vannucci, "A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 10, no. 6, pp. 393–409, 2017.

[64] A. K. Paul and P. C. Shill, "New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II," Information Sciences, vol. 448-449, pp. 112–133, 2018.

[65] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, USA, 2nd edition, 2006.

[66] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2nd edition, 2005.

[67] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[68] G. Fayyad and T. Krishnan, The EM Algorithm and Extensions, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2008.

[69] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[70] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977.

[71] S. M. Kumar, "An optimized farthest first clustering algorithm," in Proceedings of the 2013 Nirma University International Conference on Engineering, pp. 1–5, Ahmedabad, India, November 2013.

[72] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical clustering," Journal of Computer and System Sciences, vol. 70, no. 4, pp. 351–363, 2005.

[73] Y. Peng and Y. Shi, "Editorial: multiple criteria decision making and operations research," Annals of Operations Research, vol. 197, no. 1, pp. 1–4, 2012.

[74] S. Hamdan and A. Cheaitou, "Supplier selection and order allocation with green criteria: an MCDM and multi-objective optimization approach," Computers & Operations Research, vol. 81, pp. 282–304, 2017.

[75] J. L. Yang, H. N. Chiu, G.-H. Tzeng, and R. H. Yeh, "Vendor selection by integrated fuzzy MCDM techniques with independent and interdependent relationships," Information Sciences, vol. 178, no. 21, pp. 4166–4183, 2008.

[76] B. Wang and Y. Shi, "Error correction method in classification by using multiple-criteria and multiple-constraint levels linear programming," International Journal of Computers Communications & Control, vol. 7, no. 5, pp. 976–989, 2012.

[77] J. He, Y. Zhang, Y. Shi, and G. Huang, "Domain-driven classification based on multiple criteria and multiple constraint-level programming for intelligent credit scoring," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, pp. 826–838, 2010.

[78] Y. Shi, L. Zhang, Y. Tian, and X. Li, Intelligent Knowledge: A Study beyond Data Mining, Springer, Berlin, Germany, 2015.

[79] L. Zadeh, "Optimality and non-scalar-valued performance criteria," IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59-60, 1963.

[80] P. C. Fishburn, Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments, Operations Research Society of America (ORSA), Baltimore, MD, USA, 1967.

[81] E. Triantaphyllou, Multi-Criteria Decision Making: A Comparative Study, Kluwer Academic Publishers, Dordrecht, Netherlands, 2010.

[82] E. Triantaphyllou and K. Baig, "The impact of aggregating benefit and cost criteria in four MCDA methods," IEEE Transactions on Engineering Management, vol. 52, no. 2, pp. 213–226, 2005.

[83] J. Deng, "Control problems of grey systems," Systems and Control Letters, vol. 1, pp. 288–294, 1982.

[84] J. Deng, Grey System Book, Windsor Science and Technology Information Services, Albany, NY, USA, 1988.

[85] W. Wu, G. Kou, and Y. Peng, "Group decision-making using improved multi-criteria decision making methods for credit risk analysis," Filomat, vol. 30, no. 15, pp. 4135–4150, 2016.

[86] W. Wu and Y. Peng, "Extension of grey relational analysis for facilitating group consensus to oil spill emergency management," Annals of Operations Research, vol. 238, no. 1-2, pp. 615–635, 2016.

[87] D. Liang, A. Kobina, and W. Quan, "Grey relational analysis method for probabilistic linguistic multi-criteria group decision-making based on geometric Bonferroni mean," International Journal of Fuzzy Systems, vol. 20, no. 7, pp. 2234–2244, 2017.

[88] E. Onder and C. Boz, "Comparing macroeconomic performance of the union for the Mediterranean countries using grey relational analysis and multi-dimensional scaling," European Scientific Journal, vol. 13, pp. 285–299, 2017.

[89] J. Deng, "Introduction to grey theory system," The Journal of Grey System, vol. 1, no. 1, pp. 1–24, 1989.

[90] C. L. Hwang and K. Yoon, Multiple Attribute Decision Making, Springer-Verlag, Berlin, Germany, 1981.

[91] G. R. Jahanshahloo, F. H. Lotfi, and M. Izadikhah, "Extension of the TOPSIS method for decision-making problems with fuzzy data," Applied Mathematics and Computation, vol. 181, no. 2, pp. 1544–1551, 2006.

[92] S. J. Chen and C. L. Hwang, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin, Germany, 1992.

[93] S. Opricovic and G.-H. Tzeng, "Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, vol. 156, no. 2, pp. 445–455, 2004.

[94] J. P. Brans and B. Mareschal, "PROMETHEE methods," in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau, and B. Roy, Eds., pp. 163–195, Springer, New York, NY, USA, 2005.

[95] C. Hermans and J. Erickson, "Multicriteria decision analysis: overview and implications for environmental decision making," Advances in the Economics of Environmental Resources, vol. 7, pp. 213–228, 2007.

[96] H. Kuang, D. M. Kilgour, and K. W. Hipel, Grey-Based PROMETHEE II with Application to Evaluation of Source Water Protection Strategies, Information Sciences, 2014.

[97] J. P. Brans and B. Mareschal, "How to decide with PROMETHEE," 1994, http://www.visualdecision.com/Pdf/How%20to%20use%20promethee.pdf.

[98] J. P. Brans and P. Vincke, "Note-A preference ranking organisation method," Management Science, vol. 31, no. 6, pp. 647–656, 1985.

[99] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, August 2000.

[100] Y. Zhao, G. Karypis, and U. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, no. 2, pp. 141–168, 2005.

[101] E. Rendon, I. Abundez, A. Arizmendi, and E. M. Quiroz, "Internal versus external cluster validation indexes," International Journal of Computers and Communications, vol. 5, no. 1, pp. 27–34, 2011.

[102] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311–331, 2004.

[103] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Boston, MA, USA, 1999.

[104] N. Slonim, N. Friedman, and N. Tishby, "Unsupervised document classification using sequential information maximization," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), Tampere, Finland, August 2002.

[105] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Press, Dordrecht, Netherlands, 1996.

[106] W. M. Rand, "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.

[107] S. Jaccard, "Nouvelles recherches sur la distribution florale," Bulletin de la Societe vaudoise des sciences, vol. 44, pp. 223–270, 1908.

[108] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.

[109] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

[110] D. Badescu, A. Boc, A. Banire Diallo, and V. Makarenkov, "Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index," BMC Bioinformatics, vol. 12, no. S-9, pp. 1–10, 2011.

[111] T. L. Saaty, The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

[112] W. Wu, G. Kou, Y. Peng, and D. Ergu, "Improved AHP-group decision making for investment strategy selection," Technological and Economic Development of Economy, vol. 18, no. 2, pp. 299–316, 2012.

[113] S. Tyagi, S. Agrawal, K. Yang, and H. Ying, "An extended Fuzzy-AHP approach to rank the influences of socialization-externalization-combination-internalization modes on the development phase," Applied Soft Computing, vol. 52, pp. 505–518, 2017.

[114] I. Takahashi, "AHP applied to binary and ternary comparisons," Journal of the Operations Research Society of Japan, vol. 33, no. 3, pp. 199–206, 2017.

[115] C.-S. Yu, "A GP-AHP method for solving group decision-making fuzzy AHP problems," Computers & Operations Research, vol. 29, no. 14, pp. 1969–2001, 2002.

[116] M. Kamal and A. H. Al-Subhi, "Application of the AHP in project management," International Journal of Project Management, vol. 19, no. 1, pp. 19–27, 2001.

[117] C. A. Bana e Costa and J. C. Vansnick, "A critical analysis of the eigenvalue method used to derive priorities in AHP," European Journal of Operational Research, vol. 187, no. 3, pp. 1422–1428, 2008.

[118] T. Ertay, D. Ruan, and U. Tuzkaya, "Integrating data envelopment analysis and analytic hierarchy for the facility layout design in manufacturing systems," Information Sciences, vol. 176, no. 3, pp. 237–262, 2006.

[119] M. Dagdeviren, S. Yavuz, and N. Kılınç, "Weapon selection using the AHP and TOPSIS methods under fuzzy environment," Expert Systems with Applications, vol. 36, no. 4, pp. 8143–8151, 2009.

[120] M. P. Amiri, "Project selection for oil-fields development by using the AHP and fuzzy TOPSIS methods," Expert Systems with Applications, vol. 37, no. 9, pp. 6218–6224, 2010.

[121] X. Yu, S. Guo, J. Guo, and X. Huang, "Rank B2C e-commerce websites in e-alliance based on AHP and fuzzy TOPSIS," Expert Systems with Applications, vol. 38, no. 4, pp. 3550–3557, 2011.

[122] Y. Peng, G. Kou, G. Wang, W. Wu, and Y. Shi, "Ensemble of software defect predictors: an AHP-based evaluation method," International Journal of Information Technology & Decision Making, vol. 10, no. 1, pp. 187–206, 2011.

[123] P. Domingos, "Toward knowledge-rich data mining," Data Mining and Knowledge Discovery, vol. 15, no. 1, pp. 21–28, 2007.

[124] G. Kou and W. Wu, "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, vol. 223, no. 1, pp. 239–254, 2014.

[125] A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2010, http://archive.ics.uci.edu/ml.
