
Soft Comput, DOI 10.1007/s00500-014-1255-3

METHODOLOGIES AND APPLICATION

A fast particle swarm optimization for clustering

Chun-Wei Tsai · Ko-Wei Huang · Chu-Sing Yang · Ming-Chao Chiang

© Springer-Verlag Berlin Heidelberg 2014

Abstract This paper presents a high-performance method to reduce the time complexity of particle swarm optimization (PSO) and its variants in solving the partitional clustering problem. The proposed method works by adding two additional operators to PSO-based algorithms. The pattern reduction operator aims to reduce the computation time by compressing, at each iteration, patterns that are unlikely to change the clusters to which they belong thereafter, while the multistart operator aims to improve the quality of the clustering result by enforcing the diversity of the population to prevent the proposed method from getting stuck in local optima. To evaluate the performance of the proposed method, we compare it with several state-of-the-art PSO-based methods in solving data clustering, image clustering, and codebook generation problems. Our simulation results indicate that not only can the proposed method significantly reduce the computation time of PSO-based algorithms, but it can also provide a clustering result that matches or outperforms the result PSO-based algorithms can provide by themselves.

Keywords Clustering · Particle swarm optimization · Pattern reduction

Communicated by W. Pedrycz.

C.-W. Tsai
Department of Applied Informatics and Multimedia, Chia Nan University of Pharmacy and Science, Tainan 71710, Taiwan, R.O.C.

C.-W. Tsai · K.-W. Huang · C.-S. Yang
Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan, R.O.C.

M.-C. Chiang (B)
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, R.O.C.
e-mail: [email protected]

1 Introduction

As a traditional optimization problem, the partitional clustering problem (Jain et al. 1999; Xu and Wunsch-II 2005) has attracted particular attention from researchers in recent years because of the need to analyze and understand information hidden in the datasets coming from different sources. The clustering problem, which refers to the process of splitting dissimilar data into disjoint clusters and grouping similar data into the same cluster based on some predefined similarity metric, has been used in a variety of areas such as web mining, image processing, bioinformatics, and pattern recognition (Leuski 2001; Xu et al. 2003; Getz et al. 2003; Xu and Wunsch-II 2008; Cai et al. 2007; Lughofer 2008; Zhang et al. 2010; Kuo et al. 2011). An optimal clustering is a partitioning that minimizes the intra-cluster distance and maximizes the inter-cluster distance at the same time.

As is well known, the clustering problem is NP-hard Kogan (2007); thus, clustering algorithms usually take a tremendous amount of time to find just an approximate solution. Worse, many datasets created in our daily life require terabytes or even petabytes of space. Examples are web contents, magnetic resonance imaging data, and multimedia contents (Jain et al. 1999; Kogan 2007). Many researchers Xu and Wunsch-II (2008) have focused their attention on either finding a better solution or accelerating the speed of clustering algorithms, by using techniques such as data sampling (Ng and Han 2002; Bradley and Fayyad 1998a; Cheng et al. 1998), data condensation Zhang et al. (1996), density-based approaches Ester et al. (1996), divide and conquer Guha et al. (2003), and incremental learning Hammouda and Kamel (2004); Ordonez and Omiecinski (2004); Bagirov et al. (2011). An incremental approach Ordonez and Omiecinski (2004) was used to overcome the memory space problem caused by datasets that are too large to be loaded into memory at once. A sampling-based method Cheng et al. (1998) was presented to generate better initial solutions (centroids) and thereby reduce the computation time of a clustering algorithm. The triangle inequality method Elkan (2003) was developed to reduce the computation time of assigning patterns to the clusters to which they belong. Moreover, an efficient method Lu et al. (2004) was presented to speed up the distance calculations.

In addition to the aforementioned approaches, several studies are aimed at enhancing the speed of clustering algorithms by reducing the number of comparisons between patterns and centroids for large-scale and high-dimensional datasets. These studies can be divided into three categories:

– Dimension reduction: As observed by Xu and Wunsch-II (2005), dimension reduction, or feature selection, is an important research topic for clustering because most of the efficient clustering algorithms designed for large-scale datasets are not as efficient for high-dimensional datasets. Principal component analysis Ding and He (2004) is a straightforward method for dimension reduction, by selecting and clustering a small number of features (also called attributes). A quantization method was used in Eschrich et al. (2003) to reduce the number of features of each pattern so as to accelerate the fuzzy c-means.

– Centroid reduction: To reduce the computation time of clustering algorithms, numerous studies (Kaukoranta et al. 2000; Lai et al. 2008, 2009; Buzo et al. 1980; Kekre and Sarode 2009) tried to reduce the number of calculations in finding the closest centroid of each pattern. In Kaukoranta et al. (2000); Lai et al. (2008, 2009), the centroids are classified into two groups, static and active, to eliminate all the impossible centroids, thus reducing the number of unnecessary calculations taken to find the cluster to which a pattern belongs. Another method is to construct a centroid structure to help search for the applicable cluster of each pattern. The binary tree structure used for reducing the search time of the closest codeword (centroid) is a good example Buzo et al. (1980); Kekre and Sarode (2009).

– Pattern reduction: Different from dimension reduction, pattern reduction (PR) provides an effective and efficient strategy for reducing the computation time on both high-dimensional and large-scale datasets. Compared with centroid reduction, pattern reduction is not limited to clustering algorithms aimed at finding clusters to which patterns belong. The focus in Bradley et al. (1998b); Tsai et al. (2007); Chiang et al. (2011) is on finding and compressing patterns (which we will refer to as the "static" patterns throughout the rest of the paper) that are unlikely to change the clusters to which they belong, so as to reduce the computation time of clustering algorithms. By finding and compressing such patterns, the pattern reduction method can significantly reduce the computation time of most operators of a clustering algorithm. However, most pattern reduction methods suffer from a loss of quality because the compression operator they use may prevent patterns from being assigned to the right group.

Particle swarm optimization (PSO) Omran et al. (2002); van der Merwe and Engelbrecht (2003) was proposed to solve the clustering problem, using multiple search directions with social behavior to enhance the quality of the clustering result. An important advantage of PSO is that it can avoid the problem of converging to the nearest local optimum Paterlini and Krink (2006). In addition to data and image clustering, PSO-based clustering algorithms have been successfully applied to many other problems (Parsopoulos and Vrahatis 2010; Banks et al. 2008), such as gene clustering Xiao et al. (2003), wireless sensor clustering Kulkarni and Venayagamoorthy (2011); Tillett et al. (2003), and vector quantization Chen et al. (2005); Feng et al. (2007). However, the computation time issue will certainly affect the performance of a system that uses PSO to cluster large datasets online. To obviate this problem, this study focuses on reducing the computation time of PSO-based clustering algorithms while at the same time attempting to retain or even improve the quality of the clustering results.

The remainder of the paper is organized as follows. Section 2 first defines the clustering problem and then reviews the PSO-based algorithms for clustering. Section 3 describes the proposed algorithm in detail. The performance evaluation of the proposed algorithm is presented in Sect. 4. The conclusion is drawn in Sect. 5.

2 Related work

2.1 The clustering problem

Mathematically, an optimal clustering Theodoridis and Koutroumbas (2009); Xiang et al. (2008) can be defined as a partition of the input vectors as follows: Given a set of patterns X = {x_1, x_2, ..., x_n} in d-dimensional space, the outputs of an optimal clustering are a partitioning of X into k clusters Π = {π_1, π_2, ..., π_k}¹ and a set of means or centroids C = {c_1, c_2, ..., c_k} in the same space such that

    c_i = \frac{1}{|\pi_i|} \sum_{\forall x \in \pi_i} x,    (1)

and

    \pi_i = \{ x \in X \mid d(x, c_i) \le d(x, c_j), \forall i \ne j \},    (2)

where d(·) is a predefined function for measuring the similarity between patterns and means, which depends, to a large extent, on the application. In practice, the most widely used metric is the sum of squared errors (SSE) defined by

    \mathrm{SSE} = \sum_{i=1}^{k} \sum_{\forall x \in \pi_i} \| x - c_i \|^2.    (3)

For image clustering, another widely used metric is the peak signal-to-noise ratio (PSNR) defined by

    \mathrm{PSNR} = 10 \cdot \log_{10} \frac{255^2}{\mathrm{MSE}},    (4)

where the mean squared error (MSE) is defined by

    \mathrm{MSE} = \frac{1}{w \times h} \sum_{i=1}^{w} \sum_{j=1}^{h} \| v_{ij} - \hat{v}_{ij} \|^2,    (5)

with w and h denoting, respectively, the width and height of the input and reconstructed images; v_{ij} and \hat{v}_{ij} denoting, respectively, the pixel values at row i and column j of the input and reconstructed images.

A third metric that will be used in this study for measuring the quality of the clustering result is the accuracy rate (AR) defined by

    \mathrm{AR} = \frac{\sum_{i=1}^{n} A_i}{n} \times 100\,\%,    (6)

where A_i assumes one of the two values 0 and 1, with A_i = 1 denoting that the pattern x_i is assigned to the right cluster and A_i = 0 denoting that the pattern x_i is assigned to the wrong cluster.

¹ That is, X = ∪_{i=1}^{k} π_i and ∀ i ≠ j, π_i ∩ π_j = ∅.
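To make the metrics concrete, the sketch below computes SSE and PSNR as defined in Eqs. (3)-(5). It is a minimal illustration rather than the authors' code; the data layout (patterns as rows of a 2-D array, one cluster label per pattern) is an assumption.

```java
// Minimal sketch of the quality metrics in Eqs. (3)-(5). The data layout
// (patterns as double[n][d], cluster labels as int[n]) is an assumption.
public final class ClusterMetrics {

    // Eq. (3): sum of squared distances of each pattern to its centroid.
    static double sse(double[][] x, int[] label, double[][] c) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] ci = c[label[i]];
            for (int j = 0; j < x[i].length; j++) {
                double diff = x[i][j] - ci[j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    // Eqs. (4)-(5): PSNR between an 8-bit input image and its reconstruction.
    static double psnr(int[][] input, int[][] recon) {
        int w = input.length, h = input[0].length;
        double mse = 0.0;
        for (int i = 0; i < w; i++)
            for (int j = 0; j < h; j++) {
                double diff = input[i][j] - recon[i][j];
                mse += diff * diff;
            }
        mse /= (double) (w * h);
        return 10.0 * Math.log10(255.0 * 255.0 / mse);
    }
}
```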

2.2 Simple PSO for clustering

The simple PSO uses the position and velocity of particles to emulate social behavior. The position represents a trial solution of the optimization problem (e.g., the clustering problem) while the velocity represents the search direction of the particle. Initially, all the particles are randomly placed in the search space, and all the velocities are randomly generated. The velocity and position of each particle at iteration t + 1 are defined, respectively, as

    v_{ij}^{t+1} = \omega v_{ij}^{t} + a_1 \varphi_1 (pb_{ij}^{t} - p_{ij}^{t}) + a_2 \varphi_2 (gb_{j}^{t} - p_{ij}^{t}),    (7)

and

    p_{ij}^{t+1} = p_{ij}^{t} + v_{ij}^{t+1},    (8)

where v_{ij}^{t} and p_{ij}^{t} denote the velocity and position of the jth dimension of the ith particle at iteration t; pb_{ij}^{t} the personal best position of the jth dimension of the ith particle up to iteration t; gb_{j}^{t} the global best position of the jth dimension so far; ω an inertia weight; φ_1 and φ_2 two uniformly distributed random numbers used to determine the influence of pb_{ij} and gb; a_1 and a_2 two constants denoting, respectively, the cognitive and social learning rates.
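The update rules in Eqs. (7) and (8) translate almost directly into code. The following is a minimal sketch under the usual assumptions (fresh uniform draws φ_1, φ_2 per dimension; no velocity clamping shown); it is not the authors' implementation.

```java
import java.util.Random;

// Minimal sketch of the velocity and position updates in Eqs. (7) and (8).
// Parameter names follow the text (omega, a1, a2); everything else is assumed.
final class PsoUpdate {
    static void step(double[] p, double[] v, double[] pb, double[] gb,
                     double omega, double a1, double a2, Random rnd) {
        for (int j = 0; j < p.length; j++) {
            double phi1 = rnd.nextDouble();    // uniform in [0, 1)
            double phi2 = rnd.nextDouble();
            v[j] = omega * v[j]
                 + a1 * phi1 * (pb[j] - p[j])  // cognitive component
                 + a2 * phi2 * (gb[j] - p[j]); // social component
            p[j] += v[j];                      // Eq. (8)
        }
    }
}
```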

Pioneered by the studies Omran et al. (2002); van der Merwe and Engelbrecht (2003); Xiao et al. (2003), the PSO-based clustering algorithm, as outlined in Fig. 1, intuitively encodes the k centroids as the position of a particle, i.e., p_i = (c_{i1}, c_{i2}, ..., c_{ik}), where c_{ij} denotes the jth centroid encoded in the ith particle. The fitness of each particle is defined by

    f(p_i, M_i) = w_1 d_{\max}(p_i, M_i) + w_2 [z_{\max} - d_{\min}(p_i)],    (9)

where

    d_{\max}(p_i, M_i) = \max_{j=1,\ldots,k} \left[ \sum_{\forall x \in \pi_{ij}} \frac{d(x, c_{ij})}{|c_{ij}|} \right]    (10)

denotes the maximum mean square intra-cluster distance;

    d_{\min}(p_i) = \min_{\forall a,b,\ a \ne b} \{ d(c_{ia}, c_{ib}) \},    (11)

the minimum distance between all the centroids; M_i the matrix representing the assignment of patterns to the clusters encoded in the ith particle; z_max the maximum feature value in the dataset (e.g., the maximum pixel value in an image); w_1 and w_2 the two user-defined constants. In brief, in addition to the original PSO operators (the initialization operator, the measurement operator, the personal and global best update operator, and the velocity change operator on lines 1, 7, 8, and 9 of Fig. 1), Omran et al. added two operators to the PSO for calculating the distance between input patterns and centroids and for assigning input patterns to the clusters to which they belong, as shown in lines 4 and 5 of Fig. 1.

Fig. 1 Outline of PSO for the clustering problem Omran et al. (2002)

1. Create an initial population of particles, each of which contains k randomly generated centroids.
2. For each particle i
3. For each pattern x
4. Calculate the distance between x and all the centroids.
5. Assign x to the closest cluster.
6. End
7. Calculate the fitness value using Eq. (9).
8. Update the personal best pb_i of each particle and the global best gb.
9. Change the velocities and positions using Eqs. (7) and (8).
10. If the stop criterion is satisfied, then stop and output the best particle; otherwise, go to step 2.
11. End
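Steps 4 and 5 of Fig. 1 amount to a nearest-centroid search per pattern. A minimal sketch follows; squared Euclidean distance is assumed for d(·, ·), and the class and method names are hypothetical.

```java
// Sketch of steps 4-5 of Fig. 1: assign each pattern to its closest centroid.
// Squared Euclidean distance is assumed as the similarity function d(.,.).
final class Assignment {
    static int[] assign(double[][] x, double[][] c) {
        int[] label = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            double best = Double.MAX_VALUE;
            for (int j = 0; j < c.length; j++) {
                double dist = 0.0;
                for (int a = 0; a < x[i].length; a++) {
                    double diff = x[i][a] - c[j][a];
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; label[i] = j; }
            }
        }
        return label;
    }
}
```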


Different from the simple PSO (also called the global best PSO) in which the population is made up of the neighborhood of each particle Engelbrecht (2006), several recent studies Bratton and Kennedy (2007); Miranda et al. (2008) used different topologies to enhance the performance of PSO. For instance, the local best PSO Bratton and Kennedy (2007) uses a ring topology for each particle and its neighborhood. In addition to the basic idea of PSO, in what follows, we discuss in turn the representation, fitness function, transition, and parameter adjusting of PSO.

2.3 Representation

To apply PSO to the clustering problem, several representations (encodings) (van der Merwe and Engelbrecht 2003; Chen and Ye 2004; Omran et al. 2005a; Cohen and de Castro 2006; Ahmadi et al. 2007a,b; Jarboui et al. 2007; Karthi et al. 2009; Omran et al. 2005b, 2006; Chen et al. 2006; Das et al. 2008) were proposed, which can be classified into five categories, as follows:

1. All centroids: This representation, first used in Omran et al. (2002) and later in van der Merwe and Engelbrecht (2003); Chen and Ye (2004); Omran et al. (2005a), encodes in each particle the k centroids.

2. One centroid: In contrast to the "all centroids" representation, this representation Cohen and de Castro (2006); Ahmadi et al. (2007a,b) encodes in each particle one centroid. A well-known example is the PSO-based clustering algorithm (PSC) Cohen and de Castro (2006) in which each particle encodes a centroid.

3. Cluster ID: Another representation Jarboui et al. (2007); Karthi et al. (2009) encodes in each particle the cluster ID of each pattern.

4. Binary: To automatically determine the "optimal" number of clusters, the approach taken by Omran et al. (2005b, 2006) is to encode in each particle a binary string of length k in which the value of the ith bit denotes whether the ith centroid will be used in that particle, so that a possibly different number of centroids can be selected from a pool that would optimize the clustering results.

5. Hybrid: This representation Das et al. (2008); Chen et al. (2006) encodes in each particle the k centroids and associated thresholds for determining the number of centroids. This representation was used in Das et al. (2008) to explore the proper number of cluster centers, by using the PSO clustering algorithm alone.

2.4 Fitness function

An efficient way to improve the performance of PSO for clustering is to modify the fitness function of PSO. A straightforward method was presented in van der Merwe and Engelbrecht (2003), which modifies the method described in Omran et al. (2002) in that the fitness function is defined by

    f(p_i, M_i) = \frac{1}{k} \sum_{j=1}^{k} \left[ \sum_{\forall x \in \pi_j} \frac{d(x, c_j)}{|c_j|} \right] = J_{e,i},    (12)

and uses the k-means algorithm to create the initial solution of the PSO to improve the quality of the end result. An improved version was presented in Omran et al. (2005a) in which Omran et al. extended their previous work Omran et al. (2002); van der Merwe and Engelbrecht (2003) by modifying the fitness function of PSO for image clustering from Eq. (9) to Eq. (13):

    f(p_i, M_i) = w_1 d_{\max}(p_i, M_i) + w_2 [z_{\max} - d_{\min}(p_i)] + w_3 J_{e,i},    (13)

where J_{e,i} represents the value of the ith particle as defined in Eq. (12). The simulation results described in Omran et al. (2005a) showed that the PSO-based clustering algorithm can produce better results than k-means, k-harmonic means, fuzzy c-means, and the genetic algorithm in terms of the inter-distance, intra-distance, and quantization errors. In Chen and Ye (2004), an even simpler method was presented for calculating the fitness of PSO, which is as given below:

    f(p_i, M_i) = \frac{k}{\left[ \sum_{j=1}^{k} \sum_{i=1}^{n} d(x_i, c_j) \right] + J_o},    (14)

where J_o is a constant with a small value. Their simulation results showed that the PSO-based clustering algorithm provides a better result than k-means and fuzzy c-means.

Aimed at using the domain knowledge of clustering to improve the clustering results, several studies Li et al. (2012); Das et al. (2008); Omran et al. (2005b) used the so-called clustering indices, such as the CS measure, Davies–Bouldin (DB) index, Xie–Beni (XB) index, and partition entropy (PE), which take into account the inter-cluster distance (distances between centroids of different clusters) and the intra-cluster distance (distances between patterns in the same cluster) at the same time to measure the quality of the clustering results. In Omran et al. (2005b),

    V = [c \times N(2, 1) + 1] \times \frac{\mathrm{intra}}{\mathrm{inter}}    (15)

was used to measure the quality of the solutions of the image clustering problem. In Eq. (15), c denotes a predefined constant; N(2, 1) denotes a Gaussian distribution with mean 2 and standard deviation 1; "intra" is defined as

    \mathrm{intra} = \frac{1}{n} \sum_{j=1}^{k} \sum_{\forall x \in \pi_j} d(x, c_j);    (16)


and “inter” is defined as

    \mathrm{inter} = d(c_j, c_{j'}).    (17)

A later study Das et al. (2008) used a modified version of the CS measure to measure the quality of the solutions obtained by their kernel-based PSO clustering algorithm, which is defined as follows:

    f(p_i, M_i) = \frac{1}{\mathrm{CS}_i^{\mathrm{kernel}}(k) + \varepsilon},    (18)

    \mathrm{CS}_i^{\mathrm{kernel}}(k) = \frac{\sum_{j=1}^{k} \left[ \frac{1}{|c_j|} \sum_{x_q \in \pi_j} \max_{x_{q'} \in \pi_j} 2 \left( 1 - \mathcal{K}(x_q, x_{q'}) \right) \right]}{\sum_{j=1}^{k} \left[ \min_{j' \in k,\ j' \ne j} 2 \left( 1 - \mathcal{K}(c_j, c_{j'}) \right) \right]},    (19)

where ε denotes a small constant [the value of which is set equal to 0.0002 in Das et al. (2008)], and K denotes the kernelized distance. Generally speaking, similar concepts are used in other studies Omran et al. (2005b); Das et al. (2008) to improve the clustering results of PSO.

2.5 Transition

Different from the fitness function of PSO, which is used to determine the later search directions, the transition (also called recombination) operator (Cohen and de Castro 2006; Ahmadyfard and Modares 2008; Omran et al. 2005a,b; Jarboui et al. 2007; Marinakis et al. 2008; Abraham et al. 2007; Niknam et al. 2009) plays the role of perturbing, constructing, and adjusting the trajectory of particles. Because the transition operator of PSO has to match the representation used, three kinds of transitions are usually used by PSO for clustering: continuous transition, discrete transition, and other transitions. The details are as given below.

1. Continuous transition Cohen and de Castro (2006); Ahmadyfard and Modares (2008): This kind of transition operator inherits most characteristics of the simple PSO that uses a continuous representation to encode the particles; thus, the transition methods are similar to Eqs. (7) and (8). To improve the performance of PSO, a common practice is to modify the velocity and position update rules. In Cohen and de Castro (2006), the position of each particle p_i and the input pattern that is closest to p_i are included in the velocity update rule. Another common practice is as described in Ahmadyfard and Modares (2008) in which particles are bounded by their maximum and minimum values.

2. Discrete transition Omran et al. (2005b); Jarboui et al. (2007); Marinakis et al. (2008): For a discrete representation of particles, i.e., binary or cluster ID, the transition operator has to take care of transforming the velocity from continuous space to discrete space. One common method is to use the sign function or its variants to determine the value of the position of a particle. For instance, the position update rule of Omran et al. (2005b) is defined as

    p_{ij}^{t+1} = \begin{cases} 0 & \text{if } r_j^t \ge S(v_{ij}^{t+1}), \\ 1 & \text{if } r_j^t < S(v_{ij}^{t+1}), \end{cases}    (20)

where the sigmoid function is defined as S(x) = 1/(1 + e^{-x}); r_j^t is a random number in [0, 1]; p_{ij}^{t+1} and v_{ij}^{t+1} are the position and velocity of the jth dimension of the ith particle, respectively. (A code sketch of this rule follows the list.)

3. Other transitions Omran et al. (2005a,b); Abraham et al. (2007); Ahmadyfard and Modares (2008); Niknam et al. (2009): This kind of transition operator represents studies that add operators to, or modify the operators of, PSO. A good example is the addition of a one-iteration k-means (i.e., one assignment and update) to the PSO Omran et al. (2005a,b); Abraham et al. (2007). Another example is the addition of simulated annealing to check the global best so as to fine-tune the solution found by PSO at each iteration of its convergence process Niknam et al. (2009).
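As a small illustration of the discrete transition in Eq. (20), the sketch below squashes a velocity through the sigmoid and compares it with a uniform random draw; the class and method names are hypothetical.

```java
import java.util.Random;

// Sketch of the binary position update in Eq. (20): the velocity is mapped
// through S(x) = 1/(1 + e^{-x}) and compared against a random number r.
final class BinaryTransition {
    static int nextBit(double velocity, Random rnd) {
        double s = 1.0 / (1.0 + Math.exp(-velocity));
        return rnd.nextDouble() < s ? 1 : 0;   // r < S(v) yields 1, else 0
    }
}
```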

2.6 Parameter adjusting

In Shi and Eberhart (1999), a linearly varying inertia weight ω, defined below, was presented to improve the performance of PSO:

    \omega = (\omega_1 - \omega_2) \times \frac{t_{\max} - t}{t_{\max}} + \omega_2,    (21)

where ω_1 and ω_2 denote, respectively, the initial and final values of the inertia weight ω; t_max the maximum number of iterations; t the current iteration number. To enhance the performance, Ratnaweera et al. (2004) not only redefined the acceleration coefficients a_1 and a_2 by

    a_1 = (a_{1f} - a_{1i}) \times \frac{t}{t_{\max}} + a_{1i},    (22)

    a_2 = (a_{2f} - a_{2i}) \times \frac{t}{t_{\max}} + a_{2i},    (23)

where a_{1i}, a_{1f}, a_{2i}, and a_{2f} are all predefined constants,² but also employed a mutation-like operator to increase the diversity of search for the PSO-based algorithm they propose.

² In other words, the modified acceleration coefficients a_1 and a_2 begin with the initial value a_{·i}, increase or decrease linearly in proportion to the difference between a_{·f} and a_{·i} as the number of iterations grows, and end with the final value a_{·f}.
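A minimal sketch of the linear schedules in Eqs. (21)-(23): the inertia weight decreases from ω_1 to ω_2, and each acceleration coefficient moves from its initial toward its final value. The class and method names are hypothetical.

```java
// Sketch of the linear parameter schedules in Eqs. (21)-(23).
final class Schedules {
    // Eq. (21): inertia weight, from w1 (initial) down to w2 (final).
    static double inertia(double w1, double w2, int t, int tMax) {
        return (w1 - w2) * (tMax - t) / (double) tMax + w2;
    }
    // Eqs. (22)-(23): acceleration coefficient, from aInit toward aFinal.
    static double accel(double aInit, double aFinal, int t, int tMax) {
        return (aFinal - aInit) * t / (double) tMax + aInit;
    }
}
```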


3 Proposed algorithm

3.1 Concept

In this paper, we present an efficient method called MPREPSO (Multistart Pattern Reduction-Enhanced Particle Swarm Optimization), based on the notion of pattern reduction, to reduce the computation time of PSO-based clustering algorithms. The main features of the proposed algorithm can be summarized as follows:

1. As mentioned previously, most of the pattern compression methods for the clustering problem Bradley et al. (1998b); Tsai et al. (2007); Chiang et al. (2011) are designed for and used in single-solution-based clustering algorithms. The proposed algorithm, however, is designed for population-based clustering algorithms, and here we use PSO as an example to show how it works.

2. The reduction method described in Chiang et al. (2011) considers a pattern as static if it is within a predefined distance to its centroid, whereas the new method presented in this paper considers a pattern as static if it stays in the same cluster for a certain number of iterations. In addition to using the reduction method described in Chiang et al. (2011), we add an efficient method to check whether a pattern can be considered as static.

3. We also present a new multistart method to improve the quality of the end result of pattern reduction-enhanced clustering algorithms. In other words, the multistart method described herein is aimed to "restart" MPREPSO to either find a better solution or keep MPREPSO from falling into local optima.

Moreover, our previous work Tsai et al. (2010) shows that the original pattern reduction can eliminate most of the computation time of PSO for the codebook generation problem, but the quality of the end result is degraded. A fuzzy inference method was employed there to mitigate this problem, but a small loss of quality remained. Therefore, two different detection operators and a multistart operator are presented in this paper. The consequence is that the algorithm presented in this paper can not only eliminate most of the computation time of PSO, but it also has a better chance of improving the quality of the end result.

The proposed algorithm is outlined in Fig. 2. The refinement method described in lines 1 and 2 is essentially optional and is intended to improve the accuracy rate of the clustering result. Briefly, the refinement method works as follows: Given the population size m, the refinement method first randomly selects m subsets of patterns from the set of input patterns X. Each subset is composed of a certain percentage of the patterns in X and is associated with a particle of PSO. The refinement method then applies k-means to cluster each subset and uses the result thus obtained as the initial position of the corresponding particle of PSO. This completes the initialization step of MPREPSO.

Like PSO for clustering, MPREPSO consists of all the operators of PSO, namely, the operator to update the centroids, the operator to assign patterns to all the clusters, the operator to update the personal best and the global best, and the operator to change the velocities and positions. Unlike PSO, MPREPSO adds two additional operators to PSO: the pattern reduction operator and the multistart operator. The pattern reduction operator in lines 10 and 11 is added to reduce the computation time of PSO, by detecting and compressing the static patterns.

1. Randomly select m subsets of patterns, denoted s_i for i = 1, 2, ..., m, from X by sampling.
2. Create an initial population of m particles, each of which encodes the k centroids obtained by applying k-means to the corresponding s_i.
3. For each particle i
4. For each pattern x ∈ X
5. Calculate the distance of x to all the centroids, denoted c_ij for j = 1, 2, ..., k.
6. Assign x to the nearest cluster.
7. End
8. Calculate the fitness value.
9. For each cluster c_ij
10. Detect the set of static patterns R.
11. Compress the set of static patterns R into a single pattern r and remove R; i.e., X = (X ∪ {r}) \ R.
12. End
13. End
14. Update the personal best pb_i and the global best gb.
15. Change the velocities and positions.
16. Perform the multistart operator.
17. If the stop criterion is satisfied, then stop and output the best particle; otherwise, go to step 3.

Fig. 2 Outline of MPREPSO for the clustering problem


It is worth noting that by the static patterns, we mean patterns that are unlikely to change the clusters to which they belong at later iterations. The multistart operator in line 16 is added to improve the quality of the end result, by enforcing the diversity of the population of MPREPSO.

3.2 Pattern reduction and multistart operators

In this section, we turn our discussion to the pattern reduction and multistart operators. The pattern reduction operator can be divided into two sub-operators: the detection operator and the compression operator.

3.2.1 Detection operator

The detection operator is aimed at finding the so-called static patterns. Two methods are used to check whether patterns have a high probability of not changing the clusters to which they belong and thus can be considered as static: (1) patterns that are within a certain distance to their centroid, and (2) patterns that remain in the same cluster for a certain number of iterations in a row.

In this study, a simple and fast approach³ is taken to determine whether patterns in a cluster are within a certain distance to their centroid so that they can be considered as static for the first method Chiang et al. (2011). More precisely, the approach works by considering the top α % of the patterns in each cluster as static. As a consequence, to locate, say, the top α % of the patterns that are close to the centroid of a cluster, the detection operator only needs to compute the average distance μ of the patterns to their centroid and the standard deviation σ. A pattern will be in the top α % if its distance to the centroid is smaller than the distance

    \gamma = \mu \pm b\sigma,    (24)

where b ≥ 0 denotes the number of σ's from the mean needed to obtain the distance γ and thus the percentage α. In other words, the distance γ is used not only as a threshold for filtering out patterns that are unlikely to be reassigned to other groups at later iterations but also as a parameter to balance the accuracy rate and the convergence speed of the proposed algorithm.

As Fig. 3 shows, the center of circle 1 represents the centroid of cluster c_i at iteration t, and all the patterns inside circle 1 belong to the same cluster. Once the average distance μ of the patterns to their centroid and the standard deviation σ are computed, the proposed algorithm can easily find the top α % of patterns that are to be considered as static. The right side of Fig. 3 shows that when the centroid moves at iteration t + 1, the patterns that are static at iteration t have a higher probability of not moving to another cluster than the active patterns. That is, patterns within the shaded circle at iteration t will not change the cluster to which they belong at iteration t + 1.

³ The approach is fast because no sorting is required.

Fig. 3 A simple example illustrating how the detection operator works

Note that as far as this paper is concerned, α = 50 (i.e., b in Eq. (24) is set equal to 0, which implies that γ = μ ± 0σ = μ and thus α = 50) is used by the first method. Also note that the simple and fast approach described here is not exact in the sense that it may not return exactly α % of the patterns as static, because the mean instead of the median is used in the calculation of γ. Fortunately, the proposed algorithm does not require that the number of patterns returned be exactly as specified, and our simulation eventually shows that the effect of this inexactness on the performance of the proposed algorithm is negligible. The impact of γ will be discussed in Sect. 4.1.2.

The second method checks to see if a pattern is static by counting the number of iterations that the pattern stays in the same cluster. Intuitively, the higher the number of iterations a pattern remains in the same cluster, the more likely the pattern is static. How many iterations a pattern needs to stay in the same group depends on the convergence speed or the quality of the end result. If we set the number of iterations to a large value, the accuracy rate will be high, but the downside is that the computation time will increase. If we set the number of iterations to a small value, then the result will be the other way around. Again, note that as far as this paper is concerned, two iterations in a row are used, the details of which will be discussed in Sect. 4.1.2.
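Putting the two checks together, a pattern is flagged as static only if its distance to its centroid is at most γ = μ (Eq. (24) with b = 0) and it has stayed in the same cluster for the required number of consecutive iterations (two in this paper). The sketch below is one possible reading of this; the bookkeeping arrays stayCount and lastLabel are assumptions.

```java
// Sketch of the two-stage detection operator: a pattern is static iff
// (1) its distance to its centroid is <= mu of its cluster (Eq. (24), b = 0),
// and (2) it has kept the same cluster for requiredStay iterations in a row.
// stayCount and lastLabel are assumed bookkeeping arrays, one entry per pattern.
final class Detection {
    static boolean[] detectStatic(double[] distToCentroid, int[] label,
                                  int[] lastLabel, int[] stayCount,
                                  double[] muPerCluster, int requiredStay) {
        boolean[] isStatic = new boolean[label.length];
        for (int i = 0; i < label.length; i++) {
            stayCount[i] = (label[i] == lastLabel[i]) ? stayCount[i] + 1 : 0;
            lastLabel[i] = label[i];
            isStatic[i] = distToCentroid[i] <= muPerCluster[label[i]]
                       && stayCount[i] >= requiredStay;
        }
        return isStatic;
    }
}
```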

3.2.2 Compression operator

The compression operator takes the responsibility of recording and passing the information of all the "static" patterns to the other operators of PSO to ensure that no redundant computations are taken. First, the compression operator compresses all the patterns in R into a single pattern r. Recall that as described in lines 10 and 11 of Fig. 2, R denotes the set of static patterns belonging to a particular cluster. Then, the compression operator ensures that only the pattern r is seen by all the other operators at later iterations, by adding r to X and removing all the patterns in R from X, i.e., X = (X ∪ {r}) \ R. This way, all the computations involving patterns in R can be eliminated.

Fig. 4 A simple example illustrating how the compression operator works for integer and binary representations

The proposed algorithm can be easily applied to algorithms using binary and cluster ID representations, as discussed in Sect. 2.3, because the compression operator needs to know to which cluster each pattern belongs to determine which patterns of each cluster can be compressed. As the example in Fig. 4 illustrates, one of the solutions (particles) has eight patterns before compression. The detection operator decides that two of the eight patterns in cluster 2 can be considered as static. After recording the relevant information, the compression operator will then compress these two patterns into a single pattern r, i.e., using a single pattern to represent all the static patterns compressed. After that, only seven patterns (six patterns plus r) need to be computed by the other operators of PSO. As such, MPREPSO can reduce the computation time by "eliminating" on the fly patterns that are static. It is important to note that the implementation of the compression operator may differ because it depends on the representation of the solutions. However, no matter what representation is used and how the compression operator is implemented, the compression operator, by design, will only compress static patterns belonging to the same cluster, never static patterns belonging to different clusters.

For algorithms using the "all centroids", "hybrid", and "one centroid" representations, the detection and compression operators of the proposed algorithm need to be modified. In other words, a transformation is required to deduce the information regarding which cluster each pattern belongs to before the compression operator can be applied. However, even in this case, the amount of time the transformation takes is small when compared to those cases for which no transformation is required, especially when the datasets are large. In other words, the performance of the proposed algorithm is largely independent of whether or not a transformation is required.
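The compression step itself is small: the static patterns R of one cluster are folded into a single representative r. The sketch below keeps a weight equal to |R| alongside the mean; this weighting is an assumption not spelled out in the text, but it lets later centroid updates treat r as |R| coincident patterns.

```java
import java.util.List;

// Sketch of the compression operator: fold the static patterns R of one
// cluster into a single representative r. The trailing weight slot (= |R|)
// is an assumed convention so that later averaging remains exact.
final class Compression {
    static double[] compress(List<double[]> staticPatterns) {
        int d = staticPatterns.get(0).length;
        double[] r = new double[d + 1];           // last slot holds the weight
        for (double[] x : staticPatterns)
            for (int j = 0; j < d; j++) r[j] += x[j];
        for (int j = 0; j < d; j++) r[j] /= staticPatterns.size();
        r[d] = staticPatterns.size();             // |R| compressed patterns
        return r;
    }
}
```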

3.3 Multistart operator

To prevent the premature convergence problem, which may degrade the quality of the end result, the multistart operator is used to enforce the diversity of MPREPSO. Here is how it works. First, the multistart operator removes the particles whose fitness values are below the average of the population so as to focus on better search directions. Next, the multistart operator creates new particles by recombining the selected particles. That is, the sub-solutions (cluster IDs or centroids) of each new particle are cloned randomly from the corresponding sub-solutions of the remaining particles. By using this approach, the multistart operator can construct particles that have the potential to find better solutions than the current search directions because the solutions are constructed from multiple high-quality solutions. For this reason, the multistart operator can prevent MPREPSO from falling into local minima.

As Fig. 5 shows, the multistart operator takes three steps. The first step is to remove the particles whose fitness is below the average fitness of all the particles. The second step is to create new particles by using a random clone method. The third step is to combine the particles selected in step 1 and the particles constructed in step 2 to generate a new population for MPREPSO. As Fig. 5 shows, after step 1, two of the remaining particles are p1 = {1, 1, 2, 1, 2, 2, 2, 2} and p2 = {1, 2, 1, 1, 2, 1, 2, 1}, each of which encodes to which clusters the eight patterns belong. The ith sub-solution of the new solution can take its value from the ith sub-solution of either p1 or p2 by using the random clone method described in step 2. In this way, at step 3, MPREPSO will create a set of particles that have a structure similar to, but not exactly the same as, the high-performance particles.

Fig. 5 A simple example illustrating how the multistart operator works for the binary and cluster ID representations

Same as for the detection and compression operators, a transformation is required for the multistart operator when applied to the other representations discussed previously, namely, "all centroids" and "hybrid". Figure 6 gives an example to show how the multistart operator works when the "all centroids" representation is used for one-dimensional datasets. If the remaining particles after step 1 are p1 = {c11, c12, c13} and p2 = {c21, c22, c23}, each of which encodes three centroids, then the first centroid of the new particle can take the value of either c11 or c21, the second can take the value of either c12 or c22, and so on. For example, the sub-solutions of p′3 are c11, c22, and c13, respectively. The third step is to combine the particles that have a high fitness value (p1 and p2) with the new particles p′3 and p′4 to generate a new population for the next iteration.

Fig. 6 A simple example illustrating how the multistart operator works for the "all centroids" representation

As depicted in Fig. 7, we also consider the case of using the "one centroid" representation as described in Cohen and de Castro (2006). In this case, the strategies for recombining the solutions are similar to those using the "all centroids" and "hybrid" representations except that instead of the centroids, it is the data in each dimension (denoted d_ij) that is combined, as shown in step 2 of Fig. 7. One thing that is worth emphasizing is that no matter which representation is used, the underlying concept of the multistart operator remains the same; i.e., to increase the diversity of search by recombining the high-performance solutions.

Fig. 7 A simple example illustrating how the multistart operator works for the "one centroid" representation
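For the cluster ID representation, the three steps of the multistart operator can be sketched as below. Following the text, below-average particles are removed and each sub-solution of a new particle is cloned at random from one of the survivors; the assumption here is that larger fitness values are better, and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the multistart operator (cluster ID representation):
// step 1: drop particles whose fitness is below the population average;
// step 2: build new particles by cloning each sub-solution at random
//         from one of the surviving particles;
// step 3: merge survivors and new particles into the next population.
final class Multistart {
    static List<int[]> restart(List<int[]> particles, double[] fitness, Random rnd) {
        double avg = 0.0;
        for (double f : fitness) avg += f;
        avg /= fitness.length;

        List<int[]> survivors = new ArrayList<>();
        for (int i = 0; i < particles.size(); i++)
            if (fitness[i] >= avg) survivors.add(particles.get(i));  // step 1

        List<int[]> next = new ArrayList<>(survivors);               // step 3
        int n = particles.get(0).length;
        while (next.size() < particles.size()) {                     // step 2
            int[] child = new int[n];
            for (int j = 0; j < n; j++)
                child[j] = survivors.get(rnd.nextInt(survivors.size()))[j];
            next.add(child);
        }
        return next;
    }
}
```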

4 Performance evaluation

The performance of the proposed algorithm is evaluated by applying it to the following five PSO-based clustering algorithms: PSO Omran et al. (2005a), standard PSO (SPSO, http://particleswarm.info), comparative PSO (CPSO) Yang et al. (2008), local best PSO (LPSO) Bratton and Kennedy (2007), and evolutionary PSO (EPSO) Miranda et al. (2008). The empirical analysis was conducted on an IBM X3400 machine with a 2.0 GHz Xeon CPU and 8 GB of memory running CentOS 5.0 with Linux 2.6.18, and the programs are written in Java 1.6.0_07.

Table 1 Datasets for benchmarks

Dataset   Name             Type
DSD-1     Iris             Data
DSD-2     Wine             Data
DSD-3     Breast cancer    Data
DSD-4     Car evaluation   Data
DSD-5     Statlog          Data
DSD-6     Yeast            Data
DSI-1     Lena             Image
DSI-2     Baboon           Image
DSI-3     F16              Image
DSI-4     Peppers          Image
DSI-5     Goldhill         Image
DSI-6     Boots            Image

To simplify the discussion that follows, we will use PR1 to denote the proposed algorithm Tsai et al. (2007); Chiang et al. (2011) using only the first detection method, PR2 to denote the proposed algorithm using both detection methods, and MPR2 to denote the proposed algorithm (i.e., MPREPSO) using both detection methods and the multistart operator, as discussed in Sects. 3.2.1 and 3.3.⁴

As shown in Table 1, two different types of datasets, denoted DSD⁵ (for data) from UCI and DSI⁶ (for images), are used to evaluate the performance of these algorithms. In addition, the images in DSI are of size from 64 × 64 to 512 × 512 and in 8-bit grayscale. For DSD, the number of clusters is predefined by the problems themselves; for DSI, the number of clusters is set equal to 8. Note that the number of clusters used in the computation of AR for the data clustering problem is provided in the UCI datasets. For the image clustering and codebook generation problems, AR is not used because the number of clusters for the image datasets is not provided.

All the simulations are carried out for 30 runs. Thus, unless otherwise stated, all the simulation results presented in this paper are the average of 30 runs. For all the clustering algorithms, the maximum number of fitness evaluations is set equal to 20,000. The other parameter settings are as summarized in Table 2, where m denotes the population size; χ the constriction factor of LPSO; and p the communication probability of EPSO. For each particle, MPREPSO uses 2 % of the input patterns as the samples to create the initial solution. The maximum velocity v_max of the PSO-based algorithms is set equal to 0.01 Cohen and de Castro (2006) for DSD and 255 Omran et al. (2005a) for DSI. In addition, the other settings of the PSO-based algorithms compared in this paper follow the settings of the corresponding papers. The detection operator of MPREPSO considers a pattern as static if its distance to the centroid is no larger than γ = μ and if it stays in the same group for two iterations.

Table 2 Parameter settings for the clustering algorithms evaluated in the paper

Algorithm   Parameter settings
PSO         ω = 0.72, a1 = a2 = 1.49, m = 20
SPSO        ω = 0.72, a1 = a2 = 1.49, m = 20
CPSO        ω ∈ [0.4, 0.9], a1 = a2 = 2, m = 20
LPSO        χ = 0.72984, a1 = a2 = 2.05, m = 50
EPSO        ω1 = ω2 = ω3 = ω4 = 0.5, p = 0.2, m = 20

⁴ Since no confusion is possible, throughout the rest of the paper, we will use MPREPSO and MPR2 interchangeably to mean the proposed algorithm using both detection methods and the multistart operator.
⁵ These datasets are available for download at http://archive.ics.uci.edu/ml/datasets.html.
⁶ These datasets are available for download at http://www.inf.uni-konstanz.de/cgip/lehre/dip_w0910/demos.html.

We will also use the following conventions. Let Δβ denote the enhancement of β_φ (new algorithm) with respect to β_ψ (original algorithm) in percentage, defined by

    \Delta\beta = \frac{\beta_\varphi - \beta_\psi}{\beta_\psi} \times 100\,\%,    (25)

where β = AR and β = PSNR denote, respectively, the quality of the clustering result measured in terms of AR and PSNR; β = T denotes the computation time. Note that throughout this paper, a larger value of ΔAR and ΔPSNR implies a greater enhancement whereas a smaller value of ΔT implies a greater enhancement. Note also that for DSD, the quality of the clustering result is measured in terms of AR while for DSI, the quality is measured in terms of PSNR.
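As a worked illustration of this convention (with hypothetical numbers, not taken from the experiments): if a baseline algorithm attains an AR of β_ψ = 80.0 % and the enhanced algorithm attains β_φ = 82.0 %, then ΔAR = (82.0 − 80.0)/80.0 × 100 % = 2.5 %, an improvement; if the running time drops from 100 to 8 s, then ΔT = (8 − 100)/100 × 100 % = −92 %, likewise an improvement.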

4.1 Impact of search strategies

4.1.1 Impact of refinement method

As the simulation results in Tsai et al. (2007); Chiang et al. (2011) show, the pattern reduction algorithm is very sensitive to the initial solutions. As such, we use sampling to reduce this sensitivity in this study. To understand the impact the refinement method may have on MPREPSO, we compare "MPREPSO with sampling" with "MPREPSO without sampling" for both the data and image clustering problems. The datasets used in the evaluation are DSD-1 to DSD-6 and DSI-1 to DSI-6. All the images tested are 512 × 512 8-bit grayscale, and the number of clusters is 8. A simple sampling method is used as the refinement method for MPREPSO, which works as follows: two percent of the input patterns are randomly sampled first, and then k-means is applied to create the initial population for the proposed algorithm. The simulation results demonstrate that compared to "MPREPSO without sampling", "MPREPSO with sampling" improves the AR and PSNR by 13.34 and 1.87 %, respectively, for DSD and DSI. The simulation results also demonstrate that even such a simple sampling method can effectively reduce the sensitivity of MPREPSO, thus significantly improving the quality of the end result of MPREPSO.

4.1.2 Impact of detection method

In this paper, two detection methods are employed to check whether a computation is redundant, and only patterns that pass both checks will be considered as static by the proposed algorithm.

Table 3 shows the results of PR1 Chiang et al. (2011) and PR2 for the data clustering problem. The loss of quality due to pattern reduction can be mitigated when using both detection methods while keeping the amount of computation time reduced almost unaffected. The other kinds of clustering problems give the same results. For instance, by cascading the first and second detection methods for solving the image clustering problem (512 × 512 8-bit grayscale Lena), the quality can be improved when compared with using only the first detection method (PR1).

Table 3 The proposed algorithm using only the first detection method (PR1) vs. the proposed algorithm using both detection methods (PR2)

Dataset   ΔAR (PR1)    ΔAR (PR2)    ΔT (PR1)   ΔT (PR2)
DSD-1     −4.14        −3.49        −91.38     −87.69
DSD-2     −1.68        −0.73        −95.23     −93.46
DSD-3     −2.18        −1.25        −94.68     −93.34
DSD-4     −9.73        −6.41        −94.66     −93.61
DSD-5     −2.58        −0.82        −96.92     −96.70
DSD-6     −21.81       −9.95        −96.45     −95.04

Dataset   ΔPSNR (PR1)  ΔPSNR (PR2)  ΔT (PR1)   ΔT (PR2)
DSI-1     1.99         2.91         −95.87     −94.80
DSI-2     −5.77        −0.92        −95.76     −95.02
DSI-3     −2.35        −1.73        −95.71     −94.88
DSI-4     −2.44        −2.06        −95.89     −94.96
DSI-5     −1.01        −0.46        −95.80     −94.92
DSI-6     −0.43        2.56         −95.84     −94.99

To better understand the impact of γ on the performance of the first detection operator, Table 4 gives the simulation results of using both detection operators and the multistart operator (MPR2) with γ for the first detection operator assuming different values. The results show that for most datasets, the smaller the value of γ, the smaller the number of patterns that can be compressed and thus the less the computation time that can be reduced. The results further show that in terms of the accuracy rate, γ = μ gives the best results for most of the simulations; thus, γ = μ is used in all the simulations described in Table 4.

Table 4 Impact of γ on the first detection method

ΔAR
Dataset   μ − 2σ   μ − σ    μ        μ + σ    μ + 2σ
DSD-1     2.51     0.00     2.40     1.74     −3.81
DSD-2     −0.45    −0.35    0.98     −0.28    −0.31
DSD-3     −1.25    −0.52    0.21     0.00     0.42
DSD-4     18.79    14.02    7.17     1.90     5.10
DSD-5     6.41     4.00     6.02     4.85     3.30
DSD-6     −2.29    0.93     6.26     −4.87    4.64

ΔPSNR
Dataset   μ − 2σ   μ − σ    μ        μ + σ    μ + 2σ
DSI-1     4.27     4.06     4.52     4.14     3.19
DSI-2     4.24     1.51     2.03     0.82     0.49
DSI-3     1.01     0.76     0.47     0.32     −0.35
DSI-4     1.97     1.50     1.43     0.98     0.27
DSI-5     0.48     −0.06    0.20     −0.07    −0.10
DSI-6     7.92     6.11     4.92     3.61     2.84

ΔT
Dataset   μ − 2σ   μ − σ    μ        μ + σ    μ + 2σ
DSD-1     −80.45   −81.22   −82.04   −84.81   −85.84
DSD-2     −88.13   −88.17   −88.78   −89.39   −90.75
DSD-3     −88.49   −89.09   −89.39   −91.04   −91.61
DSD-4     −92.36   −92.40   −92.47   −92.89   −93.26
DSD-5     −95.90   −95.93   −96.15   −96.18   −96.21
DSD-6     −93.89   −93.95   −93.64   −94.06   −94.65
DSI-1     −92.02   −92.09   −92.53   −92.97   −93.05
DSI-2     −92.52   −92.90   −93.22   −93.27   −93.34
DSI-3     −91.50   −92.01   −92.36   −92.59   −92.80
DSI-4     −92.03   −92.89   −92.97   −93.70   −93.89
DSI-5     −91.45   −91.61   −92.12   −92.90   −93.13
DSI-6     −93.14   −93.34   −93.40   −93.75   −93.98

To understand the impact of the second detection method, we use datasets DSI-1 to DSI-6⁷ to test the number of iterations patterns need to remain in the same group to be considered as static. This test is aimed at understanding the impact this number of iterations may have on the performance of the proposed algorithm. The results in Fig. 8 show that the quality of the clustering result can be improved by applying the second detection method, but the downside is that it slows down the convergence speed, as noted previously. Our simulation results described in Fig. 8 show that two iterations are generally sufficient to get roughly the same clustering results as when three or more iterations are used. Thus, the number of iterations is set equal to 2 for all the simulations described herein.

⁷ The number of clusters is set equal to 8.

Fig. 8 Example showing the impact of using different number of iterations to check if a pattern is static

4.1.3 Impact of multistart method

In this section, we analyze the performance of the multistart method and its impact on improving the quality of the end result of MPREPSO. Figure 9 gives the results of DSI-1 to DSI-6 using different numbers of iterations as the threshold to perform the multistart operator. In other words, this test is aimed at understanding the impact changing the multistart threshold may have on the performance of the proposed algorithm. The results show that the more frequently the multistart operator is applied, the longer MPREPSO takes to converge. However, the quality may not be significantly improved when the multistart operator is performed too frequently in the convergence process of MPREPSO. This can be easily justified as follows: if the multistart operator is performed while the search results have not yet matured, the search directions of MPREPSO will diverge. The consequence is a worse result. For example, as Fig. 9 shows, when the number of iterations of multistart is set equal to 2, 5, 10, and 25, the quality is worse than when the number of iterations of multistart is set equal to 50, 75, 100, 250, and 500. Figure 10 gives the simulation results of DSI-1 to DSI-6. The maximum number of iterations is set equal to 500, 1,000, 2,000, and 3,000 while the multistart threshold is set equal to 5, 10, 20, 30, 40, and 50 % of the maximum number of iterations.

Fig. 9 Example showing the impact of using different number of iterations as the threshold to perform the multistart operator

Fig. 10 Example showing the impact of using different percentages for the number of iterations to remain intact in a row to perform the multistart operator in terms of (a) the quality and (b) the computation time

The results in Fig. 10 show that setting the multistart threshold to 10 % of the maximum number of iterations provides the best result for most of the simulations in terms of both the computation time and the quality of the end result. For this reason, the multistart threshold used in all the other simulations described in this paper is 10 % of the maximum number of iterations. Figure 10 gives results that are similar to those depicted in Fig. 9; that is, the more frequently the multistart operator is performed, the longer MPREPSO takes to converge because of the additional computation time taken by the multistart operator.

Our simulations show that setting the number of iterations to 100 (i.e., one tenth of the maximum number of iterations, which is set equal to 1,000 as far as our simulations are concerned) gives the best results in terms of the quality in most cases. Briefly speaking, the multistart operator can effectively improve the clustering result of MPREPSO if it is applied at a suitable time. Although multistart will increase the computation time of the proposed algorithm, the increase is very small compared to the overall computation time. In summary, the sampling and multistart methods provide two different ways to improve the quality of the end result of MPREPSO. Although MPREPSO with sampling and multistart is computationally more expensive, our experimental results show that it is eventually no slower than PSO alone.

To better understand the impact of multistart on the performance of the proposed algorithm, Fig. 11 compares "the search diversity of PSO" with "the search diversity of PRPSO with multistart" in terms of the average Hamming distance between all the particles, defined as

    \bar{d}_H = \frac{1}{n} \cdot \frac{2}{m(m-1)} \sum_{i \ne j}^{m} d_H(p_i, p_j),    (26)

where dH (·) denotes the Hamming distance between the i thand j th particles pi and p j , i.e., the number of sub-solutions(cluster IDs to which patterns belong) in pi and p j that aredifferent from each other; m the number of particles; and n thenumber of patterns. Figure 11 shows that the average Ham-ming distance of PRPSO changes significantly once every100 iterations, i.e., every time the multistart operator is per-


Fig. 11 Example illustrating the search diversity of the proposed algorithm with multistart in terms of the average Hamming distance: (a) the results of using IRIS and (b) the results of using Lena


Table 5 Enhancement of quality and running time of SPSO, CPSO, LPSO, and EPSO for the data clustering problem

            SPSO              CPSO              LPSO              EPSO
Dataset     PR2      MPR2     PR2      MPR2     PR2      MPR2     PR2      MPR2

ΔAR (%)
DSD-1       0.15     0.19     −7.41    2.40     −5.01    1.52     −3.15    1.41
DSD-2       0.00     0.00     −1.68    0.14     −1.54    1.68     −2.36    0.83
DSD-3       0.00     0.00     −1.97    0.21     −1.53    1.15     −2.51    1.05
DSD-4       −0.17    1.22     −4.57    15.74    −9.85    4.92     −6.90    5.57
DSD-5       0.02     0.02     −9.09    2.83     −10.38   1.64     −7.49    11.74
DSD-6       0.04     0.44     −11.15   −0.73    −9.07    −4.67    −7.89    2.85
Average     0.01     0.31     −5.98    3.43     −6.23    1.04     −5.05    3.91

ΔT (%)
DSD-1       −82.77   −81.45   −86.07   −82.37   −90.91   −89.81   −90.92   −88.28
DSD-2       −89.12   −87.77   −90.87   −87.49   −92.85   −92.64   −95.26   −91.07
DSD-3       −89.47   −87.91   −92.76   −90.98   −95.03   −93.90   −91.05   −89.69
DSD-4       −91.39   −89.63   −94.52   −92.15   −94.95   −92.81   −95.38   −93.73
DSD-5       −96.64   −95.18   −97.61   −96.47   −96.52   −95.73   −97.35   −95.79
DSD-6       −93.46   −92.09   −96.10   −92.79   −95.24   −93.85   −95.77   −92.28
Average     −90.47   −89.00   −92.99   −90.38   −94.25   −93.12   −94.29   −91.81

The figure further shows that the search diversity of the proposed algorithm with multistart oscillates up and down randomly during the convergence process. In other words, although not always the case, the multistart operator has a good chance of increasing the search diversity of the proposed algorithm in terms of the average Hamming distance. As a consequence, it can be used to avoid falling into a local optimum in the early iterations.

4.2 Simulation results

4.2.1 Results of data clustering problem

Table 5 gives the results of applying SPSO, CPSO Yang et al. (2008), LPSO Bratton and Kennedy (2007), and EPSO Miranda et al. (2008) to the data clustering problem (DSD-1 to DSD-6) in terms of both the quality (AR) and the running time (T). As the results in Table 5 show, the proposed algorithm without multistart (i.e., PR2) can reduce the computation time of the PSO-based algorithms we tested in this study by 90.47 up to 94.29%, but it may degrade the quality of the clustering results by up to 6.23% on average. As depicted in Table 5, the multistart method provides an effective way to reduce the loss of quality caused by the pattern reduction operator. As a result, the proposed algorithm MPREPSO (i.e., MPR2) can significantly improve the quality of the clustering results. For instance, compared with EPSO, MPREPSO improves the average quality of the end result of PSO from −5.05% (PR2) to 3.91%. In summary, MPREPSO can either retain the quality or even provide a better result than the PSO-based algorithms

alone while at the same time significantly reducing the computation time of PSO-based clustering algorithms.
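For clarity, the ΔAR and ΔT entries above can be read as percentage changes relative to the PSO-based algorithm alone, so a negative ΔT means less running time; a minimal sketch of this reading (the function name is ours):

def delta_percent(enhanced, baseline):
    """Percentage change of the enhanced algorithm relative to the baseline."""
    return 100.0 * (enhanced - baseline) / baseline

# For example, cutting the running time from 1,000 s down to 95 s gives
# delta_percent(95.0, 1000.0) = -90.5, i.e., a 90.5% reduction.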

4.2.2 Results of image clustering problem

To understand the performance of the proposed algorithm for large-scale datasets, Table 6 gives the results of applying MPREPSO and PSO-based clustering algorithms to DSI (all the images are of size 512×512 and in 8-bit grayscale), again in terms of both the quality (PSNR) and the running time (T). The results in Table 6 show that, as for the data clustering problem, the proposed algorithm can reduce the computation time of these clustering algorithms by 90.23 up to 94.85%. Similar to the results of DSD, these results indicate that the proposed algorithm MPREPSO provides better quality than the PSO-based algorithms alone in most cases. A closer look at the results shows that the proposed algorithm can eliminate a very high percentage of the computation time that PSO and its variants spend computing fitness and updating the clusters to which the patterns belong. However, since the operators used by each PSO-based algorithm may differ, the amount of time that MPREPSO can reduce varies.
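For reference, the PSNR values reported here can be computed as follows, assuming the standard definition for 8-bit grayscale images (peak value 255); this is a generic sketch, not the code used in the simulations.

import numpy as np

def psnr(original, clustered):
    """Peak signal-to-noise ratio, in dB, between two 8-bit grayscale images."""
    diff = original.astype(np.float64) - clustered.astype(np.float64)
    mse = np.mean(diff ** 2)                  # mean squared error; assumed > 0
    return 10.0 * np.log10(255.0 ** 2 / mse)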

Table 7 gives the statistical analysis of the results of the image clustering problem. The datasets tested are DSI-1 to DSI-6. The images are of size 64×64, 128×128, 256×256, and 512×512. A two-tailed Wilcoxon test Derrac et al. (2011) with a level of significance α = 0.01 is used to assess the difference between the original PSO-based algorithm and the proposed algorithm, where “+” denotes that the difference is significant and “−” denotes that the


Table 6 Enhancement of quality and running time of SPSO, CPSO, LPSO, and EPSO for the image clustering problem

            SPSO              CPSO              LPSO              EPSO
Dataset     PR2      MPR2     PR2      MPR2     PR2      MPR2     PR2      MPR2

ΔPSNR (%)
DSI-1       −2.76    12.98    −4.08    1.98     1.90     3.05     −2.57    −0.87
DSI-2       −2.25    1.76     −1.15    5.81     −4.00    4.41     −0.78    2.57
DSI-3       0.85     2.50     −0.08    0.17     −0.09    0.46     −1.53    0.45
DSI-4       −4.77    3.63     −0.10    0.89     −2.83    −1.76    0.04     1.25
DSI-5       −4.45    10.59    −0.12    −0.05    −0.02    0.20     −0.18    0.04
DSI-6       −1.73    0.60     −2.92    5.06     2.64     4.30     −0.27    5.57
Average     −2.52    5.34     −1.41    2.31     −0.40    1.78     −0.88    1.50

ΔT (%)
DSI-1       −93.99   −92.98   −94.72   −93.15   −92.57   −89.64   −93.19   −91.38
DSI-2       −94.53   −93.29   −94.63   −93.21   −91.60   −89.92   −93.44   −91.01
DSI-3       −93.75   −92.61   −95.15   −93.24   −92.17   −90.20   −93.93   −91.85
DSI-4       −93.77   −92.69   −94.72   −92.61   −91.95   −89.72   −93.13   −91.31
DSI-5       −94.37   −93.10   −94.71   −93.10   −92.85   −89.68   −93.21   −91.33
DSI-6       −94.30   −93.14   −95.17   −92.88   −94.39   −92.20   −94.04   −91.99
Average     −94.12   −92.97   −94.85   −93.03   −92.59   −90.23   −93.49   −91.48

Table 7 Wilcoxon test analysis of image clustering results

            SPSO              CPSO              LPSO              EPSO
            PR2      MPR2     PR2      MPR2     PR2      MPR2     PR2      MPR2

ΔPSNR       +        −        +        +        −        −        +        +
p value     0.000    0.012    0.000    0.000    0.217    0.011    0.004    0.003
ΔT          +        +        +        +        +        +        +        +
p value     0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000

difference is insignificant. The statistical analysis shows that PSO and the proposed algorithm are not significantly different in terms of the quality of the end result in some cases, but not in all cases. That the results of PSO and the proposed algorithm are not significantly different implies that the results of the proposed algorithm are similar to those of PSO. Our observation shows that in the cases where the end results of PSO and the proposed algorithm are significantly different, the results of PR2 are worse than those of PSO while the results of MPR2 are better than those of PSO. In terms of the computation time, the results of the proposed algorithm are significantly different from those of PSO, implying that the proposed algorithm is much faster in all cases.
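For reproducibility, the test reported in Table 7 can be carried out with a standard statistics library; below is a minimal sketch, assuming paired per-run PSNR values for the baseline and the enhanced algorithm. The numeric values are placeholders only, not measurements from this study.

from scipy.stats import wilcoxon

pso_psnr  = [12.81, 12.95, 12.77, 12.90, 12.84, 12.88, 12.79, 12.92]  # placeholders
mpr2_psnr = [13.02, 13.11, 12.99, 13.08, 13.05, 13.10, 13.01, 13.07]  # placeholders

stat, p = wilcoxon(pso_psnr, mpr2_psnr)  # two-sided by default
print("+" if p < 0.01 else "-")          # alpha = 0.01, as in Table 7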

4.2.3 Results of codebook generation problem

Table 8 gives the results of applying the proposed algorithm and PSO alone to the codebook generation problem, where k denotes the number of codewords (i.e., the codebook size). In other words, this test is aimed at understanding the impact the codebook size may have on the performance of the proposed algorithm.

As shown in Table 8, the computation time that MPREPSO can reduce is much higher for the codebook generation problem than for the data and image clustering problems. The reason is that the number of clusters for the simulations depicted in Table 8, from 64 to 512, is larger than the number of clusters for the simulations given in Tables 5 and 6. Another important reason is that the number of dimensions of each pattern in the datasets shown in Table 8 is also larger than in those given in Tables 5 and 6.
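The cost argument can be seen directly in the assignment step of codebook generation, sketched below under the usual vector quantization setup: n pattern vectors (e.g., 4 × 4 image blocks flattened to dimension ℓ = 16) are each compared against k codewords, so every iteration costs O(nkℓ), which grows with both k and ℓ. The function name is ours.

import numpy as np

def assign_codewords(patterns, codebook):
    """patterns: (n, l) array; codebook: (k, l) array.
    Returns the index of the nearest codeword for each pattern; each
    pattern needs k distance evaluations of dimension l, hence O(nkl)."""
    d2 = ((patterns[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return d2.argmin(axis=1)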

More precisely, PR2 can reduce the computation time by about 97% while keeping the loss of quality to no more than 1.51%. Table 8 also gives a two-tailed Wilcoxon test Derrac et al. (2011) for PSO and the proposed algorithm for the codebook generation problem. Just like the statistical analysis for the image clustering problem depicted in Table 7, the statistical analysis here shows that PSO and the


Table 8 Enhancement of quality and running time of PSO for the codebook generation problem

                       ΔPSNR (%)            ΔT (%)
Dataset     k          PR2       MPR2       PR2       MPR2

DSI-1       64         −0.59     0.17       −97.31    −92.81
            128        −0.19     0.33       −97.29    −92.98
            256        −0.12     0.10       −97.38    −92.44
            512        −0.18     0.15       −97.05    −92.00
DSI-2       64         −0.29     0.82       −97.37    −93.60
            128        −0.46     0.30       −97.43    −93.17
            256        −0.74     1.11       −97.32    −93.20
            512        −0.92     0.30       −97.22    −92.43
DSI-3       64         −1.50     0.72       −97.27    −93.10
            128        −0.91     0.18       −97.35    −93.11
            256        −0.03     0.08       −97.34    −92.52
            512        −0.02     0.16       −97.02    −92.27
DSI-4       64         −1.03     0.87       −97.79    −94.63
            128        −0.61     0.17       −97.35    −92.74
            256        −0.72     0.17       −97.38    −92.83
            512        −0.17     0.15       −97.01    −92.21
DSI-5       64         −1.32     0.07       −97.41    −93.23
            128        −0.41     0.24       −97.40    −92.98
            256        −0.20     0.19       −97.22    −92.81
            512        −0.35     0.07       −97.10    −92.25
DSI-6       64         −1.51     0.74       −97.39    −93.22
            128        −0.16     0.22       −97.30    −92.84
            256        −0.26     0.12       −97.29    −92.75
            512        −0.25     0.10       −97.06    −92.54

Wilcoxon test          +         +          +         +
p value                0.000     0.000      0.000     0.000

proposed algorithm are significantly different in terms of the computation time. Of course, as the statistical analysis of the end result shows, not only can MPREPSO provide a better result than PSO alone in most of the cases we tested, but it can also still significantly reduce the computation time of PSO. This implies that multistart is a valuable option for adjusting the performance of the proposed algorithm.

4.2.4 Results of large datasets

In this section, two large-scale images, denoted DSI-7 and DSI-8, are used to measure the performance of the proposed algorithm for large datasets. These images are taken from the web site of the National Aeronautics and Space Administration (NASA).8 The images available for download are of size 1,024 × 1,024 and 2,048 × 2,048. The simulation

8 These datasets are available for download at http://photojournal.jpl.nasa.gov/catalog/PIA14873 and http://photojournal.jpl.nasa.gov/catalog/PIA14872.

Table 9 Enhancement of quality and running time of PSO for the codebook generation problem

                       ΔPSNR (%)            ΔT (%)
Dataset     k          PR2       MPR2       PR2       MPR2

DSI-7       1,024      −0.38     4.19       −94.09    −92.88
            2,048      −0.50     0.41       −96.28    −95.41
DSI-8       1,024      −0.46     0.19       −94.19    −93.80
            2,048      −0.08     0.08       −96.17    −95.19

results depicted in Table 9 show that the proposed algorithm performs equally well for large-scale image clustering problems (cf. images of size 512 × 512 in Sect. 4.2.2). That is, the proposed algorithm is highly scalable in terms of both the computation time and the quality of the end result.

4.3 Analysis and discussion

At first glance, it may seem that the analysis given in this study is exactly the same as that described in Chiang et al. (2011), but the fact is that this is not the case. The difference is that MPREPSO is a population-based algorithm whereas the method described in Chiang et al. (2011) is a single-solution-based algorithm. On the other hand, since the number of patterns the proposed algorithm has to deal with at each iteration keeps decreasing, theoretically, the time complexity of MPREPSO can essentially be reduced from O(mnkℓt) down to O(mnkℓ) for the clustering problem, where m denotes the population size, n the number of input patterns, k the number of clusters, ℓ the number of dimensions, and t the number of iterations for which PSO and MPREPSO are performed. The simulation results described previously are consistent with the time complexity analysis given here; that is, the proposed algorithm can significantly reduce the computation time of PSO-based algorithms for clustering. According to our observation, a limitation of the proposed algorithm MPREPSO is that the number of clusters has to be given. Although several clustering algorithms can automatically determine the number of clusters, this remains a limitation and research issue of MPREPSO, because the proposed algorithm compresses partial solutions during the convergence process, which may affect the end result of a clustering algorithm that automatically determines the number of clusters.
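The source of the speedup can be sketched as follows. This is an illustrative reading of the pattern reduction idea, assuming a pattern is declared static once its cluster ID has stayed unchanged for a fixed number of consecutive iterations; the detection rules and compression bookkeeping of the actual algorithm may differ.

import numpy as np

def detect_and_compress(labels, prev_labels, stable_count, active, window=5):
    """labels, prev_labels: cluster IDs of all n patterns at this and the
    previous iteration; stable_count: consecutive unchanged assignments;
    active: boolean mask of patterns still processed at each iteration."""
    stable_count = np.where(labels == prev_labels, stable_count + 1, 0)
    static = stable_count >= window   # considered unlikely to change hereafter
    active = active & ~static         # compressed patterns are skipped from now on
    return stable_count, active

Since the set of active patterns shrinks as the search converges, the effective n in the per-iteration cost O(mnkℓ) decreases over time, which is exactly where the reduction from O(mnkℓt) comes from.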

To better understand the performance of the proposed algorithm, in addition to the Wilcoxon test, the ANOVA and Bonferroni post hoc methods Derrac et al. (2011) with a level of significance α = 0.01 are also employed to compare the results of PSO, PR2, and MPR2.
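A minimal sketch of this analysis with a standard statistics library, assuming one quality value per run for each of PSO, PR2, and MPR2; the values below are placeholders only.

from itertools import combinations
from scipy.stats import f_oneway, ttest_ind

runs = {"PSO":  [12.8, 12.9, 12.7, 12.9],   # placeholder values
        "PR2":  [12.7, 12.8, 12.6, 12.8],
        "MPR2": [13.0, 13.1, 12.9, 13.1]}

_, p = f_oneway(*runs.values())              # one-way ANOVA across the three groups
if p < 0.01:                                 # alpha = 0.01
    pairs = list(combinations(runs, 2))
    for a, b in pairs:
        _, p_ab = ttest_ind(runs[a], runs[b])
        # Bonferroni: compare against alpha divided by the number of tests
        verdict = "significant" if p_ab < 0.01 / len(pairs) else "insignificant"
        print(a, "vs.", b, verdict)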



Fig. 12 Simulation results showing running time vs. image size for SPSO, CPSO, LPSO, and EPSO, each alone and combined with PR2 and MPR2


Fig. 13 Simulation results showing PSNR vs. image size for SPSO, CPSO, LPSO, and EPSO, each alone and combined with PR2 and MPR2

For the image clustering and codebook generation problems, the statistical analysis shows that in terms of the quality of the end results, PSO, PR2, and MPR2 are not significantly different; in terms of the computation time, PSO vs. PR2 and PSO vs. MPR2 are both significantly different while PR2 vs. MPR2 is not. In other words, for the image clustering and codebook generation problems, the statistical analysis indicates that in terms of the quality of the end results, PSO, PR2, and MPR2 are similar to each other. The statistical analysis also indicates that in terms of the computation time,

the proposed algorithms (PR2 and MPR2), whose computation times are very close to each other, are both faster than PSO. We also compare the proposed algorithm with several state-of-the-art algorithms in terms of both the running time and the quality of the end result, by using these algorithms to cluster the images in DSI-1 to DSI-6, whose sizes range from 64 × 64 to 512 × 512. The results are shown in Figs. 12 and 13, where the number of clusters is 8, and PR2 and MPR2

are as defined previously. In other words, PR2+A denotes A


with the proposed method without multistart, while MPR2+A denotes A with the proposed method with multistart. The results in Fig. 12 show that, not only for SPSO but also for CPSO, LPSO, and EPSO, the amount of computation time that can be reduced by the proposed algorithm is proportional to the size of the dataset. On the other hand, as Fig. 13 shows, in terms of PSNR, MPREPSO provides a result that is either similar to or better than that of the PSO-based algorithms alone. In other words, PR2+A provides results similar to those of the PSO-based algorithms alone, but the quality of the end result can be improved by adding multistart to PR2, i.e., by using the proposed algorithm MPREPSO (also called MPR2).

5 Conclusions

This paper presents a high-performance method, based on the notion of pattern reduction, to reduce the time complexity of PSO-based clustering algorithms. Two detection methods are employed in the proposed algorithm to determine whether a pattern can be considered static and thus can be compressed. Also, the multistart method is used to improve the quality of the end result. The simulation results show that many of the computations during the convergence process of PSO are essentially redundant and can be eliminated. The simulation results further show that the proposed algorithm can not only significantly reduce the computation time of PSO-based algorithms for clustering problems, but also provide a better result than the other algorithms we compared in this study in most cases. In the future, our goal is to focus on finding even more efficient detection and multistart methods to enhance the quality of MPREPSO.

Acknowledgments The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions, which greatly improved the quality of the paper. The authors would also like to thank Mr. Jui-Le Chen for the implementation of standard PSO, which makes the comparisons given in the paper more complete. This work was supported in part by the National Science Council of Taiwan, R.O.C., under Contracts NSC102-2221-E-041-006, NSC102-2221-E-110-054, and NSC102-2219-E-006-001.

References

Abraham A, Das S, Konar A (2007) Kernel based automatic clustering using modified particle swarm optimization algorithm. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation, pp 2–9
Ahmadi A, Karray F, Kamel M (2007a) Cooperative swarms for clustering phoneme data. In: Proceedings of the IEEE/SP Workshop on Statistical Signal Processing, pp 606–610
Ahmadi A, Karray F, Kamel M (2007b) Multiple cooperating swarms for data clustering. In: Proceedings of the IEEE Swarm Intelligence Symposium, pp 206–212
Ahmadyfard A, Modares H (2008) Combining PSO and k-means to enhance data clustering. In: Proceedings of the International Symposium on Telecommunications, pp 688–691
Bagirov AM, Ugon J, Webb D (2011) Fast modified global k-means algorithm for incremental cluster construction. Patt Recogn 44(4):866–876
Banks A, Vincent J, Anyakoha C (2008) A review of particle swarm optimization. Part II: hybridisation, combinatorial, multicriteria and constrained optimization, and indicative applications. Nat Comput 7(1):109–124
Bradley PS, Fayyad UM (1998) Refining initial points for k-means clustering. In: Proceedings of the International Conference on Machine Learning, pp 91–99
Bradley PS, Fayyad UM, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp 9–15
Bratton D, Kennedy J (2007) Defining a standard for particle swarm optimization. In: Proceedings of the IEEE Swarm Intelligence Symposium, pp 120–127
Buzo A, Gray AH Jr, Gray RM, Markel JD (1980) Speech coding based upon vector quantization. IEEE Trans Acoust Speech Signal Proc 28(5):562–574
Cai W, Chen S, Zhang D (2007) Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Patt Recogn 40(3):825–838
Chen CY, Ye F (2004) Particle swarm optimization algorithm and its application to clustering analysis. In: Proceedings of the IEEE International Conference on Networking, Sensing & Control, 2:789–794
Chen CY, Feng HM, Ye F (2006) Automatic particle swarm optimization clustering algorithm. Intern J Electr Eng 13(4):379–387
Cheng TW, Goldgof DB, Hall LO (1998) Fast fuzzy clustering. Fuzzy Sets Syst 93(1):49–56
Chen Q, Yang J, Gou J (2005) Image compression method using improved PSO vector quantization. In: Proceedings of the Advances in Natural Computation, pp 490–495
Chiang MC, Tsai CW, Yang CS (2011) A time-efficient pattern reduction algorithm for k-means clustering. Info Sci 181(4):716–731
Cohen SCM, de Castro LN (2006) Data clustering with particle swarms. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp 1792–1798
Das S, Abraham A, Konar A (2008) Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm. Patt Recogn Lett 29(5):688–699
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18
Ding C, He X (2004) K-means clustering via principal component analysis. In: Proceedings of the International Conference on Machine Learning, 69:225–232
Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the International Conference on Machine Learning, pp 147–153
Engelbrecht AP (2006) Fundamentals of computational swarm intelligence. Wiley, West Sussex, England
Eschrich S, Ke J, Hall LO, Goldgof DB (2003) Fast accurate fuzzy clustering through data reduction. IEEE Trans Fuzzy Syst 11(2):262–270
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp 226–231
Feng HM, Chen CY, Ye F (2007) Evolutionary fuzzy particle swarm optimization vector quantization learning scheme in image compression. Exp Syst Appl 32(1):213–222
Getz G, Gal H, Kela I, Notterman DA, Domany E (2003) Coupled two-way clustering analysis of breast cancer and colon cancer gene expression data. Bioinformatics 19(9):1079–1089
Guha S, Meyerson A, Mishra N, Motwani R, O'Callaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng 15(3):515–528
Hammouda KM, Kamel MS (2004) Efficient phrase-based document indexing for web document clustering. IEEE Trans Knowl Data Eng 16(10):1279–1296
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Jarboui B, Cheikh M, Siarry P, Rebai A (2007) Combinatorial particle swarm optimization (CPSO) for partitional clustering problem. Appl Math Comput 192(2):337–345
Karthi R, Arumugam S, RameshKumar K (2009) A novel discrete particle swarm clustering algorithm for data clustering. In: Proceedings of the Bangalore Annual Compute Conference, pp 16:1–16:4
Kaukoranta T, Fränti P, Nevalainen O (2000) A fast exact GLA based on code vector activity detection. IEEE Trans Image Proc 9(8):1337–1342
Kekre HB, Sarode TK (2009) Fast codebook search algorithm for vector quantization using sorting technique. In: Proceedings of the International Conference on Advances in Computing, Communication and Control, pp 317–325
Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New York
Kulkarni RV, Venayagamoorthy GK (2011) Particle swarm optimization in wireless-sensor networks: a brief survey. IEEE Trans Syst Man Cybernet Part C 41(2):262–267
Kuo RJ, Wang MJ, Huang TW (2011) An application of particle swarm optimization algorithm to clustering analysis. Soft Comput 15(3):533–542
Lai JZC, Liaw YC, Liu J (2008) A fast VQ codebook generation algorithm using codeword displacement. Patt Recogn 41(1):315–319
Lai JZC, Huang TJ, Liaw YC (2009) A fast k-means clustering algorithm using cluster center displacement. Patt Recogn 42(11):2551–2556
Leuski A (2001) Evaluating document clustering for interactive information retrieval. In: Proceedings of the International Conference on Information and Knowledge Management, pp 33–40
Li C, Zhou J, Kou P, Xiao J (2012) A novel chaotic particle swarm optimization based fuzzy clustering algorithm. Neurocomputing 83:98–109
Lughofer E (2008) Extensions of vector quantization for incremental clustering. Patt Recogn 41(3):995–1011
Lu Y, Lu S, Fotouhi F, Deng Y, Brown SJ (2004) FGKA: a fast genetic k-means clustering algorithm. In: Proceedings of the ACM Symposium on Applied Computing, pp 622–623
Marinakis Y, Marinaki M, Matsatsinis N (2008) A stochastic nature inspired metaheuristic for clustering analysis. Intern J Bus Intel Data Mining 3(1):30–44
Miranda V, Keko H, Duque AJ (2008) Stochastic star communication topology in evolutionary particle swarms (EPSO). Intern J Comput Intel Res 4(2):105–116
Ng RT, Han J (2002) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016
Niknam T, Amiri B, Olamaei J, Arefi A (2009) An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering. J Zhejiang Univ SCI A 10(4):512–519
Omran MGH, Salman AA, Engelbrecht AP (2002) Image classification using particle swarm optimization. In: Proceedings of the Asia-Pacific Conference on Simulated Evolution and Learning, pp 370–374
Omran MGH, Engelbrecht AP, Salman AA (2005a) Particle swarm optimization method for image clustering. Intern J Patt Recogn Artif Intel 19(3):297–321
Omran MGH, Engelbrecht AP, Salman AA (2005b) Dynamic clustering using particle swarm optimization with application in unsupervised image segmentation. Proc World Acad Sci Eng Technol 2005:199–204
Omran MGH, Salman AA, Engelbrecht AP (2006) Dynamic clustering using particle swarm optimization with application in image segmentation. Patt Anal Appl 8(4):332–344
Ordonez C, Omiecinski E (2004) Efficient disk-based k-means clustering for relational databases. IEEE Trans Knowl Data Eng 16(8):909–921
Parsopoulos KE, Vrahatis MN (2010) Particle swarm optimization and intelligence: advances and applications. IGI Global
Paterlini S, Krink T (2006) Differential evolution and particle swarm optimisation in partitional clustering. Comput Stat Data Anal 50(5):1220–1247
Ratnaweera A, Halgamuge SK, Watson HC (2004) Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Trans Evol Comput 8(3):240–255
Shi Y, Eberhart RC (1999) Empirical study of particle swarm optimization. In: Proceedings of the Congress on Evolutionary Computation, 3:1945–1950
Theodoridis S, Koutroumbas K (2009) Chapter 16: cluster validity. In: Pattern recognition, 4th edn. Academic Press, Boston
Tillett JC, Rao RM, Sahin F, Rao TM (2003) Particle swarm optimization for the clustering of wireless sensors. In: Proceedings of SPIE 5100:73–83
Tsai CW, Yang CS, Chiang MC (2007) A time efficient pattern reduction algorithm for k-means based clustering. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp 504–509
Tsai CW, Lin CF, Chiang MC, Yang CS (2010) A fast particle swarm optimization algorithm for vector quantization. ICIC Expr Lett Part B 1(2):137–143
van der Merwe DW, Engelbrecht AP (2003) Data clustering using particle swarm optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, 1:215–220
Xiang S, Nie F, Zhang C (2008) Learning a Mahalanobis distance metric for data clustering and classification. Patt Recogn 41(12):3600–3612
Xiao X, Dow ER, Eberhart R, Miled ZB, Oppelt RJ (2003) Gene clustering using self-organizing maps and particle swarm optimization. In: Proceedings of the International Symposium on Parallel and Distributed Processing
Xu R, Wunsch-II DC (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
Xu R, Wunsch-II DC (2008) Clustering. Wiley, Hoboken, New Jersey
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 267–273
Yang CS, Chuang LY, Ke CH, Yang CH (2008) Comparative particle swarm optimization (CPSO) for solving optimization problems. In: Proceedings of the International Conference on Research, Innovation and Vision for the Future in Computing & Communication Technologies, pp 86–90
Zhang WF, Liu CC, Yan H (2010) Clustering of temporal gene expression data by regularized spline regression and an energy based similarity measure. Patt Recogn 43(12):3969–3976
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp 103–114
