
Research Article

An Extended Affinity Propagation Clustering Method Based on Different Data Density Types

XiuLi Zhao 1,2 and WeiXiang Xu 3

1 State Key Laboratory of Rail Traffic Control and Safety, Beijing 100044, China
2 Business School, Qilu University of Technology, Jinan 250353, China
3 School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

Correspondence should be addressed to XiuLi Zhao; 07114212@bjtu.edu.cn

Received 21 September 2014; Revised 10 December 2014; Accepted 18 December 2014

Academic Editor: Yongjun Shen

Copyright © 2015 X. Zhao and W. Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms the clusters entirely from the degree of similarity among the data points. But in many cases there exist areas of different density within the same data set, which means that the data set is not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. There are two steps in our method: first, the data set is partitioned into several data density types according to the nearest distance of each data point; then the AP clustering method is used to group the data points into clusters within each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experimental results show that groups are obtained more accurately by our algorithm than by OPTICS and the AP clustering algorithm itself.

1. Introduction

Affinity propagation (AP) is a new partitioning clustering algorithm proposed by Frey and Dueck in 2007 [1]. Different from traditional partitioning clustering methods, the AP clustering algorithm does not need to specify the initial cluster centers but automatically finds a subset of exemplar points that best describe the data groups by exchanging messages. The messages exchanged among the data points are based on a set of similarities between pairs of data points. The AP algorithm assigns each data point to its nearest exemplar, which results in a partitioning of the whole data set into clusters. The AP algorithm converges by maximizing the overall sum of the similarities between data points and their exemplars.

Most clustering algorithms store and refine a fixed number of potential cluster centers, while affinity propagation does not need them. The AP algorithm regards all data points equally as potential exemplars (also named cluster centers). Among the data points there are two kinds of messages to be exchanged: the availability and the responsibility. The former, sent from point k to point i, indicates how much support point k has received from other points for being an exemplar. The latter, sent from point i to point k, indicates how well-suited point k is as an exemplar for point i in contrast to other potential exemplars. When the sum of the responsibilities and availabilities is maximized, the message passing ends and a clear set of exemplars and clusters emerges.

But the AP clustering algorithm cannot deal with nested clusters, which have different data density types. The toy example in Figure 1 contains the more intensive clusters C1, C2, C3, C4, and C5, but the AP clustering algorithm cannot recognize them. Figure 2 shows the result of the AP clustering algorithm, which disagrees with the reality. Obviously, we want to obtain the clusters as shown in Figure 3. In this paper we propose an extended AP clustering algorithm that can group the data set into clusters according to their different density types. There are two novelties in our new extended AP clustering algorithm.



Figure 1: Data set (toy example with clusters C1–C5).

(1) According to the frequency distribution curve of the nearest neighbor distance, it can identify the number of the different density types in the whole data set; (2) it can partition the data set into clusters more effectively than OPTICS and the AP clustering algorithm itself.

For the sake of simplicity, we design a simple toy data set to explain the context of our method, as shown in Figure 1. In this data set there are some unknown data density types, and each of them may be nested within clusters of other density types. First, we transform the data point set into a distance matrix and recognize the point density types according to the nearest neighbor distance matrix. Then we use the AP clustering algorithm to classify the data points into several clusters. The whole clustering process consists of two steps (as shown in Figure 4): the first step identifies the dividing value of data density, and the second step establishes the mechanism to form clusters.

This paper is organized as follows: in Section 2, related work on the AP clustering algorithm and some concepts are briefly introduced; in Section 3, the new semisupervised AP clustering algorithm is illustrated in detail; in Section 4, the proposed algorithm is applied to cluster both simulated and real data sets; and finally, conclusions are summarized in Section 5.

2. Related Work

2.1. Related Work on AP Cluster Methods. The goal of cluster analysis is to group similar data into meaningful subclasses (or clusters). There are three main categories of clustering methods: partitioning, hierarchical, and density-based.

The first clustering category tries to obtain a rational partition of the data set such that the data in the same cluster are more similar to each other than to the data in other clusters. Usually the partitioning criterion is the minimum sum of Euclidean distances. K-means and K-medoids are the classical partitioning cluster methods.

The second clustering category works by grouping data objects into a dendrogram tree of clusters. This category can be further classified into either merging (bottom-up) or splitting (top-down).

Figure 2: The clusters found by AP.

Figure 3: The ideal clusters (C1–C5).

Figure 4: The two-step process for discovering clusters with different densities (data point set → detecting the thresholds of different data density types → data density types → clustering by the AP algorithm → clusters).


The former begins by placing each data point in its own cluster and then merges these atomic clusters into larger and larger clusters until certain termination conditions are satisfied. The latter does the reverse of the merging approach: it starts with all data points as a single cluster and then subdivides this cluster into smaller and smaller pieces until certain termination conditions are satisfied (Han et al., 2001) [2].

Differing from the above clustering methods, density-based clustering algorithms discover intensive subclasses based on the degree of similarity between a data point and its neighbors. Such a subclass constitutes a denser local area [3, 4].

In recent years, as a new clustering technique, the AP clustering algorithm proposed by Frey and Dueck in 2007 has relied mainly on message passing to obtain clusters effectively. During the clustering process, the AP algorithm first considers all data points as potential exemplars and recursively transmits real-valued messages along the edges of the network until a set of good centers is generated.

In order to speed up the convergence rate, several improved AP algorithms have been proposed. Liu and Fu used the constriction factor to regulate the damping factor dynamically and sped up the AP algorithm [5]. Gu et al. utilize the local distribution to add constraints, as in semisupervised clustering, to construct a sparse similarity matrix [6]. Wang et al. propose an adaptive AP method to overcome inherent imperfections of the algorithm [7]. In recent years the AP algorithm has been applied in many domains and effectively solves many practical problems. These domains include image retrieval [8], biomedicine [9], text mining [10], image categorization [11], facility location [12], image segmentation [13], and key frame extraction of video [14].

2.2. Related Work on the Nearest Neighbor Cluster Method

2.2.1. The Nearest Neighbor Distance. Let $X = \{x_1, \ldots, x_n\}$ be a data set containing $n$ data point objects, and let $C_1, \ldots, C_k$ ($k \ll n$) be the types of different density. For simplicity and without any loss of generality, we consider just one of the density types, $C_1$.

Suppose that all data points in $C_1$ are denoted as $p_i$, $p_i \in C_1$, $i = 1, \ldots, m$. The nearest distance (named $D_m$) of $p_i$ is defined as the distance between $p_i$ and its nearest neighbor. All $D_m$ form a distance matrix $D$. Because $C_1$ obeys a homogeneous Poisson distribution, the nearest distance $D_m$ also has an equivalent likelihood distribution with constant intensity $\lambda$. For a randomly chosen data point $p_i$, its nearest distance $D_m$ has the cumulative distribution function (cdf)

$$G_{D_m}(d) = P(D_m \le d) = 1 - \sum_{l=0}^{m-1} \frac{e^{-\lambda \pi d^2} (\lambda \pi d^2)^{l}}{l!}. \quad (1)$$

The formula is obtained by supposing a circle of radius $d$ centered at a data point: if $D_m$ is longer than $d$, there must be one of $0, 1, 2, \ldots, m-1$ data points in the circle. Thus the pdf of the nearest distance $D_m(d_m, \lambda)$ is the derivative of $G_{D_m}(d)$:

$$f_{D_m}(d_m; \lambda) = \frac{2 (\lambda \pi)^{m} d_m^{\,2m-1} e^{-\lambda \pi d_m^{2}}}{(m-1)!}. \quad (2)$$
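For concreteness, the following Python sketch (illustrative code, not taken from the paper; the function names are our own) computes each point's nearest-neighbor distance and evaluates the Poisson-based pdf of formula (2):

```python
# Illustrative sketch: nearest-neighbor distances and the pdf in Eq. (2).
import numpy as np
from math import factorial
from scipy.spatial import cKDTree

def nearest_distances(points):
    """Distance from every point to its nearest neighbor (D_m with m = 1)."""
    dists, _ = cKDTree(points).query(points, k=2)  # k=2: the point itself + its nearest neighbor
    return dists[:, 1]

def nn_pdf(d, lam, m=1):
    """f_{D_m}(d; lambda) = 2 (lambda*pi)^m d^(2m-1) exp(-lambda*pi*d^2) / (m-1)!."""
    return (2 * (lam * np.pi) ** m * d ** (2 * m - 1)
            * np.exp(-lam * np.pi * d ** 2) / factorial(m - 1))
```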

2.2.2. Probability Mixture Distribution of the Nearest Distance. Hereafter we discuss the situation of multiple data density types whose intensive data overlap in a local area. A mixture distribution can be modeled for such data points with different density types. As shown in Figure 1, there are two density types in the toy experiment: three clusters in one data density type and two clusters in another, as well as plenty of background noise data. According to the nearest distance matrix $D$, a histogram can be drawn to show the distributions of the different density types. In order to eliminate the disadvantage of the edge effect, we transform the data points into toroidal edge-corrected data; thus the points near the edges have the same mean value of the nearest distance as the data points in the inner area. A mixture density can be assumed to express the distribution for $k$ types of data point density:

$$D_m \sim \sum_{i=1}^{k} w_i f(d_m; \lambda_i) = \sum_{i=1}^{k} w_i \frac{2 (\lambda_i \pi)^{m} d_m^{\,2m-1} e^{-\lambda_i \pi d_m^{2}}}{(m-1)!}. \quad (3)$$

The weight of the $i$th density type is represented by the symbol $w_i$ ($w_i > 0$ and $\sum w_i = 1$). The symbol $m$ denotes the distance order, and $\lambda_i$ is the intensity of the $i$th type. For a given data point $p_i$ there is a one-to-one correspondence with its nearest distance $D_m$, and the data points can be grouped by splitting the mixture distribution curve; in other words, the number $k$ of density types and their parameters $(w_i, \lambda_i)$ can be determined.

Figure 5 shows the histogram of the nearest distances of the toy data set and the fitted mixture probability density function, where fb is defined as the expectation of the simulated parameter $\lambda_i$.
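As a rough illustration of how the parameters $(w_i, \lambda_i)$ in formula (3) might be estimated, the sketch below fits a two-type mixture (m = 1, k = 2) by maximum likelihood. The starting values and the two-component assumption are ours, not the authors' procedure, which infers the components from the histogram and its fitted curve:

```python
# Sketch (assumed approach, m = 1, k = 2): maximum-likelihood fit of the
# nearest-distance mixture in Eq. (3) for one dense and one sparse type.
import numpy as np
from scipy.optimize import minimize

def nn_pdf(d, lam):
    # Eq. (2) with m = 1: f(d) = 2*lambda*pi*d*exp(-lambda*pi*d^2)
    return 2 * lam * np.pi * d * np.exp(-lam * np.pi * d ** 2)

def neg_log_likelihood(params, d):
    w, lam1, lam2 = params
    mix = w * nn_pdf(d, lam1) + (1 - w) * nn_pdf(d, lam2)  # Eq. (3) with k = 2
    return -np.sum(np.log(mix + 1e-300))                   # guard against log(0)

def fit_two_type_mixture(d_nn):
    lam0 = 1.0 / (np.pi * np.mean(d_nn) ** 2)              # crude single-type intensity
    x0 = [0.5, 5.0 * lam0, 0.2 * lam0]                     # start: one dense, one sparse type
    bounds = [(1e-3, 1 - 1e-3), (1e-9, None), (1e-9, None)]
    res = minimize(neg_log_likelihood, x0, args=(d_nn,), bounds=bounds)
    return res.x                                           # (w, lambda_dense, lambda_sparse)
```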

3. Some Concepts Relating to AP Clustering Algorithm

3.1. Some Terminologies on AP Clustering. In partitioning clustering, most techniques require the users to identify the candidate exemplars in advance (e.g., k-centers and k-means clustering). However, the affinity propagation algorithm proposed by Frey and Dueck simultaneously considers all data points as candidate exemplars. In the AP clustering algorithm there are two important concepts, the responsibility R(i, k) and the availability A(i, k), which represent two messages indicating how well-suited a data point is to be a potential exemplar. The sum of the values of R(i, k) and A(i, k) is the basis for evaluating whether the corresponding data point can be a candidate exemplar or not. Once a data point is chosen to be a candidate exemplar, the other data points nearest to it are assigned to its cluster.


Figure 5: Histogram of the nearest distances (distance versus probability), with the fitted curves for fb = 10, 1000, and 20000.

Figure 6: Message passing (responsibility and availability) between data point i and data point k.

The detailed meanings of the terminologies are as follows.

Exemplar. The center of a cluster, which is an actual data point.

Similarity. The similarity value between two data points $x_i$ and $x_j$ ($i \ne j$) is usually assigned the negative squared Euclidean distance, $S(i, j) = -\|x_i - x_j\|^2$.

Preference. This parameter indicates the preference that data point i be chosen as an exemplar; it is usually set to the median of all similarities. In other words, the preference reflects the user's choice of the initial "self-similarity" values.

Responsibility. R(i, k) is an accumulated value reflecting how well point k is suited to be the candidate exemplar of data point i; it is sent from the latter to the former and expresses that, compared to other potential exemplars, point k is the best exemplar for point i.

Availability. A(i, k) is the counterpart of R(i, k) and reflects how appropriate it is for point i to choose point k as its exemplar. This accumulated message, sent from the candidate exemplar point k to data point i, tells point i that point k is more qualified as an exemplar than others.

The relationship between R(i, k) and A(i, k) is illustrated in Figure 6.
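To make the similarity and preference definitions concrete, here is a small sketch (illustrative, not the authors' code) that builds S(i, j) as the negative squared Euclidean distance and places the median similarity on the diagonal as the preference:

```python
# Sketch: similarity matrix with negative squared Euclidean distance and
# the preference (self-similarity) set to the median of all similarities.
import numpy as np

def similarity_matrix(X):
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)            # S(i, j) = -||x_i - x_j||^2
    off_diag = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))   # preference on the diagonal
    return S
```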

3.2. The Algorithm of AP Clustering. There are four main steps in AP.

Step 1 (initializing the parameters). Consider

$$R(i, k) = 0, \quad A(i, k) = 0, \quad \forall i, k. \quad (4)$$

Step 2 (updating the responsibility). Consider

$$R(i, k) = S(i, k) - \max_{j \in \{1, 2, \ldots, N\},\, j \ne k} \{A(i, j) + S(i, j)\}. \quad (5)$$

Step 3 (updating the availability). Consider

$$A(i, k) = \min\Bigl\{0,\; R(k, k) + \sum_{j \in \{1, 2, \ldots, N\},\, j \ne i,\, j \ne k} \max\bigl(0, R(j, k)\bigr)\Bigr\},$$
$$A(k, k) = P(k) - \max_{j \in \{1, 2, \ldots, N\},\, j \ne k} \{A(k, j) + S(k, j)\}. \quad (6)$$

In order to speed up the convergence rate, a damping coefficient lam is added to the iterative process:

$$R_{t+1}(i, k) = \text{lam} \cdot R_{t}(i, k) + (1 - \text{lam}) \cdot R_{t+1}^{\text{new}}(i, k),$$
$$A_{t+1}(i, k) = \text{lam} \cdot A_{t}(i, k) + (1 - \text{lam}) \cdot A_{t+1}^{\text{new}}(i, k), \quad (7)$$

where t indexes the iteration and the superscript "new" denotes the value just computed by (5) and (6).

Step 4 (assigning each point to the corresponding cluster). Consider

$$c_{i}^{*} \longleftarrow \arg\max_{k} \{R(i, k) + A(i, k)\}. \quad (8)$$

The program flow diagram is shown in Figure 7
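The four steps can also be summarized in a compact NumPy sketch. It follows the standard Frey–Dueck vectorized update rules with damping (in particular, the self-availability A(k, k) is computed as the sum of positive incoming responsibilities, as in common implementations) rather than reproducing the authors' own code, so treat it as illustrative:

```python
# Illustrative sketch of AP message passing (Eqs. (4)-(8) with damping).
import numpy as np

def affinity_propagation(S, lam=0.9, max_iter=200):
    N = S.shape[0]
    R = np.zeros((N, N))   # responsibilities, Step 1
    A = np.zeros((N, N))   # availabilities,   Step 1
    for _ in range(max_iter):
        # Step 2: R(i,k) = S(i,k) - max_{j != k} {A(i,j) + S(i,j)}
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second_max
        R = lam * R + (1 - lam) * R_new          # damping, Eq. (7)

        # Step 3: A(i,k) = min{0, R(k,k) + sum_{j not in {i,k}} max(0, R(j,k))}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())       # keep R(k,k) itself in the column sums
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new.diagonal().copy()             # self-availability before clipping
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = lam * A + (1 - lam) * A_new          # damping, Eq. (7)

    # Step 4: c_i = argmax_k {R(i,k) + A(i,k)}
    return np.argmax(R + A, axis=1)
```

With the similarity matrix of Section 3.1, `labels = affinity_propagation(similarity_matrix(X))` returns, for every point, the index of its exemplar; points sharing an exemplar form one cluster.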

4. The Extended AP Clustering Algorithm and Experiments

4.1. The Extended AP Clustering Algorithm. Based on the method of splitting the nearest distance frequency distribution by detecting the knee in the curve, we propose a semisupervised AP clustering algorithm for recognizing clusters with different intensities in a data set, as follows (a minimal end-to-end sketch is given after the list).

(1) Calculate every data point's nearest distance ($D_m$).

(2) Draw the frequency distribution (FD) curve of the nearest distance matrix ($D$).

(3) Analyze the FD curve and identify the partition values to obtain the number of density types of the whole data set.

(4) Run the AP clustering algorithm within each subdataset with a different data density type and extract the final clusters.
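A minimal end-to-end sketch of steps (1)–(4) might look as follows; the density threshold read off the FD curve (`threshold`) is supplied by the user here, and scikit-learn's AffinityPropagation stands in for the AP step, so this is an assumed implementation rather than the authors' code:

```python
# Sketch of the extended AP pipeline, assuming a user-supplied knee value.
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import AffinityPropagation

def extended_ap(X, threshold):
    # (1) nearest distance of every point
    d_nn = cKDTree(X).query(X, k=2)[0][:, 1]
    # (2)-(3) split the data set into two density types at the knee value
    dense, sparse = X[d_nn <= threshold], X[d_nn > threshold]
    # (4) run AP separately inside each density type
    labels = {}
    for name, sub in (("dense", dense), ("sparse", sparse)):
        if len(sub) > 1:
            labels[name] = AffinityPropagation(damping=0.9).fit_predict(sub)
    return labels
```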


Figure 7: The flow diagram of the extended AP algorithm (similarity matrix → initializing message matrices → computing message matrices → updating message matrices → iteration over → searching clusters by the sum of message matrices → end).

Figure 8: The simulated data set used in Experiment 1.

4.2. Experiments and Analysis

4.2.1. Experiment 1. In Experiment 1 we use a simulated data set with three Poisson distributions. There are 1524 data points distributed at five intensities in a 1000 × 1000 rectangle, as shown in Figure 8. The most intensive data type includes three clusters (i.e., the blue, green, and Cambridge blue clusters), the medium density data type includes two clusters (i.e., the red and purple clusters), and the low density is the noise, which is dispersed over the whole rectangle. We first applied our semisupervised AP clustering algorithm to this data set and compared the result with the most familiar cluster method, OPTICS.

According to the steps in Section 4.1, we calculate the nearest distances and then draw the FD curve of the distances in descending order, as shown in Figure 9(a), which indicates 3 processes (density types) and 5 clusters existing in the simulated data set.
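The descending FD curve of Figure 9(a) (distance versus point order) can be reproduced with a few lines; this is an assumed plotting approach, analogous to a k-dist plot:

```python
# Sketch: nearest distances sorted in descending order ("FD curve").
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree

def plot_fd_curve(X):
    d_nn = cKDTree(X).query(X, k=2)[0][:, 1]   # nearest-neighbor distance
    plt.plot(np.sort(d_nn)[::-1])
    plt.xlabel("Point order")
    plt.ylabel("Distance")
    plt.show()
```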

As a comparison, we run OPTICS and obtain its reachability-plot (Figure 9(b)).

After running the second step (the AP clustering program), the clusters are obtained, which is clearly consistent with the expected result. Figure 10 illustrates the clustering result.

4.2.2. Experiment 2. In Experiment 2, a real seismic data set is used to evaluate our algorithm. The seismic data are selected from the Earthquake Catalogue in West China (1970–1975, M ≥ 1) and the Earthquake Catalogue in West China (1976–1979, M ≥ 1) [15, 16] (M represents the magnitude on the Richter scale). The spatial area covers 100° to 107° E and 27° to 34° N. We selected the M ≥ 2 records during 1975 and 1976.

In an ideal clustering experiment, the foreshocks and aftershocks of strong earthquakes can be clustered clearly. The former can show the locations of strong earthquakes, and the latter can help in understanding the mechanism of the earthquakes. Due to the interference of the background earthquakes, it is difficult to discover the earthquake clusters. Furthermore, the earthquake records include only the coordinates of the epicenter of each earthquake. In this situation, the key work is to separate the earthquake clusters from the background earthquakes.


Figure 9: The different data density types via the FD curve (a) and the reachability-plot (b) (point order versus distance).

Figure 10: The result of Experiment 1.

Figure 11: (a) The posterior probability of k; (b) the histogram of the nearest distances and the fitted curve.

In this experiment we adopt OPTICS as the comparison, which provides a reachability-plot and shows the estimation of the thresholds.

When our AP clustering algorithm had converged, the posterior probabilities of k were obtained, as shown in Figure 11(a). It shows that the maximal posterior probability occurs at k = 2. The histogram of the nearest distances and its fitted probability distribution function are displayed in Figure 11(b).

The experiment result is shown in Figure 12. The seismic data are grouped into three clusters: red, green, and blue. Then we apply the OPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold for the classification, which is shown in Figure 13. There are four grouped clusters when Eps1* = 3.5 × 10^4 (m), as shown in Figure 14. This result differs from our method in two aspects: first, a brown cluster appears; second, the red, green, and blue clusters held in common by the two algorithms contain more earthquakes in OPTICS than in our algorithm.

Further, we analyze the results of the two experiments in detail. It is obvious that the clusters discovered by our algorithm are more accurate than those by OPTICS. The former clusters can indicate the locations of forthcoming strong earthquakes and their aftershocks.

According to Figure 12, three earthquake clusters are discovered by the algorithm proposed in this paper. The red cluster indicates the foreshocks of the Songpan earthquake, which occurred in Songpan County at 32°42′ N, 104°06′ E on August 16, 1976, with a magnitude of 7.2 on the Richter scale. The blue cluster corresponds to the aftershocks of the Kangding-Jiulong event (M = 6.2), which occurred on January 15, 1975 (29°26′ N, 101°48′ E).


Figure 12: Seismic anomaly detection by our method (background quakes; clustered quakes A, B, C; main quakes; longitude 100–107° E, latitude 27–34° N).

Figure 13: The reachability-plot of the regional seismic data.

The green cluster corresponds to the aftershocks of the Daguan event (M = 7.1), which hit at 28°06′ N, 104°00′ E on May 11, 1974. These discovered earthquakes are also detected in the work of Pei et al., with the same location and size. The difference between our method and Pei's is that the numbers of earthquakes in the clusters are slightly underestimated in Figure 12. The reason is that the border points are not treated in our method, while in Pei et al.'s they are treated [17]. Some clusters are obtained at the border in the work of Pei et al. [18], but we can confirm that those are background earthquakes.

Finally, we compare our results with the output of OPTICS. After carefully analyzing the seismic records, it can be found that the brown earthquake cluster is a false positive and the others are overestimated by the OPTICS algorithm: the blue and green clusters cover more background earthquakes.

5. Conclusion and Future Work

In the same data set, clusters and noise with different density types usually coexist. Current AP clustering methods cannot effectively handle the data preprocessing needed to separate meaningful data groups from noise. In particular, when several data density types overlap in a given area, few existing grouping methods can identify the number of data density types in an objective and accurate way. In this paper we propose an extended AP clustering algorithm to deal with such a complex situation, in which the data set has several overlapping data density types.

Figure 14: Seismic anomaly detection by OPTICS (background quakes; clustered quakes A, B, C, D; main quakes).

The advantage of our algorithm is that it can determine the number of data density types and cluster the data set into groups accurately. Experiments show that our semisupervised AP clustering algorithm outperforms the traditional clustering algorithm OPTICS.

One limitation of our method is that the identification of the density types depends on the users. Future research will focus on finding a more efficient method to identify the splitting value automatically. Another direction is to develop new AP versions for spatiotemporal data sets, as in the research by Zhao [19, 20] and Meng et al. [21].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant no. 61272029), by the Bureau of Statistics of Shandong Province (Grant no. KT13013), and partially by the MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University.

References

[1] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972–976, 2007.

[2] J. W. Han, M. Kamber, and A. K. H. Tung, "Spatial clustering methods in data mining," in Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. W. Han, Eds., pp. 188–217, Taylor & Francis, London, UK, 2001.

[3] S. Roy and D. K. Bhattacharyya, "An approach to find embedded clusters using density based techniques," in Distributed Computing and Internet Technology, vol. 3816 of Lecture Notes in Computer Science, pp. 523–535, Springer, Berlin, Germany, 2005.

[4] M. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.

[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.

[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.

[7] K. J. Wang, J. Li, J. Y. Zhang, et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.

[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624–627, 2010.

[9] D. Dueck, B. J. Frey, N. Jojic, et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.

[10] R. C. Guan, Z. H. Pei, X. H. Shi, et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.

[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.

[12] D. M. Tang, Q. X. Zhu, F. Yang, et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.

[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.

[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.

[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).

[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).

[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.

[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.

[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.

[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.

[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.


Page 2: Research Article An Extended Affinity Propagation ...downloads.hindawi.com/journals/cin/2015/828057.pdf · other density types. Firstly, we transform data point set into a distance

2 Computational Intelligence and Neuroscience

0 2 4 6 8 10

1

2

3

4

5

6

7

8

9

C2

C5

C3 C4

C1

Figure 1 Data set

nearest neighbor distance it can identify the number ofthe different density types in the whole data set (2) it canpartition the data set into clusters more effectively than theOPTICS and AP clustering algorithm itself

For the sake of simplicity we design a simple toy data setto explain the context of our method as shown in Figure 1In this data set there are some unknown data density typesand each of them may be nested to other clusters withother density types Firstly we transform data point set intoa distance matrix and recognize the point density typesaccording to the nearest neighbor distance matrix Then weuse AP clustering algorithm to classify the data points intoseveral clusters The whole clustering process consists of twosteps (as shown in Figure 4) The first step is to identifydividing value of data density and the second step is toestablish the mechanism to form clusters

This paper is organized as follows in Section 2 somerelated works on AP clustering algorithm and some conceptsare briefly introduced in Section 3 the new semisupervisedAP clustering algorithm is illustrated in detail In Section 4the proposed algorithm is applied to cluster some data setsboth simulated and real data and finally some conclusionsare summarized in Section 5

2 Related Work

21 Related Work on AP Cluster Methods The goal of clusteranalysis is to group similar data into some meaningful sub-classes (or named clusters) There are three main categoriesin the clustering methods partitioning hierarchical anddensity-based

The first clustering category tries to obtain a rationalpartition of the data set that the data in the same cluster aremore similar to each other than to the data in other clustersUsually the partitioning criterion takes the minimum sumof the Euclidian distance K-means and K-medoids are theclassical partitioning cluster methods

The second clustering category works by grouping dataobjects into a dendrogram tree of clusterThis category can befurther classified into eithermerging (bottom-up) or splitting

0 2 4 6 8 101

2

3

4

5

6

7

8

9

10

Figure 2 The clusters by AP

0 2 4 6 8 101

2

3

4

5

6

7

8

9

10

0 2 4 6 8 101

2

3

4

5

6

7

8

9

C3

C5

C4

C1 C2

Figure 3 The ideal clusters

Data point set

The data density types

Clusters

Detecting the thresholds of different data density types

Clustering by AP algorithm

Figure 4 The two-step process for discovering clusters withdifferent densities

Computational Intelligence and Neuroscience 3

(top-down)The former one begins by placing each data pointin its own cluster and then merges these atomic clusters intolarger and larger clusters until certain termination conditionsare satisfied The latter one does reverse to the merging oneby starting with all data points as a whole cluster and thensubdivides this cluster into smaller and smaller pieces until itsatisfies certain termination conditions (Han et al 2001) [2]

Differing from the above clusteringmethods the density-based clustering algorithms can discover intensive subclassesbased on the similar degree between a data point and itsneighbors Such subclass constitutes more dense local area[3 4]

In recent years as a new clustering technique AP cluster-ing algorithm proposed by Frey and Dueck in 2007 mainlydepends on message passing to obtain clusters effectivelyDuring the clustering process AP algorithm firstly considersall data points as potential exemplars and recursively trans-mits real-valued messages along the edges of the networkuntil a set of good centers are generated

In order to speed up convergence rate some improvedAP algorithms have been proposed Liu and Fu used theconstriction factor to regulate damping factor dynamicallyand sped up the AP algorithm [5] Gu et al utilize localdistribution to add constraint like semisupervised clusteringto construct sparse similarity matrix [6] Wang et al proposean adaptive AP method to overcome inherent imperfectionsof the algorithm [7] In the recent years AP algorithm hasbeen applied in many domains and it effectively solvesmany practical problems These domains are included inimage retrieval [8] biomedicine [9] text mining [10] imagecategorization [11] facility location [12] image segmentation[13] and key frame extraction of video [14]

22 Related Work on the Nearest Neighbor Cluster Method

221 The Nearest Neighbor Distance Let 119883 = 1199091sdot sdot sdot 119909119899

be a data set containing 119899 data point objects and let1198621 119862

119896 119896 ≪ 119899 be the types of different density For

simplicity and without any loss of generality we consider justone of the density types C

1

Suppose that all data points in the C1are denoted as

119901119894 119901119894isin 1198621 119894 = 1 119898 The nearest distance (named

Dm) of pi is defined as the distance between pi and its nearestneighbor AllDm forms a distancematrixD BecauseC

1obeys

a homogenous Poisson distribution the nearest distancematrixDm also has an equivalent likelihood distribution withconstant intensity (120582) For a randomly chosen data point piits nearest distance array Dm has a cumulative distributionfunction (cdf) as shown in the following formula

119866119863119898

(119889) = 119875 (119863119898ge 119889) = 1 minus

119898minus1

sum

119897=0

119890minus120582120587119889

2

(1205821205871198892)

119897 (1)

The formula is gained by supposing a circle of radius 119889centered at a data point More detailed if Dm is longer thand there must be one of 0 1 2 119898 minus 1 data points in the

circle Thus the pdf of the nearest distance119863119898(119889119898 120582) is the

derivative of 119866119863119898

(d) as shown in the following formula

119891119883119898(119909119898 120582) =

119890minus120582120587119909

2

2(120582120587)1198981199092119898minus1

(119898 minus 1) (2)

222 ProbabilityMixture Distribution of the Nearest DistanceHereafter we will discuss the situation of multiple datadensity types with various intensive data overlap in a localarea A mixture distribution can be modeled for such datapoints with different density types As shown in Figure 1there are two density types in the toy experiment threeclusters in one data density type and two clusters in anotherdata density type as well as plenty of background noise dataAccording to the nearest distance matrix D a histogram canbe drawn to show the different distribution of the differentdensity types In order to eliminate the disadvantage of theedge effect we transform the data point into toroidal edge-corrected data thus the points near the edges will have thesame mean value of the nearest distance with the data pointsin the inner area Amixture density can be assumed to expressthe distribution for 119896 types of data point density the equationis shown as follows

119863119898sim

119896

sum

119894=1

119908119894119891 (119889119898 120582

119894) =

119896

sum

119894=1

119908119894

119890minus1205821198941205871199092

2(120582119894120587)1198981198892119898minus1

(119898 minus 1) (3)

The weight of the 119894th density type is represented by thesymbol 119908

119894(119908119894gt 0 and sum119908

119894= 1) The symbol 119898 means

the distance order and 120582119894is the 119894th intensity type To a given

data point 119901119894there is a one-to-one counterpoint of its nearest

distance Dm and data points can be grouped by splittingthe mixture distribution curve in other words the number(k) of density types and their parameters (119908

119894 120582119894) can be

determinedFigure 5 shows the histogram of the nearest distance of

the toy data set and the fitted mixture probability densityfunction where fb is defined as the expectation of theparameter 120582

119894which is simulated

3 Some Concepts Relating to APClustering Algorithm

31 Some Terminologies on AP Clustering In the parti-tioning clustering algorithm most techniques identify thecandidate exemplars in advance by the users (eg k-centersand k-means clustering) However the affinity propagationalgorithm proposed by Frey and Dueck simultaneouslyconsiders all data points as candidate exemplars In theAP clustering algorithm there are two important conceptsthe responsibility (119877(119894 119896)) and availability (119860(119894 119896)) whichrepresent two messages indicating how well-suited a datapoint is to be a potential exemplar The sum of the valuesof 119877(119894 119896) and 119860(119894 119896) is the evaluation basis whether thecorresponding data point can be a candidate exemplar or notOnce a data point is chosen to be a candidate exemplar thoseother data points with more nearer distance will be assigned

4 Computational Intelligence and Neuroscience

0 20 40 60 80 100 1200

001002003004005006007008009

01

Distance

Prob

abili

ty

HistogramBeta updatedfb = 20000

fb = 1000

fb = 10

Figure 5 Histogram of the nearest distances

Data point kResponsibility

AvailabilityData point i

Figure 6 Message passing between data points

to this cluster The detailed meaning of the terminologies isas follows

ExemplarThe center of a cluster is an actual data point

Similarity The similarity value between two data points 119909119894

and 119909119895(119894 = 119895) is usually assigned the negative Euclidean

distance such as 119878(119894 119895) = minus119909119894minus 1199091198952

Preference This parameter is set to indicate the preferencethat the data point 119894 can be chosen as an exemplar whichusually is set by median(s) of all distances In other wordspreference is the willingness of the user to select the initialldquoself-similarityrdquo values

Responsibility 119877(119894 119896) is an accumulated value which reflectshow well the point 119896 is suited to be the candidate exemplar ofdata point i and then sends from the latter to the former thatis compared to other potential exemplars the point 119896 is thebest exemplar

Availability119860(119894 119896) is opposed to119877(119894 119896) and reflects howwell-suited it is for the point 119894 to choose point 119896 as its exemplarBased on the candidate exemplar point k the accumulatedmessage sent to the data point 119894 tells it that point 119896 is morequalified as an exemplar than others

The relationship between 119877(119894 119896) and 119860(119894 119896) is illustratedin Figure 6

32TheAlgorithmofAPClustering There are fourmain stepsin AP

Step 1 (initializing the parameters) Consider

119877 (119894 119896) = 0 119860 (119894 119895) = 0 forall119894 119896 (4)

Step 2 (updating responsibility) Consider

119877 (119894 119896) = 119878 (119894 119896) minusmax 119860 (119894 119895) + 119878 (119894 119895)

(119895 isin 1 2 119873 119895 = 119896)

(5)

Step 3 (updating availability) Consider

119860 (119894 119896) = min

0 119877 (119896 119896) +sum

119895

max (0 119877 (119895 119896))

(119895 isin 1 2 119873 119895 = 119894 119895 = 119896)

119860 (119896 119896) = 119875 (119896) minusmax 119860 (119896 119895) + 119878 (119896 119895)

(119895 isin 1 2 119873 119895 = 119896)

(6)

In order to speed up convergence rate the damping coeffi-cient is added to iterative process

119877119894+1(119894 119896) = lam lowast 119877

119894(119894 119896) + (1 minus lam) lowast 119877old

119894+1(119894 119896)

119860119894+1(119894 119896) = lam lowast 119860

119894(119894 119896) + (1 minus lam) lowast 119860old

119894+1(119894 119896)

(7)

Step 4 (assigning to corresponding cluster) Consider

119888lowast

119894larr997888 arg max

119896

119877 (119894 119896) + 119860 (119894 119895) (8)

The program flow diagram is shown in Figure 7

4 The Extended AP Clustering Algorithm andExperiments

41 The Extended AP Clustering Algorithm Based on themethod of splitting the nearest distance frequency distri-bution by detecting the knee in the curve we propose asemisupervised AP clustering algorithm for recognizing theclusters with different intensity in a data set as follows

(1) Calculate every data pointrsquos nearest distance (119863119898)

(2) Draw the frequency distribution (FD) curve of thenearest distance matrix (119863)

(3) Analyze the FDcurve and identify the partition valuesfor obtaining the number of the density types of thewhole data set

(4) Run the AP clustering algorithm within each sub-dataset with different data density types and extractthe final clusters

Computational Intelligence and Neuroscience 5

Initializingmessage matrices

Computingmessage matrices

Updatingmessage matrices

Iteration over

End

Searching clusters by the sum of message matrices

Similaritymatrix

Figure 7 The flow diagram of extended AP algorithm

0 100 200 300 400 500 600 700 800 900 10000

100200300400500600700800900

1000

Figure 8 The simulated data set used in Experiment 1

42 Experiments and Analysis

421 Experiment 1 In Experiment 1 we use a simulateddata set with three Poisson distributions There are 1524 datapoints distributed at five intensities in a 1000times 1000 rectangleas shown in Figure 8The most intensive data type includes 3clusters (ie the blue green and Cambridge blue cluster) themedium density data type includes two clusters (ie the redand purple cluster) and the low density is the noise whichis dispersed in the whole rectangle We firstly applied oursemisupervised AP clustering algorithm to this dataset andcompared the result with the most familiar cluster methodOPTICS

According to implementing steps in Section 41 we cal-culate the nearest distance and then draw the FD curve of thedistance descending as shown in Figure 9(a) which indicates3 processes and 5 clusters existing in a simulated data set

As a comparison we run the OPTICS and get thereachability-plot of OPTICS (Figure 9(b))

After running the second step (AP clustering program)the clusters are obtained which is clearly consistent with theexpected results Figure 10 illustrates the clustering result

422 Experiment 2 In Experiment 2 a real seismic data set isused to evaluate our algorithm The seismic data are selectedfrom Earthquake Catalogue in West China (1970ndash1975 119872)and EarthquakeCatalogue inWest China (1976ndash1979119872 ⩾ 1)[15 16] (119872 represents the magnitude on the Richter scale)The space area covers from 100∘ to 107∘ E and from 27∘ to34∘N We selected the119872 ⩾ 2 records during 1975 and 1976

In an ideal clustering experiment the strong earthquakesforeshocks and aftershock can be clustered clearly Theformer can show the locations of strong earthquakes andthe latter can supply help to understand the mechanism ofthe earthquakes Due to the interference of the backgroundearthquakes it is difficult to discover the earthquake clustersFurthermore the earthquake records include only the coordi-nates of the epicenter of each earthquake In this situation thekey work is to discover the locations of the earthquakes from

6 Computational Intelligence and Neuroscience

0 200 400 600 800 1000 1200 1400 1600 18000

20406080

100

Point order

Dist

ance

(a)

0 200 400 600 800 1000 1200 1400 1600 18000

1020304050

Point order

Dist

ance

(b)

Figure 9 The different data density types via FD curve and reachability-plot

Figure 10 The result of Experiment 1

0 1 2 3 4 5 6 7 8 9 100

020406

k

Prob

abili

ty

(a)

0 2 4 6 8 10 12 140

020406

Distance

Prob

abili

ty

(b)

Figure 11 (a) The posterior probability of 119896 (b) the histogram of the nearest distance and the fitted curve

the background earthquakes In this experiment we adopt theOPTICS as the comparison which can provide a reachability-plot and can show the estimation of the thresholds

When our AP clustering algorithm had converged theposterior probabilities of 119896 are obtained as shown inFigure 11(a) It shows that the maximal posterior probabilityexists at k = 2 Also the histogram of the nearest distanceand its fitted probability distribution function are displayedin Figure 11(b)

The experiment result is shown in Figure 12 The seismicdata are grouped into three clusters red green and blueThenwe apply theOPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold forthe classification which is shown in Figure 13 There are fourgrouped clusters when Eps

1

lowast = 35 times 104 (m) as shownin Figure 14 This result has two different aspects from our

method firstly there appears a brown cluster secondly thered green and blue clusters are held in common by twoalgorithms that contain more earthquakes in OPTICS thanin our algorithm

Further we analyze the results of two experiments indetail It is obvious that the clusters discovered by ouralgorithm are more accurate than those by OPTICS Theformer clusters can indicate the location of forthcomingstrong earthquakes and the following ones

According to Figure 12 three earthquake clusters arediscovered by the algorithm proposed in this paper Thered cluster indicates the foreshocks of Songpan earthquakewhich occurred in Songpan County at 32∘421015840N 104∘061015840 Eon August 16 1976 and its earthquake intensity is 72 onthe Richter scale The blue cluster means the aftershocksof Kangding-Jiulong event (119872 = 62) which occurred on

Computational Intelligence and Neuroscience 7

100 101 102 103 104 105 106 1072728293031323334

Background quakesClustered quakes AClustered quakes B

Clustered quakes CMain quakes

Figure 12 Seismic anomaly detection by our method

0 50 100 150 200 250 3000123456789

10times10

4

Figure 13 The reachability-plot of the regional seismic data

January 15 1975 (29∘261015840N 101∘481015840E)The green cluster meansthe aftershocks of the Daguan event (119872 = 71) whichhit at 28∘061015840N 104∘001015840E on May 11 1974 Those discoveredearthquakes are also detected in the work of Pei et al with thesame location and size The difference between our methodand Peirsquos is that the numbers of the earthquake clusters areslightly underestimated in Figure 12 The reason is that theborder points are not treated in our method while in Pei etalrsquos they are treated [17] Some clusters are obtained in theborder in the work from Pei et al [18] but we can confirmthat those are background earthquakes

Finally we compare our results with the outputs byOPTICS After having carefully analyzed the seismic recordsit can be found that the brown earthquake cluster is a falsepositive and the others are overestimated by the OPTICSalgorithm the blue and green clusters cover more back-ground earthquakes

5 Conclusion and Future Work

In the same data set clusters and noise with different densitytypes usually coexist Current AP clustering methods cannoteffectively deal with the data preprocessing for separatingmeaningful data groups from noise In particular when thereare several data density types overlapping in a given areathere are more than a few existing grouping methods thatcan identify the numbers of the data density types in anobjective and accurate way In this paper we propose an

100 101 102 103 104 105 106 1072728293031323334

Background quakesClustered quakes AClustered quakes B

Clustered quakes CClustered quakes DMain quakes

Figure 14 Seismic anomaly detection by OPTICS

extended AP clustering algorithm to deal with such complexsituation in which the data set has several overlapping datadensity types The advantage of our algorithm is that it candetermine the number of data density types and clusterthe data set into groups accurately Experiments show thatour semisupervised AP clustering algorithm outperforms thetraditional clustering algorithm OPTICS

One limitation of our method is that the identificationof the density types depends on the users The followingresearch will be focused on finding a more efficient methodto identify splitting value automaticallyThe other aspect is todevelop newAP versions to deal with the spatiotemporal datasets as in the researches by Zhao [19 20] andMeng et al [21]

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (Grant no 61272029) by Bureau ofStatistics of Shandong Province (Grant no KT13013) andpartially by the MOE Key Laboratory for TransportationComplex Systems Theory and Technology School of Trafficand Transportation Beijing Jiaotong University

References

[1] B J Frey and D Dueck ldquoClustering by passing messagesbetween data pointsrdquo Science vol 315 no 5814 pp 972ndash9762007

[2] J W Han M Kamber and A K H Tung ldquoSpatial clusteringmethods in data miningrdquo in Geographic Data Mining andKnowledge Discovery H J Miller and J W Han Eds pp 188ndash217 Taylor amp Francis London UK 2001

[3] S Roy and D K Bhattacharyya ldquoAn approach to find embed-ded clusters using density based techniquesrdquo in DistributedComputing and Internet Technology vol 3816 of Lecture Notesin Computer Science pp 523ndash535 Springer Berlin Germany2005


[4] M. D. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.

[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.

[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.

[7] K. J. Wang, J. Li, J. Y. Zhang et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.

[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities, Natural Science, vol. 36, no. 4, pp. 624–627, 2010.

[9] D. Dueck, B. J. Frey, N. Jojic et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.

[10] R. C. Guan, Z. H. Pei, X. H. Shi et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.

[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.

[12] D. M. Tang, Q. X. Zhu, F. Yang et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.

[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.

[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.

[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).

[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).

[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.

[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.

[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.

[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.

[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.

Page 5: Research Article An Extended Affinity Propagation ...downloads.hindawi.com/journals/cin/2015/828057.pdf · other density types. Firstly, we transform data point set into a distance

Computational Intelligence and Neuroscience 5

Initializingmessage matrices

Computingmessage matrices

Updatingmessage matrices

Iteration over

End

Searching clusters by the sum of message matrices

Similaritymatrix

Figure 7 The flow diagram of extended AP algorithm

0 100 200 300 400 500 600 700 800 900 10000

100200300400500600700800900

1000

Figure 8 The simulated data set used in Experiment 1

42 Experiments and Analysis

421 Experiment 1 In Experiment 1 we use a simulateddata set with three Poisson distributions There are 1524 datapoints distributed at five intensities in a 1000times 1000 rectangleas shown in Figure 8The most intensive data type includes 3clusters (ie the blue green and Cambridge blue cluster) themedium density data type includes two clusters (ie the redand purple cluster) and the low density is the noise whichis dispersed in the whole rectangle We firstly applied oursemisupervised AP clustering algorithm to this dataset andcompared the result with the most familiar cluster methodOPTICS

According to implementing steps in Section 41 we cal-culate the nearest distance and then draw the FD curve of thedistance descending as shown in Figure 9(a) which indicates3 processes and 5 clusters existing in a simulated data set

As a comparison we run the OPTICS and get thereachability-plot of OPTICS (Figure 9(b))

After running the second step (AP clustering program)the clusters are obtained which is clearly consistent with theexpected results Figure 10 illustrates the clustering result

422 Experiment 2 In Experiment 2 a real seismic data set isused to evaluate our algorithm The seismic data are selectedfrom Earthquake Catalogue in West China (1970ndash1975 119872)and EarthquakeCatalogue inWest China (1976ndash1979119872 ⩾ 1)[15 16] (119872 represents the magnitude on the Richter scale)The space area covers from 100∘ to 107∘ E and from 27∘ to34∘N We selected the119872 ⩾ 2 records during 1975 and 1976

In an ideal clustering experiment the strong earthquakesforeshocks and aftershock can be clustered clearly Theformer can show the locations of strong earthquakes andthe latter can supply help to understand the mechanism ofthe earthquakes Due to the interference of the backgroundearthquakes it is difficult to discover the earthquake clustersFurthermore the earthquake records include only the coordi-nates of the epicenter of each earthquake In this situation thekey work is to discover the locations of the earthquakes from

6 Computational Intelligence and Neuroscience

0 200 400 600 800 1000 1200 1400 1600 18000

20406080

100

Point order

Dist

ance

(a)

0 200 400 600 800 1000 1200 1400 1600 18000

1020304050

Point order

Dist

ance

(b)

Figure 9 The different data density types via FD curve and reachability-plot

Figure 10 The result of Experiment 1

0 1 2 3 4 5 6 7 8 9 100

020406

k

Prob

abili

ty

(a)

0 2 4 6 8 10 12 140

020406

Distance

Prob

abili

ty

(b)

Figure 11 (a) The posterior probability of 119896 (b) the histogram of the nearest distance and the fitted curve

the background earthquakes In this experiment we adopt theOPTICS as the comparison which can provide a reachability-plot and can show the estimation of the thresholds

When our AP clustering algorithm had converged theposterior probabilities of 119896 are obtained as shown inFigure 11(a) It shows that the maximal posterior probabilityexists at k = 2 Also the histogram of the nearest distanceand its fitted probability distribution function are displayedin Figure 11(b)

The experiment result is shown in Figure 12 The seismicdata are grouped into three clusters red green and blueThenwe apply theOPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold forthe classification which is shown in Figure 13 There are fourgrouped clusters when Eps

1

lowast = 35 times 104 (m) as shownin Figure 14 This result has two different aspects from our

method firstly there appears a brown cluster secondly thered green and blue clusters are held in common by twoalgorithms that contain more earthquakes in OPTICS thanin our algorithm

Further we analyze the results of two experiments indetail It is obvious that the clusters discovered by ouralgorithm are more accurate than those by OPTICS Theformer clusters can indicate the location of forthcomingstrong earthquakes and the following ones

According to Figure 12 three earthquake clusters arediscovered by the algorithm proposed in this paper Thered cluster indicates the foreshocks of Songpan earthquakewhich occurred in Songpan County at 32∘421015840N 104∘061015840 Eon August 16 1976 and its earthquake intensity is 72 onthe Richter scale The blue cluster means the aftershocksof Kangding-Jiulong event (119872 = 62) which occurred on

Computational Intelligence and Neuroscience 7

100 101 102 103 104 105 106 1072728293031323334

Background quakesClustered quakes AClustered quakes B

Clustered quakes CMain quakes

Figure 12 Seismic anomaly detection by our method

0 50 100 150 200 250 3000123456789

10times10

4

Figure 13 The reachability-plot of the regional seismic data

January 15 1975 (29∘261015840N 101∘481015840E)The green cluster meansthe aftershocks of the Daguan event (119872 = 71) whichhit at 28∘061015840N 104∘001015840E on May 11 1974 Those discoveredearthquakes are also detected in the work of Pei et al with thesame location and size The difference between our methodand Peirsquos is that the numbers of the earthquake clusters areslightly underestimated in Figure 12 The reason is that theborder points are not treated in our method while in Pei etalrsquos they are treated [17] Some clusters are obtained in theborder in the work from Pei et al [18] but we can confirmthat those are background earthquakes

Finally we compare our results with the outputs byOPTICS After having carefully analyzed the seismic recordsit can be found that the brown earthquake cluster is a falsepositive and the others are overestimated by the OPTICSalgorithm the blue and green clusters cover more back-ground earthquakes

5 Conclusion and Future Work

In the same data set clusters and noise with different densitytypes usually coexist Current AP clustering methods cannoteffectively deal with the data preprocessing for separatingmeaningful data groups from noise In particular when thereare several data density types overlapping in a given areathere are more than a few existing grouping methods thatcan identify the numbers of the data density types in anobjective and accurate way In this paper we propose an

100 101 102 103 104 105 106 1072728293031323334

Background quakesClustered quakes AClustered quakes B

Clustered quakes CClustered quakes DMain quakes

Figure 14 Seismic anomaly detection by OPTICS

extended AP clustering algorithm to deal with such complexsituation in which the data set has several overlapping datadensity types The advantage of our algorithm is that it candetermine the number of data density types and clusterthe data set into groups accurately Experiments show thatour semisupervised AP clustering algorithm outperforms thetraditional clustering algorithm OPTICS

One limitation of our method is that the identificationof the density types depends on the users The followingresearch will be focused on finding a more efficient methodto identify splitting value automaticallyThe other aspect is todevelop newAP versions to deal with the spatiotemporal datasets as in the researches by Zhao [19 20] andMeng et al [21]

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This study was supported by the National Natural ScienceFoundation of China (Grant no 61272029) by Bureau ofStatistics of Shandong Province (Grant no KT13013) andpartially by the MOE Key Laboratory for TransportationComplex Systems Theory and Technology School of Trafficand Transportation Beijing Jiaotong University

References

[1] B J Frey and D Dueck ldquoClustering by passing messagesbetween data pointsrdquo Science vol 315 no 5814 pp 972ndash9762007

[2] J W Han M Kamber and A K H Tung ldquoSpatial clusteringmethods in data miningrdquo in Geographic Data Mining andKnowledge Discovery H J Miller and J W Han Eds pp 188ndash217 Taylor amp Francis London UK 2001

[3] S Roy and D K Bhattacharyya ldquoAn approach to find embed-ded clusters using density based techniquesrdquo in DistributedComputing and Internet Technology vol 3816 of Lecture Notesin Computer Science pp 523ndash535 Springer Berlin Germany2005

8 Computational Intelligence and Neuroscience

[4] M. D. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.

[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.

[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.

[7] K. J. Wang, J. Li, J. Y. Zhang, et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.

[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624–627, 2010.

[9] D. Dueck, B. J. Frey, N. Jojic, et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.

[10] R. C. Guan, Z. H. Pei, X. H. Shi, et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.

[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.

[12] D. M. Tang, Q. X. Zhu, F. Yang, et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.

[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.

[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.

[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (in Chinese).

[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (in Chinese).

[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.

[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.

[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.

[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.

[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.


