Research Article: An Extended Affinity Propagation Clustering Method Based on Different Data Density Types
XiuLi Zhao (1,2) and WeiXiang Xu (3)
(1) State Key Laboratory of Rail Traffic Control and Safety, Beijing 100044, China
(2) Business School, Qilu University of Technology, Jinan 250353, China
(3) School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
Correspondence should be addressed to XiuLi Zhao; 07114212@bjtu.edu.cn
Received 21 September 2014; Revised 10 December 2014; Accepted 18 December 2014
Academic Editor: Yongjun Shen
Copyright © 2015 X. Zhao and W. Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The affinity propagation (AP) algorithm, a novel clustering method, does not require users to specify initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms clusters purely from the degree of similarity among the data points. In many cases, however, there exist areas of different intensity within the same data set, which means that the data set is not distributed homogeneously. In such a situation the AP algorithm cannot group the data points into ideal clusters. In this paper we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest distance of each data point, and then the AP clustering method is applied separately within each data density type to group the data points into clusters. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The experimental results show that our algorithm obtains groups more accurately than OPTICS and the AP clustering algorithm itself.
1. Introduction
Affinity propagation (AP) is a partitioning clustering algorithm proposed by Frey and Dueck in 2007 [1]. Unlike traditional partitioning clustering methods, the AP clustering algorithm does not need the initial cluster centers to be specified; instead, by exchanging messages, it automatically finds a subset of exemplar points that best describes the data groups. The messages exchanged among the data points are based on a set of similarities between pairs of data points. The AP algorithm assigns each data point to its nearest exemplar, which results in a partitioning of the whole data set into clusters. The AP algorithm converges by maximizing the overall sum of the similarities between data points and their exemplars.
Most clustering algorithms store and refine a fixed number of potential cluster centers; affinity propagation does not need to. The AP algorithm regards all data points equally as potential exemplars (also called cluster centers). Among the data points, two kinds of messages are exchanged: the availability and the responsibility. The former, sent from point k to point i, indicates how much support point k has received from other points for being an exemplar. The latter, sent from point i to point k, indicates how well-suited point k is as an exemplar for point i, in contrast to other potential exemplars. When the sum of the responsibilities and availabilities is maximized, the message-passing procedure ends and a clear set of exemplars and clusters emerges.
But the AP clustering algorithm cannot deal with nested clusters that have different data density types. The toy example in Figure 1 contains clusters C1, C2, C3, C4, and C5 of more intensive data, but the AP clustering algorithm cannot recognize them; Figure 2 shows the result of the AP clustering algorithm, which is in disagreement with the reality. Obviously, we want to obtain the clusters as shown in Figure 3. In this paper we propose an extended AP clustering algorithm that can group a data point set into clusters according to their different density types. There are two novelties in our new extended AP clustering algorithm: (1) according to the frequency distribution curve of the nearest neighbor distance, it can identify the number of the different density types in the whole data set; (2) it can partition the data set into clusters more effectively than OPTICS and the AP clustering algorithm itself.

Hindawi Publishing Corporation, Computational Intelligence and Neuroscience, Volume 2015, Article ID 828057, 8 pages; http://dx.doi.org/10.1155/2015/828057

Figure 1: Data set (clusters C1, C2, C3, C4, and C5).
For the sake of simplicity, we design a simple toy data set to explain the context of our method, as shown in Figure 1. In this data set there are some unknown data density types, and each cluster may be nested with other clusters of other density types. First, we transform the data point set into a distance matrix and recognize the point density types according to the nearest neighbor distance matrix. Then we use the AP clustering algorithm to classify the data points into several clusters. The whole clustering process consists of two steps (as shown in Figure 4): the first step identifies the dividing values of data density, and the second step establishes the mechanism to form clusters.
This paper is organized as follows. In Section 2, related work on the AP clustering algorithm and some concepts are briefly introduced. In Section 3, the new semisupervised AP clustering algorithm is illustrated in detail. In Section 4, the proposed algorithm is applied to cluster both simulated and real data sets. Finally, some conclusions are summarized in Section 5.
2. Related Work
2.1. Related Work on AP Cluster Methods. The goal of cluster analysis is to group similar data into meaningful subclasses (or clusters). There are three main categories of clustering methods: partitioning, hierarchical, and density-based.
The first category tries to obtain a rational partition of the data set such that the data in the same cluster are more similar to each other than to the data in other clusters. Usually the partitioning criterion minimizes the sum of the Euclidean distances; K-means and K-medoids are the classical partitioning cluster methods.
The second category works by grouping data objects into a dendrogram tree of clusters. It can be further classified as either merging (bottom-up) or splitting (top-down). The former begins by placing each data point in its own cluster and then merges these atomic clusters into larger and larger clusters until certain termination conditions are satisfied. The latter does the reverse: it starts with all data points as one cluster and then subdivides it into smaller and smaller pieces until certain termination conditions are satisfied (Han et al., 2001) [2].

Figure 2: The clusters by AP.

Figure 3: The ideal clusters (C1, C2, C3, C4, and C5).

Figure 4: The two-step process for discovering clusters with different densities (data point set → detecting the thresholds of the different data density types → the data density types → clustering by the AP algorithm → clusters).
Differing from the above clustering methods, density-based clustering algorithms can discover intensive subclasses based on the degree of similarity between a data point and its neighbors; such a subclass constitutes a denser local area [3, 4].
In recent years, the AP clustering algorithm, proposed by Frey and Dueck in 2007, has emerged as a new clustering technique that mainly depends on message passing to obtain clusters effectively. During the clustering process, the AP algorithm first considers all data points as potential exemplars and recursively transmits real-valued messages along the edges of the network until a set of good centers is generated.
In order to speed up the convergence rate, some improved AP algorithms have been proposed. Liu and Fu used a constriction factor to regulate the damping factor dynamically and sped up the AP algorithm [5]. Gu et al. utilized the local distribution to add constraints, as in semisupervised clustering, to construct a sparse similarity matrix [6]. Wang et al. proposed an adaptive AP method to overcome inherent imperfections of the algorithm [7]. In recent years the AP algorithm has been applied in many domains and has effectively solved many practical problems, including image retrieval [8], biomedicine [9], text mining [10], image categorization [11], facility location [12], image segmentation [13], and key frame extraction of video [14].
2.2. Related Work on the Nearest Neighbor Cluster Method
2.2.1. The Nearest Neighbor Distance. Let $X = \{x_1, \dots, x_n\}$ be a data set containing $n$ data point objects, and let $C_1, \dots, C_k$ ($k \ll n$) be the types of different density. For simplicity and without any loss of generality, we consider just one of the density types, $C_1$.

Suppose that all data points in $C_1$ are denoted as $p_i$, $p_i \in C_1$, $i = 1, \dots, m$. The nearest distance (named $D_m$) of $p_i$ is defined as the distance between $p_i$ and its nearest neighbor; all the $D_m$ form a distance matrix $D$. Because $C_1$ obeys a homogeneous Poisson distribution with constant intensity $\lambda$, the nearest distance $D_m$ of a randomly chosen data point $p_i$ has the cumulative distribution function (cdf) shown in the following formula:

\[ G_{D_m}(d) = P(D_m \le d) = 1 - \sum_{l=0}^{m-1} \frac{e^{-\lambda\pi d^2} (\lambda\pi d^2)^l}{l!}. \quad (1) \]

The formula is obtained by considering a circle of radius $d$ centered at a data point: $D_m$ is longer than $d$ precisely when the circle contains only $0, 1, 2, \dots,$ or $m-1$ data points. Thus the pdf of the nearest distance is the derivative of $G_{D_m}(d)$, as shown in the following formula:

\[ f_{D_m}(d; \lambda) = \frac{2(\lambda\pi)^m d^{2m-1} e^{-\lambda\pi d^2}}{(m-1)!}. \quad (2) \]
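As a concrete sketch (our own illustration, not the authors' code), the nearest-distance array $D_m$ for $m = 1$ can be computed directly from the pairwise distances; the function name and the brute-force numpy approach are our assumptions:

```python
import numpy as np

def nearest_distances(points):
    """Distance from each point to its nearest neighbour (D_m with m = 1)."""
    pts = np.asarray(points, dtype=float)
    # pairwise squared Euclidean distances via broadcasting
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.sqrt(d2.min(axis=1))

dm = nearest_distances([(0, 0), (0, 1), (5, 5), (5, 6.5)])
```

For large data sets a k-d tree query would replace the quadratic pairwise computation, but the brute-force version is enough to illustrate the definition.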
2.2.2. Probability Mixture Distribution of the Nearest Distance. Hereafter we discuss the situation of multiple data density types, with data of various intensities overlapping in a local area. A mixture distribution can be modeled for such data points with different density types. As shown in Figure 1, there are two density types in the toy experiment, three clusters in one data density type and two clusters in the other, as well as plenty of background noise data. According to the nearest distance matrix $D$, a histogram can be drawn to show the different distributions of the different density types. In order to eliminate the disadvantage of the edge effect, we transform the data points into toroidal edge-corrected data; thus the points near the edges will have the same mean value of the nearest distance as the data points in the inner area. A mixture density can be assumed to express the distribution for $k$ types of data point density, as shown in the following equation:
\[ D_m \sim \sum_{i=1}^{k} w_i f(d_m; \lambda_i) = \sum_{i=1}^{k} w_i \frac{2(\lambda_i \pi)^m d_m^{2m-1} e^{-\lambda_i \pi d_m^2}}{(m-1)!}. \quad (3) \]
The weight of the $i$th density type is represented by the symbol $w_i$ ($w_i > 0$ and $\sum_i w_i = 1$), the symbol $m$ denotes the distance order, and $\lambda_i$ is the intensity of the $i$th type. For a given data point $p_i$ there is a one-to-one correspondence with its nearest distance $D_m$, and the data points can be grouped by splitting the mixture distribution curve; in other words, the number ($k$) of density types and their parameters ($w_i$, $\lambda_i$) can be determined.

Figure 5 shows the histogram of the nearest distance of the toy data set and the fitted mixture probability density function, where fb is defined as the expectation of the simulated parameter $\lambda_i$.
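A hedged sketch of evaluating the mixture density in (3); the weights and intensities below are arbitrary illustrations, not the fitted values behind Figure 5:

```python
import math

def mixture_pdf(d, weights, lams, m=1):
    """Mixture pdf of the m-th nearest-neighbour distance, eq. (3):
    sum_i w_i * 2*(lam_i*pi)^m * d^(2m-1) * exp(-lam_i*pi*d^2) / (m-1)!"""
    return sum(
        w * 2 * (lam * math.pi) ** m * d ** (2 * m - 1)
        * math.exp(-lam * math.pi * d * d) / math.factorial(m - 1)
        for w, lam in zip(weights, lams)
    )

# sanity check: a proper density integrates to 1 (midpoint rule on [0, 20])
step = 0.001
area = sum(mixture_pdf((i + 0.5) * step, [0.6, 0.4], [5.0, 0.5]) * step
           for i in range(20000))
```

Each component integrates to 1 on its own, so any convex combination of them does as well, which is what the numerical check confirms.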
3. Some Concepts Relating to AP Clustering Algorithm
3.1. Some Terminologies on AP Clustering. In partitioning clustering algorithms, most techniques require the candidate exemplars to be identified in advance by the users (e.g., k-centers and k-means clustering). However, the affinity propagation algorithm proposed by Frey and Dueck simultaneously considers all data points as candidate exemplars. In the AP clustering algorithm there are two important concepts, the responsibility $R(i,k)$ and the availability $A(i,k)$, which represent two messages indicating how well-suited a data point is to be a potential exemplar. The sum of the values of $R(i,k)$ and $A(i,k)$ is the basis for evaluating whether the corresponding data point can be a candidate exemplar or not. Once a data point is chosen to be a candidate exemplar, the other data points nearer to it are assigned to its cluster.

Figure 5: Histogram of the nearest distances and the fitted curves (fb = 10, fb = 1000, fb = 20000).

Figure 6: Message passing between data points: the responsibility is sent from data point i to data point k, and the availability from data point k to data point i.

The detailed meanings of the terminologies are as follows.
Exemplar. The center of a cluster, which is an actual data point.
Similarity. The similarity between two data points $x_i$ and $x_j$ ($i \neq j$) is usually the negative squared Euclidean distance, $S(i,j) = -\|x_i - x_j\|^2$.
Preference. This parameter indicates the preference for data point $i$ to be chosen as an exemplar and is usually set to the median of all similarities. In other words, the preference reflects the user's choice of the initial "self-similarity" values.
Responsibility. $R(i,k)$ is an accumulated value reflecting how well point $k$ is suited to be the candidate exemplar of data point $i$; it is sent from the latter to the former and says that, compared with other potential exemplars, point $k$ is the best exemplar for point $i$.
Availability. $A(i,k)$ is the counterpart of $R(i,k)$ and reflects how appropriate it would be for point $i$ to choose point $k$ as its exemplar. Based on the candidate exemplar $k$, this accumulated message is sent to data point $i$ to tell it that point $k$ is more qualified as an exemplar than the others.
The relationship between $R(i,k)$ and $A(i,k)$ is illustrated in Figure 6.
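The similarity and preference definitions above can be sketched as follows (our own illustration, not code from the paper; the function name is an assumption):

```python
import numpy as np

def similarity_matrix(X):
    """S(i, j) = -||x_i - x_j||^2 for i != j; the diagonal (preference)
    is set to the median of the off-diagonal similarities."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = -(diff ** 2).sum(axis=-1)
    off_diag = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))  # shared preference
    return S

S = similarity_matrix([(0, 0), (3, 4), (0, 1)])
```

A larger (less negative) preference yields more exemplars and hence more clusters; the median is simply a neutral default.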
3.2. The Algorithm of AP Clustering. There are four main steps in AP.

Step 1 (initializing the parameters). Consider

\[ R(i,k) = 0, \quad A(i,k) = 0, \quad \forall i, k. \quad (4) \]

Step 2 (updating the responsibilities). Consider

\[ R(i,k) = S(i,k) - \max_{j \neq k} \{A(i,j) + S(i,j)\}, \quad j \in \{1, 2, \dots, N\}. \quad (5) \]

Step 3 (updating the availabilities). Consider

\[ A(i,k) = \min\Big\{0,\; R(k,k) + \sum_{j \notin \{i,k\}} \max(0, R(j,k))\Big\}, \quad i \neq k, \]
\[ A(k,k) = \sum_{j \neq k} \max(0, R(j,k)). \quad (6) \]

In order to speed up the convergence rate, a damping coefficient $\mathrm{lam} \in [0,1)$ is applied in the iterative process:

\[ R_{t+1}(i,k) = \mathrm{lam} \cdot R_t(i,k) + (1 - \mathrm{lam}) \cdot R^{\mathrm{new}}(i,k), \]
\[ A_{t+1}(i,k) = \mathrm{lam} \cdot A_t(i,k) + (1 - \mathrm{lam}) \cdot A^{\mathrm{new}}(i,k), \quad (7) \]

where $t$ indexes the iteration and $R^{\mathrm{new}}$ and $A^{\mathrm{new}}$ are the values just computed by (5) and (6).

Step 4 (assigning to the corresponding cluster). Consider

\[ c_i^{*} \leftarrow \arg\max_{k} \{R(i,k) + A(i,k)\}. \quad (8) \]
The program flow diagram is shown in Figure 7
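The four steps can be sketched as a compact vectorized loop; this is our own minimal illustration of equations (4)-(8) with the damping of (7), not the authors' program, and the `damping` and `iters` defaults are arbitrary choices:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal AP sketch: returns, for each point i, the exemplar index
    c*_i = argmax_k {R(i,k) + A(i,k)} after `iters` damped updates."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities, eq. (4)
    A = np.zeros((n, n))  # availabilities,  eq. (4)
    rows = np.arange(n)
    for _ in range(iters):
        # eq. (5): R(i,k) = S(i,k) - max_{j != k} {A(i,j) + S(i,j)}
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf          # mask the max to find the runner-up
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new              # eq. (7)
        # eq. (6): A(i,k) = min{0, R(k,k) + sum_{j not in {i,k}} max(0, R(j,k))}
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()   # A(k,k) = sum_{j != k} max(0, R(j,k))
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new              # eq. (7)
    return (R + A).argmax(axis=1)                            # eq. (8)
```

Fed the similarity matrix of Section 3.1 (negative squared Euclidean distances with the median preference on the diagonal), it returns each point's exemplar index, so points sharing an exemplar form one cluster.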
4. The Extended AP Clustering Algorithm and Experiments
4.1. The Extended AP Clustering Algorithm. Based on splitting the nearest distance frequency distribution by detecting the knees in the curve, we propose a semisupervised AP clustering algorithm for recognizing the clusters with different intensities in a data set, as follows.
(1) Calculate every data point's nearest distance ($D_m$).
(2) Draw the frequency distribution (FD) curve of the nearest distance matrix ($D$).
(3) Analyze the FD curve and identify the partition values, thereby obtaining the number of density types in the whole data set.
(4) Run the AP clustering algorithm within each subdata set of a given data density type and extract the final clusters.
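Steps (1)-(3) can be sketched as follows (our own illustration; in the paper the partition values are identified from the FD curve with user involvement, so here they are passed in as a `thresholds` argument):

```python
import numpy as np

def split_by_density(points, thresholds):
    """Label each point with a density type by thresholding its
    nearest-neighbour distance D_m (type 0 = densest)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    dm = np.sqrt(d2.min(axis=1))                      # step (1)
    return np.searchsorted(np.sort(thresholds), dm)   # step (3)

types = split_by_density(
    [(0, 0), (0.1, 0), (0.2, 0), (10, 0), (12, 0)], thresholds=[1.0])
```

Step (4) then runs AP separately on each subset, for example on `points[types == t]` for each density type `t`.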
Figure 7: The flow diagram of the extended AP algorithm (similarity matrix → initializing the message matrices → computing the message matrices → updating the message matrices → when the iteration is over, searching clusters by the sum of the message matrices → end).
Figure 8: The simulated data set used in Experiment 1.
4.2. Experiments and Analysis
4.2.1. Experiment 1. In Experiment 1 we use a simulated data set drawn from three Poisson distributions. There are 1524 data points at three intensities, forming five clusters in a 1000 × 1000 rectangle, as shown in Figure 8. The most intensive data type includes three clusters (the blue, green, and Cambridge blue clusters), the medium-density data type includes two clusters (the red and purple clusters), and the low density is the noise, which is dispersed over the whole rectangle. We first applied our semisupervised AP clustering algorithm to this data set and compared the result with the most familiar cluster method, OPTICS.
Following the steps of Section 4.1, we calculate the nearest distances and then draw the FD curve of the distances in descending order, as shown in Figure 9(a), which indicates 3 processes and 5 clusters in the simulated data set.
As a comparison, we run OPTICS and obtain its reachability-plot (Figure 9(b)).
After running the second step (the AP clustering program), the clusters obtained are clearly consistent with the expected results; Figure 10 illustrates the clustering result.
4.2.2. Experiment 2. In Experiment 2 a real seismic data set is used to evaluate our algorithm. The seismic data are selected from the Earthquake Catalogue in West China (1970-1975, M ≥ 1) and the Earthquake Catalogue in West China (1976-1979, M ≥ 1) [15, 16], where M represents the magnitude on the Richter scale. The spatial area covers 100° to 107°E and 27° to 34°N. We selected the M ≥ 2 records during 1975 and 1976.
In an ideal clustering experiment, the foreshocks and aftershocks of strong earthquakes can be clustered clearly: the former can show the locations of strong earthquakes, and the latter can help in understanding the mechanism of the earthquakes. Due to the interference of the background earthquakes, it is difficult to discover the earthquake clusters. Furthermore, the earthquake records include only the coordinates of the epicenter of each earthquake. In this situation, the key task is to separate the locations of the earthquakes from the background earthquakes. In this experiment we adopt OPTICS as the comparison, since it provides a reachability-plot and an estimation of the thresholds.

Figure 9: The different data density types via the FD curve (a) and the reachability-plot (b); both plot distance against point order.

Figure 10: The result of Experiment 1.

Figure 11: (a) The posterior probability of k; (b) the histogram of the nearest distance and the fitted curve.
When our AP clustering algorithm has converged, the posterior probabilities of $k$ are obtained, as shown in Figure 11(a); the maximal posterior probability occurs at $k = 2$. The histogram of the nearest distance and its fitted probability distribution function are displayed in Figure 11(b).
The experimental result is shown in Figure 12: the seismic data are grouped into three clusters, red, green, and blue. Then we apply the OPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold for the classification, which is shown in Figure 13. Four clusters are grouped when Eps* = 3.5 × 10⁴ (m), as shown in Figure 14. This result differs from our method in two aspects: first, there appears a brown cluster; second, the red, green, and blue clusters common to the two algorithms contain more earthquakes in OPTICS than in our algorithm.
Further, we analyze the results of the two experiments in detail. It is obvious that the clusters discovered by our algorithm are more accurate than those found by OPTICS; the former can indicate the locations of forthcoming strong earthquakes and of the events following them.
According to Figure 12, three earthquake clusters are discovered by the algorithm proposed in this paper. The red cluster indicates the foreshocks of the Songpan earthquake, which occurred in Songpan County at 32°42′N, 104°06′E on August 16, 1976, with a magnitude of 7.2 on the Richter scale. The blue cluster represents the aftershocks of the Kangding-Jiulong event (M = 6.2), which occurred on January 15, 1975 (29°26′N, 101°48′E). The green cluster represents the aftershocks of the Daguan event (M = 7.1), which hit at 28°06′N, 104°00′E on May 11, 1974. These discovered earthquakes are also detected in the work of Pei et al., with the same locations and sizes. The difference between our method and Pei's is that the sizes of the earthquake clusters are slightly underestimated in Figure 12; the reason is that the border points are not treated in our method, while in Pei et al.'s they are [17]. Some clusters are obtained at the border in the work of Pei et al. [18], but we can confirm that those are background earthquakes.

Figure 12: Seismic anomaly detection by our method (background quakes; clustered quakes A, B, and C; main quakes).

Figure 13: The reachability-plot of the regional seismic data.
Finally, we compare our results with the output of OPTICS. After carefully analyzing the seismic records, we find that the brown earthquake cluster is a false positive and the others are overestimated by the OPTICS algorithm: the blue and green clusters cover more background earthquakes.
5. Conclusion and Future Work
In the same data set, clusters and noise with different density types usually coexist. Current AP clustering methods cannot effectively handle the data preprocessing needed to separate meaningful data groups from noise. In particular, when several data density types overlap in a given area, few existing grouping methods can identify the number of the data density types in an objective and accurate way. In this paper we propose an extended AP clustering algorithm to deal with such a complex situation, in which the data set has several overlapping data density types. The advantage of our algorithm is that it can determine the number of data density types and cluster the data set into groups accurately. Experiments show that our semisupervised AP clustering algorithm outperforms the traditional clustering algorithm OPTICS.

Figure 14: Seismic anomaly detection by OPTICS (background quakes; clustered quakes A, B, C, and D; main quakes).
One limitation of our method is that the identification of the density types depends on the user. Future research will focus on finding a more efficient method to identify the splitting values automatically. Another direction is to develop new AP versions to deal with spatiotemporal data sets, as in the research by Zhao [19, 20] and Meng et al. [21].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant no. 61272029), by the Bureau of Statistics of Shandong Province (Grant no. KT13013), and partially by the MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University.
References
[1] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972-976, 2007.
[2] J. W. Han, M. Kamber, and A. K. H. Tung, "Spatial clustering methods in data mining," in Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. W. Han, Eds., pp. 188-217, Taylor & Francis, London, UK, 2001.
[3] S. Roy and D. K. Bhattacharyya, "An approach to find embedded clusters using density based techniques," in Distributed Computing and Internet Technology, vol. 3816 of Lecture Notes in Computer Science, pp. 523-535, Springer, Berlin, Germany, 2005.
[4] M. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267-289, 2006.
[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20-28, 2011.
[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22-24, 2010.
[7] K. J. Wang, J. Li, J. Y. Zhang, et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197-198, 2007.
[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624-627, 2010.
[9] D. Dueck, B. J. Frey, N. Jojic, et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360-371, Springer, Singapore, April 2008.
[10] R. C. Guan, Z. H. Pei, X. H. Shi, et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733-1740, 2010.
[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1-8, IEEE Press, Rio de Janeiro, Brazil, October 2007.
[12] D. M. Tang, Q. X. Zhu, F. Yang, et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841-844, 2010.
[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33-37, 2009.
[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268-270, 2010.
[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970-1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).
[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976-1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).
[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292-303, 2003.
[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153-168, 2006.
[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140-1147, 2011.
[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695-2699, IEEE, 2011.
[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40-52, 2014.
2 Computational Intelligence and Neuroscience
0 2 4 6 8 10
1
2
3
4
5
6
7
8
9
C2
C5
C3 C4
C1
Figure 1 Data set
nearest neighbor distance it can identify the number ofthe different density types in the whole data set (2) it canpartition the data set into clusters more effectively than theOPTICS and AP clustering algorithm itself
For the sake of simplicity we design a simple toy data setto explain the context of our method as shown in Figure 1In this data set there are some unknown data density typesand each of them may be nested to other clusters withother density types Firstly we transform data point set intoa distance matrix and recognize the point density typesaccording to the nearest neighbor distance matrix Then weuse AP clustering algorithm to classify the data points intoseveral clusters The whole clustering process consists of twosteps (as shown in Figure 4) The first step is to identifydividing value of data density and the second step is toestablish the mechanism to form clusters
This paper is organized as follows in Section 2 somerelated works on AP clustering algorithm and some conceptsare briefly introduced in Section 3 the new semisupervisedAP clustering algorithm is illustrated in detail In Section 4the proposed algorithm is applied to cluster some data setsboth simulated and real data and finally some conclusionsare summarized in Section 5
2. Related Work
2.1. Related Work on AP Cluster Methods. The goal of cluster analysis is to group similar data into meaningful subclasses (also called clusters). There are three main categories of clustering methods: partitioning, hierarchical, and density-based.
The first category tries to obtain a rational partition of the data set such that the data in the same cluster are more similar to each other than to the data in other clusters. Usually the partitioning criterion minimizes the sum of Euclidean distances; K-means and K-medoids are the classical partitioning cluster methods.
The second category works by grouping data objects into a dendrogram tree of clusters. This category can be further classified into either merging (bottom-up) or splitting
Figure 2: The clusters by AP.

Figure 3: The ideal clusters (C1–C5).
Figure 4: The two-step process for discovering clusters with different densities: from the data point set, the thresholds of the different data density types are detected, yielding the data density types; clustering by the AP algorithm then yields the clusters.
(top-down). The former begins by placing each data point in its own cluster and then merges these atomic clusters into larger and larger clusters until certain termination conditions are satisfied. The latter does the reverse: starting with all data points as one cluster, it subdivides the cluster into smaller and smaller pieces until certain termination conditions are satisfied (Han et al., 2001) [2].
Differing from the above clustering methods, density-based clustering algorithms can discover intensive subclasses based on the degree of similarity between a data point and its neighbors; such a subclass constitutes a denser local area [3, 4].
In recent years, as a new clustering technique, the AP clustering algorithm proposed by Frey and Dueck in 2007 [1] has mainly depended on message passing to obtain clusters effectively. During the clustering process, the AP algorithm first considers all data points as potential exemplars and recursively transmits real-valued messages along the edges of the network until a set of good centers is generated.
In order to speed up the convergence rate, several improved AP algorithms have been proposed. Liu and Fu used a constriction factor to regulate the damping factor dynamically and sped up the AP algorithm [5]. Gu et al. utilized the local distribution to add constraints, in the manner of semisupervised clustering, to construct a sparse similarity matrix [6]. Wang et al. proposed an adaptive AP method to overcome inherent imperfections of the algorithm [7]. In recent years the AP algorithm has been applied in many domains, where it effectively solves many practical problems; these domains include image retrieval [8], biomedicine [9], text mining [10], image categorization [11], facility location [12], image segmentation [13], and key frame extraction of video [14].
2.2. Related Work on the Nearest Neighbor Cluster Method
2.2.1. The Nearest Neighbor Distance. Let X = \{x_1, \ldots, x_n\} be a data set containing n data points, and let C_1, \ldots, C_k (k \ll n) be the types of different density. For simplicity, and without any loss of generality, we consider just one of the density types, C_1.

Suppose that all data points in C_1 are denoted as p_i, p_i \in C_1, i = 1, \ldots, m. The nearest distance (denoted D_m) of p_i is defined as the distance between p_i and its nearest neighbor; all the D_m values form a distance matrix D. Because C_1 obeys a homogeneous Poisson distribution, the nearest distance D_m also has an equivalent likelihood distribution with constant intensity \lambda. For a randomly chosen data point p_i, its nearest distance D_m has the cumulative distribution function (cdf)

G_{D_m}(d) = P(D_m \le d) = 1 - \sum_{l=0}^{m-1} \frac{e^{-\lambda\pi d^2} (\lambda\pi d^2)^l}{l!}.  (1)
The formula is obtained by supposing a circle of radius d centered at a data point: if D_m is longer than d, there must be one of 0, 1, 2, \ldots, m-1 data points inside the circle. Thus the pdf of the nearest distance is the derivative of G_{D_m}(d):

f_{D_m}(d_m; \lambda) = \frac{2(\lambda\pi)^m d_m^{2m-1} e^{-\lambda\pi d_m^2}}{(m-1)!}.  (2)
2.2.2. Probability Mixture Distribution of the Nearest Distance. Hereafter we discuss the situation of multiple data density types whose intensive areas overlap locally. A mixture distribution can be modeled for such data points with different density types. As shown in Figure 1, there are two density types in the toy experiment: three clusters in one data density type and two clusters in another, as well as plenty of background noise. According to the nearest distance matrix D, a histogram can be drawn to show the different distributions of the different density types. In order to eliminate the disadvantage of the edge effect, we transform the data points into toroidal edge-corrected data; thus the points near the edges will have the same mean nearest distance as the data points in the inner area. A mixture density can be assumed to express the distribution for k types of data point density:
D_m \sim \sum_{i=1}^{k} w_i f(d_m; \lambda_i) = \sum_{i=1}^{k} w_i \frac{2(\lambda_i\pi)^m d_m^{2m-1} e^{-\lambda_i\pi d_m^2}}{(m-1)!}.  (3)
The weight of the ith density type is represented by w_i (w_i > 0 and \sum w_i = 1), the symbol m denotes the distance order, and \lambda_i is the intensity of the ith type. For a given data point p_i there is a one-to-one correspondence with its nearest distance D_m, so the data points can be grouped by splitting the mixture distribution curve; in other words, the number k of density types and their parameters (w_i, \lambda_i) can be determined.

Figure 5 shows the histogram of the nearest distances of the toy data set and the fitted mixture probability density function, where fb is defined as the simulated expectation of the parameter \lambda_i.
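To make Eqs. (2) and (3) concrete, the following sketch evaluates the single-type pdf and the k-type mixture numerically (function names are ours, not the paper's; the weights are assumed to sum to 1):

```python
import math

def nn_pdf(d, lam, m=1):
    """Eq. (2): pdf of the m-th order nearest-neighbor distance, intensity lam."""
    return (2 * (lam * math.pi) ** m * d ** (2 * m - 1)
            * math.exp(-lam * math.pi * d * d) / math.factorial(m - 1))

def mixture_pdf(d, weights, lams, m=1):
    """Eq. (3): weighted mixture of nn_pdf over k density types."""
    return sum(w * nn_pdf(d, lam, m) for w, lam in zip(weights, lams))
```

Fitting the weights and intensities to the observed histogram (as in Figure 5) is what identifies the density types.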
3. Some Concepts Relating to AP Clustering Algorithm
3.1. Some Terminologies of AP Clustering. In partitioning clustering algorithms, most techniques require the user to identify candidate exemplars in advance (e.g., k-centers and k-means clustering). However, the affinity propagation algorithm proposed by Frey and Dueck considers all data points as candidate exemplars simultaneously. In the AP clustering algorithm there are two important concepts, the responsibility R(i,k) and the availability A(i,k), which represent two messages indicating how well-suited a data point is to be a potential exemplar. The sum of the values of R(i,k) and A(i,k) is the basis for evaluating whether the corresponding data point can be a candidate exemplar or not. Once a data point is chosen to be a candidate exemplar, the other data points nearest to it will be assigned
Figure 5: Histogram of the nearest distances (with fitted curves for fb = 10, 1000, and 20000).
Figure 6: Message passing (responsibility and availability) between data points i and k.
to this cluster. The detailed meanings of the terminologies are as follows.

Exemplar. The center of a cluster, which is an actual data point.

Similarity. The similarity between two data points x_i and x_j (i \ne j) is usually the negative squared Euclidean distance, S(i,j) = -\|x_i - x_j\|^2.

Preference. This parameter indicates how strongly the data point i is preferred as an exemplar; it is usually set to the median of all similarities. In other words, the preference reflects the user's choice of the initial "self-similarity" values S(i,i).

Responsibility. R(i,k) is an accumulated value reflecting how well point k is suited to be the candidate exemplar of data point i; it is sent from the latter to the former, saying that, compared with other potential exemplars, point k is the best exemplar for point i.

Availability. A(i,k) is the counterpart of R(i,k) and reflects how appropriate it would be for point i to choose point k as its exemplar. This accumulated message, sent from candidate exemplar k to data point i, tells point i how qualified point k is as an exemplar compared with others.

The relationship between R(i,k) and A(i,k) is illustrated in Figure 6.
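The similarity and preference settings above can be made concrete with a small sketch (`similarity_matrix` is our hypothetical helper name; the median preference follows the convention described in the text):

```python
from statistics import median

def similarity_matrix(points):
    """S(i, j) = -||x_i - x_j||^2; the diagonal holds the median preference."""
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                S[i][j] = -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    # Median of the off-diagonal similarities: no a priori bias toward any exemplar.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S
```

A lower (more negative) preference yields fewer exemplars and hence fewer clusters; the median is a neutral default.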
3.2. The Algorithm of AP Clustering. There are four main steps in AP.

Step 1 (initializing the parameters). Consider

R(i,k) = 0, \quad A(i,k) = 0, \quad \forall i, k.  (4)

Step 2 (updating the responsibilities). Consider

R(i,k) = S(i,k) - \max_{j \ne k} \{A(i,j) + S(i,j)\}, \quad j \in \{1, 2, \ldots, N\}.  (5)

Step 3 (updating the availabilities). Consider

A(i,k) = \min \{0, R(k,k) + \sum_{j \notin \{i,k\}} \max(0, R(j,k))\}, \quad i \ne k,

A(k,k) = \sum_{j \ne k} \max(0, R(j,k)).  (6)

In order to speed up the convergence rate, a damping coefficient lam is added to the iterative process at each iteration t:

R_{t+1}(i,k) = \text{lam} \cdot R_t(i,k) + (1 - \text{lam}) \cdot R_{t+1}^{\text{new}}(i,k),

A_{t+1}(i,k) = \text{lam} \cdot A_t(i,k) + (1 - \text{lam}) \cdot A_{t+1}^{\text{new}}(i,k),  (7)

where R_{t+1}^{\text{new}} and A_{t+1}^{\text{new}} are the values just computed by (5) and (6).

Step 4 (assigning each point to its cluster). Consider

c_i^* \leftarrow \arg\max_k \{R(i,k) + A(i,k)\}.  (8)
The program flow diagram is shown in Figure 7
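For concreteness, the four steps and the damping of Eq. (7) can be sketched in pure Python. This is an illustrative implementation of the standard Frey-Dueck updates under our own naming and default parameters, operating on a similarity matrix whose diagonal already holds the preferences:

```python
def affinity_propagation(S, lam=0.5, iters=100):
    """Steps 1-4 of AP on a similarity matrix S (list of lists); lam is the
    damping factor of Eq. (7)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # Step 1: responsibilities
    A = [[0.0] * n for _ in range(n)]  # Step 1: availabilities
    for _ in range(iters):
        # Step 2: R(i,k) = S(i,k) - max_{j != k} [A(i,j) + S(i,j)], damped
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = lam * R[i][k] + (1 - lam) * (S[i][k] - best)
        # Step 3: availabilities, damped
        for k in range(n):
            for i in range(n):
                if i == k:
                    a = sum(max(0.0, R[j][k]) for j in range(n) if j != k)
                else:
                    a = min(0.0, R[k][k] + sum(max(0.0, R[j][k])
                                               for j in range(n) if j not in (i, k)))
                A[i][k] = lam * A[i][k] + (1 - lam) * a
    # Step 4: each point's exemplar maximizes R(i,k) + A(i,k)
    return [max(range(n), key=lambda k: R[i][k] + A[i][k]) for i in range(n)]
```

With a fixed iteration budget the sketch omits the convergence check of the full algorithm, but it reproduces the message-passing dynamics on small inputs.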
4. The Extended AP Clustering Algorithm and Experiments
4.1. The Extended AP Clustering Algorithm. Based on the method of splitting the nearest distance frequency distribution by detecting the knees in the curve, we propose a semisupervised AP clustering algorithm for recognizing the clusters of different intensities in a data set, as follows:

(1) Calculate every data point's nearest distance (D_m).

(2) Draw the frequency distribution (FD) curve of the nearest distance matrix (D).

(3) Analyze the FD curve and identify the partition values, thereby obtaining the number of density types in the whole data set.

(4) Run the AP clustering algorithm within each subdata set of a single density type and extract the final clusters.
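The partitioning stage of this procedure can be sketched end to end. This is an illustrative sketch only: `split_by_density` is our name, the single threshold stands in for the partition values identified in step (3), and AP would then be run on each returned subset separately in step (4):

```python
import math

def split_by_density(points, threshold):
    """Steps (1)-(3): partition a point set into two density types by
    thresholding each point's nearest-neighbor distance.  Points are assumed
    to be distinct tuples."""
    nn = [min(math.dist(p, q) for q in points if q != p) for p in points]
    dense = [p for p, d in zip(points, nn) if d <= threshold]
    sparse = [p for p, d in zip(points, nn) if d > threshold]
    return dense, sparse
```

Generalizing to k density types simply means applying k - 1 splitting values instead of one.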
Figure 7: The flow diagram of the extended AP algorithm: from the similarity matrix, the message matrices are initialized, computed, and updated until the iteration is over; clusters are then found by the sum of the message matrices.
Figure 8: The simulated data set used in Experiment 1.
4.2. Experiments and Analysis
4.2.1. Experiment 1. In Experiment 1 we use a data set simulated from Poisson processes with three different intensities: 1524 data points in a 1000 × 1000 rectangle, as shown in Figure 8. The most intensive data type includes three clusters (the blue, green, and Cambridge blue clusters), the medium-density type includes two clusters (the red and purple clusters), and the low-density type is noise dispersed over the whole rectangle. We first applied our semisupervised AP clustering algorithm to this data set and compared the result with that of the most familiar density-based clustering method, OPTICS.
Following the steps in Section 4.1, we calculate the nearest distances and then draw the FD curve of the distances in descending order, as shown in Figure 9(a), which indicates 3 processes and 5 clusters in the simulated data set.

As a comparison, we run OPTICS and obtain its reachability-plot (Figure 9(b)).

After running the second step (the AP clustering program), the clusters obtained are clearly consistent with the expected results; Figure 10 illustrates the clustering result.
4.2.2. Experiment 2. In Experiment 2 a real seismic data set is used to evaluate our algorithm. The seismic data are selected from the Earthquake Catalogue in West China (1970–1975, M ≥ 1) and the Earthquake Catalogue in West China (1976–1979, M ≥ 1) [15, 16] (M represents the magnitude on the Richter scale). The spatial area covers 100° to 107°E and 27° to 34°N. We selected the M ≥ 2 records during 1975 and 1976.

In an ideal clustering experiment, the foreshocks and aftershocks of strong earthquakes can be clustered clearly: the former can indicate the locations of strong earthquakes, and the latter can help in understanding the mechanism of the earthquakes. Owing to the interference of the background earthquakes, it is difficult to discover the earthquake clusters; furthermore, the earthquake records include only the coordinates of the epicenter of each earthquake. In this situation the key work is to discover the locations of the earthquakes from
Figure 9: The different data density types via the FD curve (a) and the reachability-plot (b).
Figure 10: The result of Experiment 1.
Figure 11: (a) The posterior probability of k; (b) the histogram of the nearest distance and the fitted curve.
the background earthquakes. In this experiment we adopt OPTICS as the comparison, since it provides a reachability-plot and shows an estimation of the thresholds.

When our AP clustering algorithm had converged, the posterior probabilities of k were obtained, as shown in Figure 11(a); the maximal posterior probability occurs at k = 2. The histogram of the nearest distances and its fitted probability distribution function are displayed in Figure 11(b).

The experiment result is shown in Figure 12: the seismic data are grouped into three clusters, red, green, and blue. Then we apply the OPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold for the classification, as shown in Figure 13. Four clusters are obtained when Eps_1^* = 3.5 × 10^4 m, as shown in Figure 14. This result differs from ours in two aspects: firstly, there appears a brown cluster; secondly, the red, green, and blue clusters shared by the two algorithms contain more earthquakes in OPTICS than in our algorithm.

Further, we analyze the results of the two experiments in detail. The clusters discovered by our algorithm are more accurate than those found by OPTICS: the former can indicate the locations of forthcoming strong earthquakes and of the following ones.
According to Figure 12, three earthquake clusters are discovered by the algorithm proposed in this paper. The red cluster indicates the foreshocks of the Songpan earthquake, which occurred in Songpan County at 32°42′N, 104°06′E on August 16, 1976, with a magnitude of 7.2 on the Richter scale. The blue cluster represents the aftershocks of the Kangding-Jiulong event (M = 6.2), which occurred on
Figure 12: Seismic anomaly detection by our method (background quakes; clustered quakes A, B, and C; main quakes).
Figure 13: The reachability-plot of the regional seismic data.
January 15, 1975 (29°26′N, 101°48′E). The green cluster represents the aftershocks of the Daguan event (M = 7.1), which hit at 28°06′N, 104°00′E on May 11, 1974. These discovered earthquakes are also detected in the work of Pei et al., with the same locations and sizes. The difference between our method and Pei's is that the sizes of the earthquake clusters are slightly underestimated in Figure 12; the reason is that the border points are not treated in our method, while in Pei et al.'s work they are [17]. Some clusters near the border are obtained in the work of Pei et al. [18], but we can confirm that those are background earthquakes.
Finally, we compare our results with the outputs of OPTICS. After carefully analyzing the seismic records, it can be found that the brown earthquake cluster is a false positive and that the other clusters are overestimated by the OPTICS algorithm: the blue and green clusters cover more background earthquakes.
5. Conclusion and Future Work
In the same data set, clusters and noise with different density types usually coexist. Current AP clustering methods cannot effectively handle the data preprocessing needed to separate meaningful data groups from noise. In particular, when several data density types overlap in a given area, few existing grouping methods can identify the number of data density types in an objective and accurate way. In this paper we propose an
Figure 14: Seismic anomaly detection by OPTICS (background quakes; clustered quakes A, B, C, and D; main quakes).
extended AP clustering algorithm to deal with such a complex situation, in which the data set has several overlapping data density types. The advantage of our algorithm is that it can determine the number of data density types and cluster the data set into groups accurately. Experiments show that our semisupervised AP clustering algorithm outperforms the traditional clustering algorithm OPTICS.
One limitation of our method is that the identification of the density types depends on the users. Future research will focus on finding a more efficient method to identify the splitting values automatically. Another direction is to develop new AP versions for spatiotemporal data sets, as in the research by Zhao [19, 20] and Meng et al. [21].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant no. 61272029), by the Bureau of Statistics of Shandong Province (Grant no. KT13013), and partially by the MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University.
References
[1] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972–976, 2007.
[2] J. W. Han, M. Kamber, and A. K. H. Tung, "Spatial clustering methods in data mining," in Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. W. Han, Eds., pp. 188–217, Taylor & Francis, London, UK, 2001.
[3] S. Roy and D. K. Bhattacharyya, "An approach to find embedded clusters using density based techniques," in Distributed Computing and Internet Technology, vol. 3816 of Lecture Notes in Computer Science, pp. 523–535, Springer, Berlin, Germany, 2005.
[4] M. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.
[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.
[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.
[7] K. J. Wang, J. Li, J. Y. Zhang, et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.
[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624–627, 2010.
[9] D. Dueck, B. J. Frey, N. Jojic, et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.
[10] R. C. Guan, Z. H. Pei, X. H. Shi, et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.
[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.
[12] D. M. Tang, Q. X. Zhu, F. Yang, et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.
[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.
[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.
[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).
[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).
[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.
[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.
[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.
[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.
[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 3
(top-down)The former one begins by placing each data pointin its own cluster and then merges these atomic clusters intolarger and larger clusters until certain termination conditionsare satisfied The latter one does reverse to the merging oneby starting with all data points as a whole cluster and thensubdivides this cluster into smaller and smaller pieces until itsatisfies certain termination conditions (Han et al 2001) [2]
Differing from the above clusteringmethods the density-based clustering algorithms can discover intensive subclassesbased on the similar degree between a data point and itsneighbors Such subclass constitutes more dense local area[3 4]
In recent years as a new clustering technique AP cluster-ing algorithm proposed by Frey and Dueck in 2007 mainlydepends on message passing to obtain clusters effectivelyDuring the clustering process AP algorithm firstly considersall data points as potential exemplars and recursively trans-mits real-valued messages along the edges of the networkuntil a set of good centers are generated
In order to speed up convergence rate some improvedAP algorithms have been proposed Liu and Fu used theconstriction factor to regulate damping factor dynamicallyand sped up the AP algorithm [5] Gu et al utilize localdistribution to add constraint like semisupervised clusteringto construct sparse similarity matrix [6] Wang et al proposean adaptive AP method to overcome inherent imperfectionsof the algorithm [7] In the recent years AP algorithm hasbeen applied in many domains and it effectively solvesmany practical problems These domains are included inimage retrieval [8] biomedicine [9] text mining [10] imagecategorization [11] facility location [12] image segmentation[13] and key frame extraction of video [14]
22 Related Work on the Nearest Neighbor Cluster Method
221 The Nearest Neighbor Distance Let 119883 = 1199091sdot sdot sdot 119909119899
be a data set containing 119899 data point objects and let1198621 119862
119896 119896 ≪ 119899 be the types of different density For
simplicity and without any loss of generality we consider justone of the density types C
1
Suppose that all data points in the C1are denoted as
119901119894 119901119894isin 1198621 119894 = 1 119898 The nearest distance (named
Dm) of pi is defined as the distance between pi and its nearestneighbor AllDm forms a distancematrixD BecauseC
1obeys
a homogenous Poisson distribution the nearest distancematrixDm also has an equivalent likelihood distribution withconstant intensity (120582) For a randomly chosen data point piits nearest distance array Dm has a cumulative distributionfunction (cdf) as shown in the following formula
119866119863119898
(119889) = 119875 (119863119898ge 119889) = 1 minus
119898minus1
sum
119897=0
119890minus120582120587119889
2
(1205821205871198892)
119897 (1)
The formula is gained by supposing a circle of radius 119889centered at a data point More detailed if Dm is longer thand there must be one of 0 1 2 119898 minus 1 data points in the
circle Thus the pdf of the nearest distance119863119898(119889119898 120582) is the
derivative of 119866119863119898
(d) as shown in the following formula
119891119883119898(119909119898 120582) =
119890minus120582120587119909
2
2(120582120587)1198981199092119898minus1
(119898 minus 1) (2)
222 ProbabilityMixture Distribution of the Nearest DistanceHereafter we will discuss the situation of multiple datadensity types with various intensive data overlap in a localarea A mixture distribution can be modeled for such datapoints with different density types As shown in Figure 1there are two density types in the toy experiment threeclusters in one data density type and two clusters in anotherdata density type as well as plenty of background noise dataAccording to the nearest distance matrix D a histogram canbe drawn to show the different distribution of the differentdensity types In order to eliminate the disadvantage of theedge effect we transform the data point into toroidal edge-corrected data thus the points near the edges will have thesame mean value of the nearest distance with the data pointsin the inner area Amixture density can be assumed to expressthe distribution for 119896 types of data point density the equationis shown as follows
119863119898sim
119896
sum
119894=1
119908119894119891 (119889119898 120582
119894) =
119896
sum
119894=1
119908119894
119890minus1205821198941205871199092
2(120582119894120587)1198981198892119898minus1
(119898 minus 1) (3)
The weight of the 119894th density type is represented by thesymbol 119908
119894(119908119894gt 0 and sum119908
119894= 1) The symbol 119898 means
the distance order and 120582119894is the 119894th intensity type To a given
data point 119901119894there is a one-to-one counterpoint of its nearest
distance Dm and data points can be grouped by splittingthe mixture distribution curve in other words the number(k) of density types and their parameters (119908
119894 120582119894) can be
determinedFigure 5 shows the histogram of the nearest distance of
the toy data set and the fitted mixture probability densityfunction where fb is defined as the expectation of theparameter 120582
119894which is simulated
3 Some Concepts Relating to APClustering Algorithm
31 Some Terminologies on AP Clustering In the parti-tioning clustering algorithm most techniques identify thecandidate exemplars in advance by the users (eg k-centersand k-means clustering) However the affinity propagationalgorithm proposed by Frey and Dueck simultaneouslyconsiders all data points as candidate exemplars In theAP clustering algorithm there are two important conceptsthe responsibility (119877(119894 119896)) and availability (119860(119894 119896)) whichrepresent two messages indicating how well-suited a datapoint is to be a potential exemplar The sum of the valuesof 119877(119894 119896) and 119860(119894 119896) is the evaluation basis whether thecorresponding data point can be a candidate exemplar or notOnce a data point is chosen to be a candidate exemplar thoseother data points with more nearer distance will be assigned
4 Computational Intelligence and Neuroscience
0 20 40 60 80 100 1200
001002003004005006007008009
01
Distance
Prob
abili
ty
HistogramBeta updatedfb = 20000
fb = 1000
fb = 10
Figure 5 Histogram of the nearest distances
Data point kResponsibility
AvailabilityData point i
Figure 6 Message passing between data points
to this cluster The detailed meaning of the terminologies isas follows
ExemplarThe center of a cluster is an actual data point
Similarity The similarity value between two data points 119909119894
and 119909119895(119894 = 119895) is usually assigned the negative Euclidean
distance such as 119878(119894 119895) = minus119909119894minus 1199091198952
Preference This parameter is set to indicate the preferencethat the data point 119894 can be chosen as an exemplar whichusually is set by median(s) of all distances In other wordspreference is the willingness of the user to select the initialldquoself-similarityrdquo values
Responsibility 119877(119894 119896) is an accumulated value which reflectshow well the point 119896 is suited to be the candidate exemplar ofdata point i and then sends from the latter to the former thatis compared to other potential exemplars the point 119896 is thebest exemplar
Availability. A(i,k) travels in the opposite direction to R(i,k): it is an accumulated message, sent from candidate exemplar k to data point i, telling point i how appropriate it would be to choose point k as its exemplar, given the support k receives from other points.
The relationship between R(i,k) and A(i,k) is illustrated in Figure 6.
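As an illustration of the similarity and preference definitions above, the following Python sketch (our own illustrative code, not the authors' implementation) builds the similarity matrix with negative squared Euclidean distances and places the median off-diagonal similarity on the diagonal as the preference:

```python
import numpy as np

def similarity_matrix(X):
    """S(i, j) = -||x_i - x_j||^2, with the median similarity as preference."""
    sq = np.sum(X**2, axis=1)
    S = -(sq[:, None] + sq[None, :] - 2.0 * X @ X.T)
    # Median of the off-diagonal similarities as the "self-similarity" S(k, k).
    off_diag = S[~np.eye(len(X), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))
    return S

X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]])
S = similarity_matrix(X)
```

A larger (less negative) preference yields more exemplars and hence more clusters; the median is merely the customary default.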
3.2. The Algorithm of AP Clustering. There are four main steps in AP.
Step 1 (initializing the message matrices). Consider
\[ R(i,k) = 0, \quad A(i,k) = 0, \quad \forall\, i, k. \quad (4) \]
Step 2 (updating responsibilities). Consider
\[ R(i,k) = S(i,k) - \max_{j \in \{1,\ldots,N\},\, j \ne k} \bigl\{ A(i,j) + S(i,j) \bigr\}. \quad (5) \]
Step 3 (updating availabilities). Consider
\[ A(i,k) = \min \Bigl\{ 0,\; R(k,k) + \sum_{j \notin \{i,k\}} \max\bigl(0, R(j,k)\bigr) \Bigr\}, \quad i \ne k, \]
\[ A(k,k) = \sum_{j \ne k} \max\bigl(0, R(j,k)\bigr). \quad (6) \]
To speed up convergence, a damping coefficient \(\lambda\) is applied in the iterative process:
\[ R_{t+1}(i,k) = \lambda\, R_{t}(i,k) + (1-\lambda)\, R_{t+1}^{\mathrm{new}}(i,k), \]
\[ A_{t+1}(i,k) = \lambda\, A_{t}(i,k) + (1-\lambda)\, A_{t+1}^{\mathrm{new}}(i,k), \quad (7) \]
where \(R^{\mathrm{new}}\) and \(A^{\mathrm{new}}\) denote the values just computed from (5) and (6).
Step 4 (assigning each point to its cluster). Consider
\[ c_i^{*} \longleftarrow \arg\max_{k} \bigl\{ R(i,k) + A(i,k) \bigr\}. \quad (8) \]
The program flow diagram is shown in Figure 7.
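The four steps can be sketched in Python as follows. This is an illustrative, vectorized implementation of the standard Frey-Dueck updates (5)-(8) with the damping of (7); the function and variable names are our own, not part of the paper.

```python
import numpy as np

def affinity_propagation(S, lam=0.5, iters=200):
    """Illustrative AP message-passing loop for equations (5)-(8).

    S: similarity matrix with the preferences P(k) on the diagonal.
    lam: damping coefficient of equation (7).
    """
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities, equation (4)
    A = np.zeros((N, N))  # availabilities, equation (4)
    for _ in range(iters):
        # Equation (5): R(i,k) = S(i,k) - max_{j != k} {A(i,j) + S(i,j)}
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second = np.max(AS, axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = lam * R + (1 - lam) * R_new          # damping, equation (7)

        # Equation (6): A(i,k) = min{0, R(k,k) + sum_{j not in {i,k}} max(0, R(j,k))}
        # and A(k,k) = sum_{j != k} max(0, R(j,k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = lam * A + (1 - lam) * A_new          # damping, equation (7)

    # Equation (8): c_i = argmax_k {R(i,k) + A(i,k)}
    return np.argmax(R + A, axis=1)

# Toy run: two well-separated pairs of points should form two clusters.
X = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0]])
sq = (X**2).sum(axis=1)
S = -(sq[:, None] + sq[None, :] - 2.0 * X @ X.T)
np.fill_diagonal(S, [-1.0, -1.01, -1.0, -1.01])  # slightly unequal preferences break ties
labels = affinity_propagation(S)
```

Slightly perturbing the preferences (or the similarities) avoids the oscillations AP can exhibit on exactly symmetric inputs, which is also why Frey and Dueck recommend adding tiny noise in practice.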
4. The Extended AP Clustering Algorithm and Experiments
4.1. The Extended AP Clustering Algorithm. Based on the method of splitting the nearest-distance frequency distribution by detecting the knee in the curve, we propose a semisupervised AP clustering algorithm for recognizing clusters of different intensities in a data set, as follows.
(1) Calculate every data point's nearest distance \(D_m\).
(2) Draw the frequency distribution (FD) curve of the nearest-distance matrix \(D\).
(3) Analyze the FD curve and identify the partition values, thereby obtaining the number of density types in the whole data set.
(4) Run the AP clustering algorithm within each subdata set of a given density type and extract the final clusters.
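Steps (1) and (4) of the procedure above can be sketched as follows (illustrative Python; the names are our own, and the split values of step (3), which in the paper come from knee detection on the FD curve with user supervision, are passed in here as explicit thresholds):

```python
import numpy as np

def nearest_distances(X):
    """Step (1): distance from each point to its nearest neighbour (D_m)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff**2, axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    return D.min(axis=1)

def split_by_density(X, thresholds):
    """Steps (3)-(4): assign each point to a density type using the split
    values read off the sorted nearest-distance (FD) curve; AP is then run
    separately within each type."""
    d = nearest_distances(X)
    return np.digitize(d, sorted(thresholds))

# Dense points (nearest distance 0.1) versus sparse points (nearest distance ~14).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [50.0, 50.0], [60.0, 60.0]])
types = split_by_density(X, [1.0])  # one split value -> two density types
```

Each density type returned here would then be clustered independently with the AP routine of Section 3.2.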
Figure 7: The flow diagram of the extended AP algorithm (similarity matrix; initializing message matrices; computing message matrices; updating message matrices; iteration until convergence; searching clusters by the sum of the message matrices; end).
Figure 8: The simulated data set used in Experiment 1 (a 1000 x 1000 rectangle).
4.2. Experiments and Analysis
4.2.1. Experiment 1. In Experiment 1 we use a simulated data set drawn from three Poisson distributions. There are 1524 data points distributed at five intensities in a 1000 x 1000 rectangle, as shown in Figure 8. The most intensive density type includes three clusters (the blue, green, and Cambridge blue clusters), the medium density type includes two clusters (the red and purple clusters), and the low density type is the noise dispersed over the whole rectangle. We first applied our semisupervised AP clustering algorithm to this data set and compared the result with the most familiar clustering method, OPTICS.
Following the implementation steps in Section 4.1, we calculate the nearest distances and then draw the FD curve of the distances in descending order, as shown in Figure 9(a), which indicates three processes and five clusters in the simulated data set.
As a comparison, we run OPTICS and obtain its reachability-plot (Figure 9(b)).
After running the second step (the AP clustering program), the clusters obtained are clearly consistent with the expected results. Figure 10 illustrates the clustering result.
4.2.2. Experiment 2. In Experiment 2 a real seismic data set is used to evaluate our algorithm. The seismic data are selected from the Earthquake Catalogue in West China (1970-1975, M >= 1) and the Earthquake Catalogue in West China (1976-1979, M >= 1) [15, 16], where M represents the magnitude on the Richter scale. The area covers 100°E to 107°E and 27°N to 34°N. We selected the M >= 2 records during 1975 and 1976.
In an ideal clustering experiment, the foreshocks and aftershocks of strong earthquakes can be clustered clearly: the former show the locations of strong earthquakes, while the latter help in understanding the mechanisms of the earthquakes. Owing to the interference of the background earthquakes, it is difficult to discover the earthquake clusters. Furthermore, the earthquake records include only the coordinates of the epicenter of each earthquake. In this situation, the key work is to separate the earthquake clusters from the background earthquakes. In this experiment we adopt OPTICS as the comparison, since it provides a reachability-plot from which the classification thresholds can be estimated.

Figure 9: The different data density types via the FD curve (a) and the reachability-plot (b); both panels plot distance against point order.

Figure 10: The result of Experiment 1.

Figure 11: (a) The posterior probability of k; (b) the histogram of the nearest distances and the fitted curve.
When our AP clustering algorithm converged, the posterior probabilities of k were obtained, as shown in Figure 11(a): the maximal posterior probability occurs at k = 2. The histogram of the nearest distances and its fitted probability density function are displayed in Figure 11(b).
The experiment result is shown in Figure 12. The seismic data are grouped into three clusters: red, green, and blue. We then apply the OPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold for the classification, as shown in Figure 13. Four clusters are obtained when \(\mathrm{Eps}_1^{*} = 3.5 \times 10^{4}\) m, as shown in Figure 14. This result differs from our method in two respects: first, a brown cluster appears; second, the red, green, and blue clusters common to both algorithms contain more earthquakes under OPTICS than under our algorithm.
We further analyze the results of the two experiments in detail. The clusters discovered by our algorithm are evidently more accurate than those found by OPTICS; the former can indicate the locations of forthcoming strong earthquakes and their aftershocks.
According to Figure 12, three earthquake clusters are discovered by the algorithm proposed in this paper. The red cluster indicates the foreshocks of the Songpan earthquake, which occurred in Songpan County at 32°42'N, 104°06'E on August 16, 1976, with a magnitude of 7.2 on the Richter scale. The blue cluster comprises the aftershocks of the Kangding-Jiulong event (M = 6.2), which occurred on January 15, 1975 (29°26'N, 101°48'E). The green cluster comprises the aftershocks of the Daguan event (M = 7.1), which struck at 28°06'N, 104°00'E on May 11, 1974. These earthquakes are also detected in the work of Pei et al., with the same locations and sizes. The difference between our method and Pei's is that the earthquake clusters are slightly underestimated in Figure 12: border points are not treated in our method, whereas they are in Pei et al.'s [17]. Some clusters appear at the border in the work of Pei et al. [18], but we can confirm that those are background earthquakes.

Figure 12: Seismic anomaly detection by our method (background quakes; clustered quakes A, B, and C; main quakes).

Figure 13: The reachability-plot of the regional seismic data.
Finally, we compare our results with the output of OPTICS. After careful analysis of the seismic records, it can be seen that the brown earthquake cluster is a false positive and that the other clusters are overestimated by the OPTICS algorithm: its blue and green clusters cover more background earthquakes.
5. Conclusion and Future Work
In the same data set, clusters and noise of different density types usually coexist. Current AP clustering methods cannot effectively preprocess the data to separate meaningful groups from noise. In particular, when several data density types overlap in a given area, few existing grouping methods can identify the number of data density types in an objective and accurate way. In this paper we propose an extended AP clustering algorithm to deal with such a complex situation, in which the data set contains several overlapping data density types. The advantage of our algorithm is that it can determine the number of data density types and cluster the data set into groups accurately. Experiments show that our semisupervised AP clustering algorithm outperforms the traditional clustering algorithm OPTICS.

Figure 14: Seismic anomaly detection by OPTICS (background quakes; clustered quakes A, B, C, and D; main quakes).
One limitation of our method is that the identification of the density types depends on the user. Future research will focus on finding a more efficient method to identify the splitting values automatically. Another direction is to develop new AP variants for spatiotemporal data sets, as in the research by Zhao [19, 20] and Meng et al. [21].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant no. 61272029), by the Bureau of Statistics of Shandong Province (Grant no. KT13013), and partially by the MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University.
References
[1] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972-976, 2007.
[2] J. W. Han, M. Kamber, and A. K. H. Tung, "Spatial clustering methods in data mining," in Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. W. Han, Eds., pp. 188-217, Taylor & Francis, London, UK, 2001.
[3] S. Roy and D. K. Bhattacharyya, "An approach to find embedded clusters using density based techniques," in Distributed Computing and Internet Technology, vol. 3816 of Lecture Notes in Computer Science, pp. 523-535, Springer, Berlin, Germany, 2005.
[4] M. D. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267-289, 2006.
[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20-28, 2011.
[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22-24, 2010.
[7] K. J. Wang, J. Li, J. Y. Zhang et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197-198, 2007.
[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624-627, 2010.
[9] D. Dueck, B. J. Frey, N. Jojic et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360-371, Springer, Singapore, April 2008.
[10] R. C. Guan, Z. H. Pei, X. H. Shi et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733-1740, 2010.
[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1-8, IEEE Press, Rio de Janeiro, Brazil, October 2007.
[12] D. M. Tang, Q. X. Zhu, F. Yang et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841-844, 2010.
[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33-37, 2009.
[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268-270, 2010.
[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970-1975, M >= 1), Seismological Press, Beijing, China, 1980 (Chinese).
[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976-1979, M >= 1), Seismological Press, Beijing, China, 1989 (Chinese).
[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292-303, 2003.
[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153-168, 2006.
[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140-1147, 2011.
[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695-2699, IEEE, 2011.
[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40-52, 2014.
4 Computational Intelligence and Neuroscience
0 20 40 60 80 100 1200
001002003004005006007008009
01
Distance
Prob
abili
ty
HistogramBeta updatedfb = 20000
fb = 1000
fb = 10
Figure 5 Histogram of the nearest distances
Data point kResponsibility
AvailabilityData point i
Figure 6 Message passing between data points
to this cluster The detailed meaning of the terminologies isas follows
ExemplarThe center of a cluster is an actual data point
Similarity The similarity value between two data points 119909119894
and 119909119895(119894 = 119895) is usually assigned the negative Euclidean
distance such as 119878(119894 119895) = minus119909119894minus 1199091198952
Preference This parameter is set to indicate the preferencethat the data point 119894 can be chosen as an exemplar whichusually is set by median(s) of all distances In other wordspreference is the willingness of the user to select the initialldquoself-similarityrdquo values
Responsibility 119877(119894 119896) is an accumulated value which reflectshow well the point 119896 is suited to be the candidate exemplar ofdata point i and then sends from the latter to the former thatis compared to other potential exemplars the point 119896 is thebest exemplar
Availability119860(119894 119896) is opposed to119877(119894 119896) and reflects howwell-suited it is for the point 119894 to choose point 119896 as its exemplarBased on the candidate exemplar point k the accumulatedmessage sent to the data point 119894 tells it that point 119896 is morequalified as an exemplar than others
The relationship between 119877(119894 119896) and 119860(119894 119896) is illustratedin Figure 6
32TheAlgorithmofAPClustering There are fourmain stepsin AP
Step 1 (initializing the parameters) Consider
119877 (119894 119896) = 0 119860 (119894 119895) = 0 forall119894 119896 (4)
Step 2 (updating responsibility) Consider
119877 (119894 119896) = 119878 (119894 119896) minusmax 119860 (119894 119895) + 119878 (119894 119895)
(119895 isin 1 2 119873 119895 = 119896)
(5)
Step 3 (updating availability) Consider
119860 (119894 119896) = min
0 119877 (119896 119896) +sum
119895
max (0 119877 (119895 119896))
(119895 isin 1 2 119873 119895 = 119894 119895 = 119896)
119860 (119896 119896) = 119875 (119896) minusmax 119860 (119896 119895) + 119878 (119896 119895)
(119895 isin 1 2 119873 119895 = 119896)
(6)
In order to speed up convergence rate the damping coeffi-cient is added to iterative process
119877119894+1(119894 119896) = lam lowast 119877
119894(119894 119896) + (1 minus lam) lowast 119877old
119894+1(119894 119896)
119860119894+1(119894 119896) = lam lowast 119860
119894(119894 119896) + (1 minus lam) lowast 119860old
119894+1(119894 119896)
(7)
Step 4 (assigning to corresponding cluster) Consider
119888lowast
119894larr997888 arg max
119896
119877 (119894 119896) + 119860 (119894 119895) (8)
The program flow diagram is shown in Figure 7
4 The Extended AP Clustering Algorithm andExperiments
41 The Extended AP Clustering Algorithm Based on themethod of splitting the nearest distance frequency distri-bution by detecting the knee in the curve we propose asemisupervised AP clustering algorithm for recognizing theclusters with different intensity in a data set as follows
(1) Calculate every data pointrsquos nearest distance (119863119898)
(2) Draw the frequency distribution (FD) curve of thenearest distance matrix (119863)
(3) Analyze the FDcurve and identify the partition valuesfor obtaining the number of the density types of thewhole data set
(4) Run the AP clustering algorithm within each sub-dataset with different data density types and extractthe final clusters
Computational Intelligence and Neuroscience 5
Initializingmessage matrices
Computingmessage matrices
Updatingmessage matrices
Iteration over
End
Searching clusters by the sum of message matrices
Similaritymatrix
Figure 7 The flow diagram of extended AP algorithm
0 100 200 300 400 500 600 700 800 900 10000
100200300400500600700800900
1000
Figure 8 The simulated data set used in Experiment 1
42 Experiments and Analysis
421 Experiment 1 In Experiment 1 we use a simulateddata set with three Poisson distributions There are 1524 datapoints distributed at five intensities in a 1000times 1000 rectangleas shown in Figure 8The most intensive data type includes 3clusters (ie the blue green and Cambridge blue cluster) themedium density data type includes two clusters (ie the redand purple cluster) and the low density is the noise whichis dispersed in the whole rectangle We firstly applied oursemisupervised AP clustering algorithm to this dataset andcompared the result with the most familiar cluster methodOPTICS
According to implementing steps in Section 41 we cal-culate the nearest distance and then draw the FD curve of thedistance descending as shown in Figure 9(a) which indicates3 processes and 5 clusters existing in a simulated data set
As a comparison we run the OPTICS and get thereachability-plot of OPTICS (Figure 9(b))
After running the second step (AP clustering program)the clusters are obtained which is clearly consistent with theexpected results Figure 10 illustrates the clustering result
422 Experiment 2 In Experiment 2 a real seismic data set isused to evaluate our algorithm The seismic data are selectedfrom Earthquake Catalogue in West China (1970ndash1975 119872)and EarthquakeCatalogue inWest China (1976ndash1979119872 ⩾ 1)[15 16] (119872 represents the magnitude on the Richter scale)The space area covers from 100∘ to 107∘ E and from 27∘ to34∘N We selected the119872 ⩾ 2 records during 1975 and 1976
In an ideal clustering experiment the strong earthquakesforeshocks and aftershock can be clustered clearly Theformer can show the locations of strong earthquakes andthe latter can supply help to understand the mechanism ofthe earthquakes Due to the interference of the backgroundearthquakes it is difficult to discover the earthquake clustersFurthermore the earthquake records include only the coordi-nates of the epicenter of each earthquake In this situation thekey work is to discover the locations of the earthquakes from
6 Computational Intelligence and Neuroscience
0 200 400 600 800 1000 1200 1400 1600 18000
20406080
100
Point order
Dist
ance
(a)
0 200 400 600 800 1000 1200 1400 1600 18000
1020304050
Point order
Dist
ance
(b)
Figure 9 The different data density types via FD curve and reachability-plot
Figure 10 The result of Experiment 1
0 1 2 3 4 5 6 7 8 9 100
020406
k
Prob
abili
ty
(a)
0 2 4 6 8 10 12 140
020406
Distance
Prob
abili
ty
(b)
Figure 11 (a) The posterior probability of 119896 (b) the histogram of the nearest distance and the fitted curve
the background earthquakes In this experiment we adopt theOPTICS as the comparison which can provide a reachability-plot and can show the estimation of the thresholds
When our AP clustering algorithm had converged theposterior probabilities of 119896 are obtained as shown inFigure 11(a) It shows that the maximal posterior probabilityexists at k = 2 Also the histogram of the nearest distanceand its fitted probability distribution function are displayedin Figure 11(b)
The experiment result is shown in Figure 12 The seismicdata are grouped into three clusters red green and blueThenwe apply theOPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold forthe classification which is shown in Figure 13 There are fourgrouped clusters when Eps
1
lowast = 35 times 104 (m) as shownin Figure 14 This result has two different aspects from our
method firstly there appears a brown cluster secondly thered green and blue clusters are held in common by twoalgorithms that contain more earthquakes in OPTICS thanin our algorithm
Further we analyze the results of two experiments indetail It is obvious that the clusters discovered by ouralgorithm are more accurate than those by OPTICS Theformer clusters can indicate the location of forthcomingstrong earthquakes and the following ones
According to Figure 12 three earthquake clusters arediscovered by the algorithm proposed in this paper Thered cluster indicates the foreshocks of Songpan earthquakewhich occurred in Songpan County at 32∘421015840N 104∘061015840 Eon August 16 1976 and its earthquake intensity is 72 onthe Richter scale The blue cluster means the aftershocksof Kangding-Jiulong event (119872 = 62) which occurred on
Computational Intelligence and Neuroscience 7
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CMain quakes
Figure 12 Seismic anomaly detection by our method
0 50 100 150 200 250 3000123456789
10times10
4
Figure 13 The reachability-plot of the regional seismic data
January 15 1975 (29∘261015840N 101∘481015840E)The green cluster meansthe aftershocks of the Daguan event (119872 = 71) whichhit at 28∘061015840N 104∘001015840E on May 11 1974 Those discoveredearthquakes are also detected in the work of Pei et al with thesame location and size The difference between our methodand Peirsquos is that the numbers of the earthquake clusters areslightly underestimated in Figure 12 The reason is that theborder points are not treated in our method while in Pei etalrsquos they are treated [17] Some clusters are obtained in theborder in the work from Pei et al [18] but we can confirmthat those are background earthquakes
Finally we compare our results with the outputs byOPTICS After having carefully analyzed the seismic recordsit can be found that the brown earthquake cluster is a falsepositive and the others are overestimated by the OPTICSalgorithm the blue and green clusters cover more back-ground earthquakes
5 Conclusion and Future Work
In the same data set clusters and noise with different densitytypes usually coexist Current AP clustering methods cannoteffectively deal with the data preprocessing for separatingmeaningful data groups from noise In particular when thereare several data density types overlapping in a given areathere are more than a few existing grouping methods thatcan identify the numbers of the data density types in anobjective and accurate way In this paper we propose an
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CClustered quakes DMain quakes
Figure 14 Seismic anomaly detection by OPTICS
extended AP clustering algorithm to deal with such complexsituation in which the data set has several overlapping datadensity types The advantage of our algorithm is that it candetermine the number of data density types and clusterthe data set into groups accurately Experiments show thatour semisupervised AP clustering algorithm outperforms thetraditional clustering algorithm OPTICS
One limitation of our method is that the identificationof the density types depends on the users The followingresearch will be focused on finding a more efficient methodto identify splitting value automaticallyThe other aspect is todevelop newAP versions to deal with the spatiotemporal datasets as in the researches by Zhao [19 20] andMeng et al [21]
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This study was supported by the National Natural ScienceFoundation of China (Grant no 61272029) by Bureau ofStatistics of Shandong Province (Grant no KT13013) andpartially by the MOE Key Laboratory for TransportationComplex Systems Theory and Technology School of Trafficand Transportation Beijing Jiaotong University
References
[1] B J Frey and D Dueck ldquoClustering by passing messagesbetween data pointsrdquo Science vol 315 no 5814 pp 972ndash9762007
[2] J W Han M Kamber and A K H Tung ldquoSpatial clusteringmethods in data miningrdquo in Geographic Data Mining andKnowledge Discovery H J Miller and J W Han Eds pp 188ndash217 Taylor amp Francis London UK 2001
[3] S Roy and D K Bhattacharyya ldquoAn approach to find embed-ded clusters using density based techniquesrdquo in DistributedComputing and Internet Technology vol 3816 of Lecture Notesin Computer Science pp 523ndash535 Springer Berlin Germany2005
8 Computational Intelligence and Neuroscience
[4] M D Nanni and D Pedreschi ldquoTime-focused clustering oftrajectories ofmoving objectsrdquo Journal of Intelligent InformationSystems vol 27 no 3 pp 267ndash289 2006
[5] X-Y Liu and H Fu ldquoA fast affinity propagation clustering algo-rithmrdquo Journal of Shandong University (Engineering Science)vol 41 no 4 pp 20ndash28 2011
[6] R-J Gu J-C Wang G Chen and S-L Chen ldquoAffinity propa-gation clustering for large scale datasetrdquo Computer Engineeringvol 36 no 23 pp 22ndash24 2010
[7] K J Wang J Li J Y Zhang et al ldquoSemi-supervised affinitypropagation clusteringrdquo Computer Engineering vol 33 no 23pp 197ndash198 2007
[8] P-S Xiang ldquoNew CBIR system based on the affinity propaga-tion clustering algorithmrdquo Journal of Southwest University forNationalities Natural Science vol 36 no 4 pp 624ndash627 2010
[9] D Dueck B J Frey N Jojic et al ldquoConstructing treatmentportfolios using affinity propagationrdquo inProceedings of the Inter-national Conference on Research in Computational MolecularBiology (RECOMB rsquo08) pp 360ndash371 Springer Singapore April2008
[10] R C Guan Z H Pei X H Shi et al ldquoWeight affinitypropagation and its application to text clusteringrdquo Journal ofComputer Research and Development vol 47 no 10 pp 1733ndash1740 2010
[11] D Dueck and B J Frey ldquoNon-metric affinity propagation forunsupervised image categorizationrdquo in Proceedings of the IEEE11th International Conference onComputer Vision (ICCV rsquo11) pp1ndash8 IEEE Press Rio de Janeiro Brazil October 2007
[12] DM TangQ X Zhu F Yang et al ldquoSolving large scale locationproblem using affinity propagation clusteringrdquo ApplicationResearch of Computers vol 27 no 3 pp 841ndash844 2010
[13] R-Y Zhang H-L Zhao X Lu and M-Y Cao ldquoGrey imagesegmentationmethod based on affinity propagation clusteringrdquoJournal of Naval University of Engineering vol 21 no 3 pp 33ndash37 2009
[14] W-Z Xu and L-H Xu ldquoAdaptive key-frame extraction basedon affinity propagation clusteringrdquo Computer Science no 1 pp268ndash270 2010
[15] H Feng and D Y Huang Earthquake Catalogue in West China(1970ndash1975 M ge 1) Seismological Press Beijing China 1980(Chinese)
[16] H Feng and D Y Huang Earthquake Catalogue inWest China(1976ndash1979 M ge 1) Seismological Press Beijing China 1989(Chinese)
[17] T Pei M Yang J S Zhang C H Zhou J C Luo and QL Li ldquoMulti-scale expression of spatial activity anomalies ofearthquakes and its indicative significance on the space andtime attributes of strong earthquakesrdquoActa Seismologica Sinicavol 25 no 3 pp 292ndash303 2003
[18] T Pei A-X Zhu C H Zhou B L Li and C Z Qin ldquoAnew approach to the nearest-neighbour method to discovercluster features in overlaid spatial point processesrdquo InternationalJournal of Geographical Information Science vol 20 no 2 pp153ndash168 2006
[19] X L Zhao and W X Xu ldquoA new measurement methodto calculate similarity of moving object spatio-temporal tra-jectories by compact representationrdquo International Journal ofComputational Intelligence Systems vol 4 no 6 pp 1140ndash11472011
[20] X L Zhao ldquoProgressive refinement for clustering spatio-temporal semantic trajectoriesrdquo in Proceedings of the Interna-tional Conference on Computer Science and Network Technology(ICCSNT rsquo11) vol 4 pp 2695ndash2699 IEEE 2011
[21] X-L Meng B-M Cui and L-M Jia ldquoLine planning inemergencies for railway networksrdquoKybernetes vol 43 no 1 pp40ndash52 2014
Computational Intelligence and Neuroscience 5
Figure 7: The flow diagram of the extended AP algorithm (similarity matrix → initializing message matrices → computing message matrices → updating message matrices; when the iteration is over, searching clusters by the sum of the message matrices).
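The message-passing loop summarized in Figure 7 can be sketched in a few lines of numpy. This is a generic, minimal AP implementation, not the authors' code; the damping factor, iteration count, and the toy preference value are illustrative assumptions:

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Minimal AP message passing on a similarity matrix S (n x n):
    initialize messages, iterate the update rules, then read clusters
    off the sum of the message matrices (the loop of Figure 7)."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibility messages r(i, k)
    A = np.zeros((n, n))  # availability messages a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - np.diag(Rp))
        A = damping * A + (1 - damping) * A_new
    # Exemplars: points whose summed messages are positive on the diagonal.
    exemplars = np.where(np.diag(A + R) > 0)[0]
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

# Toy usage on two well-separated blobs (the preference of -50 is illustrative).
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
S = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(S, -50.0)  # diagonal holds the preferences
exemplars, labels = affinity_propagation(S)
```

As in the diagram, no number of clusters is fixed in advance: the exemplars emerge from the converged messages.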
Figure 8: The simulated data set used in Experiment 1.
4.2. Experiments and Analysis
4.2.1. Experiment 1. In Experiment 1, we use a simulated data set generated from three Poisson distributions. There are 1524 data points, distributed at three intensity levels, in a 1000 × 1000 rectangle, as shown in Figure 8. The most intensive data type includes three clusters (i.e., the blue, green, and Cambridge blue clusters); the medium-density data type includes two clusters (i.e., the red and purple clusters); and the low-density data are the noise, which is dispersed over the whole rectangle. We first applied our semisupervised AP clustering algorithm to this data set and compared the result with the most familiar clustering method, OPTICS.
According to the implementation steps in Section 4.1, we calculate the nearest distance of each data point and then draw the FD curve of the distances in descending order, as shown in Figure 9(a), which indicates that 3 processes (density types) and 5 clusters exist in the simulated data set.
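The first step of the method — sorting the nearest-neighbor distances in descending order to expose the density types — can be sketched as follows. The data here are Gaussian blobs plus uniform noise as a stand-in for the paper's simulated set; the library calls and parameters are illustrative assumptions, not the authors' code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# A dense blob, a medium-density blob, and sparse background noise.
dense = rng.normal((200.0, 200.0), 15.0, (300, 2))
medium = rng.normal((700.0, 700.0), 60.0, (200, 2))
noise = rng.uniform(0.0, 1000.0, (100, 2))
points = np.vstack([dense, medium, noise])

# Nearest distance of each point (n_neighbors=2: the first neighbor is the point itself).
dists, _ = NearestNeighbors(n_neighbors=2).fit(points).kneighbors(points)
nearest = dists[:, 1]

# The FD curve: nearest distances sorted in descending order.
# Plateaus at different levels indicate the different density types.
fd_curve = np.sort(nearest)[::-1]
```

The number of distinct plateaus in `fd_curve` plays the role of the "processes" read off Figure 9(a).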
As a comparison, we run OPTICS and obtain its reachability-plot (Figure 9(b)).
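A reachability-plot like the one in Figure 9(b) can be produced with scikit-learn's `OPTICS`; this sketch uses assumed synthetic data, not the paper's data set, and `min_samples=10` is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(1)
pts = np.vstack([
    rng.normal((100.0, 100.0), 5.0, (150, 2)),   # tight cluster A
    rng.normal((500.0, 500.0), 5.0, (150, 2)),   # tight cluster B
    rng.uniform(0.0, 1000.0, (60, 2)),           # background noise
])

optics = OPTICS(min_samples=10).fit(pts)
# The reachability-plot: reachability distances taken in the cluster ordering.
# Valleys correspond to clusters; peaks separate them (cf. Figure 9(b)).
reachability = optics.reachability_[optics.ordering_]
labels = optics.labels_
```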
After running the second step (the AP clustering program), the clusters are obtained; they are clearly consistent with the expected results. Figure 10 illustrates the clustering result.
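The second step — running AP separately within each identified density type — might look like the following sketch, using scikit-learn's `AffinityPropagation` on assumed, pre-partitioned data (the partitions, damping value, and seed are illustrative, not the authors' settings):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(2)
# Pretend step 1 already split the data into two density types.
dense_type = np.vstack([rng.normal((100.0, 100.0), 5.0, (80, 2)),
                        rng.normal((250.0, 100.0), 5.0, (80, 2))])
medium_type = np.vstack([rng.normal((600.0, 600.0), 30.0, (60, 2)),
                         rng.normal((850.0, 600.0), 30.0, (60, 2))])

# Step 2: run AP independently within each density type, so the
# preference/similarity scale of one density regime does not distort the other.
labels_by_type = {}
for name, subset in (("dense", dense_type), ("medium", medium_type)):
    ap = AffinityPropagation(damping=0.9, random_state=0).fit(subset)
    labels_by_type[name] = ap.labels_
```

Running AP per partition is the point of the extension: a single AP pass over mixed densities tends to pick exemplars only in the dense regions.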
4.2.2. Experiment 2. In Experiment 2, a real seismic data set is used to evaluate our algorithm. The seismic data are selected from Earthquake Catalogue in West China (1970–1975, M ≥ 1) and Earthquake Catalogue in West China (1976–1979, M ≥ 1) [15, 16] (M represents the magnitude on the Richter scale). The spatial area covers from 100° to 107°E and from 27° to 34°N. We selected the M ≥ 2 records during 1975 and 1976.
In an ideal clustering experiment, the foreshocks and aftershocks of strong earthquakes can be clustered clearly. The former can show the locations of strong earthquakes, and the latter can help in understanding the mechanism of the earthquakes. Due to the interference of the background earthquakes, it is difficult to discover the earthquake clusters. Furthermore, the earthquake records include only the coordinates of the epicenter of each earthquake. In this situation, the key work is to discover the locations of the earthquakes from
Figure 9: The different data density types via (a) the FD curve and (b) the reachability-plot (x-axis: point order; y-axis: distance).
Figure 10: The result of Experiment 1.
Figure 11: (a) The posterior probability of k; (b) the histogram of the nearest distance and the fitted curve.
the background earthquakes. In this experiment, we adopt OPTICS as the comparison method, which can provide a reachability-plot and an estimation of the thresholds.
When our AP clustering algorithm had converged, the posterior probabilities of k were obtained, as shown in Figure 11(a); the maximal posterior probability occurs at k = 2. The histogram of the nearest distance and its fitted probability distribution function are displayed in Figure 11(b).
The experiment result is shown in Figure 12: the seismic data are grouped into three clusters, red, green, and blue. Then we apply the OPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold for the classification, which is shown in Figure 13. There are four grouped clusters when Eps₁* = 3.5 × 10⁴ m, as shown in Figure 14. This result differs from our method in two aspects: firstly, a brown cluster appears; secondly, the red, green, and blue clusters that the two algorithms hold in common contain more earthquakes in OPTICS than in our algorithm.
Furthermore, we analyze the results of the two experiments in detail. It is obvious that the clusters discovered by our algorithm are more accurate than those discovered by OPTICS; the former can indicate the locations of forthcoming strong earthquakes and of their aftershocks.
According to Figure 12, three earthquake clusters are discovered by the algorithm proposed in this paper. The red cluster indicates the foreshocks of the Songpan earthquake, which occurred in Songpan County at 32°42′N, 104°06′E on August 16, 1976, with a magnitude of 7.2 on the Richter scale. The blue cluster represents the aftershocks of the Kangding-Jiulong event (M = 6.2), which occurred on
Figure 12: Seismic anomaly detection by our method (legend: background quakes; clustered quakes A, B, and C; main quakes).

Figure 13: The reachability-plot of the regional seismic data (reachability distance in units of 10⁴ m).
January 15, 1975 (29°26′N, 101°48′E). The green cluster represents the aftershocks of the Daguan event (M = 7.1), which hit at 28°06′N, 104°00′E on May 11, 1974. Those discovered earthquakes are also detected in the work of Pei et al., with the same locations and sizes. The difference between our method and Pei's is that the numbers of earthquakes in the clusters are slightly underestimated in Figure 12. The reason is that the border points are not treated in our method, while in Pei et al.'s they are [17]. Some clusters are obtained on the border in the work of Pei et al. [18], but we can confirm that those are background earthquakes.
Finally, we compare our results with the outputs of OPTICS. After carefully analyzing the seismic records, it can be found that the brown earthquake cluster is a false positive and the other clusters are overestimated by the OPTICS algorithm: the blue and green clusters cover more background earthquakes.
5. Conclusion and Future Work
In the same data set, clusters and noise with different density types usually coexist. Current AP clustering methods cannot effectively handle the data preprocessing needed to separate meaningful data groups from noise. In particular, when several data density types overlap in a given area, few existing grouping methods can identify the number of data density types in an objective and accurate way. In this paper, we propose an
Figure 14: Seismic anomaly detection by OPTICS (legend: background quakes; clustered quakes A, B, C, and D; main quakes).
extended AP clustering algorithm to deal with such a complex situation, in which the data set has several overlapping data density types. The advantage of our algorithm is that it can determine the number of data density types and accurately cluster the data set into groups. Experiments show that our semisupervised AP clustering algorithm outperforms the traditional clustering algorithm OPTICS.
One limitation of our method is that the identification of the density types depends on the users. Future research will focus on finding a more efficient method to identify the splitting value automatically. Another direction is to develop new AP versions that deal with spatiotemporal data sets, as in the research by Zhao [19, 20] and Meng et al. [21].
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant no. 61272029), by the Bureau of Statistics of Shandong Province (Grant no. KT13013), and partially by the MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University.
References
[1] B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, pp. 972–976, 2007.
[2] J. W. Han, M. Kamber, and A. K. H. Tung, "Spatial clustering methods in data mining," in Geographic Data Mining and Knowledge Discovery, H. J. Miller and J. W. Han, Eds., pp. 188–217, Taylor & Francis, London, UK, 2001.
[3] S. Roy and D. K. Bhattacharyya, "An approach to find embedded clusters using density based techniques," in Distributed Computing and Internet Technology, vol. 3816 of Lecture Notes in Computer Science, pp. 523–535, Springer, Berlin, Germany, 2005.
[4] M. D. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.
[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.
[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.
[7] K. J. Wang, J. Li, J. Y. Zhang et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.
[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities (Natural Science), vol. 36, no. 4, pp. 624–627, 2010.
[9] D. Dueck, B. J. Frey, N. Jojic et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.
[10] R. C. Guan, Z. H. Pei, X. H. Shi et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.
[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.
[12] D. M. Tang, Q. X. Zhu, F. Yang et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.
[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.
[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.
[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).
[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).
[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.
[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.
[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.
[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.
[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 Computational Intelligence and Neuroscience
0 200 400 600 800 1000 1200 1400 1600 18000
20406080
100
Point order
Dist
ance
(a)
0 200 400 600 800 1000 1200 1400 1600 18000
1020304050
Point order
Dist
ance
(b)
Figure 9 The different data density types via FD curve and reachability-plot
Figure 10 The result of Experiment 1
0 1 2 3 4 5 6 7 8 9 100
020406
k
Prob
abili
ty
(a)
0 2 4 6 8 10 12 140
020406
Distance
Prob
abili
ty
(b)
Figure 11 (a) The posterior probability of 119896 (b) the histogram of the nearest distance and the fitted curve
the background earthquakes In this experiment we adopt theOPTICS as the comparison which can provide a reachability-plot and can show the estimation of the thresholds
When our AP clustering algorithm had converged theposterior probabilities of 119896 are obtained as shown inFigure 11(a) It shows that the maximal posterior probabilityexists at k = 2 Also the histogram of the nearest distanceand its fitted probability distribution function are displayedin Figure 11(b)
The experiment result is shown in Figure 12 The seismicdata are grouped into three clusters red green and blueThenwe apply theOPTICS algorithm to obtain a reachability-plot of the seismic data set and an estimated threshold forthe classification which is shown in Figure 13 There are fourgrouped clusters when Eps
1
lowast = 35 times 104 (m) as shownin Figure 14 This result has two different aspects from our
method firstly there appears a brown cluster secondly thered green and blue clusters are held in common by twoalgorithms that contain more earthquakes in OPTICS thanin our algorithm
Further we analyze the results of two experiments indetail It is obvious that the clusters discovered by ouralgorithm are more accurate than those by OPTICS Theformer clusters can indicate the location of forthcomingstrong earthquakes and the following ones
According to Figure 12 three earthquake clusters arediscovered by the algorithm proposed in this paper Thered cluster indicates the foreshocks of Songpan earthquakewhich occurred in Songpan County at 32∘421015840N 104∘061015840 Eon August 16 1976 and its earthquake intensity is 72 onthe Richter scale The blue cluster means the aftershocksof Kangding-Jiulong event (119872 = 62) which occurred on
Computational Intelligence and Neuroscience 7
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CMain quakes
Figure 12 Seismic anomaly detection by our method
0 50 100 150 200 250 3000123456789
10times10
4
Figure 13 The reachability-plot of the regional seismic data
January 15 1975 (29∘261015840N 101∘481015840E)The green cluster meansthe aftershocks of the Daguan event (119872 = 71) whichhit at 28∘061015840N 104∘001015840E on May 11 1974 Those discoveredearthquakes are also detected in the work of Pei et al with thesame location and size The difference between our methodand Peirsquos is that the numbers of the earthquake clusters areslightly underestimated in Figure 12 The reason is that theborder points are not treated in our method while in Pei etalrsquos they are treated [17] Some clusters are obtained in theborder in the work from Pei et al [18] but we can confirmthat those are background earthquakes
Finally we compare our results with the outputs byOPTICS After having carefully analyzed the seismic recordsit can be found that the brown earthquake cluster is a falsepositive and the others are overestimated by the OPTICSalgorithm the blue and green clusters cover more back-ground earthquakes
5 Conclusion and Future Work
In the same data set clusters and noise with different densitytypes usually coexist Current AP clustering methods cannoteffectively deal with the data preprocessing for separatingmeaningful data groups from noise In particular when thereare several data density types overlapping in a given areathere are more than a few existing grouping methods thatcan identify the numbers of the data density types in anobjective and accurate way In this paper we propose an
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CClustered quakes DMain quakes
Figure 14 Seismic anomaly detection by OPTICS
extended AP clustering algorithm to deal with such complexsituation in which the data set has several overlapping datadensity types The advantage of our algorithm is that it candetermine the number of data density types and clusterthe data set into groups accurately Experiments show thatour semisupervised AP clustering algorithm outperforms thetraditional clustering algorithm OPTICS
One limitation of our method is that the identificationof the density types depends on the users The followingresearch will be focused on finding a more efficient methodto identify splitting value automaticallyThe other aspect is todevelop newAP versions to deal with the spatiotemporal datasets as in the researches by Zhao [19 20] andMeng et al [21]
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This study was supported by the National Natural ScienceFoundation of China (Grant no 61272029) by Bureau ofStatistics of Shandong Province (Grant no KT13013) andpartially by the MOE Key Laboratory for TransportationComplex Systems Theory and Technology School of Trafficand Transportation Beijing Jiaotong University
References
[1] B J Frey and D Dueck ldquoClustering by passing messagesbetween data pointsrdquo Science vol 315 no 5814 pp 972ndash9762007
[2] J W Han M Kamber and A K H Tung ldquoSpatial clusteringmethods in data miningrdquo in Geographic Data Mining andKnowledge Discovery H J Miller and J W Han Eds pp 188ndash217 Taylor amp Francis London UK 2001
[3] S Roy and D K Bhattacharyya ldquoAn approach to find embed-ded clusters using density based techniquesrdquo in DistributedComputing and Internet Technology vol 3816 of Lecture Notesin Computer Science pp 523ndash535 Springer Berlin Germany2005
8 Computational Intelligence and Neuroscience
[4] M D Nanni and D Pedreschi ldquoTime-focused clustering oftrajectories ofmoving objectsrdquo Journal of Intelligent InformationSystems vol 27 no 3 pp 267ndash289 2006
[5] X-Y Liu and H Fu ldquoA fast affinity propagation clustering algo-rithmrdquo Journal of Shandong University (Engineering Science)vol 41 no 4 pp 20ndash28 2011
[6] R-J Gu J-C Wang G Chen and S-L Chen ldquoAffinity propa-gation clustering for large scale datasetrdquo Computer Engineeringvol 36 no 23 pp 22ndash24 2010
[7] K J Wang J Li J Y Zhang et al ldquoSemi-supervised affinitypropagation clusteringrdquo Computer Engineering vol 33 no 23pp 197ndash198 2007
[8] P-S Xiang ldquoNew CBIR system based on the affinity propaga-tion clustering algorithmrdquo Journal of Southwest University forNationalities Natural Science vol 36 no 4 pp 624ndash627 2010
[9] D Dueck B J Frey N Jojic et al ldquoConstructing treatmentportfolios using affinity propagationrdquo inProceedings of the Inter-national Conference on Research in Computational MolecularBiology (RECOMB rsquo08) pp 360ndash371 Springer Singapore April2008
[10] R C Guan Z H Pei X H Shi et al ldquoWeight affinitypropagation and its application to text clusteringrdquo Journal ofComputer Research and Development vol 47 no 10 pp 1733ndash1740 2010
[11] D Dueck and B J Frey ldquoNon-metric affinity propagation forunsupervised image categorizationrdquo in Proceedings of the IEEE11th International Conference onComputer Vision (ICCV rsquo11) pp1ndash8 IEEE Press Rio de Janeiro Brazil October 2007
[12] DM TangQ X Zhu F Yang et al ldquoSolving large scale locationproblem using affinity propagation clusteringrdquo ApplicationResearch of Computers vol 27 no 3 pp 841ndash844 2010
[13] R-Y Zhang H-L Zhao X Lu and M-Y Cao ldquoGrey imagesegmentationmethod based on affinity propagation clusteringrdquoJournal of Naval University of Engineering vol 21 no 3 pp 33ndash37 2009
[14] W-Z Xu and L-H Xu ldquoAdaptive key-frame extraction basedon affinity propagation clusteringrdquo Computer Science no 1 pp268ndash270 2010
[15] H Feng and D Y Huang Earthquake Catalogue in West China(1970ndash1975 M ge 1) Seismological Press Beijing China 1980(Chinese)
[16] H Feng and D Y Huang Earthquake Catalogue inWest China(1976ndash1979 M ge 1) Seismological Press Beijing China 1989(Chinese)
[17] T Pei M Yang J S Zhang C H Zhou J C Luo and QL Li ldquoMulti-scale expression of spatial activity anomalies ofearthquakes and its indicative significance on the space andtime attributes of strong earthquakesrdquoActa Seismologica Sinicavol 25 no 3 pp 292ndash303 2003
[18] T Pei A-X Zhu C H Zhou B L Li and C Z Qin ldquoAnew approach to the nearest-neighbour method to discovercluster features in overlaid spatial point processesrdquo InternationalJournal of Geographical Information Science vol 20 no 2 pp153ndash168 2006
[19] X L Zhao and W X Xu ldquoA new measurement methodto calculate similarity of moving object spatio-temporal tra-jectories by compact representationrdquo International Journal ofComputational Intelligence Systems vol 4 no 6 pp 1140ndash11472011
[20] X L Zhao ldquoProgressive refinement for clustering spatio-temporal semantic trajectoriesrdquo in Proceedings of the Interna-tional Conference on Computer Science and Network Technology(ICCSNT rsquo11) vol 4 pp 2695ndash2699 IEEE 2011
[21] X-L Meng B-M Cui and L-M Jia ldquoLine planning inemergencies for railway networksrdquoKybernetes vol 43 no 1 pp40ndash52 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 7
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CMain quakes
Figure 12 Seismic anomaly detection by our method
0 50 100 150 200 250 3000123456789
10times10
4
Figure 13 The reachability-plot of the regional seismic data
January 15 1975 (29∘261015840N 101∘481015840E)The green cluster meansthe aftershocks of the Daguan event (119872 = 71) whichhit at 28∘061015840N 104∘001015840E on May 11 1974 Those discoveredearthquakes are also detected in the work of Pei et al with thesame location and size The difference between our methodand Peirsquos is that the numbers of the earthquake clusters areslightly underestimated in Figure 12 The reason is that theborder points are not treated in our method while in Pei etalrsquos they are treated [17] Some clusters are obtained in theborder in the work from Pei et al [18] but we can confirmthat those are background earthquakes
Finally we compare our results with the outputs byOPTICS After having carefully analyzed the seismic recordsit can be found that the brown earthquake cluster is a falsepositive and the others are overestimated by the OPTICSalgorithm the blue and green clusters cover more back-ground earthquakes
5 Conclusion and Future Work
In the same data set clusters and noise with different densitytypes usually coexist Current AP clustering methods cannoteffectively deal with the data preprocessing for separatingmeaningful data groups from noise In particular when thereare several data density types overlapping in a given areathere are more than a few existing grouping methods thatcan identify the numbers of the data density types in anobjective and accurate way In this paper we propose an
100 101 102 103 104 105 106 1072728293031323334
Background quakesClustered quakes AClustered quakes B
Clustered quakes CClustered quakes DMain quakes
Figure 14 Seismic anomaly detection by OPTICS
extended AP clustering algorithm to deal with such complexsituation in which the data set has several overlapping datadensity types The advantage of our algorithm is that it candetermine the number of data density types and clusterthe data set into groups accurately Experiments show thatour semisupervised AP clustering algorithm outperforms thetraditional clustering algorithm OPTICS
One limitation of our method is that the identificationof the density types depends on the users The followingresearch will be focused on finding a more efficient methodto identify splitting value automaticallyThe other aspect is todevelop newAP versions to deal with the spatiotemporal datasets as in the researches by Zhao [19 20] andMeng et al [21]
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This study was supported by the National Natural ScienceFoundation of China (Grant no 61272029) by Bureau ofStatistics of Shandong Province (Grant no KT13013) andpartially by the MOE Key Laboratory for TransportationComplex Systems Theory and Technology School of Trafficand Transportation Beijing Jiaotong University
References
[1] B J Frey and D Dueck ldquoClustering by passing messagesbetween data pointsrdquo Science vol 315 no 5814 pp 972ndash9762007
[2] J W Han M Kamber and A K H Tung ldquoSpatial clusteringmethods in data miningrdquo in Geographic Data Mining andKnowledge Discovery H J Miller and J W Han Eds pp 188ndash217 Taylor amp Francis London UK 2001
[3] S Roy and D K Bhattacharyya ldquoAn approach to find embed-ded clusters using density based techniquesrdquo in DistributedComputing and Internet Technology vol 3816 of Lecture Notesin Computer Science pp 523ndash535 Springer Berlin Germany2005
8 Computational Intelligence and Neuroscience
[4] M. D. Nanni and D. Pedreschi, "Time-focused clustering of trajectories of moving objects," Journal of Intelligent Information Systems, vol. 27, no. 3, pp. 267–289, 2006.
[5] X.-Y. Liu and H. Fu, "A fast affinity propagation clustering algorithm," Journal of Shandong University (Engineering Science), vol. 41, no. 4, pp. 20–28, 2011.
[6] R.-J. Gu, J.-C. Wang, G. Chen, and S.-L. Chen, "Affinity propagation clustering for large scale dataset," Computer Engineering, vol. 36, no. 23, pp. 22–24, 2010.
[7] K. J. Wang, J. Li, J. Y. Zhang et al., "Semi-supervised affinity propagation clustering," Computer Engineering, vol. 33, no. 23, pp. 197–198, 2007.
[8] P.-S. Xiang, "New CBIR system based on the affinity propagation clustering algorithm," Journal of Southwest University for Nationalities, Natural Science, vol. 36, no. 4, pp. 624–627, 2010.
[9] D. Dueck, B. J. Frey, N. Jojic et al., "Constructing treatment portfolios using affinity propagation," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB '08), pp. 360–371, Springer, Singapore, April 2008.
[10] R. C. Guan, Z. H. Pei, X. H. Shi et al., "Weight affinity propagation and its application to text clustering," Journal of Computer Research and Development, vol. 47, no. 10, pp. 1733–1740, 2010.
[11] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, IEEE Press, Rio de Janeiro, Brazil, October 2007.
[12] D. M. Tang, Q. X. Zhu, F. Yang et al., "Solving large scale location problem using affinity propagation clustering," Application Research of Computers, vol. 27, no. 3, pp. 841–844, 2010.
[13] R.-Y. Zhang, H.-L. Zhao, X. Lu, and M.-Y. Cao, "Grey image segmentation method based on affinity propagation clustering," Journal of Naval University of Engineering, vol. 21, no. 3, pp. 33–37, 2009.
[14] W.-Z. Xu and L.-H. Xu, "Adaptive key-frame extraction based on affinity propagation clustering," Computer Science, no. 1, pp. 268–270, 2010.
[15] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1970–1975, M ≥ 1), Seismological Press, Beijing, China, 1980 (Chinese).
[16] H. Feng and D. Y. Huang, Earthquake Catalogue in West China (1976–1979, M ≥ 1), Seismological Press, Beijing, China, 1989 (Chinese).
[17] T. Pei, M. Yang, J. S. Zhang, C. H. Zhou, J. C. Luo, and Q. L. Li, "Multi-scale expression of spatial activity anomalies of earthquakes and its indicative significance on the space and time attributes of strong earthquakes," Acta Seismologica Sinica, vol. 25, no. 3, pp. 292–303, 2003.
[18] T. Pei, A.-X. Zhu, C. H. Zhou, B. L. Li, and C. Z. Qin, "A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes," International Journal of Geographical Information Science, vol. 20, no. 2, pp. 153–168, 2006.
[19] X. L. Zhao and W. X. Xu, "A new measurement method to calculate similarity of moving object spatio-temporal trajectories by compact representation," International Journal of Computational Intelligence Systems, vol. 4, no. 6, pp. 1140–1147, 2011.
[20] X. L. Zhao, "Progressive refinement for clustering spatio-temporal semantic trajectories," in Proceedings of the International Conference on Computer Science and Network Technology (ICCSNT '11), vol. 4, pp. 2695–2699, IEEE, 2011.
[21] X.-L. Meng, B.-M. Cui, and L.-M. Jia, "Line planning in emergencies for railway networks," Kybernetes, vol. 43, no. 1, pp. 40–52, 2014.