Kernelized fuzzy attribute C-means clustering algorithm


Fuzzy Sets and Systems 159 (2008) 2428–2445 · www.elsevier.com/locate/fss


Jingwei Liu a,∗, Meizhi Xu b

a LMIB and Department of Mathematics, Beijing University of Aeronautics and Astronautics, Beijing 100083, PR China
b Department of Mathematics, Tsinghua University, Beijing 100084, PR China

Received 30 October 2006; received in revised form 17 March 2008; accepted 18 March 2008; available online 26 March 2008

Abstract

A novel kernelized fuzzy attribute C-means clustering algorithm is proposed in this paper. Since the attribute means clustering algorithm is an extension of the fuzzy C-means algorithm with weighting exponent m = 2, and fuzzy attribute C-means clustering is a general type of attribute means clustering with weighting exponent m > 1, we modify the distance in the fuzzy attribute C-means clustering algorithm with a kernel-induced distance and obtain the kernelized fuzzy attribute C-means clustering algorithm. The kernelized fuzzy attribute C-means clustering algorithm is a natural generalization of the kernelized fuzzy C-means algorithm with a stable function. Experimental results on the standard Iris database and on tumor/normal gene chip expression data demonstrate that the kernelized fuzzy attribute C-means clustering algorithm with the Gaussian radial basis kernel function and the Cauchy stable function is more effective and robust than fuzzy C-means, fuzzy attribute C-means clustering, and kernelized fuzzy C-means.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Fuzzy clustering; Fuzzy C-means; Attribute means clustering; Kernelized fuzzy C-means

1. Introduction

Based on the fuzzy theory proposed by Zadeh [17], fuzzy clustering is a partition method that divides data points into groups (clusters) according to their membership grade or degree. Fuzzy C-means (FCM) is one of the most popular unsupervised fuzzy clustering algorithms, and is widely used in pattern recognition, image recognition, gene classification, etc. FCM was first derived from the hard C-means algorithm by Ruspini, and then extended by Dunn. Finally, Bezdek generalized FCM with a fuzzy weighting exponent m; hence, Dunn type FCM is the special case of Bezdek type FCM with m = 2 [5,1]. Cheng extended the FCM algorithm by introducing the stable function and presented the attribute means clustering (AMC) algorithm, of which FCM with m = 2 is a special case [2,3]. Furthermore, AMC was extended with an exponential weight m, yielding a Bezdek type AMC called the fuzzy attribute C-means clustering (FAMC) algorithm [10]. FAMC takes both AMC and FCM as special cases.

Recently, much work has focused on kernel methods [15,11–13,6,18–20,7,4], which first map the data into a high dimensional space to gain high discriminant capability, and then calculate the measure of the samples in their original data space with a Mercer kernel. This kernelization can be viewed as modifying the distance measure of the data samples with a kernel function. Kernelized FCM (KFCM) is obtained by substituting the Euclidean distance with a kernel-induced distance. Previous work shows that KFCM performs better than FCM [18–20].

∗ Corresponding author. Tel.: +86 10 82317934. E-mail addresses: [email protected], [email protected] (J. Liu).

0165-0114/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2008.03.018


Since FAMC is an extension of both FCM and AMC [10], we replace the Euclidean distance in FAMC with a kernel-induced distance and propose the kernelized FAMC (KFAMC) algorithm in this paper. To demonstrate the performance of KFAMC, we compare the recognition rates of FCM, FAMC, KFCM, and KFAMC on the standard Iris database and on tumor/normal gene chip expression data. The experimental results show that FAMC performs better than FCM, and that KFAMC has better recognition performance and robustness than the FCM, FAMC, and KFCM algorithms.

The rest of the paper is organized as follows. Section 2 briefly reviews FCM, AMC, and FAMC. Section 3 discusses kernelized FCM and kernelized FAMC. Section 4 gives the updating schemes of FCM, FAMC, KFCM, and KFAMC. Section 5 introduces the fuzzy decision rule for pattern recognition. Section 6 introduces the experimental databases and reports the experimental comparison of classification accuracy and robustness among FCM, FAMC, KFCM, and KFAMC. The discussion and conclusion are given in the last section.

2. Brief reviews of FCM, AMC, and FAMC

The general fuzzy C-means clustering algorithm was proposed by Bezdek based on the fuzziness degree m [5,1]. By introducing the stable function, an iterative algorithm, AMC, was proposed by Cheng [2], where Dunn type FCM is a special case of AMC. Generalizing the AMC algorithm, the FAMC algorithm was proposed as an extension of both Bezdek type FCM and AMC [10].

2.1. General frame of fuzzy clustering

Suppose $X \subset R^d$ is any finite sample set $X = \{x_1, x_2, \ldots, x_N\}$, where each sample is $x_n = (x_{n1}, x_{n2}, \ldots, x_{nd})$, $1 \le n \le N$. The category of the attribute space is $F = \{C_1, C_2, \ldots, C_c\}$, where $c$ is the cluster number. For any $x \in X$, let $u_x(C_k)$ denote the attribute measure of $x$, where $\sum_{k=1}^{c} u_x(C_k) = 1$. Let $p_k = (p_{k1}, p_{k2}, \ldots, p_{kd})$ denote the prototype of the $k$th cluster $C_k$, $1 \le k \le c$. Let $u_{kn}$ denote the attribute measure of the $n$th sample belonging to the $k$th cluster, that is, $u_{kn} = u_{x_n}(C_k)$, with $U = (u_{kn})$ and $p = (p_1, p_2, \ldots, p_c)$. The task of fuzzy cluster analysis is to calculate the attribute measures $u_{kn}$ and to assign $x_n$ to the cluster of maximum membership, $\arg\max_{1 \le k \le c} u_{kn}$.

2.2. Brief review of FCM

Bezdek type FCM is a constrained non-linear optimization algorithm based on an inner-product-induced distance and a least-squared error criterion:

$$J_m(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} u_{kn}^m \|x_n - p_k\|_A^2$$
$$\text{s.t. } U \in M_{fc} = \Big\{ U \in R^{c \times N} \;\Big|\; u_{kn} \in [0, 1],\ \forall n, k;\ \sum_{k=1}^{c} u_{kn} = 1,\ \forall n;\ 0 < \sum_{n=1}^{N} u_{kn} < N,\ \forall k \Big\}, \quad (1)$$

where $u_{kn}$ is the measure of the $n$th sample belonging to the $k$th cluster, and $m \ge 1$ is the weighting exponent, also called the fuzziness index or smoothing parameter. The distance between $x_n$ and the prototype $p_k$ of the $k$th cluster is as follows:

$$\|x_n - p_k\|_A^2 = (x_n - p_k)^T A (x_n - p_k), \quad (2)$$

where $A$ is a positive definite matrix; this is also called the Mahalanobis distance. When $A$ is the identity matrix, $\|x_n - p_k\|_A^2$ is the Euclidean distance, which we denote $\|x_n - p_k\|^2$. For convenience, we adopt the Euclidean distance in the rest of the paper.

The parameters of FCM are estimated by minimizing $J_m(U, p)$ step by step according to the formulas below:

$$p_k = \frac{\sum_{n=1}^{N} (u_{kn})^m x_n}{\sum_{n=1}^{N} (u_{kn})^m}, \qquad
u_{kn} = \frac{(1/\|x_n - p_k\|^2)^{1/(m-1)}}{\sum_{i=1}^{c} (1/\|x_n - p_i\|^2)^{1/(m-1)}} = \frac{\|x_n - p_k\|^{-2/(m-1)}}{\sum_{i=1}^{c} \|x_n - p_i\|^{-2/(m-1)}}, \quad (3)$$

for $n = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, c$. When $m = 2$, FCM is the Dunn type FCM.
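For concreteness, the update equations (3) can be coded in a few lines of NumPy. This is a minimal illustrative sketch of the alternating updates, not the MATLAB implementation used in the experiments; the function name, initialization, and numerical guard are our own:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, t_max=100, seed=0):
    """Minimal FCM sketch implementing the update equations (3)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                          # each column sums to 1
    J_old = np.inf
    for _ in range(t_max):
        Um = U ** m
        P = (Um @ X) / Um.sum(axis=1, keepdims=True)          # prototypes p_k
        D2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(-1)   # ||x_n - p_k||^2
        D2 = np.fmax(D2, 1e-12)                 # guard against zero distance
        U = D2 ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=0)
        J = ((U ** m) * D2).sum()               # objective (1), Euclidean case
        if abs(J_old - J) < eps:
            break
        J_old = J
    return U, P
```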


2.3. Brief review of AMC

AMC is an iterative algorithm obtained by introducing the stable function [2]. Suppose $\varphi(t)$ is a positive differentiable function on $[0, \infty)$. Let $w(t) = \varphi'(t)/2t$; if $w(t)$ is a positive non-increasing function, then $\varphi(t)$ is called a stable function and $w(t)$ is called its weight function. Obviously, $\varphi(t)$ can be written as

$$\varphi(t) = \int_0^t 2 s\, w(s)\, ds. \quad (4)$$

The stable function was used in [3] for the minimum generalized $l_p$ norm filtering problem, and for the exponential-norm and logarithm-norm extensions of the $l_p$ norm. Since the stable function describes the relationship between the objective function $\varphi(t)$ and its weight function $w(t)$, it was introduced in [2] to propose AMC, a fuzzy clustering algorithm.
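As a concrete check of the defining relation $w(t) = \varphi'(t)/2t$ (our own verification, using the Cauchy stable function given in (10) below): differentiating $\varphi$ recovers $w$, and the integral (4) recovers $\varphi$ up to the additive constant $\varphi(0)$, which does not affect minimization:

```latex
% Cauchy stable function: phi(t) = ln(delta + t^2), delta > 0
\varphi'(t) = \frac{2t}{\delta + t^2}
  \;\Longrightarrow\;
w(t) = \frac{\varphi'(t)}{2t} = \frac{1}{\delta + t^2},
\qquad
\int_0^t 2 s\, w(s)\, ds = \ln(\delta + t^2) - \ln\delta = \varphi(t) - \varphi(0).
```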

The criterion of the AMC algorithm is to minimize the objective function

$$P(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} \varphi(\|u_{kn}(x_n - p_k)\|). \quad (5)$$

This minimization is converted to minimizing the iteratively reweighted least-squared objective function [2]

$$Q^{(i)}(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} w(\|u_{kn}^{(i)}(x_n - p_k^{(i)})\|)\, \|u_{kn}(x_n - p_k)\|^2, \quad (6)$$

where $\|\cdot\|$ denotes the Euclidean distance. The updating procedure is based on the two criteria $\min Q^{(i)}(U^{(i)}, p)$ and $\min Q^{(i)}(U, p^{(i+1)})$.

From $\min Q^{(i)}(U^{(i)}, p)$, we obtain

$$p_k^{(i+1)} = \frac{\sum_{n=1}^{N} w(\|u_{kn}^{(i)}(x_n - p_k^{(i)})\|)\,(u_{kn}^{(i)})^2 x_n}{\sum_{n=1}^{N} w(\|u_{kn}^{(i)}(x_n - p_k^{(i)})\|)\,(u_{kn}^{(i)})^2}. \quad (7)$$

From $\min Q^{(i)}(U, p^{(i+1)})$, we obtain

$$u_{kn}^{(i+1)} = \frac{w(\|u_{kn}^{(i)}(x_n - p_k^{(i)})\|)\,(\|x_n - p_k^{(i+1)}\|^2)^{-1}}{\sum_{j=1}^{c} w(\|u_{jn}^{(i)}(x_n - p_j^{(i)})\|)\,(\|x_n - p_j^{(i+1)}\|^2)^{-1}}. \quad (8)$$

In Cheng [2], four stable functions are recommended:

(1) Squared stable function:
$$\varphi(t) = t^2, \qquad w(t) = 1. \quad (9)$$

(2) Cauchy stable function:
$$\varphi(t) = \ln(\delta + t^2), \qquad w(t) = \frac{1}{\delta + t^2}, \qquad \delta > 0. \quad (10)$$

(3) General $l_p$ stable function: for $p = 0$,
$$\varphi(t) = \begin{cases} \varepsilon^{-2} t^2, & 0 \le t \le \varepsilon, \\ 1 + 2\ln(t/\varepsilon), & \varepsilon < t, \end{cases} \qquad
w(t) = \begin{cases} \varepsilon^{-2}, & 0 \le t \le \varepsilon, \\ t^{-2}, & \varepsilon < t, \end{cases} \quad (11)$$

and for $1 < p < 2$,
$$\varphi(t) = \begin{cases} \varepsilon^{p-2} t^2, & 0 \le t \le \varepsilon, \\ \dfrac{2}{p}\, t^p - \dfrac{2-p}{p}\, \varepsilon^p, & \varepsilon < t, \end{cases} \qquad
w(t) = \begin{cases} \varepsilon^{p-2}, & 0 \le t \le \varepsilon, \\ t^{p-2}, & \varepsilon < t, \end{cases} \quad (12)$$

where $\varepsilon > 0$.

(4) Exponential stable function:
$$\varphi(t) = 1 - e^{-\beta t^2}, \qquad w(t) = \beta e^{-\beta t^2}, \qquad \beta > 0. \quad (13)$$

When the squared stable function is chosen, AMC is exactly FCM with m = 2, i.e. the Dunn type FCM.
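The four weight functions above are one-liners in code. The sketch below is our own illustration (the parameter names `delta`, `eps_p`, and `beta` correspond to $\delta$, $\varepsilon$, and $\beta$); it collects the weight functions for reuse in the AMC/FAMC/KFAMC updates later:

```python
import numpy as np

# Weight functions w(t) of the four stable functions (9)-(13);
# each is positive and non-increasing in t, as the definition requires.
def w_squared(t):
    return np.ones_like(t)                       # (9): phi(t) = t^2

def w_cauchy(t, delta=1.0):
    return 1.0 / (delta + t ** 2)                # (10): phi(t) = ln(delta + t^2)

def w_lp(t, p=1.5, eps_p=1.0):
    # (11)/(12): eps_p^(p-2) on [0, eps_p], t^(p-2) beyond (p = 0 gives t^-2)
    return np.where(t <= eps_p, eps_p ** (p - 2), np.fmax(t, 1e-12) ** (p - 2))

def w_exponential(t, beta=1.0):
    return beta * np.exp(-beta * t ** 2)         # (13): phi(t) = 1 - exp(-beta t^2)
```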

2.4. Brief review of FAMC

In [10], AMC and FCM were extended to FAMC, which is also an iterative algorithm, minimizing the following objective function:

$$P(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} \varphi(u_{kn}^{m/2} \|x_n - p_k\|). \quad (14)$$

Likewise, minimizing formula (14) is converted to iterating on the following weighted least-squared objective function:

$$Q^{(i)}(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2} \|x_n - p_k^{(i)}\|\big)\, (u_{kn})^m \|x_n - p_k\|^2, \quad (15)$$

where $m > 1$. The iterative procedure is described as follows. From $\min Q^{(i)}(U^{(i)}, p)$, we obtain

$$p_k^{(i+1)} = \frac{\sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2} \|x_n - p_k^{(i)}\|\big)\,(u_{kn}^{(i)})^m x_n}{\sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2} \|x_n - p_k^{(i)}\|\big)\,(u_{kn}^{(i)})^m}. \quad (16)$$

From $\min Q^{(i)}(U, p^{(i+1)})$, we obtain

$$u_{kn}^{(i+1)} = \frac{w\big((u_{kn}^{(i)})^{m/2} \|x_n - p_k^{(i)}\|\big)\,(\|x_n - p_k^{(i+1)}\|^2)^{-1/(m-1)}}{\sum_{j=1}^{c} w\big((u_{jn}^{(i)})^{m/2} \|x_n - p_j^{(i)}\|\big)\,(\|x_n - p_j^{(i+1)}\|^2)^{-1/(m-1)}}, \quad (17)$$

where $m > 1$. This iterative clustering algorithm is called FAMC. Obviously, AMC is obtained by setting $m = 2$ in FAMC, and when the squared stable function is chosen, FAMC is the general Bezdek type FCM. Hence, FAMC is an extension of both AMC and FCM.
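A single FAMC iteration per (16)-(17) can be sketched as follows. This is our own illustration, reusing the weight functions defined in Section 2.3; `famc_step`, the default `w_fn` (Cauchy with $\delta = 1$), and the numerical guard are our own choices:

```python
import numpy as np

def famc_step(X, U, P, m=2.0, w_fn=lambda t: 1.0 / (1.0 + t ** 2)):
    """One FAMC iteration: prototype update (16), then membership update (17)."""
    # t_kn = (u_kn)^(m/2) * ||x_n - p_k||, the argument of the weight function
    D = np.sqrt(((X[None, :, :] - P[:, None, :]) ** 2).sum(-1))   # shape (c, N)
    W = w_fn(U ** (m / 2.0) * D)
    G = W * U ** m
    P_new = (G @ X) / G.sum(axis=1, keepdims=True)                # formula (16)
    D2_new = ((X[None, :, :] - P_new[:, None, :]) ** 2).sum(-1)
    U_new = W * np.fmax(D2_new, 1e-12) ** (-1.0 / (m - 1.0))      # formula (17)
    U_new /= U_new.sum(axis=0)
    return U_new, P_new
```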

3. Kernelized FCM and kernelized FAMC

Recently, kernel analysis has attracted much attention in statistical learning, pattern recognition, machine learning, etc. The main idea of kernelizing FCM is to substitute the distance in FCM with a kernel-induced one. To gain a high dimensional discriminant linear hyperplane, the training data are first mapped into a high dimensional space $H$:

$$\Phi : X \to H, \qquad x \mapsto \bar{x} = \Phi(x). \quad (18)$$

A function mapping the space $X \times X$ to the space $R$ is called a kernel, that is,

$$K : X \times X \to R. \quad (19)$$


Thus, $K(x, x') = (\bar{x}, \bar{x}') = (\Phi(x), \Phi(x'))$, and four widely used basic kernel functions are as follows:

• linear: $K(x_i, x_j) = x_i^T x_j$;
• polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$;
• sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$, $\gamma > 0$;
• radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$;

where $\gamma$, $r$, and $d$ are kernel parameters. Since

$$\|\Phi(x_n) - \Phi(p_k)\|^2 = (\Phi(x_n) - \Phi(p_k))^T(\Phi(x_n) - \Phi(p_k)) = \Phi(x_n)^T\Phi(x_n) - 2\Phi(x_n)^T\Phi(p_k) + \Phi(p_k)^T\Phi(p_k) = K(x_n, x_n) + K(p_k, p_k) - 2K(x_n, p_k), \quad (20)$$

when the kernel function is chosen as the RBF, $K(x_n, x_n) = 1$ and $K(p_k, p_k) = 1$, so

$$\|\Phi(x_n) - \Phi(p_k)\|^2 = 2(1 - K(x_n, p_k)). \quad (21)$$
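The kernel and the induced squared distance (20)-(21) are straightforward to code; the sketch below is our own illustration (function names and defaults are ours), including a numerical check of the RBF identity (21):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * ((x - y) ** 2).sum(-1))

def kernel_dist2(x, y, gamma=1.0):
    """Kernel-induced squared distance (20); for the RBF kernel this equals
    2 * (1 - K(x, y)) as in (21), since K(x, x) = K(y, y) = 1."""
    return (rbf_kernel(x, x, gamma) + rbf_kernel(y, y, gamma)
            - 2 * rbf_kernel(x, y, gamma))

x, y = np.array([1.0, 2.0]), np.array([2.5, 0.5])
assert np.isclose(kernel_dist2(x, y), 2 * (1 - rbf_kernel(x, y)))
```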

The KFCM algorithm modifies the objective function of FCM to

$$J_m(U, P) = \sum_{k=1}^{c} \sum_{n=1}^{N} u_{kn}^m (1 - K(x_n, p_k)). \quad (22)$$

The parameters of KFCM are estimated according to the following formulas:

$$u_{kn} = \frac{(1 - K(x_n, p_k))^{-1/(m-1)}}{\sum_{j=1}^{c} (1 - K(x_n, p_j))^{-1/(m-1)}}, \qquad
p_k = \frac{\sum_{n=1}^{N} u_{kn}^m K(x_n, p_k)\, x_n}{\sum_{n=1}^{N} u_{kn}^m K(x_n, p_k)}, \quad (23)$$

for $n = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, c$.
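Formulas (23) translate directly into code. The following one-iteration sketch is our own illustration (RBF kernel assumed; the prototypes are updated from the current memberships, then the memberships from the new prototypes):

```python
import numpy as np

def kfcm_step(X, U, P, m=2.0, gamma=1.0):
    """One KFCM iteration per (23) with the RBF kernel."""
    K = np.exp(-gamma * ((X[None, :, :] - P[:, None, :]) ** 2).sum(-1))  # (c, N)
    G = (U ** m) * K
    P_new = (G @ X) / G.sum(axis=1, keepdims=True)        # prototype update
    K_new = np.exp(-gamma * ((X[None, :, :] - P_new[:, None, :]) ** 2).sum(-1))
    U_new = np.fmax(1 - K_new, 1e-12) ** (-1.0 / (m - 1.0))
    U_new /= U_new.sum(axis=0)                            # membership update
    return U_new, P_new
```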

We substitute the Euclidean distance in formulas (5) and (14) with formula (21), and obtain the objective functions

$$P(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} \varphi(\|u_{kn}(\Phi(x_n) - \Phi(p_k))\|), \quad (24)$$

$$P(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} \varphi(\|u_{kn}^{m/2}(\Phi(x_n) - \Phi(p_k))\|), \qquad m > 1. \quad (25)$$

We call these kernelized AMC (KAMC) and kernelized FAMC (KFAMC), respectively; KFAMC with $m = 2$ is KAMC. Since FCM is the special case of FAMC with the squared stable function, KFCM is the special case of KFAMC with the squared stable function.

Because there are too many possible combinations of stable function, kernel function, and parameter $m$, we only discuss the iterative algorithm of KFAMC with the RBF kernel in the rest of this paper.

Substituting the measure distance of formula (25) with formula (21), we obtain

$$Q^{(i)}(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} w\big(\sqrt{2}\,(u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\, \big(2 u_{kn}^m (1 - K(x_n, p_k))\big). \quad (26)$$

Equivalently, we take the following formula as the objective function of KFAMC:

$$Q^{(i)}(U, p) = \sum_{k=1}^{c} \sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\, \big(u_{kn}^m (1 - K(x_n, p_k))\big). \quad (27)$$


Theorem 1 (KFAMC). For $m > 1$ and $c \ge 2$, the estimates of the parameters of KFAMC with the RBF kernel, from the iterative objective function (27), are as follows.

From $\min Q^{(i)}(U^{(i)}, p)$, the updating procedure of parameter $p$ is

$$p_k^{(i+1)} = \frac{\sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\,(u_{kn}^{(i)})^m K(x_n, p_k^{(i)})\, x_n}{\sum_{n=1}^{N} w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\,(u_{kn}^{(i)})^m K(x_n, p_k^{(i)})}. \quad (28)$$

From $\min Q^{(i)}(U, p^{(i+1)})$, the updating procedure of parameter $U$ is

$$u_{kn}^{(i+1)} = \frac{w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\,(1 - K(x_n, p_k^{(i+1)}))^{-1/(m-1)}}{\sum_{j=1}^{c} w\big((u_{jn}^{(i)})^{m/2}(1 - K(x_n, p_j^{(i)}))^{1/2}\big)\,(1 - K(x_n, p_j^{(i+1)}))^{-1/(m-1)}}. \quad (29)$$

Proof. To minimize $Q^{(i)}(U^{(i)}, p)$, let

$$\frac{\partial}{\partial p_k} Q^{(i)}(U^{(i)}, p) = 0; \quad (30)$$

then we can obtain formula (28), as in the proof for FCM in [1]. Since

$$\min Q^{(i)}(U, p^{(i+1)}) = \min_U Q^{(i)}(U, p^{(i+1)}) = \sum_{n=1}^{N} \Big\{ \min_U \sum_{k=1}^{c} w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\, \big(u_{kn}^m (1 - K(x_n, p_k^{(i+1)}))\big) \Big\}, \quad (31)$$

for the $n$th column of $U = (u_1, u_2, \ldots, u_N)$, let the Lagrangian be

$$F_n(\lambda, u_n) = \sum_{k=1}^{c} w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)\, \big(u_{kn}^m (1 - K(x_n, p_k^{(i+1)}))\big) - \lambda \Big( \sum_{k=1}^{c} u_{kn} - 1 \Big). \quad (32)$$

Calculating the first derivatives of $F_n(\lambda, u_n)$ with respect to $u_n$ and $\lambda$, and setting them to zero, we obtain

$$\frac{\partial}{\partial u_n} F_n(\lambda, u_n) = 0, \qquad \frac{\partial}{\partial \lambda} F_n(\lambda, u_n) = 0. \quad (33)$$

The rest of the proof is similar to the proof for FCM [1]; we can easily obtain the solution for $U = (u_1, u_2, \ldots, u_N)$ and write it in the updating form of formula (29). □

In fact, the proof of Theorem 1 is similar to the proofs of formulas (16) and (17) in [10].

4. Updating algorithms of FCM, FAMC, KFCM, and KFAMC

Based on Theorem 1, the updating procedure of KFAMC can be summarized in the following iterative scheme, where the superscript $(i)$ denotes the iteration step, $t_{\max}$ is the maximum number of iterations, and $W^{(i)}$ denotes the weight matrix in FAMC and KFAMC, respectively. In FAMC, $W^{(i)}$ is a $c \times N$ matrix with elements $w_{kn}^{(i)} = w(\|u_{kn}^{(i)}(x_n - p_k^{(i)})\|)$, while in KFAMC, $W^{(i)}$ is a $c \times N$ matrix with elements $w_{kn}^{(i)} = w\big((u_{kn}^{(i)})^{m/2}(1 - K(x_n, p_k^{(i)}))^{1/2}\big)$.

FCM updating algorithm.
(1) Fix $c$, $m$, $\epsilon$, and $t_{\max}$, where $2 \le c < N$, $m \in \{1.1, 1.2, \ldots, 10\}$, $\epsilon = 1.0\mathrm{e}{-5}$, and $t_{\max} = 100$ ($\epsilon$ and $t_{\max}$ are the default values of FCM in MATLAB 7.5). Initialize $U^{(0)}$.
(2) At step $i = 1$, calculate the fuzzy cluster centers $p^{(i)}$ and $U^{(i)}$ with formula (3).
(3) For step $i = i + 1$, update $p^{(i+1)}$ and $U^{(i+1)}$.
(4) If $|J^{(i)}(U, p) - J^{(i+1)}(U, p)| < \epsilon$ with formula (1), or $i > t_{\max}$, stop; else, go to step (3).
(5) End.


[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 1. Iris recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 3.

[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 2. Iris recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 4.

FAMC updating algorithm.
(1) Fix $c$, $m$, $\epsilon$, and $t_{\max}$, where $2 \le c < N$, $m \in \{1.1, 1.2, \ldots, 10\}$, $\epsilon = 1.0\mathrm{e}{-5}$, and $t_{\max} = 100$. Initialize $U^{(0)}$ and $W^{(0)}$.
(2) At step $i = 1$, calculate the fuzzy cluster centers $p^{(i)}$ with formula (16) and $U^{(i)}$ with formula (17), and update $W^{(i)}$ with the Cauchy stable function via formula (10).
(3) For step $i = i + 1$, update $p^{(i+1)}$, $U^{(i+1)}$, and $W^{(i+1)}$.
(4) If $|Q^{(i)}(U, p) - Q^{(i+1)}(U, p)| < \epsilon$ with formula (15), or $i > t_{\max}$, stop; else, go to step (3).
(5) End.

KFCM updating algorithm.
(1) Fix $c$, $m$, $\epsilon$, and $t_{\max}$, where $2 \le c < N$, $m \in \{1.1, 1.2, \ldots, 10\}$, $\epsilon = 1.0\mathrm{e}{-5}$, and $t_{\max} = 100$. Initialize $U^{(0)}$.
(2) At step $i = 1$, calculate the fuzzy cluster centers $p^{(i)}$ and $U^{(i)}$ with formula (23).
(3) For step $i = i + 1$, update $p^{(i+1)}$ and $U^{(i+1)}$.
(4) If $|J^{(i)}(U, p) - J^{(i+1)}(U, p)| < \epsilon$ with formula (22), or $i > t_{\max}$, stop; else, go to step (3).
(5) End.


[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 3. Iris recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 5.

[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 4. Iris recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 6.

Table 1
The average classification accuracy (%) of FCM, FAMC, KFCM, and KFAMC with cluster number c ∈ {3, 4, 5, 6} over m ∈ {1.1, 1.2, ..., 10} on the Iris data set

c   FCM       FAMC      KFCM      KFAMC
3   91.4074   91.9481   93.7259   93.4741
4   91.0963   91.0741   93.8000   93.6815
5   93.7407   94.1630   96.9111   97.0815
6   95.3111   95.3111   97.8000   97.8074

KFAMC updating algorithm.
(1) Fix $c$, $m$, $\epsilon$, and $t_{\max}$, where $2 \le c < N$, $m \in \{1.1, 1.2, \ldots, 10\}$, $\epsilon = 1.0\mathrm{e}{-5}$, and $t_{\max} = 100$. Initialize $U^{(0)}$ and $W^{(0)}$.
(2) At step $i = 1$, calculate the fuzzy cluster centers $p^{(i)}$ with formula (28) and $U^{(i)}$ with formula (29), and update $W^{(i)}$ with the Cauchy stable function via formula (10).


[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 5. Normal/tumor gene recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 2.

[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 6. Normal/tumor gene recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 3.

(3) For step $i = i + 1$, update $p^{(i+1)}$, $U^{(i+1)}$, and $W^{(i+1)}$.
(4) If $|Q^{(i)}(U, p) - Q^{(i+1)}(U, p)| < \epsilon$ with formula (27), or $i > t_{\max}$, stop; else, go to step (3).
(5) End.
To compare the effectiveness and robustness of FCM, FAMC, KFCM, and KFAMC, we initialize the same $U^{(0)}$ for each of the four algorithms under the same $c$ and $m$ in each evaluation.
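Putting formulas (28) and (29), the Cauchy weight (10), and the stopping rule together, the KFAMC updating algorithm can be sketched end to end as below. This is our own minimal reading of the scheme above (RBF kernel and Cauchy stable function assumed; names, initialization, and numerical guards are ours), not the authors' MATLAB code:

```python
import numpy as np

def kfamc(X, c, m=2.0, gamma=1.0, delta=1.0, eps=1e-5, t_max=100, seed=0):
    """KFAMC sketch: RBF kernel, Cauchy stable function, updates (28)-(29)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N)); U /= U.sum(axis=0)
    P = X[rng.choice(N, size=c, replace=False)]       # initial prototypes
    Q_old = np.inf
    for _ in range(t_max):
        K = np.exp(-gamma * ((X[None] - P[:, None]) ** 2).sum(-1))   # (c, N)
        D = np.fmax(1.0 - K, 1e-12)                   # 1 - K(x_n, p_k^(i))
        # Cauchy weight w(t) = 1/(delta + t^2) with t^2 = u^m * (1 - K)
        W = 1.0 / (delta + (U ** m) * D)
        G = W * (U ** m) * K
        P = (G @ X) / G.sum(axis=1, keepdims=True)    # prototype update (28)
        K_new = np.exp(-gamma * ((X[None] - P[:, None]) ** 2).sum(-1))
        D_new = np.fmax(1.0 - K_new, 1e-12)
        U = W * D_new ** (-1.0 / (m - 1.0))           # membership update (29)
        U /= U.sum(axis=0)
        Q = (W * (U ** m) * D_new).sum()              # objective (27) at new U, p
        if abs(Q_old - Q) < eps:
            break
        Q_old = Q
    return U, P
```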

In our experiments, the parameter $\gamma$ in the RBF kernel is set from the maximum standard deviation over the dimensions of the input data; that is,

$$\gamma = \frac{1}{\max_{1 \le k \le d} \sqrt{\frac{1}{N} \sum_{n=1}^{N} (x_{kn} - \bar{x}_k)^2}}, \qquad \bar{x}_k = \frac{1}{N} \sum_{n=1}^{N} x_{kn}. \quad (34)$$
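In code, this choice of $\gamma$ is a one-liner over the per-dimension standard deviations (our own sketch; the population variance with divisor $N$ matches (34)):

```python
import numpy as np

def rbf_gamma(X):
    """Gamma per (34): reciprocal of the largest per-dimension std of X (N x d)."""
    return 1.0 / X.std(axis=0, ddof=0).max()
```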


[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 7. Normal/tumor gene recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 4.

[Figure: recognition rate (%) versus m; curves for FCM, FAMC, KFCM, and KFAMC.]
Fig. 8. Normal/tumor gene recognition rates of FCM, FAMC, KFCM, and KFAMC with cluster number c = 5.

Table 2
The average classification accuracy (%) of FCM, FAMC, KFCM, and KFAMC with cluster number c ∈ {2, 3, 4, 5} over m ∈ {1.1, 1.2, ..., 10} on the gene expression data set

c   FCM       FAMC   KFCM   KFAMC
2   50.6790   100    100    100
3   52.6543   100    100    100
4   54.9383   100    100    100
5   56.1111   100    100    100



[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 9. Two-dimension clustering and classifying visualization of FCM, FAMC, KFCM, and KFAMC with m = 2.

We adopt the stopping criterion $|Q^{(i)}(U, p) - Q^{(i+1)}(U, p)| < \epsilon$, which is equivalent to the criterion $|U^{(i)} - U^{(i+1)}| < \epsilon$ in [1,18].¹ This criterion is adopted in the FCM package of MATLAB 7.5.

5. Fuzzy decision for pattern classification

To apply FCM, FAMC, KFCM, and KFAMC to a pattern classification task, we calculate the membership degree matrix $U$ and the probability of each cluster belonging to a given class on the training set. In the test stage, we first determine the cluster that a test data point belongs to, and then recognize the class it should belong to according to the probability that this cluster belongs to each class. Finally, we report the statistical recognition rate.

The outline of pattern classification is summarized as follows. Suppose that $X = \{x_1, x_2, \ldots, x_N\}$ of Section 2 belongs to $s$ classes $\Omega = \{A_1, \ldots, A_s\}$ in the pattern space. For a fixed algorithm among FCM, FAMC, KFCM, and KFAMC, we can obtain the matrix $U_{c \times N}$; then $X$ belongs to $c$ clusters at the same time. Generally, we set $c \ge s$.

¹ In fact the two stopping criteria are equivalent. Since the objective function has a weighted least-squared form, the minimum of $Q(U, p)$ is obtained by differentiation. If $Q^{(i)}(U, p)$ is convergent, then $|Q^{(i)}(U, p) - Q^{(i+1)}(U, p)| < \epsilon$ implies that $U^{(i)}$ is convergent, which means $|U^{(i)} - U^{(i+1)}| < \epsilon'$. Conversely, if $|U^{(i)} - U^{(i+1)}| < \epsilon'$, then $p^{(i)}$ is also convergent, so $Q^{(i)}(U, p)$ is convergent and, equivalently, $|Q^{(i)}(U, p) - Q^{(i+1)}(U, p)| < \epsilon$.


[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 10. Two-dimension clustering and classifying visualization of FCM, FAMC, KFCM, and KFAMC with m = 1.6.

For a given sample $x \in X$, we can calculate the attribute measures $u_x(C_1), \ldots, u_x(C_c)$ and the conditional probabilities $\{P(A_k \mid C_j),\ 1 \le k \le s,\ 1 \le j \le c\}$; the decision function is then defined as follows:

$$\text{if } j^* = \arg\max_{1 \le j \le c} u_x(C_j) \ \text{ and } \ L = \arg\max_{1 \le k \le s} P(A_k \mid C_{j^*}), \ \text{ then } x \in A_L. \quad (35)$$

We apply formula (35) as the fuzzy decision criterion for pattern classification. The detailed procedure for calculating $u_x(C_k)$ and $P(A_k \mid C_j)$ is as follows. As stated in Section 2.1, $u_{kn} = u_{x_n}(C_k)$; that is, for a given sample $x$, $u_x(C_k)$ is calculated by the iteration of FCM, FAMC, KFCM, or KFAMC. After the algorithm converges, the prototype of cluster $C_k$ is denoted by $p_k$, and each sample's memberships determine which cluster it belongs to. Meanwhile, each training sample carries a class label, so for a fixed cluster we can count the number of samples of each class belonging to that cluster; $P(A_k \mid C_j)$ is then estimated by the corresponding frequency.
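A sketch of this decision stage (our own illustration; `U` is the converged membership matrix, `y` the integer-coded training labels in $\{0, \ldots, s-1\}$, and `u_x` the membership vector of a test sample):

```python
import numpy as np

def class_given_cluster(U, y, s):
    """Estimate P(A_k | C_j) by frequency; row j holds the class distribution
    of the samples whose maximum membership falls in cluster j."""
    clusters = U.argmax(axis=0)                 # hard cluster of each sample
    c = U.shape[0]
    P = np.zeros((c, s))
    for j in range(c):
        labels, counts = np.unique(y[clusters == j], return_counts=True)
        if counts.size:
            P[j, labels] = counts / counts.sum()
    return P

def decide(u_x, P):
    """Decision rule (35): max-membership cluster, then its most probable class."""
    j_star = int(np.argmax(u_x))
    return int(np.argmax(P[j_star]))
```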


[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 11. Two-dimension clustering and classifying visualization of FCM, FAMC, KFCM, and KFAMC with m = 2.8.

6. Experimental results

6.1. Databases

We used two databases in our experiments. The first data set is the standard Iris data, which can be downloaded from the UCI repository of machine learning databases (www.ics.uci.edu/mlearn/MLrepository). There are 150 data samples from three classes (setosa, versicolor, and virginica) with four measurements (Sepal length, Sepal width, Petal length, Petal width). We use all 150 samples to train the parameters of FCM, FAMC, KFCM, and KFAMC, and also use these 150 samples to test the recognition rate.

The second data set is normal/tumor gene expression data downloaded from http://www.molbio.princeton.edu/colondata. The training set includes 44 gene expression profiles, of which 22 are normal gene profiles and 22 are tumor gene profiles. The testing set is composed of 18 tumor gene profiles. Each gene expression datum is a 2000 dimensional vector.

Both of the above data sets were investigated in our previous work [8,9]. The gene data set was also evaluated in [14], with the same training/testing structure as in this paper and [9].

All the experiments are performed on the MATLAB 7.5 platform. The FCM algorithm is the standard implementation shipped with MATLAB 7.5, and the kernel function is chosen as the RBF function.


[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 12. Clustering and classifying results for FCM, FAMC, KFCM, and KFAMC with added outlier (100, 3) and m = 2.

6.2. Performance comparison on Iris

To compare the recognition performance of FCM, FAMC, KFCM, and KFAMC on the Iris data, the cluster number $c$ ranges from 3 to 6, and the exponential weight $m$ ranges from 1.1 to 10 in steps of 0.1. For each parameter pair $(c, m)$, the fuzzy recognition procedure is performed 1000 times, and the best recognition result is recorded as the performance of each model. The experimental results are plotted in Figs. 1–4.

Comparing all the performances over $(c, m) \in \{3, 4, 5, 6\} \times \{1.1, 1.2, \ldots, 10\}$, we can conclude that FAMC is more effective than FCM, and that both KFCM and KFAMC perform better than FCM and FAMC.

To give a global view of the classification performance over the parameter set, for each cluster number $c$ the average classification accuracy over $m \in \{1.1, 1.2, \ldots, 10\}$ is listed in Table 1.

From Table 1 we can conclude that, for all of FCM, FAMC, KFCM, and KFAMC, the average classification accuracy improves as the number of clusters grows, and that the KFAMC algorithm achieves the best performance.

6.3. Performance comparison on gene normal/tumor expression data

To compare the recognition performance of FCM, FAMC, KFCM, and KFAMC on the normal/tumor gene expression data, the cluster number $c$ ranges from 2 to 5, and the exponential weight $m$ ranges from 1.1 to 10 in steps of 0.1.


[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 13. Clustering and classifying results for FCM, FAMC, KFCM, and KFAMC with added outlier (100, 3) and m = 1.6.

For each parameter pair $(c, m)$, the fuzzy recognition procedure is repeated 1000 times, and the best recognition result is recorded as the performance of each model. The experimental results are shown in Figs. 5–8.

From Figs. 5–8, we can conclude that FAMC, KFCM, and KFAMC perform better than FCM for $m \in \{1.1, 1.2, \ldots, 10\}$, and that the performances of FAMC, KFCM, and KFAMC are similar.

For each cluster number $c$, the average classification accuracy over $m \in \{1.1, 1.2, \ldots, 10\}$ is also calculated and listed in Table 2.

From Table 2 we can conclude that, for FCM, the average classification accuracy improves as the number of clusters grows, while FAMC, KFCM, and KFAMC all achieve 100% classification accuracy.

From Figs. 1–8, we can conclude that, on the whole, KFAMC has better performance than FCM, FAMC, and KFCM.

6.4. Two-dimension visualization of FCM, FAMC, KFCM, KFAMC on attribute (Sepal length, Sepal width) of Iris

To visualize the performance of FCM, FAMC, KFCM, and KFAMC, we select two classes, setosa and versicolor, with the attributes Sepal length and Sepal width from the Iris data set, giving a data set of 100 samples with two classes and two dimensions, and set the exponent index $m = 2$ in all four fuzzy clustering methods. Since all four clustering algorithms are unsupervised, we decide the clustering results based on $\arg\max_{1 \le k \le c} u_{kn}$ in each clustering algorithm.


[Figure: four scatter panels of Sepal length versus Sepal width, one per algorithm.]
Fig. 14. Clustering and classifying results for FCM, FAMC, KFCM, and KFAMC with added outlier (100, 3) and m = 2.8.

In this section we simply set $\gamma = 1.0$, and all the other parameters are set as for the KFAMC algorithm in Section 3. The results are shown in Fig. 9, where "∗" and "+" denote samples of the setosa and versicolor classes, respectively; if a sample is classified wrongly, it is denoted by "o". The two cluster centers are shown as solid dots.

Fig. 9 visualizes the clustering capabilities of FCM, FAMC, KFCM, and KFAMC; KFAMC has better classification capability than the other three fuzzy clustering algorithms.

In addition, the FCM, FAMC, KFCM, and KFAMC clustering results with m = 1.6 and m = 2.8 are shown in Figs. 10 and 11, respectively.

The visualization of clustering and classification with FCM, FAMC, KFCM, and KFAMC for m = 1.6, 2, and 2.8 demonstrates that KFAMC is the most effective of the four algorithms.

6.5. Robustness comparison of FCM, FAMC, KFCM, KFAMC on attribute (Sepal length, Sepal width) of Iris

To demonstrate the robustness of FCM, FAMC, KFCM, and KFAMC, we again select the 100 samples of the two classes setosa and versicolor, with the attributes Sepal length and Sepal width, from the Iris data set. In addition, we add an outlier with coordinates (100, 3) (see Ref. [16]). We set the exponent index m = 1.6, 2, and 2.8, respectively, in all


the fuzzy clustering methods of FCM, FAMC, KFCM, and KFAMC, and cluster the samples based on $\arg\max_{1 \le k \le c} u_{kn}$ in each clustering algorithm. Let $\gamma = 1.0$, and let all the other parameters be set as in Section 6.4. The results are shown in Figs. 12–14, where "∗" and "+" denote samples of the setosa and versicolor classes, respectively; if a sample is classified wrongly, it is denoted by "o". The outlier (100, 3) is indicated by an arrow, and the cluster centers are shown as solid dots.

In Fig. 12(a), there are still two cluster centers trained by the FCM algorithm. The two clusters seem coincident; however, the other cluster center lies at (99.9999, 3.0) and is not shown in the figure: it is badly affected by the outlier (100, 3). As a result, the samples of one class are entirely misclassified. From Fig. 12(b)–(d), we can conclude that the cluster centers of FAMC, KFCM, and KFAMC are much less affected by the outlier. KFAMC has the best anti-noise capability and classification accuracy compared to FCM, FAMC, and KFCM.

The FCM algorithm with m = 1.6 and 2.8 is likewise affected by the outlier, as shown in Figs. 13(a) and 14(a). All the robustness experiments show that KFAMC is more robust than FCM, FAMC, and KFCM.

7. Conclusion and discussion

In this paper, we propose a kernelized fuzzy clustering method, KFAMC, which extends the KFCM algorithm. Experimental results show that KFAMC is, on the whole, more effective and robust than FCM, FAMC, and KFCM. Since KFAMC with the squared stable function is KFCM, the comparison of KFAMC and KFCM is in fact a comparison of the performance of two stable functions within the KFAMC framework: our experiments compare KFAMC with the Cauchy stable function against KFAMC with the squared stable function (KFCM). Hence, we extend the KFCM algorithm and obtain a large category of kernelized fuzzy algorithms. Future work will focus on cluster number selection, exponential weight optimization, spatially constrained KFAMC (see Ref. [19]), and applying KFAMC to more pattern classification tasks.

Acknowledgments

The authors are grateful to the editor and the anonymous referees for their careful reviews, valuable suggestions, and comments, which improved the presentation of this paper. This project was partially supported by the China Postdoctoral Science Foundation (2003033145).

References

[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[2] Q.S. Cheng, Attribute means clustering, Systems Engineering Theory & Practice 9 (1998) 124–126.
[3] Q.S. Cheng, Mathematical Principle of Digital Signal Processing, second ed., Oil Industry Press, Beijing, 1993 (in Chinese).
[4] J.H. Chiang, P.Y. Hao, A new kernel-based fuzzy clustering approach: support vector clustering with cell growing, IEEE Trans. Fuzzy Systems 11 (2003) 518–527.
[5] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet. 3 (1974) 32–57.
[6] M. Girolami, Mercer kernel-based clustering in feature space, IEEE Trans. Neural Networks 13 (2002) 780–784.
[7] D.W. Kim, K.Y. Lee, D. Lee, K.H. Lee, A kernel-based subtractive clustering method, Pattern Recognition Lett. 26 (2005) 879–891.
[8] J.W. Liu, Statistical learning based on DTW similarity and its application in pattern recognition, Ph.D. Dissertation, Peking University, July 2002.
[9] J.W. Liu, Q.S. Cheng, Dynamic programming based gene chip recognition, Acta Sci. Natur. Univ. Pekinensis 38 (2002) 611–615.
[10] J.W. Liu, M.Z. Xu, Bezdek type fuzzy attribute C-means clustering algorithm, J. Beijing Univ. Aeronautics and Astronautics 33 (2007) 1121–1126 (in Chinese).
[11] V. Roth, V. Steinhage, Nonlinear discriminant analysis using kernel functions, in: S.A. Solla, T.K. Leen, K.-R. Muller (Eds.), Advances in Neural Information Processing Systems, MIT Press, Cambridge, 1999, pp. 568–574.
[12] B. Scholkopf, S. Mika, C.J.C. Burges, P. Knirsch, K.R. Muller, G. Ratsch, A.J. Smola, Input space versus feature space in kernel-based methods, IEEE Trans. Neural Networks 10 (1999) 1000–1017.
[13] H.B. Shen, S.T. Wang, X.J. Wu, Fuzzy kernel clustering with outliers, J. Software 15 (2004) 1021–1029 (in Chinese).
[14] X.C. Sun, R.Y. He, J.F. Feng, A novel classification method: AMC-ASVM, Acta Sci. Natur. Univ. Pekinensis 43 (2007) 82–84.
[15] V.N. Vapnik, The Nature of Statistical Learning Theory, second ed., Wiley, New York, 1998.
[16] K.L. Wu, M.S. Yang, Alternative C-means clustering algorithms, Pattern Recognition 35 (2002) 2267–2278.
[17] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338–353.
[18] D.Q. Zhang, S.C. Chen, Clustering incomplete data using kernel-based fuzzy C-means algorithm, Neural Process. Lett. 18 (2003) 155–162.
[19] D.Q. Zhang, S.C. Chen, A novel kernelized fuzzy C-means algorithm with application in medical image segmentation, Artificial Intelligence Med. 32 (2004) 37–50.
[20] L. Zhang, W.D. Zhou, L.C. Jiao, Kernel clustering algorithm, Chinese J. Comput. 25 (2002) 587–590 (in Chinese).