Face Recognition Using Total Margin-Based Adaptive Fuzzy Support Vector Machines
178 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 1, JANUARY 2007
Face Recognition Using Total Margin-Based Adaptive Fuzzy Support Vector Machines
Yi-Hung Liu, Member, IEEE, and Yen-Ting Chen
Abstract: This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem caused by outliers, through fuzzification of the penalty term, but also corrects the skew of the optimal separating hyperplane caused by highly imbalanced data sets, by using the different cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM, and the TAF-SVM is reformulated in both the linear and nonlinear cases.
Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, so that better recognition stability can be obtained.
Index Terms: Face recognition, kernel Fisher's discriminant analysis (KFDA), support vector machines (SVMs).
I. INTRODUCTION
MANY computer vision-based systems, such as surveillance, automatic access control, and human-robot interaction, have become increasingly important and attractive in recent years. Face recognition plays a critical role in those applications. Due to the complicated pattern distribution arising from large variations in facial expressions, facial details, illumination conditions, and viewpoints, face recognition has been considered one of the most difficult pattern-recognition research fields. Recently, various approaches have been proposed, e.g., [3], [5], [12], [15], [16], [22], [23], and [25]-[32]. From these systems, we can conclude that how to extract discriminating features from raw face images and how to accurately classify different people based on these features are the two keys to the development of reliable, high-accuracy face-recognition systems. This paper proposes a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM), which can enhance the performance of support vector machines (SVMs) for face recognition. In addition to classifier design, selecting a good feature extractor is also necessary.

Manuscript received July 1, 2005; revised March 1, 2006. This work was supported by the National Science Council of Taiwan, R.O.C., under Grant 93-2212-E-033-011.
The authors are with the Department of Mechanical Engineering, Chung Yuan Christian University, Chung-Li 32023, Taiwan, R.O.C. (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNN.2006.883013
A. Feature Selection
Principal component analysis (PCA) [12] and Fisher's linear discriminant analysis (FLDA) are widely used linear subspace analysis methods in facial feature extraction. Compared with PCA, FLDA is better able to extract discriminating features, since its objective is to maximize the between-class scatter and minimize the within-class scatter. FLDA has been successfully applied to face recognition in [32] and shown to be superior to PCA. Due to their linear nature, however, the capabilities of linear subspace analysis methods are still limited. Motivated by the success of the kernel trick in SVMs [8], [13], Schölkopf et al. [24] proposed kernel PCA (KPCA) by combining PCA with the kernel trick. Since the kernel trick is capable of representing nonlinear relations of input data, KPCA is better than PCA in terms of representation and reconstruction. This has also been evidenced by Kim's work [25], in which KPCA combined with a linear SVM classifier was applied to face recognition.
Another nonlinear subspace analysis method, called generalized discriminant analysis (GDA) or kernel Fisher's discriminant analysis (KFDA), was proposed by Baudat et al. [9]. KFDA first nonlinearly maps input data into a higher dimensional feature space in which FLDA is performed. Recently, several works have shown that KFDA is much more effective than KPCA in face recognition [3], [22], [23]. This is because KFDA keeps the nature of FLDA, which is based on the separability-maximization criterion, while the unsupervised learning-based KPCA is designed only for pattern representation/reconstruction. Therefore, this paper adopts KFDA as the feature extractor, so that the goals of extracting discriminating features and reducing input dimensionality can both be reached.
B. Classifier Design
Although KFDA has proven its superiority in discriminating-feature extraction, it suffers from the problem that its performance drops when it meets new inputs that were never considered in the training process, for example, a test face whose viewpoint does not face the camera while the training faces are frontal. Features extracted with KFDA are not invariant to such large changes, because KFDA is essentially an appearance-based method. In [3], the authors suggested that a classifier more sophisticated than the nearest neighbor (NN) classifier is still needed even when the KFDA algorithm is employed for multiview face recognition, because the face-pattern distribution can remain nonseparable in the KFDA-based subspace. In other words, a
1045-9227/$20.00 2006 IEEE
classifier with good generalization ability and minimal empirical risk is necessary to compensate for the drawback of the appearance-based feature extractor. On this basis, an SVM can serve as a good classifier candidate.
SVM was proposed by Vapnik et al. [13] and has been successfully applied to various problems such as the unsupervised segmentation of switching dynamics [46], face membership authentication [47], and image fusion [48]. Recently, several works on face recognition have used SVMs as classifiers and yielded satisfactory results [25]-[31]. In those systems, the SVMs used are regular SVMs. However, some studies not directly related to face recognition have indicated that SVM suffers from several critical problems when applied to certain data types. The first problem is that SVM is very sensitive to outliers, since the penalty weight for every data point is the same [5]-[7]. Second, the class-boundary-skew problem arises when SVM is applied to learning from imbalanced data sets in which the negative data heavily outnumber the positive data [1], [11], [17], [33]. The class boundary, i.e., the optimal separating hyperplane (OSH) learned by SVM, can be skewed towards the positive class. In consequence, the false-negative rate can be very high, making SVM ineffective in identifying targets that belong to the positive class; this is the class-boundary-skew problem. These two problems limit the performance of SVM. Unfortunately, they also occur in SVM-based face recognition.
In face recognition, for example, a face image with an exaggerated expression may appear as an outlier. If the outlier possesses a nonzero slack variable, the soft margin algorithm used in regular SVM will try to find a hyperplane that corrects the error, and the overfitting problem may follow. The other problem is that SVM was originally designed for binary classification, while face recognition is practically a multiclass classification problem. To extend the binary SVM to multiclass face recognition, most existing systems [25]-[31] use the one-against-all (OAA) method. As far as computational effort is concerned, OAA may be more efficient than the one-against-one (OAO) strategy. The advantage of OAA over OAO is that we only have to construct one hyperplane for each of the $c$ classes instead of $c(c-1)/2$ pairwise decision functions. This decreases the computational effort by a factor of $(c-1)/2$; in some examples, it can be brought down further [35]. This may be why the authors of [25]-[31] used OAA in their systems, though it has been reported that OAO is better than OAA in terms of classification accuracy [2], [36], [45]. When the OAA method is used, one of the classes is the target class and the remaining classes form the negative class for the learning of each OSH. The class-boundary-skew problem then occurs. Moreover, the larger the number of classes becomes, the more imbalanced the training set is when the OAA method is applied.
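As a quick check on the classifier counts above, the following sketch (with hypothetical helper names) computes the number of hyperplanes each multiclass strategy needs for $c$ subjects:

```python
def num_oaa_classifiers(c: int) -> int:
    """One-against-all: one hyperplane per class."""
    return c

def num_oao_classifiers(c: int) -> int:
    """One-against-one: one hyperplane per pair of classes."""
    return c * (c - 1) // 2

# For a 30-subject database such as the CYCU set used later:
print(num_oaa_classifiers(30))  # 30
print(num_oao_classifiers(30))  # 435
```

Note also that each OAA subproblem pits one subject against the remaining $c - 1$, so the training-set imbalance worsens as $c$ grows.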
To remedy these problems when SVM is applied to face recognition, this paper proposes a new classifier called TAF-SVM. TAF-SVM solves the overfitting problem by fuzzifying the training set, which is equivalent to fuzzifying the penalty term [7], [44]. In this manner, training data are no longer treated equally but are treated differently according to their relative importance. Besides, TAF-SVM also embodies the different cost algorithm [11], [33], by which TAF-SVM can adapt itself to the imbalanced training set, so that the false-negative rate is reduced and the recognition accuracy is enhanced.
Another contribution of this paper is that we replace the soft margin algorithm with the total margin algorithm [4] in TAF-SVM. The total margin algorithm considers not only the errors but also the information of correctly classified data points in the construction of the OSH. Compared with the conventional soft margin algorithm used in the regular SVM, a lower generalization error bound can be reached. This facilitates face recognition, since generalization ability plays a very important role in the prediction of unseen face images. We combine these approaches in TAF-SVM and show that the face-recognition accuracy improves significantly compared with applying any single approach, including the regular SVM.
This paper is organized as follows. Section II presents the KFDA-based feature extraction method. A brief review of SVM is given in Section III. Then, the problems of applying SVM to face recognition are pointed out in detail, together with the solutions embodied in the TAF-SVM. In Section IV, we reformulate the TAF-SVM in both the linear and nonlinear cases. Experimental results are presented and discussed in Section V. Conclusions are drawn in Section VI.
II. FEATURE EXTRACTION VIA KFDA
A face image is first scanned row by row to form a vector $\mathbf{x} \in \mathbb{R}^n$. The training set contains $N$ images of $c$ subjects, namely $X = \{X_1, \dots, X_c\}$ with $N = \sum_{i=1}^{c} n_i$, where $X_i$ is the set of class $i$ and $n_i$ is the cardinality of $X_i$. For KFDA, the within-class scatter $S_w^{\Phi}$ and between-class scatter $S_b^{\Phi}$ in the feature space $F$ are given by

$$S_w^{\Phi} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \bigl(\Phi(\mathbf{x}_j^i) - \mathbf{m}_i^{\Phi}\bigr)\bigl(\Phi(\mathbf{x}_j^i) - \mathbf{m}_i^{\Phi}\bigr)^T \quad (1)$$

$$S_b^{\Phi} = \sum_{i=1}^{c} n_i\, \mathbf{m}_i^{\Phi} \bigl(\mathbf{m}_i^{\Phi}\bigr)^T \quad (2)$$

where $\Phi$ is a nonlinear mapping function that maps the data from the input space to a higher dimensional feature space, $\Phi: \mathbb{R}^n \to F$, $\mathbf{x}_j^i$ denotes the $j$th face image in the $i$th class, and $\mathbf{m}_i^{\Phi}$ is the mean of the mapped data of class $i$. The mapped data are centered in $F$ [9], [24], so the global mean does not appear in (2).

KFDA seeks to find a set of discriminating orthonormal eigenvectors $\mathbf{w}$ for the projection of an input face image by performing FLDA in $F$, in which the between-class scatter is maximized and the within-class scatter is minimized. This is equivalent to solving the following maximization problem:

$$\mathbf{w}^{\mathrm{opt}} = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^T S_b^{\Phi} \mathbf{w}}{\mathbf{w}^T S_w^{\Phi} \mathbf{w}}. \quad (3)$$

Solutions $\mathbf{w}$ associated with the largest nonzero eigenvalues must lie in the span of all mapped data; so, for $\mathbf{w}$, there exists a normalized expansion coefficient vector $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_N]^T$ such that

$$\mathbf{w} = \sum_{p=1}^{N} \alpha_p \Phi(\mathbf{x}_p). \quad (4)$$
Thus, for a testing face image $\mathbf{x}$, its projection on the $k$th eigenvector $\mathbf{w}_k$ is computed by

$$y_k = \mathbf{w}_k^T \Phi(\mathbf{x}). \quad (5)$$

We do not need to know the nonlinear mapping $\Phi$ exactly. By using the kernel trick, the projection can be easily obtained by

$$y_k = \sum_{p=1}^{N} \alpha_p^k\, k(\mathbf{x}_p, \mathbf{x}) \quad (6)$$

where the kernel function is defined as the dot product of the mapped vectors

$$k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}). \quad (7)$$

The radial basis function (RBF) kernel is used in this paper and is expressed as

$$k(\mathbf{x}, \mathbf{y}) = \exp\bigl(-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2\bigr) \quad (8)$$

where the width $\sigma$ is specified a priori by the user.

To project a face image into the new coordinates, the eigenvectors associated with the first $d$ largest nonzero eigenvalues are selected to construct the transformation matrix $W = [\mathbf{w}_1, \dots, \mathbf{w}_d]$, such that the dimensionality of a face image is reduced from $n$ to $d$. To simplify the notation in the following, we let the number of projection vectors $d$ equal its maximum value, $c - 1$.
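The kernel projection of (6)-(8) can be sketched in a few lines of NumPy. Here the expansion coefficients `alphas` are made-up placeholders, since the real coefficients come from solving the KFDA eigenproblem, which is omitted:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel of (8): k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kfda_project(x, train, alphas, sigma=1.0):
    """Projection of (6): y_k = sum_p alpha_p^k k(x_p, x), one output
    per eigenvector.

    train  : (N, n) training images as row vectors
    alphas : (d, N) expansion coefficients, one row per eigenvector
    """
    k_vec = np.array([rbf_kernel(xp, x, sigma) for xp in train])
    return alphas @ k_vec  # d-dimensional feature vector

# Toy usage with made-up coefficients:
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alphas = np.array([[0.5, -0.25, -0.25]])
feat = kfda_project(np.array([0.0, 0.0]), train, alphas, sigma=1.0)
```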
III. BASIC IDEAS OF TAF-SVM
A. Basic Review of SVM
In SVM, the training set is given as $T = \{(\mathbf{x}_p, y_p)\}_{p=1}^{N}$, where $\mathbf{x}_p$ is the training data and $y_p$ is its class label, being either $+1$ or $-1$. Let $\mathbf{w}$ and $b$ be the weight vector and the bias of the separating hyperplane; the objective of SVM is to find the OSH by maximizing the margin of separation and minimizing the training errors:

Minimize $\quad \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{p=1}^{N}\xi_p \quad (9)$

Subject to $\quad y_p\bigl(\mathbf{w}\cdot\Phi(\mathbf{x}_p) + b\bigr) \ge 1 - \xi_p \quad (10a)$

$\qquad\qquad \xi_p \ge 0, \quad p = 1, \dots, N \quad (10b)$

where $\Phi$ is the nonlinear mapping function which maps the data from the input space into a higher dimensional feature space. The $\xi_p$ are slack variables representing the error measures of the data points. The penalty weight $C$ is a free parameter; it measures the size of the penalties assigned to the errors. Minimizing the first term in (9) is equivalent to maximizing the margin of separation, which is related to minimizing the Vapnik-Chervonenkis (VC) dimension. The formulation of the objective function in (9) is in perfect accord with the structural risk minimization (SRM) principle, by which good generalization ability can be achieved [8].

By introducing the Lagrangian, the primal constrained optimization problem can be solved through its dual form. The predicted class of an unseen data point $\mathbf{x}$ is the output of the decision function

$$f(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{s=1}^{N_s} \alpha_s y_s\, k(\mathbf{x}_s, \mathbf{x}) + b\Bigr) \quad (11)$$

where the $\alpha_p$ are the nonnegative Lagrange multipliers for the first inequality constraints of the primal problem (10a), the $\mathbf{x}_s$ are the support vectors, for which $\alpha_s > 0$, and $N_s$ is the number of support vectors. The optimal value of $b$ is calculated with the Kuhn-Tucker (KT) complementary conditions.
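For illustration only, the soft-margin machinery of (9)-(11) is available in off-the-shelf libraries; a minimal scikit-learn sketch on synthetic two-class data (an independent implementation, not the authors' code):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters standing in for face features.
X = np.vstack([rng.normal(+2.0, 1.0, (40, 2)),
               rng.normal(-2.0, 1.0, (40, 2))])
y = np.array([+1] * 40 + [-1] * 40)

# C is the penalty weight for the slack variables in (9); the RBF
# kernel plays the role of the nonlinear mapping.
clf = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X, y)

# Decision function of (11): sign of a kernel expansion over the
# support vectors plus the bias b.
print(clf.predict([[2.0, 2.0], [-2.0, -2.0]]))
```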
B. Basic Ideas of TAF-SVM
1) Dealing With the Overfitting Problem via Fuzzification of the Training Set: One issue in using SVM for face recognition is how to tackle the overfitting problem, since large variations resulting from facial expressions and viewpoints may produce outliers in the pattern distribution. As shown in previous research [5], [6], SVM is very sensitive to outliers and noise, since the penalty term of SVM treats every data point equally in the training process. This may result in overfitting if one or a few data points have relatively very large values of $\xi_p$. Wang et al. and Huang et al. proposed the fuzzy SVM (FSVM) to deal with the overfitting problem [7], [44], based on the idea that a membership value is assigned to each data point according to its relative importance in its class, so that a less important data point is punished less. To achieve this, the penalty term is redefined in FSVM as the fuzzy penalty term $C\sum_{p=1}^{N} s_p \xi_p$, where $s_p$ is the membership value denoting the relative importance of point $\mathbf{x}_p$ to its own class.
We incorporate the concept of FSVM into the proposed TAF-SVM. The training set is first divided into two sets, the fuzzy positive training set $T^+$ and the fuzzy negative training set $T^-$, denoted by

$$T^+ = \bigl\{(\mathbf{x}_p^+, s_p^+)\bigr\}_{p=1}^{N^+}, \quad \delta \le s_p^+ \le 1 \quad (12a)$$

$$T^- = \bigl\{(\mathbf{x}_q^-, s_q^-)\bigr\}_{q=1}^{N^-}, \quad \delta \le s_q^- \le 1 \quad (12b)$$

where the membership values $s_p^+$ and $s_q^-$ stand for the relative importance of the points $\mathbf{x}_p^+$ and $\mathbf{x}_q^-$ to the positive class and negative class, respectively. The variable $\delta$ is a small positive real number. $N^+$ and $N^-$ are the cardinalities of the fuzzy positive training set and fuzzy negative training set, respectively, and $N^+ + N^- = N$.
2) Adaptation to Imbalanced Face Training Sets via the Different Cost Algorithm: Face recognition is practically a multiclass classification task, while SVM was designed for binary classification. The OAO and OAA methods are two popular ways to realize SVM-based multiclass classification [2]. Based on the pairwise learning framework, the OAO method needs to construct $c(c-1)/2$ OSHs and use the voting strategy to make final decisions if there are $c$ subjects to be recognized. Compared with the OAO method, the OAA method, by which only $c$ OSHs need to
be learned, is more effective in terms of computational effort. This is why most existing SVM-based face-recognition systems chose the OAA method to accomplish the multiclass classification task [25]-[31]. However, a critical problem follows: the class-boundary-skew phenomenon, which had never been pointed out in these SVM- and OAA-based face-recognition systems.

When the OAA method is used to learn each OSH for multiclass face recognition, one of the subjects forms the positive class and the rest form the negative class. In this manner, the training faces of the negative class significantly outnumber the training faces of the positive class; the ratio of the size of the negative class to the size of the positive class is $(c-1):1$. A very imbalanced face training set is produced, and the larger the number of subjects, the heavier the imbalance of the face training set.

It has recently been reported that the success of SVM is limited when it is applied to imbalanced data sets [1], [11], [17], [33], because the OSH is skewed towards the positive class, which results in the class-boundary-skew phenomenon. To solve this critical problem, some remedies have been proposed, including the oversampling and undersampling techniques [18], combining oversampling with undersampling [19], the synthetic minority oversampling technique (SMOTE) [20], different error cost algorithms [1], [33], the class-boundary-alignment algorithm [17], and SMOTE with the different cost algorithm (SDC) [11].
Those methods can be divided into three categories. The methods proposed in [18]-[20] process the data before feeding them into the classifier: the oversampling technique duplicates the positive data by interpolation, while the undersampling technique removes redundant negative data to reduce the imbalance ratio. They are classifier-independent approaches. The second category is the algorithm-based approach [1], [17], [33]. For example, Veropoulos et al. [1] and Lin et al. [33] proposed different cost algorithms, suggesting that by assigning a heavier penalty to the smaller class, the skew of the OSH can be corrected. The third category is the SDC method, which combines SMOTE with the different error cost algorithm [11].
For face recognition, since each training datum carries particular face information, we choose not to use any presampling techniques. Instead, the proposed TAF-SVM adopts the different cost algorithm to adapt to the imbalanced face training sets that arise in OAA-based multiclass classification. Another reason for using this algorithm is that the different cost algorithm was originally designed for solving the skew problem of SVM. By combining the fuzzy penalty and the different cost algorithm, the proposed fuzzified biased penalties are expressed as

$$C^+ \sum_{p=1}^{N^+} s_p^+ \xi_p^+ + C^- \sum_{q=1}^{N^-} s_q^- \xi_q^- \quad (13)$$

where $C^+$ and $C^-$ are the penalty weights for the errors of the positive class and negative class, respectively. The slack variables $\xi_p^+$ and $\xi_q^-$ are the error measures of the data belonging to the positive class and the negative class, respectively. By setting $C^+ > C^-$, to meet the central concept of the different cost algorithm, the OSH will be much more distant from the smaller class.
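The effect of the fuzzified biased penalty in (13) can be approximated with scikit-learn's `SVC`, using `class_weight` for the different error costs and per-sample weights for the memberships. A sketch under those assumptions, with synthetic data standing in for face features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# OAA-style imbalanced toy set: 10 "target subject" faces vs 90 others.
X = np.vstack([rng.normal(+2.0, 1.0, (10, 2)),
               rng.normal(-2.0, 1.0, (90, 2))])
y = np.array([+1] * 10 + [-1] * 90)

# Different cost algorithm: heavier penalty C+ for the minority class.
clf = SVC(kernel="rbf", C=1.0, class_weight={+1: 9.0, -1: 1.0})

# Fuzzification: membership values s_p in (0, 1] down-weight outliers.
memberships = np.ones(len(y))
memberships[0] = 0.1  # pretend sample 0 is a suspected outlier
clf.fit(X, y, sample_weight=memberships)
```

Raising the minority-class weight pushes the learned boundary away from the smaller class, which is the skew correction the text describes.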
3) Improvement of the Generalization Error Bound via the Total Margin Algorithm: Because it is impossible to take all face information into consideration, i.e., the available face training samples are always finite and not numerous, the generalization ability of a classifier dominates the prediction accuracy for unseen faces. The soft margin algorithm used in SVM relaxes the measure of margin by introducing slack variables for the errors. An OSH is found with the maximal margin of separation by maximizing the minimum distance between a few extreme values (the support vectors) and the separating hyperplane. However, using only a few extreme training data causes a loss of information, because most of the information is contained in the nonextreme data, which form the majority of the training set. Feng et al. proposed the scaled SVM [21], which employs not only the support vectors but also the means of the classes to reduce the generalization error of SVM. However, the face-pattern distribution is generally non-Gaussian and highly nonconvex [3], [22]; that is, the mean of a class may not be very representative. Another approach for improving the generalization error bound, called the total margin algorithm, was proposed by Yoon et al. [4].

The total margin algorithm extends the soft margin algorithm by introducing extra surplus variables $\rho_p$ for the correctly classified data points. The surplus variable measures the distance between a correctly classified data point and the hyperplane $\mathbf{w}\cdot\Phi(\mathbf{x}) + b = \pm 1$, according to whether the data point belongs to the positive or negative class. In addition to minimizing the sum of slack variables (over the misclassified data points) while maximizing the margin of separation, as in the soft margin algorithm, the total margin algorithm suggests that the sum of surplus variables (over the correctly classified data points) should be maximized simultaneously. Maximizing the sum of surplus variables $\sum_p \rho_p$ is equivalent to minimizing $-\sum_p \rho_p$. Therefore, the total margin-based SVM is formulated as the constrained optimization problem

Minimize $\quad \frac{1}{2}\|\mathbf{w}\|^2 + C_1\sum_{p=1}^{N}\xi_p - C_2\sum_{p=1}^{N}\rho_p$

Subject to $\quad y_p\bigl(\mathbf{w}\cdot\Phi(\mathbf{x}_p) + b\bigr) \ge 1 - \xi_p + \rho_p, \quad \xi_p \ge 0, \quad \rho_p \ge 0, \quad p = 1, \dots, N \quad (14)$

where $C_1$ is the weight for the misclassified data points (the slack variables $\xi_p$) and $C_2$ is the weight for the correctly classified data points, i.e., the surplus variables $\rho_p$.
From (14), we can see that the construction of the OSH is no longer controlled only by a few extreme data points, many of which may be misclassified, but also by the correctly classified data points, which are the majority of the training set. The advantages are clear. First, the essence of the soft margin-based SVM is to rely only on the set of data points that take extreme values, the so-called support vectors. From the statistics of extreme values, we know that the disadvantage of such an approach is that the information contained in most samples (the nonextreme values) is lost, so such an approach is bound to be less efficient than one that takes the lost information into account [21], i.e., the correctly classified data points. Therefore, the total margin algorithm can be more efficient and attain
Fig. 1. Geometric interpretation of slack variables and surplus variables used in TAF-SVM.
better generalization ability than the soft margin algorithm, since the information of all samples is considered in the construction of the OSH. Second, from the objective in (14), we can see that minimizing $-\sum_p \rho_p$ implies that the obtained OSH gains more correctly classified data points, because minimizing $-\sum_p \rho_p$ is equivalent to maximizing the sum of surplus variables. Therefore, in this paper, we adopt the total margin algorithm as one of the bases in the development of TAF-SVM for face recognition.

To facilitate the reformulation of TAF-SVM in Section IV, the combined use of the surplus variables and the imbalanced penalties is illustrated here. Since TAF-SVM considers both the different cost algorithm and the total margin algorithm, the geometric relationship between the positive/negative slack variables and the positive/negative surplus variables is illustrated in Fig. 1.

In Fig. 1, the white circles and the black circles denote the data points belonging to the positive class and the negative class, respectively. The slack variable $\xi_p^+$ measures the distance between the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = +1$ and a misclassified data point that is supposed to be classified as positive. Conversely, $\xi_q^-$ is the distance from a misclassified negative data point to the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = -1$. The surplus variable $\rho_p^+$ measures the distance between a correctly classified positive data point and the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = +1$, and the surplus variable $\rho_q^-$ measures the distance between a correctly classified negative data point and the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = -1$. All these variables are nonnegative, and at least one of $\xi_p$ and $\rho_p$ is zero for any data point $\mathbf{x}_p$. Furthermore, consider the positive training data points: any one of them can have two different classification results, misclassified or correctly classified. Table I summarizes the relationship between the slack variable $\xi_p^+$ and the surplus variable $\rho_p^+$ according to the classification result. Notice that the point used in Table I can be any of the positive training data points, while the one shown in Fig. 1 is just one misclassified positive training data point.
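For concreteness, the objective of (14) in the linear case can be evaluated directly once the slack and surplus variables are read off a candidate hyperplane; a NumPy sketch with hypothetical parameter names:

```python
import numpy as np

def total_margin_objective(w, b, X, y, C1=1.0, C2=0.1):
    """Total-margin objective of (14), linear case (a sketch).

    For each point, f = y * (w . x + b). The slack xi = max(0, 1 - f)
    measures errors; the surplus rho = max(0, f - 1) measures how far
    a correctly classified point lies beyond the margin. At most one
    of the two is nonzero per point, as noted in the text.
    """
    f = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - f)    # slack variables (errors)
    rho = np.maximum(0.0, f - 1.0)   # surplus variables (correct side)
    return 0.5 * w @ w + C1 * xi.sum() - C2 * rho.sum()

# Two correctly classified points, both a distance 1 beyond the margin:
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([+1.0, -1.0])
obj = total_margin_objective(np.array([1.0, 0.0]), 0.0, X, y)
```

Here the surplus term rewards the hyperplane for placing correctly classified points well beyond the margin, which is exactly the extra term the total margin algorithm adds.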
IV. REFORMULATION OF TAF-SVM
In this section, we reformulate the proposed TAF-SVM for the linearly nonseparable and nonlinearly nonseparable cases based on the aforementioned ideas.

TABLE I: INTERPRETATION OF THE RELATIONSHIPS BETWEEN SLACK VARIABLES, SURPLUS VARIABLES, AND POINT LOCATIONS, TAKING THE POSITIVE TRAINING DATA POINTS AS AN EXAMPLE
A. Linearly Nonseparable Case
The primal problem for the linearly nonseparable case is reformulated as follows:

Minimize $\quad \frac{1}{2}\|\mathbf{w}\|^2 + C_1^+\sum_{p=1}^{N^+} s_p^+ \xi_p^+ + C_1^-\sum_{q=1}^{N^-} s_q^- \xi_q^- - C_2^+\sum_{p=1}^{N^+} \rho_p^+ - C_2^-\sum_{q=1}^{N^-} \rho_q^- \quad (15)$

Subject to
$\quad \mathbf{w}\cdot\mathbf{x}_p^+ + b \ge 1 - \xi_p^+ + \rho_p^+ \quad (16a)$
$\quad \mathbf{w}\cdot\mathbf{x}_q^- + b \le -1 + \xi_q^- - \rho_q^- \quad (16b)$
$\quad \xi_p^+ \ge 0 \quad (16c)$
$\quad \xi_q^- \ge 0 \quad (16d)$
$\quad \rho_p^+ \ge 0 \quad (16e)$
$\quad \rho_q^- \ge 0 \quad (16f)$

where $C_1^+$ and $C_1^-$ are the weights for the positive and negative slack variables, respectively, and $C_2^+$ and $C_2^-$ are the weights for the positive and negative surplus variables, respectively. It is difficult to solve this constrained optimization problem directly. Similar to SVM, the primal optimization problem of TAF-SVM is
transformed to the dual form by introducing a set of nonnegative Lagrange multipliers $\alpha_p^+$, $\alpha_q^-$, $\beta_p^+$, $\beta_q^-$, $\gamma_p^+$, and $\gamma_q^-$ for the constraints (16a)-(16f), yielding the Lagrangian

$$\begin{aligned} L = {} & \frac{1}{2}\|\mathbf{w}\|^2 + C_1^+\sum_{p} s_p^+\xi_p^+ + C_1^-\sum_{q} s_q^-\xi_q^- - C_2^+\sum_{p}\rho_p^+ - C_2^-\sum_{q}\rho_q^- \\ & - \sum_{p}\alpha_p^+\bigl(\mathbf{w}\cdot\mathbf{x}_p^+ + b - 1 + \xi_p^+ - \rho_p^+\bigr) + \sum_{q}\alpha_q^-\bigl(\mathbf{w}\cdot\mathbf{x}_q^- + b + 1 - \xi_q^- + \rho_q^-\bigr) \\ & - \sum_{p}\beta_p^+\xi_p^+ - \sum_{q}\beta_q^-\xi_q^- - \sum_{p}\gamma_p^+\rho_p^+ - \sum_{q}\gamma_q^-\rho_q^-. \end{aligned} \quad (17)$$

Differentiating with respect to $\mathbf{w}$, $b$, $\xi_p^+$, $\xi_q^-$, $\rho_p^+$, and $\rho_q^-$ yields

$$\mathbf{w} = \sum_{p}\alpha_p^+\mathbf{x}_p^+ - \sum_{q}\alpha_q^-\mathbf{x}_q^- \quad (18a)$$
$$\sum_{p}\alpha_p^+ = \sum_{q}\alpha_q^- \quad (18b)$$
$$C_1^+ s_p^+ - \alpha_p^+ - \beta_p^+ = 0 \quad (18c)$$
$$C_1^- s_q^- - \alpha_q^- - \beta_q^- = 0 \quad (18d)$$
$$\alpha_p^+ - C_2^+ - \gamma_p^+ = 0 \quad (18e)$$
$$\alpha_q^- - C_2^- - \gamma_q^- = 0. \quad (18f)$$

Resubstituting these equations into the primal problem, the dual problem is obtained:

Maximize $\quad \sum_{p}\alpha_p^+ + \sum_{q}\alpha_q^- - \frac{1}{2}\sum_{p,p'}\alpha_p^+\alpha_{p'}^+\,\mathbf{x}_p^+\cdot\mathbf{x}_{p'}^+ - \frac{1}{2}\sum_{q,q'}\alpha_q^-\alpha_{q'}^-\,\mathbf{x}_q^-\cdot\mathbf{x}_{q'}^- + \sum_{p,q}\alpha_p^+\alpha_q^-\,\mathbf{x}_p^+\cdot\mathbf{x}_q^- \quad (19)$

Subject to
$\quad \sum_{p}\alpha_p^+ = \sum_{q}\alpha_q^- \quad (20a)$
$\quad C_2^+ \le \alpha_p^+ \le C_1^+ s_p^+, \quad p = 1, \dots, N^+ \quad (20b)$
$\quad C_2^- \le \alpha_q^- \le C_1^- s_q^-, \quad q = 1, \dots, N^-. \quad (20c)$
B. Nonlinearly Nonseparable Case
The dual form for the nonlinearly nonseparable case is obtained by using a kernel function $k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x})\cdot\Phi(\mathbf{y})$, where $\Phi$ is a nonlinear map. The objective is as follows:

Maximize $\quad \sum_{p}\alpha_p^+ + \sum_{q}\alpha_q^- - \frac{1}{2}\sum_{p,p'}\alpha_p^+\alpha_{p'}^+\,k(\mathbf{x}_p^+, \mathbf{x}_{p'}^+) - \frac{1}{2}\sum_{q,q'}\alpha_q^-\alpha_{q'}^-\,k(\mathbf{x}_q^-, \mathbf{x}_{q'}^-) + \sum_{p,q}\alpha_p^+\alpha_q^-\,k(\mathbf{x}_p^+, \mathbf{x}_q^-). \quad (21)$

The constraints for this maximization problem are the same as those in the dual form of the linear case, (20a)-(20c). The KT complementary conditions play a key role in the optimality. The KT complementary conditions for the nonlinear TAF-SVM are given by

$$\alpha_p^+\bigl(\mathbf{w}\cdot\Phi(\mathbf{x}_p^+) + b - 1 + \xi_p^+ - \rho_p^+\bigr) = 0 \quad (22a)$$
$$\alpha_q^-\bigl(\mathbf{w}\cdot\Phi(\mathbf{x}_q^-) + b + 1 - \xi_q^- + \rho_q^-\bigr) = 0 \quad (22b)$$
$$\bigl(C_1^+ s_p^+ - \alpha_p^+\bigr)\xi_p^+ = 0 \quad (22c)$$
$$\bigl(C_1^- s_q^- - \alpha_q^-\bigr)\xi_q^- = 0 \quad (22d)$$
$$\bigl(\alpha_p^+ - C_2^+\bigr)\rho_p^+ = 0 \quad (22e)$$
$$\bigl(\alpha_q^- - C_2^-\bigr)\rho_q^- = 0. \quad (22f)$$

The optimal value of $b$ can be calculated with any data point in the training set satisfying the KT complementary conditions. However, from a numerical perspective, it is better to take the mean value of $b$ over all such data [14]. Therefore, the optimal value of $b$ is computed by

$$b = \frac{1}{|\tilde{T}^+| + |\tilde{T}^-|}\sum_{\mathbf{x}_p \in \tilde{T}^+ \cup \tilde{T}^-}\bigl(y_p - \mathbf{w}\cdot\Phi(\mathbf{x}_p)\bigr) \quad (23)$$

where $\tilde{T}^+$ and $\tilde{T}^-$ are the subsets of $T^+$ and $T^-$, respectively, whose points satisfy the bounds strictly:

$$\tilde{T}^+ = \bigl\{\mathbf{x}_p^+ \in T^+ : C_2^+ < \alpha_p^+ < C_1^+ s_p^+\bigr\} \quad (24a)$$
$$\tilde{T}^- = \bigl\{\mathbf{x}_q^- \in T^- : C_2^- < \alpha_q^- < C_1^- s_q^-\bigr\}. \quad (24b)$$

For an unseen data point $\mathbf{x}$, its predicted class is the output of the decision function

$$f(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{p=1}^{N^+}\alpha_p^+\,k(\mathbf{x}_p^+, \mathbf{x}) - \sum_{q=1}^{N^-}\alpha_q^-\,k(\mathbf{x}_q^-, \mathbf{x}) + b\Bigr). \quad (25)$$
According to the formulation of the TAF-SVM, three main
properties are discussed and summarized as follows.
1) From an inspection of the constraints in the dual form, we can see that the Lagrange multipliers ($\alpha_p^+$ and $\alpha_q^-$) are bounded, with upper bounds ($C_1^+ s_p^+$ and $C_1^- s_q^-$) and lower bounds ($C_2^+$ and $C_2^-$). Therefore, according to (20b) and (20c), all training data are support vectors for TAF-SVM, since all data have nonzero $\alpha$, which meets the role of the total margin algorithm. On the contrary, in the soft margin-based SVM, the OSH is constructed only by the few data points whose $\alpha_p$ satisfy $\alpha_p > 0$.
2) In SVM, the $\alpha_p$ are bounded by the range $[0, C]$. For all data points, the feasible region is fixed once $C$ is chosen. In TAF-SVM, by contrast, the feasible region is dynamic, since the upper and lower bounds for every data point differ: the bounds of the feasible region are functions of the assigned membership values.
This means that a less important data point has a narrower feasible region.

Another question is how to fuzzify the training set efficiently. Basically, the rule for assigning proper membership values can depend on the relative importance of the data points to their own classes. Therefore, for a positive data point, its membership value can be calculated with the membership function

$$s_p^+ = \begin{cases} 1 - \dfrac{\|\mathbf{x}_p^+ - \mathbf{m}^+\|}{\max_{q}\|\mathbf{x}_q^+ - \mathbf{m}^+\|} & \text{if this value exceeds } \delta \\ \delta & \text{otherwise} \end{cases} \quad (26)$$

where $\|\cdot\|$ denotes the Euclidean distance, the lower bound $\delta$ is a nonnegative small real number and is user-defined, and $\mathbf{m}^+$ is the mean of all data points in $T^+$. The membership values of all the fuzzified positive training data are thus bounded in $[\delta, 1]$. The same procedure is used for the fuzzification of the negative data, where the mean is calculated over all the negative data.
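A membership assignment in the spirit of (26), distance to the class mean normalized by the largest such distance and lower-bounded by a small $\delta$, can be sketched as follows (the guard constant `eps` is an assumption added to avoid division by zero):

```python
import numpy as np

def memberships(X_class, delta=0.05, eps=1e-6):
    """Assign membership s_p in [delta, 1] to each point of one class,
    decreasing with distance from the class mean (cf. (26))."""
    m = X_class.mean(axis=0)                 # class mean
    d = np.linalg.norm(X_class - m, axis=1)  # distances to the mean
    s = 1.0 - d / (d.max() + eps)            # near 1 close to the mean
    return np.maximum(s, delta)              # lower-bounded by delta

X_pos = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])  # last: outlier
s = memberships(X_pos)
```

The far-away point receives the floor value `delta`, so its slack variable is penalized least, which is how the fuzzified penalty defuses outliers.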
3) In SVM, only one free parameter $C$ has to be adjusted. A more complex procedure may be expected for TAF-SVM, since there are four free parameters to be adjusted: $C_1^+$, $C_1^-$, $C_2^+$, and $C_2^-$. However, the adjustment process can be simplified by exploiting some relationships. First, the inequality constraints in (20b) and (20c) say that the two inequalities $C_1^+ s_p^+ \ge C_2^+$ and $C_1^- s_q^- \ge C_2^-$ must hold. Second, based on the concept of adaptation to imbalanced data sets, the relationships $C_1^+ > C_1^-$ and $C_2^+ > C_2^-$ are required if the size of the positive class is smaller than that of the negative class. Two ratios are defined to simplify the adjustment of these parameters:

$$r_1 = \frac{C_1^+}{C_1^-} \quad (27)$$

$$r_2 = \frac{C_1^+}{C_2^+} = \frac{C_1^-}{C_2^-}. \quad (28)$$

By fixing the value of any one of the four parameters $C_1^+$, $C_1^-$, $C_2^+$, and $C_2^-$, and setting the values of $r_1$ and $r_2$, the other three parameters can be obtained directly. In the case $r_1 = 1$, no adaptation effort is made for the imbalanced case. Furthermore, TAF-SVM reduces to the standard SVM if $r_1$ equals 1 and $r_2$ goes to infinity ($C_2^+ \to 0$, $C_2^- \to 0$, and $C_1$ is finite) and the membership values are all set to 1. Note that a very small number is added to the denominator to avoid division by zero when $C_2 = 0$. Also, when $r_1 = 1$, $r_2 = \infty$, and $0 < s_p \le 1$, the proposed TAF-SVM becomes the FSVM. Therefore, SVM and FSVM can be viewed as two particular cases of the proposed TAF-SVM.
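Under one plausible reading of the two ratios, $r_1$ as the positive-to-negative slack-weight ratio for imbalance adaptation and $r_2$ as a common slack-to-surplus ratio shared by both classes, the four penalty weights follow from a single fixed value; the helper below is hypothetical:

```python
def taf_svm_params(C1_pos=10.0, r1=9.0, r2=100.0):
    """Derive the four penalty weights from one fixed value and two
    ratios (a hypothetical reading of (27)-(28))."""
    C1_neg = C1_pos / r1   # r1 adapts to class imbalance
    C2_pos = C1_pos / r2   # r2 trades slack against surplus weight
    C2_neg = C1_neg / r2
    return C1_pos, C1_neg, C2_pos, C2_neg

# r1 = 1 with a very large r2 drives both surplus weights toward 0,
# recovering the standard SVM limit noted in the text.
params = taf_svm_params(C1_pos=10.0, r1=1.0, r2=1e12)
```

With $r_1 = 1$ and $r_2$ effectively infinite, the surplus weights vanish; with memberships all equal to 1 this is the SVM special case, and with memberships in $(0, 1]$ it is the FSVM special case.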
V. EXPERIMENTAL RESULTS
A. Experiment on CYCU Face Database
Here, we present a set of experiments that were carried out using the Chung Yuan Christian University (CYCU) multiview face database [10]. The CYCU multiview face database contains 3150 face images of 30 subjects and involves variations in facial expression, viewpoint, and sex. Each image is a 112 x 92
Fig. 2. Thirty subjects of the CYCU multiview face database.
Fig. 3. Collected 21 face images of one of the 30 subjects in the CYCU multiview face database.
24-bit pixel matrix. The viewpoint is governed by two parameters, the rotation angle and the tilt angle, where the rotation angle takes seven values and the tilt angle takes three. For each viewpoint of each subject, we prepared five face images with different facial expressions; therefore, each subject has 105 face images. Fig. 2 shows the 30 subjects in this database. All images contain face contours and hair. The color of the background is nearly white, and the lighting condition is controlled to be uniform. Fig. 3 shows the collected 21 images containing the 21 different viewpoints of one subject.
1) Analysis of Face-Pattern Distributions in KFDA-Based Subspace: Two cases are analyzed in this subsection. Before the experiments, all colored face images are transferred to gray-level images and the contrast of each gray image is enhanced by histogram equalization. All gray-level images of 112 × 92 pixels are resized to 28 × 23 pixel matrices before the feature extraction. In addition, the features extracted by KFDA in Cases 1 and 2 are the first two most discriminating features.

Case 1: Fig. 4 depicts the distribution of the face patterns of five subjects randomly chosen from the database in the KFDA-based subspaces. Each subject contains 21 patterns covering the whole range of rotation and tilt angles, i.e., each of the 21 viewpoints provides one image for one person. Two observations are as follows. First, we observe that there exists an outlier for one of the classes, as shown in Fig. 4(a). This outlier is very far from the main body of its class and falls into another class. The SVM will suffer from the overfitting problem when it is applied to solve the binary classification problem between these two classes. Second, according to the distribution shown in Fig. 4(b), it is observed that there exists an overlap between three of the classes.

LIU AND CHEN: FACE RECOGNITION USING TAF-SVMS 185

Fig. 4. Distribution of the 105 face patterns of the five subjects in KFDA-based subspaces with the RBF kernel, for two kernel parameter settings: (a) and (b).

To identify one of these overlapping classes by using the OAA method, the imbalanced ratio of the positive class to the negative class is 1 : 4. The OSH learned by the traditional SVM will be skewed toward the positive class. Consequently, the number of false negatives will increase and the recognition accuracy will decrease.
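The different-cost remedy for this skew, heavier penalty on the smaller class, can be illustrated with a simple inverse-frequency heuristic. This is a sketch of the idea, not the authors' exact weighting rule:

```python
def class_penalties(n_pos, n_neg, c=10.0):
    """Different-cost heuristic: scale the penalty of the smaller
    (positive) class by the imbalance ratio, so the learned separating
    hyperplane is not skewed toward that class."""
    return c * n_neg / n_pos, c

# Five subjects, OAA decomposition: 21 positive vs 4 x 21 = 84 negative.
print(class_penalties(21, 84))  # (40.0, 10.0)
```

With a balanced set the two penalties coincide and the scheme degenerates to the single-C soft margin.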
Case 2: Most face-recognition systems evaluate their performance by changing face conditions such as expression, viewpoint, and illumination. Accordingly, several well-known databases are widely used, such as the Olivetti Research Laboratory (ORL) face database, the
as the Olivetti Research Laboratory (ORL) face database, the
University of Manchester Institute of Science and Technology
(UMIST) multiview face database, and Yale face database. The
three face databases have different conditions considered. For
example, ORL database contains 400 face images in which
all frontal face images have different facial expressions and
details (glasses or no glasses). UMIST database consists of 575
face images covering a wide range of poses from one-sided
profile to frontal views as well as the expressions. Yale database
contains 165 frontal face images having different expressions, illumination conditions, and small occlusions (with glasses).
These three databases cover most of the conditions crucial to the evaluation of face-recognition systems. However, all the faces in these databases are bounded well. That is, they do not take the variations of face contour and
hair into consideration.
The face-recognition task follows the face-detection task. For example, the SVM-based face-detection system [34] searches for faces in an image by using size-varying windows to scan the image and perform the face/nonface classification task. Once the
faces are detected, these faces will be framed by rectangular
bounding boxes with different sizes and then sent into the face-
recognition system. The framed face images detected by dif-
ferent face-detection systems (even the same) may contain the
whole hair and face contours, or just partial hair and contours,
or none of them. Most existing face-recognition systems do not
evaluate their systems on this factor since all the images in the
three databases are full faces containing both hair and contours.
Er et al. conducted an interesting experiment in their work [16]. They evaluated their system [discrete cosine transform (DCT) + FLDA + RBF neural networks] on two groups of data: one consisted of the full faces of the Yale database and the other of closely cropped faces from the same database. They achieved error rates of 1.8% and 4.8%, respectively, which were lower than those of other approaches such as eigenfaces and Fisherfaces. However, neither group considers full faces and cropped faces at the same time. Nevertheless, by comparing the two results, we see that the information of face contour and hair style is important for face recognition. This study evaluates the proposed TAF-SVM by letting this information be a variable.
In this paper, we assume that an input face can be a full face or a partially cropped face, in order to fulfill the requirement that, in addition to variations resulting from different expressions and viewpoints, a robust and reliable face-recognition system should also be able to fight against the variation due to size-varying face-bounding boxes. This case aims to investigate the influence of the changes of the sizes of face-bounding boxes upon the face-pattern distribution in the KFDA-based subspace. To achieve this goal, each face image is cropped to a new face image with two integer cropping sizes. This procedure is called face cutting and is illustrated in Fig. 5, where the round operator forces the cropping size to become an integer. The dotted white rectangle is the face-bounding box. After the cutting, the cropped image is resized to a 112 × 92 new image. In this manner, an input face may contain the whole face contour or just part of it.
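The face-cutting step can be sketched as below. The exact crop geometry of Fig. 5 is not fully recoverable from the text, so this sketch assumes the two integer sizes strip that many pixels from each border, followed by a nearest-neighbour resize back to the original 112 × 92 shape:

```python
import random

def face_cut(img, cx, cy):
    """Crop cx pixel columns from each side and cy rows from top and
    bottom, then resize back to the original shape by nearest-neighbour
    sampling. img is a list of rows (lists of gray-level pixel values)."""
    h, w = len(img), len(img[0])
    cropped = [row[cx:w - cx] for row in img[cy:h - cy]]
    ch, cw = len(cropped), len(cropped[0])
    return [[cropped[i * ch // h][j * cw // w] for j in range(w)]
            for i in range(h)]

# A synthetic 112 x 92 "image", with random cropping sizes in [0, 7]:
img = [[r * 92 + c for c in range(92)] for r in range(112)]
cut = face_cut(img, random.randint(0, 7), random.randint(0, 7))
assert len(cut) == 112 and len(cut[0]) == 92
```

A cropping size of zero leaves the image untouched, so the random draw over [0, 7] covers both full and partially cropped faces, as the experiment requires.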
The face-cutting procedure is performed on all the 105 face images that have been used in Case 1, with cropping sizes randomly chosen within the range [0, 7]. Fig. 6(a) and (b) shows the distribution of these 105 randomly cropped face patterns in the KFDA-based subspaces. Compared with the distribution depicted in Fig. 4, it can be seen that face images with different cropping sizes significantly increase the interclass ambiguity and decrease the intraclass compactness. Although the KFDA-based feature extraction method tries to maximize the between-class separability and the within-class compactness, it cannot absorb the large variations caused by viewpoint and the size-varying face-bounding box. Therefore, a robust
Fig. 5. Face-cutting procedure and the cropped face images with different cropping sizes. The dotted white rectangle is the face-bounding box.

Fig. 6. Distribution of the 105 randomly cropped face patterns of five subjects in the KFDA-based subspaces, for two kernel parameter settings: (a) and (b).
classifier is still needed even though the robust feature extraction method KFDA has been used. The robust classifier here means a classifier better than the NN classifier. Besides, because outliers appear in the distribution and imbalanced training data sets arise when the OAA method is employed, a classifier that is more robust than SVM is also needed. This also motivates this paper.
2) Sensitivity Test of TAF-SVM: The goal of this experiment is to test the sensitivity of TAF-SVM to its intrinsic parameters, including the penalty weights C+ and C-, the weights D+ and D- for the surplus variables, and the lower bound used in the fuzzy membership function. To make the following experiments more constructive, three conditions containing different criteria for the collection of the training set and the test set are defined as follows.
Condition 1: For the training set, each subject offers 21 face images picked from all 21 combinations of rotation and tilt angles; each angle combination randomly offers one image for each subject. Therefore, the training set contains 630 face images out of the 30 subjects. The collection procedure for the test set is the same. The two sets have no overlap.

Condition 2: Each set is provided with 21 face images from every subject, so each set has 630 face images in total. The face images, unlike in Condition 1, are picked randomly from confined angle combinations. For the training set, only ±30° and 0° are considered for the rotation angle, and ±15° and 0° for the tilt angle. As to the test set, only combinations of ±45° and ±15° for the rotation angle, and ±15° and 0° for the tilt angle, are picked. Chosen face images will not be picked again.

Condition 3: Face images are randomly chosen from all 3150 face images in the CYCU face database for the training set and the test set. Each of the 30 subjects provides 21 face images for each set and there is no overlap between the two sets. Chosen face images will not be picked again.
As far as the viewpoint of the face is concerned, it can be seen that the degrees of uncertainty of the three data collection criteria are apparently different. Condition 2 has the highest degree of uncertainty among the three, while Condition 1 has the lowest. Also, the face-cutting procedure is performed on all face images before the feature extraction, with random cropping sizes in the range [0, 7].
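The fuzzy memberships whose lower bound is tested below can be illustrated with a distance-to-class-center model, a common choice in fuzzy SVMs. The paper's exact "circle-like" formula is not shown in this excerpt, so this is an assumed form with the same qualitative behaviour:

```python
def fuzzy_memberships(distances, lower_bound=0.4, delta=1e-6):
    """Map each pattern's distance to its class centre into (0, 1]:
    points near the centre get memberships close to 1, while far-away
    points (likely outliers) are clipped at the lower bound so their
    penalty never vanishes entirely."""
    d_max = max(distances)
    return [max(lower_bound, 1.0 - d / (d_max + delta)) for d in distances]

# The farthest pattern (a candidate outlier) is clipped at the bound:
print(fuzzy_memberships([0.0, 1.0, 4.0]))
```

Clipping at a lower bound such as 0.4, the value fixed in the experiment below, keeps outliers from being ignored entirely while still softening their influence on the margin.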
Before extracting features via the KFDA method, the optimal RBF kernel parameter, which results in the minimum error rate, is found by searching a wide variation range of the parameter. The error rate is the average error rate over ten runs; for each run, the training set and test set are reprepared based on Condition 3. Following the method used in [3] and [15], the average error rate is computed by

E = (1/r) Σ_{k=1}^{r} (n_k / N) × 100%   (29)

where r is the number of runs, n_k is the number of errors for the kth run, and N is the number of total testing face images of each run. Note that the "total testing face images" are the training set in the parameter selection process and classifier training, while in the experiment comparing different classification systems (online testing), the "total test patterns" are the test set. After trial and error, the optimal KFDA parameter was found to be 5.6 × 10 , which resulted in the lowest average error rate measured from the ten training sets; it also resulted in a low average error rate of 11.8% measured from the ten test sets by using the NN classifier. With
the optimal kernel value, a total of 29 discriminating feature components are extracted from each face image by using the KFDA method.
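The averaging in (29) is straightforward to reproduce; the per-run error counts in the example below are hypothetical, chosen only to land near the 11.8% figure quoted above:

```python
def average_error_rate(errors_per_run, n_test):
    """Equation (29): the mean over runs of (errors / total test
    patterns), expressed as a percentage."""
    r = len(errors_per_run)
    return 100.0 * sum(n_k / n_test for n_k in errors_per_run) / r

# Ten hypothetical runs on 630 test images each:
print(round(average_error_rate([74, 75, 73, 76, 74, 74, 75, 73, 75, 74], 630), 1))  # 11.8
```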
a) Sensitivity test on C+, C-, D+, and D-: The first experiment tests the sensitivity of TAF-SVM to the four parameters C+, C-, D+, and D-, which have been condensed into the ratios λ1 and λ2 defined by (27) and (28). The values of λ1 and λ2 for this experiment are {1, 10, 20, 30, 40} and {2, 4, 6, 8, ∞}, respectively. The value of C- is set as 10 and is fixed. The lower bound of the fuzzy membership function is fixed to 0.4. The RBF kernel is also used for TAF-SVM, with its kernel parameter set as 0.05. The experimental results for the three conditions are shown in Fig. 7.
The lowest error rates for the three conditions are 2.10%, 7.10%, and 4.80%, achieved when the pairs (λ1, λ2) equal (20, 4), (30, 4), and (30, 6), respectively. The corresponding values of (C+, C-, D+, D-) are (200, 10, 50, 2.5), (300, 10, 75, 2.5), and (300, 10, 50, 1.66), respectively. In addition, the results indicate that the error rates can be reduced by changing the ratios λ1 and λ2. In the following, we take the results of Condition 3 shown in Fig. 7(c) as examples to show how the performance of TAF-SVM is affected under different λ1 and λ2. Three steps are as follows.
Step 1) In Fig. 7(c), the largest average error rate, 7.76%, occurs at the position (λ1, λ2) = (1, ∞), which means that (C+, C-, D+, D-) = (10, 10, 0, 0). At this position, the different cost algorithm is disabled because the penalties for the positive and the negative classes are the same (C+ = C- = 10). The total margin algorithm is also disabled at this position because D+ = D- = 0. Therefore, only the fuzzy penalty is used in TAF-SVM.
Step 2) As the position goes from (1, ∞) to (30, ∞), the average error rate decreases from 7.76% to 5.94%. At the position (30, ∞), the different cost algorithm is enabled in TAF-SVM because the penalties for the positive and the negative classes become different: C+ = 300 and C- = 10. We can see that when C+ = 300 and C- = 10, the penalty for the positive class is much larger than that for the negative class. This meets the role of the different cost algorithm, which says: assign the heavier penalty to the smaller class. In the experiments of Fig. 7(c), the number of negative training data is 29 times the number of positive training data in the learning of each OSH by OAA TAF-SVM, because there are 30 subjects in the CYCU database to be classified. On the other hand, the total margin algorithm is still not enabled at this position because D+ = D- = 0. By comparing the analysis in Step 1) with the one in Step 2), we can see that the error rate is reduced by the application of the different cost algorithm.
Step 3) As the position goes from (30, ∞) to (30, 6), the average error rate decreases from 5.94% to 4.8%. At the position (30, 6), not only is the different cost algorithm enabled (C+ = 300 and C- = 10), but the total margin algorithm is enabled as well (D+ = 50 and D- = 1.66). By comparing the analyses in Steps 1) and 2) with the analysis in Step 3), we can see that the error rate can be further reduced by involving the total margin algorithm after the use of the different cost algorithm.

Fig. 7. Comparisons of average error rates among different pairs of (λ1, λ2) used in TAF-SVM under different data collection conditions.
TABLE III
PARAMETER SETTING IN KFDA, SVM, AND TAF-SVM

TABLE IV
COMPARISON OF THE AVERAGE ERROR RATE AND STANDARD DEVIATION (SD) OVER TEN RUNS BETWEEN TAF-SVM AND OTHER SYSTEMS

TABLE V
COMPARISON OF COMPUTATION TIME AMONG DIFFERENT SYSTEMS
based SVM, the improvement is very limited (from 11.41% to 9.02%), while it is significant for OAO-based SVM (from 11.41% to 7.55%). It is not surprising that the difference in recognition accuracy between OAO- and OAA-based SVM is that apparent, since the OAO method does not result in imbalanced data sets while the OAA method does. As a matter of fact, Hsu and Lin [2] have indicated that the OAO method is more suitable for practical use than the OAA method in terms of classification accuracy, according to their experiments carried out on various popular data sets. In this paper, we also suggest that the OAO method is better than the OAA method for SVM-based face recognition. However, this suggestion is only in terms of face-recognition accuracy, because the OAO method takes more recognition time than the OAA method (see Table V).

We conduct this experiment mainly based on the reason that
a robust face classifier should be able to maintain good sta-
bility while expecting that it can achieve the best recognition
accuracy under the training with different training sets. The re-
sults of Table IV indicate that TAF-SVM outperforms OAO-
and OAA-based SVM. This is due to the fact that TAF-SVM
not only can adapt to the imbalanced face data sets but also
can avoid the overfitting problem and improve the generaliza-
tion error bound. In addition, the system KFDA + TAF-SVM achieves the lowest standard deviation (0.57) compared with the system KFDA + SVM. This indicates that TAF-SVM is more stable than SVM.
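The classifier counts behind the OAO/OAA comparison follow directly from the two decomposition schemes, one hyperplane per class versus one per pair of classes:

```python
def num_binary_classifiers(k, scheme):
    """OAA trains one OSH per class; OAO trains one per pair of classes."""
    return k if scheme == "OAA" else k * (k - 1) // 2

# The CYCU experiment: 30 subjects -> 30 OSHs (OAA) vs 435 OSHs (OAO).
print(num_binary_classifiers(30, "OAA"), num_binary_classifiers(30, "OAO"))  # 30 435
```

These are exactly the 30 and 435 OSHs whose training and recognition times Table V compares; at test time every OAO hyperplane must be evaluated, which is why OAO costs more recognition time despite its cheaper per-hyperplane training.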
4) Computational Complexity: Our experiments were implemented on an Intel Xeon 3.0-GHz workstation (1-MB L2 cache, 2.0-GB DDR2 SDRAM, 800-MHz front-side bus, and 10 000-rpm SCSI hard disk). The training program was implemented in Matlab since it is able to solve the eigenvalue problem for KFDA and the constrained optimization problem for
both SVM and TAF-SVM easily. After the training, we saved
the expansion coefficients for KFDA and the information indispensable to further recognition, including the support vectors, Lagrange multipliers, and the optimal bias for each OSH.
The test program was implemented in C++ since the recognition process only executes simple calculations such as the dot product of vectors, their linear combinations, and decision making. We recorded the computation times of the first run of the last experiment and list them in Table V.
Most of the training time was spent on solving the constrained optimization problem. The larger the number of training data, the more time the training process needed; moreover, the training time grew more than proportionally with the amount of training data. The total training time of OAA-based SVM (2429.7 s for 30 OSHs) is much larger than that of OAO-based SVM (234.8 s for 435 OSHs), as shown in Table V. Moreover, we found that the training time of the proposed TAF-SVM was smaller than that of OAA-based SVM. This may be because, for TAF-SVM, the feasible regions are functions of membership values less than one; that is, most data have comparatively smaller feasible regions in the search for the Lagrange multipliers, compared with SVM. Although the training is time-consuming, the biggest concern for face recognition is the online recognition speed.
In the training of an OSH for the OAA-based SVM, we noticed that the percentage of obtained support vectors is around 20%-25%. The proposed TAF-SVM, for which the percentage of support vectors is 100%, takes 4.75 (76.4/16.1) times the recognition time of OAA-based SVM, as listed in Table V. Besides, the recognition time for TAF-SVM is around 0.1231 s per subject. This recognition speed is acceptable for the tasks of security and visual surveillance.
B. Experiment on FERET Database

The facial recognition technology (FERET) face database, obtained from the FERET program [37], [38], [43], is a commonly used database for testing state-of-the-art face-recognition algorithms. In the following, the proposed TAF-SVM is tested on a subset of this database.

This subset contains 1400 images of 200 subjects, namely the images whose names are marked with the two-character strings ba, bj, bk, be, bf, bd, and bg. Each subject has seven images involving variations in illumination, pose, and facial expression. In our experiment, each original image is cropped so that it only contains the portions of the face and hair. Then, each cropped image is resized to 80 × 80 pixels and preprocessed by histogram equalization. Some images of one of the 200 subjects are shown in Fig. 9. Six out of the seven images of each subject are randomly chosen for training, and the remaining one is used for testing. The training set size is 1200 and the test set size is 200. We run this process 20 times to obtain 20 different training and test sets; in each run there is no overlap between the training set and the test set.
Fig. 9. First row: images of one of the 200 subjects in the FERET database. Second row: cropped images of those in the first row after histogram equalization.

TABLE VI
COMPARISONS OF AVERAGE ERROR RATE AND SD AMONG DIFFERENT SYSTEMS WITH KFDA FEATURE EXTRACTION ON FERET DATABASE
1) Performance Test After the KFDA Feature Extraction: In this experiment, the face images first go through the KFDA feature extraction before the classification. Therefore, we first find the optimal parameters of KFDA.

a) Optimal parameter selection: The first step is to find the optimal parameters of KFDA for the experiment on the subset of the FERET database. Only two parameters need to be determined for KFDA, namely the RBF kernel parameter and the number of chosen eigenvectors. The optimal parameter pair will be the pair, over wide ranges of both parameters, resulting in the lowest average error rate. Each average error rate is computed from the 20 error rates under a specific parameter pair, with the errors measured by an NN classifier. In this way, the optimal parameters, 6.1 × 10 for the kernel parameter and 199 eigenvectors, are found for KFDA. Then, the training sets are projected onto the 199 eigenvectors and thus 20 projected training sets are obtained.
Then, the projected training sets are used to find the optimal parameters for SVM and TAF-SVM, respectively. The RBF kernel is again used in both classifiers. Similar to the search process for KFDA's optimal parameters, the optimal parameters of each classifier are those resulting in the lowest average error rate over wide search ranges of the classifier's parameters.
b) Comparisons among different systems: After selecting the optimal parameters for each classifier, we test and compare the classification accuracies by feeding the 20 test sets into the different systems. The experimental results are listed in Table VI.
First, by comparing the results in Tables VI and IV, we can observe that the error rates obtained on the FERET database are much larger than those obtained on the CYCU database. This may be due to the following facts: 1) the number of available training data from the FERET database (six per subject) is much smaller than that from the CYCU database (21 per subject); 2) there exist larger variations in the FERET database; and 3) the number of subjects in the subset of the FERET database (200 subjects) is much larger than that of the CYCU database (30 subjects). Nevertheless, the experimental results on the FERET database show that TAF-SVM still performs better than SVM. Based on the results in Table VI, TAF-SVM outperforms SVM (OAO) and SVM (OAA) in average error rate by 3.21% and 5.10%, respectively. Additionally, TAF-SVM achieves the lowest variance among these systems, which indicates that TAF-SVM is able to maintain better stability than SVM when facing different unseen patterns.
It is worth noting that although KFDA extracts discriminating features from the original image raw data based on the maximization of between-class separability, this does not mean that the class distribution in the KFDA-based subspace is separable. This is evidenced by the result in Table VI: the average error rate of KFDA + NN is 22.18%. This result implies that, even with the optimal parameters of KFDA, there still exist numerous errors between classes; that is, the class distribution in the KFDA-based subspace is still nonseparable. This may result from the two following reasons.

First, the face patterns involve too large a variation, such that
KFDA is not able to separate the classes well. Second, the KFDA used for the subsets of the FERET and CYCU databases actually suffers from the so-called small sample size (SSS) problem [3], because in our experiments the number of training patterns is smaller than the dimensionality of the input training patterns. For example, in our training sets each pattern (an 80 × 80 pixel image) is a 6400-dimensional vector, while the number of available training patterns is only 1200 (200 subjects, six per subject). The SSS problem also occurs in the KFDA used for the CYCU experiment, where each training pattern (a 28 × 23 pixel image) is a 644-dimensional vector, while the number of training patterns in each training set is 630 (21 per subject). Since the KFDA used for the two databases suffers from the SSS problem, the within-class scatter matrix of (1) is degenerate because it contains a null space.
To solve the SSS problem in numerical computation, this paper adds a matrix μI, where I is the identity matrix and μ is a small number, to the inner-product kernel matrix when finding the expansion coefficient vectors for the data projection. This method is very simple and was suggested by Mika et al. [39], [40]. However, it discards the discriminant information contained in the null space of the within-class scatter matrix, and the discarded information may contain the most significant discriminant information [3], [41], [42]. This means that even though the most discriminant eigenvectors have been used for the data projection in our experiments, these eigenvectors are actually not the most discriminant possible. Thus, although KFDA is employed in our work, the face-pattern distribution is still nonseparable, as seen, for example, in the distributions shown in Fig. 4(b) and Fig. 6.
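The numerical fix of Mika et al. referred to above, adding μI to a possibly singular matrix before inversion, can be sketched on a tiny example (pure Python for illustration; real code would use a linear-algebra library):

```python
def regularize(matrix, mu=1e-3):
    """Add mu * I so a (possibly singular) square matrix becomes
    invertible; the price, as noted in the text, is that discriminant
    information in the null space is discarded."""
    n = len(matrix)
    return [[matrix[i][j] + (mu if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# A rank-deficient 2 x 2 example: determinant 0 before, positive after.
sw = [[1.0, 1.0], [1.0, 1.0]]
sw_reg = regularize(sw)
det = sw_reg[0][0] * sw_reg[1][1] - sw_reg[0][1] * sw_reg[1][0]
assert det > 0
```

The value of μ trades conditioning against fidelity: a larger μ makes the eigenproblem more stable but perturbs the discriminant directions further from those of the unregularized matrix.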
Several more efficient solutions to this SSS problem have recently been proposed, such as kernel direct discriminant analysis (KDDA) [3], which is a generalization of direct LDA [41], and complete kernel Fisher discriminant analysis (CKFD), which combines kernel PCA and LDA [42]. We expect that the classification accuracy of each system in
TABLE VII
COMPARISONS OF AVERAGE ERROR RATE AND SD AMONG DIFFERENT SYSTEMS WITHOUT KFDA FEATURE EXTRACTION ON FERET DATABASE
Tables VI and IV will be improved if, instead of KFDA, KDDA or CKFD is used for the face-feature extraction.
Moreover, although KFDA tries to minimize the within-class scatter to obtain larger intraclass compactness, this cannot guarantee that no outliers will appear in the KFDA-based subspace. For example, in Fig. 4(a), an outlier still exists in the KFDA-based subspace. In such a situation, the SVMs, SVM (OAO) and SVM (OAA), may suffer from the overfitting problem and the classification performance will drop. On the other hand, for SVM (OAA), although KFDA has been used for the face-feature extraction, the case of imbalanced training data sets is still unavoidable in the KFDA-based subspace. In the training of an OSH via SVM (OAA), the imbalanced ratio of negative training data to positive data is 199 : 1. Such a large imbalanced ratio will result in the class-boundary-skew problem for SVM (OAA). This may be the reason why SVM (OAO) always performs better than SVM (OAA): the ratio of negative training data to positive data is always 1 : 1 for SVM (OAO) if the sizes of the classes are the same. To sum up, Table VI shows that the proposed TAF-SVM improves the classification performance of SVM (OAO) and SVM (OAA), and such a significant improvement should be attributed not only to the use of the fuzzy penalty and the different cost algorithm, but also to the total margin algorithm embedded in TAF-SVM.
2) Performance Test Without KFDA's Feature Extraction: In this experiment, the raw image vectors are directly sent into each classifier without using KFDA as the feature extractor. Since the KFDA feature extractor is no longer used, the optimal parameters of each classifier need to be reselected. Note that the inputs of each classifier are normalized to zero mean and unit variance. After feeding the 20 different test sets into these systems directly, the average error rates are obtained and listed in Table VII.
Comparing the results reported in Tables VII and VI, we can find that the average error rate of each system in Table VI is lower than that listed in Table VII. For example, the average error rate of KFDA + TAF-SVM is 14.15%, while that of TAF-SVM alone is 20.40%. Therefore, we can conclude that using KFDA as the feature extractor significantly enhances the classification accuracy of each classifier. From Table VII, it can be seen that in terms of the average classification rate, TAF-SVM outperforms SVM (OAA) and SVM (OAO) by 7.23% and 3.73%, respectively. In addition, TAF-SVM still achieves the lowest variance.
VI. CONCLUSION AND FUTURE WORK

A new classifier called TAF-SVM is proposed in this paper. TAF-SVM is mainly designed to remedy the drawbacks of the traditional SVM when applied to face recognition, namely the class-boundary-skew problem and the overfitting problem, by introducing the different cost algorithm and the fuzzification of the training set. Another contribution is the enhancement of the generalization ability of SVM by introducing the total margin algorithm. Experimental results show that the proposed TAF-SVM is superior to OAO- and OAA-based SVM in terms of both face-classification rate and stability, validating TAF-SVM's improvement of the classification accuracy of SVM for face recognition.
Based on the work presented, several topics remain worth studying in the future. First, the circle-like membership model used in this paper for the training-set fuzzification is not a very efficient model, since the face-pattern distribution is in general non-Gaussian and nonconvex; the study of a better model is needed. Second, the experimental results have shown that using KFDA as the feature extractor is able to enhance the classification accuracy. However, for face recognition, KFDA suffers from the SSS problem in our work. It is believed that if this problem is solved, e.g., by using variants of KFDA such as KDDA [3] or CKFD [42], the face-recognition accuracy can be further enhanced together with the TAF-SVM classifier.
ACKNOWLEDGMENT
The authors would like to thank the reviewers for their useful
comments and suggestions, and Prof. H.-P. Huang, Prof. S.-G.
Miaou, Prof. P. C. P. Chao, and H.-Y. Lin for their help in
preparing this paper.
REFERENCES

[1] K. Veropoulos, C. Campbell, and N. Cristianini, "Controlling the sensitivity of support vector machines," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI99), Stockholm, Sweden, 1999, pp. 55-60.
[2] C. W. Hsu and C. J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415-425, Mar. 2002.
[3] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117-126, Jan. 2003.
[4] M. Yoon, Y. Yun, and H. Nakayama, "A role of total margin in support vector machines," in Proc. Int. Joint Conf. Neural Netw., 2003, vol. 3, pp. 2049-2053.
[5] I. Guyon, N. Matic, and V. Vapnik, "Discovering informative patterns and data cleaning," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park, CA: AAAI Press, 1996, pp. 181-203.
[6] X. Zhang, "Using class-center vectors to build support vector machines," in Proc. IEEE Workshop Neural Netw. Signal Process. (NNSP99), Madison, WI, 1999, pp. 3-11.
[7] C. F. Lin and S. D. Wang, "Fuzzy support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 464-471, Mar. 2002.
[8] V. Vapnik, Statistical Learning Theory. New York: Springer-Verlag, 1998.
[9] G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Comput., vol. 12, pp. 2385-2404, 2000.
[10] Chung Yuan Christian Univ. (CYCU), Multiview Face Database, Chungli, Taiwan [Online]. Available: http://vsclab.me.cycu.edu.tw/~face/face_index.html
[11] R. Akbani, S. Kwek, and N. Japkowicz, "Applying support vector machines to imbalanced datasets," in Proc. 15th Eur. Conf. Mach. Learn. (ECML), Pisa, Italy, 2004, pp. 39-50.
[12] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71-86, 1991.
[13] C. Cortes and V. Vapnik, "Support vector networks," Mach. Learn., vol. 20, pp. 273-297, 1995.
[14] J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Disc., vol. 2, pp. 121-167, 1998.
[15] M. J. Er, S. Wu, J. Liu, and H. L. Toh, "Face recognition with radial basis function (RBF) neural networks," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 697-710, May 2002.
[16] M. J. Er, W. L. Chen, and S. Q. Wu, "High-speed face recognition based on discrete cosine transform and RBF neural networks," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 679-691, May 2005.
[17] G. Wu and E. Cheng, "Class-boundary alignment for imbalanced dataset learning," in Proc. ICML 2003 Workshop Learn. Imbalanced Data Sets II, Washington, DC, 2003, pp. 49-56.
[18] N. Japkowicz, "The class imbalance problem: Significance and strategies," in Proc. 2000 Int. Conf. Artif. Intell.: Special Track on Inductive Learning, Las Vegas, NV, 2000, pp. 111-117.
[19] C. Ling and C. Li, "Data mining for direct marketing problems and solutions," in Proc. 4th Int. Conf. Knowl. Disc. Data Mining, New York, 1998, pp. 73-79.
[20] N. Chawla, K. Bowyer, and W. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321-357, 2002.
[21] J. F. Feng and P. Williams, "The generalization error of the symmetric and scaled support vector machines," IEEE Trans. Neural Netw., vol. 12, no. 5, pp. 1255-1260, Sep. 2001.
[22] M. H. Yang, "Kernel Eigenfaces vs. kernel Fisherfaces: Face recognition using kernel methods," in Proc. 5th IEEE Int. Conf. Autom. Face Gesture Recognit., Washington, DC, 2002, pp. 215-220.
[23] Q. S. Liu, H. Q. Lu, and S. D. Ma, "Improving kernel Fisher discriminant analysis for face recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 42-49, Jan. 2004.
[24] B. Schölkopf, A. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299-1319, 1998.
[25] K. I. Kim, K. Jung, and H. J. Kim, "Face recognition using kernel principal component analysis," IEEE Signal Process. Lett., vol. 9, no. 2, pp. 40-42, Feb. 2002.
[26] G. Cui and W. Gao, "SVMs for few examples-based face recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Hong Kong, 2003, vol. 2, pp. 381-384.
[27] W. Chi, G. Dai, and L. Zhang, "Face recognition based on independent Gabor features and support vector machine," in Proc. 5th World Congr. Intell. Control Autom., Hangzhou, China, 2004, vol. 5, pp. 4030-4033.
[28] C. Y. Li, F. Liu, and Y. X. Xie, "Face recognition using self-orga-
Intell. Control Autom., Hangzhou, China, 2004, vol. 5, pp. 4030 4033.[28] C. Y. Li, F. Liu, and Y. X. Xie,Face recognition using self-orga-
nizing feature maps and support vector machines, in Proc. 5th Int.Conf. Comput.Intell. Multimedia Appl., Xian, China, 2003,pp.3742.
[29] G. Dai and C. Zhou,Face recognition using support vector machineswith the robust feature, in Proc. 12th IEEE Int. Workshop Robot
Human I nteractive Commun., 2003, pp. 4953.[30] S. Y. Zhang and H. Qiao,Face recognition with support vector ma-
chine, in Proc. IEEE Int. Conf. Robot., Intell. Syst. Signal Process.,Changsha, China, 2003, vol. 2, pp. 726730.
[31] K. I. Kim, J. Kim, and K. Jung, Recognition of facial imagesusing support vector machines, in Proc. 11th Workshop Stat. SignalProcess., Singapore, 2001, pp. 468471.
[32] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman,Eigenfaces vs.Fisherfaces: Recognition using class specific linear projection,IEEETrans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711720, Jul.1997.
[33] Y. Lin, Y. Lee, and G. Wahba,Support vector machines for classifi-cation in nonstandard situations,Mach. Learn., vol. 46, pp. 191202,2002.
[34] E. Osuna, R. Freund, and F. Girosit, Training support vector ma-chines: An application to face detection,in Proc. Comp. Vis. Pattern
Recognit. (CVPR) , Puerto Rico, 1997, pp. 130136.[35] U. H.-G. Kressel, Pairwise classification and support vector ma-
chines,inAdvances in Kernel MethodsSupport Vector Learning, B.Schlkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA:MIT Press, 1999.
[36] T. Van Gestel, J. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G.
Dedene, B. De Moor, and J. Vandewalle,Benchmarking least squaressupport vector machine classifiers, Mach. Learn., vol. 54, no. 1, pp.532, 2004.
[37] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss,The FERETevaluation methodology for face-recognition Algorithms,IEEE Trans.Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 10901104, Oct. 2000.
[38] P. J. Phillips, The Facial Recognition Technology (FERET) Database(2004) [Online]. Available: http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
[39] S. Mika, G. Rtsch, J. Weston, B. Schlkopf, and K.-R. Mller,Fisher discriminant analysis with kernels,in Proc. IEEE Int. Work-
shop Neural Netw. Signal Process. IX, Aug. 1999, pp. 4148.[40] , Constructing descriptive and discriminant nonlinear features:Rayleigh coefficients in kernel feature spaces, IEEE Trans. Pattern
Anal. Mach. Intell., vol. 25, no. 5, pp. 623628, May 2003.[41] H. Yu and J. Yang,A direct lDA algorithm for high-dimensional data
with application to face recognition, Pattern Recognit., vol. 34, pp.20672070, 2001.
[42] J. Yang, A. F. Frangi, J. Y. Yang, D. Zhang, and Z. Jin, KPCA plusLDA: A complete kernel Fisher discriminant framework for featureextraction and recognition, IEEE Trans. Pattern Anal. Mach. Intell.,vol. 27, no. 2, pp. 230244, Feb. 2005.
[43] J. Yang, J. Y. Yang, and A. F. Frangi,Combined Fisherfaces frame-work,Image Vis. Comput., vol. 21, no. 12, pp. 10371044, 2003.
[44] H. P. Huang and Y. H. Liu,Fuzzy support vector machines for pat-tern recognition and data mining,Int. J. Fuzzy Syst., vol. 4, no. 3, pp.826835, 2002.
[45] J. Suykens and J. Vandewalle,Least squares support vector machine
classifiers,Neural Process. Lett., vol. 9, pp. 293300, 1999.[46] M. W. Chang, C. J. Lin, and R. C. H. Weng,Analysis of switching dy-
namics with competing support vector machines,IEEE Trans. NeuralNetw., vol. 15, no. 3, pp. 720727, May 2004.
[47] S. N. Pang, D. Kim, and S. Y. Bang,Face membership authenticationusing SVM classification tree generated by membership-based LLEdata partition,IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 436446,Mar. 2005.
[48] S.Li, J.T. Y.Kwok, I.W. H.Tsang,and Y.Wang, Fusing imageswithdifferent focuses using support vector machines,IEEE Trans. Neural
Netw., vol. 15, no. 6, pp. 15551561, Nov. 2004.
Yi-Hung Liu (M'04) received the B.S. degree in naval architecture and marine engineering from National Cheng Kung University, Tainan, Taiwan, R.O.C., in 1994, and the M.S. degree in engineering science and ocean engineering in 1996 and the Ph.D. degree in mechanical engineering in 2003, both from National Taiwan University, Taipei, Taiwan, R.O.C.
He is currently an Assistant Professor with the Department of Mechanical Engineering, Chung Yuan Christian University, Chungli, Taiwan, R.O.C. His research interests include computer vision, machine learning, pattern recognition, data mining, automatic control, and their associated applications.
Yen-Ting Chen was born in Kaohsiung, Taiwan, R.O.C. He received the B.S. and M.S. degrees in mechanical engineering from Chung Yuan Christian University, Chungli, Taiwan, R.O.C., in 2004 and 2006, respectively.
He is currently with the Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, R.O.C., where he works on intelligent robots. His research interests include machine vision and neural networks.