CHAPTER 3
PROPOSED IMPROVED BEE COLONY OPTIMIZATION
BASED ON ROUGH SET ALGORITHM
3.1 INTRODUCTION
When processing large databases, two major obstacles are encountered: i) numerous samples and ii) high dimensionality of the feature space. For example, documents are represented by several thousands of words and images are composed of millions of pixels, where each word or pixel is understood as a feature. Current processing methods are often unable to handle such high-dimensional data, mostly due to the difficulty of processing and the requirements of storage and transmission within a reasonable time. To reduce the computational time, it is common practice to project the data onto a smaller, latent space. Moreover, such a space is often beneficial for further investigation due to its noise reduction and feature extraction properties. Smaller dimensions are also advantageous when visualizing and analyzing the data. Thus, in order to extract desirable information, dimensionality reduction or feature selection methods are often applied. A feature selection algorithm should be seen as a computational approach to a definition of relevance, although in many cases these definitions are followed in a somewhat loose sense.
3.2 EXISTING METHODS
Rough Set Theory based Approaches
In rough set based feature selection, the goal is to omit attributes
(features) from decision systems such that objects in different decision classes
can still be discerned. A popular way to evaluate attribute subsets with respect
to this criterion is based on the notion of dependency degree. In the standard
approach, attributes are expected to be qualitative; in the presence of
quantitative attributes, the methodology can be generalized using fuzzy rough sets to handle gradual (in)discernibility among attribute values more naturally. However, both the extended approach and its crisp counterpart exhibit a strong sensitivity to noise; a change in a single object may significantly influence the outcome of the reduction procedure. Jensen
and Shen (2008) proposed an extension of the fuzzy-rough feature selection
methodology, based on interval-valued fuzzy sets, as a means to counter this
problem through the representation of missing values in an intuitive way.
Pan et al (2008) proposed a hybrid feature selection approach based
on Rough sets and Bayesian network classifiers. In this approach, the
classification result of a Bayesian network is used as the criterion for the
optimal feature subset selection. The Bayesian network classifier used here is
a kind of naive Bayesian classifier. It is employed to implement classification
by learning the samples consisting of a set of texture features. In order to
simplify feature reduction using Rough Sets, a discretization method based on C-means clustering is also presented. This approach is applied to extract
residential areas from panchromatic SPOT5 images. Experimental results
show that the proposed method not only improves classification quality but
also reduces computational cost.
Chiang and Ho (2008) presented a novel rough-based feature
selection method for gene expression data analysis. This method can find the
relevant features without requiring the number of clusters to be known a priori
and identify the centers that approximate to the correct ones. They also
introduced a prediction scheme that combines the rough-based feature
selection method with a radial basis function neural network. Further, the effects of different feature selection methods and classifiers on this prediction process are evaluated using Naive Bayes and a linear support vector machine as classifiers, and the performance is compared with other feature selection methods, including information gain and principal component analysis. The performance is demonstrated on several published datasets. The
results show that the proposed method can achieve better classification
accuracy.
Shang and Shen (2008) presented a methodological approach for
developing image classifiers that work by exploiting the technical potential of
both fuzzy-rough feature selection and neural network-based classification.
The use of fuzzy-rough feature selection allows the induction of low-
dimensionality feature sets from sample descriptions of real-valued feature
patterns of a (typically much) higher dimensionality. The employment of a
neural network, trained using the induced subset of features, ensures the
runtime classification performance. The reduction of feature sets reduces the
sensitivity of such a neural network-based classifier to its structural
complexity. It also minimises the impact of feature measurement noise on the
classification accuracy. This method is evaluated by applying the approach to
classifying real medical cell images, supported with comparative studies.
Cornelis and Jensen (2008) considered a more flexible
methodology based on the recently introduced Vaguely Quantified Rough Set
(VQRS) model. This method can handle both crisp (discrete-valued) and
fuzzy (real-valued) data and encapsulates the existing noise-tolerant data
reduction approach using Variable Precision Rough Sets (VPRS), as well as
the traditional rough set model, as special cases.
Xie et al (2008) used the VPRS model as a tool to support Group
Decision-Making in credit risk management. It was considered that the
classification in decision tables consisting of risk exposure may be partially
erroneous and a variable precision factor was used to adjust the classification
error. In this work, VPRS and AHP were first combined to obtain the weight of the conditional attribute set decided by each decision-maker. Then, the Integrated Risk Exposure (IRE) of the attributes is obtained based on three VPRS-based models. To verify the effectiveness of these methods, an illustrative example is presented. The experimental results suggest that the VPRS-based IRE has advantages in recognizing important attributes.
Fazayeli et al (2008) studied Rough Set theory as a method of feature selection based on tolerance classes that extend the usual equivalence classes. The determination of the initial tolerance classes is a challenging and
important task for accurate feature selection and classification. The
Expectation-Maximization clustering algorithm is applied to determine
similar objects. This method generates fewer features with either a higher or
the same accuracy compared to two existing methods namely, Fuzzy Rough
Feature Selection and Tolerance-based Feature Selection, on a number of
benchmarks from the UCI repository.
Song et al (2008) proposed a semi-supervised dimensionality reduction framework which can efficiently handle unlabeled data. Under the framework, several classical methods, such as principal component analysis, linear discriminant analysis, maximum margin criterion, locality preserving projections and their corresponding kernel versions, can be seen as special cases. For high-dimensional data, a low-dimensional embedding result can be given that both discriminates multi-class sub-manifolds and preserves local manifold structure. Experiments show that these algorithms can significantly improve the accuracy rates of the corresponding supervised and unsupervised approaches.
Yao and Zhao (2008) addressed attribute reduction in decision-theoretic rough set models with regard to different classification properties, such as decision-monotonicity, confidence, coverage, generality and cost. It is important to note that many of these properties can be truthfully reflected by a single measure in the Pawlak rough set model. On the other hand, they need to be considered separately in probabilistic models, as a straightforward extension of the measure is unable to evaluate these properties. This study provides a new insight into the problem of attribute reduction.
Jensen and Shen (2009) extended the fuzzy-rough feature selection methodology with interval-valued fuzzy sets, as a means to counter noise sensitivity via an intuitive representation of missing values. Jensen et al (2009) proposed another approach based on fuzzy-rough sets; the algorithm is experimentally evaluated against leading classifiers, including fuzzy and rough rule inducers, and shown to be effective.
Zainal et al (2009) investigated the effectiveness of rough set theory in identifying important features for building an intrusion detection system. Rough set theory was also used to classify the data, and the KDD Cup 99 data was used for validating the results. Empirical results indicate that rough set theory is comparable to the feature selection techniques deployed by other researchers.
Swarm Intelligence based Approaches
Ke et al (2008) introduced a new approach based on ant colony
optimization (ACO) for attribute reduction. To verify the proposed algorithm,
numerical experiments are carried out on thirteen small or medium-sized
datasets and three gene expression datasets. The results demonstrate that this
algorithm can provide competitive solutions efficiently.
Liu et al (2009) introduced two nature inspired population-based
computational optimization techniques, Particle Swarm Optimization (PSO)
and Genetic Algorithm (GA) for rough set reduction. PSO discovers the best
feature combinations in an efficient way to observe the change of positive
region as the particles proceed throughout the search space. The performance
of the two algorithms is evaluated using some benchmark datasets and the
corresponding computational experiments are discussed. Empirical results
indicate that both methods are suitable for all the considered problems, and the particle swarm optimization technique outperformed the genetic algorithm approach by obtaining more reducts for the datasets. A real-world
application in fMRI data analysis is also illustrated which is helpful for
cognition research.
Bello et al (2009) achieved promising results solving the feature
selection problem through a joint effort between rough set theory and
evolutionary computation techniques. In particular, two new heuristic search
algorithms are introduced namely, Dynamic Mesh Optimization and another
approach which splits the search process carried out by swarm intelligence
methods.
Wang and Ma (2009) proposed an efficient algorithm, called the Feature Forest algorithm, for generating the reducts of a medical dataset. In this algorithm, the given dataset is transformed into a forest to form a discernibility string, that is, the concatenation of some of the features, and the disjunctive normal form is computed to reduce the features based on the feature forest. In addition, experimental results on different datasets show that the algorithm can efficiently reduce storage cost while remaining computationally inexpensive.
Mishra et al (2009) proposed a novel method for the dimensionality reduction of a feature set that chooses a subset of the original features containing most of the essential information, using Ant Colony Optimization (ACO) hybridized with Rough Set Theory, called Rough ACO. This method is successfully applied to choose the best feature combinations, after which the upper and lower approximations are applied to find the reduced set of features from gene expression data.
As seen in the literature, Rough Set Theory achieves higher performance than the other methods. However, the theory cannot express whether two attribute values are similar or to what extent they are the same; for example, two close values may differ only as a result of noise, yet in the standard RST-based approach they are considered as different as two values of different orders of magnitude. Dataset discretization must therefore take place before reduction methods based on crisp rough sets can be applied. This is often still inadequate, however, as the degrees of membership of values to discretized values are not considered at all. To solve this problem, a number of variations of the theory have been proposed; among these, the Swarm Intelligence (SI) based methods perform better than the others.
3.3 ROUGH SET THEORY
Rough set theory (Pawlak, 1991) is an extension of conventional
set theory that supports approximations in decision making. Rough Set
Attribute Reduction (Chouchoulas and Shen, 2001) provides a filter-based
tool by which knowledge may be extracted from a domain in a concise way,
retaining the information content whilst reducing the amount of knowledge
involved. Central to RSAR is the concept of indiscernibility. Let I = (U, A) be an information system, where U is a non-empty finite set of objects (the universe) and A is a non-empty finite set of attributes such that a: U → V_a for every a ∈ A. With any P ⊆ A there is an associated equivalence relation IND(P):

IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\}   (3.1)

The partition of U generated by IND(P) is denoted U/P and can be calculated as follows:

U/P = \bigotimes\{U/IND(\{a\}) : a \in P\}   (3.2)

where

A \otimes B = \{X \cap Y : X \in A,\ Y \in B,\ X \cap Y \neq \emptyset\}   (3.3)

If (x, y) ∈ IND(P), then x and y are indiscernible by the attributes from P. The equivalence classes of the P-indiscernibility relation are denoted [x]_P.
Let X ⊆ U. The P-lower approximation \underline{P}X and the P-upper approximation \overline{P}X of the set X can now be defined as:

\underline{P}X = \{x \mid [x]_P \subseteq X\}   (3.4)

\overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\}   (3.5)
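To make these notions concrete, the following small Python sketch computes U/P and the two approximations for a toy table. It is an illustration for this chapter, not code from the thesis; the helper names partition, lower_approx and upper_approx are assumptions of this sketch.

from itertools import groupby

def partition(universe, rows, attrs):
    """U/IND(P): equivalence classes of objects that agree on attrs (Eqs. 3.1-3.2)."""
    key = lambda x: tuple(rows[x][a] for a in attrs)
    return [set(g) for _, g in groupby(sorted(universe, key=key), key=key)]

def lower_approx(universe, rows, attrs, X):
    """Eq. (3.4): union of the blocks of U/P wholly contained in X."""
    return set().union(*([b for b in partition(universe, rows, attrs) if b <= X] or [set()]))

def upper_approx(universe, rows, attrs, X):
    """Eq. (3.5): union of the blocks of U/P that intersect X."""
    return set().union(*([b for b in partition(universe, rows, attrs) if b & X] or [set()]))

# Toy table: rows[object] = {attribute: value}
rows = {0: {'a': 1}, 1: {'a': 1}, 2: {'a': 0}, 3: {'a': 1}}
U, X = set(rows), {0, 2, 3}
print(lower_approx(U, rows, ['a'], X))   # {2}: only the a=0 block lies wholly inside X
print(upper_approx(U, rows, ['a'], X))   # {0, 1, 2, 3}: both blocks intersect X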
Let P and Q be equivalence relations over U. Then the positive, negative and boundary regions can be defined as:

POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X   (3.6)

NEG_P(Q) = U - \bigcup_{X \in U/Q} \overline{P}X   (3.7)

BND_P(Q) = \bigcup_{X \in U/Q} \overline{P}X - \bigcup_{X \in U/Q} \underline{P}X   (3.8)

The positive region contains all objects of U that can be classified into classes of U/Q using the knowledge in the attributes of P.
An important issue in data analysis is discovering dependencies between attributes. Intuitively, a set of attributes Q depends totally on a set of attributes P, denoted P ⇒ Q, if all attribute values from Q are uniquely determined by the values of the attributes from P. If there exists a functional dependency between the values of Q and P, then Q depends totally on P. Dependency can be defined in the following way:

For P, Q ⊆ A, it is said that Q depends on P in a degree k (0 ≤ k ≤ 1), denoted P ⇒_k Q, if

k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}   (3.9)

If k = 1, Q depends totally on P; if 0 < k < 1, Q depends partially (in a degree k) on P; and if k = 0, Q does not depend on P. Based on these fundamentals, two important reduction methods are discussed, namely QuickReduct and the entropy-based method.
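Continuing the sketch above, the positive region and the dependency degree follow directly (partition and lower_approx are the helpers defined in the previous listing):

def positive_region(universe, rows, P, Q):
    """POS_P(Q), Eq. (3.6): union of the P-lower approximations of the classes of U/Q."""
    pos = set()
    for X in partition(universe, rows, Q):          # decision classes U/Q
        pos |= lower_approx(universe, rows, P, X)
    return pos

def gamma(universe, rows, P, Q):
    """Dependency degree k = |POS_P(Q)| / |U|, Eq. (3.9)."""
    return len(positive_region(universe, rows, P, Q)) / len(universe)

rows = {0: {'a': 1, 'b': 0, 'd': 'yes'}, 1: {'a': 1, 'b': 1, 'd': 'no'},
        2: {'a': 0, 'b': 1, 'd': 'no'},  3: {'a': 1, 'b': 0, 'd': 'yes'}}
U = set(rows)
print(gamma(U, rows, ['a', 'b'], ['d']))   # 1.0: {a, b} fully determines d
print(gamma(U, rows, ['a'], ['d']))        # 0.25: objects 0, 1 and 3 share a=1 but differ on d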
3.4 QUICKREDUCT
The reduction of attributes is achieved by comparing equivalence
relations generated by sets of attributes. Attributes are removed so that the
reduced set provides the same quality of classification as the original. A
reduct is defined as a subset R of the conditional attribute set C such that \gamma_R(D) = \gamma_C(D). A given dataset may have many attribute reduct sets, so the set R of all reducts is defined as:

R = \{X : X \subseteq C,\ \gamma_X(D) = \gamma_C(D)\}   (3.10)
The intersection of all the sets in R is called the core, the elements
of which are those attributes that cannot be eliminated without introducing
more contradictions to the dataset. In RSAR, a reduct with minimum
cardinality is searched for; in other words an attempt is made to locate a
single element of the minimal reduct set R_{min} ⊆ R:

R_{min} = \{X : X \in R,\ \forall Y \in R,\ |X| \leq |Y|\}   (3.11)
The problem of finding a minimal reduct of an information system
has been the subject of much research (Alpigini et al 2002). The most basic
solution to locating such a reduct is to simply generate all possible reducts
and choose any one with minimal cardinality. Obviously, this is an expensive
solution to the problem and is only practical for very simple datasets. Most of
the time, only one minimal reduct is required. Therefore, all the calculations
involved in discovering the rest are pointless. To improve the performance of
the above method, an element of pruning can be introduced. By noting the
cardinality of any pre-discovered reducts, the current possible reduct can be
ignored if it contains more elements. However, a better approach is needed;
one that will avoid wastage of computational effort. The QuickReduct
algorithm as given below, attempts to calculate a minimal reduct without
exhaustively generating all possible subsets. It starts off with an empty set and
adds in turn, one at a time, those attributes that result in the greatest increase
in dependency, until this produces its maximum possible value for the dataset.
The QuickReduct Algorithm
QUICKREDUCT (C, D)
C, the set of all conditional features;
D, the set of decision features.
i.     R ← { }
ii.    do
iii.       T ← R
iv.        ∀x ∈ (C − R)
v.             if γ_{R∪{x}}(D) > γ_T(D)
vi.                T ← R ∪ { x }
vii.       R ← T
viii.  until γ_R(D) = γ_C(D)
ix.    return R
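As a hedged illustration, the pseudocode translates to Python as follows, assuming the gamma() dependency function sketched in Section 3.3; this is a sketch of the listing above, not a reference implementation.

def quickreduct(universe, rows, C, D):
    R = []                                        # step i: start from the empty reduct
    gamma_C = gamma(universe, rows, C, D)         # consistency of the full dataset
    while gamma(universe, rows, R, D) < gamma_C:  # step viii: stop at full dependency
        T, best = R, gamma(universe, rows, R, D)
        for x in set(C) - set(R):                 # step iv: try each unused attribute
            g = gamma(universe, rows, R + [x], D)
            if g > best:                          # step v: keep the best single addition
                T, best = R + [x], g
        if T == R:                                # no attribute helps; avoid looping forever
            break
        R = T                                     # step vii
    return R

# With the toy table above, quickreduct(U, rows, ['a', 'b'], ['d']) returns ['b'],
# a single attribute that already yields the same dependency degree as {a, b}.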
Note that an intuitive understanding of QuickReduct implies that,
for a dimensionality of n, (n² + n)/2 evaluations of the dependency function
may be performed for the worst-case dataset. According to the QuickReduct
algorithm, the dependency of each attribute is calculated and the best
candidate chosen. The next best feature is added until the dependency of the
reduct candidate equals the consistency of the dataset (1 if the dataset is
consistent). This process, however, is not guaranteed to find a minimal reduct.
Using the dependency function to discriminate between candidates may lead
the search down a non-minimal path. It is impossible to predict which
combinations of attributes will lead to an optimal reduct based on changes in
dependency with the addition or deletion of single attributes. The algorithm does, however, result in a close-to-minimal reduct, which is still useful in reducing dataset dimensionality to a great extent.
3.5 PROPOSED BEE COLONY BASED RSAR (BeeRSAR)
Nature has inspired researchers to develop models for solving their problems. Optimization is one field in which such models are frequently developed and applied. The Genetic Algorithm simulating natural selection and genetic operators, the Particle Swarm Optimization algorithm simulating flocks of birds and schools of fish, the Artificial Immune System simulating the cell masses of the immune system, the ACO algorithm simulating the foraging behaviour of ants, and the Artificial Bee Colony algorithm simulating the foraging behaviour of honeybees are typical examples of nature-inspired optimization algorithms.
The Artificial Bee Colony (ABC) algorithm, proposed by Karaboga (2005) for real-parameter optimization, is a recently introduced optimization algorithm that simulates the foraging behaviour of a bee colony for unconstrained optimization problems (Basturk and Karaboga, 2006; Karaboga and Basturk, 2007a, 2007b, 2008). For solving constrained optimization problems, a constraint handling method was incorporated into the algorithm.
In a real bee colony, there are some tasks performed by specialized
individuals. These specialized bees try to maximize the nectar amount stored
in the hive by performing efficient division of labour and self-organization.
The minimal model of swarm-intelligent forage selection in a honey bee
colony, that ABC algorithm adopts, consists of three kinds of bees: employed
bees, onlooker bees and scout bees. Half of the colony comprises employed
bees and the other half includes the onlooker bees. Employed bees are
responsible for exploiting the nectar sources explored before and giving
information to the other waiting bees (onlooker bees) in the hive about the quality of the food source site which they are exploiting. Onlooker bees wait in the hive and decide on a food source to exploit based on the information shared by the employed bees. Scouts randomly search the environment in order to find a new food source, depending on an internal motivation, possible external clues, or simply at random. The main steps of the ABC algorithm simulating these behaviours are given below:
Bee Colony Optimization Algorithm
i) Initialize the food source positions.
ii) Each employed bee produces a new food source in her food
source site and exploits the better source.
iii) Each onlooker bee selects a source depending on the quality of
her solution, produces a new food source in selected source
site and exploits the better source.
iv) Determine the source to be abandoned and allocate its
employed bee as scout for searching new food sources.
v) Memorize the best food source found so far.
vi) Repeat steps ii) to v) until the stopping criterion is met.
The above procedure can be implemented for feature reduction. Let
the bees select the feature subsets at random and calculate their fitness and
find the best one at each iteration. This procedure is repeated for a number of
iterations to find the optimal subset. Figure 3.1 demonstrates the steps in the
proposed bee colony based reduct algorithm.
Figure 3.1 Bee Colony based Reduct Algorithm
In the first step of the algorithm, each employed bee produces a feature subset at random. Consider a conditional feature set C containing N features. Then 'p' bees are chosen as the population size. From this population, half of the bees are considered as employed bees and the remaining are considered as onlooker bees. For each employed bee, N random numbers between 1 and N are generated and assigned to it. From these random numbers, the feature subset is constructed by performing a truncation operation and then extracting only the unique numbers from the set.
For example, consider the random numbers {1.45, 1.76, 3.33,
1.01}, where N=4. First, the truncation operation is performed. Then, the set
is modified as {1, 1, 3, 1}. From this result, the unique numbers alone are extracted as {1, 3}, representing the feature subset; that is, the 1st and 3rd features alone are selected.
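A minimal sketch of this decoding step (the function name decode is an assumption of this illustration):

import numpy as np

def decode(position):
    """Map a real-valued bee position to a feature subset: truncate each
    coordinate, then keep only the unique indices."""
    truncated = np.trunc(np.asarray(position)).astype(int)  # [1.45, 1.76, 3.33, 1.01] -> [1, 1, 3, 1]
    return sorted(set(truncated.tolist()))                  # -> [1, 3]

print(decode([1.45, 1.76, 3.33, 1.01]))  # [1, 3]: the 1st and 3rd features are selected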
In the second step of the algorithm, for each employed bee, whose total number equals half the number of food sources, a new source is produced by:

v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})   (3.19)

where \phi_{ij} is a uniformly distributed real random number in the range [-1, 1], k is the index of a solution chosen randomly from the colony (k = int(rand × N) + 1), j = 1, ..., D and D is the dimension of the problem. After producing v_i, this new solution is compared with the solution x_i and the employed bee exploits the better source. In the third step of the algorithm, an onlooker bee chooses a food source with a probability proportional to its quality and produces a new source in the selected food source site. As with the employed bees, the better source is retained.
The indiscernibility relation (dependency degree) is calculated for each feature subset as the objective value (f_i). This value has to be maximized. From this objective value, the fitness value of each bee is calculated as given in the following equation:

fit_i = \begin{cases} 1/(1 + f_i) & \text{if } f_i \geq 0 \\ 1 + |f_i| & \text{otherwise} \end{cases}   (3.20)
The probability is calculated by means of the fitness values using the following equation:

P_i = \frac{fit_i}{\sum_{j=1}^{N} fit_j}   (3.21)

where fit_i is the fitness of the solution x_i. After all onlookers are distributed to the sources, the sources are checked to determine whether they are to be abandoned. If the number of cycles for which a source cannot be improved is greater than a predetermined limit, the source is considered to be exhausted.
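The fitness mapping and the roulette-wheel choice made by the onlookers can be sketched as follows (illustrative helper names; Eqs. 3.20 and 3.21 are applied as written above):

import numpy as np

def fitness(f):
    """Eq. (3.20): fit = 1/(1 + f) if f >= 0, else 1 + |f|."""
    return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

def selection_probabilities(objectives):
    """Eq. (3.21): each source's share of the total fitness."""
    fits = np.array([fitness(f) for f in objectives])
    return fits / fits.sum()

probs = selection_probabilities([0.2, 0.5, 0.9])   # one objective value per food source
source = np.random.choice(len(probs), p=probs)     # roulette-wheel pick by an onlooker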
The employed bee associated with the exhausted source becomes a scout and makes a random search in the problem domain using the following equation:

x_{ij} = x_j^{min} + rand × (x_j^{max} - x_j^{min})   (3.22)
The pseudocode of the proposed method is given as:
Bee Colony based RSAR (BeeRSAR) Algorithm
ROUGHBEE (C,D)
C, the set of all conditional features;
D, the set of decision features.
i) Select the initial parameter values for BCO
ii) Initialize the population (xi)
iii) Calculate the objective and fitness value
iv) Find the optimum feature subset as the global best
v) do
a. Produce new feature subset (vi)
b. Apply the greedy selection between xi and vi
c. Calculate the fitness and probability values
d. Produce the solutions for onlookers
e. Apply the greedy selection for onlookers
f. Determine the abandoned solution and scouts
g. Calculate the cycle best feature subset
h. Memorize the best optimum feature subset
vi) repeat for the maximum number of cycles
The following parameters are used in the proposed method :
The population size (number of bees) : 10
The dimension of the population : N
Lower bound : 1
Upper bound : N
Maximum number of iterations : 1000
The number of runs : 3
The computational complexity of the Bee Colony based RSAR algorithm is O(n² m log n), with 'n' the number of bees and 'm' the number of features. This is the complexity for the worst-case situation, in which the algorithm guarantees a near-optimal solution.
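Putting these steps together, the following condensed Python sketch shows one way the phases could drive the rough-set objective. It assumes the decode() and gamma() helpers sketched earlier; the abandonment threshold limit is an assumption of this illustration, since its value is not stated above.

import numpy as np

def bee_rsar(universe, rows, C, D, n_bees=10, limit=20, max_cycles=1000):
    """Condensed BeeRSAR sketch: real-valued positions encode feature
    subsets via decode(); the objective is the dependency degree gamma()."""
    N = len(C)
    lo, hi = 1.0, float(N)                        # bounds from the parameter list above
    half = n_bees // 2                            # employed bees; the rest are onlookers
    X = np.random.uniform(lo, hi, (half, N))      # one food source per employed bee
    trials = np.zeros(half, dtype=int)

    def quality(pos):
        subset = [C[i - 1] for i in decode(pos)]  # 1-based feature indices into C
        return gamma(universe, rows, subset, D)

    def neighbour_move(i):                        # Eq. (3.19) plus greedy selection
        k, j = np.random.randint(half), np.random.randint(N)
        v = X[i].copy()
        v[j] = np.clip(v[j] + np.random.uniform(-1, 1) * (v[j] - X[k, j]), lo, hi)
        if quality(v) > quality(X[i]):
            X[i], trials[i] = v, 0
        else:
            trials[i] += 1

    best = X[0].copy()
    for _ in range(max_cycles):
        for i in range(half):                     # employed-bee phase
            neighbour_move(i)
        q = np.array([quality(x) for x in X]) + 1e-9
        for _ in range(n_bees - half):            # onlooker phase: pick by Eq. (3.21)
            neighbour_move(np.random.choice(half, p=q / q.sum()))
        for i in range(half):                     # scout phase, Eq. (3.22)
            if trials[i] > limit:
                X[i] = lo + np.random.rand(N) * (hi - lo)
                trials[i] = 0
        i_top = int(np.argmax([quality(x) for x in X]))
        if quality(X[i_top]) > quality(best):     # memorise the best source so far
            best = X[i_top].copy()
    return [C[i - 1] for i in decode(best)]

A call such as bee_rsar(U, rows, C, ['d']) then returns the best feature subset found within the cycle budget.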
3.6 PROPOSED BEE COLONY BASED INDEPENDENT
QUICKREDUCT (BeeIQR)
As an extension of the previous approach, a novel rough set approach is proposed in this work to find the reducts, reduce the computational complexity and obtain the most accurate feature subset. In this proposed method, the instances are initially grouped based on the decision attribute, and then a reduct is found for each class. The common attributes from all these reduct sets are grouped to form the core reduct, and the remaining attributes are considered for further reduction. From each set of reducts, the BCO algorithm based RSAR model is applied to obtain the final reduct.
As noted in Section 3.4, finding a minimal reduct is expensive: generating all possible reducts is practical only for very simple datasets, and QuickReduct, while avoiding exhaustive generation, may require (n² + n)/2 evaluations of the dependency function in the worst case and is not guaranteed to find a minimal reduct, since the dependency function may lead the search down a non-minimal path. Figure 3.2 shows the steps involved in the proposed BeeIQR algorithm.
Figure 3.2 IQRBee Algorithm
Normally, all the reduct algorithms start with an empty set and add attributes one at a time, which requires greater computation. Here, the computation time is reduced as follows. Initially the feature space is clustered based on the decision attribute, and then a reduct is found for each cluster. For example, if there are M feature rows, N_C conditional attributes and N_D decision classes, the feature rows are first clustered based on the decision attribute. For each cluster, a reduct R_i is obtained, where i = 1, 2, ..., N_D. From this set of reducts, the attributes common to all of them are taken out as the core reduct (R_c). Then the ABC algorithm is applied to select a random number of features from each cluster reduct (R_i) to find the optimum feature subset.
After choosing R_c, the employed bees produce feature subsets at random from the remaining attributes of each R_i. Consider a domain which contains N_D unique decision values; then the same number of bees (p) is chosen as the population size. From this population, half of the bees are considered as employed bees and the remaining are considered as
onlooker bees. For each employed bee, a random subset from one reduct set is
assigned. The random sets assigned to all the bees are combined to form the
feature subset. For example, consider a database containing 10 conditional attributes (c1, c2, ..., c10), a decision attribute with 3 classes, and 500 records. Initially the records are clustered into 3 groups based on the decision attribute and then a reduct is computed for each group. For example, consider that the reducts obtained are:
R1 = { c1,c3,c4,c9 }
R2 = { c3,c4,c8 }
R3 = { c3,c4,c6,c7,c10 }.
From these reducts, the common attributes are chosen as the core
reduct. In this example, Rc = {c3,c4}. These attributes are removed from each
reduct.
R1 = { c1,c9 }; R2 = { c8 }; R3 = { c6,c7,c10 }
In the next step, these three bees are employed to construct a candidate reduct by selecting random subsets from these residual reducts, which are combined with the core to find the optimum one. For example,

R_c + Bee1 = {c1} + Bee2 = {c8} + Bee3 = {c6, c10} → {c3, c4, c1, c8, c6, c10}
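A small sketch of this construction (the helper names core_and_residuals and compose_candidate are assumptions of this example):

import random

def core_and_residuals(reducts):
    """Intersect the per-class reducts to obtain the core (Rc); keep each
    class's leftover attributes for the bees to sample from."""
    core = set.intersection(*map(set, reducts))
    residuals = [sorted(set(r) - core) for r in reducts]
    return sorted(core), residuals

def compose_candidate(core, residuals):
    """Each bee contributes a random subset of one class's residual
    attributes; the union with the core is the candidate reduct."""
    picks = [random.sample(r, random.randint(0, len(r))) for r in residuals]
    return sorted(set(core).union(*picks))

R = [['c1', 'c3', 'c4', 'c9'], ['c3', 'c4', 'c8'], ['c3', 'c4', 'c6', 'c7', 'c10']]
core, residuals = core_and_residuals(R)     # core = ['c3', 'c4']
print(compose_candidate(core, residuals))   # e.g. ['c1', 'c10', 'c3', 'c4', 'c6', 'c8']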
This reduct is evaluated using BCO. The pseudocode of the
proposed method is given as:
Bee Colony based Independent QuickReduct Algorithm
IQRBEE (C,D)
C, the set of all conditional features; D, the set of decision features.
i) Cluster the domain and Find the reduct for each class
ii) Construct the core reduct and reduct sets
iii) Select the initial parameter values for ABC
iv) Initialize the population (xi)
v) Calculate the objective and fitness value
vi) Find the optimum feature subset as the global best
vii) do
a. Produce new feature subset (vi)
b. Apply the greedy selection between xi and vi
c. Calculate the fitness and probability values
d. Produce the solutions for onlookers
e. Apply the greedy selection for onlookers
f. Determine the abandoned solution and scouts
g. Calculate the cycle best feature subset
h. Memorize the best optimum feature subset
viii) Repeat for the maximum number of cycles
The following parameters are used in the proposed method :
The population size (number of bees) : p (the number of classes)
The dimension of the population : p × N
Lower bound : 1
Upper bound : N
Maximum number of iterations : 1000
The number of runs : 3
Here the reducts are found for clusters based on the decision attribute, so the computational complexity is reduced from O(n² m log n) to O((1/n_c) n² m log n), where 'n_c' is the number of clusters; as the number of clusters increases, it becomes easier to find the reduct using the improved BeeRSAR.
3.7 PROPOSED WEIGHTED BEE COLONY BASED RSAR
(WBeeRSAR)
Another limitation is that all the feature selection algorithms construct the feature subset first and then evaluate its performance; the subset is constructed without considering the relevance of each attribute. Here, feature reduction is proposed along with a weight for each attribute. Initially, the information gain (Han and Kamber, 2001) is calculated for each attribute and maintained as its weight. The indiscernibility relation multiplied by the information gain value is calculated for each feature subset as the objective value (f_i). The bees are then allowed to select feature subsets at random, calculate their fitness and find the best one at each iteration. Further, the computational complexity can be reduced from O((1/n_c) n² m log n) to O((1/g)(1/n_c) n² m log n), where 'g' is the information gain of the dataset.
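As a sketch, the information-gain weight of a qualitative attribute can be computed from the standard definition Gain(A) = H(D) - Σ_v (|D_v|/|D|) H(D_v); the helper names below are illustrative.

import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(values, labels):
    """Gain(A): the weight maintained for attribute A before the bee search starts."""
    gain, n = entropy(labels), len(labels)
    for v in set(values):
        sub = [l for val, l in zip(values, labels) if val == v]
        gain -= len(sub) / n * entropy(sub)
    return gain

# The weight then scales the dependency degree of any subset containing the attribute.
print(information_gain(['y', 'y', 'o', 'o'], ['no', 'no', 'yes', 'yes']))   # 1.0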
3.8 PERFORMANCE ANALYSIS
The performance of the proposed approaches discussed in this
chapter has been tested with ten different medical datasets (Appendix A),
downloaded from the UCI machine learning data repository. Once the values of missing attributes are predicted, the reduced feature set is obtained from the three novel methods based on rough set theory: Rough Set Theory hybridized with Bee Colony Optimization (BeeRSAR), Weighted Bee Colony based RSAR (WBeeRSAR) and Bee Colony based Independent QuickReduct (BeeIQR). Table 3.1 shows the reduct results of these methods on the 10 medical datasets discussed; it shows the size of the reduct found by each method.
Table 3.1 Reducts found for the Datasets

Datasets           Features   RSAR   EBR   AntRSAR   GenRSAR   PSO-RSAR   BeeRSAR   WBeeRSAR
Dermatology           34       10     10     8-9      10-11       7-8        7          7
Cleveland Heart       13        7      7     6-7       6-7        6-7        6          6
HIV                   21       13     13    10-11     11-13       9-10       8          8
Lung Cancer           56        4      4      4        6-7         4         4          4
Wisconsin             09        5      5      5         5         4-5        4          4
Echocardiogram        12        8      8     6-7       7-8        6-7        5          5
Primary Tumor         17       12     12    10-11     10-12      10-11      10         10
Arrhythmia           279      212    205   162-175   165-180   160-170   155-170    154-169
SPECTF Heart          44       12     11     9-10      9-11       8-10      7-9        7-9
Cardiotocography      23       16     16    14-15     14-16      13-15      11        11-12
Table 3.2 shows the reducts obtained from the proposed method for each dataset. The underlined attributes in the final reduct are the wavers; that is, in some iterations they occur in the reduct and in others they do not. Figure 3.3 shows the comparison of feature reduction between the proposed and the existing methods.
Table 3.2 Attributes in the Reduct from IQRBee

Datasets           Features   Core Reduct          Final Reduct                       No. of Attributes
Dermatology           34      {1}                  {1, 3, 5, 8, 15, 24, 33}                 6-7
Cleveland Heart       13      {1}                  {1, 5, 7, 8, 9}                           5
HIV                   21      {1, 3, 9, 12, 15}    {1, 3, 5, 7, 9, 12, 15}                  6-7
Lung Cancer           56      {1, 4}               {1, 4, 8, 14}                             4
Wisconsin             09      {1, 8}               {1, 4, 6, 8}                              4
Echocardiogram        12      {3, 9}               {3, 7, 9, 11}                             5
Primary Tumor         17      {1, 6, 7, 10}        {1, 5, 6, 7, 10, 11, 15, 16, 17}          9
Cardiotocography      23      {2, 4, 7, 8, 20}     {2, 4, 5, 7, 8, 15, 17, 19, 20}           9
SPECTF Heart          44      {2, 22, 25, 30}      {2, 15, 22, 25, 30, 35}                   6
Arrhythmia           279      -                    -                                      150-160
Figure 3.3 Reducts found for each Dataset
As illustrated in the results and in the figure, the proposed methods find better reducts than the other approaches. All the bee colony based methods can reduce the feature set to roughly a third of its size. For Dermatology, the total set of 34 features has been reduced to 6, a ratio of about 1/6; for Cleveland Heart, 13 features are reduced to 5 (about 1/3); for the HIV dataset, 21 features are reduced to 7 (1/3); for the Lung Cancer dataset, 56 features are reduced to 4 (about 1/13); for the Wisconsin dataset, 9 features are reduced to 4 (about 1/2); for the Echocardiogram dataset, 12 features are reduced to 5 (about 1/2); for Primary Tumor, 17 features are reduced to 9 (about 1/2); for Cardiotocography, 23 features are reduced to 9 (about 1/2); for the Arrhythmia dataset, 279 features are reduced to about 150; and for SPECTF Heart, 44 features are reduced to 6 (about 1/7). A proposed genetic-based kNN classifier, named the GkNN classifier, is employed to analyze the classification performance.
Table 3.3 Comparison of Reducts based on Run Time (Seconds)

Datasets           Features   RSAR   EBR   AntRSAR   GenRSAR   PSO-RSAR   BeeRSAR   WBeeRSAR   IQRBee
Dermatology           34       72     85      97       120        99        107        67         57
Cleveland Heart       13       48     61      73        96        75         83        43         33
HIV                   21       62     75      87       110        89         97        57         47
Lung Cancer           56      125    138     150       173       152        160       120        110
Wisconsin             09       35     48      60        83        62         70        30         20
Echocardiogram        12       45     58      70        93        72         80        40         30
Primary Tumor         17       52     65      77       100        79         87        47         37
Arrhythmia           279      155    173     201       235       205        220       190        120
SPECTF Heart          44       50     62      74        82        75         85        60         42
Cardiotocography      23       70     83      95       118        97        105        65         55
Table 3.3 compares the run times of the reduct methods. As shown in the table, the traditional RSAR and EBR methods find the reduct faster than the swarm based methods (Ant, Genetic and PSO), and the BeeRSAR method is also slower than the RSAR and EBR methods. However, the modified versions of BCO, WBeeRSAR and IQRBee, overcome this limitation and achieve lower run times.
Table 3.4 shows the comparison of the classification accuracy of the proposed approaches with the existing methods. It clearly shows that the reducts from IQRBee reach higher accuracy than the other methods.
Table 3.4 Classification (%) Performance of Reducts

Method      Dermatology    Cleveland Heart   HIV            Lung Cancer    Wisconsin
IQRBee      92.36 ± 0.22   86.54 ± 0.36      86.29 ± 0.18   83.03 ± 0.18   88.70 ± 0.35
WBeeRSAR    90.65 ± 0.34   85.23 ± 0.81      86.03 ± 0.32   82.88 ± 0.12   86.44 ± 0.46
BeeRSAR     91.70 ± 0.74   84.70 ± 0.74      85.70 ± 0.74   82.37 ± 0.39   84.70 ± 0.74
PSORSAR     88.89 ± 0.52   84.89 ± 0.52      84.89 ± 0.52   79.63 ± 0.31   84.89 ± 0.52
AntRSAR     85.32 ± 0.34   86.32 ± 0.34      85.32 ± 0.34   79.53 ± 0.37   85.32 ± 0.34
GenRSAR     86.39 ± 0.42   85.39 ± 0.42      84.39 ± 0.42   78.89 ± 0.71   86.39 ± 0.42
EBR         78.89 ± 0.21   79.71 ± 0.17      77.76 ± 0.79   77.63 ± 0.28   81.12 ± 0.18
RSAR        76.03 ± 0.27   77.07 ± 0.31      75.07 ± 0.54   77.95 ± 0.14   78.60 ± 0.26

Method      Echocardiogram   Primary Tumor   Arrhythmia     SPECTF Heart   Cardiotocography
IQRBee      91.25 ± 0.11     85.45 ± 0.63    88.93 ± 0.81   84.30 ± 0.81   89.70 ± 0.53
WBeeRSAR    89.54 ± 0.14     84.32 ± 0.18    87.35 ± 0.23   83.88 ± 0.21   88.44 ± 0.64
BeeRSAR     88.70 ± 0.43     83.07 ± 0.47    85.50 ± 0.47   82.72 ± 0.93   87.70 ± 0.47
PSORSAR     87.19 ± 0.25     82.29 ± 0.25    84.93 ± 0.25   80.23 ± 0.13   85.89 ± 0.25
AntRSAR     85.23 ± 0.45     82.23 ± 0.43    84.24 ± 0.43   80.43 ± 0.73   85.32 ± 0.43
GenRSAR     85.93 ± 0.24     81.93 ± 0.24    82.93 ± 0.24   79.69 ± 0.17   84.39 ± 0.24
EBR         78.78 ± 0.12     78.17 ± 0.71    78.62 ± 0.97   77.53 ± 0.82   81.12 ± 0.81
RSAR        76.30 ± 0.72     76.70 ± 0.13    76.70 ± 0.45   76.45 ± 0.41   79.60 ± 0.62
In the feature subset selection step, the QuickReduct and EBR methods produced the same reduct every time, unlike GenRSAR, AntRSAR, PSORSAR and BeeRSAR, which found different reducts and sometimes reducts of different cardinalities. On the whole, BeeRSAR and BeeIQR outperform the other methods. Compared to the other methods, BeeRSAR consumes more time to find the reduct; BeeIQR resolves this issue by finding the optimum reduct in minimal time. As illustrated in the results, the proposed IQRBee method produces a smaller reduct than the others, which shows its superior performance, with accuracies of 92.3% on Dermatology, 86.5% on Cleveland Heart, 86.3% on HIV, 83% on Lung Cancer and 88.7% on the Wisconsin Breast Cancer database. Next to IQRBee, the BeeRSAR and WBeeRSAR methods achieve the highest accuracies. The other optimization algorithms, GenRSAR, AntRSAR and PSORSAR, are at the next level, with classification accuracies mostly between 80% and 89%. The EBR and standard rough set algorithms are at the lowest level, with classification accuracies of 75-81%.
Based on the performance and results of the proposed Bee Colony Optimization method for feature selection developed and reported in this thesis, a paper entitled "A Novel Rough Set Reduct Algorithm for Medical Domain based on Bee Colony Optimization" was published in the Journal of Computing, Vol. 2, Issue 6, June 2010, pp. 49-54.

The proposed independent rough set approach hybridized with Bee Colony Optimization for feature selection was analyzed and, based on the conclusions of that analysis, a paper entitled "An Independent Rough Set Approach Hybrid with Artificial Bee Colony Algorithm for Dimensionality Reduction" was published in the American Journal of Applied Sciences, Vol. 8, Issue 3, March 2011, pp. 261-266.

Based on the Weighted Bee Colony based reduct approach proposed in this thesis, a paper entitled "A Weighted Bee Colony Optimization hybrid with Rough Set Reduct Algorithm for Feature Selection in the Medical Domain" was published in the International Journal of Granular Computing, Rough Sets and Intelligent Systems, Vol. 2, Issue 2, pp. 123-140, 2011.
3.9 CONCLUSION
Feature selection is a main research direction of rough set applications. However, the basic technique often fails to find good reducts. This work demonstrates the fundamental concepts of rough set theory and explains two basic reduct methods, namely QuickReduct and Entropy-Based Reduction. These methods can produce close-to-minimal, but not optimal, feature sets. Swarm intelligence methods have been used to guide the search towards minimal reduct sets; here, three different computational intelligence based reduct methods have been discussed: GenRSAR, AntRSAR and PSO-RSAR. Though these methods perform well, they lack consistency since they deal with more random parameters. In this work, a Bee Colony Optimization algorithm hybridized with rough set theory has been proposed to find minimal reducts without requiring random parameter assumptions. As an extension, a novel rough set based attribute reduction approach is proposed for feature selection to obtain a more accurate reduct. Initially, the instances are grouped based on the class attribute; then a reduct is found for each group. An intersection operation selects the common attributes from all these reducts to generate the core reduct, and with the remaining attributes the BCO algorithm based RSAR model is applied to obtain the final reduct. Experiments are carried out on ten different medical datasets from the UCI machine learning repository. The performance of the reducts is analyzed with the GkNN classifier and compared with the existing algorithms. The results show that the proposed BeeRSAR method achieved a maximum accuracy of 91.70%, WBeeRSAR attained a maximum accuracy of 90.65% and IQRBee achieved a maximum accuracy of 92.36%, outpacing the other existing methods.