Fuzzy rule base learning through simulated annealing



ELSEVIER Fuzzy Sets and Systems 105 (1999) 353-363


Fuzzy rule base learning through simulated annealing

François Guély a,b, Rémy La a,b, Patrick Siarry a,*

a École Centrale de Paris (E.C.P.), Laboratoire d'Électronique et de Physique Appliquée, Grande Voie des Vignes, 92290 Châtenay-Malabry, France

b Schneider Electric, Direction des Recherches, 33 bis av. du Maréchal Joffre, BP 204, 92002 Nanterre, France

Received June 1995; received in revised form July 1997

Abstract

We study the use of simulated annealing to optimize the membership functions of Takagi-Sugeno rules. The necessary adaptation of simulated annealing in order to be efficient for this problem is discussed in detail. The convergence is carefully studied for the test application of the approximation of an analytical function specially built to test the efficiency of the algorithm. The obtained results are compared with gradient descent optimization results. We point out that simulated annealing is particularly interesting in the case (usual in practical implementations) when there are few rules compared to the complexity of the problem. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy rule learning; Simulated annealing; Gradient descent; Optimization; Takagi-Sugeno rules

1. Introduction

In 1979, Procyk and Mamdani [13] created the first fuzzy rule base learning method. Since then, several scientific and industrial works have pointed out difficulties concerning the use of fuzzy logic. These difficulties appear at three steps of the fuzzy rule base's life:
- initial design (lack of methodology for developing fuzzy rule bases),
- life of the controlled process (difficulty of setting parameters, modifying rules),
- portability from one system to another (lack of portability).

*Corresponding author. Tel.: (33-1) 41-13-13-05; fax: (33-1) 41-13-14-37; e-mail: [email protected].

The third step has drawn less attention than the first two in the fuzzy community, but we believe it will become a major issue in the future. Making re-usable knowledge modules can be compared to modular programming, and would be useful for system development activities. It is an indispensable feature for enabling the transmission of knowledge.

Rule base learning, which is developing quickly today, provides approaches giving the following answers to each of these three steps:

Initial design:
(1) minimize the complexity of fuzzy rule bases (for instance the number of inputs, rules, membership functions),
(2) automatically identify a system or the behaviour of an operator,

0165-0114/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0165-0114(97)00260-1


(3) develop fuzzy rule bases more systematically by automatically optimizing manually defined fuzzy rule bases.

Life of the controlled process:
(4) build systems capable of learning, for instance adaptive control systems.

Portability, generalization to other processes:
(5) automatically tune or optimize the membership functions for processes having the same qualitative behaviour.

Point (3) is considered essential by many users of fuzzy logic. At present, no method exists to ensure that optimal performance has been attained with the chosen knowledge base structure. Intuition remains the usual tool for tuning membership functions. The need for point (4) can arise from time-varying production systems, but also from the desired flexibility of production systems (changes of product, for instance).

An operational learning method should, if possible:
- be general (not linked to a given type of application, or to the type of rule base and membership functions used),
- be scientifically sound (have well-admitted or demonstrated properties).

We believe these two characteristics to be key to the future success of rule base learning, especially in industry.

It is possible to optimize several types of fuzzy systems (Fig. 1), such as fuzzy decision trees, relational equations and cognitive maps, as well as rule bases. Here we are interested in fuzzy rule bases, because they are currently the most widely used tool of fuzzy logic, and they make it possible to implement expert or operator knowledge.

Learning methods can modify the inference method, the rules (structural learning) or the membership functions (parametric learning). For Mamdani rule bases, let us define generally used types of rules and membership functions (in growing order of complexity):

Rules:
- Filled table: Written in textual form, rule predicates only contain the AND operator. Each rule uses all rule base inputs. Rules can be written in a table and fill all the table.

Fig. 1. Typology of learning methods (fuzzy rule bases, fuzzy decision trees, fuzzy relational equations and fuzzy cognitive maps; learning methods divided into specific methods, neural methods and optimization methods).

- Sparse table: Rules do not fill all the table.
- Free style: Rules are generally written in textual form, and predicates comprise any combination of inputs through AND, OR and NOT logical operators.

Membership functions:
- Fixed partitions: Partitions, each membership function having the same size,
- Free partitions: Membership functions partition the variable,
- Free style: Membership functions are not necessarily partitions,
- Multidimensional: Membership functions are defined over several dimensions. For instance colour yellow can be defined in the three-dimensional space (red, green, blue). In this case they do not create partitions.

We decided to focus on the issue of membership function learning, and to use membership functions as general as possible (but defined on one dimension), which do not form partitions. Partitions are very practical, but they are difficult to generalize for membership functions defined over several dimensions.

We chose to work with Takagi-Sugeno fuzzy rule bases with constant output functions. These rules are interesting as they can be considered either as simplified Sugeno rules (with constant output functions) or as simplified Mamdani rules


(with nonfuzzy outputs). They are also the most used in industry because of their simplicity. The methods which have been used until now to optimize them can be divided into (Fig. 1): specific methods (for instance identification [17]), methods coming from neural networks, and optimization methods.

Among optimization-based methods, local methods like gradient descent [4-6, 8, 12] and global methods like simulated annealing [7, 10] and genetic algorithms [9, 15] have been tested. Local optimization methods converge to a local optimum of the cost function, while global optimization methods aim at finding the global optimum.

Compared to gradient descent, simulated annealing has two essential advantages: it is able (at least in theory) to find the global minimum of the function to optimize, and it can handle any cost function. On the other hand, it cannot be used to realize a real-time learning method (unlike gradient descent), but its generality makes it very interesting.

Simulated annealing convergence has been demonstrated under rather general assumptions [1], while in the case of genetic algorithms, it is dependent on the way of coding parameters into genes (successful templates need to be short for convergence to occur).

Isaka et al. [7] have applied simulated annealing to tune the parameters of a fuzzy controller used for the regulation of blood pressure during surgery. Their work showed, in this particular case, an interesting use of simulated annealing to optimize Mamdani-type fuzzy rule bases, and this encouraged us to undertake a detailed study.

Many studies [4-6, 8, 12] have been carried out concerning gradient-based parametric learning of Takagi-Sugeno rules. Here, we study the parametric learning of these rules by simulated annealing. The approximation of an analytical one-input function that is difficult to approximate, and a comparison with gradient descent, are used to analyze simulated annealing.

This paper is divided into three parts. The type of rules used here and the adopted methodology are presented in the first part. In the second part, we briefly recall the principle of simulated annealing, and we indicate our choices to adapt this method to the given problem. The third part presents the test results.

2. Optimization problem of Takagi-Sugeno rule bases

Unlike Mamdani rules, Takagi-Sugeno rules have nonfuzzy outputs [17]. The output of each rule is generally an affine function of the inputs of the rule. The implementation of Takagi-Sugeno rules involves less computation than that of Mamdani rules. Nevertheless, chained inferences are not generally realized with Takagi-Sugeno rule bases. These rule bases are essentially used to realize fuzzy controllers and to model processes.

In this paper, we consider a rule base of n Takagi-Sugeno rules with constant outputs. Each rule has the following characteristics:

R_i: if (x_1 is A_i1 and ... x_j is A_ij and ... x_m is A_im) then y_i = w_i

where
- m is the number of inputs,
- X = (x_1, ..., x_j, ..., x_m) is the rule base input vector,
- A_ij is the membership function of the ith rule defined on the jth input,
- w_i is the constant output of the ith rule.

The membership functions are symmetric triangular, with centre a_ij and half-width b_ij (see Fig. 2).

Fig. 2. Symmetric triangular membership function.

The output y of the rule base is


given by:

y = (Σ_{i=1..n} α_i w_i) / (Σ_{i=1..n} α_i),

where α_i = Π_{j=1..m} A_ij(x_j), provided Σ_{i=1..n} α_i ≠ 0.

The supervised learning of the rule base is realized from a set of learning samples. We define the quadratic error by

E = Σ_{p=1..psamp} (y_p − y'_p)²,

where psamp is the number of learning samples, (X_p, y'_p) is the pth learning sample, and y_p is the output of the rule base corresponding to the input vector X_p.
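As a sketch (not the authors' code), the rule base output and the quadratic error above can be computed directly; the triangular membership function with centre a_ij and half-width b_ij follows the definition of Fig. 2:

```python
def triangular(x, a, b):
    # Symmetric triangular membership function with centre a and half-width b.
    return max(0.0, 1.0 - abs(x - a) / b) if b > 0 else 0.0

def rule_base_output(X, a, b, w):
    # a[i][j], b[i][j]: membership parameters of rule i on input j;
    # w[i]: constant rule outputs. Weighted mean of the w[i] by the
    # rule activations alpha_i = prod_j A_ij(x_j).
    num = den = 0.0
    for ai, bi, wi in zip(a, b, w):
        alpha = 1.0
        for xj, aij, bij in zip(X, ai, bi):
            alpha *= triangular(xj, aij, bij)
        num += alpha * wi
        den += alpha
    return num / den if den != 0 else 0.0

def quadratic_error(samples, a, b, w):
    # samples: list of (X_p, y_p') learning pairs; E = sum of squared errors.
    return sum((rule_base_output(Xp, a, b, w) - yp) ** 2 for Xp, yp in samples)
```

The convention that the output is 0 when no rule is active is an assumption; the paper only defines the quotient when Σ α_i ≠ 0.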

The optimization problem consists in tuning the parameters w_i, a_ij and b_ij in order to minimize E. Optimizing the membership function parameters is a complex problem for the following reasons:
- the cost function is not differentiable everywhere,
- the cost function is not continuous everywhere when membership functions do not overlap,
- the parameters to optimize are numerous (typically 20-100).

The complexity of the problem could be lowered by:
- using membership functions defining a partition of the parameter space,
- optimizing only the output functions,
- using constraints in order to ensure overlapping of membership functions.

We chose not to use these simplifications in order to be able to use as many forms of expertise as possible. We think our approach can be generalized to multidimensional membership functions, for which defining partitions or even constraints between membership functions does not make sense most of the time.

3. Use of the simulated annealing method for learning

3.1. Brief presentation of simulated annealing

The simulated annealing method, separately described by Kirkpatrick et al. [10] and Cerny [2], is based on an analogy between the optimization of a complex system and the description of the behaviour of a physical system composed of numerous interacting particles. The aim of minimizing a cost function having many degrees of freedom can be compared to the search for a minimum-energy state of a set of particles. Thus, the cost function of the optimization problem is similar to an energy. To lead a physical system to a low-energy state, the annealing method is used: it consists in slowly reducing the temperature to avoid freezing some structures into local minimum-energy states. Some algorithms simulate this cooling process. Their transposition to an optimization problem can be done by introducing a control parameter similar to the temperature.

Then, the simulated annealing algorithm consists in making the physical system evolve from an ini- tial configuration (arbitrarily or expertly chosen) through elementary transformations which modify its energy. The sequence of these transformations is made to simulate the evolution of a physical system towards its thermodynamic equilibrium, for a given temperature. The progressive reduction of the tem- perature then leads to a global minimum of the cost function.

The steps of simulated annealing are briefly de- scribed below. The cost function to optimize is called E.

(1) Generate an initial configuration X and choose an initial temperature T.
(2) Generate a configuration Y next to X.
(3) If E(Y) < E(X) then the "move" is accepted (replace X by Y and go to (5)).
(4) Accept the move with a probability p = exp(−[E(Y) − E(X)]/T). If accepted, replace X by Y.
(5) If the thermodynamic equilibrium at temperature T is not reached, go to (2).
(6) If the system is not frozen, reduce temperature T (new stage) and go to (2).
(7) Stop: the result is the best configuration encountered.
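The steps above can be sketched generically as follows. This is an illustrative sketch, not the authors' implementation: the simple geometric cooling and fixed stage length here stand in for the adaptive schemes described in Section 3.2:

```python
import math
import random

def simulated_annealing(energy, neighbour, x0, t0, cooling=0.9,
                        moves_per_stage=100, stages=50):
    # (1) initial configuration and initial temperature
    x = best = x0
    t = t0
    for _ in range(stages):                    # (6) temperature stages
        for _ in range(moves_per_stage):       # (5) equilibrium at fixed T
            y = neighbour(x)                   # (2) candidate next to X
            d_e = energy(y) - energy(x)
            # (3)-(4): downhill moves always accepted,
            # uphill moves accepted with probability exp(-dE/T)
            if d_e < 0 or random.random() < math.exp(-d_e / t):
                x = y
                if energy(x) < energy(best):
                    best = x
        t *= cooling                           # geometric temperature decrease
    return best                                # (7) best configuration found
```

A usage example: minimizing (x − 3)² from x = 0 with a uniform-step neighbour converges close to 3.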

A detailed presentation of the theory and ap- plications of simulated annealing is proposed in [14]. The conditions of convergence to a global optimum are studied in [1].


3.2. Practical implementation of simulated annealing

3.2.1. Annealing program
The annealing program aims at controlling the temperature. It consists in setting the four elements studied below.

3.2.1.1. Initial temperature. Simulated annealing accepts an energy rise of a quantity h with a probability exp(−h/T), T being the temperature of the current stage. The initial temperature T0 is determined according to the probability p, chosen by the user, of accepting energy rises during the first stage. The mean value h_mean of the energy rises during the first stage is estimated by a few (typically 100) random moves. T0 is then calculated through the following formula:

p = exp(−h_mean / T0).

The tuning of parameter p enables one to apply strategies like "starting with a high temperature" (p near to 1) and "starting with a low temperature" (p near to 0). Starting with a low temperature is appropriate if we have an initial knowledge that we do not want to alter too much, or if CPU time is limited.
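The initial temperature can thus be estimated from a few random moves and the user-chosen acceptance probability p. A minimal sketch (the 100-move sample and the formula follow the text; the fallback when no rise is observed is an assumption):

```python
import math
import random

def initial_temperature(energy, neighbour, x0, p, n_moves=100):
    # Estimate the mean energy rise h_mean over random moves,
    # then solve p = exp(-h_mean / T0) for T0.
    rises = []
    x = x0
    for _ in range(n_moves):
        y = neighbour(x)
        d_e = energy(y) - energy(x)
        if d_e > 0:
            rises.append(d_e)
        x = y
    # Arbitrary fallback if no energy rise was observed (assumption).
    h_mean = sum(rises) / len(rises) if rises else 1.0
    return -h_mean / math.log(p)
```

For example, with p = 0.5 and a mean rise of 1, this gives T0 = 1/ln 2 ≈ 1.44.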

3.2.1.2. Temperature decrease law. Usually, geometric series T(k+1) = λT(k) are used to perform the temperature decrease (k: temperature stage number). But taking a constant coefficient λ leads either to an excessive calculation time or to the risk of evolving to a local minimum of E. So we use a coefficient λ(k) related to the annealing progress, that is: λ(k) = max(λ_min, min(r(k), λ_max)), where r(k) is the ratio between the minimal energy and the mean energy during stage k. λ_min and λ_max are constants with λ_min < λ_max < 1 (typically, λ_min = 0.8 and λ_max = 0.95). At the beginning of the annealing, r(k) is much lower than λ_min, so the temperature decreases regularly but not too rapidly. At the end of the annealing, the energy drops are low and r(k) is nearly equal to 1, so that λ(k) = λ_max, which forces the temperature to keep decreasing.
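The adaptive coefficient λ(k) can be written directly (a sketch; the default bounds 0.8 and 0.95 are the typical values given in the text):

```python
def decrease_coefficient(min_energy, mean_energy, lam_min=0.8, lam_max=0.95):
    # r(k): ratio between the minimal and the mean energy during stage k;
    # the coefficient is clipped to [lam_min, lam_max].
    r = min_energy / mean_energy
    return max(lam_min, min(r, lam_max))

def next_temperature(t, min_energy, mean_energy):
    # T(k+1) = lambda(k) * T(k)
    return decrease_coefficient(min_energy, mean_energy) * t
```

Early in the annealing, r(k) is small and λ(k) = λ_min; near the end, r(k) ≈ 1 and λ(k) = λ_max.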

3.2.1.3. Temperature stage length. We adopted Kirkpatrick's criterion: a stage ends when one of the two following conditions is met: the number of attempted moves during the current stage is greater than 100 times the number P of parameters modified at each move, or the number of moves accepted during the current stage is greater than 12 times P.

3.2.1.4. Criterion for the end of the algorithm. Usually, criteria enabling optimization to be stopped in an optimal way are used. Here, a large number of moves is made in order to let simulated annealing evolve more certainly until the end of optimization (calculation time is not saved).

3.2.2. Discretization
Simulated annealing was first devised for combinatorial problems, but the parametric optimization of fuzzy rule bases is a continuous-variable problem. It is therefore necessary to "discretize" the parameters. We do it the following way: at initialization, a maximal step maxstep is assigned to each parameter. At each move, each parameter is perturbed with a step randomly chosen in the interval [−maxstep, maxstep], according to a uniform law. During the temperature stage changes, each maximal step is updated so that the acceptance rate of the moves leading to energy rises is kept, for each parameter, between two bounds (typically 0.4 and 0.6) [14]. However, the updated maximal steps remain smaller than the initial maximal steps.
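A sketch of this move generation and step adaptation. The 0.4-0.6 acceptance-rate bounds and the cap at the initial maximal step follow the text; the ×1.1 / ×0.9 adjustment factors are an assumption, since the paper does not give them:

```python
import random

def perturb(params, max_steps):
    # Perturb each parameter by a step drawn uniformly in [-max_step, max_step].
    return [p + random.uniform(-s, s) for p, s in zip(params, max_steps)]

def update_max_step(max_step, initial_max_step, acceptance_rate,
                    low=0.4, high=0.6):
    # Keep the uphill-move acceptance rate between the two bounds:
    # widen the step when too many rises are accepted, shrink it when
    # too few are, never exceeding the initial maximal step.
    if acceptance_rate > high:
        max_step *= 1.1       # assumed widening factor
    elif acceptance_rate < low:
        max_step *= 0.9       # assumed shrinking factor
    return min(max_step, initial_max_step)
```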

3.2.3. Problem partitioning
Instead of modifying the N parameters of the problem at each move, only P parameters are modified. Typically, P is comprised between 3 and N/3. The P parameters to modify at each move are randomly chosen so that all the parameters are updated the same number of times. Thanks to this problem partitioning method, the total number of objective function evaluations needed remains proportional to the number of parameters.
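One simple way to choose P of the N parameters while keeping the update counts balanced is a shuffled round-robin; the paper only states the balancing requirement, so this particular scheme is an assumption:

```python
import random

def move_partitions(n_params, p):
    # Yield groups of p parameter indices. Every parameter appears exactly
    # once per pass through the shuffled index list, so all parameters are
    # updated the same number of times (the last group of a pass may be
    # smaller when p does not divide n_params).
    while True:
        order = list(range(n_params))
        random.shuffle(order)
        for start in range(0, n_params, p):
            yield order[start:start + p]
```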

4. Experimental study

4.0. Learning method testing conditions

We evaluate the learning method by approximating with a fuzzy rule base an analytical function


specially built for that purpose. The learning samples are generated randomly in the interval on which the function is defined, with a uniform distribution law. At the end of the rule base optimization, the quadratic error E is calculated with a set of "test samples", different from the learning samples. The test samples are numerous enough and regularly distributed in the interval on which the analytical function is defined. The analytical function presents some singularities in order to test the learning method in difficult conditions.

It presents a high curvature zone around x = 0.85, and is defined piecewise by:

x ∈ [0, 1/3]: y°(x) = −9x² + 3x + 1/3,

x ∈ [1/3, 1/2]: y°(x) = 2x − 1/3,

x ∈ [1/2, 1]: y°(x) = 0.03/(0.03 + (4.5x − 3.85)²).

4.1.3. Test samples
We took 500 test samples, regularly spaced over the definition interval [0, 1] of the analytical test function.

4.1. Description of the test conditions

4.1.1. Initial configuration
The a_ij are regularly spaced inside the variation interval of the x_j. The w_i are initialized to zero. Let τ be the "overlapping rate" of the membership functions, that is, the average number of simultaneously active rules. The following formula gives the initial b_ij parameters so that the initial overlapping rate is equal to a given value τ0:

b_ij = ((x_j^max − x_j^min)/2) · (τ0/n)^(1/m),

where [x_j^min, x_j^max] is the variation interval of input x_j.
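Under this reading of the formula, the initial configuration can be generated as follows. This is a sketch: the exact reconstruction of the b_ij formula and the particular choice of regular spacing for the centres (midpoints of n equal subintervals) are interpretations, not the authors' code:

```python
def initial_configuration(n_rules, bounds, tau0):
    # bounds: list of (x_min, x_max) per input; tau0: target overlapping rate.
    m = len(bounds)
    # Centres a_ij regularly spaced inside the variation interval (assumed
    # to be the midpoints of n equal subintervals).
    a = [[lo + (i + 0.5) * (hi - lo) / n_rules for lo, hi in bounds]
         for i in range(n_rules)]
    # Half-widths b_ij = ((x_max - x_min)/2) * (tau0/n)^(1/m).
    b = [[(hi - lo) / 2 * (tau0 / n_rules) ** (1.0 / m) for lo, hi in bounds]
         for _ in range(n_rules)]
    w = [0.0] * n_rules    # outputs initialized to zero
    return a, b, w
```

With n = 20 rules on [0, 1] and τ0 = 2, this gives a half-width of 0.05 for every membership function.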

4.1.2. Analytical test function
The results presented in Section 4.3 are obtained with a one-input analytical function (Fig. 3), which is the same as the one used in [4]. It presents three singularities: a discontinuity of the derivative at x = 1/3, a discontinuity at x = 1/2, and a peak with a high curvature zone around x = 0.85.

Fig. 3. Analytical test function.

4.2. Adaptations of simulated annealing to the problem

The rule bases obtained during the first tests of simulated annealing presented characteristics which seemed unacceptable to us. In this section, we will suggest some hypotheses concerning the conditions of their appearance and present some solutions to obtain satisfactory results.

Some membership functions are "very wide", others are "very thin". They are very numerous in some intervals and scarce in others (Fig. 4). The width and the distribution of the membership functions do not seem to relate to the shape of the analytical function to approximate.

What can be the mechanism of the creation of the "peaks"? The following hypothesis seemed realistic to us. Let us suppose that, during optimization, one of the outputs w_i is very far from its optimal value. The simulated annealing algorithm can either correct w_i or reduce the influence of the rule by decreasing b_ij. In the latter case, the membership function can become so "thin" that the associated rule is not active for any learning sample. w_i then varies randomly, which makes any further widening of the membership function improbable, because it would cause a prohibitive energy rise. The creation of the "peak" is then irreversible.

Some tests showed that a proper modification of the maximal step of the b_ij moves lowers the probability of creation of the "peaks", but does not eliminate them totally. To obtain better results, we made the variation step of b_ij relative instead of absolute: it is expressed as a percentage of the value of b_ij. The tests show that the "peaks" do not


Fig. 4. Examples of observed membership functions: "pyramid", "peak", "cluster", "hole".

Fig. 5. Typical result of optimization (analytical function, rule base output, learning samples).

appear any more, which corroborates our hypo- thesis about their creation.

We think that the "pyramids" appear at the beginning of optimization, when most outputs w_i are still near zero. The parameters b_ij of some membership functions would rise to compensate for the low value of the w_i of other membership functions. Tests showed that the "pyramids" appear more rarely when we impose an initial step of the w_i much greater than that of the b_ij. Fig. 5 shows a typical result of learning.

Experiments showed that a great reduction of the maximal step of the a_ij eliminates the "clusters" and the "holes", but deteriorates the optimal solution. The problem is to find an initial maximal step appropriate to all the initial configurations. It is difficult to find a robust solution to this problem.

For the tests presented in Section 4.3, the maximal step of the a_ij was chosen manually according to preliminary tests, and the maximal steps of the b_ij and w_i were chosen so that the "peaks" and the "pyramids" do not appear.


4.3. Test results

In all the tests presented below, three parameters were modified at each move. We set the following intervals of variation: w_i ∈ [0, 1], a_ij ∈ [0, 1] and b_ij ∈ [0, 2].

In order to simplify the notations, "MS_w = p%" and "MS_a = p%" mean that the initial maximal steps of w_i and of a_ij are equal to p% of their respective intervals of variation. "MS_b = p%" means that the current maximal step of b_ij is equal to p% of the current value of b_ij. We call E_learn and E_test the quadratic errors on the learning samples and on the test samples.

Tests have been carried out varying successively the number of membership functions, the number of learning samples, the initial overlapping rate and the total number of moves. The typical result shown in Fig. 5 was obtained with MS_w = 10%, MS_a = 0.1%, MS_b = 5%, 20 rules, 100 learning samples, 500 moves for each parameter and τ0 = 2. We obtained E_learn = 0.0007 and E_test = 0.5525.

In Fig. 5 we can remark that the discontinuity has been approximated by creating a very small overlapping between two membership functions.

4.3.1. Influence of the number of membership functions
Fig. 6 shows the influence of the number of membership functions for a constant number of learning samples. The plot was obtained with MS_w = 10%, MS_b = 5%, 100 learning samples, 500 moves for each parameter and τ0 = 2.

MS_a = x%, where x is calculated through the following empirical formula: x = 10 × Interval / NMV, NMV being the number of moves for each parameter, and Interval the initial interval between the a_ij.

For 5 membership functions, the initial maximal step calculated by the formula given above is not appropriate, which explains the bad result illustrated by the quadratic error on the test samples. The rise of the quadratic error for the test samples for more than 15 membership functions is due to the insufficient number of learning samples.

The main conclusion we draw here is that the ability to generalize knowledge learnt from samples is good when the number of membership functions (or rules) is small. Taking more rules makes it possible to fit the learning samples properly, but not the curve we try to approximate! This result advocates the use, in real problems, of as few membership functions and rules as possible (common sense also tells us to keep rule bases simple).

4.3.2. Influence of the number of learning samples
Fig. 7 was obtained with different numbers of learning samples, MS_w = 10%, MS_a = 0.1%, MS_b = 5%, 20 rules, 10000 total moves (500 for each parameter) and τ0 = 2. The base has 20 rules, with 60 parameters.

The quadratic error for the test samples decreases with the number of learning samples until it reaches a limit. Best results are obtained with a number of samples at least equal to twice the number of parameters of the rule base. This result is also

Fig. 6. Influence of the number of membership functions (quadratic error E/psamp for learning and test samples vs. number of membership functions).

Fig. 7. Influence of the number of learning samples (quadratic error E/psamp for learning and test samples vs. number of learning samples).


Fig. 8. Influence of the initial overlapping rate (quadratic error E/psamp for learning and test samples vs. initial overlapping rate).

Fig. 9. Influence of the number of moves (quadratic error E/psamp for learning and test samples vs. number of moves per parameter).

consistent with the necessity of having more learning samples than parameters (otherwise some parameters are likely to be unconstrained), and shows that introducing much more redundancy does not improve the results. Therefore, it seems that in practice the number of rules of the rule base (and thus of parameters to optimize) will be one of the main constraints for implementations, as for many systems gathering sample data is costly and difficult. Once more, using fewer membership functions will simplify practical implementations by requiring less sample data.

4.3.3. Influence of the initial overlapping rate
We studied the influence of the initial overlapping rate on the quality of the optimization. Fig. 8 was obtained with MS_w = 10%, MS_a = 0.1%, MS_b = 5%, 20 rules and 100 learning samples. This result is very interesting, as it shows that the initial overlapping rate has very limited impact on the optimization result (only too-small overlapping rates, which create risks of nonoverlapping zones, yield worse results). This corroborates the global nature of the optimization carried out by simulated annealing.

4.3.4. Influence of the number of moves We studied the influence of the number of moves

(Fig. 9). We noticed that 1000 or even 500 moves per parameter are enough to reach the end of the optimization process of the fuzzy rule base. This number is consistent with other simulated anneal- ing literature results.

4.4. Comparison of gradient descent and simulated annealing

Gradient descent and simulated annealing were compared for the approximation of the one-input function given in Fig. 3, with the same learning samples.

Fig. 10 shows the results obtained when varying the number of membership functions (and rules). Good results were obtained with simulated annealing from 7 membership functions upwards, while only from 15 with gradient descent. With few membership functions, each membership function covers a zone on which the approximated function is highly nonlinear: the cost function probably has many more local minima. We believe this to be the main reason why gradient descent does not perform well in this case.

Working with as few membership functions and rules as possible is extremely important in practical implementations of fuzzy logic, as it makes it possible:
- to keep the rule base simple and understandable,
- to limit the need to collect samples for learning.

In the case of gradient descent, the quadratic error did not increase when the number of membership functions was very high with a constant number of learning samples. This robustness of gradient descent to small samples (smaller than the number of parameters to optimize) is due, in our opinion, to the fact that gradient descent is deterministic while simulated annealing is stochastic: in the absence of any impact of a given parameter on the cost function, gradient descent does not change that parameter, while simulated annealing changes it randomly.


Fig. 10. Influence of the number of membership functions on gradient descent and simulated annealing. [Plot of Etest against the number of membership functions, 0 to 40, for the two methods.]

Fig. 11. Influence of the initial overlapping rate on gradient descent and simulated annealing. [Plot of Etest, 0 to 4, against the initial overlapping rate for the two methods.]

Simulated annealing also gave steadier results when the initial overlapping rate was varied (Fig. 11). This shows the robustness of simulated annealing to initial conditions, and the global nature of this optimization method.

Nevertheless, it is often desirable not to depart too much from a previously, manually determined rule base, as this rule base expresses expertise that should not be lost. With simulated annealing, it is indeed possible to tune the dependency on the initial rule base. In our simulations, we always used an initial move acceptance probability of 0.6; if we wish to obtain a result that is not too far from the initial rule base, a lower value of this parameter can be used. Gradient descent does not enable that kind of tuning.
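The link between the initial acceptance probability and the starting temperature can be sketched with a standard recipe (an assumption, since the paper does not give its exact calibration procedure): estimate the mean cost increase of random degrading moves and set T0 = mean increase / ln(1 / p0), so a lower p0 yields a colder start that stays closer to the expert-given rule base.

```python
import math
import random

def initial_temperature(cost, params, step_sizes, p0=0.6, trials=200):
    """Choose T0 so that roughly a fraction p0 of degrading moves is
    accepted at the start of the annealing (the text uses p0 = 0.6).
    Standard calibration: T0 = <delta E> / ln(1 / p0), where <delta E>
    is estimated over random trial moves from the initial rule base.
    """
    e0 = cost(params)
    increases = []
    for _ in range(trials):
        i = random.randrange(len(params))
        candidate = list(params)
        candidate[i] += random.uniform(-step_sizes[i], step_sizes[i])
        delta = cost(candidate) - e0
        if delta > 0:
            increases.append(delta)
    if not increases:  # no degrading move found near the start point
        return 1e-6
    mean_increase = sum(increases) / len(increases)
    return mean_increase / math.log(1.0 / p0)
```

Since ln(1/p0) grows as p0 decreases, p0 = 0.3 gives a lower T0 than p0 = 0.6 for the same cost landscape, hence a search that departs less from the initial rule base.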

5. Conclusion

The aim of this study was to test a learning method for the parameters of fuzzy rule bases. Simulated annealing is very general: no gradients need to be calculated, and there are no limitations concerning the cost function, the shape of the membership functions, the fuzzy logic operators, etc.

Learning Takagi-Sugeno membership functions with simulated annealing is not easy to implement, but gives promising results. To obtain good results, it is necessary to master simulated annealing and to take into account the type of parametrization chosen for the fuzzy rules. In particular, a specific method for determining the discretization steps was proposed for each type of parameter.

We obtained encouraging results for a difficult one-input function. We think a next step should be to validate the approach with multi-input functions. The comparison with gradient descent shows that simulated annealing obtains better results when the number of membership functions and rules is as small as possible for the problem at hand, a realistic case from an industrial point of view.

The validation of this learning method is in progress on an industrial problem. Moreover, the method should be tested on standard Mamdani rules (with fuzzy outputs), which are currently the most widely used in industry.

We believe it possible to develop general optimization schemes adapted to rule base optimization (provided that the data necessary for optimization are available), regardless of the type of problem addressed and the size and type of the rule base.

References

[1] E.H.L. Aarts, P.J.M. Van Laarhoven, Simulated Annealing: Theory and Applications, Reidel, Dordrecht, 1987.

[2] V. Cerny, Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm, J. Optim. Theory Appl. 45 (1) (1985).

[3] A. Corana, M. Marchesi, C. Martini, S. Ridella, Minimizing multimodal functions of continuous variables with the simulated annealing algorithm, ACM Trans. Math. Software 13 (1987) 262-280.


[4] F. Guely, P. Siarry, Gradient descent method for optimizing various fuzzy rule bases, Proc. 2nd IEEE Internat. Conf. on Fuzzy Systems FUZZ-IEEE '93, San Francisco, CA, vol. 2, March 1993, pp. 1241-1246.

[5] F. Guely, P. Siarry, A centred formulation of Takagi-Sugeno rules for improved learning efficiency, Fuzzy Sets and Systems 62 (1994) 277-285.

[6] H. Ichihashi, Iterative fuzzy modeling and a hierarchical network, in: Proc. IFSA '91, Bruxelles, Vol. Engineering, 1991, pp. 49-52.

[7] S. Isaka, A.V. Sebald, A. Karimi, N.T. Smith, M.L. Quinn, On the design and performance evaluation of adaptive fuzzy controllers, in: Proc. IEEE '88 Conf. on Decision and Control, Austin, TX, vol. 2, 1988, pp. 1068-1069.

[8] J.S.R. Jang, Fuzzy controller design without domain experts, in: Proc. 1st IEEE Internat. Conf. on Fuzzy Systems FUZZ-IEEE '92, San Diego, 1992, pp. 289-296.

[9] C.L. Karr, E.J. Gentry, Fuzzy control of pH using genetic algorithms, IEEE Trans. Fuzzy Systems 1 (1) (1993).

[10] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671-680.

[11] M.A. Lee, H. Takagi, Integrated design stages of fuzzy systems using genetic algorithms, in: Proc. 2nd IEEE Internat. Conf. on Fuzzy Systems FUZZ-IEEE '93, San Francisco, March 1993, pp. 612-617.

[12] H. Nomura, I. Hayashi, N. Wakami, A self-tuning method of fuzzy control by descent method, in: Proc. IFSA '91, Bruxelles, Vol. Engineering, 1991, pp. 155-158.

[13] T.J. Procyk, E.H. Mamdani, A linguistic self-organizing process controller, Automatica 15 (1979) 15-30.

[14] P. Siarry, G. Dreyfus, La Méthode du Recuit Simulé: Théorie et Applications, I.D.S.E.T. (E.S.P.C.I.), Paris, 1988.

[15] P. Siarry, F. Guély, A genetic algorithm for optimizing Takagi-Sugeno fuzzy rule bases, Fuzzy Sets and Systems (to be published).

[16] M. Sugeno, T. Takagi, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Systems Man Cybernet. SMC-15 (1) (1985) 116-132.

[17] T. Takagi, M. Sugeno, Derivation of fuzzy control rules from human operator's control actions, in: Proc. Internat. Fuzzy Association Conf. IFAC '83.