
Neuro-Fuzzy Modeling and Control JYH-SHING ROGER JANG, MEMBER, IEEE, AND CHUEN-TSAI SUN, MEMBER, IEEE

Fundamental and advanced developments in neuro-fuzzy synergisms for modeling and control are reviewed. The essential part of neuro-fuzzy synergisms comes from a common framework called adaptive networks, which unifies both neural networks and fuzzy models. The fuzzy models under the framework of adaptive networks are called adaptive-network-based fuzzy inference systems (ANFIS), which possess certain advantages over neural networks. We introduce the design methods for ANFIS in both modeling and control applications. Current problems and future directions for neuro-fuzzy approaches are also addressed.

Keywords: Fuzzy logic, neural networks, fuzzy modeling, neuro-fuzzy modeling, neuro-fuzzy control, ANFIS.

I. INTRODUCTION

In 1965, Zadeh published the first paper on a novel way of characterizing nonprobabilistic uncertainties, which he called "fuzzy sets" [116]. This year marks the 30th anniversary of fuzzy logic and fuzzy set theory, which has now evolved into a fruitful area containing various disciplines, such as calculus of fuzzy if-then rules, fuzzy graphs, fuzzy interpolation, fuzzy topology, fuzzy reasoning, fuzzy inference systems, and fuzzy modeling. The applications, which are multi-disciplinary in nature, include automatic control, consumer electronics, signal processing, time-series prediction, information retrieval, database management, computer vision, data classification, decision-making, and so on.

Recently, the resurgence of interest in the field of artificial neural networks has injected a new driving force into the "fuzzy" literature. The back-propagation learning rule, which drew little attention until its application to artificial neural networks was discovered, is actually a universal learning paradigm for any smooth parameterized model, including fuzzy inference systems (or fuzzy models). As a result, a fuzzy inference system can now not only take linguistic information (linguistic rules) from human experts, but also adapt itself using numerical data (input/output pairs) to achieve better performance.

Manuscript received March 30, 1994; revised November 28, 1994. This work was supported in part by NASA Grant NCC 2-275, MICRO Grant 92-180, EPRI Agreement RP 8010-34, and in part by the BISC program.

J.-S. Jang is with the Control and Simulation Group, The MathWorks, Inc., Natick, MA 01760 USA.

C.-T. Sun is with the Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan.

IEEE Log Number 9408301.

This gives fuzzy inference systems an edge over neural networks, which cannot take linguistic information directly.

In this paper, we formalize adaptive networks as a universal representation for any parameterized models. Under this common framework, we reexamine the back-propagation algorithm and propose speedup schemes utilizing the least-squares method. We explain why neural networks and fuzzy inference systems are all special instances of adaptive networks when proper node functions are assigned, and why all learning schemes applicable to adaptive networks are also qualified methods for neural networks and fuzzy inference systems.

When represented as an adaptive network, a fuzzy inference system is called an adaptive-network-based fuzzy inference system (ANFIS). For three of the most commonly used fuzzy inference systems, the equivalent ANFIS can be derived directly. Moreover, the training of ANFIS follows the spirit of the minimum disturbance principle [109] and is thus more efficient than sigmoidal neural networks.

Once a fuzzy inference system is equipped with learning capability, all the design methodologies for neural network controllers become directly applicable to fuzzy controllers. We briefly review these design techniques and give related references for further studies.

The arrangement of this article is as follows. In Section II, an in-depth introduction to the basic concepts of fuzzy sets, fuzzy reasoning, fuzzy if-then rules, and fuzzy inference systems is given. Section III is devoted to the formalization of adaptive networks and their learning rules, where the back-propagation neural network and radial basis function network are included as special cases. Section IV explains the ANFIS architecture and demonstrates its superiority over back-propagation neural networks. A number of design techniques for fuzzy and neural controllers are described in Section V. Section VI concludes this paper by pointing out current problems and future directions.

II. FUZZY SETS, FUZZY RULES, FUZZY REASONING, AND FUZZY MODELS

This section provides a concise introduction to and a summary of the basic concepts central to the study of fuzzy sets. Detailed treatments of specific subjects can be found in the reference list.


Fig. 1. (a) A = "appropriate number of courses taken," an MF on a discrete X (X = number of courses); (b) B = "about 50 years old," an MF on a continuous X (X = age).

A. Fuzzy Sets

A classical set is a set with a crisp boundary. For example, a classical set A can be expressed as

A = {x | x > 6}  (1)

where there is a clear, unambiguous boundary point 6 such that if x is greater than this number, then x belongs to the set A; otherwise x does not belong to this set. In contrast to a classical set, a fuzzy set, as the name implies, is a set without a crisp boundary. That is, the transition from "belonging to a set" to "not belonging to a set" is gradual, and this smooth transition is characterized by membership functions that give fuzzy sets flexibility in modeling commonly used linguistic expressions, such as "the water is hot" or "the temperature is high." As Zadeh pointed out in 1965 in his seminal paper entitled "Fuzzy Sets" [116], such imprecisely defined sets or classes "play an important role in human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction." Note that the fuzziness does not come from the randomness of the constituent members of the sets, but from the uncertain and imprecise nature of abstract thoughts and concepts.

Definition 1: Fuzzy Sets and Membership Functions. If X is a collection of objects denoted generically by x, then a fuzzy set A in X is defined as a set of ordered pairs:

A = {(x, μ_A(x)) | x ∈ X}.  (2)

μ_A(x) is called the membership function (MF for short) of x in A. The MF maps each element of X to a continuous membership value (or membership grade) between 0 and 1.

Obviously the definition of a fuzzy set is a simple extension of the definition of a classical set in which the characteristic function is permitted to have continuous values between 0 and 1. If the value of the membership function μ_A(x) is restricted to either 0 or 1, then A is reduced to a classical set and μ_A(x) is the characteristic function of A.

Usually X is referred to as the “universe of discourse,” or simply the “universe,” and it may contain either discrete objects or continuous values. Two examples are given below.

Example 1: Fuzzy Sets with Discrete X. Let X = {1, 2, 3, 4, 5, 6, 7, 8} be the set of numbers of courses a student may take in a semester. Then the fuzzy set A = "appropriate number of courses taken" may be described as follows:

A = {(1, 0.1), (2, 0.3), (3, 0.8), (4, 1.0), (5, 0.9), (6, 0.5), (7, 0.2), (8, 0.1)}.

This fuzzy set is shown in Fig. 1(a).

Example 2: Fuzzy Sets with Continuous X. Let X = R⁺ be the set of possible ages for human beings. Then the fuzzy set B = "about 50 years old" may be expressed as

B = {(x, μ_B(x)) | x ∈ X}, where μ_B(x) = 1 / (1 + ((x − 50)/10)⁴).

This is illustrated in Fig. 1(b). An alternative way of denoting a fuzzy set A is

A = Σ_{xᵢ ∈ X} μ_A(xᵢ)/xᵢ, if X is discrete, or
A = ∫_X μ_A(x)/x, if X is continuous.  (3)

The summation and integration signs in (3) stand for the union of (x, μ_A(x)) pairs; they do not indicate summation or integration. Similarly, "/" is only a marker and does not imply division. Using this notation, we can rewrite the fuzzy sets in Examples 1 and 2 as

A = 0.1/1 + 0.3/2 + 0.8/3 + 1.0/4 + 0.9/5 + 0.5/6 + 0.2/7 + 0.1/8,

and

B = ∫_{R⁺} [1 / (1 + ((x − 50)/10)⁴)] / x,

respectively. From Examples 1 and 2, we see that the construction of

a fuzzy set depends on two things: the identification of a suitable universe of discourse and the specification of an appropriate membership function. It should be noted that the specification of membership functions is quite subjective, which means the membership functions specified for the same concept (say, "cold") by different persons may vary considerably. This subjectivity comes from the indefinite nature of abstract concepts and has nothing to do with randomness. Therefore the subjectivity and nonrandomness of fuzzy sets are the primary difference between the study of fuzzy sets and probability theory, which deals with the objective treatment of random phenomena.
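To make the two examples concrete, here is a minimal Python sketch (the variable names are ours, and the MF is the one given in Example 2) representing a fuzzy set on a discrete universe as a dictionary of (element, grade) pairs and a fuzzy set on a continuous universe as a function:

```python
# Example 1 as a dict mapping each element of X to its membership grade.
A = {1: 0.1, 2: 0.3, 3: 0.8, 4: 1.0, 5: 0.9, 6: 0.5, 7: 0.2, 8: 0.1}

# Example 2 as a membership function on a continuous universe:
# mu_B(x) = 1 / (1 + ((x - 50) / 10)**4), i.e., "about 50 years old."
def mu_B(x: float) -> float:
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 4)

print(A[4])        # 1.0: taking four courses is fully "appropriate"
print(mu_B(50.0))  # 1.0 at the center of "about 50 years old"
print(mu_B(70.0))  # a small grade far away from 50
```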

Corresponding to the ordinary set operations of union, intersection, and complement, fuzzy sets have similar oper- ations, which were initially defined in Zadeh’s paper [116]. Before introducing these three fuzzy set operations, first we will define the notion of containment, which plays a central role in both ordinary and fuzzy sets. This definition of containment is, of course, a natural extension of the case for ordinary sets.


Fig. 2. Operations on fuzzy sets: (a) two fuzzy sets A and B; (b) ¬A ("NOT A"); (c) A ∪ B ("A OR B"); (d) A ∩ B ("A AND B").

Definition 2: Containment or Subset. Fuzzy set A is contained in fuzzy set B (or, equivalently, A is a subset of B, or A is smaller than or equal to B) if and only if μ_A(x) ≤ μ_B(x) for all x. In symbols,

A ⊆ B ⟺ μ_A(x) ≤ μ_B(x).  (4)

Definition 3: Union (disjunction). The union of two fuzzy sets A and B is a fuzzy set C, written as C = A ∪ B or C = A OR B, whose MF is related to those of A and B by

μ_C(x) = max(μ_A(x), μ_B(x)) = μ_A(x) ∨ μ_B(x).  (5)

As pointed out by Zadeh [116], a more intuitive and appealing definition of union is the smallest fuzzy set containing both A and B. Alternatively, if D is any fuzzy set that contains both A and B, then it also contains A ∪ B. The intersection of fuzzy sets can be defined analogously.

Definition 4: Intersection (conjunction). The intersection of two fuzzy sets A and B is a fuzzy set C, written as C = A ∩ B or C = A AND B, whose MF is related to those of A and B by

μ_C(x) = min(μ_A(x), μ_B(x)) = μ_A(x) ∧ μ_B(x).  (6)

As in the case of the union, it is obvious that the

intersection of A and B is the largest fuzzy set which is contained in both A and B. This reduces to the ordinary intersection operation if both A and B are nonfuzzy.

Definition 5: Complement (negation). The complement of fuzzy set A, denoted by Ā (¬A, NOT A), is defined as

μ_Ā(x) = 1 − μ_A(x).  (7)

Fig. 2 demonstrates these three basic operations: (a) illustrates two fuzzy sets A and B; (b) is the complement of A; (c) is the union of A and B; (d) is the intersection of A and B.

Note that other consistent definitions for fuzzy AND and OR have been proposed in the literature under the names "T-norm" and "T-conorm" operators [16], respectively. Except for min and max, none of these operators satisfies the law of distributivity:

μ_{A∪(B∩C)}(x) = μ_{(A∪B)∩(A∪C)}(x),
μ_{A∩(B∪C)}(x) = μ_{(A∩B)∪(A∩C)}(x).

However, min and max do incur some difficulties in analyzing fuzzy inference systems. A popular alternative is to use the probabilistic AND and OR:

μ_{A∩B}(x) = μ_A(x) μ_B(x),
μ_{A∪B}(x) = μ_A(x) + μ_B(x) − μ_A(x) μ_B(x).
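As an illustration, the following sketch (the sampled MFs are of our own choosing) applies the min/max operators of (5) and (6), the complement of (7), and their probabilistic counterparts to two fuzzy sets sampled on a common universe:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 11)
mu_A = np.exp(-(((x - 3.0) / 2.0) ** 2))   # two illustrative fuzzy sets
mu_B = np.exp(-(((x - 7.0) / 2.0) ** 2))

union_max = np.maximum(mu_A, mu_B)          # (5): fuzzy OR via max
inter_min = np.minimum(mu_A, mu_B)          # (6): fuzzy AND via min
complement_A = 1.0 - mu_A                   # (7): fuzzy NOT

inter_prob = mu_A * mu_B                    # probabilistic AND
union_prob = mu_A + mu_B - mu_A * mu_B      # probabilistic OR

print(inter_min.max(), inter_prob.max())    # product gives a lower "peak"
```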

In the following, we shall give several classes of parameterized functions commonly used to define MF's. These parameterized MF's play an important role in adaptive fuzzy inference systems.

Definition 6: Triangular MF's. A triangular MF is specified by three parameters {a, b, c}, which determine the x coordinates of the three corners:

triangle(x; a, b, c) = max(min((x − a)/(b − a), (c − x)/(c − b)), 0).  (8)

Fig. 3(a) illustrates an example of the triangular MF defined by triangle(x; 20, 60, 80).

Definition 7: Trapezoidal MF's. A trapezoidal MF is specified by four parameters {a, b, c, d} as follows:

trapezoid(x; a, b, c, d) = max(min((x − a)/(b − a), 1, (d − x)/(d − c)), 0).  (9)

Fig. 3(b) illustrates an example of a trapezoidal MF defined by trapezoid(x; 10, 20, 60, 95). Obviously, the triangular MF is a special case of the trapezoidal MF.

Due to their simple formulas and computational efficiency, both triangular MF's and trapezoidal MF's have been used extensively, especially in real-time implementations. However, since these MF's are composed of straight line segments, they are not smooth at the switching points specified by the parameters. In the following we introduce other types of MF's defined by smooth and nonlinear functions.

Definition 8: Gaussian MF's. A Gaussian MF is specified by two parameters {σ, c}:

gaussian(x; σ, c) = exp(−((x − c)/σ)²)  (10)

where c represents the MF's center and σ determines the MF's width. Fig. 3(c) plots a Gaussian MF defined by gaussian(x; 20, 50).


Fig. 3. Examples of various classes of MF's: (a) triangle(x; 20, 60, 80); (b) trapezoid(x; 10, 20, 60, 95); (c) gaussian(x; 20, 50); (d) bell(x; 20, 4, 50).

Definition 9: Generalized Bell MF's. A generalized bell MF (or bell MF) is specified by three parameters {a, b, c}:

bell(x; a, b, c) = 1 / (1 + |(x − c)/a|^{2b})  (11)

where the parameter b is usually positive. Note that this MF is a direct generalization of the Cauchy distribution used in probability theory. Fig. 3(d) illustrates a generalized bell MF defined by bell(x; 20, 4, 50).

A desired generalized bell MF can be obtained by a proper selection of the parameter set {a, b, c}. Specifically, we can adjust c and a to vary the center and width of the MF, and then use b to control the slopes at the crossover points. Fig. 4 shows the physical meaning of each parameter in a bell MF.

Definition 10: Sigmoidal MF's. A sigmoidal MF is defined by

sigmoid(x; a, c) = 1 / (1 + exp[−a(x − c)])  (12)

where a controls the slope at the crossover point x = c.

Depending on the sign of the parameter a, a sigmoidal MF is inherently open right or left and thus is appropriate for representing concepts such as "very large" or "very negative." Sigmoidal functions of this kind are employed widely as the activation function of artificial neural networks. Therefore, for a neural network to simulate the behavior of a fuzzy inference system, the first problem we face is how to synthesize a closed MF through sigmoidal functions. There are two simple ways to achieve this: one is

Fig. 4. Physical meaning of parameters in a generalized bell function.

to take the product of two sigmoidal MF’s; the other is to take the absolute difference of two sigmoidal MF’s.

It should be noted that the list of MF's introduced in this section is by no means exhaustive; other specialized MF's can be created for specific applications if necessary. In particular, any type of continuous probability distribution function can be used as an MF here, provided that a set of parameters is given to specify the appropriate meaning of the MF.
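The parameterized MF's of Definitions 6-10 translate directly into code. A minimal NumPy sketch (the function names are ours, mirroring (8)-(12)):

```python
import numpy as np

def triangle(x, a, b, c):            # (8): corners at a, b, c
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):        # (9): shoulders between b and c
    return np.maximum(
        np.minimum(np.minimum((x - a) / (b - a), 1.0), (d - x) / (d - c)), 0.0)

def gaussian(x, sigma, c):           # (10): center c, width sigma
    return np.exp(-(((x - c) / sigma) ** 2))

def bell(x, a, b, c):                # (11): generalized bell MF
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))

def sigmoid(x, a, c):                # (12): slope a at crossover c
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

x = np.linspace(0.0, 100.0, 101)
print(triangle(x, 20, 60, 80)[60])   # 1.0 at the peak x = 60
# A closed MF synthesized from two sigmoids, e.g. by their product:
closed_mf = sigmoid(x, 0.5, 30.0) * sigmoid(x, -0.5, 70.0)
```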

B. Fuzzy If-Then Rules

A fuzzy if-then rule (fuzzy rule, fuzzy implication, or fuzzy conditional statement) assumes the form

if x is A then y is B  (13)

where A and B are linguistic values defined by fuzzy sets on universes of discourse X and Y , respectively. Often “x is A” is called the antecedent or premise while “y is B” is called the consequence or conclusion. Examples of fuzzy if-then rules are widespread in our daily linguistic expressions, such as the following:

If pressure is high then volume is small. If the road is slippery then driving is dangerous. If a tomato is red then it is ripe. If the speed is high then apply the brake a little.

Before we can employ fuzzy if-then rules to model and analyze a system, we first have to formalize what is meant by the expression "if x is A then y is B," which is sometimes abbreviated as A → B. In essence, the expression describes a relation between two variables x and y; this suggests that a fuzzy if-then rule be defined as a binary fuzzy relation R on the product space X × Y. Note that a binary fuzzy relation R is an extension of the classical Cartesian product, where each element (x, y) ∈ X × Y is associated with a membership grade denoted by μ_R(x, y). Alternatively, a binary fuzzy relation R can be viewed as a fuzzy set with universe X × Y, and this fuzzy set is characterized by a two-dimensional MF μ_R(x, y).

Generally speaking, there are two ways to interpret the fuzzy rule A → B. If we interpret A → B as "A coupled with B," then

R = A → B = A × B = ∫_{X×Y} μ_A(x) ∗ μ_B(y) / (x, y)


Fig. 5. Two interpretations of fuzzy implication: (a) A coupled with B; (b) A entails B.

where ∗ is a fuzzy AND (or, more generally, T-norm) operator and A → B is used again to represent the fuzzy relation R. On the other hand, if A → B is interpreted as "A entails B," then it can be written as four different formulas:

Material implication: R = A → B = ¬A ∪ B.
Propositional calculus: R = A → B = ¬A ∪ (A ∩ B).
Extended propositional calculus: R = A → B = (¬A ∩ ¬B) ∪ B.
Generalization of modus ponens: μ_R(x, y) = sup{c | μ_A(x) ∗ c ≤ μ_B(y) and 0 ≤ c ≤ 1}, where R = A → B and ∗ is a T-norm operator.

Though these four formulas are different in appearance, they all reduce to the familiar identity A → B ≡ ¬A ∪ B when A and B are propositions in the sense of two-valued logic. Fig. 5 illustrates these two interpretations of a fuzzy rule A → B. Here we shall adopt the first interpretation, where A → B implies "A coupled with B." The treatment of the second interpretation can be found in [35], [50], [51].
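Under the first interpretation, the relation R is simply the pointwise T-norm of the two MF's over the product space. A small sketch (sampled MFs of our choosing, min as the T-norm):

```python
import numpy as np

mu_A = np.array([0.0, 0.5, 1.0, 0.5, 0.0])   # MF of A sampled on X
mu_B = np.array([0.2, 1.0, 0.2])             # MF of B sampled on Y

# "A coupled with B": mu_R(x, y) = min(mu_A(x), mu_B(y)),
# stored as an |X| x |Y| matrix via broadcasting.
R = np.minimum(mu_A[:, None], mu_B[None, :])
print(R.shape)   # (5, 3)
```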

C. Fuzzy Reasoning (Approximate Reasoning)

Fuzzy reasoning (also known as approximate reasoning) is an inference procedure used to derive conclusions from a set of fuzzy if-then rules and one or more conditions. Before introducing fuzzy reasoning, we shall discuss the compositional rule of inference [117], which is the essential rationale behind fuzzy reasoning.

The compositional rule of inference is a generalization of the following familiar notion. Suppose that we have a curve y = f(x) that regulates the relation between x and y. When we are given x = a, then from y = f(x) we can infer that y = b = f(a); see Fig. 6(a). A generalization of the above process would allow a to be an interval and f(x) to be an interval-valued function, as shown in Fig. 6(b). To find the resulting interval y = b corresponding to the interval x = a, we first construct a cylindrical extension of a (that is, extend the domain of a from X to X × Y) and then find its intersection I with the interval-valued curve. The projection of I onto the y-axis yields the interval y = b.

Going one step further in our generalization, we assume that A is a fuzzy set of X and F is a fuzzy relation on X x Y, as shown in Fig. 7(a) and (b). To find the resulting fuzzy set B , again, we construct a cylindrical extension c(A) with base A (that is, we expand the domain of A from X to X x Y to get c(A)). The intersection of c(A) and F (Fig. 7(c)) forms the analog of the region of intersection I

Fig. 6. Derivation of y = b from x = a and y = f(x): (a) a and b are points, y = f(x) is a curve; (b) a and b are intervals, y = f(x) is an interval-valued function.

in Fig. 6(b). By projecting c(A) ∩ F onto the y-axis, we infer y as a fuzzy set B on the y-axis, as shown in Fig. 7(d). Specifically, let μ_A, μ_{c(A)}, μ_B, and μ_F be the MF's of A, c(A), B, and F, respectively, where μ_{c(A)} is related to μ_A through

μ_{c(A)}(x, y) = μ_A(x).

Then

μ_B(y) = max_x min[μ_{c(A)}(x, y), μ_F(x, y)] = ∨_x [μ_A(x) ∧ μ_F(x, y)].

This formula is referred to as max-min composition and B is represented as

B = A ∘ F

where ∘ denotes the composition operator. If we choose product for fuzzy AND and max for fuzzy OR, then we have max-product composition and μ_B(y) is equal to ∨_x [μ_A(x) μ_F(x, y)].
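Numerically, both compositions become one line of array code once A and F are sampled on grids; a sketch with illustrative data:

```python
import numpy as np

mu_A = np.array([0.0, 0.4, 1.0, 0.4, 0.0])      # fuzzy set A sampled on X
mu_C = np.array([0.3, 1.0, 0.3])                # used to build F below
F = np.minimum(mu_A[:, None], mu_C[None, :])    # an illustrative relation on X x Y

# Max-min composition: mu_B(y) = max_x min(mu_A(x), mu_F(x, y)).
mu_B_maxmin = np.max(np.minimum(mu_A[:, None], F), axis=0)

# Max-product composition: mu_B(y) = max_x mu_A(x) * mu_F(x, y).
mu_B_maxprod = np.max(mu_A[:, None] * F, axis=0)

print(mu_B_maxmin, mu_B_maxprod)
```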

Using the compositional rule of inference, we can formalize an inference procedure, called fuzzy reasoning, upon a set of fuzzy if-then rules. The basic rule of inference in traditional two-valued logic is modus ponens, according to which we can infer the truth of a proposition B from the truth of A and the implication A → B. For instance, if A is identified with "the tomato is red" and B with "the tomato is ripe," then if it is true that "the tomato is red," it is also true that "the tomato is ripe." This concept is illustrated below.

premise 1 (fact): x is A,
premise 2 (rule): if x is A then y is B,
consequence (conclusion): y is B.

However, in much of human reasoning, modus ponens is employed in an approximate manner. For example, if we have the same implication rule “if the tomato is red then it is ripe” and we know that “the tomato is more or less


Fig. 7. Compositional rule of inference: (a) cylindrical extension of A; (b) fuzzy relation F on x and y; (c) min of (a) and (b); (d) projection of (c) onto the y-axis.

ripe," we can still infer that "the tomato is more or less ripe." This is written as

premise 1 (fact): x is A',
premise 2 (rule): if x is A then y is B,
consequence (conclusion): y is B'

where A' is close to A and B' is close to B. When A, B, A', and B' are fuzzy sets of appropriate universes, the above inference procedure is called fuzzy reasoning or approximate reasoning; it is also called generalized modus ponens, since it has modus ponens as a special case.

Using the compositional rule of inference introduced earlier, we can formulate the inference procedure of fuzzy reasoning as the following definition.

Definition 11: Fuzzy Reasoning Based on Max-Min Composition. Let A, A', and B be fuzzy sets of X, X, and Y, respectively. Assume that the fuzzy implication A → B is expressed as a fuzzy relation R on X × Y. Then the fuzzy set B' induced by "x is A'" and the fuzzy rule "if x is A then y is B" is defined by

μ_{B'}(y) = max_x min[μ_{A'}(x), μ_R(x, y)] = ∨_x [μ_{A'}(x) ∧ μ_R(x, y)]  (14)

or, equivalently,

B' = A' ∘ R = A' ∘ (A → B).  (15)

Remember that (15) is a general expression for fuzzy reasoning, while (14) is an instance of fuzzy reasoning where min and max are the operators for fuzzy AND and OR, respectively.

Now we can use the inference procedure of the generalized modus ponens to derive conclusions, provided that the fuzzy implication A → B is defined as an appropriate binary fuzzy relation.

1) Single Rule with Single Antecedent: For a single rule with a single antecedent, the formula is available in (14). A further simplification of the equation yields

μ_{B'}(y) = [∨_x (μ_{A'}(x) ∧ μ_A(x))] ∧ μ_B(y) = w ∧ μ_B(y).

In other words, first we find the degree of match w as the maximum of μ_{A'}(x) ∧ μ_A(x) (the shaded area in the antecedent part of Fig. 8); then the MF of the resulting B' is equal to the MF of B clipped by w, shown as the shaded area in the consequent part of Fig. 8.

Fig. 8. Fuzzy reasoning for a single rule with a single antecedent.
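The clipping operation above is easy to verify numerically; a sketch with Gaussian MFs of our own choosing:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 201)                 # universe of the antecedent
y = np.linspace(0.0, 10.0, 201)                 # universe of the consequent
mu_A  = np.exp(-(((x - 4.0) / 1.5) ** 2))       # rule antecedent A
mu_Ap = np.exp(-(((x - 5.5) / 1.5) ** 2))       # the fact A' ("x is A'")
mu_B  = np.exp(-(((y - 6.0) / 1.5) ** 2))       # rule consequent B

# Degree of match: w = max_x [mu_A'(x) AND mu_A(x)].
w = np.max(np.minimum(mu_Ap, mu_A))

# Resulting B' is B clipped by w.
mu_Bp = np.minimum(w, mu_B)
print(w, mu_Bp.max())   # both < 1 because A' is shifted away from A
```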



Fig. 9. Approximate reasoning for multiple antecedents.

2) Single Rule with Multiple Antecedents: A fuzzy if-then rule with two antecedents is usually written as "if x is A and y is B then z is C." The corresponding problem for approximate reasoning is expressed as

premise 1 (fact): x is A' and y is B'
premise 2 (rule): if x is A and y is B then z is C
consequence (conclusion): z is C'

The fuzzy rule in premise 2 above can be put into the simpler form "A × B → C." Intuitively, this fuzzy rule can be transformed into a ternary fuzzy relation R, which is specified by the following MF:

μ_R(x, y, z) = μ_{(A×B)→C}(x, y, z) = μ_A(x) ∧ μ_B(y) ∧ μ_C(z).  (16)

And the resulting C' is expressed as

C' = (A' × B') ∘ (A × B → C).

Thus

μ_{C'}(z) = ∨_{x,y} [μ_{A'}(x) ∧ μ_{B'}(y)] ∧ [μ_A(x) ∧ μ_B(y) ∧ μ_C(z)]
          = {∨_x [μ_{A'}(x) ∧ μ_A(x)]} ∧ {∨_y [μ_{B'}(y) ∧ μ_B(y)]} ∧ μ_C(z)
          = w1 ∧ w2 ∧ μ_C(z)

where w1 is the degree of match between A and A'; w2 is the degree of match between B and B'; and w1 ∧ w2 is called the firing strength or degree of fulfillment of this fuzzy rule. A graphic interpretation is shown in Fig. 9, where the MF of the resulting C' is equal to the MF of C clipped by the firing strength w = w1 ∧ w2. The generalization to more than two antecedents is straightforward.

3) Multiple Rules with Multiple Antecedents: The interpretation of multiple rules is usually taken as the union of the fuzzy relations corresponding to the fuzzy rules. For instance, given the following fact and rules:


Fig. 10. Fuzzy reasoning for multiple rules with multiple antecedents.

premise 1 (fact): x is A' and y is B'
premise 2 (rule 1): if x is A1 and y is B1 then z is C1
premise 3 (rule 2): if x is A2 and y is B2 then z is C2
consequence (conclusion): z is C'

we can employ the fuzzy reasoning shown in Fig. 10 as an inference procedure to derive the resulting output fuzzy set C’.

To verify this inference procedure, let R1 = A1 × B1 → C1 and R2 = A2 × B2 → C2. Since the max-min composition operator ∘ is distributive over the ∪ operator, it follows that

C' = (A' × B') ∘ (R1 ∪ R2)
   = [(A' × B') ∘ R1] ∪ [(A' × B') ∘ R2]
   = C'1 ∪ C'2  (17)

where C'1 and C'2 are the inferred fuzzy sets for rules 1 and 2, respectively. Fig. 10 shows graphically the operation of fuzzy reasoning for multiple rules with multiple antecedents.
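Equation (17) says we may fire each rule separately and take the union of the clipped consequents. A compact sketch of this procedure for the two-rule case (all MFs illustrative):

```python
import numpy as np

gauss = lambda u, c, s: np.exp(-(((u - c) / s) ** 2))
x = np.linspace(0, 10, 101); y = np.linspace(0, 10, 101); z = np.linspace(0, 10, 101)

mu_Ap, mu_Bp = gauss(x, 3, 1.5), gauss(y, 6, 1.5)   # the facts A' and B'
rules = [  # (mu_A, mu_B, mu_C) for rules 1 and 2
    (gauss(x, 2, 1.5), gauss(y, 5, 1.5), gauss(z, 3, 1.5)),
    (gauss(x, 5, 1.5), gauss(y, 7, 1.5), gauss(z, 8, 1.5)),
]

def match(mu_fact, mu_ante):
    """Degree of match: max over the universe of the pointwise min."""
    return np.max(np.minimum(mu_fact, mu_ante))

mu_Cp = np.zeros_like(z)
for mu_A, mu_B, mu_C in rules:
    w = min(match(mu_Ap, mu_A), match(mu_Bp, mu_B))  # firing strength w1 ^ w2
    mu_Cp = np.maximum(mu_Cp, np.minimum(w, mu_C))   # union of clipped C's
```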

When a given fuzzy rule assumes the form "if x is A or y is B then z is C," then the firing strength is given as the maximum degree of match on the antecedent part for a given condition. This fuzzy rule is equivalent to the union of the two fuzzy rules "if x is A then z is C" and "if y is B then z is C" if and only if the max-min composition is adopted.

D. Fuzzy Models (Fuzzy Inference Systems)

The fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. It has been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems, and computer vision. Because of its multi-disciplinary nature, the fuzzy inference system is known by a number of names, such


Fig. 11. The Mamdani fuzzy inference system using min and max for fuzzy AND and OR operators, respectively.

as "fuzzy-rule-based system," "fuzzy expert system" [38], "fuzzy model" [89], [96], "fuzzy associative memory" [48], "fuzzy logic controller" [50], [51], [61], and simply (and ambiguously) "fuzzy system."

The basic structure of a fuzzy inference system consists of three conceptual components: a rule base, which contains a selection of fuzzy rules, a database or dictionary, which defines the membership functions used in the fuzzy rules, and a reasoning mechanism, which performs the inference procedure (usually the fuzzy reasoning introduced earlier) upon the rules and a given condition to derive a reasonable output or conclusion.

In what follows, we will first introduce the three most commonly used types of fuzzy inference systems. Then we will introduce three ways of partitioning the input space for any type of fuzzy inference system. Last, we will briefly address the features and problems of fuzzy modeling, which is concerned with the construction of a fuzzy inference system for modeling a specific target system.

1) Mamdani Fuzzy Model: The Mamdani fuzzy model [61] was proposed as the very first attempt to control a steam engine and boiler combination by a set of linguistic control rules obtained from experienced human operators. Fig. 11 is an illustration of how a two-rule fuzzy inference system of the Mamdani type derives the overall output z when subjected to two crisp inputs x and y.

If we adopt product and max as our choice for the fuzzy AND and OR operators, respectively, and use max-product composition instead of the original max-min composition, then the resulting fuzzy reasoning is shown in Fig. 12, where the inferred output of each rule is a fuzzy set scaled down by its firing strength via the algebraic product. Though this type of fuzzy reasoning was not employed in Mamdani's original paper, it has often been used in the literature. Other variations are possible if we have different choices of fuzzy AND (T-norm) and OR (T-conorm) operators.

Fig. 12. The Mamdani fuzzy inference system using product and max for fuzzy AND and OR operators, respectively.

In Mamdani's application [61], two fuzzy inference systems were used as two controllers to generate the heat input to the boiler and the throttle opening of the engine cylinder, respectively, in order to regulate the steam pressure in the boiler and the speed of the engine. Since the plant takes only crisp values as inputs, we have to use a defuzzifier to convert a fuzzy set to a crisp value. Defuzzification refers to the way a crisp value is extracted from a fuzzy set as a representative value. The most frequently used defuzzification strategy is the centroid of area, which is defined as

z_COA = ∫_Z μ_{C'}(z) z dz / ∫_Z μ_{C'}(z) dz  (18)

where μ_{C'}(z) is the aggregated output MF. This formula is reminiscent of the calculation of expected values in probability distributions. Other defuzzification strategies arise for specific applications, including bisector of area, mean of maximum, largest of maximum, and smallest of maximum. Fig. 13 demonstrates these defuzzification strategies. Generally speaking, these defuzzification methods are computation intensive and there is no rigorous way to analyze them except through experiment-based studies. Other more flexible defuzzification methods can be found in [72], [79], [113].
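Centroid defuzzification (18) is a ratio of two integrals, which can be approximated by any numerical quadrature; a sketch using the trapezoidal rule on an illustrative aggregated MF:

```python
import numpy as np

z = np.linspace(0.0, 10.0, 1001)
# An illustrative aggregated output MF: the union of two clipped bumps.
mu_Cp = np.maximum(np.minimum(0.7, np.exp(-(((z - 3.0) / 1.5) ** 2))),
                   np.minimum(0.4, np.exp(-(((z - 7.0) / 1.5) ** 2))))

# Centroid of area (18) by numerical integration:
z_coa = np.trapz(mu_Cp * z, z) / np.trapz(mu_Cp, z)

# Mean of maximum, for comparison:
z_mom = z[np.isclose(mu_Cp, mu_Cp.max())].mean()
print(z_coa, z_mom)
```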

Both Figs. 11 and 12 conform to the fuzzy reasoning defined previously. In practice, however, a fuzzy inference system may have certain reasoning mechanisms that do not follow the strict definition of the compositional rule of inference. For instance, one might use either min or product for computing firing strengths and/or qualified rule outputs. Another variation is to use pointwise summation (sum) instead of max in the standard fuzzy reasoning, though sum is not really a fuzzy OR operator. An advantage of this sum-product composition [48] is that the final crisp output


Fig. 13. Various defuzzification schemes for obtaining a crisp output: centroid of area, bisector of area, mean of maximum, smallest of maximum, and largest of maximum.

via centroid defuzzification is equal to the weighted average of each rule’s crisp output, where the weighting factor for a rule is equal to its firing strength multiplied by the area of the rule’s output MF, and the crisp output of a rule is equal to the centroid defuzzified value of its output MF. This reduces the computation burden if we can obtain the area and the centroid of each output MF in advance.

2) Sugeno Fuzzy Model: The Sugeno fuzzy model (also known as the TSK fuzzy model) was proposed by Takagi, Sugeno, and Kang [89], [96] in an effort to develop a systematic approach to generating fuzzy rules from a given input-output data set. A typical fuzzy rule in a Sugeno fuzzy model has the form

if x is A and y is B then z = f(x, y)

where A and B are fuzzy sets in the antecedent, while z = f(x, y) is a crisp function in the consequent. Usually f(x, y) is a polynomial in the input variables x and y, but it can be any function as long as it can appropriately describe the output of the system within the fuzzy region specified by the antecedent of the rule. When f(x, y) is a first-order polynomial, the resulting fuzzy inference system is called a first-order Sugeno fuzzy model, which was originally proposed in [89], [96]. When f is a constant, we then have a zero-order Sugeno fuzzy model, which can be viewed either as a special case of the Mamdani fuzzy inference system, in which each rule's consequent is specified by a fuzzy singleton (or a predefuzzified consequent), or as a special case of the Tsukamoto fuzzy model (to be introduced later), in which each rule's consequent is specified by an MF of a step function crossing at the constant. Moreover, a zero-order Sugeno fuzzy model is functionally equivalent to a radial basis function network under certain minor constraints [33].

It should be pointed out that the output of a zero-order Sugeno model is a smooth function of its input variables as long as the neighboring MF’s in the premise have enough overlap. In other words, the overlap of MF’s in the consequent does not have a decisive effect on the smoothness of the interpolation; it is the overlap of the MF’s in the premise that determines the smoothness of the resulting input-output behavior.

Fig. 14 shows the fuzzy reasoning procedure for a first-order Sugeno fuzzy model. Since each rule has a crisp output, the overall output is obtained via weighted average and thus the time-consuming procedure of defuzzification is avoided. In practice, sometimes the weighted average

Fig. 14. The Sugeno fuzzy model; the overall output is the weighted average z = (w1 z1 + w2 z2)/(w1 + w2).

Fig. 15. The Tsukamoto fuzzy model.

operator is replaced with the weighted sum operator (that is, z = w1 z1 + w2 z2 in Fig. 14) in order to further reduce computation load, especially in training a fuzzy inference system. However, this simplification could lead to the loss of MF linguistic meanings unless the sum of the firing strengths (that is, Σᵢ wᵢ) is close to unity.
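A minimal sketch of a two-rule, first-order Sugeno model (all parameters are illustrative, not taken from the paper), showing the weighted-average output of Fig. 14:

```python
import numpy as np

def sugeno(x, y):
    """A two-rule, first-order Sugeno fuzzy model with illustrative parameters."""
    gauss = lambda u, c, s: np.exp(-(((u - c) / s) ** 2))
    # Firing strengths via product as the fuzzy AND:
    w1 = gauss(x, 3.0, 2.0) * gauss(y, 2.0, 2.0)
    w2 = gauss(x, 8.0, 2.0) * gauss(y, 9.0, 2.0)
    # Crisp rule outputs z_i = f_i(x, y), first-order polynomials:
    z1 = 1.0 * x + 2.0 * y + 3.0
    z2 = 2.0 * x - 1.0 * y + 1.0
    # Weighted average; no defuzzification step is needed:
    return (w1 * z1 + w2 * z2) / (w1 + w2)

print(sugeno(3.0, 2.0))   # ~10.0: rule 1 dominates near its fuzzy region
```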

3) Tsukamoto Fuzzy Model: In the Tsukamoto fuzzy model [99], the consequent of each fuzzy if-then rule is represented by a fuzzy set with a monotonic MF, as shown in Fig. 15. As a result, the inferred output of each rule is defined as a crisp value induced by the rule's firing strength. The overall output is taken as the weighted average of each rule's output. Fig. 15 illustrates the whole reasoning procedure for a two-input two-rule system.

Since each rule infers a crisp output, the Tsukamoto fuzzy model aggregates each rule’s output by the method of weighted average and thus also avoids the time-consuming process of defuzzification.

4) Partition Styles for Fuzzy Models: By now it should be clear that the spirit of fuzzy inference systems resembles that of "divide and conquer": the antecedents of fuzzy rules partition the input space into a number of local fuzzy regions, while the consequents describe the behavior within a given region via various constituents. The consequent constituent could be an output MF (Mamdani and Tsukamoto fuzzy models), a constant (zero-order Sugeno model), or a linear equation (first-order Sugeno model). Different consequent constituents result in different fuzzy inference systems, but their antecedents are always the


Fig. 16. Various methods for partitioning the input space: (a) grid partition; (b) tree partition; (c) scatter partition.

same. Therefore the following discussion of methods of partitioning input spaces to form the antecedents of fuzzy rules is applicable to all three types of fuzzy inference systems.

Grid Partition: Fig. 16(a) illustrates a typical grid partition in a two-dimensional input space. This partition method is often chosen in designing a fuzzy controller, which usually involves only several state variables as the inputs to the controller. This partition strategy needs only a small number of MF's for each input. However, it encounters problems when we have a moderately large number of inputs. For instance, a fuzzy model with 10 inputs and two MF's on each input would result in 2^10 = 1024 fuzzy if-then rules, which is prohibitively large. This problem, usually referred to as the curse of dimensionality, can be alleviated by the other partition strategies introduced below (a small counting sketch follows this list).

Tree Partition: Fig. 16(b) shows a typical tree partition, in which each region can be uniquely specified along a corresponding decision tree. The tree partition relieves the problem of an exponential increase in the number of rules. However, more MF's for each input are needed to define these fuzzy regions, and these MF's do not usually bear clear linguistic meanings such as "small," "big," and so on.

Scatter Partition: As shown in Fig. 16(c), by covering a subset of the whole input space that characterizes a region of possible occurrence of the input vectors, the scatter partition can also limit the number of rules to a reasonable amount.
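The rule-count explosion under a grid partition is easy to see by enumeration; a tiny sketch (the input count and MF labels are hypothetical):

```python
from itertools import product

# Grid partition: one rule for every combination of one MF per input.
n_inputs, n_mfs = 10, 2
labels = [f"MF{j + 1}" for j in range(n_mfs)]
rule_antecedents = list(product(labels, repeat=n_inputs))
print(len(rule_antecedents))   # 2**10 = 1024 rules: the curse of dimensionality
```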

5) Neuro-Fuzzy Modeling: The process of constructing a fuzzy inference system is usually called "fuzzy modeling," which has the following features:

Due to the rule structure of a fuzzy inference system, it is easy to incorporate human expertise about the target system directly into the modeling process. Namely, fuzzy modeling takes advantage of domain knowledge that might not be easily or directly employed in other modeling approaches.

When the input-output data of a system to be modeled is available, conventional system identification techniques can be used for fuzzy modeling. In other words, the use of numerical data also plays an important role in fuzzy modeling, just as in other mathematical modeling methods.

A common practice is to use domain knowledge for structure determination (that is, determining relevant inputs, the number of MF's for each input, the number of rules, the type of fuzzy model, and so on) and numerical data for parameter identification (that is, identifying the values of the parameters that yield the best performance). In particular, the term neuro-fuzzy modeling refers to applying various learning techniques developed in the neural network literature to fuzzy inference systems. In the subsequent sections, we will apply the concept of the adaptive network, which is a generalization of the common back-propagation neural network, to tackle the parameter identification problem in a fuzzy inference system.

III. ADAPTIVE NETWORKS

This section describes the architectures and learning procedures of adaptive networks, which are a superset of all kinds of neural network paradigms with supervised learning capability. In particular, we shall address two of the most popular network paradigms adopted in the neural network literature: the back-propagation neural network (BPNN) and the radial basis function network (RBFN). Other network paradigms that can be interpreted as a set of fuzzy if-then rules are described in the next section.

A. Architecture

As the name implies, an adaptive network (Fig. 17) is a network structure whose overall input-output behavior is determined by the values of a collection of modifiable parameters. More specifically, the configuration of an adaptive network is composed of a set of nodes connected through directed links, where each node is a process unit that performs a static node function on its incoming signals to generate a single node output and each link specifies the


Fig. 17. A feedforward adaptive network in layered representation.

direction of signal flow from one node to another. Usually a node function is a parameterized function with modifiable parameters; by changing these parameters, we are actually changing the node function as well as the overall behavior of the adaptive network.

In the most general case, an adaptive network is heterogeneous and each node may have a different node function. Also remember that each link in an adaptive network is merely used to specify the propagation direction of a node's output; generally there are no weights or parameters associated with links. Fig. 17 shows a typical adaptive network with two inputs and two outputs.

The parameters of an adaptive network are distributed into the network's nodes, so each node has a local parameter set. The union of these local parameter sets is the network's overall parameter set. If a node's parameter set is nonempty, then its node function depends on the parameter values; we use a square to represent this kind of adaptive node. On the other hand, if a node has an empty parameter set, then its function is fixed; we use a circle to denote this type of fixed node. Adaptive networks are generally classified into two categories on the basis of the type of connections they have: feedforward and recurrent types. The adaptive network shown in Fig. 17 is a feedforward network, since the output of each node propagates from the input side (left) to the output side (right) unanimously. If there is a feedback link that forms a circular path in a network, then the network is a recurrent network; Fig. 18 is an example. (From the viewpoint of graph theory, a feedforward network is represented by an acyclic directed graph which contains no directed cycles, while a recurrent network always contains at least one directed cycle.)

In the layered representation of the feedforward adaptive network in Fig. 17, there are no links between nodes in the same layer, and outputs of nodes in a specific layer are always connected to nodes in succeeding layers. This representation is usually preferred because of its modularity, in that nodes in the same layer have the same functionality or generate the same level of abstraction about input vectors.

Another representation of feedforward networks is the topological ordering representation, which labels the nodes in an ordered sequence 1, 2, 3, ..., such that there are no links from node i to node j whenever i ≥ j. Fig. 19

Fig. 18. A recurrent adaptive network.

is the topological ordering representation of the network in Fig. 17. This representation is less modular than the layered representation, but it facilitates the formulation of the learning rule, as will be seen in the next section. (Note that the topological ordering representation is in fact a special case of the layered representation, with one node per layer.)

Conceptually, a feedforward adaptive network is actually a static mapping between its input and output spaces; this mapping may be either a simple linear relationship or a highly nonlinear one, depending on the structure (node arrangement and connections, and so on) of the network and the function of each node. Here our aim is to construct a network for achieving a desired nonlinear mapping that is regulated by a data set consisting of a number of desired input-output pairs of a target system. This data set is usually called the training data set, and the procedure we follow in adjusting the parameters to improve the performance of the network is often referred to as the learning rule or learning algorithm. Usually an adaptive network's performance is measured as the discrepancy between the desired output and the network's output under the same input conditions. This discrepancy is called the error measure, and it can assume different forms for different applications. Generally speaking, a learning rule is derived by applying a specific optimization technique to a given error measure.

Before introducing a basic learning algorithm for adap- tive networks, we shall present several examples of adaptive networks.

Example 3: An Adaptive Network with a Single Linear Node: Fig. 20 is an adaptive network with a single node specified by

x3 = f3(x1, x2; a1, a2, a3) = a1 x1 + a2 x2 + a3

where x1 and x2 are inputs and a1, a2, and a3 are modifiable parameters. Obviously this function defines a plane in x1-x2-x3 space, and by setting appropriate values for the parameters, we can place this plane arbitrarily. By adopting the squared error as the error measure for this network, we can identify the optimal parameters via the linear least-squares estimation method.
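For Example 3, identifying {a1, a2, a3} from input-output data is an ordinary linear least-squares problem; a sketch with synthetic data of our own making:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))     # training inputs (x1, x2)
a_true = np.array([2.0, -1.0, 0.5])           # "unknown" (a1, a2, a3)
x3 = X @ a_true[:2] + a_true[2]               # desired outputs of the node

# Append a column of ones to absorb the constant a3, then solve:
A = np.hstack([X, np.ones((len(X), 1))])
a_hat, *_ = np.linalg.lstsq(A, x3, rcond=None)
print(a_hat)   # recovers approximately (2.0, -1.0, 0.5)
```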


Fig. 19. A feedforward adaptive network in topological ordering representation.

Fig. 20. A linear single-node adaptive network.

Fig. 21. A nonlinear single-node adaptive network.

Example 4: A Building Block for the Perceptron or the Back-Propagation Neural Network: If we add another node to let the output of the adaptive network in Fig. 20 have only two values 0 and 1, then the nonlinear network shown in Fig. 21 is obtained. Specifically, the node outputs are expressed as

x3 = f3(x1, x2; a1, a2, a3) = a1 x1 + a2 x2 + a3

and

x4 = f4(x3) = 1 if x3 ≥ 0, and 0 if x3 < 0

where f3 is a linearly parameterized function and f4 is a step function which maps x3 to either 0 or 1. The overall function of this network can be viewed as a linear classifier: the first node forms a decision boundary as a straight line in x1-x2 space, and the second node indicates which half plane the input vector (x1, x2) resides in. Obviously we can form an equivalent network with a single node whose function is the composition of f3 and f4; the resulting node is the building block of the classical perceptron.

Since the step function is discontinuous at one point and flat at all the other points, it is not suitable for learning procedures based on gradient descent. One way to get around this difficulty is to use the sigmoid function

x4 = f4(x3) = 1 / (1 + exp(−x3))

which is a continuous and differentiable approximation to the step function. The composition of f3 and this differentiable f4 is the building block for the back-propagation neural network in the following example.

Fig. 22. A 3-3-2 neural network (layer 0 is the input layer, layer 1 the hidden layer, and layer 2 the output layer).

Example 5: A Back-Propagation Neural Network: Fig. 22 is a typical architecture for a back-propagation neural network with three inputs, two outputs, and three hidden nodes that do not connect directly to either inputs or outputs. (The term back-propagation refers to the way the learning procedure is performed, that is, by propagating gradient information from the network's outputs to its inputs; details on this are to be introduced next.) Each node in a network of this kind has the same node function, which is the composition of a linear f3 and a sigmoidal f4 in Example 4. For instance, the node function of node 7 in Fig. 22 is

x7 = 1 / (1 + exp[−(w_{4,7} x4 + w_{5,7} x5 + w_{6,7} x6 − t7)])

where x4, x5, and x6 are outputs from nodes 4, 5, and 6, respectively, and {w_{4,7}, w_{5,7}, w_{6,7}, t7} is the parameter set. Usually we view w_{i,j} as the weight associated with the link connecting nodes i and j and t_j as the threshold associated with node j. However, it should be noted that this weight-link association is only valid in this type of network. In general, a link only indicates the signal flow direction and the causal relationship between connected nodes, as will be shown in other types of adaptive networks in the subsequent development. A more detailed discussion about the structure and learning rules of artificial neural networks will be presented later.


Fig. 23. Our notational conventions: (a) layered representation; (b) topological ordering representation.

B. Back-Propagation Learning Rule

The central part of a learning rule for an adaptive network concerns how to recursively obtain a gradient vector in which each element is defined as the derivative of an error measure with respect to a parameter. This is done by means of the chain rule, and the method is generally referred to as the "back-propagation learning rule" because the gradient vector is calculated in the direction opposite to the flow of the output of each node. Details follow below.

Suppose that a given feedforward adaptive network in the layered representation has L layers and layer l (l = 0, 1, ..., L; l = 0 represents the input layer) has N(l) nodes. Then the output and function of node i (i = 1, ..., N(l)) of layer l can be represented as x_{l,i} and f_{l,i}, respectively, as shown in Fig. 23(a). Without loss of generality, we assume there are no jumping links, that is, links connecting nonconsecutive layers. Since the output of a node depends on the incoming signals and the parameter set of the node, we have the following general expression for the node function f_{l,i}:

x_{l,i} = f_{l,i}(x_{l−1,1}, ..., x_{l−1,N(l−1)}, α, β, γ, ...)  (19)

where α, β, γ, etc., are the parameters pertaining to this node.

Assuming the given training data set has P entries, we can define an error measure for the pth (1 ≤ p ≤ P) entry of the training data as the sum of squared errors:

E_p = Σ_{k=1}^{N(L)} (d_k − x_{L,k})²  (20)

where d_k is the kth component of the pth desired output vector and x_{L,k} is the kth component of the actual output vector produced by presenting the pth input vector to the network. (For notational simplicity, we omit the subscript p for both d_k and x_{L,k}.) Obviously, when E_p is equal to zero, the network is able to reproduce exactly the desired output vector in the pth training data pair. Thus our task here is to minimize an overall error measure, which is defined as

E = Σ_{p=1}^{P} E_p.

Remember that the definition of E_p in (20) is not universal; other definitions of E_p are possible for specific situations or applications. Therefore we shall avoid using an explicit expression for the error measure E_p in order to emphasize generality. In addition, we assume that E_p depends on the output nodes only; more general situations will be discussed below.

To use the gradient method to minimize the error measure, first we have to obtain the gradient vector. Before calculating the gradient vector, we should observe the following causal relationships:

change in a parameter α ⟹ change in the output of the node containing α ⟹ change in the output of the final layer ⟹ change in the error measure

where the arrows ⟹ indicate causal relationships. In other words, a small change in a parameter α will affect the output of the node containing α; this in turn will affect the output of the final layer and thus the error measure. Therefore the basic concept in calculating the gradient vector of the parameters is to pass a form of derivative information starting from the output layer and going backward layer by layer until the input layer is reached.

Fig. 24. Ordered derivatives and ordinary partial derivatives (see text for details).

To facilitate the discussion, we define the error signal ε_{l,i} as the derivative of the error measure E_p with respect to the output of node i in layer l, taking both direct and indirect paths into consideration. In symbols,

\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}}.

This expression was called the "ordered derivative" by Werbos [107]. The difference between the ordered derivative and the ordinary partial derivative lies in the way we view the function to be differentiated. For an internal node output x_{l,i} (where l ≠ L), the ordinary partial derivative ∂E_p/∂x_{l,i} is equal to zero, since E_p does not depend on x_{l,i} directly. However, it is obvious that E_p does depend on x_{l,i} indirectly, since a change in x_{l,i} will propagate through indirect paths to the output layer and thus produce a corresponding change in the value of E_p. Therefore the ordered derivative ∂⁺E_p/∂x_{l,i} can be viewed as the ratio of these two changes when they are made infinitesimal. The following example demonstrates the difference between the ordered derivative and the ordinary partial derivative.

Example 6: Ordered Derivatives and Ordinary Partial Derivatives: Consider the simple adaptive network shown in Fig. 24, where z is a function of x and y, and y is in turn a function of x:

y = f(x),    z = g(x, y).

For the ordinary partial derivative ∂z/∂x, we assume that all the other input variables (in this case, y) are constant:

\frac{\partial z}{\partial x} = \frac{\partial g(x, y)}{\partial x}.

In other words, we assume the direct inputs x and y are independent, without paying attention to the fact that y is actually a function of x. For the ordered derivative, we take this indirect causal relationship into consideration:

\frac{\partial^{+} z}{\partial x} = \frac{\partial g(x, f(x))}{\partial x} = \frac{\partial g(x, y)}{\partial x} + \frac{\partial g(x, y)}{\partial y} \frac{d f(x)}{d x}.

Therefore the ordered derivative takes into consideration both the direct and indirect paths that lead to the causal relationship. □

The error signal for the ith output node (at layer L) can be calculated directly:

\epsilon_{L,i} = \frac{\partial^{+} E_p}{\partial x_{L,i}} = \frac{\partial E_p}{\partial x_{L,i}}.    (22)

This is equal to ε_{L,i} = -2(d_i - x_{L,i}) if E_p is defined as in (20). For the internal (nonoutput) node at the ith position of layer l, the error signal can be derived by the chain rule:

\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}} = \sum_{m=1}^{N(l+1)} \epsilon_{l+1,m} \frac{\partial f_{l+1,m}}{\partial x_{l,i}}    (23)

where 0 ≤ l ≤ L - 1. That is, the error signal of an internal node at layer l can be expressed as a linear combination of the error signals of the nodes at layer l + 1. Therefore for any l and i (0 ≤ l ≤ L and 1 ≤ i ≤ N(l)), we can find ε_{l,i} = ∂⁺E_p/∂x_{l,i} by first applying (22) once to get the error signals at the output layer, and then applying (23) iteratively until we reach the desired layer l. Since the error signals are obtained sequentially from the output layer back to the input layer, this learning paradigm is called the "back-propagation" learning rule by Rumelhart, Hinton, and Williams [78].

The gradient vector is defined as the derivative of the error measure with respect to each parameter, so we have to apply the chain rule again to find the gradient vector. If α is a parameter of the ith node at layer l, we have

\frac{\partial^{+} E_p}{\partial \alpha} = \frac{\partial^{+} E_p}{\partial x_{l,i}} \frac{\partial f_{l,i}}{\partial \alpha} = \epsilon_{l,i} \frac{\partial f_{l,i}}{\partial \alpha}.    (24)

Note that if we allow the parameter α to be shared between different nodes, then (24) should be changed to a more general form:

\frac{\partial^{+} E_p}{\partial \alpha} = \sum_{x^{*} \in S} \frac{\partial^{+} E_p}{\partial x^{*}} \frac{\partial f^{*}}{\partial \alpha}    (25)

where S is the set of nodes containing α as a parameter and f* is the node function for calculating x*.

The derivative of the overall error measure E with respect to α is

\frac{\partial^{+} E}{\partial \alpha} = \sum_{p=1}^{P} \frac{\partial^{+} E_p}{\partial \alpha}.    (26)

Accordingly, the update formula for the generic parameter α is

\Delta \alpha = -\eta \frac{\partial^{+} E}{\partial \alpha}    (27)

in which η is the learning rate, which can be further expressed as

\eta = \frac{\kappa}{\sqrt{\sum_{\alpha} (\partial E / \partial \alpha)^2}}    (28)



where κ is the step size, the length of each transition along the gradient direction in the parameter space. Usually we can change the step size to vary the speed of convergence; two heuristic rules for updating the value of κ are described in [30].

When an n-node feedforward network is represented in its topological order, we can envision the error measure E_p as the output of an additional node with index n + 1, whose node function f_{n+1} can be defined on the outputs of any nodes with smaller indices; see Fig. 23(b). (Therefore E_p may depend directly on any internal nodes.) Applying the chain rule again, we have the following concise formula for calculating the error signal ε_i = ∂⁺E_p/∂x_i:

\epsilon_i = \frac{\partial^{+} E_p}{\partial x_i} = \frac{\partial f_{n+1}}{\partial x_i} + \sum_{j=i+1}^{n} \frac{\partial f_j}{\partial x_i} \frac{\partial^{+} E_p}{\partial x_j}    (29)

or

\epsilon_i = \frac{\partial f_{n+1}}{\partial x_i} + \sum_{j=i+1}^{n} \frac{\partial f_j}{\partial x_i} \epsilon_j    (30)

where the first term shows the direct effect of x_i on E_p via the direct path from node i to node n + 1, and each product term in the summation indicates the indirect effect of x_i on E_p. Once we find the error signal for each node, the gradient vector for the parameters is derived as before.
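For illustration, the following Python sketch computes error signals by the recursion in (30) for a tiny network given in topological order; the network, its node functions, and their hand-coded partial derivatives are illustrative inventions, not taken from the paper.

import numpy as np

# Each internal node j is a pair (f_j, grad_j), where grad_j(x) returns the
# partial derivatives of f_j with respect to every earlier node output.
def forward(nodes, x_in):
    x = list(x_in)
    for f, _ in nodes:
        x.append(f(x))          # evaluate nodes in topological order
    return x

def error_signals(nodes, n_inputs, x):
    """eps_i = df_{n+1}/dx_i + sum_{j>i} (df_j/dx_i) * eps_j, computed backward."""
    eps = [0.0] * len(x)
    eps[-1] = 1.0               # the last node plays the role of E_p itself
    for j in reversed(range(n_inputs, len(x))):
        grad = nodes[j - n_inputs][1](x)     # partials of f_j w.r.t. x_0..x_{j-1}
        for i in range(j):
            eps[i] += grad[i] * eps[j]
    return eps

# Example: x2 = sigmoid(x0 + 2*x1), E_p = (1 - x2)^2.
sig = lambda s: 1.0 / (1.0 + np.exp(-s))
nodes = [
    (lambda x: sig(x[0] + 2 * x[1]),
     lambda x: [sig(x[0] + 2*x[1]) * (1 - sig(x[0] + 2*x[1])),
                2 * sig(x[0] + 2*x[1]) * (1 - sig(x[0] + 2*x[1]))]),
    (lambda x: (1.0 - x[2]) ** 2,
     lambda x: [0.0, 0.0, -2.0 * (1.0 - x[2])]),
]
x = forward(nodes, [0.5, -0.3])
print(error_signals(nodes, 2, x))   # ordered derivatives w.r.t. each node output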

Another simple and systematic way to calculate the error signals is through the representation of the error-propagation network (or sensitivity model), which is obtained from the original adaptive network by reversing the links and supplying the error signals at the output layer as inputs. The reader is referred to [22] or [35] for a complete coverage.

Depending on the applications we are interested in, two types of learning paradigms for adaptive networks are available to suit our needs. In off-line learning (or batch learning), the update formula for a parameter α is based on (26), and the update action takes place only after the whole training data set has been presented, that is, only after each epoch or sweep. On the other hand, in on-line learning (or pattern learning), the parameters are updated immediately after each input-output pair has been presented, and the update formula is based on (24). In practice, it is possible to combine these two learning modes and update the parameters after k training data entries have been presented, where k is between 1 and P; k is sometimes referred to as the epoch size.

For a recurrent adaptive network that operates synchronously, we can transform it into an equivalent feedforward network by a simple technique called "unfolding of time" [78]. When applied to an unfolded network, the back-propagation learning algorithm is often referred to as "back-propagation through time." An improved on-line version that is less memory-intensive, called "real-time recurrent learning" [112], is also available in the literature. Due to space limitations, the reader is referred to [22] or [35] for a general coverage of applying back-propagation to recurrent networks.

C. Hybrid Learning Rule: Combining BP and LSE

It is observed that if an adaptive network's output (assuming only one) or its transformation is linear in some of the network's parameters, then we can identify these linear parameters by the well-known linear least-squares method. This observation leads to a hybrid learning rule [25], [30] which combines the gradient method and the least-squares estimator (LSE) for fast identification of parameters.

1) Off-Line Learning (Batch Learning): For simplicity, assume that the adaptive network under consideration has only one output:

output = F(\vec{I}, S)    (31)

where \vec{I} is the vector of input variables and S is the set of parameters. If there exists a function H such that the composite function H ∘ F is linear in some of the elements of S, then these elements can be identified by the least-squares method. More formally, if the parameter set S can be decomposed into two sets

S = S_1 \oplus S_2    (32)

(where ⊕ represents direct sum) such that H ∘ F is linear in the elements of S_2, then upon applying H to (31), we have

H(output) = H \circ F(\vec{I}, S)    (33)

which is linear in the elements of S_2. Now given values of elements of S_1, we can plug P training data into (33) and obtain a matrix equation:

A X = B    (34)

where X is an unknown vector whose elements are parameters in S_2. This equation represents the standard linear least-squares problem, and the best solution for X, which minimizes ||AX - B||^2, is the least-squares estimator (LSE) X*:

X^{*} = (A^T A)^{-1} A^T B    (35)

where A^T is the transpose of A and (A^T A)^{-1} A^T is the pseudo-inverse of A if A^T A is nonsingular. Of course, we can also employ the recursive LSE formula [1], [24], [59]. Specifically, let the ith row vector of the matrix A defined in (34) be a_i^T and the ith element of B be b_i; then X can be calculated iteratively as follows:

X_{i+1} = X_i + S_{i+1} a_{i+1} (b_{i+1} - a_{i+1}^T X_i)
S_{i+1} = S_i - \frac{S_i a_{i+1} a_{i+1}^T S_i}{1 + a_{i+1}^T S_i a_{i+1}},    i = 0, 1, \ldots, P - 1    (36)

where the least-squares estimator X* is equal to X_P. The initial conditions needed to bootstrap (36) are X_0 = 0 and



S_0 = γI, where γ is a positive large number and I is the identity matrix of dimension M × M. When we are dealing with multi-output adaptive networks (the output in (31) is a column vector), (36) still applies except that b_i^T is the ith row of matrix B.
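A minimal Python sketch of the recursive formulas in (36), under the assumption that the rows of A and the elements of B arrive one at a time; the bootstrap constant gamma and the test problem are illustrative.

import numpy as np

def recursive_lse(A, b, gamma=1e6):
    """Sequentially compute X minimizing ||A X - b||^2, as in (36)."""
    P, M = A.shape
    X = np.zeros(M)
    S = gamma * np.eye(M)                        # S_0 = gamma * I
    for i in range(P):
        a = A[i]                                 # i-th row vector of A
        Sa = S @ a
        S = S - np.outer(Sa, Sa) / (1.0 + a @ Sa)    # covariance update
        X = X + S @ a * (b[i] - a @ X)           # gain times prediction error
    return X

# Sanity check against the pseudo-inverse formula (35).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=50)
print(recursive_lse(A, b))                       # close to [1, -2, 0.5]
print(np.linalg.pinv(A) @ b)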

Now we can combine the gradient method and the least-squares estimator to update the parameters in an adaptive network. For hybrid learning to be applied in batch mode, each epoch is composed of a forward pass and a backward pass. In the forward pass, after an input vector is presented, we calculate the node outputs in the network layer by layer until the corresponding rows of the matrices A and B in (34) are obtained. This process is repeated for all the training data entries to form the complete A and B; then the parameters in S_2 are identified by either the pseudo-inverse formula in (35) or the recursive least-squares formulas in (36). After the parameters in S_2 are identified, we can compute the error measure for each training data entry. In the backward pass, the error signals (the derivative of the error measure with respect to each node output; see (22) and (23)) propagate from the output end toward the input end, and the gradient vector is accumulated for each training data entry. At the end of the backward pass for all the training data, the parameters in S_1 are updated by the gradient method in (27).

For given fixed values of the parameters in S_1, the parameters in S_2 thus found are guaranteed to be the global optimum point in the S_2 parameter space because of the choice of the squared error measure. Not only can this hybrid learning rule decrease the dimension of the search space in the gradient method, but, in general, it will also substantially reduce the time needed to reach convergence.

It should be kept in mind that by using the least-squares method on the data transformed by H(·), the obtained parameters are optimal in terms of the transformed squared error measure instead of the original one. In practice, this usually does not cause a problem as long as H(·) is monotonically increasing and the training data are not too noisy. A more detailed treatment of this transformation method can be found in [35].

2) On-Line Learning (Pattern Learning): If the parameters are updated after each data presentation, we have an on-line learning or pattern learning scheme. This learning strategy is vital to on-line parameter identification for systems with changing characteristics. To modify the batch learning rule to obtain an on-line version, it is obvious that the gradient descent should be based on E_p (see (24)) instead of E. Strictly speaking, this is not truly a gradient search procedure for minimizing E, yet it will approximate one if the learning rate is small.

For the recursive least-squares formula to account for the time-varying characteristics of the incoming data, the effects of old data pairs must decay as new data pairs become available. Again, this problem is well studied in the adaptive control and system identification literature and a number of solutions are available [20]. One simple method is to formulate the squared error measure as a weighted version that gives higher weighting factors to more recent

data pairs. This amounts to the addition of a forgetting factor λ to the original recursive formula:

X_{i+1} = X_i + S_{i+1} a_{i+1} (b_{i+1} - a_{i+1}^T X_i)
S_{i+1} = \frac{1}{\lambda} \left[ S_i - \frac{S_i a_{i+1} a_{i+1}^T S_i}{\lambda + a_{i+1}^T S_i a_{i+1}} \right]    (37)

where the typical value of λ in practice is between 0.9 and 1. The smaller λ is, the faster the effects of old data decay. A small λ sometimes causes numerical instability, however, and thus should be avoided. For a complete discussion and derivation of (37), the reader is referred to [20], [35], [59].
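A sketch of the same recursion with the forgetting factor of (37) folded in; lam plays the role of λ and its value is illustrative.

import numpy as np

def recursive_lse_forgetting(A, b, lam=0.98, gamma=1e6):
    """Recursive LSE with exponential forgetting, as in (37)."""
    P, M = A.shape
    X, S = np.zeros(M), gamma * np.eye(M)
    for i in range(P):
        a = A[i]
        Sa = S @ a
        S = (S - np.outer(Sa, Sa) / (lam + a @ Sa)) / lam   # decay old data
        X = X + S @ a * (b[i] - a @ X)
    return X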

3) Different Ways of Combining GD and LSE: The computational complexity of the least-squares estimator (LSE) is usually higher than that of the gradient descent (GD) method for one-step adaptation. However, for achieving a prescribed performance level, the LSE is usually much faster. Consequently, depending on the available computing resources and the required level of performance, we can choose from among at least five types of hybrid learning rules combining GD and LSE in different degrees, as follows:

• One pass of LSE only: Nonlinear parameters are fixed while linear parameters are identified by one-time application of LSE.
• GD only: All parameters are updated by GD iteratively.
• One pass of LSE followed by GD: LSE is employed only once at the very beginning to obtain the initial values of the linear parameters; then GD takes over to update all parameters iteratively.
• GD and LSE: This is the proposed hybrid learning rule, where each iteration (epoch) of GD used to update the nonlinear parameters is followed by LSE to identify the linear parameters.
• Sequential (approximate) LSE only: The outputs of an adaptive network are linearized with respect to its parameters, and then the extended Kalman filter algorithm [21] is employed to update all parameters. This method has been proposed in the neural network literature [82]-[84].

The choice among the above methods should be based on a tradeoff between computational complexity and performance. Moreover, the whole concept of fitting data to parameterized models is called regression in the statistics literature, and there are a number of other techniques for either linear or nonlinear regression, such as the Gauss-Newton method (linearization method) and the Marquardt procedure [62]. These methods can be found in advanced textbooks on regression, and they are also viable techniques for finding optimal parameters in adaptive networks.

D. Neural Networks as Special Cases of Adaptive Networks

Some special cases of adaptive networks have been explored extensively in the neural network literature. In particular, we will introduce two types of neural networks: the back-propagation neural network (BPNN) and the radial basis function network (RBFN). Other types of adaptive networks that can be interpreted as a set of fuzzy if-then rules are investigated in the next section.




Fig. 25. Activation functions for BPNN’s: (a) step function; (b) sigmoid function; (c) hyper-tangent function; (d) identity function.


1) Back-Propagation Neural Networks (BPNN's): A back-propagation neural network (BPNN), as already mentioned in Examples 4 and 5, is an adaptive network whose nodes (called "neurons") perform the same function on incoming signals; this node function is usually a composite of the weighted sum and a nonlinear function called the "activation function" or "transfer function." Usually the activation functions are of either a sigmoidal or a hyper-tangent type, which approximates the step function (or hard limiter) and yet provides differentiability with respect to input signals. Fig. 25 depicts the four different types of activation functions f(x) defined below.

Step function:

f(x) = 1 if x > 0;    f(x) = 0 if x < 0.

Sigmoid function:

f(x) = \frac{1}{1 + e^{-x}}.

Hyper-tangent function:

f(x) = \tanh(x/2) = \frac{1 - e^{-x}}{1 + e^{-x}}.

Identity function:

f(x) = x.
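For reference, the four activation functions above can be written out directly in Python; NumPy is assumed only for vectorized evaluation.

import numpy as np

def step(x):
    return np.where(x > 0, 1.0, 0.0)     # hard limiter

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # smooth, ranges over (0, 1)

def hyper_tangent(x):
    return np.tanh(x / 2.0)              # equals (1 - e^-x) / (1 + e^-x)

def identity(x):
    return x                             # used by linear output nodes

x = np.linspace(-5, 5, 5)
for f in (step, sigmoid, hyper_tangent, identity):
    print(f.__name__, f(x))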

When the step function (hard limiter) is used as the activation function for a layered network, the network is often called a "perceptron" [69], [77], as explained in Example 4. For a neural network to approximate a continuous-valued function not necessarily limited to the interval [0, 1] or [-1, 1], we usually let the node function for the output layer be a weighted sum with no limiting-type activation function. This is equivalent to the situation where the activation function is the identity function, and output nodes of this type are often called linear nodes.

Fig. 26. A BPNN node

For simplicity, we assume the BPNN in question uses the sigmoid function as its activation function. The net input \bar{x}_j of a node j is defined as the weighted sum of the incoming signals plus a threshold. For instance, the net input and output of node j in Fig. 26 (where j = 4) are

\bar{x}_4 = w_{1,4} x_1 + w_{2,4} x_2 + w_{3,4} x_3 + t_4,
x_4 = f(\bar{x}_4) = \frac{1}{1 + \exp(-\bar{x}_4)}    (38)

where x_i is the output of node i located in the previous layer, w_{i,j} is the weight associated with the link connecting nodes i and j, and t_j is the threshold of node j. Since the weights w_{i,j} are actually internal parameters associated with each node j, changing the weights of a node will alter the behavior of the node and in turn alter the behavior of the whole BPNN. Fig. 22 shows a two-layer BPNN with three inputs in the input layer, three neurons in the hidden layer, and two output neurons in the output layer. For simplicity, this BPNN will be referred to as a 3-3-2 structure, corresponding to the number of nodes in each layer. (Note that the input layer is composed of three buffer nodes for distributing the input signals; therefore this layer is conventionally not counted as a physical layer of the BPNN.)

BPNN’s are by far the most commonly used NN structure for applications in a wide range of areas, such as speech recognition, optical character recognition (OCR), signal processing, data compression, and automatic control.

2) The Radial Basis Function Networks (RBFN's): The locally tuned and overlapping receptive field is a well-known structure that has been studied in regions of the cerebral cortex, the visual cortex, and so forth. Drawing on the knowledge of biological receptive fields, Moody and Darken [65], [66] proposed a network structure that employs local receptive fields to perform function mappings. Similar schemes have been proposed by Powell [73], Broomhead and Lowe [7], and many others in the areas of interpolation and approximation theory; these schemes are collectively called radial basis function approximations. Here we shall call this network structure the radial basis function network or RBFN. Fig. 27 shows a schematic diagram of an RBFN with five receptive field units; the activation level of the ith receptive field unit (or hidden unit) is

w_i = R_i(\vec{x}),    i = 1, 2, \ldots, H    (39)


where \vec{x} is a multi-dimensional input vector, \vec{c}_i is a vector with the same dimension as \vec{x}, H is the number of radial basis functions (or, equivalently, receptive field units), and R_i(·) is the ith radial basis function with a single maximum at the origin. Typically, R_i(·) is chosen as a Gaussian function

R_i(\vec{x}) = \exp\left( -\frac{\| \vec{x} - \vec{c}_i \|^2}{2 \sigma_i^2} \right)    (40)

or as a logistic function

R_i(\vec{x}) = \frac{1}{1 + \exp( \| \vec{x} - \vec{c}_i \|^2 / \sigma_i^2 )}.    (41)

Thus the activation level of the radial basis function w_i computed by the ith hidden unit is maximum when the input vector \vec{x} is at the center \vec{c}_i of that unit.

Fig. 27. A radial basis function network (RBFN).

The output of a radial basis function network can be computed in two ways. In the simpler method, as shown in Fig. 27, the final output is the weighted sum of the output value associated with each receptive field:

f(\vec{x}) = \sum_{i=1}^{H} f_i w_i = \sum_{i=1}^{H} f_i R_i(\vec{x})    (42)

where f_i is the output value associated with the ith receptive field. A more complicated method for calculating the overall output is to take the weighted average of the output associated with each receptive field:

f(\vec{x}) = \frac{\sum_{i=1}^{H} f_i w_i}{\sum_{i=1}^{H} w_i} = \frac{\sum_{i=1}^{H} f_i R_i(\vec{x})}{\sum_{i=1}^{H} R_i(\vec{x})}.    (43)

This mode of calculation, though it has a higher computational complexity, possesses the advantage that points in the overlapping area of two receptive fields will have a well-interpolated output value between the output values of the two receptive fields. For representation purposes, if we replace the radial basis function R_i(\vec{x}) in each node of layer 2 in Fig. 27 by its normalized counterpart R_i(\vec{x}) / \sum_j R_j(\vec{x}), then the overall output is specified by (43).
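A sketch of an RBFN forward pass with Gaussian receptive fields, showing both the weighted sum (42) and the weighted average (43); the centers, widths, and output values below are illustrative.

import numpy as np

def rbfn(x, c, sigma, f, average=False):
    """x: input vector; c: (H, d) centers; sigma, f: length-H vectors."""
    r2 = np.sum((c - x) ** 2, axis=1)          # squared distances to centers
    w = np.exp(-r2 / (2.0 * sigma ** 2))       # Gaussian activations R_i(x)
    if average:
        return np.sum(f * w) / np.sum(w)       # (43): smooth interpolation
    return np.sum(f * w)                       # (42): plain weighted sum

c = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
sigma = np.array([0.5, 0.5, 0.5])
f = np.array([1.0, 2.0, 3.0])
x = np.array([0.9, 0.8])
print(rbfn(x, c, sigma, f), rbfn(x, c, sigma, f, average=True))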

Several learning algorithms have been proposed to identify the parameters (\vec{c}_i, σ_i, and f_i) of an RBFN. Note that the RBFN is an ideal example of the hybrid learning described in the previous section, where the linear parameters are the f_i and the nonlinear parameters are the \vec{c}_i and σ_i. In practice, the \vec{c}_i are usually found by means of vector quantization or clustering techniques (which assume that similar input vectors produce similar outputs) and the σ_i are obtained heuristically (such as by taking the average distance to the first several nearest neighbors of \vec{c}_i). Once these nonlinear parameters are fixed, the linear parameters can be found by either the least-squares method or the gradient method. Chen et al. [8] used an alternative method that employs the orthogonal least-squares algorithm to determine the \vec{c}_i and f_i while keeping the σ_i at a predetermined constant.

An extension of Moody-Darken's RBFN is to assign a linear function as the output function of each receptive field; that is, f_i is a linear function of the input variables instead of a constant:

f_i = \vec{a}_i \cdot \vec{x} + b_i    (44)

where \vec{a}_i is a parameter vector and b_i is a scalar parameter. Stokbro et al. [87] used this structure to model the Mackey-Glass chaotic time series [60] and found that this extended version performed better than the original RBFN with the same number of fitting parameters.

It was pointed out by the authors that, under certain constraints, the RBFN is functionally equivalent to the zero-order Sugeno fuzzy model. See [33] or [35] for details.

IV. ANFIS: ADAPTIVE NEURO-FUZZY INFERENCE SYSTEMS

A class of adaptive networks that act as a fundamental framework for adaptive fuzzy inference systems is introduced in this section. This type of network is referred to as "ANFIS" [25], [26], [30], which stands for Adaptive-Network-based Fuzzy Inference System or, semantically equivalently, Adaptive Neuro-Fuzzy Inference System. We will describe primarily the ANFIS architecture and its learning algorithm for the Sugeno fuzzy model, with an application example of chaotic time series prediction.

(Note that similar network structures were also proposed independently by Lin and Lee [57] and by Wang and Mendel [104].)

A. ANFIS Architecture

For simplicity, we assume the fuzzy inference system under consideration has two inputs x and y and one output z. For a first-order Sugeno fuzzy model [89], [96], a typical rule set with two fuzzy if-then rules can be expressed as

Rule 1: If x is A_1 and y is B_1, then f_1 = p_1 x + q_1 y + r_1;
Rule 2: If x is A_2 and y is B_2, then f_2 = p_2 x + q_2 y + r_2.

Fig. 28. (a) A two-input first-order Sugeno fuzzy model with two rules; (b) equivalent ANFIS architecture.

Fig. 28(a) illustrates the reasoning mechanism for this Sugeno model. The corresponding equivalent ANFIS architecture is as shown in Fig. 28(b), where nodes of the same

layer have similar functions, as described below. (Here we denote the output of node i in layer l as O_{l,i}.)

Layer 1: Every node i in this layer is an adaptive node with a node output defined by

O_{1,i} = \mu_{A_i}(x),    for i = 1, 2, or
O_{1,i} = \mu_{B_{i-2}}(y),    for i = 3, 4    (45)

where x (or y) is the input to the node and A_i (or B_{i-2}) is a fuzzy set associated with this node. In other words, outputs of this layer are the membership values of the premise part. Here the membership functions for A_i and B_i can be any appropriate parameterized membership functions introduced in Section II. For example, A_i can be characterized by the generalized bell function:

\mu_{A_i}(x) = \frac{1}{1 + |(x - c_i)/a_i|^{2 b_i}}    (46)

where {a_i, b_i, c_i} is the parameter set. Parameters in this layer are referred to as premise parameters.

Layer 2: Every node in this layer is a fixed node labeled Π, which multiplies the incoming signals and outputs the product. For instance,

O_{2,i} = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y),    i = 1, 2.    (47)

Each node output represents the firing strength of a rule. (In fact, any other T-norm operator that performs fuzzy AND can be used as the node function in this layer.)

Layer 3: Every node in this layer is a fixed node labeled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2},    i = 1, 2.    (48)

For convenience, outputs of this layer will be called normalized firing strengths.

Layer 4: Every node i in this layer is an adaptive node with a node function

O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)    (49)

where \bar{w}_i is the output of layer 3 and {p_i, q_i, r_i} is the parameter set. Parameters in this layer will be referred to as consequent parameters.



Fig. 29. Another ANFIS architecture for the two-input two-rule Sugeno fuzzy model.

Layer 5: The single node in this layer is a fixed node labeled Σ, which computes the overall output as the summation of all incoming signals:

O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}.    (50)

Thus we have constructed an adaptive network that has exactly the same function as a Sugeno fuzzy model. Note that the structure of this adaptive network is not unique; we can easily combine layers 3 and 4 to obtain an equivalent network with only four layers. Similarly, we can perform weight normalization at the last layer; Fig. 29 illustrates an ANFIS of this type.
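To make the layer-by-layer computation concrete, the following sketch evaluates the two-rule ANFIS of Fig. 28(b) in Python, using the generalized bell function (46) for the premise part; all parameter values are illustrative, not taken from the paper.

import numpy as np

def bell(x, a, b, c):
    """Generalized bell MF (46): 1 / (1 + |(x - c)/a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership grades for x (A_i) and y (B_i).
    muA = [bell(x, *premise["A"][i]) for i in range(2)]
    muB = [bell(y, *premise["B"][i]) for i in range(2)]
    # Layer 2: firing strengths via the product T-norm.
    w = [muA[i] * muB[i] for i in range(2)]
    # Layer 3: normalized firing strengths.
    wn = [wi / sum(w) for wi in w]
    # Layers 4-5: weighted first-order consequents, then summation.
    f = [p * x + q * y + r for (p, q, r) in consequent]
    return sum(wn_i * f_i for wn_i, f_i in zip(wn, f))

premise = {"A": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)],
           "B": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)]}   # (a_i, b_i, c_i)
consequent = [(1.0, 0.5, 0.0), (-1.0, 2.0, 1.0)]       # (p_i, q_i, r_i)
print(anfis_forward(0.3, -0.2, premise, consequent))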

Fig. 30(a) is an ANFIS architecture that is equivalent to a two-input first-order Sugeno fuzzy model with nine rules, where each input is assumed to have three associated MF’s. Fig. 30(b) illustrates how the 2-D input space is partitioned into nine overlapping fuzzy regions, each of which is governed by fuzzy if-then rules. In other words, the premise part of a rule defines a fuzzy region, while the consequent part specifies the output within this region.

For ANFIS architectures for the Mamdani and Tsukamoto fuzzy models, the reader is referred to [30] and [35] for more details.

B. Hybrid Learning Algorithm

From the ANFIS architecture shown in Fig. 28(b), we observe that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters. In symbols, the output f in Fig. 28(b) can be rewritten as

f 2 101 U’2

f=----- fl + ~

w1 + w 2 W l + W 2

=E1 f l + 732 f2

= @ l X h + (WlY)91 + (W1)Tl

+ ( a 2 X ) P z + (z2y)qz + ( w 2 ) ~ (51)

which is linear in the consequent parameters p l , q1, r1.

pa, 92, and rg. Therefore the hybrid learning algorithm developed in the previous section can be applied directly. More specifically, in the forward pass of the hybrid learning algorithm, node outputs go forward until layer 4 and the consequent parameters are identified by the least-squares method. In the backward pass, the error signals propagate

backward and the premise parameters are updated by gradient descent. Table 1 summarizes the activities in each pass.

Table 1. Two Passes in the Hybrid Learning Procedure for ANFIS

                          Forward Pass               Backward Pass
Premise parameters        Fixed                      Gradient descent
Consequent parameters     Least-squares estimate     Fixed
Signals                   Node outputs               Error signals

As mentioned earlier, the consequent parameters thus identified are optimal under the condition that the premise parameters are fixed. Accordingly, the hybrid approach converges much faster since it reduces the dimension of the search space of the original back-propagation method.
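The forward pass of the hybrid rule can be sketched as follows: with the premise parameters held fixed, each training pair contributes one row (w̄₁x, w̄₁y, w̄₁, w̄₂x, w̄₂y, w̄₂) of the matrix A in (34), and the consequent parameters follow from a least-squares solve; the bell parameters and data below are illustrative.

import numpy as np

def bell(x, a, b, c):
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def consequent_lse(data, premise):
    """Identify (p_i, q_i, r_i) by LSE with premise parameters fixed."""
    rows, targets = [], []
    for x, y, d in data:                       # d: desired output
        w = [bell(x, *premise["A"][i]) * bell(y, *premise["B"][i])
             for i in range(2)]                # rule firing strengths
        wn = [wi / sum(w) for wi in w]         # normalized firing strengths
        rows.append([wn[0]*x, wn[0]*y, wn[0], wn[1]*x, wn[1]*y, wn[1]])
        targets.append(d)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta.reshape(2, 3)                 # rows are (p_i, q_i, r_i)

premise = {"A": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)],
           "B": [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)]}
data = [(x, y, x + 2*y + 1) for x in (-1, 0, 1) for y in (-1, 0, 1)]
print(consequent_lse(data, premise))           # fits the target surface exactly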

If we fix the membership functions and adapt only the consequent part, then ANFIS can be viewed as a functional-link network [47], [70] where the "enhanced representations" of the input variables are obtained via the membership functions. These "enhanced representations," which take advantage of human knowledge, apparently express more insight than the functional expansion and the tensor (outer product) models [70]. By fine-tuning the membership functions, we actually make this "enhanced representation" also adaptive.

From (42), (43), and (50), it is not hard to see the resemblance between the radial basis function network (RBFN) and the ANFIS for the Sugeno model. Actually these two computing frameworks are functionally equivalent under certain minor conditions [33]; this cross-fertilizes both disciplines in many respects.

C. Application to Chaotic Time Series Prediction

ANFIS can be applied to a wide range of areas, such as nonlinear function modeling [25], [30], time series prediction [30], [34], on-line parameter identification for control systems [30], and fuzzy controller design [27], [29]. In particular, GE has been using ANFIS for modeling correction factors in steel rolling mills [6]. Here we briefly report the application of ANFIS to chaotic time series prediction [30], [34].

The time series used in our simulation is generated by the Mackey-Glass differential delay equation [60]:

\dot{x}(t) = \frac{0.2 x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1 x(t).    (52)

The prediction of future values of this time series is a benchmark problem that has been used and reported by a number of connectionist researchers, such as Lapedes and Farber [49], Moody [64], [66], Jones et al. [36], Crowder [76], and Sanger [80]. The simulation results presented here were reported in [30], [34]; more details can be found therein.
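A sketch of generating the series by Euler integration of (52) with τ = 17, then slicing out pairs in the format of (53) below; the step size and the zero initial history are illustrative choices, not taken from the paper.

import numpy as np

def mackey_glass(n, tau=17, dt=0.1, x0=1.2):
    """Euler-integrate (52) up to time n and sample at t = 0, 1, 2, ..."""
    steps, lag = int(n / dt), int(tau / dt)
    x = np.zeros(steps + 1)
    x[0] = x0
    for k in range(steps):
        x_tau = x[k - lag] if k >= lag else 0.0      # zero history before t = 0
        x[k + 1] = x[k] + dt * (0.2 * x_tau / (1 + x_tau ** 10) - 0.1 * x[k])
    return x[:: int(1 / dt)]

x = mackey_glass(1200)
# Input-output pairs as in (53): four past values spaced 6 apart, target x(t+6).
data = [(x[t-18], x[t-12], x[t-6], x[t], x[t+6]) for t in range(118, 1118)]
print(len(data))        # 1000 pairs; the first 500 train, the last 500 check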



Fig. 30. (a) ANFIS architecture for a two-input first-order Sugeno fuzzy model with nine rules; (b) partition of the input space into nine fuzzy regions.


Fig. 31. (a) Mackey-Glass time series from t = 124 to 1123 and six-step ahead prediction (which is indistinguishable from the time series here); (b) prediction error. (Note that the first 500 data points are training data, while the remaining are for validation.)

The goal of the task is to use past values of the time series up to time t to predict the value at some future time t + P. The standard method for this type of prediction is to create a mapping from D points of the time series spaced Δ apart, that is, (x(t - (D - 1)Δ), ..., x(t - Δ), x(t)), to a predicted future value x(t + P). To allow comparison with earlier work (Lapedes and Farber [49], Moody [64], [66], Crowder [76]), the values D = 4 and Δ = P = 6 were used. All other simulation settings were arranged to be as similar as possible to those reported in [76].

From the Mackey-Glass time series x(t), we extracted 1000 input-output data pairs of the following format:

[x(t - 18), x(t - 12), x(t - 6), x(t); x(t + 6)]    (53)

where t = 118 to 1117. The first 500 pairs (the training data set) were used for training ANFIS, while the remaining 500 pairs (the checking data set) were used for validating the identified model. The number of membership functions assigned to each input of the ANFIS was set to two, so the number of rules is 16. The ANFIS used here contains a total of 104 fitting parameters, of which 24 are premise parameters and 80 are consequent parameters.

Table 2. Generalization Result Comparisons for P = 6

Method                      Training Data    NDEI
ANFIS                       500              0.007
AR Model                    500              0.19
Cascade-Correlation NN      500              0.06
Back-Prop NN                500              0.02
6th-order Polynomial        500              0.04
Linear Predictive Method    2000             0.55

Fig. 31 shows the results after about 500 epochs of learning. The desired and predicted values for both training data and checking data are essentially the same in Fig. 31(a); the differences between them can only be seen on a much finer scale, such as that in Fig. 31(b).

Table 2 lists the generalization capabilities of other methods, which were measured by using each method to predict the 500 points immediately following the training set. The last four rows of Table 2 are taken directly from [76]. The nondimensional error index (NDEI) [49], [76] is defined as the root mean square error divided by the standard deviation of the target series. The remarkable generalization capability of ANFIS is attributed to the following facts:

• ANFIS can achieve a highly nonlinear mapping; therefore it is well suited for predicting nonlinear time series.
• The ANFIS used here has 104 adjustable parameters, far fewer than those used in the cascade-correlation NN (693, the median) and the back-prop NN (about 540) listed in Table 2.
• Though not based on a priori knowledge, the initial parameter settings of ANFIS are intuitively reasonable

398 PROCEEDINGS OF THE IEEE. VOL. 83. NO. 3. MARCH 1995



and result in fast convergence to good parameter values that capture the underlying dynamics.
• ANFIS consists of fuzzy rules which are actually local mappings (called local experts in [37]) instead of global ones. These local mappings facilitate the minimal disturbance principle [109], which states that the adaptation should not only reduce the output error for the current training pattern but also minimize disturbance to the responses already learned. This is particularly important in on-line learning. We also found that the use of the least-squares method to determine the output of each local mapping is of particular importance; without LSE, the learning time would be ten times longer.

Other generalization tests and comparisons with neural network approaches can be found in [30].

The original ANFIS C codes and several examples (including this one) can be retrieved via anonymous ftp in user/ai/areas/fuzzy/systems/anfis at ftp.cs.cmu.edu (the CMU Artificial Intelligence Repository).

V. NEURO-FUZZY CONTROL

Once a fuzzy controller is transformed into an adaptive network, the resulting ANFIS can take advantage of all the NN controller design techniques proposed in the literature. In this section we introduce common design techniques for ANFIS controllers. Most of these methodologies are derived directly from their counterparts for NN controllers; however, certain design techniques apply exclusively to ANFIS, and these will be pointed out explicitly.

Fig. 32. Block diagram for a continuous-time feedback control system.

As shown in Fig. 32, the block diagram of a typical feedback control system consists of a plant block and a controller block. The plant block is usually represented by a set of differential equations that describe the physical system to be controlled. These equations govern the behavior of the plant state x(t), which is assumed to be accessible in our discussion. In contrast, the controller block is usually a static function denoted by g; it maps the plant state x(t) into a control action u(t) that can hopefully achieve a given control objective. Thus for a general time-invariant control system, we have the following equations:

\dot{x}(t) = f(x(t), u(t))    (plant dynamics),
u(t) = g(x(t))    (controller).

The control objective here is to design a controller function g(·) such that the plant state x(t) follows a desired trajectory x_d(t) as closely as possible.

Fig. 33. The inverted pendulum system.

A simple example of a feedback control system is the inverted pendulum system (Fig. 33), where a rigid pole is hinged to a cart through a free joint with only one degree of freedom, and the cart moves on the rail tracks to its right or left depending on the force exerted on it. The control goal is to find the applied force u as a function of the state variable x = [θ, θ̇, z, ż] (where θ is the pole angle and z is the cart position) such that the pole can be balanced from a given nonzero initial condition.

For a feedback control system in the discrete time domain, a general block diagram representation is as shown in Fig. 34. Note that the inputs to the plant block include the control action u(k) and the previous plant output x(k), so the plant block now represents a static mapping. In symbols, we have

x(k + 1) = f(x(k), u(k))    (plant),
u(k) = g(x(k))    (controller).

A central problem in control engineering is that of finding the control action u as a function of the plant output x in order to achieve a given control goal. Each design method for neuro-fuzzy controllers corresponds to a way of obtaining the control action; these methods are discussed next.



Fig. 35. Block diagram for the inverse control method: (a) learning phase; (b) application phase.

A. Mimicking Another Working Controller

Most of the time, the controller being mimicked is an experienced human operator who can control the plant satisfactorily. In fact, the whole concept of mimicking a human expert was the original intention of fuzzy controllers, whose ultimate goal is to replace human operators who can control complex systems such as chemical reaction processes, subway trains, and traffic systems. An experienced human operator usually can summarize his or her control actions as a set of fuzzy if-then rules with roughly correct membership functions; this corresponds to the linguistic information. Prior to the emergence of neuro-fuzzy approaches, refined membership functions were usually obtained via a lengthy trial-and-error process. Now, with learning algorithms, we can further take advantage of the numerical information (input/output data pairs) and refine the membership functions in a systematic way. Note that the capability to utilize linguistic information is specific to fuzzy inference systems; it is not always available in neural networks. Successful applications of fuzzy controllers based on linguistic information plus trial-and-error tuning include steam engine and boiler control [61], the Sendai subway system [115], container ship crane control [114], elevator control [55], nuclear reaction control [5], automobile transmission control [41], aircraft control [14], and many others [88]. With the availability of learning algorithms, a wider range of applications is expected.

Note that this approach is not only for control applications. If the target system to be emulated is a human physician or a credit analyst, then the resulting fuzzy inference system becomes a fuzzy expert system for diagnosis or credit analysis, respectively.

B. Inverse Control

Another scheme for obtaining the desired control action is the inverse control method shown in Fig. 35. For simplicity, we assume that the plant has only one state x(k) and one


input u(k). In the learning phase, a training set is obtained by generating inputs u(k) at random and observing the corresponding outputs x(k) produced by the plant. The ANFIS in Fig. 35(a) is then used to learn the inverse model of the plant by fitting the data pairs (x(k), x(k + 1); u(k)). In the application phase, the ANFIS identifier is copied to the ANFIS controller in Fig. 35(b) for generating the desired output. The input to the ANFIS controller is (x(k), x_d(k)); if the inverse model (the ANFIS identifier) that maps (x(k), x(k + 1)) to u(k) is accurate, then the generated u(k) should result in an x(k + 1) that is close to x_d(k). That is, the whole system in Fig. 35 will behave like a pure unit-delay system.

This method seems straightforward, and only one learning task is needed to find the inverse model of the plant. However, it assumes the existence of the inverse of the plant, which is not valid in general. Moreover, minimization of the network error ||e(k)||^2 does not guarantee minimization of the overall system error ||x_d(k) - x(k)||^2.

Using ANFIS for adaptive inverse control can be found in [43].

Fig. 36. Block diagram for (a) specialized learning; (b) specialized learning with model reference.

C. Specialized Learning

The major problem with the inverse control scheme is that we are minimizing the network error instead of the overall system error. An alternative is to minimize the system error directly; this is called "specialized learning" [75]. In order to back-propagate error signals through the plant block in Fig. 36, we need to find a model representing the behavior of the plant. In fact, in order to apply back-propagation learning, all we need to know is the Jacobian matrix of the plant, where the element at row i and column j is equal to the derivative of the plant's ith output with respect to its jth input.
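Under the assumption that the plant Jacobian is available, the gradient used in specialized learning can be sketched as follows; plant_jacobian and controller_grad are hypothetical stand-ins for the problem-specific derivatives, not names from the paper.

import numpy as np

def specialized_gradient(x_next, x_d, x, u, plant_jacobian, controller_grad):
    """Gradient of 0.5 * ||x_d - x_next||^2 w.r.t. the controller parameters.

    x_next: predicted plant output; x_d: desired output; x, u: current
    state and control. plant_jacobian(x, u) returns d(plant output)/du,
    shape (n_x, n_u); controller_grad(x, u) returns du/d(params),
    shape (n_params, n_u)."""
    e = x_d - x_next                              # overall system error
    dE_du = -(plant_jacobian(x, u).T @ e)         # chain rule through the plant
    return controller_grad(x, u) @ dE_du          # chain rule into the controller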




Fig. 37. A trajectory network for control application (FC stands for “fuzzy controller”).

If the Jacobian matrix is not easy to find, an alternative is to estimate it on-line from the changes of the plant's inputs and outputs during two consecutive time instants. Other similar methods that aim at using an approximate Jacobian matrix to achieve the same learning effects can be found in [11], [42], [101]. Applying specialized learning to find an ANFIS controller for the inverted pendulum was reported in [28].

It is not always convenient to specify the desired plant output x_d(k) at every time instant k. As a standard approach in model reference adaptive control, the desired behavior of the overall system can be implicitly specified by a (usually linear) model that is able to achieve the control goal satisfactorily. This alternative approach is shown in Fig. 36(b), where the desired output x_d(k + 1) is generated through a desired model.

D. Back-Propagation Through Time and Real-Time Recurrent Learning

If we replace the controller and plant blocks in Fig. 34 with two adaptive networks, we can duplicate and cascade these networks to form a huge trajectory network, as shown in Fig. 37, in order to find the trajectory of each node's output. By applying back-propagation to the trajectory network, we can force the plant block to generate a desired trajectory. The operation that produces trajectory networks is called unfolding of time; the back-propagation used here is thus referred to as back-propagation through time.

In particular, the inputs to the trajectory network are initial conditions of the plant; the outputs are the state trajectory from k = 1 to k = m. The adjustable parameters all pertain to the FC (fuzzy controller) block implemented as an ANFIS. Though there are m FC blocks, all of them refer to the same parameter set. For clarity, this parameter set is shown explicitly in Fig. 37; it is updated according to the output of the error measure block.

Use of back-propagation through time to train a neural network for backing up a tractor-trailer system is reported in [68]. The same technique was used to design an ANFIS controller for balancing an inverted pendulum [29]. Note that back-propagation through time is usually an off-line learning algorithm in the sense that the parameters are not updated until the sequence (k = 1 to m) is over. If the sequence is too long, or if we want to update the parameters in the middle of the sequence, we can always apply real-time recurrent learning [112].

E. Feedback Linearization and Sliding Control

The equations of motion of a class of dynamic systems in the continuous time domain can be expressed in the canonical form:

x^{(n)}(t) = f(x(t), \dot{x}(t), \ldots, x^{(n-1)}(t)) + b u(t)    (54)

where f is an unknown continuous function, b is the control gain, and u ∈ R and y ∈ R are the input and output of the system, respectively. The control objective is to force the state vector x = [x, ẋ, ..., x^{(n-1)}]^T to follow a specified desired trajectory x_d = [x_d, ẋ_d, ..., x_d^{(n-1)}]^T. If we define the tracking error vector as e = x_d - x, then the control objective is to design a control law u(t) which ensures e → 0 as t → ∞. (For simplicity, we assume b = 1 in the following discussion.)



Equation (54) is a typical feedback linearizable system, since it can be reduced to a linear system if f is known exactly. Specifically, the control law

u(t) = -f(x(t)) + x_d^{(n)}(t) + k^T e    (55)

would transform the original nonlinear dynamics into a linear one:

e^{(n)} + k_1 e^{(n-1)} + \cdots + k_n e = 0    (56)

where k = [k_n, ..., k_1]^T is an appropriately chosen vector that ensures satisfactory behavior of the closed-loop linear system in (56).

Since f is unknown, an intuitive candidate for u would be

u = -F(x, p) + x_d^{(n)} + k^T e + v    (57)

where v is an additional control input to be determined later and F is a parameterized function (such as an ANFIS, a neural network, or any other type of adaptive network) that is rich enough to approximate f. Using this control law, the closed-loop system becomes

e'") + + . . . + k,e = ( f - F ) + V . (58)

Now the problem is divided into two tasks:

• How to update the parameter vector p incrementally so that F(x, p) ≈ f(x) for all x.
• How to apply v to guarantee global stability while F is approximating f during the whole process.

The first task is not too difficult as long as F, which could be a neural network or a fuzzy inference system, is equipped with enough parameters to approximate f. For the second task, we need to apply the concepts of a branch of nonlinear control theory called sliding control [85], [100]. The standard approach is to define an error metric as

s(t) = \left( \frac{d}{dt} + \lambda \right)^{n-1} e(t),    \lambda > 0.    (59)

The equation s(t) = 0 defines a time-varying hyperplane in R^n on which the tracking error vector e(t) = [e(t), ė(t), ..., e^{(n-1)}(t)]^T decays exponentially to zero, so that perfect tracking can be obtained asymptotically. Moreover, if we can maintain the condition

\frac{1}{2} \frac{d}{dt} s^2(t) \le -\eta |s(t)|,    \eta > 0    (60)

then the system trajectory will reach the hyperplane s(t) = 0 in a finite time less than or equal to |s(0)|/η. For details about how to maintain the above condition, the reader is referred to [85]. Applications of this technique to neural and fuzzy control can be found in [81] and [102], respectively. This approach uses a number of nonlinear control design techniques and possesses rigorous proofs of global stability; however, its applicability is restricted to feedback linearizable systems.
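As a sanity check of the linearizing law (55), the following sketch simulates a second-order plant with a known f, so the closed loop reduces to (56); the plant nonlinearity, gains, and desired trajectory are illustrative.

import numpy as np

f = lambda x, v: -np.sin(x) - 0.5 * v             # "unknown" plant nonlinearity
k1, k2 = 4.0, 4.0                                  # poles of e'' + k1 e' + k2 e = 0
dt, x, v = 0.01, 1.0, 0.0                          # step size, nonzero initial state
for step in range(1000):                           # simulate 10 seconds
    t = step * dt
    e, edot = np.sin(t) - x, np.cos(t) - v         # e = x_d - x, with x_d = sin t
    u = -f(x, v) + (-np.sin(t)) + k1 * edot + k2 * e   # law (55); x_d'' = -sin t
    a = f(x, v) + u                                # plant acceleration
    x, v = x + dt * v, v + dt * a                  # Euler integration
print(abs(np.sin(10.0) - x))                       # tracking error becomes small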

F. Gain Scheduling

Under certain arrangements, the first-order Sugeno fuzzy model becomes a gain scheduler that switches between several sets of feedback gains. For instance, a first-order Sugeno fuzzy controller for a hypothetical inverted pendulum system with varying pole length may have fuzzy if-then rules of the form

If the pole length is short, then u = k_1^T x;
If the pole length is medium, then u = k_2^T x;
If the pole length is long, then u = k_3^T x.

This is in fact a gain scheduling controller, where the scheduling variable is the pole length and the control action switches smoothly between three sets of feedback gains depending on the value of the scheduling variable. In general, the scheduling variables appear only in the premise part, while the state variables appear only in the consequent part. The design method here is standard in gain scheduling: find several nominal points in the space formed by the scheduling variables and employ any of the linear control design techniques to find appropriate feedback gains. If the number of nominal points is small, we can construct the fuzzy rules directly. On the other hand, if the number of nominal points is large, we can always use ANFIS to fit the desired control actions to a fuzzy controller.

Examples of applying this method to both one-pole and two-pole inverted pendulum systems with varying pole lengths can be found in the demo programs in [32].
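A sketch of such a Sugeno gain scheduler: the scheduling variable (pole length L) fires three overlapping membership functions, and the control action is the normalized blend of three linear state-feedback laws; all membership parameters and gain values are illustrative.

import numpy as np

def mf(L, center, width):
    return np.exp(-0.5 * ((L - center) / width) ** 2)    # Gaussian MF

def scheduled_control(L, state):
    centers = [0.5, 1.0, 1.5]                   # short / medium / long pole
    gains = [np.array([-20.0, -6.0, -1.0, -2.0]),
             np.array([-28.0, -8.0, -1.5, -2.5]),
             np.array([-36.0, -10.0, -2.0, -3.0])]
    w = np.array([mf(L, c, 0.25) for c in centers])      # rule firing strengths
    wn = w / w.sum()                                     # normalize
    return sum(wn_i * (K @ state) for wn_i, K in zip(wn, gains))

state = np.array([0.1, 0.0, 0.0, 0.0])          # [theta, theta_dot, z, z_dot]
print(scheduled_control(0.8, state))             # blends the first two gain sets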

G. Others

Other design techniques that do not use the learning algorithm in neuro-fuzzy modeling are summarized here. For complex control problems with perfect plant models,

we can always use gradient-free optimization schemes, such as genetic algorithms [19], [23], simulated annealing [45], the downhill simplex method [67], and random search [63]. In particular, the use of genetic algorithms for neural network controllers can be found in [111]; for fuzzy logic controllers, see [39], [40], [53].

If the plant model is not available, we can apply reinforcement learning [2] to find a working controller directly. The close relationship between reinforcement learning and dynamic programming was addressed in [3], [108]. Other variants of reinforcement learning include temporal difference methods (TD(λ) algorithms) and Q-learning [106]. Representative applications of reinforcement learning to fuzzy control can be found in [4], [12], [52], [58].

Some other design and analysis approaches for fuzzy controllers include cell-to-cell mapping techniques [13], [86], model-based design methods [97], self-organizing controllers [74], [98], and so on. As more and more people work in this field, new design methods are appearing faster than ever before.



VI. CONCLUDING REMARKS

A. Current Problems and Possible Solutions

A typical modeling problem includes structure determination and parameter identification. We address the parameter identification problem for ANFIS in this paper, which is solved via back-propagation gradient descent and the least-squares method. The structure determination problem, which deals with the partition style, the number of MF's for each input, the number of fuzzy if-then rules, and so on, is now an active research topic in the field. Work along this direction includes Jang's fuzzy CART approach [31], Lin's reinforcement learning method [56], Sun's fuzzy k-d trees [91], Sugeno's iterative method [90], and various clustering algorithms proposed by Chiu [15], Khedkar [44], and Wang [103]. Moreover, advances on the constructive and destructive learning of neural networks [18], [54] can also shed some light on this problem.

Though we can speed up the parameter identification problem by introducing the least-squares estimator into the learning cycle, gradient descent still slows down the training process, and the training time could be prohibitively long for a complicated task. Therefore the need to search for better learning algorithms holds equally true for both neural networks and fuzzy models. Variants of gradient descent proposed in the neural network literature, including second-order back-propagation [71], quick-propagation [17], and so on, can be used to speed up training. A number of techniques used in nonlinear regression can also contribute in this regard, such as the Gauss-Newton method (linearization method) and the Marquardt procedure [62]. Another important resource is the rich literature of optimization, which offers many better gradient-based optimization routines, such as quadratic programming and conjugate gradient descent.

B. Future Directions

Due to the extreme flexibility of adaptive networks, ANFIS can have a number of variants different from what we have proposed here. For instance, we can replace the Π nodes in layer 2 of ANFIS with a parameterized T-norm operator [16] and let the learning algorithm decide the best T-norm function for a specific application. By employing the adaptive network as a common framework, we have also proposed other adaptive fuzzy models tailored for different purposes, such as the neuro-fuzzy classifier [92], [93] for data classification and the fuzzy filter scheme [94], [95] for feature extraction. There are a number of other possible extensions and applications, and they are currently under investigation.

During the past years, we have witnessed the rapid growth of applications of fuzzy logic and fuzzy set theory to consumer electronic products, the automotive industry, and process control. With the advent of fuzzy hardware with possibly on-chip learning capability, applications to adaptive signal processing and control are expected. Potential applications within adaptive signal processing include adaptive filtering [21], channel equalization [9], [10], [105], noise or echo cancelling [110], predictive coding [54], and so on.

ACKNOWLEDGMENT

The authors wish to thank Steve Chiu for providing numerous helpful comments. Most of this paper was finished while the first author was a research associate at UC Berkeley, so the authors would like to acknowledge the guidance and help of Prof. Lotfi A. Zadeh and other members of the "fuzzy group" at UC Berkeley.

REFERENCES

[1] K. J. Åström and B. Wittenmark, Computer Controlled Systems: Theory and Design. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[2] A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, and Cybern., vol. 13, pp. 834-846, Oct. 1983.
[3] A. G. Barto, R. S. Sutton, and C. J. C. H. Watkins, "Learning and sequential decision making," in Learning and Computational Neuroscience, M. Gabriel and J. W. Moore, Eds. Cambridge, MA: MIT Press, 1991.
[4] H. R. Berenji and P. Khedkar, "Learning and tuning fuzzy logic controllers through reinforcements," IEEE Trans. Neural Networks, vol. 3, pp. 724-740, 1992.
[5] J. A. Bernard, "Use of rule-based system for process control," IEEE Control Syst. Magazine, vol. 8, no. 5, pp. 3-13, 1988.
[6] P. Bonissone, V. Badami, K. Chiang, P. Khedkar, K. Marcelle, and M. Schutten, "Industrial applications of fuzzy logic at General Electric," Proc. IEEE, this issue.
[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, pp. 321-355, 1988.
[8] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Networks, vol. 2, pp. 302-309, Mar. 1991.
[9] S. Chen, G. J. Gibson, C. F. N. Cowan, and P. M. Grant, "Adaptive equalization of finite nonlinear channels using multilayer perceptrons," Signal Processing, vol. 20, pp. 107-119, 1990.
[10] S. Chen, G. J. Gibson, C. F. N. Cowan, and P. M. Grant, "Reconstruction of binary signals using an adaptive radial-basis-function equalizer," Signal Processing, vol. 22, pp. 77-93, 1991.
[11] V. C. Chen and Y. H. Pao, "Learning control with neural networks," in Proc. Int. Conf. on Robotics and Automation, 1989, pp. 1448-1453.
[12] Y.-Y. Chen, "A self-learning fuzzy controller," in Proc. IEEE Int. Conf. on Fuzzy Syst., Mar. 1992.
[13] Y.-Y. Chen and T.-C. Tsao, "A description of the dynamic behavior of fuzzy systems," IEEE Trans. Syst., Man, and Cybern., vol. 19, pp. 745-755, July 1989.
[14] S. Chiu, S. Chand, D. Moore, and A. Chaudhary, "Fuzzy logic for control of roll and moment for a flexible wing aircraft," IEEE Control Syst. Magazine, vol. 11, no. 4, pp. 42-48, 1991.
[15] S. L. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. and Fuzzy Syst., vol. 2, no. 3, 1994.
[16] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications. New York: Academic, 1980.
[17] S. E. Fahlman, "Faster-learning variations on back-propagation: An empirical study," in Proc. 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds. Carnegie Mellon University, 1988, pp. 38-51.
[18] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," in Advances in Neural Information Processing Systems II, D. S. Touretzky, G. Hinton, and T. Sejnowski, Eds. New York: Morgan Kaufmann, 1990.
[19] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[20] G. C. Goodwin and K. S. Sin, Adaptive Filtering Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[21] S. S. Haykin, Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1991.


[22] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991.

[23] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Michigan Press, 1975.

[24] T. C. Hsia, System Identification: Least-Squares Methods. New York: Heath, 1977.

[25] J.-S. R. Jang, "Fuzzy modeling using generalized neural networks and Kalman filter algorithm," in Proc. 9th Nat. Conf. on Artif. Intell. (AAAI-91), July 1991, pp. 762-767.

[26] J.-S. R. Jang, "Rule extraction using generalized neural networks," in Proc. 4th IFSA World Congress (Volume for Artificial Intelligence), July 1991, pp. 82-86.

[27] J.-S. R. Jang, "A self-learning fuzzy controller with application to automobile tracking problem," in Proc. IEEE Roundtable Discussion on Fuzzy and Neural Systems, and Vehicle Application, Institute of Industrial Science, Univ. of Tokyo, Tokyo, Japan, Nov. 1991, paper no. 10.

[28] J.-S. R. Jang, "Fuzzy controller design without domain experts," in Proc. IEEE Int. Conf. on Fuzzy Syst., Mar. 1992.

[29] J.-S. R. Jang, "Self-learning fuzzy controller based on temporal back-propagation," IEEE Trans. Neural Networks, vol. 3, pp. 714-723, Sept. 1992.

[30] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference systems," IEEE Trans. Syst., Man, and Cybern., vol. 23, pp. 665-685, May 1993.

[31] J.-S. R. Jang, "Structure determination in fuzzy modeling: A fuzzy CART approach," in Proc. IEEE Int. Conf. on Fuzzy Syst., Orlando, FL, June 1994.

[32] J.-S. R. Jang and N. Gulley, The Fuzzy Logic Toolbox for Use with MATLAB. Natick, MA: The Mathworks, Inc., 1995.

[33] J.-S. R. Jang and C.-T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems," IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 156-159, Jan. 1993.

[34] J.-S. R. Jang and C.-T. Sun, "Predicting chaotic time series with fuzzy if-then rules," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Francisco, CA, Mar. 1993.

[35] J.-S. R. Jang and C.-T. Sun, "Neuro-fuzzy modeling: A computational approach to intelligence," 1995, submitted for publication.

[36] R. D. Jones, Y. C. Lee, C. W. Barnes, G. W. Flake, K. Lee, and P. S. Lewis, "Function approximation and time series prediction with neural networks," in Proc. IEEE Int. Joint Conf. on Neural Networks, 1990, pp. I-649-665.

[37] M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm," MIT Tech. Rep., 1993.

[38] A. Kandel, Ed., Fuzzy Expert Systems. Boca Raton, FL: CRC Press, 1992.

[39] C. L. Karr, "GA's for fuzzy controllers," AI Expert, vol. 6, no. 2, pp. 26-33, Feb. 1991.

[40] C. L. Karr and E. J. Gentry, "Fuzzy control of pH using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 1, pp. 46-53, Feb. 1993.

[41] Y. Kasai and Y. Morimoto, "Electronically controlled continuously variable transmission," in Proc. Int. Congress on Transportation Electronics, Dearborn, MI, 1988.

[42] M. Kawato, K. Furukawa, and R. Suzuki, "A hierarchical neural network model for control and learning of voluntary movement," Biological Cybern., vol. 57, pp. 169-185, 1987.

[43] D. J. Kelly, P. D. Burton, and M. A. Rahman, "The application of a neural fuzzy controller to process control," in Proc. Int. Joint Conf. of the North American Fuzzy Information Processing Society Biannual Conf., the Industrial Fuzzy Control and Intelligent Systems Conf., and the NASA Joint Technology Workshop on Neural Networks and Fuzzy Logic, San Antonio, TX, Dec. 1994.

[44] P. S. Khedkar, "Learning as adaptive interpolation in neural fuzzy systems," Ph.D. dissertation, Computer Science Division, Dept. of EECS, Univ. of California at Berkeley, 1993.

[45] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Res. Rep. 9335, IBM, T. J. Watson Center, 1983.

[46] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, May 1983.

[47] M. S. Klassen and Y.-H. Pao, "Characteristics of the functional-link net: A higher order delta rule net," in Proc. IEEE Int. Conf. on Neural Networks, San Diego, CA, June 1988.

[48] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach. Englewood Cliffs, NJ: Prentice-Hall, 1991.

[49] A. S. Lapedes and R. Farber, "Nonlinear signal processing using neural networks: Prediction and system modeling," Tech. Rep. LA-UR-87-2662, Los Alamos National Lab., Los Alamos, NM, 1987.

[50] C.-C. Lee, "Fuzzy logic in control systems: Fuzzy logic controller-Part 1," IEEE Trans. Syst., Man, and Cybern., vol. 20, pp. 404-418, Feb. 1990.

[51] C.-C. Lee, "Fuzzy logic in control systems: Fuzzy logic controller-Part 2," IEEE Trans. Syst., Man, and Cybern., vol. 20, pp. 419-435, Feb. 1990.

[52] C.-C. Lee, "A self-learning rule-based controller employing approximate reasoning and neural net concepts," Int. J. Intell. Syst., vol. 5, no. 3, pp. 71-93, 1991.

[53] M. A. Lee and H. Takagi, "Integrating design stages of fuzzy systems using genetic algorithms," in Proc. 2nd IEEE Int. Conf. on Fuzzy Syst., San Francisco, CA, 1993, pp. 612-617.

[54] T.-C. Lee, Structure Level Adaptation for Artificial Neural Networks. New York: Kluwer, 1991.

[55] Fujitec Company Limited, "FLEX-8800 series elevator group control system," Osaka, Japan, 1988.

[56] C.-T. Lin and C. S. G. Lee, "Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems," IEEE Trans. Fuzzy Syst., vol. 2, Jan. 1994.

[57] C.-T. Lin and C. S. G. Lee, "Neural-network-based fuzzy logic control and decision system," IEEE Trans. Computers, vol. 40, pp. 1320-1336, Dec. 1991.

[58] C.-T. Lin and C. S. G. Lee, "Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Francisco, CA, Mar. 1993, pp. 88-93.

[59] L. Ljung, System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice-Hall, 1987.

[60] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science, vol. 197, pp. 287-289, July 1977.

[61] E. H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," Int. J. Man-Machine Studies, vol. 7, no. 1, pp. 1-13, 1975.

[62] D. W. Marquardt, "An algorithm for least squares estimation of nonlinear parameters," J. Soc. of Ind. and Appl. Math., vol. 11, no. 2, pp. 431-441, 1963.

[63] W. S. Meisel, "Computer-oriented approaches to pattern recognition," in Mathematics in Science and Engineering, Vol. 83. New York: Academic, 1972.

[64] J. Moody, "Fast learning in multi-resolution hierarchies," in Advances in Neural Information Processing Systems 1, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989, chap. 1, pp. 29-39.

[65] J. Moody and C. Darken, "Learning with localized receptive fields," in Proc. 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds. Morgan Kaufmann/Carnegie Mellon Univ., 1988.

[66] J. Moody and C. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281-294, 1989.

[67] J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer J., vol. 7, pp. 308-313, 1965.

[68] D. H. Nguyen and B. Widrow, "Neural networks for self-learning control systems," IEEE Control Syst. Magazine, pp. 18-23, Apr. 1990.

[69] N. J. Nilsson, Learning Machines: Foundations of Trainable Pattern Classifying Systems. New York: McGraw-Hill, 1965.

[70] Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley, 1989, chap. 8, pp. 197-222.

[71] D. B. Parker, "Optimal algorithms for adaptive networks: Second order back propagation, second order direct propagation, and second order Hebbian learning," in Proc. IEEE Int. Conf. on Neural Networks, 1987, pp. 593-600.

[72] N. Pfluger, J. Yen, and R. Langari, "A defuzzification strategy for a fuzzy logic controller employing prohibitive information in command formulation," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Diego, CA, Mar. 1992, pp. 717-723.

[73] M. J. D. Powell, "Radial basis functions for multivariable interpolation: A review," in Algorithms for Approximation, J. C. Mason and M. G. Cox, Eds. Oxford, UK: Oxford Univ. Press, 1987, pp. 143-167.

[74] T. J. Procyk and E. H. Mamdani, "A linguistic self-organizing process controller," Automatica, vol. 15, pp. 15-30, 1978.


[75] D. Psaltis, A. Sideris, and A. Yamamura, "A multilayered neural network controller," IEEE Control Syst. Magazine, vol. 8, no. 4, pp. 17-21, Apr. 1988.

[76] R. S. Crowder III, "Predicting the Mackey-Glass time series with cascade-correlation learning," in Proc. 1990 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds. Pittsburgh, PA: Carnegie Mellon Univ., 1990, pp. 117-123.

[77] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. New York: Spartan, 1962.

[78] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, chap. 8, pp. 318-362.

[79] T. A. Runkler and M. Glesner, "Defuzzification and ranking in the context of membership value semantics, rule modality, and measurement theory," in Proc. European Congress on Fuzzy and Intelligent Technologies, Aachen, Sept. 1994.

[80] T. D. Sanger, "A tree-structured adaptive network for function approximation in high-dimensional spaces," IEEE Trans. Neural Networks, vol. 2, pp. 285-293, Mar. 1991.

[81] R. M. Sanner and J.-J. E. Slotine, "Gaussian networks for direct adaptive control," IEEE Trans. Neural Networks, vol. 3, pp. 837-862, 1992.

[82] S. Shah, F. Palmieri, and M. Datum, "Optimal filtering algorithms for fast learning in feedforward neural networks," Neural Networks, vol. 5, no. 5, pp. 779-787, 1992.

[83] S. Shah and F. Palmieri, "MEKA-a fast, local algorithm for training feedforward neural networks," in Proc. Int. Joint Conf. on Neural Networks, 1990, pp. III-41-46.

[84] S. Singhal and L. Wu, "Training multilayer perceptrons with the extended Kalman algorithm," in Advances in Neural Information Processing Systems 1, D. S. Touretzky, Ed. New York: Morgan Kaufmann, 1989, pp. 133-140.

[85] J.-J. E. Slotine and W. Li, Applied Nonlinear Control. Englewood Cliffs, NJ: Prentice-Hall, 1991.

[86] S. M. Smith and D. J. Comer, "Automated calibration of a fuzzy logic controller using a cell state space algorithm," IEEE Control Syst. Magazine, vol. 11, no. 5, pp. 18-28, Aug. 1991.

[87] K. Stokbro, D. K. Umberger, and J. A. Hertz, "Exploiting neurons with localized receptive fields to learn chaos," Complex Systems, vol. 4, pp. 603-622, 1990.

[88] M. Sugeno, Ed., Industrial Applications of Fuzzy Control. New York: Elsevier, 1985.

[89] M. Sugeno and G. T. Kang, "Structure identification of fuzzy model," Fuzzy Sets and Systems, vol. 28, pp. 15-33, 1988.

[90] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. Fuzzy Syst., vol. 1, pp. 7-31, Feb. 1993.

[91] C.-T. Sun, "Rulebase structure identification in an adaptive network based fuzzy inference system," IEEE Trans. Fuzzy Syst., vol. 2, pp. 64-73, Feb. 1994.

[92] C.-T. Sun and J.-S. R. Jang, "Adaptive network based fuzzy classification," in Proc. Japan-USA Symp. on Flexible Automation, July 1992.

[93] C.-T. Sun and J.-S. R. Jang, "A neuro-fuzzy classifier and its applications," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Francisco, CA, Mar. 1993.

[94] C.-T. Sun, J.-S. R. Jang, and C.-Y. Fu, "Neural network analysis of plasma spectra," in Proc. Int. Conf. on Artificial Neural Networks, Amsterdam, Sept. 1993.

[95] C.-T. Sun, T.-Y. Shuai, and G.-L. Dai, "Using fuzzy filters as feature detectors," in Proc. IEEE Int. Conf. on Fuzzy Syst., Orlando, FL, June 1994, vol. 1, pp. 406-410.

[96] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. Syst., Man, and Cybern., vol. 15, pp. 116-132, 1985.

[97] K. Tanaka and M. Sugeno, "Stability analysis and design of fuzzy control systems," Fuzzy Sets and Syst., vol. 45, pp. 135-156, 1992.

[98] R. Tanscheit and E. M. Scharf, "Experiments with the use of a rule-based self-organizing controller for robotics applications," Fuzzy Sets and Syst., vol. 26, pp. 195-214, 1988.

[99] Y. Tsukamoto, "An approach to fuzzy reasoning method," in Advances in Fuzzy Set Theory and Applications, M. M. Gupta, R. K. Ragade, and R. R. Yager, Eds. Amsterdam: North-Holland, 1979, pp. 137-149.

[100] V. I. Utkin, "Variable structure systems with sliding mode: A survey," IEEE Trans. Autom. Control, vol. 22, pp. 212-222, 1977.

[101] K. P. Venugopal, R. Sudhakar, and A. S. Pandya, "An improved scheme for direct adaptive control of dynamical systems using backpropagation neural networks," J. Circuits, Syst., Signal Processing, to be published.

[102] L.-X. Wang, "Stable adaptive fuzzy control of nonlinear systems," IEEE Trans. Fuzzy Syst., vol. 1, pp. 146-155, Jan. 1993.

[103] L.-X. Wang, "Training fuzzy logic systems using nearest neighborhood clustering," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Francisco, CA, Mar. 1993.

[104] L.-X. Wang and J. M. Mendel, "Back-propagation fuzzy systems as nonlinear dynamic system identifiers," in Proc. IEEE Int. Conf. on Fuzzy Syst., San Diego, CA, Mar. 1992.

[105] L.-X. Wang and J. M. Mendel, "Fuzzy adaptive filters, with application to nonlinear channel equalization," IEEE Trans. Fuzzy Syst., vol. 1, pp. 161-170, Mar. 1993.

[106] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.

[107] P. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. dissertation, Harvard Univ., 1974.

[108] P. Werbos, "A menu for designs of reinforcement learning over time," in Neural Networks for Control, W. T. Miller III, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA: MIT Press, 1990.

[109] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, madaline, and backpropagation," Proc. IEEE, vol. 78, pp. 1415-1442, Sept. 1990.

[110] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.

[111] A. P. Wieland, "Evolving controls for unstable systems," in Proc. 1990 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds. Carnegie Mellon Univ., 1990, pp. 91-102.

[112] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, vol. 1, pp. 270-280, 1989.

[113] R. R. Yager and D. P. Filev, "SLIDE: A simple adaptive defuzzification method," IEEE Trans. Fuzzy Syst., vol. 1, pp. 69-78, Feb. 1993.

[114] S. Yasunobu and G. Hasegawa, "Evaluation of an automatic container crane operation system based on predictive fuzzy control," Control Theory and Advanced Technol., vol. 2, no. 2, pp. 419-432, 1986.

[115] S. Yasunobu and S. Miyamoto, "Automatic train operation by predictive fuzzy control," in Industrial Applications of Fuzzy Control, M. Sugeno, Ed. Amsterdam: North-Holland, 1985, pp. 1-18.

[116] L. A. Zadeh, "Fuzzy sets," Inform. and Contr., vol. 8, pp. 338-353, 1965.

[117] L. A. Zadeh, "Outline of a new approach to the analysis of complex systems and decision processes," IEEE Trans. Syst., Man, and Cybern., vol. 3, pp. 28-44, Jan. 1973.

Jyh-Shing Roger Jang (Member, IEEE) was born in Taipei, Taiwan, in 1962. He received the B.S. degree in electrical engineering from National Taiwan University in 1984, and the Ph.D. degree from the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, in 1992.

During the summer of 1989, he was a summer student at NASA Ames Research Center, working on the design and implementation of fuzzy controllers. Between 1991 and 1992 he was a Research Scientist at the Lawrence Livermore National Laboratory, working on spectrum modeling and analysis using neural networks and fuzzy logic. After receiving the Ph.D. degree, he was a Research Associate in the same department, working on machine learning techniques using fuzzy logic. Since 1993, he has been with The Mathworks, Inc., working on the Fuzzy Logic Toolbox for use with MATLAB. His interests lie in the areas of neuro-fuzzy modeling, system identification, machine learning, nonlinear regression, optimization, and computer-aided control system design.


Chuen-Tsai Sun (Member, IEEE) received the B.S. degree in electrical engineering in 1979 and the M.A. degree in history in 1984, both from National Taiwan University, Taiwan. He received the Ph.D. degree in computer science from the University of California at Berkeley in 1992.

During 1989-1990 he worked as a consultant with the Pacific Gas and Electric Company, San Francisco, CA, where he was in charge of designing and implementing an expert system for protective device coordination in electric distribution circuits. During 1991-1992 he was a Research Scientist at the Lawrence Livermore National Laboratory, working on plasma analysis using neural networks and fuzzy logic. Since 1992, he has been on the faculty of the Department of Computer and Information Science at National Chiao Tung University. His current research interests include computational intelligence, system modeling, and computer-assisted learning.

Dr. Sun was the Arthur Gould Tasheira Scholarship winner in 1986. He was also honored with the Phi Hua Scholar Award in 1985 for his publications in history.
