
1424 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 17, NO. 6, NOVEMBER 2009

A Novel Fuzzy-Neural-Network Modeling Approach to Crude-Oil Blending

Wen Yu, Senior Member, IEEE

Abstract—In this brief, we propose a new fuzzy-neural-network (FNN) modeling approach and apply it to the modeling of crude-oil blending. The structure and parameters of the FNNs are updated online. The new idea for the structure identification is that the partitioning of the input (precondition) and the output (consequent) spaces is carried out over the same time index. This gives a better explanation of the input–output mapping of nonlinear systems. The contributions to the parameter identification are as follows: 1) a time-varying learning rate is applied to the commonly used backpropagation algorithm, and the upper bound of the modeling error and stability are proved, and 2) since the data of the precondition and the consequent lie in the same temporal interval, we can train each rule with its own group data.

Index Terms—Crude-oil blending, fuzzy neural networks, online clustering.

I. INTRODUCTION

BOTH neural networks and fuzzy logic are universal estimators; they can approximate any nonlinear function to any prescribed accuracy, provided that sufficient hidden neurons or fuzzy rules are given. Recent results show that the fusion of these two technologies is very effective for nonlinear-system modeling [4]. The fusion falls into two categories [15], [17]: structure identification and parameter identification. The parameter identification is usually addressed by gradient-descent variants, e.g., the least squares algorithm and backpropagation (BP) [22].

Structure identification selects the fuzzy rules; it relies on a substantial amount of heuristic observation to express proper strategy knowledge. It is often tackled by offline trial-and-error approaches, like the unbias criterion [19]. There are several approaches which generate fuzzy rules from numerical data. One of the most common methods for structure initialization is uniform partitioning of each input variable, which results in a fuzzy grid [13]. In [2], the Takagi-Sugeno-Kang (TSK) model is used for designing various neurofuzzy identifiers. The earlier approaches consist of two learning phases. First is structure learning, which involves selecting the main input variables among all the possible ones, specifying the membership functions, partitioning the input space, and determining the number of fuzzy rules. Second is parameter learning, which involves the determination and optimization of the unknown parameters; it uses

Manuscript received January 24, 2008; revised August 23, 2008. Manuscript received in final form October 14, 2008. First published April 14, 2009; current version published October 23, 2009. Recommended by Associate Editor J. Sarangapani.

The author is with the Departamento de Control Automático, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Av. IPN 2508, México D.F., 07360, México (e-mail: [email protected]).

Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCST.2008.2008194

some optimization methods based on the linguistic information obtained from the human expert and the numeric data of the actual system. These two learning phases are interrelated; neither can be carried out independently of the other. Traditionally, they are done sequentially and offline: the parameter updating is employed after the structure is decided. Most structure-identification methods are based on data clustering, such as fuzzy C-means clustering [5], mountain clustering [17], and subtractive clustering [7]. These approaches require that all input–output data be ready before we start to identify the plant, so these structure-identification approaches are offline.

There are a few online-clustering methods in the literature. In order to maintain an up-to-date clustering structure, an online version of the classical K-means clustering algorithm was developed by Beringer and Hüllermeier [3]. In [11], the input space is partitioned according to an aligned-clustering-based algorithm; after the number of rules is decided, the parameters are tuned by a recursive least squares algorithm. A combination of online clustering and a genetic algorithm for fuzzy systems is proposed in [12]: the preconditions of the fuzzy system are constructed online by an aligned-clustering-based approach, and the consequents are designed by genetic reinforcement learning. In [21], the input space is automatically partitioned into fuzzy subsets by an adaptive resonance theory mechanism; fuzzy rules that tend to give high output error are split in two by a specific fuzzy-rule splitting procedure. In [20], the Takagi–Sugeno fuzzy inference system is applied for online knowledge learning; it requires more training data than models which use global generalization, such as the adaptive-network-based fuzzy inference system (ANFIS) [13] and multilayer perceptrons [22]. Online clustering of the input-output data with a recursively calculated spatial proximity measure is given in [1]; it is instrumental for the online identification of Takagi-Sugeno models with recursive modified weighted least squares estimation.

There are two weaknesses in the earlier online-clustering methods: 1) the input-output mapping of a nonlinear system evolves through time, but the partitioning of the input (precondition) and the output (consequent) spaces does not take the same time interval into account, and 2) since the data of the precondition and the consequent are not assured to be in the same temporal interval, these methods have to use all the data to train each rule.

In this brief, a novel online-clustering approach is proposed to overcome the aforementioned two weaknesses for nonlinear-system modeling. There are three new contributions.

1) The new idea for the structure identification is that the partitioning of the input (precondition) and the output (consequent) spaces is carried out over the same time index, which in turn renders a better explanation of the input–output mapping of nonlinear systems.

2) A time-varying learning rate is applied to the BP algorithm; the upper bound of the modeling error and stability are obtained.


3) An application to the modeling of crude-oil blending is presented to show that the online-clustering method can be applied to nonlinear-system modeling via fuzzy neural networks (FNNs).

II. NONLINEAR-SYSTEM MODELING VIA FNNS

We start from a state-space discrete-time smooth nonlinear system

x(k+1) = f[x(k), u(k)],    y(k) = g[x(k)]        (1)

where u(k) is the input vector, x(k) is the state vector, and y(k) is the output vector; f and g are general nonlinear smooth functions. Equation (1) can be rewritten as

y(k) = g[x(k)] = g[f[x(k−1), u(k−1)]] = ⋯        (2)

Denoting X(k) = [y(k−1), y(k−2), …, u(k−d), u(k−d−1), …]^T and noting that (1) is a smooth nonlinear system, (2) can be expressed as y(k) = Φ[X(k)]. This leads to the multivariable nonlinear autoregressive moving average (NARMA) model

y(k) = Φ[X(k)]        (3)

where

Φ[X(k)] = Φ[y(k−1), y(k−2), …, u(k−d), u(k−d−1), …]        (4)

is an unknown nonlinear function representing the plant dynamics, u(k) and y(k) are the measurable scalar input and output, and d is the time delay.

A generic fuzzy model is presented as a collection of fuzzy rules in the following form (Mamdani fuzzy model [16]):

R^j: IF x_1 is A_1^j and x_2 is A_2^j and … and x_n is A_n^j THEN ŷ_1 is B_1^j and … and ŷ_m is B_m^j        (5)

We use l fuzzy IF–THEN rules to perform a mapping from the input linguistic vector X = [x_1, …, x_n] to the output linguistic vector Ŷ = [ŷ_1, …, ŷ_m]. A_i^j and B_q^j are standard fuzzy sets. Each input variable x_i has l_i fuzzy sets. In the case of full connection, l = l_1 × l_2 × ⋯ × l_n. By using product inference, center average, and a singleton fuzzifier, the qth output of the fuzzy-logic system can be expressed as

ŷ_q = ( Σ_{j=1}^{l} w_q^j [ Π_{i=1}^{n} μ_{A_i^j}(x_i) ] ) / ( Σ_{j=1}^{l} [ Π_{i=1}^{n} μ_{A_i^j}(x_i) ] )        (6)

Fig. 1. Partitioning of input–output spaces.

where μ_{A_i^j} is the membership function of the fuzzy set A_i^j and w_q^j is the point at which μ_{B_q^j}(w_q^j) = 1. When we have prior information about the identified plant, we can construct fuzzy rules as in (5). The object of fuzzy neural modeling is to find the center values w_q^j of B_q^j, as well as the membership functions μ_{A_i^j}, such that the FNN (6) can follow the nonlinear plant (3).
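To make the mapping (5)–(6) concrete, the following is a minimal Python sketch of a single-output Mamdani system with Gaussian membership functions, product inference, and center-average defuzzification. The rule count and all numeric values are illustrative placeholders, not data from this brief.

import numpy as np

def fuzzy_output(x, centers, widths, w):
    """Center-average output of a Mamdani fuzzy system, as in (6).

    x       : (n,) input vector
    centers : (l, n) Gaussian centers, one row per rule
    widths  : (l, n) Gaussian widths, one row per rule
    w       : (l,) consequent centers w^j (points where mu_B = 1)
    """
    # Product inference: firing strength of each of the l rules.
    phi = np.exp(-((x - centers) ** 2) / widths ** 2).prod(axis=1)
    # Center-average defuzzification, the quotient in (6).
    return (w * phi).sum() / phi.sum()

# Two illustrative rules over a two-dimensional input.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.ones((2, 2))
w = np.array([0.0, 1.0])
print(fuzzy_output(np.array([0.2, 0.8]), centers, widths, w))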

III. STRUCTURE IDENTIFICATION

The objective of the structure identification is to partition the input data X(k) and the output data y(k) of the nonlinear system, where X(k) is defined in (4), and to determine how many groups we need, i.e., the rule number l, for rules of the form (5).

Now, we use the following example to explain the importance of online clustering in the same time interval. We consider a nonlinear function

(7)

The data pairs are shown in Fig. 1; by the normal online-clustering methods proposed in [1], [11], [20], and [21], the input and output may be partitioned into four groups. These groups can be formed into four rules as "IF x is X_i, THEN y is Y_i," i = 1 … 4. Obviously, the third rule, "IF x is X_3, THEN y is Y_3," does not satisfy the relation (7), because the precondition and the consequent do not occur at the same time.

One possible method to deal with this kind of continuously increasing sequence of time-stamped data is to use an incremental version of the K-means algorithm [3]. The standard K-means algorithm runs on the current data streams; when a new block is available for all streams, the current streams are updated by a sliding-window operation. The clustering structure of the current streams is then taken as an initialization for the clustering structure of the new streams.

In this brief, the basic idea of the online clustering is that the partitioning of the input and the output spaces is carried out in the same time interval. If the distance from a point to the center is less than a required length, the point belongs to that group. When new data come, the center and the group are changed according to the new data. We give the following algorithm. The Euclidean distance at time k is defined as

d_k = α ||X(k) − C_1^i|| + β |y(k) − C_2^i|        (8)

where C_1^i and C_2^i are the centers of the input and output of group i at time k, and α and β are positive weighting factors; normally, we can choose α = β = 1/2. Usually, fuzzy models are developed by partitioning the data in the input and output space domains separately. In this brief, we consider the time domain as well. The new idea of the online clustering of this brief is that the input–output spaces partitioning is carried out in the same temporal interval. There are two reasons: 1) nonlinear-system modeling is to find a suitable mapping between the input and the output spaces, and these two spaces are connected by the time index, and 2) we will propose an online modeling approach based on the online clustering; when a new group (or a new rule) is created, we do not want to use all data to train it as in [1], [11], [20], and [21].

If the data have a time property, we can use the data in the corresponding time interval to train the rule. Therefore, clustering with a time interval simplifies the parameter identification and makes the online modeling easier. For group i, the centers are updated by

C_1^i = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} X(k),    C_2^i = (1/(k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} y(k)        (9)

where k_1 is the first index of group i and k_2 is the last index of group i. The length of group i is k_2 − k_1. The time interval of group i is [k_1, k_2]. The process of the structure identification can be formed as the following steps.

1) For the first data [X(1), y(1)], C_1^1 = X(1) and C_2^1 = y(1) are the centers of the first group, i = 1.

2) If new data [X(k), y(k)] come, we use (8) and (9) to calculate d_k. If no new data come, go to 5).

3) If d_k ≤ L, then [X(k), y(k)] is still in group i; go to 2).

4) If d_k > L, then [X(k), y(k)] is in a new group i + 1; the center of the new group is C_1^{i+1} = X(k), C_2^{i+1} = y(k); go to 2).

5) Check the distances between all centers; if the distance between the centers of two groups is less than L, the two groups are combined into one group.

There are three design parameters: α, β, and L. α and β can be regarded as the weights on the input and the output spaces, respectively. If the input dominates the dynamic property, we should increase α and decrease β. Usually, we select α = β such that the input and the output are of the same importance. If we let β = 0, then it becomes the normal online clustering. A sketch of the complete procedure, under these conventions, follows.

L is the threshold for creating new rules; it is the lowest possible value of similarity required to join two objects in one cluster. How to choose the user-defined threshold L is a tradeoff. If the threshold value is too small, there will still be many groups present at the end, and many of them will be singletons. Conversely, if the threshold is too large, many objects that are not very similar may end up in the same cluster. Since 0 ≤ d_k ≤ max_k d_k, if we want the algorithm to partition several groups, we should let L < max_k d_k; otherwise, there is only one group.

There are some approaches to select the optimal cluster number; for example, in [3], the optimal cluster number is updated by

k* = arg max_{κ ∈ {k−1, k, k+1}} Q(κ)        (10)

where Q(κ) is a quality measure for the cluster number κ. Going from k to k − 1 means that one of the current clusters has disappeared, e.g., the streams in this cluster have become very similar to the streams in a neighboring cluster. Going from k to k + 1 means that an additional cluster has emerged, e.g., a homogeneous cluster of streams has separated into two groups. If we do not use the threshold L and change the online Euclidean distance (8) into sliding windows, (10) can be applied to decide the cluster number.

IV. PARAMETER IDENTIFICATION

For group i, there is one fuzzy rule:

R^i: IF x_1 is A_1^i and … and x_n is A_n^i THEN ŷ is B^i        (11)

We use the input–output data [X(k), y(k)], k ∈ [k_1, k_2], to train the membership functions A_j^i and B^i, i.e., the parameter identification of the membership functions is performed in the corresponding input/output time interval found in the structure identification.

We use Gaussian functions as the membership functions. If we use a singleton fuzzifier and Mamdani implication, the output of the ith group can be expressed as

ŷ^i(k) = w^i Π_{j=1}^{n} exp( −(x_j(k) − c_j^i)² / (σ_j^i)² )        (12)

where c_j^i is the center of A_j^i and μ_{B^i}(w^i) = 1 (normal fuzzy set), j = 1 … n. We select the group centers C_1^i and C_2^i as initial conditions. We use the data pairs [X(k), y(k)] to find suitable membership functions in the zone k ∈ [k_1, k_2]. It can be transformed into a modeling problem to determine the parameters c_j^i, σ_j^i, and w^i; the objective is to find the center value w^i of B^i, as well as the membership functions A_j^i, such that ŷ^i(k) follows y(k). We assume that the


data pairs in this group can be expressed by a Gaussian membership function

y(k) = w* Π_{j=1}^{n} exp( −(x_j(k) − c_j*)² / (σ_j*)² ) + μ(k)        (13)

where c_j*, σ_j*, and w* are unknown parameters which minimize the modeling error μ(k). In the case of three independent variables, a smooth function f has the Taylor formula

f(x_1, x_2, x_3) = f(x_1^0, x_2^0, x_3^0) + Σ_{p=1}^{3} (∂f/∂x_p)|_{x^0} (x_p − x_p^0) + R_1

where R_1 is the remainder of the Taylor formula. If we let x_1, x_2, and x_3 correspond to w^i, c_j^i, and σ_j^i, and x_1^0, x_2^0, and x_3^0 correspond to w*, c_j*, and σ_j*, then we have

ŷ^i(k) − y(k) = (w^i − w*) ∂ŷ^i/∂w^i + Σ_{j=1}^{n} (c_j^i − c_j*) ∂ŷ^i/∂c_j^i + Σ_{j=1}^{n} (σ_j^i − σ_j*) ∂ŷ^i/∂σ_j^i + ε(k)        (14)

e(k) = ŷ^i(k) − y(k)        (15)

where e(k) is the modeling error and ε(k) is a second-order approximation error of the Taylor series. Using the chain rule on (12), we get

∂ŷ^i/∂w^i = ŷ^i / w^i,    ∂ŷ^i/∂c_j^i = 2 ŷ^i (x_j − c_j^i)/(σ_j^i)²,    ∂ŷ^i/∂σ_j^i = 2 ŷ^i (x_j − c_j^i)²/(σ_j^i)³

We define

z(k) = [∂ŷ^i/∂w^i, ∂ŷ^i/∂c_1^i … ∂ŷ^i/∂c_n^i, ∂ŷ^i/∂σ_1^i … ∂ŷ^i/∂σ_n^i]^T        (16)

so

e(k) = z(k)^T θ̃(k) + ε(k)        (17)

where θ̃(k) = θ(k) − θ* and θ(k) = [w^i, c_1^i … c_n^i, σ_1^i … σ_n^i]^T.

Theorem 1: If we use the Mamdani-type FNN (12) to identify the nonlinear plant (13), the following BP algorithm makes the modeling error e(k) bounded:

θ(k+1) = θ(k) − η_k e(k) z(k)        (18)

where η_k = η / (1 + ||z(k)||²) and 0 < η ≤ 1. The identification error satisfies the following average performance:

limsup_{T→∞} (1/T) Σ_{k=1}^{T} e²(k) ≤ (4η/π) ε̄²        (19)

where π = η / (1 + z̄²), z̄ = max_k ||z(k)||, and ε̄ = max_k |ε(k)|.

The only difference between the stable learning (18) and gradient descent is the learning gain. For gradient descent, η is a positive constant; it should be small enough that the learning process is stable. In (18), the normalizing learning rate η_k is time-varying in order to assure that the identification is stable. The time-varying learning rate is easier to decide: no prior information is required; for example, we may select η = 1. The contradiction between fast convergence and stable learning may be avoided. If we select η_k as a dead-zone function

η_k = η / (1 + ||z(k)||²)  if  |e(k)| ≥ ε̄,    η_k = 0  if  |e(k)| < ε̄

then (18) is the same as in [23]. If a σ-modification term or a modified δ-rule term is added in (18), it becomes that of [14]. However, all of them need the upper bound ε̄ of the modeling error. Moreover, the modeling error is enlarged by the robust modifications [10]. A learning law with a constant learning rate can assure that the parameters converge to optimal (or locally optimal) values; the learning law (18) cannot assure that the parameters converge to optimal values.
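As a concrete reading of (18), the sketch below trains the parameters of one rule on its own group data with the normalized learning rate η_k = η/(1 + ||z(k)||²) reconstructed above; the gradients are the chain-rule derivatives of the Gaussian rule output (12). This is a sketch under those assumptions, not a verified transcription of the original typeset algorithm.

import numpy as np

def train_rule(X, Y, c, s, w, eta=1.0, epochs=10):
    """Normalized-gradient (BP-like) training of one fuzzy rule.

    X : (N, n) inputs of this rule's group; Y : (N,) outputs.
    c, s : (n,) Gaussian centers and widths; w : scalar consequent.
    """
    for _ in range(epochs):
        for x, y in zip(X, Y):
            phi = np.exp(-((x - c) ** 2) / s ** 2).prod()
            yhat = w * phi
            e = yhat - y                       # modeling error e(k)
            # Chain-rule derivatives of (12) w.r.t. w, c_j, sigma_j.
            gw = phi
            gc = 2.0 * yhat * (x - c) / s ** 2
            gs = 2.0 * yhat * (x - c) ** 2 / s ** 3
            z = np.concatenate(([gw], gc, gs)) # regressor z(k) of (16)
            eta_k = eta / (1.0 + z @ z)        # time-varying learning rate
            w -= eta_k * e * gw
            c -= eta_k * e * gc
            s -= eta_k * e * gs
    return w, c, s

Because each rule sees only the samples of its own interval [k_1, k_2], the loop above is run once per group rather than over the whole data set.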

Compared with other FNNs [11], [15], [17], the parameter-identification algorithm (18) proposed in this brief has two advantages: 1) we use the data in the time interval [k_1, k_2], which corresponds to group i, to train each fuzzy rule independently; generally, this gives better model accuracy than the normal FNNs; and 2) the time-varying η_k in the BP-like algorithm can assure the boundedness and the convergence of the modeling error. The relations between the structure and parameter identifications are given in the following remark.

Remark 1: The parameter learning is also affected by the structure identification. If the structure identification is poor (e.g., α and β are not suitable), then the rule (11) cannot represent the mapping in the time interval [k_1, k_2] well; the unmodeled dynamic μ(k) in (13) becomes bigger. Therefore, the bound of


the modeling error ε̄ increases with μ(k). Another influence factor of the structure identification is L. This may happen when new rules are added to the system. If L is too small, the training data are not enough to guarantee that the learning procedure converges. If L is too big (when L ≥ max_k d_k, clustering does not exist, and it is a normal FNN), then the time interval [k_1, k_2] becomes longer, and one rule (11) cannot approximate the complete dynamics in this long period. In this case, the unmodeled dynamic μ(k) in (13) increases, and the modeling error is bigger.

Fig. 2. TMDB crude-oil blending process.

V. APPLICATION TO THE MODELING OF CRUDE-OIL BLENDING

Crude oils are often blended to increase the sale price or the processability of a lower grade crude oil by blending it with a higher grade, higher priced crude. American Petroleum Institute (API) gravity is the most used indication of the density of crude oil. Usually, crude-oil blending is realized by a model-based optimization algorithm. The static properties of crude-oil blending can be obtained by thermodynamic analysis. However, mathematical models work only under some special conditions [6]. In this real application, we have only input/output data; FNNs can be applied to model the crude-oil blending.

In this brief, we discuss a typical crude-oil blending process in PEMEX (a Mexican petroleum company); it is called Terminal Marítima de Dos Bocas Tabasco (TMDB). The flow sheet is shown in Fig. 2(a). It has three blenders, one dehydration unit, and one tank. Fig. 2(b) shows the static process of the crude-oil blending; f_i is the flow rate and q_i is the property of the ith feed stock, which can be API gravity. The data are recorded in the form of Microsoft Excel daily. Each day, we have the input data (feed flow rates and properties) and the output data (the property of the blend); this is called an integrated model. Because new data come continuously, we use the online-clustering technique proposed in this brief for FNN modeling. The original data are obtained every hour; the daily mean values are calculated and saved, such that the measurement noise is reduced. We use 730 input/output pairs, which correspond to two years' worth of records, to train the fuzzy model.
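For reference, the idealized static behavior of a blender is the flow-weighted mixing rule sketched below; the brief's point is precisely that such linear models hold only under special conditions, so this serves as a baseline that the FNN is meant to improve on. The symbols follow f_i and q_i above, and the numbers are invented for illustration.

def blended_property(flows, qualities):
    """Ideal static blending: flow-weighted average of feed properties.

    flows     : feed flow rates f_i
    qualities : feed properties q_i (e.g., API gravity)
    """
    return sum(f * q for f, q in zip(flows, qualities)) / sum(flows)

# Three feed stocks (illustrative numbers only).
print(blended_property([100.0, 50.0, 25.0], [22.0, 30.0, 35.0]))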

1) Structure Identification: The following Mamdani fuzzy model is used; for rule i:

R^i: IF x_1 is A_1^i and … and x_8 is A_8^i THEN ŷ is B^i        (20)

For the crude-oil blending, the output API gravity affects the partitioning more than the input flow, so we select β > α. From Fig. 3, it is shown that the maximum changes in the input and the output are about three and one; Remark 2 tells us how to bound the threshold, and L is chosen accordingly in this application. The input–output partitioning of the two-month data is shown in Fig. 3, where the marked points represent the centers of the groups and the vertical lines are the boundaries between the groups. We can see that there are three groups (rules) in the two-month data. For example, the group length of the second group is 15, and the time intervals of its input and output are identical. Then, we do online clustering for the other 22-month data. There exist six combinations of groups (step 5 in Section III). The final group number is reported as "Rule #" in Table I.

2) Parameter Identification: Each group i has one fuzzy rule in the form of (20). We use (18) to train the membership functions of each rule. After the parameter training, the final fuzzy model is obtained by the product inference, center average, and singleton fuzzifier, i.e., (6).

3) Testing: We use 28 testing data, which are one month's worth of records from the other year. In this way, we can assure that the testing phase is independent of the training phase. The testing data cover 28 days; this is a very short period compared with the learning period (730 days), because the training phase is slow and needs a large amount of data to assure convergence. The modeling results are shown in Fig. 4. The first figure shows the training phase; in order to make it clear, only a part of the training data (from 600 to 740) is reported. The behavior of the testing phase and the modeling error are shown in the second and third figures of Fig. 4.


Fig. 3. Input and output partitioning of two-month data.

Fig. 4. Modeling of crude-oil blending with online clustering and FNNs.

TABLE I
COMPARISONS FOR DIFFERENT PARAMETERS

A. Comparisons

First, we discuss how the parameters α, β, and L affect the structure identification and how to select the testing sets. The results are shown in Table I.

In this section, "Rule #" is the rule number; "Training" and "Testing" are RMS errors defined as

RMS = sqrt( (1/N) Σ_{k=1}^{N} (y(k) − ŷ(k))² )

"Case 1," "case 2," and "case 3" correspond to different choices of α, β, and L. The first 22-month data are for training; the other one-month data are for testing. "R1" and "R2" are two random selections of the training sets (22-month data) and testing sets (one-month data) from the two-year data, with α, β, and L fixed.
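In code, the reported metric is a direct transcription of the RMS formula above:

import numpy as np

def rms_error(y, yhat):
    """RMS modeling error as used in Tables I and II."""
    y, yhat = np.asarray(y), np.asarray(yhat)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))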

From Table I, we can obtain the following conclusions.

1) α and β can be regarded as weights on the input and the output. In this application example, we try different values for them and find that α and β not only decide the structure (rule number) but also influence the parameter learning. The testing errors reach a minimum at suitable values of α and β, and the training errors are almost the same.

2) The number of groups into which the spaces are partitioned also depends on the threshold parameter L. From the earlier analysis, we know L < max_k d_k. When L is big, there are fewer groups. Each group has one fuzzy rule, so the number of fuzzy rules is also smaller. Now, for the same training data, there are more data in each group (or for each fuzzy rule), so the training error is small. However, the structure error of the three-fuzzy-rules system is big; although the parameters' training is good, the testing error is big. When L is small, there are more rules and fewer data in each rule.

3) If the training set and the testing set are selected randomly, the training error and the testing error do not change a lot, because the crude-oil blending policy remains the same for the whole year.

Second, in order to illustrate the structure identification, we compare our approach, Online clustering for FNNs (OFN), with BP [18]; Online Fuzzy clustering with Independent input and output partition (OFI) [11], [20]; Discrete-time Neural Networks (DNN) [24]; normal FNNs (FNN1) [13]; and FNNs with stable learning (FNN2) [22]. The training epochs for all models are the same, i.e., 730 (two-year data). The results are shown in Table II.

In the method proposed in this brief, we use two learning rates. The normal BP algorithm [18] is compared with our stable parameter-identification algorithm (18). We use the same multilayer neural network as [18]; the numbers of input-layer, hidden-layer, and output-layer nodes are 8, 5, and 1, respectively. For the learning rate, we found that above a certain value the BP becomes unstable. OFI uses the whole data set and the BP algorithm to train the fuzzy rules; the thresholds for the input and output are selected in two cases. The DNN [24] uses a similar time-varying learning algorithm; the structure is the same as that of BP [18], i.e., 8-5-1. The normal FNNs [13], [22] use six and eight rules; the training set is the 22-month data, and the testing set is the other one-month data. From Table II, we can obtain the following conclusions.

1) The time-varying learning rate for the steepest-descent update equations is faster than normal BP, particularly when the training data set is not large enough to assure convergence of the BP. Therefore, the training and testing errors of BP are bigger than those of OFN when the training set is the 22-month data. Moreover, FNN2 is better than FNN1.

2) OFI and FNN use the whole data set to train all membership functions of the fuzzy system. Although FNNs can learn through local data inherently, due to the local mapping of the fuzzy rules, they are not like OFN, which uses the data in a certain interval to train each rule independently.


TABLE II
COMPARISONS FOR DIFFERENT APPROACHES

OFN has better model accuracy than OFI and FNN for this example.

3) OFN and OFI obtain the rule number automatically, but OFI does not consider the time index. We find that, with the same threshold, our method has less model complexity; the fuzzy rules of our method are fewer, with a high modeling accuracy.

4) It is well known that both neural networks and fuzzy systems can approximate any nonlinear function to any prescribed accuracy. However, the fuzzy systems (OFN, OFI, FNN) are more complex than the neural networks (BP, DNN); here, each rule has eight Gaussian membership functions and one consequent parameter, so the fuzzy systems have more total parameters than the neural networks. Comparing OFN and DNN, we see that fuzzy systems and neural networks can achieve high modeling accuracy with time-varying learning rates. However, the hidden nodes of the DNN must be specified in advance, whereas the structure of OFN is obtained automatically.

The data arrive once a day, so it seems that there is no need for online learning, because we could retrain a fuzzy system in batch every day. The advantage of online clustering appears when the nonlinear system changes, for example, when the prescription of the crude-oil blending is modified. In this case, the history data cannot be applied; for a batch method, we would have to use a forgetting factor or a moving window to select recent data. The online clustering proposed in this brief avoids this kind of problem.

VI. CONCLUSION

In this brief, we have presented a quick and efficient approach for nonlinear-system modeling using FNNs. Both the structure identification and the parameter learning are done online. By the novel online-clustering approach and the time-varying learning law, we resolve the two problems of online clustering for nonlinear-system modeling: 1) the input and output data are made to correspond through the same time index, and 2) the parameters are updated by their own group data, and the learning process is stable.

APPENDIX

PROOF OF THEOREM 1

We select a positive-definite scalar

V(k) = ||θ̃(k)||²        (21)

where θ̃(k) = θ(k) − θ*. The updating law (18) can be written as

θ̃(k+1) = θ̃(k) − η_k e(k) z(k)

Therefore, we have

ΔV(k) = V(k+1) − V(k) = η_k² e²(k) ||z(k)||² − 2 η_k e(k) z(k)^T θ̃(k)        (22)

Because z(k)^T θ̃(k) = e(k) − ε(k) by (17), then

ΔV(k) = −2 η_k e²(k) + 2 η_k e(k) ε(k) + η_k² e²(k) ||z(k)||²

Since η_k ||z(k)||² ≤ η ≤ 1, then

η_k² e²(k) ||z(k)||² ≤ η_k e²(k)

Therefore, using 2 e(k) ε(k) ≤ (1/2) e²(k) + 2 ε²(k),

ΔV(k) ≤ −η_k e²(k) + 2 η_k e(k) ε(k) ≤ −(η_k/2) e²(k) + 2 η_k ε²(k)

We define z̄ = max_k ||z(k)||; we choose π as

π = η / (1 + z̄²)        (23)


where π is defined in (19), so that π ≤ η_k ≤ η. Therefore, the Lyapunov function satisfies

ΔV(k) ≤ −(π/2) e²(k) + 2 η ε̄²        (24)

where ε̄ = max_k |ε(k)|; the first term is a class-K function of |e(k)|, and the driving term 2η ε̄² is time varying only through ε(k). In addition, V(k) satisfies

β_1(||θ̃(k)||) ≤ V(k) ≤ β_2(||θ̃(k)||)

where β_1 and β_2 are class-K∞ functions. From [8], we know V(k) is stable; the modeling error is bounded.

From (23) and (24), summing from k = 1 to T and using V(T+1) ≥ 0, we have

(π/2) Σ_{k=1}^{T} e²(k) ≤ V(1) − V(T+1) + 2 η T ε̄² ≤ V(1) + 2 η T ε̄²

It is noted that if e²(k) > (4η/π) ε̄², then ΔV(k) < 0, so the total time during which e²(k) > (4η/π) ε̄² must be finite. If e²(k) stays outside the ball of radius (4η/π) ε̄² only a finite number of times and then reenters, it will eventually stay inside this ball. If it leaves the ball an infinite number of times, then, since the total time spent outside the ball is finite, the excursions vanish. Therefore, V(k) is bounded; the identification error and the weights are bounded, and e²(k) converges to the ball of radius (4η/π) ε̄². Dividing the summation by T and letting T → ∞, Equation (19) is obtained.

REFERENCES

[1] P. Angelov, "An approach for fuzzy rule-base adaptation using online clustering," Int. J. Approx. Reason., vol. 35, no. 3, pp. 275–289, Mar. 2004.

[2] M. F. Azeem, M. Hanmandlu, and N. Ahmad, "Structure identification of generalized adaptive neuro-fuzzy inference systems," IEEE Trans. Fuzzy Syst., vol. 11, no. 5, pp. 666–681, Oct. 2003.

[3] J. Beringer and E. Hüllermeier, "Online clustering of parallel data streams," Data Knowl. Eng., vol. 58, no. 2, pp. 180–204, Aug. 2006.

[4] M. Brown and C. J. Harris, Neurofuzzy Adaptive Modelling and Control. Englewood Cliffs, NJ: Prentice-Hall, 1994.

[5] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.

[6] D. M. Chang, C. C. Yu, and I. L. Chien, "Coordinated control of blending systems," IEEE Trans. Control Syst. Technol., vol. 6, no. 4, pp. 495–506, Jul. 1998.

[7] S. L. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. Fuzzy Syst., vol. 2, no. 3, pp. 267–278, 1994.

[8] M. J. Corless and G. Leitmann, "Continuous state feedback guaranteeing uniform ultimate boundedness for uncertain dynamic systems," IEEE Trans. Autom. Control, vol. AC-26, no. 5, pp. 1139–1144, Oct. 1981.

[9] C. A. Gama, A. G. Evsukoff, P. Weber, and N. F. F. Ebecken, "Parameter identification of recurrent fuzzy systems with fuzzy finite-state automata representation," IEEE Trans. Fuzzy Syst., vol. 16, no. 1, pp. 213–224, Feb. 2008.

[10] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ: Prentice-Hall, 1996.

[11] C. F. Juang and C. T. Lin, "An online self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb. 1998.

[12] C. F. Juang, "Combination of online clustering and Q-value based GA for reinforcement fuzzy system design," IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 289–302, Jun. 2005.

[13] J. S. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May/Jun. 1993.

[14] S. Jagannathan and F. L. Lewis, "Identification of nonlinear dynamical systems using multilayered neural networks," Automatica, vol. 32, no. 12, pp. 1707–1712, Dec. 1996.

[15] C. T. Lin, Neural Fuzzy Control Systems With Structure and Parameter Learning. New York: World Scientific, 1994.

[16] E. H. Mamdani, "Application of fuzzy algorithms for control of simple dynamic plant," Proc. Inst. Elect. Eng.—Control Theory Appl., vol. 121, no. 12, pp. 1585–1588, 1976.

[17] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 748–768, May 2000.

[18] K. S. Narendra and S. Mukhopadhyay, "Adaptive control using neural networks and approximate models," IEEE Trans. Neural Netw., vol. 8, no. 3, pp. 475–485, May 1997.

[19] I. Rivals and L. Personnaz, "Neural-network construction and selection in nonlinear modeling," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 804–819, Jul. 2003.

[20] G. Serra and C. Bottura, "An IV-QR algorithm for neuro-fuzzy multivariable online identification," IEEE Trans. Fuzzy Syst., vol. 15, no. 2, pp. 200–210, Apr. 2007.

[21] S. G. Tzafestas and K. C. Zikidis, "NeuroFAST: Online neuro-fuzzy ART-based structure and parameter learning TSK model," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 797–802, Oct. 2001.

[22] W. Yu and X. Li, "Fuzzy identification using fuzzy neural networks with stable learning algorithms," IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 411–420, Jun. 2004.

[23] W. Yu, A. S. Poznyak, and X. Li, "Multilayer dynamic neural networks for nonlinear system online identification," Int. J. Control, vol. 74, no. 18, pp. 1858–1864, Dec. 2001.

[24] X. Li and W. Yu, "Modeling of crude oil blending via discrete-time neural networks," Int. J. Comput. Intell., vol. 2, no. 1, pp. 63–70, 2005.


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 6, JUNE 2009 983

Recurrent Neural Networks Training With Stable Bounding Ellipsoid Algorithm

Wen Yu, Senior Member, IEEE, and José de Jesús Rubio

Abstract—Bounding ellipsoid (BE) algorithms offer an attractive alternative to traditional training algorithms for neural networks, for example, backpropagation and least squares methods. The benefits include high computational efficiency and fast convergence speed. In this paper, we propose an ellipsoid propagation algorithm to train the weights of recurrent neural networks for nonlinear system identification. Both hidden layers and output layers can be updated. The stability of the BE algorithm is proven.

Index Terms—Bounding ellipsoid (BE), identification, recurrent neural networks.

I. INTRODUCTION

RECENT results show that neural network techniques seem to be effective for identifying a broad category of complex nonlinear systems when complete model information cannot be obtained. Neural networks can be classified as feedforward and recurrent ones [8]. Feedforward networks, for example multilayer perceptrons, are implemented to approximate nonlinear functions. The main drawback of these neural networks is that the weight updating does not utilize information on the local data structure, and the function approximation is sensitive to the training data [17]. Since recurrent networks incorporate feedback, they have powerful representation capabilities and can successfully overcome the disadvantages of feedforward networks [13]. Even though backpropagation has been widely used as a practical training method for neural networks, it has some limitations, such as slow convergence, local minima, and sensitivity to noise.

In order to overcome these problems, many methods for neural identification, filtering, and training have been proposed, for example, Levenberg–Marquardt and momentum algorithms [15], the extended Kalman filter [23], and least squares approaches [17], which can speed up the backpropagation training. Most of them use static structures. There are some special restrictions for recurrent structures. In [2], the output layer must be linear, and the hidden-layer weights are chosen randomly. The extended Kalman filter with a decoupling structure has fast convergence speed [22]; however, the computational complexity

Manuscript received December 03, 2007; revised July 17, 2008 and October 21, 2008; accepted December 18, 2008. First published May 15, 2009; current version published June 03, 2009.

W. Yu is with the Departamento de Control Automático, CINVESTAV-IPN, México D.F. 07360, México (e-mail: [email protected]).

J. de Jesús Rubio is with the Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional-ESIME Azcapotzalco, Col. Sta. Catarina, México D.F. 07320, México.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNN.2009.2015079

in each iteration is increased. The decoupled Kalman filter with a diagonal matrix [19] is similar to the gradient algorithm, so the convergence speed cannot be increased. A main drawback of Kalman filter training is that the theoretical analysis requires the uncertainty of the neural modeling to be a Gaussian process.

In 1979, Khachiyan indicated how an ellipsoid method for linear programming can be implemented in polynomial time [1]. This result caused great excitement and stimulated a flood of technical papers. The ellipsoid technique is a helpful tool in the state estimation of dynamic systems with bounded disturbances [5]. There are many potential applications to problems outside the domain of linear programming. Weyer and Campi [27] obtained confidence ellipsoids which are valid for a finite number of data points, whereas Ros et al. [20] presented an ellipsoid propagation such that the new ellipsoid satisfies an affine relation with another ellipsoid. In [3], the ellipsoid algorithm is used as an optimization technique that takes into account the constraints on cluster coefficients. Lorenz and Boyd [14] described in detail several methods that can be used to derive an appropriate uncertainty ellipsoid for the array response. In [16], the problem concerning the asymptotic behavior of ellipsoid estimates is considered for linear discrete-time systems. There are few applications of ellipsoid methods to neural networks. In [4], unsupervised and supervised learning laws in the form of ellipsoids are used to find and tune the fuzzy function rules. In [12], an ellipsoid type of activation function is proposed for feedforward neural networks. In [10], multiweight optimization for bounding ellipsoid (BE) algorithms is introduced. In [6], a simple adaptive algorithm is proposed that estimates the magnitude of noise. These methods are based on two operations of ellipsoid calculus, summation and intersection, which correspond to the prediction and correction phases of the recursive state estimation problem, respectively.

In [21], we used the BE algorithm to train recurrent neural networks, but that training algorithm does not have a standard recurrent form, so theoretical analysis cannot be carried out. In this paper, we modify the above algorithm and analyze the stability of the nonlinear system identification. To the best of our knowledge, neural network training and stability analysis with the ellipsoid or the BE algorithm has not yet been established in the literature, and this is the first paper to successfully apply the BE algorithm to the stable training of recurrent neural networks.

In this paper, the BE algorithm is modified to train the weights of a recurrent neural network for nonlinear system identification. Both the hidden layers and the output layers can be updated. Stability analysis of the identification error with the BE algorithm is given by a Lyapunov-like technique.


II. RECURRENT NEURAL NETWORKS TRAINING WITH BE ALGORITHM

Consider the following discrete-time nonlinear system:

x(k+1) = f[x(k), u(k)]        (1)

where x(k) ∈ R^n is a state vector and u(k) ∈ R^m is an input vector; f is an unknown nonlinear smooth vector-valued function. We use the following series-parallel [15] recurrent neural network to identify the nonlinear plant (1):

x̂(k+1) = A x̂(k) + V_1(k) σ[W_1(k) x(k)] + V_2(k) φ[W_2(k) x(k)] u(k)        (2)

where x̂(k) represents the state of the neural network. The matrix A ∈ R^{n×n} is a stable matrix. The weights in the output layer are V_1(k) and V_2(k); the weights in the hidden layer are W_1(k) and W_2(k); σ(·) is a vector function, and φ(·) is a diagonal matrix

σ = [σ_1 … σ_h]^T,    φ = diag[φ_1 … φ_m]        (3)

where σ_i and φ_i are sigmoid functions. The expressions σ[W_1(k) x(k)] and φ[W_2(k) x(k)] u(k) on the right-hand side of (2) can also be σ[W_1(k) x̂(k)] and φ[W_2(k) x̂(k)] u(k), respectively, in which case it is called the parallel model [15]. By using our previous results in [28], the parallel model training has similar results as the series-parallel model (2).
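A minimal sketch of one prediction step of the series-parallel identifier (2) as reconstructed above, assuming logistic sigmoids for σ and φ; the layer sizes and the exact placement of x versus x̂ on the right-hand side are my reading of the garbled source, not a verified transcription.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def identifier_step(xhat, x, u, A, V1, W1, V2, W2):
    """One step of the series-parallel model (2):
    the nonlinear terms are driven by the plant state x(k)."""
    s = sigmoid(W1 @ x)                  # sigma[W1 x(k)]
    p = np.diag(sigmoid(W2 @ x))         # phi[W2 x(k)], diagonal as in (3)
    return A @ xhat + V1 @ s + V2 @ (p @ u)

# Dimensions: n states, m inputs, h hidden sigmoids (illustrative).
n, m, h = 2, 1, 3
rng = np.random.default_rng(0)
A = 0.2 * np.eye(n)                      # a stable diagonal matrix
V1, W1 = rng.uniform(size=(n, h)), rng.uniform(size=(h, n))
V2, W2 = rng.uniform(size=(n, m)), rng.uniform(size=(m, n))
print(identifier_step(np.zeros(n), np.ones(n), np.ones(m), A, V1, W1, V2, W2))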

According to the Stone–Weierstrass theorem [13], the unknown nonlinear system (1) can be written in the following form:

x(k+1) = A x(k) + V_1* σ[W_1* x(k)] + V_2* φ[W_2* x(k)] u(k) + f̃(k)        (4)

where f̃(k) represents the unmodeled dynamics. By [13], we know that the term f̃(k) can be made arbitrarily small by simply selecting an appropriate number of hidden neurons.

Because the sigmoid functions σ and φ are differentiable, based on [18, Lemma 12.5], we conclude that

σ[W_1 x] = σ[W_1^0 x] + D_σ (W_1 − W_1^0) x + ν_1        (5)

where W_1^0 is a known initial constant weight, D_σ is the derivative of the nonlinear activation function σ with respect to its argument, evaluated at W_1^0 x, and ν_1 is the remainder of the Lipschitz form (5). Also

φ[W_2 x] = φ[W_2^0 x] + D_φ (W_2 − W_2^0) x + ν_2        (6)

where D_φ is the derivative of the nonlinear activation function φ with respect to its argument, evaluated at W_2^0 x, and ν_2 is the remainder of the Lipschitz form (6). From [18, Lemma 12.5], when the functions σ and φ are bounded, ν_1 and ν_2 are bounded. So we have

σ[W_1 x] = σ[W_1^0 x] + D_σ W̃_1 x + ν_1        (7)

where W̃_1 = W_1 − W_1^0. Similarly,

φ[W_2 x] = φ[W_2^0 x] + D_φ W̃_2 x + ν_2        (8)

where W̃_2 = W_2 − W_2^0. With φ a diagonal matrix, substituting (7) and (8) into the plant (4), we have the following single-output form:

y(k) = θ* Z(k) + μ(k)        (9)

where the output y(k) is formed from x_i(k+1) and x_i(k), with x_i(k) the ith element of the vector x(k). The unmodeled dynamics are defined as

μ(k) = f̃_i(k) + (V_1* ν_1 + V_2* ν_2 u(k))_i        (10)

the parameter vector θ* collects the elements of V_1*, V_2*, W_1*, and W_2*, and the data vector Z(k) collects the corresponding regressors formed from σ[W_1^0 x(k)], φ[W_2^0 x(k)] u(k), x(k), and u(k).

The output of the recurrent neural network (2) is

ŷ(k) = θ(k) Z(k)        (11)

We define the training error as

e(k) = ŷ(k) − y(k)        (12)

The identification error Δ(k) between the plant (1) and the neural network (2) is

Δ(k) = x̂(k) − x(k)        (13)


Fig. 1. An ellipsoid.

Fig. 2. Ellipsoid intersection of two ellipsoid sets.

Now we use the BE algorithm to train the recurrent neural network (2) such that the identification error is bounded.

Definition 1: A real n-dimensional ellipsoid set, centered on c, can be described as

E = { x ∈ R^n : (x − c)^T P^{-1} (x − c) ≤ 1 }

where P is a positive-definite symmetric matrix. The volume of E is defined as in [5] and [20]:

vol(E) = κ √(det P)

where κ is a constant that represents the volume of the unit ball in R^n.
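As a quick numeric check of this volume formula, the unit-ball constant can be taken as κ = π^{n/2}/Γ(n/2 + 1) (a standard fact, not stated in the extract):

import math
import numpy as np

def ellipsoid_volume(P):
    """vol(E) = kappa * sqrt(det P) for E = {x : x^T P^{-1} x <= 1}."""
    n = P.shape[0]
    kappa = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return kappa * math.sqrt(np.linalg.det(P))

# A sphere of radius 2 in R^3: the result should equal (4/3)*pi*8.
print(ellipsoid_volume(4.0 * np.eye(3)))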

The orientation (direction of the axes) of the ellipsoid E is determined by the eigenvectors of P, and the lengths of the semimajor axes of E are determined by the eigenvalues of P. A 2-D ellipsoid is shown in Fig. 1.

Definition 2: The ellipsoid intersection of two ellipsoid sets E_1 = { x : (x − c)^T P_1^{-1} (x − c) ≤ 1 } and E_2 = { x : (x − c)^T P_2^{-1} (x − c) ≤ 1 } is another ellipsoid set [25], defined as

E_λ = { x : λ (x − c)^T P_1^{-1} (x − c) + (1 − λ) (x − c)^T P_2^{-1} (x − c) ≤ 1 }

where 0 ≤ λ ≤ 1, and P_1 and P_2 are positive-definite symmetric matrices.

The normal intersection E_1 ∩ E_2 of the two ellipsoid sets is not an ellipsoid set in general. The ellipsoid set E_λ contains the normal intersection of the ellipsoid sets [25], E_1 ∩ E_2 ⊆ E_λ. Fig. 2 shows this idea. There exists a minimal-volume ellipsoid corresponding to some λ*, called the optimal bounding ellipsoid (OBE); see [11], [20], and [25]. In this paper, we will not try to find λ*, but we will design an algorithm such that the volume of the new ellipsoid intersection will not increase. Now, using the ellipsoid definition for the neural identification, we define the parameter error ellipsoid as

E_k = { θ̃ : θ̃^T P_k^{-1} θ̃ ≤ 1 }        (14)

where θ̃(k) = θ(k) − θ*, θ* is the unknown optimal weight that minimizes the modeling error μ(k) in (9), and P_k is positive definite. In this paper, we use the following two assumptions.

A1. It is assumed that the unmodeled dynamic μ(k) belongs to an ellipsoid set

E_μ = { μ : μ²(k) ≤ ζ_k }        (15)

where ζ_k are known positive constants, k = 1, 2, ….

Assumption A1 requires that μ(k) be bounded. In this paper, we discuss open-loop identification, and we assume that the plant (1) is bounded-input–bounded-output (BIBO) stable, i.e., x(k) and u(k) in (1) are bounded. Since σ and φ are bounded, all of the data in Z(k) are bounded, so μ(k) is bounded.

By Definition 1, the common center of the sets E_μ is the origin. Finding the exact bounds is an intractable task, since the amount of information in (15) grows linearly in k. Moreover, evaluating a value of P_k in (14) involves the solution of kth-order inequalities for (15).

A2. It is assumed that the initial weight errors are inside an ellipsoid

E_0 = { θ̃ : θ̃^T P_0^{-1} θ̃ ≤ 1 }        (16)

where P_0 is positive definite and θ* are the unknown optimal weights.

Assumption A2 requires that the initial weights of the neural networks be bounded. It can be satisfied by choosing suitable P_0 and θ(0). From the definition of E_k in (14), the common center of the sets E_k is θ*, so θ̃(k) = θ(k) − θ*. By (14) and (15), the ellipsoid intersection satisfies

E_{k+1} ⊇ E_k ∩ E_μ        (17)

Thus, the problem of identification is to find a minimum set E_k which satisfies (14). We will construct a recursive identification algorithm such that E_{k+1} is a BE set if E_k is a BE set. The next theorem shows the propagation process of these ellipsoids.


Theorem 1: If E_k in (14) is an ellipsoid set, we use the following recursive algorithm to update P_k and θ(k):

(18)

where Λ is a given diagonal positive-definite matrix, and β is a positive constant which is selected such that P_{k+1} > 0 and 0 < β < 1. Then E_{k+1} is an ellipsoid set and satisfies

(19)

where e(k) is given in (12).

Proof: First, we apply the matrix inversion lemma [7] to calculate P_{k+1}^{-1} by (18). Since

(A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}

where A, B, C, and D denote matrices of the correct sizes, it gives

(20)

so

(21)

Now we calculate θ̃^T(k+1) P_{k+1}^{-1} θ̃(k+1). By (18), we have

(22)

Substituting (21) into (22) gives

(23)

By the intersection property (17) of the ellipsoid sets, (23) becomes an expression in the training error; now we use e(k) = ŷ(k) − y(k) as in (12). The second term of the above equation can be calculated from (21), and because the resulting quadratic terms are nonnegative, we obtain

(24)

Equation (19) is established. Since 0 < β < 1 and

(25)

E_{k+1} is an ellipsoid set.


Remark 1: The algorithm (18) has a scalar form for each subsystem. This method can decrease the computational burden when we estimate the weights of the recurrent neural network. A similar idea can be found in [8] and [19]. For each element of θ(k) and Z(k) in (18), we have

(26)

If the gain does not change with k, it becomes the backpropagation algorithm [8]. The time-varying gain in the BE algorithm may speed up the training process. The BE algorithm (18) has a similar structure to the extended Kalman filter training algorithm [22], [23], [26]

(27)

where the gain involves a small positive constant and the covariance of the process noise. When the process noise covariance is zero, it becomes the least squares algorithm [7]. Under particular choices of these covariances, (27) is the BE algorithm (18). But there is a big difference: the BE algorithm is for the deterministic case, and the extended Kalman filter is for the stochastic case.

The ellipsoid intersection of E_k and E_μ is (17), which is also an ellipsoid, defined as

E_{k+1} = { x : (x − c)^T P_{k+1}^{-1} (x − c) ≤ 1 }

where x is a vector variable and c is the center of E_{k+1}. We cannot assure that the center of the ellipsoid intersection E_{k+1} is also θ*, but since E_k and E_μ both contain θ*, we can guarantee that θ* is inside E_{k+1}. From (19) and (18), we know that when the modeling error is not small, the volume of E_{k+1} is less than the volume of E_k. Thus, the set E_{k+1} will converge toward the set E_μ. This means that when the modeling error is bigger than the unmodeled dynamic, E_{k+1} will converge to the set E_μ; see Fig. 3.

Fig. 3. Convergence of the intersection E_{k+1}.

The following steps show how to train the weights of recurrent neural networks with the BE algorithm.

1) Construct a recurrent neural network model (2) to identify an unknown nonlinear system (1). The matrix A is selected such that it is stable.

2) Rewrite the neural network in the linear form (11).

3) Train the weights θ(k) with the updating law in (18).

4) P_k is changed as in the BE algorithm (18); a sketch of this loop is given below.
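Because the explicit update (18) did not survive this extraction, the loop below stands in a forgetting-factor recursive least-squares step for it; this preserves the structure of steps 1)–4) (linear-in-parameters form, time-varying gain, recursive P_k update) but is not the paper's exact BE recursion.

import numpy as np

def be_style_update(theta, P, z, y, lam=0.98):
    """One recursive update of theta and P for the linear form y = theta^T z.

    A stand-in for (18): recursive least squares with forgetting; the
    paper's BE algorithm shapes P differently but shares this data flow.
    """
    e = theta @ z - y                    # training error e(k), as in (12)
    g = P @ z / (lam + z @ P @ z)        # time-varying gain
    theta = theta - g * e                # weight update (step 3)
    P = (P - np.outer(g, z @ P)) / lam   # P_k update (step 4)
    return theta, P, e

# Toy linear-in-parameters stream (illustrative only).
rng = np.random.default_rng(1)
true_theta = np.array([0.5, -0.3, 0.8])
theta, P = np.zeros(3), 1e3 * np.eye(3)  # large initial P, as Section IV suggests
for _ in range(200):
    z = rng.normal(size=3)
    y = float(true_theta @ z) + 0.01 * rng.normal()
    theta, P, e = be_style_update(theta, P, z, y)
print(theta)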

III. STABILITY ANALYSIS

Theorem 1 tells us that E_{k+1} is a BE if A2 is satisfied, so the weights of the neural networks are bounded under the training algorithm (18). The next theorem gives the bound of the identification error.

Theorem 2: If we use the neural network (2) to identify the unknown nonlinear plant (1) with the training algorithm (18), then the identification error Δ(k) is bounded, and the normalization of the training error e(k) converges to the residual set

(28)

whose radius is determined by the bound of the unmodeled dynamics μ(k).


Proof: We define the following Lyapunov function:

(29)

Evaluating ΔV(k) = V(k+1) − V(k) and using (24), we have

(30)

From Theorem 1, we know that E_{k+1} is a BE set, so P_{k+1}^{-1} is bounded. Because V(k) is bounded below and above by class-K∞ functions of ||θ̃(k)||, and the decrement in (30) is a class-K function of the normalized training error while the remaining term is a class-K function of the bound of μ(k), V(k) admits a smooth input-to-state (ISS) Lyapunov function as in [9]; the dynamics of the training error are input-to-state stable. The "INPUT" corresponds to the second term of the last line in (30). The "STATE" corresponds to the first term of the last line in (30), i.e., the training error. Because the "INPUT" is bounded and the dynamics are ISS, the "STATE" is bounded.

The training error e(k) is not the same as the identification error Δ(k), but they are minimized at the same time. From (2), (4), (9), and (12), we have

(31)

Since the remaining factor is constant, the minimization of the training error e(k) means that the upper bound of the identification error Δ(k) is minimized. When e(k) is bounded, Δ(k) is also bounded. From (30), we know

(32)

Summing (32) from 1 to T and using the boundedness of V(k), we obtain

(33)

which is (28).

Remark 2: Even if the parameters converge to their optimal values with the training algorithm (18), from (4), we know that there always exist unmodeled dynamics (structure error). So we cannot reach e(k) → 0.

IV. SIMULATIONS

The nonlinear plant to be identified is expressed as [15], [24]

(34)

This input–output model can be transformed into the following state-space model:

(35)

This unknown nonlinear system has the standard form (1). We use the recurrent neural network given in (2) to identify it.

We select A as a stable diagonal matrix. The neural identifier (2) can be written in the form of

(36)

Here the second term is the nonlinear part. The dynamics of the linear part are determined by the eigenvalues of A. In this example, we found that the chosen A can assure both stability and fast response for the dynamic neural network (36).

Model complexity is important in the context of system identification; here it corresponds to the number of hidden nodes of the neuromodel. To obtain higher accuracy, more hidden nodes should be used. In [15], the static neural networks needed 20 hidden nodes for this example. For this simulation, we tested different numbers of hidden nodes and found that, with



more than three hidden nodes, the identification accuracy does not improve much. Therefore, we use three nodes in the hidden layer, i.e.,

, and . The initial weights , and are chosen at random in . The input is

(37)

From Theorem 1, we know that the initial condition for should be large, and we select . Theorem 1 requires , and corresponds to the learning rate in (18): the larger is, the faster the training algorithm is, but the less robust it becomes. In this example, we found that is satisfied. Theorem 1 also requires , i.e.,

. This is the upper bound of the modeling error as in (15), and from (18) we see that also determines the learning rate. For this example, we found that is a good choice.

We compare the BE training algorithm (18) with the standard backpropagation algorithm (26), and the learning rate for the backpropagation is . In this simulation, we found


that after , the backpropagation algorithm becomes unstable. If we define the mean squared error for finite time as

then the comparison results for the identification error are shown in Fig. 4.

Fig. 4. Identification errors of backpropagation and BE.
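The displayed definition of the finite-time mean squared error referenced above was lost in extraction; a standard form consistent with the wording (an assumption) is:

\mathrm{MSE}(T) = \frac{1}{T} \sum_{k=1}^{T} e^{2}(k).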

When the time-varying learning rate in the BE training algorithm (18) is constant, e.g., , the BE training becomes backpropagation. The updating steps in the BE training are variable in order to guarantee stability. Also, when is large, so the BE algorithm performs much better than backpropagation.

Extended Kalman filter training algorithms [22], [23], [26] are also effective when the disturbances are white noise or small bounded noise, which is

(38)

We choose and . The comparison results for the identification error are shown in Fig. 5.

Fig. 5. Identification errors of Kalman filter and BE.
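Equation (38) is not legible here. For comparison with the BE sketch above, the following Python sketch shows a standard extended-Kalman-filter weight update of the kind used in [22], [23], [26]; the symbols H, R, and Q and the scalar-output assumption are illustrative, not the paper's exact (38).

import numpy as np

# Generic EKF training step for a scalar-output neural network (cf. [23], [26]).
# H is the gradient of the network output with respect to the weights,
# R the measurement-noise variance, Q the process-noise covariance.
def ekf_train_step(W, P, H, y, y_hat, R, Q):
    e = y - y_hat                      # innovation
    S = float(H @ P @ H) + R           # innovation variance
    K = (P @ H) / S                    # Kalman gain
    W = W + K * e                      # weight update
    P = P - np.outer(K, H @ P) + Q     # covariance update
    return W, P, e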

Now we repeat the above simulations with a bounded perturbation in the input. This input–output model can be written as

(39)



where is a uniform random noise. When it is bounded by 0.02, the comparisons between the BE algorithm and backpropagation are shown in Fig. 6. When it is bounded by 0.2, the comparisons between the BE algorithm and the extended Kalman filter are shown in Fig. 7. They show that the BE technique does better than both backpropagation and the extended Kalman filter in the presence of non-Gaussian noise.

Fig. 6. Identification errors of backpropagation and BE with a bounded perturbation.

Fig. 7. Identification errors of Kalman filter and BE with a bounded perturbation.

The learning rates in the extended Kalman filter (38) and the BE algorithm (18) are similar: both have fast convergence speeds. The extended Kalman filter requires the external disturbances to be white noise, whereas the noises in BE training are only required to be bounded. When the disturbances are large, the bounded-ellipsoid training proposed in this paper has a smaller steady-state error than the extended Kalman filter algorithm.

Theorem 1 gives necessary conditions on and for stable learning, i.e., and . In this example, we found that if and , the learning process becomes unstable.

V. CONCLUSION

In this paper, a novel training method for recurrent neural networks is proposed, and the BE algorithm is modified for neural identification. Ellipsoid intersection and ellipsoid volume are introduced to explain the physical meaning of the proposed training algorithms. Both the hidden layers and the output layers of the recurrent neural networks can be updated. A Lyapunov-like technique is used to prove that the ellipsoid intersection can be propagated and that the BE algorithm is stable. The proposed concept can be extended to feedforward neural networks. The BE algorithm may also be applied to nonlinear adaptive control, fault detection and diagnosis, performance analysis of dynamic systems, and time-series forecasting.

REFERENCES

[1] R. G. Bland, D. Goldfarb, and M. J. Todd, “The ellipsoid method: A survey,” Oper. Res., vol. 29, pp. 1039–1091, 1981.

[2] F. N. Chowdhury, “A new approach to real-time training of dynamic neural networks,” Int. J. Adapt. Control Signal Process., vol. 31, pp. 509–521, 2003.

[3] M. V. Correa, L. A. Aguirre, and R. R. Saldanha, “Using steady-state prior knowledge to constrain parameter estimates in nonlinear system identification,” IEEE Trans. Circuits Syst. I, Fund. Theory Appl., vol. 49, no. 9, pp. 1376–1381, Sep. 2002.

[4] J. A. Dickerson and B. Kosko, “Fuzzy function approximation with ellipsoid rules,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 26, no. 4, pp. 542–560, Aug. 1996.

[5] E. Fogel and Y. F. Huang, “On the value of information in system identification: Bounded noise case,” Automatica, vol. 18, no. 2, pp. 229–238, 1982.

[6] S. Gazor and K. Shahtalebi, “A new NLMS algorithm for slow noise magnitude variation,” IEEE Signal Process. Lett., vol. 9, no. 11, pp. 348–351, Nov. 2002.

[7] G. C. Goodwin and K. S. Sin, Adaptive Filtering Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.

[8] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.

[9] Z. P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, no. 2, pp. 857–869, 2001.

[10] D. Joachim and J. R. Deller, “Multiweight optimization in optimal bounding ellipsoid algorithms,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 679–690, Feb. 2006.

[11] S. Kapoor, S. Gollamudi, S. Nagaraj, and Y. F. Huang, “Tracking of time-varying parameters using optimal bounding ellipsoid algorithms,” in Proc. 34th Allerton Conf. Commun. Control Comput., Monticello, IL, 1996, pp. 1–10.

[12] S. N. Kavuri and V. Venkatasubramanian, “Representing bounded fault classes using neural networks with ellipsoid activation functions,” Comput. Chem. Eng., vol. 17, no. 2, pp. 139–163, 1993.

[13] E. B. Kosmatopoulos, M. M. Polycarpou, M. A. Christodoulou, and P. A. Ioannou, “High-order neural network structures for identification of dynamic systems,” IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 422–431, Mar. 1995.

[14] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, May 2005.

[15] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990.

[16] S. A. Nazin and B. T. Polyak, “Limiting behavior of bounding ellipsoid for state estimation,” in Proc. 5th IFAC Symp. Nonlinear Control Syst., St. Petersburg, Russia, 2001, pp. 585–589.

[17] A. G. Parlos, S. K. Menon, and A. F. Atiya, “An algorithmic approach to adaptive state filtering using recurrent neural networks,” IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1411–1432, Nov. 2001.

[18] A. S. Poznyak, E. N. Sanchez, and W. Yu, Differential Neural Networks for Robust Nonlinear Control. Singapore: World Scientific, 2001.

[19] G. V. Puskorius and L. A. Feldkamp, “Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 279–297, Mar. 1994.

[20] L. Ros, A. Sabater, and F. Thomas, “An ellipsoid calculus based on propagation and fusion,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 32, no. 4, pp. 430–442, Aug. 2002.

[21] J. J. Rubio and W. Yu, “Neural networks training with optimal bounded ellipsoid algorithm,” in Advances in Neural Networks-ISNN 2007, ser. Lecture Notes in Computer Science 4491. Berlin, Germany: Springer-Verlag, 2007, pp. 1173–1182.

[22] J. J. Rubio and W. Yu, “Nonlinear system identification with recurrent neural networks and dead-zone Kalman filter algorithm,” Neurocomputing, vol. 70, no. 13, pp. 2460–2466, 2007.

[23] D. W. Ruck, S. K. Rogers, M. Kabrisky, P. S. Maybeck, and M. E. Oxley, “Comparative analysis of backpropagation and the extended Kalman filter for training multilayer perceptrons,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 6, pp. 686–691, Jun. 1992.

[24] P. S. Sastry, G. Santharam, and K. P. Unnikrishnan, “Memory neuron networks for identification and control of dynamical systems,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 306–319, Mar. 1994.

[25] F. C. Schweppe, Uncertain Dynamic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1973.

[26] S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman algorithm,” Adv. Neural Inf. Process. Syst. 1, pp. 133–140, 1989.


[27] E. Weyer and M. C. Campi, “Non-asymptotic confidence ellipsoids for the least squares estimate,” in Proc. 39th IEEE Conf. Decision Control, Sydney, Australia, 2000, pp. 2688–2693.

[28] W. Yu, “Nonlinear system identification using discrete-time recurrent neural networks with stable learning algorithms,” Inf. Sci., vol. 158, no. 1, pp. 131–147, 2002.

Wen Yu (M’97–SM’04) received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 1990 and the M.S. and Ph.D. degrees in electrical engineering from Northeastern University, Shenyang, China, in 1992 and 1995, respectively.

From 1995 to 1996, he served as a Lecturer at the Department of Automatic Control, Northeastern University. In 1996, he joined CINVESTAV-IPN, México, where he is currently a Professor at the Departamento de Control Automático. He also held a research position with the Instituto Mexicano del Petróleo from December 2002 to November 2003. Since October 2006, he has been a Senior Visiting Research Fellow at Queen’s University Belfast. He also held a visiting professorship at Northeastern University in China from 2006 to 2008. His research interests include adaptive control, neural networks, and fuzzy control.

Dr. Yu serves as an Associate Editor of Neurocomputing and the International Journal of Modelling, Identification and Control. He is a member of the Mexican Academy of Sciences.

José de Jesús Rubio was born in México City in 1979. He graduated in electronic engineering from the Instituto Politecnico Nacional, México, in 2001. He received the M.S. and Ph.D. degrees in automatic control from CINVESTAV-IPN, México, in 2004 and 2007, respectively.

He was a full-time Professor at the Autonomous Metropolitan University, Mexico City, Mexico, from 2006 to 2008. Since 2008, he has been a full-time Professor at the Instituto Politecnico Nacional, ESIME Azcapotzalco, Mexico. He has published four chapters in international books and ten papers in international journals, and he has presented more than 20 papers at international conferences. He is a member of the adaptive fuzzy systems task force. His research interests are primarily focused on evolving intelligent systems, nonlinear and adaptive control systems, neural-fuzzy systems, mechatronics, robotics, and delayed systems.
