Recurrent neural networks for nonlinear output regulation

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Guest Editors F. L. Lewis and K. S. Narendra. This study is supported by the Hong Kong Research Grants Council under Grant CUHK4150/97E. *Corresponding author. Tel.: +852-2609-8048; fax: +852-2603-6002. E-mail address: jwang@acae.cuhk.edu.hk (J. Wang).

Automatica 37 (2001) 1161–1173

Yunong Zhang, Jun Wang*
Department of Automation & Computer-Aided Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong

Received 7 June 2000; revised 24 January 2001; received in final form 6 March 2001

Abstract

Based on a power-series approximation method, recurrent neural networks (RNN) are proposed for real-time synthesis and auto-tuning of feedback controllers for nonlinear output regulation systems. The proposed neurocontrol approach represents a novel application of recurrent neural networks to the nonlinear output regulation problem. Owing to its global exponential convergence, the proposed approach completely inherits the stability and asymptotic tracking properties guaranteed by the original nonlinear output regulation systems. Excellent operating characteristics of the proposed RNN-based controller and the closed-loop nonlinear control systems are demonstrated by simulation results for the ball-and-beam system and the inverted pendulum on a cart system. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Recurrent neural networks; Nonlinear output regulation; Pole assignment; Ball-and-beam system; Inverted pendulum on a cart system

1. Introduction

In the past two decades, neural networks for control applications have been investigated extensively. See Lewis (1999) for a framework and paradigm of intelligent control based on neural networks and fuzzy logic. Most of the proposed neural network approaches to control applications use neural networks (typically feedforward neural networks) as representations of plants and/or controllers trained via supervised learning (e.g., Narendra & Parthasarathy, 1990; Levin & Narendra, 1993; Chen & Liu, 1994; Polycarpou, 1996; Jagannathan & Lewis, 1996; Rovithakis, 1999; Chu & Huang, 1999; Rovithakis & Christodoulou, 2000). In many areas of linear control and some areas of nonlinear control, the theory and design methodology are well studied. In such areas, neural networks can play a supportive role by synthesizing and tuning feedback controllers autonomously instead of replacing conventional controllers. This approach, which combines existing results from the control theory and neural network literatures, is more acceptable to engineers and practitioners. Because of the inherently parallel distributed nature of neural computation, neural networks can be a viable tool and effective computational model for online synthesis of linear and nonlinear control systems.

A problem of major importance in control applications is to design and implement control laws that achieve asymptotic tracking and/or disturbance rejection for nonlinear control systems, which is known as the nonlinear output regulation problem or servomechanism problem. The nonlinear output regulation problem aims to design a feedback control such that the closed-loop system is stable and its output asymptotically approaches some exogenous signal generated by an exosystem. As first shown in Isidori and Byrnes (1990) and Huang and Rugh (1992a), the key condition for the solution of this problem via either state-feedback or output-feedback control is the solvability of the so-called regulator equation. If it is solvable, then under some standard assumptions there exists a state-feedback or output-feedback control law such that the closed-loop system is internally stable and the tracking error asymptotically approaches zero for all sufficiently small initial conditions of the plant and sufficiently small reference inputs and/or disturbances. However, due to the nonlinear nature of the problem, it is in general difficult to solve the regulator equation exactly and analytically. Thus, it may be necessary to use approximation

methods, such as the power-series approximation method proposed by Huang and Rugh (1992b), to yield a feedback control law that makes the asymptotic tracking error arbitrarily small.

Based on a power-series approximation method, the synthesis of such a feedback control law involves solving for the nonlinear output regulation feedback gain, which can be viewed as an extension of real-time, adaptive pole assignment in which the associated coefficient matrices are time-varying, possibly with large magnitudes and at relatively high frequencies corresponding to the exogenous signal. Thus, traditional off-line pole assignment algorithms with constant feedback gain matrices and without a large domain of convergent attraction may be unable to stabilize such open-loop systems. Besides, if based on system identification, the slowly time-varying nature of the nonlinear plant also entails online adjustment of controller parameters and hence complicates the computational tasks. This motivates us to propose a parallel computation approach, i.e., a class of recurrent neural networks as real-time synthesizers and tuners of adaptive controllers for nonlinear output regulation systems.

2. Problem formulation

2.1. Nonlinear output regulation problem

Consider a general nonlinear plant described by

$\dot{x}(t) = f(x(t), u(t), v(t)), \quad x(0) = x_0,$
$y(t) = h(x(t), u(t), v(t)), \quad (1)$

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $y \in \mathbb{R}^p$ are, respectively, the state vector, plant input, and output, and $v \in \mathbb{R}^q$ is the exogenous signal generated by an exosystem

$\dot{v}(t) = a(v(t)), \quad v(0) = v_0,$

which models both reference inputs and disturbances. An error measure $e(t) = y(t) - d(v(t))$ reflects the goal of tracking or rejecting components of $v(t)$. Here, all the functions $f$, $h$, $a$ and $d$ are assumed to be smooth in an open neighborhood $V$ of the origin, and take the value of zero at the origin in the respective Euclidean spaces.

Based on the state-feedback control law $u(t) = k(x(t), v(t))$, the closed-loop system can be written as

$\dot{x}(t) = f_c(x(t), v(t)), \quad x(0) = x_0,$
$y(t) = h_c(x(t), v(t)), \quad (2)$

where $f_c(x(t), v(t)) = f(x(t), k(x(t), v(t)), v(t))$ and $h_c(x(t), v(t)) = h(x(t), k(x(t), v(t)), v(t))$.

The objective of output regulation is to implement a feedback control law such that the closed-loop system (2) is stable and $e(t)$ asymptotically approaches zero. As first shown in Isidori and Byrnes (1990), the key condition for the solution of this problem is the existence of a zero-error manifold for the plant, which can be rephrased as the existence of sufficiently smooth functions $x(v)$ and $u(v)$ defined on $V$ such that $x(0) = 0$, $u(0) = 0$, and

$\dfrac{\partial x(v)}{\partial v}\, a(v) = f(x(v), u(v), v),$
$0 = h(x(v), u(v), v) - d(v). \quad (3)$

Eq. (3) is known as the regulator equation or Isidori–Byrnes equation. The $k$th-order power-series approximation solution to (3) proposed by Huang and Rugh (1992b) is described as follows: for some positive integer $k$, let $x_{(k)}(v) = \sum_{i=1}^{k} X_i v^{[i]}$ and $u_{(k)}(v) = \sum_{i=1}^{k} U_i v^{[i]}$, where $v^{[i]} = [v_1^i, v_1^{i-1}v_2, \ldots, v_1^{i-1}v_q, \ldots, v_q^i]^T$ collects the degree-$i$ monomials in $v$, such that $x_{(k)}(0) = 0$, $u_{(k)}(0) = 0$, and $(x_{(k)}(v), u_{(k)}(v))$ sufficiently approximates the exact solution to the regulator equation (3) for $v \in V$. Theorem 1.2 in Huang and Rugh (1992b) offers a sufficient condition for the existence of a $k$th-order approximation solution based on the linearizations of the plant and exosystem at the origin; i.e.,

$\mathrm{rank}\begin{bmatrix} \partial f(0,0,0)/\partial x - \lambda I & \partial f(0,0,0)/\partial u \\ \partial h(0,0,0)/\partial x & \partial h(0,0,0)/\partial u \end{bmatrix} = n + p \quad (4)$

for all $\lambda$ given by

$\lambda = \lambda_{i_1} + \cdots + \lambda_{i_j}, \quad j = 1, 2, \ldots, k, \quad i_1, \ldots, i_j \in \{1, 2, \ldots, q\}, \quad (5)$

where $\lambda_1, \ldots, \lambda_q$ are the eigenvalues of $\partial a(0)/\partial v$. Thus, if (4) and (5) hold, $x_{(k)}(v)$ and $u_{(k)}(v)$ can be obtained as degree-$k$ polynomials in $v$ by solving the recursive sets of linear matrix equations generated by substituting the appropriate polynomial expressions into (3). Based on the obtained $x_{(k)}(v)$ and $u_{(k)}(v)$, the control law construction needs to compute $K(v)$ such that

$\dfrac{\partial f}{\partial x}\bigl(x_{(k)}(v), u_{(k)}(v), v\bigr) + \dfrac{\partial f}{\partial u}\bigl(x_{(k)}(v), u_{(k)}(v), v\bigr) K(v) \quad (6)$

has desired eigenvalues with negative real parts for all $v \in V$. Thereafter, the feedback control is constructed as

$u(t) = u_{(k)}(v(t)) + K(v(t))\bigl[x(t) - x_{(k)}(v(t))\bigr], \quad (7)$

and it has been proven that such a feedback control law (7) makes the closed-loop system (2) stable and the tracking error $e(t)$ arbitrarily small for $v \in V$.

As briefed in Section 1, the solution for the feedback gain $K(v(t))$ of (6) can be viewed as an extension of


real-time adaptive pole assignment, where the associated coefficient matrices

$A(v(t)) := \dfrac{\partial f}{\partial x}\bigl(x_{(k)}(v), u_{(k)}(v), v\bigr), \quad B(v(t)) := \dfrac{\partial f}{\partial u}\bigl(x_{(k)}(v), u_{(k)}(v), v\bigr) \quad (8)$

in (6) are time-varying, possibly with large magnitudes and at relatively high frequencies corresponding to the exogenous signal $v(t)$. Thus, traditional off-line pole assignment algorithms without a large domain of convergent attraction may be unable to stabilize the open-loop system (1) if they cannot perform such extended pole assignment (6) exactly or sufficiently fast. Besides, if based on system identification, the slowly time-varying nature of the nonlinear plant (1) also entails online adjustment of controller parameters and hence complicates the computational tasks.

Thus, based on the linear pole assignment method via the Sylvester equation, a class of recurrent neural networks will be proposed and investigated to online compute the feedback gain $K(v(t))$ and subsequently generate the control law (7) to make the closed-loop system stable and the tracking error arbitrarily small.

2.2. Pole assignment via Sylvester equation

First, we review the pole assignment problem for linear time-invariant control systems. Consider a linear system

$\dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0,$

where $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ are coefficient matrices. If a linear state-feedback control law $u(t) = r(t) + Kx(t)$ is applied, the closed-loop system is $\dot{x}(t) = (A + BK)x(t) + Br(t)$, where $r \in \mathbb{R}^m$ is a reference input vector and $K \in \mathbb{R}^{m \times n}$ is a state-feedback gain matrix. According to linear system theory, the matrix $K$ can be chosen by the pole assignment method; that is, when all state variables of a system are controllable and measurable, the closed-loop poles of the system can be placed at desired locations in the left-half complex plane through appropriate state-feedback gains.

One approach to pole assignment is by means of solving the following matrix equations:

$AZ - Z\Lambda = -BG, \quad (9)$
$KZ = G \quad (10)$

for a fixed $\Lambda \in \mathbb{R}^{n \times n}$ and almost any $G \in \mathbb{R}^{m \times n}$. Eq. (9) is known as the Sylvester equation.

Lemma (Bhattacharyya & de Souza, 1982). If $\Lambda$ is cyclic and has the prescribed eigenvalues, $A$ and $\Lambda$ have no common eigenvalue (i.e., $\sigma(A) \cap \sigma(\Lambda) = \emptyset$), and $(G, \Lambda)$ is observable, then the unique solution $Z$ of (9) is almost surely nonsingular with respect to the parameter $G$, and the spectrum of $(A + BK)$ equals that of $\Lambda$.

The uniqueness and nonsingularity of $Z$ are very important properties. For an arbitrary initial condition of $Z$, globally asymptotically stable algorithms can converge to the unique solution for nonsingular $Z$. Therefore, given controllable $(A, B)$, the usual numerical procedure for computing $K$ has two steps: first, after $\Lambda$ and $G$ are appropriately chosen, solve the Sylvester equation (9) for $Z$; second, solve the linear algebraic equation (10) for $K$, i.e., $K = GZ^{-1}$. However, due to its serial-processing nature, such a two-step numerical algorithm may encounter problems when applied to large-scale feedback control systems or to online solving for the time-varying feedback gain $K(v(t))$. That is the key reason why parallel computation strategies have been considered.

2.3. Recurrent neural networks for pole assignment

In the last decade, neural networks have been proposed to solve a wide variety of algebraic problems, e.g., systems of linear algebraic equations (Cichocki & Unbehauen, 1992; Xia, Wang, & Hung, 1999), linear matrix equations (Wang & Mendel, 1991; Wang, 1993) and other matrix problems (Wang, 1997), which have laid the basis for pole assignment via the Sylvester equation using RNN. Some studies have also been reported on pole assignment, mainly for linear time-invariant systems, using RNN (Kumar & Guez, 1991; Wang & Wu, 1996; Wang, 1998), whereas the RNN proposed here can be applied to pole assignment for linear time-varying pairs, and hence to the aforementioned nonlinear output regulation systems, owing to its global exponential convergence.

Solving Eqs. (9) and (10) for $Z$ and $K$ using the RNN synthesis methodologies, the dynamics of the two proposed recurrent neural networks for pole assignment can be designed and described by interrelated differential equations of the form

$N: \quad \dfrac{dZ(t)}{dt} = -\eta \bigl\{ A^T [A Z(t) - Z(t)\Lambda + BG] - [A Z(t) - Z(t)\Lambda + BG] \Lambda^T \bigr\}, \quad (11)$

$N_K: \quad \dfrac{dK(t)}{dt} = -\eta_K [K(t) Z(t) - G] Z(t)^T, \quad (12)$

where $\eta$ and $\eta_K$ are positive design parameters, $Z(t) \in \mathbb{R}^{n \times n}$ is the activation state matrix of $N$ corresponding to $Z$ in (9), and $K(t) \in \mathbb{R}^{m \times n}$ is the state matrix of $N_K$ corresponding to the feedback gain $K$ in (10).
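As a sanity check on (11) and (12) (with $\eta$, $\eta_K$ written here for the positive design parameters), the two coupled gradient flows can be integrated by forward-Euler steps for a small constant pair; all numerical values below are illustrative assumptions, not from the paper:

```python
import numpy as np

A = np.array([[0., 1.], [2., 2.]])   # unstable open-loop pair (illustrative)
B = np.array([[0.], [1.]])
Lam = np.diag([-1., -2.])            # desired closed-loop spectrum
G = np.array([[1., 1.]])
eta, eta_K, h = 10.0, 10.0, 1e-3     # design gains and Euler step (assumed)

Z = np.zeros((2, 2))                 # arbitrary initial network states
K = np.zeros((1, 2))
for _ in range(200000):
    U = A @ Z - Z @ Lam + B @ G                 # Sylvester residual of (9)
    Z = Z - h * eta * (A.T @ U - U @ Lam.T)     # network N, Eq. (11)
    K = K - h * eta_K * (K @ Z - G) @ Z.T       # network N_K, Eq. (12)

poles = np.sort(np.linalg.eigvals(A + B @ K).real)
print(poles)   # ≈ [-2. -1.]
```

For this example the theoretical gain is $K^* = G Z^{*-1} = [-4, -5]$, and the flow converges to it from the zero initial condition, placing the closed-loop poles at $\{-1, -2\}$.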

The real-time pole assignment process using neural networks is virtually a multiple time-scale system. The plant and controller are in one time scale. $N$ and $N_K$ are in another, smaller (i.e., faster) time scale than that of the plant and controller, which requires that the dynamic process for computing $K^*$ be sufficiently quick, or that the plant be sufficiently slowly time-varying. By properly selecting $\Lambda$, $G$, $\eta$ and $\eta_K$, the convergence of the recurrent neural networks can be sufficiently expedited. The combined networks can also be viewed as a second-order dynamic system like the models in Rovithakis and Christodoulou (2000).

Fig. 1. Block diagram of the neural network approach to nonlinear output regulation.

3. RNN for nonlinear output regulation

3.1. Model description

As stated in Section 2, the real-time solution for the nonlinear output regulation feedback gain $K(v(t))$ can be viewed as an extension of real-time pole assignment via (9) and (10), where the associated coefficient matrices $A(v(t))$ and $B(v(t))$ are defined in (8). Thus the proposed RNN for extended pole assignment are similarly designed and described by

$N^*: \quad \dfrac{dZ(t)}{dt} = -\eta \bigl\{ A(v(t))^T U(t) - U(t) \Lambda^T \bigr\}, \quad U(t) := A(v(t)) Z(t) - Z(t) \Lambda + B(v(t)) G; \quad (13)$

$N_K^*: \quad \dfrac{dK(t)}{dt} = -\eta_K [K(t) Z(t) - G] Z(t)^T. \quad (14)$

The neural network approach to the nonlinear output regulation problem is elucidated by the block diagram of Fig. 1; that is, a power-series approximator, the recurrent neural networks, and a $u(t)$-generator constitute the RNN-based controller for nonlinear output regulation systems. The power-series approximator online evaluates $x(v(t))$ and $u(v(t))$ for any $v(t)$ according to the given $k$th-order power-series approximation formulae for the regulator equation. Then, based on $x(v(t))$, $u(v(t))$ and $v(t)$, the proposed RNN (13) and (14) solve for the dynamic feedback gain $K(v(t))$ to stabilize the Jacobian linearization pair $(A(v(t)), B(v(t)))$. Finally, the $u(t)$-generator is constructed as in (7) to generate the control input $u(t)$.

Note that the above neurocontrol approach can be applied to solving $K(v(t))$ online for nonlinear output regulation, provided that $(A(v(t)), B(v(t)))$ is always controllable (or at least stabilizable) and is time-varying in a time scale sufficiently slower than that of $N^*$ and $N_K^*$; the former premise is an inherent property of the original nonlinear plant of interest, while the latter is stated as an assumption to be discussed in the sequel and can actually be generically satisfied by properly selecting $\Lambda$, $G$, $\eta$ and $\eta_K$.

3.2. Network architecture

The neural network approach to pole assignment can be performed by using the two interrelated recurrent neural networks $N^*$ (13) and $N_K^*$ (14), one of which computes $Z(v(t))$ and the other $K(v(t))$. By defining hidden state matrices $Y$, $U$, $X$ and $V$, the dynamical equations of these recurrent neural networks can be decomposed as follows:

$N^*: \quad \dfrac{dZ(t)}{dt} = -\eta\, Y(t),$
$Y(t) = A(v(t))^T U(t) - U(t) \Lambda^T, \quad (15)$
$U(t) = A(v(t)) Z(t) - Z(t) \Lambda + B(v(t)) G;$

$N_K^*: \quad \dfrac{dK(t)}{dt} = -\eta_K\, X(t),$
$X(t) = V(t) Z(t)^T, \quad (16)$
$V(t) = K(t) Z(t) - G.$

To facilitate sketching the network architectures, the case where all the desired poles are real is considered in this subsection. Note that $\Lambda$ represents the spectrum of the closed-loop system and is normally a diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, so that the connection patterns can be simplified and the following structural results are established.

The architecture of $N^*$ for computing $Z(v(t))$ consists of one output layer and two hidden layers, and each layer consists of an $n \times n$ array of neurons. The neuron $z_{ij}$ in the output layer is forward connected with the neurons $u_{lj}$, $\forall l \in \{1, 2, \ldots, n\}$, in the first hidden layer, where the weights are $a_{li}(v(t))$, and there exists an extra connection weight $-\lambda_j$ from neuron $z_{ij}$ to $u_{ij}$. The neuron $u_{lj}$ is forward connected with the neurons $y_{pj}$, $\forall p \in \{1, 2, \ldots, n\}$, in the second hidden layer, where the weights are $a_{lp}(v(t))$, respectively, and there also exists an extra connection weight $-\lambda_j$ from neuron $u_{lj}$ to $y_{lj}$. The biasing threshold for neuron $u_{lj}$ is $(B(v(t))G)_{lj}$, while there is no bias for $y_{pj}$, $\forall p \in \{1, 2, \ldots, n\}$. Neuron $y_{pj}$ is then backward connected only with the output neuron $z_{pj}$, and its connection weight is defined as $-\eta$. There is an integral transformation in the output layer $Z$ and a functional transformation in the two hidden layers $U$, $Y$. As seen from (15) and (16), the symmetric pattern of connectivity shows that $N^*$ (or $N_K^*$) can actually be decomposed into $n$ (or $m$) independent subnetworks, and each subnetwork represents one column (or row) vector of $Z(t)$ (or $K(t)$). Fig. 2 depicts the architecture of the $j$th-column subnetwork in $N^*$ as discussed above. The architecture of the neural network $N_K^*$ for computing $K(v(t))$ can be similarly sketched, with its $i$th-row subnetwork depicted in Fig. 3.

Fig. 2. Architecture of the $j$th-column subnetwork in $N^*$.

Fig. 3. Architecture of the $i$th-row subnetwork in $N_K^*$.
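The column decomposition can be checked directly: with $\Lambda$ diagonal, the $j$th columns of $U(t)$, $Y(t)$ and $Z(t)$ in (15) interact with no other columns. A small numerical sketch (illustrative matrices, not from the paper) confirms that simulating the column subnetworks independently reproduces the full-matrix flow:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
BG = rng.standard_normal((n, n))   # stands in for the product B(v)G in (15)
lam = np.array([-1., -2., -3.])    # diagonal entries of Lambda
Lam = np.diag(lam)
eta, h, steps = 1.0, 1e-3, 5000    # gain and Euler step (assumed)

# full-matrix flow of network N*, Eq. (15)
Z = np.zeros((n, n))
for _ in range(steps):
    U = A @ Z - Z @ Lam + BG
    Z = Z - h * eta * (A.T @ U - U @ Lam.T)

# independent jth-column subnetworks: u_j = A z_j - lam_j z_j + (BG)_j,
# dz_j/dt = -eta (A^T u_j - lam_j u_j)
Zc = np.zeros((n, n))
for j in range(n):
    z = np.zeros(n)
    for _ in range(steps):
        u = A @ z - lam[j] * z + BG[:, j]
        z = z - h * eta * (A.T @ u - lam[j] * u)
    Zc[:, j] = z

print(np.max(np.abs(Z - Zc)))   # ≈ 0: the decompositions coincide
```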

3.3. Stability analysis

Without loss of generality, assume that the Jacobian linearization pair $(A(v(t)), B(v(t)))$ is time-varying in a time scale sufficiently slower than that of $N^*$ and $N_K^*$. Specifically, given controllable $(A(v(t)), B(v(t)))$, starting from any $t_0 \in [0, +\infty)$ and within some small time interval $\Delta t$, the maximal variation of $(A(v(t)), B(v(t)))$ is sufficiently small while the proposed RNN has asymptotically converged to the corresponding theoretical solution $K(v(t_0))$ with a sufficiently small relative error, such that we can define $A(v(t_0))$, $B(v(t_0))$ and $K(v(t_0))$ as $A$, $B$ and $K^*$, respectively, and the time-invariant models $N$ (11) and $N_K$ (12) sufficiently approximate $N^*$ (13) and $N_K^*$ (14) for any $t_0$ and $\Delta t$.

Based on the above assumption, $N$ and $N_K$, instead of $N^*$ and $N_K^*$, are used to analyze the stability of the proposed RNN for any $t_0 \in [0, +\infty)$ and $\Delta t$. Thus, for some $t_0$, let $Z^*$ be the $n \times n$ constant matrix denoting the unique and nonsingular theoretical solution $Z(v(t_0))$ to (9) after $\Lambda$ and $G$ are appropriately chosen; the following results are then obtained.

Theorem. Given controllable $(A, B)$, if $\Lambda$ and $G$ are chosen to satisfy the conditions of the Lemma, then the equilibrium point $Z^*$ of $N$ (11) is (globally) exponentially stable, and the output $K(t)$ of $N_K$ (12) is globally exponentially convergent to $K^* = G(Z^*)^{-1}$.

Proof. Let $\hat{Z}(t) := Z(t) - Z^*$; thus $\hat{Z}(t) = 0 \in \mathbb{R}^{n \times n}$ is the unique solution to (9). $N$ (11) can be reformulated as

$d\hat{Z}(t)/dt = -\eta \bigl\{ A^T [A\hat{Z}(t) - \hat{Z}(t)\Lambda] - [A\hat{Z}(t) - \hat{Z}(t)\Lambda] \Lambda^T \bigr\}.$

Define $E = \| A\hat{Z}(t) - \hat{Z}(t)\Lambda \|_F^2$, where $\| \cdot \|_F$ denotes the Frobenius norm. $E$ is proved to be a Lyapunov function, since $E$ is positive definite ($E > 0$ if $\hat{Z}(t) \neq 0$ and $E = 0$ iff $\hat{Z}(t) = 0$) and $dE/dt$ is negative definite due to the fact that

$\dfrac{dE(\hat{Z}(t))}{dt} = \mathrm{trace}\Bigl[ \Bigl( \dfrac{\partial E}{\partial \hat{Z}} \Bigr)^T \dfrac{d\hat{Z}(t)}{dt} \Bigr] = -2\eta \bigl\| A^T (A\hat{Z}(t) - \hat{Z}(t)\Lambda) - (A\hat{Z}(t) - \hat{Z}(t)\Lambda) \Lambda^T \bigr\|_F^2 < 0$

and $dE/dt = 0$ iff $\hat{Z}(t) = 0$. By Lyapunov stability theory, the equilibrium point $\hat{Z}(t) = 0$ is asymptotically stable (equivalently, $Z^*$ of $N$ (11) is asymptotically stable). Furthermore, $N$ is a linear time-invariant model and can be written as $d\hat{z}(t)/dt = \tilde{A}\hat{z}(t)$, where $\hat{z}(t) = [\hat{z}_{11}(t), \ldots, \hat{z}_{n1}(t), \hat{z}_{12}(t), \ldots, \hat{z}_{nn}(t)]^T \in \mathbb{R}^{n^2}$ and $\tilde{A} \in \mathbb{R}^{n^2 \times n^2}$ is stable. By linear system theory, the foregoing asymptotic stability of $N$ (11) is a global property and also implies exponential stability.

Consider the general case where $\tilde{A}$ has $l$ eigenvalues $\gamma_1$ (multiplicity $\delta_1$), $\gamma_2$ (multiplicity $\delta_2$), ..., $\gamma_l$ (multiplicity $\delta_l$), with $\delta_1 + \delta_2 + \cdots + \delta_l = n^2$. Therefore $\tilde{A} = QJQ^{-1}$, where $J$ is the Jordan form of $\tilde{A}$ and $Q$ is the associated transform matrix. Since $\exp(\tilde{A}t) = Q \exp(Jt) Q^{-1}$, we have $\hat{z}(t) = \exp(\tilde{A}t)\hat{z}(0) = Q \exp(Jt) Q^{-1} \hat{z}(0)$, where the entries of $\exp(Jt)$ have the form $\exp(\gamma_q t)\, t^p$, $p = 0, 1, \ldots, \bar{\delta}_q - 1$, $q = 1, 2, \ldots, l$, with $\bar{\delta}_q$ the maximal dimension of the Jordan blocks associated with $\gamma_q$ in $J$, and surely $\mathrm{Re}(\gamma_q) < 0$, $\forall q = 1, 2, \ldots, l$, since $\tilde{A}$ is asymptotically stable. Thus it is reasonable to write $\hat{Z}(t) = \bigl[\sum_{q=1}^{l} \sum_{p=0}^{\bar{\delta}_q - 1} c_{ijpq} \exp(\gamma_q t)\, t^p\bigr]_{n \times n}$ and $Z(t) = \bigl[z^*_{ij} + \sum_{q=1}^{l} \sum_{p=0}^{\bar{\delta}_q - 1} c_{ijpq} \exp(\gamma_q t)\, t^p\bigr]_{n \times n}$, where $c_{ijpq}$ is a coefficient indexed by $i$, $j$, $p$ and $q$.

Let $\hat{K}(t) := K(t) - K^* \in \mathbb{R}^{m \times n}$; thus $N_K$ (12) can be reformulated as $d\hat{K}(t)^T/dt = -\eta_K \bigl\{ Z(t)Z(t)^T \hat{K}(t)^T + Z(t)\hat{Z}(t)^T K^{*T} \bigr\}$. Defining $\hat{K}(t)^T = [\hat{k}_1(t), \ldots, \hat{k}_m(t)]$ and $K^{*T} = [k^*_1, \ldots, k^*_m]$, where $\hat{k}_i(t)$ and $k^*_i$ are the $i$th columns of $\hat{K}(t)^T$ and $K^{*T}$, respectively, $N_K$ can be decomposed as

$\dfrac{d\hat{k}_i(t)}{dt} = -\eta_K Z(t)Z(t)^T \hat{k}_i(t) - \eta_K Z(t)\hat{Z}(t)^T k^*_i, \quad i = 1, 2, \ldots, m. \quad (17)$

Since the explicit expressions of $Z(t)$ and $\hat{Z}(t)$ are known, $\forall i \in \{1, 2, \ldots, m\}$, subsystem (17) can be viewed as a linear time-varying system, where $-\eta_K Z(t)Z(t)^T$ is the time-varying coefficient matrix and $-\eta_K Z(t)\hat{Z}(t)^T k^*_i$ is the forcing term. By linear time-varying system theory (Tsakalis & Ioannou, 1993), the solution to (17) is

$\hat{k}_i(t) = \Phi(t, t_0)\hat{k}_i(t_0) + \int_{t_0}^{t} \Phi(t, \tau)(-\eta_K) Z(\tau)\hat{Z}(\tau)^T k^*_i \, d\tau,$

where $\Phi(t, t_0)$ is the state transition matrix (here, consider $t_0 = 0$).

For $\Phi(t, t_0)$, the homogeneous part of system (17) is studied first, i.e., $d\hat{k}_i(t)/dt = -\eta_K Z(t)Z(t)^T \hat{k}_i(t) = -\bigl(\eta_K Z^* Z^{*T} + \eta_K \hat{Z}(t)Z(t)^T + \eta_K Z^* \hat{Z}(t)^T\bigr)\hat{k}_i(t)$. Thus, by the perturbed linear system theorem (Slotine & Li, 1991), the homogeneous system is (uniformly) globally exponentially stable, and hence $\exists k, a > 0$ such that $\|\Phi(t, t_0)\| \leq k \exp(-a(t - t_0))$, $\forall t \geq t_0 \geq 0$.

Now return to the original nonhomogeneous solution to $N_K$ (17). Taking matrix norms, and noting that $\hat{Z}(\tau)$ decays exponentially so that $\|Z(\tau)\hat{Z}(\tau)^T\| \leq c \exp(\beta\tau)$, we know $\exists c, \hat{c} > 0$ and $\beta, \hat{\beta} < 0$ such that

$\|\hat{k}_i(t)\| \leq \|\Phi(t, t_0)\| \, \|\hat{k}_i(t_0)\| + \eta_K \|k^*_i\| \int_{0}^{t} \|\Phi(t, \tau)\| \, \|Z(\tau)\hat{Z}(\tau)^T\| \, d\tau$
$\leq k \exp(-at) \|\hat{k}_i(0)\| + \eta_K \|k^*_i\|\, k \exp(-at) \int_{0}^{t} \exp(a\tau)\, c \exp(\beta\tau) \, d\tau$
$= \Bigl[ k \|\hat{k}_i(0)\| - \dfrac{c \eta_K k \|k^*_i\|}{a + \beta} \Bigr] \exp(-at) + \dfrac{c \eta_K k \|k^*_i\|}{a + \beta} \exp(\beta t) \leq \hat{c} \exp(\hat{\beta} t).$

Clearly $\lim_{t \to \infty} \hat{k}_i(t) = 0$, $\forall \hat{k}_i(0) \in \mathbb{R}^n$, and as $t \to \infty$, $\hat{K}(t)$ and $K(t)$ exponentially approach $0$ and $K^*$, respectively. It follows that $K^*$ of $N_K$ is globally exponentially stable, and that the combined neural networks (11) and (12) are globally exponentially stable. $\square$

By the above theorem, for $t_0 \in [0, +\infty)$ and $\Delta t$, the global exponential stability of $N$ and $N_K$ implies that, after a period $4/\rho_K \leq \Delta t$, $\|K(t) - K^*\|$ will be less than $1.85\% \ (\approx \exp(-4))$ of $\|K(t_0) - K^*\|$, where $\rho_K > 0$ is the convergence rate of $N_K$. By adjusting $\eta$ and $\eta_K$, the (pointwise) convergence rate of the recurrent neural networks (13) and (14) can be sufficiently expedited so as to satisfy the assumption. Therefore, the real-time solution $K(t)$ generated by the proposed neural network approach can sufficiently approximate the theoretical solution $K(v(t))$, which makes the proposed neurocontrol inherit the stability and asymptotic tracking properties of the original nonlinear output regulation system formulated in Section 2.1 and proved in Huang and Rugh (1992b).

3.4. Convergence analysis

It is difficult to obtain a direct relation between the exponential convergence rate and the design parameters $\Lambda$, $G$, $\eta$, $\eta_K$ from the proof of the theorem. For simplicity, we concentrate on the case where $\Lambda$ is in diagonal form. Hence $\Lambda^T = \Lambda$, and $\Lambda^k$ is still diagonal for any integer $k$; thus $G$ can be easily specified as in Wang and Wu (1996), subject to the observability constraint on the $(\Lambda, G)$ pair.

If $\Lambda$ is selected in diagonal form with the other configuration the same as in the stability theorem, then for the equilibrium point $Z(t) = Z^*$ of $N$ (11), the exponential convergence rate is $\rho_Z = \eta \min_{1 \leq i \leq n} \bigl\{ \lambda_{\min}\bigl[(A - \lambda_i I)^T (A - \lambda_i I)\bigr] \bigr\}$, where $\lambda_i$ ($i = 1, 2, \ldots, n$) are the prescribed closed-loop eigenvalues in $\Lambda$. The exponential convergence of $N$ implies that, after a period of $4/\rho_Z$, $\|Z(t) - Z^*\|$ will be less than $1.85\% \, \|Z(t_0) - Z^*\|$; the convergence rate $\rho_Z$ is mainly determined by the design parameter $\eta$ and, roughly, the minimal difference between the eigenvalues of the open-loop and closed-loop systems. For example, if $\eta = 10^6$ and the minimal difference is more than 1, then the convergence time to reach a relative error of $\leq 1.85\%$ will be roughly less than 4 $\mu$s.

Starting from $t_0$ and within $\Delta t$, after some time instant $t_1 \ll (t_0 + \Delta t)$ such that $\|AZ(t) - Z(t)\Lambda + BG\| \leq \varepsilon$, $\forall t \geq t_1$ (or directly $t_1 \geq 4/\rho_Z$), neural network $N_K$ (12) is approximately simplified to

$N'_K: \quad \dfrac{dK(t)}{dt} = -\eta_K [K(t) Z^* - G] Z^{*T}. \quad (18)$

For the equilibrium $K^*$ of $N'_K$ (18), the exponential convergence rate is $\rho'_K = \eta_K \lambda_{\min}(Z^* Z^{*T})$, which implies that, after a period of $4/\rho_Z + 4/\rho'_K$, $\|K(t) - K^*\|$ will be less than roughly $1.85\% \, \|K(t_0) - K^*\|$; the convergence rate $\rho'_K$ is mainly determined by the design parameter $\eta_K$ and, roughly, the minimal modulus of the eigenvalues of $Z^*$, and from (9) it is seen that the minimal modulus of the $Z^*$ eigenvalues increases as $G$ increases.

Since the convergence time $t_c$ for the proposed neural networks (11) and (12) to achieve the approximate solution $K(t)$ with a relative error of roughly $1.85\%$ is about $4/\rho_Z + 4/\rho'_K$, the overall convergence rate is $\rho \approx \rho_Z \rho'_K / (\rho_Z + \rho'_K)$. It follows that the time-scale assumption can be generically satisfied by properly setting the design parameters $\Lambda$, $G$, $\eta$ and $\eta_K$. Furthermore, this assumption can be reformulated as a design criterion; i.e., starting from any $t_0 \in [0, +\infty)$, if within $\Delta t \geq t_c$ the maximal variation of $(A(v(t)), B(v(t)))$ is acceptably small, then the relative error of dynamically solving for $K(v(t))$ by the proposed neural network approach is roughly less than or equal to $1.85\%$.
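The rate claimed for (18) can be cross-checked by vectorizing the simplified network: $dK/dt = -\eta_K(KZ^* - G)Z^{*T}$ is linear in $K$ with coefficient matrix $-\eta_K(Z^*Z^{*T} \otimes I_m)$ in column-major vec form, so its slowest decay rate is $\eta_K \lambda_{\min}(Z^*Z^{*T})$. A sketch with an illustrative $Z^*$ (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eta_K = 4, 2, 3.0
Zs = rng.standard_normal((n, n))   # stands in for the nonsingular Z*
S = Zs @ Zs.T                      # Z* Z*^T, symmetric positive definite

# with column-major vec, vec(K S) = (S kron I_m) vec(K), so (18) reads
# d vec(K)/dt = -eta_K (S kron I_m) (vec(K) - vec(K*))
M = -eta_K * np.kron(S, np.eye(m))
slowest = -np.max(np.linalg.eigvals(M).real)   # slowest decay rate

print(slowest, eta_K * np.min(np.linalg.eigvalsh(S)))   # the two agree
```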

4. Simulation results

To demonstrate the applicability and performance of the RNN-based controller for nonlinear output regulation, we consider two nonminimum-phase benchmark systems: the ball-and-beam system and the inverted pendulum on a cart system.

4.1. The ball and beam system

A beam is made to rotate in a vertical plane by ap-plying a torque at the center of rotation and a ball is freeto roll on the beam. The motion equation is representedbelow (Hauser, Satry, & Kokotovic, 1992):

ẋ₁(t) = x₂(t),
ẋ₂(t) = bx₁(t)x₄²(t) − gb sin x₃(t),
ẋ₃(t) = x₄(t),
ẋ₄(t) = u(t),
y(t) = x₁(t),

where x₁ is the ball position, g = 9.81 m/s² and b = 0.7143. The objective is to design a state-feedback control law such that the position of the ball asymptotically tracks the reference input A₀ cos ωt. Thus the exogenous system is given as follows:

v̇₁(t) = ωv₂(t), v₁(0) = A₀,
v̇₂(t) = −ωv₁(t), v₂(0) = 0,

and the error measure is e(t) = y(t) − v₁(t).
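The model and the exosystem above can be encoded directly; the sketch below (the RK4 integrator, the step size, and the scenario of a motionless ball on a beam tilted by 0.1 rad with u = 0 are illustrative assumptions, not from the paper) integrates both for half a second:

```python
import math

g, b, w, A0 = 9.81, 0.7143, math.pi / 5, 6.0

def rk4(f, x, h):
    # one classical Runge-Kutta step for x' = f(x)
    k1 = f(x)
    k2 = f([xi + h / 2 * k for xi, k in zip(x, k1)])
    k3 = f([xi + h / 2 * k for xi, k in zip(x, k2)])
    k4 = f([xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * p + 2 * q + r)
            for xi, a, p, q, r in zip(x, k1, k2, k3, k4)]

def ball_beam(u):
    # x = (ball position, ball velocity, beam angle, beam angular velocity)
    return lambda x: [x[1], b * x[0] * x[3]**2 - g * b * math.sin(x[2]), x[3], u]

exo = lambda v: [w * v[1], -w * v[0]]   # generates v1(t) = A0 cos(w t)

x, v, h = [0.0, 0.0, 0.1, 0.0], [A0, 0.0], 1e-3
for _ in range(500):                     # integrate 0.5 s
    x, v = rk4(ball_beam(0.0), x, h), rk4(exo, v, h)
```

With no input, the ball accelerates down the tilted beam (x₂ goes negative), while the exosystem traces the reference cosine exactly.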

Based on the power-series approximation method, we will apply the proposed recurrent neural networks N_Z (15) and N_K (16) to solve for the feedback gain K(v(t)), and then generate the feedback control law u(t) of the form (7) to control the output regulation system. The derivation of (4) shows that the conditions for the existence of a kth-order approximate solution x(v) and u(v) (the subscript (k) is omitted for simplicity) are satisfied



Fig. 4. Time-varying components of A(v(t)).

for any k > 0. Thus, a third-order approximate solution to the regulator equation (3) is given as

x₁(v) = v₁,
x₂(v) = ωv₂,
x₃(v) = ω²v₁/(gb) + ω⁶v₁³/(6g³b³) + ω⁶v₁v₂²/(g³b²),
x₄(v) = ω³v₂/(gb) + (1 − 4b)ω⁷v₁²v₂/(2g³b³) + ω⁷v₂³/(g³b²)

and

u(v) = −ω⁴v₁/(gb) + ω⁸v₁(2(1 − 7b)v₂² − (1 − 4b)v₁²)/(2g³b³).
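The third-order pair (x(v), u(v)) can be checked numerically: along the exosystem flow, x₄ must be the exact time derivative of x₃, u the exact time derivative of x₄, and the ẋ₂ equation must hold up to terms of order five and higher. A sketch of such a check (the test amplitude and finite-difference step are arbitrary choices of ours):

```python
import math

g, b, w = 9.81, 0.7143, math.pi / 5

def x_of_v(v1, v2):
    # third-order approximate solution x(v) of the regulator equation
    x1 = v1
    x2 = w * v2
    x3 = (w**2 * v1 / (g * b) + w**6 * v1**3 / (6 * g**3 * b**3)
          + w**6 * v1 * v2**2 / (g**3 * b**2))
    x4 = (w**3 * v2 / (g * b) + (1 - 4 * b) * w**7 * v1**2 * v2 / (2 * g**3 * b**3)
          + w**7 * v2**3 / (g**3 * b**2))
    return x1, x2, x3, x4

def u_of_v(v1, v2):
    return (-w**4 * v1 / (g * b)
            + w**8 * v1 * (2 * (1 - 7 * b) * v2**2 - (1 - 4 * b) * v1**2)
              / (2 * g**3 * b**3))

A0 = 0.01                                  # small amplitude: residuals are O(|v|^5)
v_of_t = lambda t: (A0 * math.cos(w * t), -A0 * math.sin(w * t))

t, h = 0.3, 1e-5
x1, x2, x3, x4 = x_of_v(*v_of_t(t))
xp, xm = x_of_v(*v_of_t(t + h)), x_of_v(*v_of_t(t - h))
d = [(p - q) / (2 * h) for p, q in zip(xp, xm)]  # d/dt x(v(t)), central difference

res2 = abs(d[1] - (b * x1 * x4**2 - g * b * math.sin(x3)))  # x2' equation, O(|v|^5)
id3 = abs(d[2] - x4)                                        # x3' = x4, exact identity
id4 = abs(d[3] - u_of_v(*v_of_t(t)))                        # x4' = u, exact identity
```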

A straightforward derivation gives

A(v(t)) =
[ 0      1    0            0       ]
[ bx₄²   0    −bg cos x₃   2bx₁x₄  ]
[ 0      0    0            1       ]
[ 0      0    0            0       ]

and B(v(t)) = (0 0 0 1)ᵀ, where the pair (A(v(t)), B(v(t))) is in fact controllable at all times. To illustrate the operating characteristics of the RNN-based controller and its closed-loop nonlinear output regulation system, let A₀ = 6, ω = π/5, and σ(Λ) = {−1, −2, −3, −4}. The periodic time-varying components of A(v(t)) are illustrated in Fig. 4, and their maximal magnitude variation is roughly 1.131×10⁻³ within t = 10⁻³ s. According to the discussion in the preceding section, the design parameters G, γ, and ε of N_Z and N_K can be set to [1 1 1 1], 10, and 10, respectively.

Fig. 5 depicts the adaptive tuning of the output neurons z_ij(t) (i, j = 1, 2, 3, 4) in N_Z. Fig. 6 shows the dynamic output regulation feedback gain K(t) obtained from the output layer of N_K. Fig. 7 illustrates the trajectories of the closed-loop poles σ(A(v(t)) + B(v(t))K(t)) over time t. Fig. 8 shows the tracking performance of the proposed RNN-based controller for the ball-and-beam system. Since some components of A(v(t)) are time-varying at roughly twice the frequency of the exosystem v(t), N_Z and N_K have to converge fast enough to generate K(v(t)), also at roughly twice the frequency of v(t), so as to adapt to the variation of A(v(t)) and keep the closed-loop poles sufficiently close to the desired ones {−1, −2, −3, −4}. As seen from Fig. 7, the closed-loop pole −1 has been kept without appreciable error, while the others vibrate slightly within [−2.0023, −1.9977], [−3.0533, −2.9506] and [−4.0564, −3.9575]; such vibration is practically acceptable and has little effect on the tracking performance. As shown in Fig. 8 and the simulation data, the maximal steady-state tracking error of the proposed neurocontrol system is 3.41×10⁻³, which, under the same kth-order approximate solution x(v) and u(v), is mainly determined by the amplitude and frequency of the exogenous signal v(t) to be tracked. If G increases (e.g., by 4 times), the stabilization can be performed better with Z(t) still in a feasible range.
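The gain that N_Z and N_K track online can be cross-checked offline at a frozen instant. The sketch below is an illustrative offline computation of ours, not the paper's network dynamics: it applies Ackermann's formula to A(v(0)) with v(0) = (6, 0), and verifies via the Faddeev-LeVerrier recursion that the characteristic polynomial of A + BK matches the desired poles {−1, −2, −3, −4}:

```python
import math

g, b, w = 9.81, 0.7143, math.pi / 5

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(4)) for i in range(4)]

def solve(M, rhs):
    # Gauss-Jordan elimination with partial pivoting for a 4x4 system
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(4):
        p = max(range(c, 4), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(4):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [a[r][j] - f * a[c][j] for j in range(5)]
    return [a[i][4] / a[i][i] for i in range(4)]

# A(v(t)) and B at t = 0, i.e. v = (6, 0), where x4(v) = 0
x3 = w**2 * 6 / (g * b) + w**6 * 6**3 / (6 * g**3 * b**3)
A = [[0.0, 1, 0, 0], [0, 0, -b * g * math.cos(x3), 0],
     [0, 0, 0, 1.0], [0, 0, 0, 0]]
B = [0.0, 0.0, 0.0, 1.0]

# desired poles {-1,-2,-3,-4}: phi(s) = s^4 + 10 s^3 + 35 s^2 + 50 s + 24
A2 = matmul(A, A); A3 = matmul(A2, A); A4 = matmul(A3, A)
phi = [[A4[i][j] + 10 * A3[i][j] + 35 * A2[i][j] + 50 * A[i][j]
        + (24.0 if i == j else 0.0) for j in range(4)] for i in range(4)]

# Ackermann's formula, written for the sign convention u = K x
c1 = matvec(A, B)
Ct = [B, c1, matvec(A, c1), matvec(A, matvec(A, c1))]  # rows of C^T
y = solve(Ct, [0.0, 0.0, 0.0, 1.0])                    # y^T = e4^T C^{-1}
K = [-sum(y[i] * phi[i][j] for i in range(4)) for j in range(4)]

# verify: characteristic polynomial of A + B K via Faddeev-LeVerrier
Acl = [[A[i][j] + B[i] * K[j] for j in range(4)] for i in range(4)]
Mk, coef = [row[:] for row in Acl], []
for k in range(1, 5):
    ak = -sum(Mk[i][i] for i in range(4)) / k
    coef.append(ak)
    if k < 4:
        Mk = matmul(Acl, [[Mk[i][j] + (ak if i == j else 0.0)
                           for j in range(4)] for i in range(4)])
```

Ackermann's formula is known to be ill-conditioned for larger systems; it is used here only because a 4x4 snapshot is easy to verify by hand.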

4.2. The inverted pendulum on a cart system

The inverted pendulum on a cart system is another nonlinear nonminimum-phase system, one that can be found in many university control laboratories. The motion of the system is described as follows (Gurumoorthy & Sanders, 1993):

ẋ₁(t) = x₂(t),
ẋ₂(t) = [u + m(lx₄² − g cos x₃) sin x₃ − bx₂]/(M + m sin²x₃),
ẋ₃(t) = x₄(t),
ẋ₄(t) = [(M + m)g sin x₃ + (bx₂ − u − mlx₄² sin x₃) cos x₃]/(l(M + m sin²x₃)),
y(t) = x₁(t),
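A direct encoding of these equations (the variable names are ours, and b is renamed bf in code to avoid clashing with the ball-and-beam parameter) confirms two expected properties: the upright position with zero input is an equilibrium, and a small tilt with no input produces a positive angular acceleration, i.e. the pendulum falls further:

```python
import math

M, m, l, bf, g = 1.378, 0.051, 0.325, 12.98, 9.81

def f(x, u):
    # x = (cart position, cart velocity, pendulum angle, angular velocity)
    x1, x2, x3, x4 = x
    D = M + m * math.sin(x3)**2
    dx2 = (u + m * (l * x4**2 - g * math.cos(x3)) * math.sin(x3) - bf * x2) / D
    dx4 = ((M + m) * g * math.sin(x3)
           + (bf * x2 - u - m * l * x4**2 * math.sin(x3)) * math.cos(x3)) / (l * D)
    return [x2, dx2, x4, dx4]

eq = f([0.0, 0.0, 0.0, 0.0], 0.0)     # upright, at rest, no input
tilt = f([0.0, 0.0, 0.01, 0.0], 0.0)  # small positive tilt, no input
```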



Fig. 5. Adaptive tuning of output neurons z_ij in N_Z.

Fig. 6. Feedback gain K(t) obtained by N_K.

where x₁ is the position of the cart, M = 1.378 kg, m = 0.051 kg, l = 0.325 m, b = 12.98 kg/s and g = 9.81 m/s². The objective is to design a state-feedback control law such that the position of the cart asymptotically tracks the reference input A₀ sin ωt while keeping the closed-loop system stable. Thus the exogenous system and the error measure can be formulated in the same way as in the ball-and-beam example, except that v₁(0) = 0 and v₂(0) = A₀.



Fig. 7. Trajectories of closed-loop poles over time.

Fig. 8. Tracking performance of the ball-and-beam system.

The third-order approximate solution to the regulator equation (3) is given by

x₁(v) = v₁,
x₂(v) = ωv₂,
x₃(v) = a₁ω²v₁ + a₂v₁v₂² + a₃v₁³,
x₄(v) = a₁ω³v₂ + a₂ωv₂³ + αωv₁²v₂

and

u(v) = −(M + m sin²x₃)ω²v₁ − m(lx₄² − g cos x₃) sin x₃ + bx₂,

with x₂, x₃, x₄ evaluated at x(v), where

a₁ = −ω²/(g + lω²), a₂ = (3 − g)l/((g + 9lω²)a₁),
a₃ = (g + 7lω²)/(6lω²a₂) and α = 3a₃ − 2a₂.
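Whatever the exact coefficient values, x₄(v) must be the time derivative of x₃(v) along the exosystem flow, and that holds identically once α = 3a₃ − 2a₂. A quick check with arbitrary (made-up) coefficient values, since only the structure matters for this identity:

```python
import math

w = math.pi / 5
a1, a2, a3 = -0.37, 1.3, 0.8        # arbitrary values, not the paper's coefficients
alpha = 3 * a3 - 2 * a2

x3 = lambda v1, v2: a1 * w**2 * v1 + a2 * v1 * v2**2 + a3 * v1**3
x4 = lambda v1, v2: a1 * w**3 * v2 + a2 * w * v2**3 + alpha * w * v1**2 * v2

A0 = 1.0
v = lambda t: (A0 * math.sin(w * t), A0 * math.cos(w * t))  # v1(0)=0, v2(0)=A0

t, h = 0.7, 1e-6
d = (x3(*v(t + h)) - x3(*v(t - h))) / (2 * h)   # d/dt x3(v(t))
err = abs(d - x4(*v(t)))
```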

A straightforward computation of (8) gives

A(v(t)) =
[ 0   1                            0                  0                               ]
[ 0   −b/(M + m sin²x₃)            ∂f₂/∂x₃(x, u, v)   2mlx₄ sin x₃/(M + m sin²x₃)     ]
[ 0   0                            0                  1                               ]
[ 0   b cos x₃/(l(M + m sin²x₃))   ∂f₄/∂x₃(x, u, v)   −mlx₄ sin 2x₃/(l(M + m sin²x₃)) ]


Fig. 9. Feedback gain K(t) obtained by N_K.

where

∂f₂/∂x₃(x, u, v) = (mlx₄² cos x₃ − mg cos 2x₃)/(M + m sin²x₃)
  − m sin 2x₃ (2mlx₄² sin x₃ − 2bx₂ + 2u − mg sin 2x₃)/(2(M + m sin²x₃)²),

∂f₄/∂x₃(x, u, v) = ((M + m)g cos x₃ + (u − bx₂) sin x₃ − mlx₄² cos 2x₃)/(l(M + m sin²x₃))
  − m sin 2x₃ (2(M + m)g sin x₃ + 2(bx₂ − u) cos x₃ − mlx₄² sin 2x₃)/(2l(M + m sin²x₃)²),

and

B(v(t)) = (0, 1/(M + m sin²x₃), 0, −cos x₃/(l(M + m sin²x₃)))ᵀ.
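These partial derivatives can be verified against a central finite difference of the right-hand sides f₂ and f₄ of the motion equations (the evaluation point below is an arbitrary choice of ours; b is again renamed bf in code):

```python
import math

M, m, l, bf, g = 1.378, 0.051, 0.325, 12.98, 9.81

def f24(x2, x3, x4, u):
    # right-hand sides of the x2' and x4' equations (f2 and f4)
    D = M + m * math.sin(x3)**2
    f2 = (u + m * (l * x4**2 - g * math.cos(x3)) * math.sin(x3) - bf * x2) / D
    f4 = ((M + m) * g * math.sin(x3)
          + (bf * x2 - u - m * l * x4**2 * math.sin(x3)) * math.cos(x3)) / (l * D)
    return f2, f4

def df2_dx3(x2, x3, x4, u):
    D = M + m * math.sin(x3)**2
    return ((m * l * x4**2 * math.cos(x3) - m * g * math.cos(2 * x3)) / D
            - m * math.sin(2 * x3) * (2 * m * l * x4**2 * math.sin(x3) - 2 * bf * x2
                                      + 2 * u - m * g * math.sin(2 * x3)) / (2 * D**2))

def df4_dx3(x2, x3, x4, u):
    D = M + m * math.sin(x3)**2
    return (((M + m) * g * math.cos(x3) + (u - bf * x2) * math.sin(x3)
             - m * l * x4**2 * math.cos(2 * x3)) / (l * D)
            - m * math.sin(2 * x3) * (2 * (M + m) * g * math.sin(x3)
                                      + 2 * (bf * x2 - u) * math.cos(x3)
                                      - m * l * x4**2 * math.sin(2 * x3)) / (2 * l * D**2))

x2, x3, x4, u, h = 0.3, 0.4, 0.2, 1.5, 1e-6
f2p, f4p = f24(x2, x3 + h, x4, u)
f2m, f4m = f24(x2, x3 - h, x4, u)
err2 = abs((f2p - f2m) / (2 * h) - df2_dx3(x2, x3, x4, u))
err4 = abs((f4p - f4m) / (2 * h) - df4_dx3(x2, x3, x4, u))
```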

Compared with the ball-and-beam example, it is much harder (if not impossible) to derive the analytic form of the nonlinear output regulation feedback gain K(v(t)) explicitly, since A(v(t)) and B(v(t)) in this case are much more complicated. Let A₀ = 1.0, ω = π/5 and σ(Λ) = {−0.45, −0.90, −1.35, −1.8}. The maximal variation of the periodic time-varying components of A(v(t)) and B(v(t)) is roughly 3.8×10⁻⁴ within t = 10⁻³ s; thus G, γ, and ε of N_Z and N_K can be selected to be [150 150 150 150], 10, and 10, respectively.

The operating characteristics of N_Z, N_K and the resulting closed-loop system are shown in Figs. 9-11. Fig. 9 depicts the dynamic output regulation feedback gain K(t) adaptively tuned for the variation of (A(v(t)), B(v(t))). Figs. 10 and 11 show that the closed-loop poles have been kept sufficiently close to the desired locations with relative errors less than 1.87×10⁻⁴, and that the maximal steady-state tracking error is less than 2.33×10⁻⁴, which implies that the proposed recurrent neural networks perform the stabilization satisfactorily and subsequently lead to superior tracking performance of the inverted pendulum on a cart system.

5. Concluding remarks

In this paper, an RNN-based control system for nonlinear output regulation is proposed. By using a power-series approximation, recurrent neural networks are developed for the online computation of the output regulation feedback gain so as to generate the dynamic control input. Compared with supervised learning approaches, the proposed recurrent neural networks are advantageous in that no prior offline training is necessary. Compared with other recurrent neural networks, the proposed neural network approach is characterized by (pointwise) global exponential stability, a simple architecture due to symmetric connectivity, and high realizability due to few design parameters. Because of the global exponential stability property, the proposed neural network approach can be applied to solving


Fig. 10. Trajectories of closed-loop poles over time.

Fig. 11. Tracking performance of the inverted pendulum on a cart system.

large-scale nonlinear output regulation problems in real-time applications.

References

Bhattacharyya, S. P., & de Souza, E. (1982). Pole assignment via Sylvester's equation. Systems and Control Letters, 1(4), 261-263.

Chen, F.-C., & Liu, C.-C. (1994). Adaptively controlling nonlinear continuous-time systems using multilayer neural networks. IEEE Transactions on Automatic Control, 39(6), 1306-1310.

Chu, Y. C., & Huang, J. (1999). A neural-network method for the nonlinear servomechanism problem. IEEE Transactions on Neural Networks, 10(6), 1412-1423.

Cichocki, A., & Unbehauen, R. (1992). Neural networks for solving systems of linear equations and related problems. IEEE Transactions on Circuits and Systems, 39(2), 124-138.

Gurumoorthy, R., & Sanders, S. R. (1993). Controlling nonminimum phase nonlinear systems: the inverted pendulum on a cart example. Proceedings of the American Control Conference (pp. 680-685). San Francisco, CA, USA.

Hauser, J., Sastry, S., & Kokotovic, P. (1992). Nonlinear control via approximate input-output linearization: the ball and beam example. IEEE Transactions on Automatic Control, 37(3), 392-398.

Huang, J., & Rugh, W. J. (1992a). Stabilization on zero-error manifolds and the nonlinear servomechanism problem. IEEE Transactions on Automatic Control, 37(7), 1009-1013.

Huang, J., & Rugh, W. J. (1992b). An approximation method for the nonlinear servomechanism problem. IEEE Transactions on Automatic Control, 37(9), 1395-1398.

Isidori, A., & Byrnes, C. I. (1990). Output regulation of nonlinear systems. IEEE Transactions on Automatic Control, 35(2), 131-140.

Jagannathan, S., & Lewis, F. L. (1996). Discrete-time neural net controller for a class of nonlinear dynamical systems. IEEE Transactions on Automatic Control, 41(11), 1693-1699.

Kumar, S. S., & Guez, A. (1991). ART-based adaptive pole placement for neurocontrollers. Neural Networks, 4(3), 319-335.

Levin, A. U., & Narendra, K. S. (1993). Control of nonlinear dynamic systems using neural networks: controllability and stabilization. IEEE Transactions on Neural Networks, 4(2), 192-206.

Lewis, F. L. (1999). Nonlinear network structures for feedback control. Asian Journal of Control, 1(4), 205-228.

Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4-26.

Polycarpou, M. M. (1996). Stable adaptive neural control scheme for nonlinear systems. IEEE Transactions on Automatic Control, 41(3), 447-451.

Rovithakis, G. A. (1999). Robust neural adaptive stabilization of unknown systems with measurement noise. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 29(3), 453-459.

Rovithakis, G. A., & Christodoulou, M. A. (2000). Adaptive control with recurrent high-order neural networks: theory and industrial applications. Berlin: Springer.

Slotine, J.-J. E., & Li, W. (1991). Applied nonlinear control (pp. 115-116). Englewood Cliffs, NJ: Prentice-Hall.

Tsakalis, K. S., & Ioannou, P. A. (1993). Linear time-varying systems: control and adaptation (pp. 2-3). Englewood Cliffs, NJ: Prentice-Hall.

Wang, J. (1993). Recurrent neural networks for solving linear matrix equations. Computers and Mathematics with Applications, 26(9), 23-34.

Wang, J. (1997). Recurrent neural networks for computing pseudoinverses of rank-deficient matrices. SIAM Journal on Scientific Computing, 18(5), 1479-1493.

Wang, J. (1998). Multilayer recurrent neural networks for synthesizing and tuning linear feedback control systems. In C. T. Leondes (Ed.), Control and dynamic systems: Neural network systems techniques and applications, Vol. 7(2), pp. 75-126. San Diego, CA: Academic Press.

Wang, L. X., & Mendel, J. M. (1991). Three-dimensional structured network for matrix equation solving. IEEE Transactions on Computers, 40(12), 1337-1345.

Wang, J., & Wu, G. (1996). A multilayer recurrent neural network for on-line synthesis of linear control systems via pole assignment. Automatica, 32(3), 435-442.

Xia, Y., Wang, J., & Hung, D. L. (1999). Recurrent neural networks for solving linear inequalities and equations. IEEE Transactions on Circuits and Systems, Part I, 46(4), 452-462.

Yunong Zhang received the B.S. and M.S. degrees in automatic control engineering from the Huazhong University of Science and Technology and the South China University of Technology, China, in 1996 and 1999, respectively. He is now working towards the Ph.D. degree in the Department of Automation and Computer-Aided Engineering, the Chinese University of Hong Kong, Shatin, NT, Hong Kong. His research interests include nonlinear systems, robotics, neural networks, and signal processing.

Jun Wang is an associate professor of mechanical and automation engineering at the Chinese University of Hong Kong. Prior to this position, he was an associate professor at the University of North Dakota, Grand Forks, North Dakota. He received his B.S. degree in electrical engineering and an M.S. degree in systems engineering from Dalian University of Technology, China. He received his Ph.D. degree in systems engineering from Case Western Reserve University, Cleveland, Ohio. His current research interests include neural networks and their engineering applications. Jun Wang is an Associate Editor of the IEEE Transactions on Neural Networks.
