Post on 25-May-2018
HYPERSONIC VEHICLE TRAJECTORY
OPTIMIZATION AND CONTROL
S. N. Balakrishnan
J. Shen
J. R. Grohs
Dept. of Mechanical & Aerospace Engineering
and Engineering Mechanics
University of Missouri-Rolla
Rolla, MO 65409-0050
FINAL REPORT
July 1997
Grant Number: NAG 1 1728
Hypersonic Vehicles Office
Nasa Langley Research Center
Hampton, VA 23681-0001
https://ntrs.nasa.gov/search.jsp?R=19980017705 2018-06-30T06:29:37+00:00Z
TABLE OF CONTENTS
EXECUTIVE SUMMARY .................................................... 1
BACKGROUND ............................................................ 2
WHY NEURAL NETWORKS ................................................. 3
LITERATURE REVIEW ..................................................... 4
PROBLEM FORMULATION ................................................. 18
OPTIMIZATION/CONTROL ................................................. 20
NUMERICAL RESULTS .................................................... 30
CONCLUSIONS ........................................................... 32
ACKNOWLEDGMENT ..................................................... 32
BIBLIOGRAPHY .......................................................... 33
FIGURES ................................................................ 37
APPENDIX ............................................................... 52
HYPERSONIC VEHICLE TRAJECTORY OPTIMIZATION AND CONTROL
EXECUTIVE SUMMARY
Two classes of neural networks have been developed for the study of hypersonic vehicle
trajectory optimization and control.
The first one is called an 'adaptive critic'. The uniqueness and main features of this approach
are that: 1) they need no external training, 2) they allow variability of initial conditions, and 3) they
can serve as feedback control. This is used to solve a 'free final time' two-point boundary value
problem that maximizes the mass at the rocket burn-out while satisfying the pre-specified burn-out
conditions in velocity, flightpath angle, and altitude.
The second neural network is a recurrent network. An interesting feature of this network
formulation is that when its inputs are the coefficients of the dynamics and control matrices, the
network outputs are the Kalman sequences (with a quadratic cost function); the same network is also
used for identifying the coefficients of the dynamics and control matrices. Consequently, we can use
it to control a system whose parameters are uncertain.
Numerical results are presented which illustrate the potential of these methods.
I. BACKGROUND
For theUnitedStatesto maintainits leadershipin spacetechnology,cheapermeansof space
transportation- alternativesto spaceshuttlemust be developed. In order to developsuchan
alternative,differentconfigurationsof hypersonicvehiclesmustbestudiedfrom the perspectivesof
cost-effectiveperformance.A majorpartof suchstudy involvesoptimal trajectorydesignfor its
missionand control of vehicles. Sincecurrent state of knowledgeof hypersonicvehicles(in
atmosphericflight,especially)is limited,it is imperativethat anytool that isdevelopedfor trajectory
optimization and control be usablewith variationsin flight parameters.Therearequite a few
methods- direct and indirect - availablein the existing literature which deal with trajectory
optimizationandoptimalcontrol. However,theyareeitherill-suitedfor designor donot consider
thedesignphaseof avehicle.First,for eachscenario,typically,a two-point boundaryvalueproblem
needsto be solved. This processcould lead to an enormousamount of time when several
combinationsof scenariosareconsidered.Second,manytrajectoryoptimizationmethodsdo not
directlyyield afeedbackform of control that canbeusedin flight.
In thisstudy,two newneuralnetworkbasedapproacheshavebeenformulatedwhichaddress
the two problemsmentioned. The resultingdesigntechniqueenablesthe user to studyoptimal
trajectoryof hypersonicvehicleswith asetof predeterminedneuralnetworks.
scenarios, this approach is expected to yield near optimal trajectories.
problem in such a way as to produce a feedback control directly.
the gains of the matrices used in a linearized control.
For an envelope of
We formulate the
In the case of recurrent networks,
2
m
II. WHY NEURAL NETWORKS
Use of direct or indirect methods of optimization necessitates having to solve a problem for
each set of initial conditions. This requires determining a separate solution for each possible initial
condition for a given system. Dynamic programming is also a method of determining optimal control
for a family of initial conditions. However, the usual method of solution becomes very difficult to
solve in higher dimensions and nonlinear systems. These methods of solution for control do not
usually yield a feedback form of control in terms of states either.
Other methods of solution also have their advantages and disadvantages. Neighboring
optimal control is beneficial in that the solution of a single two-point boundary value problem
(TPBVP) allows an approximate solution over a limited range of initial conditions. The disadvantage
is that approximation methods such as neighboring optimal control can fail at a distance from the
original TPBVP solution.
Currently, there is no unified mathematical formalism under which a controller can be
designed for nonlinear systems. Techniques like feedback linearization have been used for a few
nonlinear problems under limited conditions, such as equal number of inputs and outputs. More
rigorous and general solutions are available with linearized models; however, they are restricted by
the assumption of linear models. Other available solutions for nonlinear controllers are highly
problem oriented. Consequently, we propose a formulation with neural networks which: 1) solves
a nonlinear control problem directly without any approximation to the system model (in the absence
of a good model this approach can synthesize a nonlinear model of the states), 2) yield a control law
in a feedback form as a function of the current states, and 3) maintain the same structure regardless
of the type or problem (handles linear problems as well). Such a formulation is afforded by the field
of neural networks. In the following sections,we tracethedevelopmentof neuralnetworksand
developmentof learningcontrolin particular.
13I. LITERATURE REVIEW
Thedevelopmentof intelligentcontrol systemdesigntechniqueshasalong andrichhistory
asdoesthefieldof controlsystemsengineeringingeneral.Neuralnetwork techniqueshavealsobeen
usedin control systemsfor quite a longtimebut recentlyhavebecomevery popular. Thissection
containsa brief surveyof the historyof control rangingfrom cyberneticsin the 1940'sthrough
learningcontrol systemsandthe beginningof neural control in the 1960's. The next important
landmarkoccurredwith the useof critic architecturesin reinforcementlearningsystems. We
concludethesectionwith abrief surveyof currentliteratureinneuralcontrol organizedin theareas
of systemidentification,nonlinear,adaptive,andoptimalneuralcontrol.
1. Cybernetics. Neural Networks and Learning Control. Norbert Wiener is recognized
as the father of cybernetics, a field which he describes as "the control and communication in the
animal and in the machine" [I]. Cybernetics also provided some of the motivation for the
development of control theory and neural networks during the 1950's and 1960's. For example,
Ashby contributed two complementary monographs in cybernetics, Desig71for a Brain [2] and An
Introduction to Cybernetics [3] which discussed control and communication in biological systems.
In the former, Ashby gave an early implementation of an artificial neural network called the hemostat.
The latter contribution was a careful development of cybernetics intended to popularize the
technology. Topics discussed include feedback, stability, a black box theory for large systems,
regulation and control in biological systems, and hierarchical control.
4
K. S. Fu givesoneof the first formal descriptionsof learningcontrol in [4]. A learning
controlsystemis acontrolsystemcapableof modifyingits behaviorbasedonexperiencein order to
maintain acceptable performance in the presence of uncertainties. Possible measures of performance
include the amount of time required to adapt to changes and the evaluation of suitable performance
indices. A learning control system is distinguished from adaptive control systems through its ability
to recognize familiar patterns in a situation and, based on past experience, to adjust in order to
improve performance. Adaptive control systems emphasize a control system's ability to react to new
situations.
Sklansky gives an early survey of learning control [5]. According to Sklansky, learning in the
automatic control literature is associated with a hierarchical arrangement of three feedback loops.
These are the controller, a system identifier or pattern recognizer, and a teacher. The pattern
recognizer transforms observable quantities in the system into a fixed set of categories, each of which
corresponds to a set of controller parameters. Categories are represented by fixed regions in an
intermediate feature space. The teacher provides information to the pattern recognizer for adjusting
the boundaries between categories in the feature space so that improved control system performance
results. An adaptive control system uses only the first two loops. The learning loop, which
distinguishes a learning control system from an adaptive control system, sends reinforcement signals
in the form of a reward or a punishment to the pattern recognizer based on an assessment of current
control system performance.
The advantage of the use of the learning loop is that it provides a means of training the pattern
recognizer on-line. Sklansky describes five techniques for the design of learning control systems and
notes their interrelationships and pattern classification. These techniques are decision theory,
trainablethresholdlogic, hill climbing,samplesetconstruction,andMarkov chains. In thedecision
theoretic approach,the boundariesbetweenclassesaredeterminedby estimatingjoint probability
densitiesusingmeasurementstakenfromthe systemduringoperation.Thetrainablethresholdlogic
methodwhich Sklanskydescribesisactuallyaprecursorto theuseof neuralnetworksfor control.
In thismethod,categoryboundariesaremovedby adjustmentof weightedsumsof componentsin
afeaturevector, thisweightedsumisthenpassedthroughathresholdfunctionto produceabipolar
control signal. Theteacherin athresholdlogic learningsystemprovidesinformationfor adjusting
weights in the categorizer. The sample set constructiontechnique breaks categories into
subcategoriesbasedon distancesmeasuredin the featurespace. During traininga fixed set of
prototypefeaturevectorsaredevelopedwith thesubcategoriesgivenby openballssurroundingthe
prototype.Wethenformthecategoryregionsastheunionsofsubcategories.Theboundarybetween
categoriesis formedasa sequenceof hyperplanesperpendicularto hyperplanesjoining prototypes
from eachcategory.
Ideas from decisiontheory, trainablethreshold logic, and sampleset constructionare
prominent in the developmentof neuralnetwork theory. In 1966Nikolic andFu [6] describean
algorithmbasedondecisiontheoryfor on-linelearningcontrolof anunknowndiscretetime plant
withoutanexternalteacher.Controlactionsarechosenfrom afinite set. Theperformanceindexis
theconditionalexpectationof the instantaneousperformanceevaluationswith respectto observed
statesandallowablecontrolactions.ThemodelusedbyNikolic andFu isverysimilarto Sklansky's
generallearningcontrol systemandtheyincludeprovisionsfor thecasewhentheteacherdoesnot
haveperfectknowledgeof theplantbeingcontrolled. Thiswork providesthefoundationfor later
critic basedschemes.
6
Tsypkinalsomakescontributionsin learningcontrol systemsbasedondecisiontheoryand
optimization.In anarticleabout'self-learning'[7], Tsypkindistinguishesbetweenthreemethodsfor
determining decisionrulesin the patternrecognizer. The first method assumesthat statistical
informationis availablein advance.In thiscasestatisticaldecisiontheorycanbeusedto determine
thedecisionrule. In thesecondmethod,thedesignerassumesthat asequenceof correctlyclassified
patternsexists. In thiscase,thedecisionruleis determinedbasedon datain thetrainingsetandthe
methodiscalledlearningwith reinforcement. In thethird case,no informationis assumedinitially
andthedecisionruleis foundusingobservedbutunclassifiedpatternsfrom the system.Tsypkincalls
this third caseself-learningExtensionsof the ideaof self-learningin automaticsystemsappliedto
patternrecognition,identification,dualcontrol,andtheallocationof resourcesarediscussedin a later
work [8] andcompiledinto atext [9].
Theimprovementinperformancewith respectto givenperformanceobjectivesandbasedon
experienceis a commontheme in learningcontrol. There are three componentsrelatedto
performancein the control system:i) the specificationoptimal performanceobjectives,2) the
assessmentof the system'slevelof performance,and3) ameans for improving performance over
time. Cybernetics and learning control are based on the use of pattern recognition, optimization, and
control of uncertain dynamic systems using biologically inspired models of intelligent behavior.
Rudimentary neural networks in the form of linear threshold logic units have been used as an
implementation medium for learning control systems cited above. We now turn to a discussion of
a subclass of learning control systems called reinforcement learning systems which build in methods
for assessing and improving control system performance.
7
2. Learning with a Critic, The ground breaking work on learning control in the 1960's,
along with studies in cybernetics, has led to a study of critic-based systems for two decades and this
study has recently been revived even in the current decade. In 1970, Mendel and McLaren introduced
a concept in learning control which they call reinforcement learning [ I0]. Reinforcement learning
control is developed as a subclass of learning control discussed above with the addition of
performance assessment and a method for modifying controller actions. The idea is to provide a
means of control for unstructured environments where the plant model may not be known or where
a complex performance measure is used [ 11]. In reinforcement learning systems, a critic is used to
monitor plant inputs and outputs and to provide an evaluation signal which represents an indication
of current performance to the controller.
Widrow, Gupta and Maitra [12] describe the concept of the critic for adaptation of neural
networks. Widrow et al., delineate three separate modes of learning. A supervised learning system,
also known as learning with a teacher, modifies the parameters of the neural network using error
between network output signals and the desired output signals. The assumption here is that the
desired output signals corresponding to each input signal are known at the time that learning is taking
place. In an unsupervised learning procedure, also called learning without a teacher or decision-
directed learning, the parameter adjustments are not guided by knowledge of a desired output signal.
Learning with a critic bridges the gap between the two previous methods. Learning with a critic does
not assume that desired output signals are known for each input signal but rather that some indication
can be made with respect to network performance over a series of trials.
Barto, Sutton and Anderson [13] extend the idea of learning with a critic through the
development of a learning system which includes both an adaptive critic element and an adaptive
search element. As in the learningwith a critic approach,explicit desiredcontrol actionsare
unknown.The objectiveis to providecontrolsignalswhichtendto optimizea performanceindex.
Thepurposeof theadaptivesearchelementis to implementatrial-and-errorprocedureto associate
controlvectorswith respectiveobservationsof thestateof thesystembeingcontrolled. Theadaptive
critic elementreceivesasuccess/failuresignalfromanoutsidesourceasa resultof aseriesof control
actions. This signalis calledanexternalreinforcementsignal. The adaptivecritic elementalso
receivesweightedsignalsfrom eachof the statevariableof the controlledsystem. The external
reinforcementsignalprovidesfeedbackfor modifying the strengthsof theseconnections. The
adaptivecritic elementusestheexternalreinforcementsignalandweightedstatesignalsto provide
a continuousevaluationof performanceto helpguide the searchfor appropriatecontrol actions.
Sutton callsthe critic basedadaptationalgorithmthe"AdaptiveHeuristicCritic" anddevelopsits
applicationin creditassignmentproblemsin hisPh.D.dissertation[ 14].
Theimplementationof theadaptivecritic is basedonWidrow's methodfor learningwith a
criticbutprovidesahigherlevelof feedbackto thecontrol system.Two setsof connectionweights
connectingtwo processingelementsareadjustedduringthelearningprocedure.This,in conjunction,
with theactivesearchdistinguishestheadaptivecriticarchitecturefrom previouswork. Theadaptive
critic architectureis capableof learningto balanceapolemountedon amovablecart byapplying
controlsignalsto a movablecartwithnopriorknowledgeof thesystemto becontrolled.Thisability
to determinecontrolactionsassumingnopreviousknowledgeisagreatstrengthof theadaptivecritic
architecture.Thedisadvantageof thearchitectureis that manyfailedtrialsoccurbeforeasuccessful
run iscompleted.Thecart-polesolutionalsodependson thepartitioningof theproblemstatespace
intoafinitenumberof regions.Thispartitioningmaynotbepracticalin problemswherefinercontrol
9
is required. In this case the number of regions required may be too large for effective results.
Examples of such problems include those with time-varying dynamics, tracking problems, and some
nonlinear problems.
Barto et al. [13], distinguish between supervised learning paradigms and reinforcement
learning used in the adaptive critic approach. In the supervised learning approach training proceeds
in several steps. First an input pattern is presented to a neural network. An output response is
produced based on the current parameters embedded within the network. The response is then
compared with a desired response and error is used to modify the neural network parameters to
improve its mapping. Reinforcement learning is based on an evaluation of the current network output
in relationship with current external factors (states in a system for example). This evaluation may be
as simple as a binary decision indicating a reward for proper response or punishment for inappropriate
response. The quality of feedback for a system using reinforcement learning is lower than that
available in a supervised learning system. This property makes reinforcement learning methods useful
for situations when a quantitative answer is not available.
Werbos [15] defends the use of neural networks for control applications. He suggests that
neural networks will be able to solve difficult problems faced by modern controls engineers including
the real-time control of nonlinear possibly unknown systems with high noise levels and high
throughput. Werbos describes five dominant paradigms for use in neural control systems. These are
Supervised Control, Inverse Dynamics, Stabilization Systems, Backpropagation Through Time and
Adaptive Critics with Reinforcement Learning. The Supervised Control architecture uses a neural
network trained to map current state vectors to corresponding control vectors. In the inverse
dynamics approach, observed system state is assumed to be a function of the current control and
10
previoussystemstate. Theneuralnetworkis trainedto invertthe plantin orderto providecontrol
actionswhichleadto desiredstates.Stabilizationsystemsaredesignedto providestablecontrol in
tracking andregulatorproblems.Backpropagationthroughtime dependson a plantmodelanda
performanceindexwritten in termsof control andstateactions. The neuralnetwork predictsa
sequenceof statesgivenasequenceof controlactions.Thebackpropagationalgorithmthenprovides
derivativesof theperformanceindexwhichcanbeusedto updatecontrol actionsat eachstepalong
theway. Adaptivecriticarchitecturesandreinforcementlearningarethefocalpoint in [33]. Werbos
describessystemsbasedon theadaptivecritic asanapproximationto dynamicprogrammingand
presentsthenotionof thebackpropagatedcritic.
Jameson[16] claimsto be thefirst to publishresultsusinga backpropagatedcritic. The
primarydifferencebetweentheadaptivecritic architectureof Barto Sutton,andAndersonandthe
backpropagatedcritic is themaximizationof thecritic output providinggradientinformationvia a
plantmodelnetworkto thecontrollersothatfuture controlactionscanbeimproved.Thepurpose
of the critic network in this architectureis to predict future reinforcementsignalsfrom the
environment. Thecritic networkanda modelof theplantareusedto calculatederivativesof the
predictedreinforcementsignalwith respectto controlactions.The controlactionsarethenmodified
to improve performance. The prediction providedby the critic network is also improvedby
comparingtheactualreinforcementsignalwith previouslystoredpredictions.Thebackpropagated
critic, like previouscritic designs,assumesnoknowledgeof the plantandresultsareimprovedby
makingmultipleattemptsat a solution.
SofgeandWhite[17] advocatethedevelopmentof neuralcontrol architectureswhichcanbe
adaptedon-linefor stableoperationof unknown,nonlinearplantswhich mayincludenoisein the
I1
feedbackloop. Theysuggestthat adaptivecritic architecturesmaybeusedin manufacturingthe
processcontrol applicationsto provideflexibility andefficientadaptabilitythroughchangeswhich
occurduring thelife-cycleof equipment.Theauthorsuseanadaptivecritic architecturebasedon
Albus'CMAC neuralnetwork[18]to doprocesscontrolinathermoplasticcompositemanufacturing
process.Accordingto SofgeandWhite,"the goal of on-linelearningis thereal-timeoptimization
of a largescalenon-linearprocessat minimalcomputationalcost." Theauthorshavedesignedand
built anadaptivecritic systemfor controlof manufacturingprocesses.
Watkins givesa recentimplementationof reinforcementlearningcalledQ-learningin his
dissertation[19]. Q-learningis basedon theapproximationof arealvaluedfunction,calledtheQ-
functionbyWatkins. Theq-functionis afunctionthat mapscurrentplantstateandcontrolinto an
estimationof thefutureperformanceof thesystem.Thisestimateis basedon theassumptionthat
optimalcontrol is appliedto theplantfrom the next timeinstantforward. A Q-learningalgorithm
is an algorithmwhich iteratively improvesthe estimationfor the Q-function. There is a close
correspondencebetweenQ-learninganddynamicprogrammingusedin the control of dynamical
systems[20]. Bradtke[21] distinguishesbetweentwo typesof Q-learningalgorithms.Bradtkecalls
theformdescribedaboveanoptimizingQ-learningalgorithmbecauseit tries to learntheQ-function
directly. A slightly modifiedform calledthe policy-basedQ-learningalgorithmtries to learnan
optimalsequenceof plantcontrolinputs(thecontrol policy).
Manyrecentcontrolsystemapplicationsof the ideasof reinforcementlearningandadaptive
criticarchitecturesexist. Gullapallidescribesareinforcementlearningalgorithmfor learningcontrol.
Thismethodusesradialbasisfunctionsandtheadjustableparametersof thenetworkaremeansand
variancesof normaldistributionfunctions.Themethodisappliedto a simulated3 degree-of-freedom
12
robotic arm [22]. Stamenkovich uses adaptive critic and adaptive search elements for learning to
guide a ship through a channel [23]. Shelton [24] demonstrates an adaptive critic design for
controlling a track with a CMAC (Cerebellar Model Articulated Controller, [18). Tham and Prager
compare the adaptive heuristic critic algorithm with the Q-Learning algorithm for obstacle avoidance
and control in multi-linked robotic manipulators [25]. Gachet et al., present an adaptive heuristic
critic based control system for learning goal based behavior for autonomous robot control. The three
types of behavior discussed are: 1) move to a goal state,
path [26].
3.
2) do surveillance, and 3) follow a specified
Neural Identification and Control. There has been an explosion of reported research
in the use of neural networks in control systems in recent years. Bavarian [27] gives an introduction
to the use of neural networks for intelligent control. Several monographs have been compiled
including a well known work edited by Miller, Sutton, and Werbos [28]. White and Sofge have
compiled a book which includes several chapters dealing with the use of neural networks in intelligent
control systems [29]. Hunt et al., have produced a comprehensive survey of the field [30].
Psaltis, Sideris and Yamamura describe three possible architectures for neural control systems
[31]. The indirect learning architecture attempts to invert the plant in order to provide control signals
which track a given input signal. In the generalized learning architecture the desired plant input signal
is assumed known and the neural network is trained to produce input signals for the next sampling
interval given the current plant output. The result is an output feedback control. The third
architecture is called the specialized learning architecture where the neural network is trained to
provide control to track an input function by minimizing the tracking error.
13
Levin andNarendra[32] presenta theory for the designof neuralcontrol systemswhich
stabilizenonlineardynamicsystemsaboutanequilibriumpoint. This theory is basedonnonlinear
control theory. Thearticlecontainsnecessarybackgroundinformationin nonlinearcontroltheory
and many examplesillustrating the interactionbetweennonlineartheory and the useof neural
networksfor stableregulation. Possiblecontrol methodsfor nonlinearsystemsinclude:1) theuse
of a linearcontrollerwhich assumesthat the plantcanbe linearizedaboutthe operatingpoint, 2)
stabilizingcontrol usingfeedbackstabilizationwherea changein statevariablesanda feedback
controllawareusedto transformasysteminto onewhich is linearaboutanoperatingpoint, and3)
directstabilizationthroughtheuseof anonlinearcontrol law. Neuralcontrol designsaregivenfor
thefeedbackstabilizationanddirect stabilizationmethods.
As statedabove,adaptiveandlearningcontrol systemsdependon theability to identifyplant
dynamics. Therehavebeena numberof contributionsin the useof neuralnetworksfor system
identification. Narendraand Parthasarathydiscussfeedforwardand recurrent neural network
structuresfor identificationandcontrolof systems[33]. Theauthorspresenta methodfor training
recurrentneuralnetworksanddescribenecessaryassumptionsfor wellposedneuralcontrolproblems.
Fernandez,Parlos,andTsaiinvestigatenonlinearsystemidentificationwith neuralnetworksby using
a recurrent networkto identifynonlineardynamicsystemsin discretetimebasedon input-output
measurements.Theresultsareappliedto the identificationof boilerdynamics[34]. Polycarpouand
Ioannou presenta stabilitytheoryapproachto synthesisandanalysisof identificationandcontrol
schemesin nonlinearsystemsusingneuralnetworks[35]. Both gradientandLyapunovsynthesis
approachesareapplied.
14
u
Applications of neural networks in adaptive control have also been investigated by several
researchers. Guez, Eilbert, and Kam [36] propose a neural network architecture for neural model
reference adaptive control. This system adjusts feedback gains so that the closed loop time response
matches a desired time response of a given reference model. Hoskins, Hwang and Vagners [37] use
iterative inversion of a neural plant model to provide control signals to the plant. The method is
applied to a problem in redundant manipulator kinematics, a model reference adaptive control system,
and a linear mass-spring-damper system. Hoskins and Himmelblau use similar techniques with an
emphasis on reinforcement learning applied to process control [38].
Goldenthal and Farre[1 [39] backpropagate the error between the actual plant and a reference
model through a neural network model of the plant and then continue the backpropagation procedure
through the controller network to update controller weights. The technique is demonstrated in a
model reference neural adaptive control system applied to the cart-pole problem. To accomplish this,
the backpropagation algorithm is extended so that the network can function as a closed-loop
controller and to force the closed loop system to match desired reference response.
Lan and Chand also investigate the discrete time linear quadratic regulator problem [40].
They point out that the conventional solution of the problem is an off-line solution. The computed
control history is stored and used later in an open loop control. The disadvantage to this approach
is that it is not robust and does not work for time-varying systems. Lan and Chand formulate an
augmented performance index with the linear constraint equations of the controlled system embedded.
The augmented performance index is then related to parameters in the energy function of a Hopfield
network [41]. The Hopfield network then minimizes the performance index in an iterative fashion
producing the required optimal control.
15
uncertainties.
space model.
Iiguni, Sakai and Tokumam [42] report a nonlinear regulator design which uses feedforward
neural networks to augment a linear quadratic regulator design for a nonlinear plant with parameter
The authors assume that the nonlinear plant can be modeled using a known linear state
This linear model is then used as the basis for a linear quadratic regulator (LQR)
design. The LQR design procedure yields gains for plant state feedback which minimizes a linear
quadratic performance index. We now have a regulator design which may be used with the actual
plant, however, the range of optimal control operation is limited.
Bouzerdoum and Pattison give a method for mapping a class of optimization problems onto
a recurrent neural network architecture [43].
index,
1 TQ x xTyJ (x) = --x2
with respect to vectors x e IR" subject to bound constraints
gi -< Xi < Vi' i = 1,''" ,n
The method minimizes a static quadratic performance
(1)
(2)
where the subscripts indicate components of the respective vectors. This static optimization problem
has a known solution. However, a matrix inversion is necessary and this is computationally intensive
for large dimensional spaces and difficult for ill-conditioned weighting matrices. The recurrent neural
network solution provides a parallel implementation for solving the problem.
Antony and Acar develop algorithms for real-time optimal control of discrete systems with
respect to a quadratic performance index over a finite time interval [44]. Problem formulations based
on the discrete time Hamiltonian for linear and partially unknown nonlinear systems are given. The
method depends on a model of the plant dynamics using a feedforward neural network. Two distinct
methods are given. For the first method, control vectors at each sample instant are modified during
16
every iteration of the algorithm. The second method develops the optimal control by a backward
sweep beginning at the final time. The second method has slower convergence rates but requires less
storage and fewer computations during each iteration.
In this research, we have formulated two types of neural networks. The first one is called an
"Adaptive Critic' architecture. The reason for choosing this structure for formulating the hypersonic
vehicle optimal control problems are: 1) this structure obtains an optimal controller through solving
dynamic programming equations, 2) this approach (see, Figure 1), has a supervisor (critic) which
critiques the outputs of the controller network and a neural network controller. Therefore, this
approach has a built-in fault tolerance, 3) this approach needs NO external training as in other forms
ofneurocontrollers, 4) this is not an open loop optimal controller but a feedback controller, and 5)
it preserves the same structure regardless of the problem (linear or nonlinear).
The adaptive critic method determines an optimal control law for a system by successively
adapting two networks, an action and a critic network. The control law does not need to be
determined a priori mathematically. This method simultaneously computes and adapts the neural
networks to the optimal control policy for both linear and nonlinear systems. In addition, it is
important to know that the form of control does not need to be known in order to use this method.
Since the control law is computed for a range of initial conditions, this approach is ideal for design
studies.
The second approach is to formulate a neural network for simultaneous identification and
control. This uses a modified form of Hopfield neural networks. The need for this network arose
atler the customer indicated that there is a large level of uncertainty in the system parameters. We
anticipated the need for this during the second year and formulated the network while awaiting the
17
POST3Dprogramandinputs. Researchanddevelopmentbasedon this approacharepresentedas
a conferencepaperattheend.Thispaperwaspresentedat the 1996AtmosphericFlightMechanics
ConferenceinJuly 1996at SanDiego,CA. Thispaperis enclosedin the Appendix.
The first part of the rest of this report dealswith the adaptivecritic approach,problem
formulation,algorithmdevelopmentandresults.
le
IV. PROBLEM FORMULATION
Statement of the General Problem
In this study a problem of the form (finite-time with terminal constraints) where a cost
function, J, given by
tf
J=qb(x(tr))0
(3)
subject to differential constraints
:L=f(x,u) (4)
tr-given Xo-given (5)
is considered, x is an n-dimensional state vector, u is an m-dimensional control vector, qb(), qJ(),
and f( ) are linear or nonlinear functions of state and/or control. Xo are the initial conditions and
tf is the final time.
18
o Dynamic Programming Background
We can rewrite Eq. (3)
J(x(t)) =U(x(t),u(x(t))) +<_r(x(t + 1))> (6)
Here, J(x(t)) is the cost associated with going from time t to the final time. U(x(t),u(x(t))) is the
utility, which is the cost from going from time t to time t+ 1. <.I(x(t+ 1)) > is assumed to be the
minimum cost associated with going from time t+l to the final time. If both sides of the
equation are differentiated and we define
).(x(t))- _iJ(x(t))6x(t) (7)
then
,1.(x(t))= 6U(x(t),u(t)) + 6U(x(t),u(t))6x(t) 6u(t)
( +( 6x(t+l) 6u(x(t)) /6x(t+!)/ _.(x(t+l))
_.(x(t+l)) 6x(t) / 6u(t) 6-_ /
(8)
From this it can be seen that if < Z(x(t+ 1)) >, U(x(t),u(t)) and the system model derivatives axe
known then _.(x(t)) can be found.
Next, the optimality equation is defined as
6J(x(t)) _ 0
6u(t)(9)
Dynamic programming uses these equation to aid in solving an infinite horizon policy or to
determine the control policy for a finite horizon problem.
19
3. Training Methods (Approximation Techniques)
This study uses Eqns. (8) and (9) in order to determine the optimal control policy. The
basic tr_ing takes place in two stages, the training of the action network (the network modeling
u(x(t)) and the training of the critic network (the network modeling, or approximating 3.(x(t)).
Both networks are assumed to be feedforward multiple layer perceptron networks.
The schematics of the controller (action) and critic networks are presented in Figures 2 and
3. To train the action network for time step t, first x(t) is randomized and the action network
outputs u(t). The system model is then used to find x(t+ 1) and (Sx(t+ 1))/(Su(t)). Next, the
critic from t+ 1 is used to find X(x(t+ 1)). This information is used to update the action network.
This process is continued until a predetermined level of convergence is reached.
In order to train the critic network for the time step t, x(t) is randomized and the output
of the critic 3.(x(t)) is found. The action network from step t calculates u(t) and (Su(t))/(6x(t)).
The model is then used to find (6x(t+ 1))/(Sx(t)), (6x(t+ 1))/(6u(t)) and x(t+ 1). The critic from
step t+ 1 is then used to find 3.(x(t+ 1)). After this, Eq. (8) is used to find X'(x(t)), the target
value for the critic. This process is continued until a predetermined level of convergence is
reached. In an infinite-dimensional problem, the training ends with one stage; however, for a
finite dimensional problem, such as this study, this series of steps is used at each stage. This
process will be explained in detail in the next section.
V. OPTIMIZATION/CONTROL
Motivation for the formulation in this section comes from the need of the customer in that
they would like to study the trajectories from the scramjet turn-off.to the rocket burn-out conditions
2O
of a certain vehicle. The reason for this is the uncertainties in the parameters of the earlier stage
designs. Consequently, there will be an envelope of conditions from which the rocket will have to
start and yet carry the payload to the pre-specified burn-out conditions. It is assumed that the rocket
burn-out conditions will ensure a proper apogee through the coasting period.
The cost function is given by
J
where
=
=
m
v
Y
h
Si
Subscripts
f
1 1
J = - Slmf+-_-S2(vf-vfD)2 +_'S3_f-YfD) 2
+ ! S4(hf- hfD)22
cost function to be minimized
mass
velocity
flightpath angle
altitude
-- weights on the final conditions
(10)
= final
fD -: desired final
Note that this cost function maximizes the final payload while ensuring that the velocity, the flightpath
angle and the altitude at the final time are as close to the finaVdesired burn-out conditions as possible.
The equations of motion are given by
21
rh T
glp
In = vsiny
(11)
(12)
,;' = (T cos_ -D)/m - gsiny/r 2 (13)
where
(v 1? = (Tsinct +L)/mv + - - g/r2v cosyr
T - thrust
1
L = kl, a _- pv2S = lift
D -= _-pvES - drag
la = gravitational constant
r = radial distance from the center of the earth
R_ = radius of the earth
I = specific impulsesla
ct ; angle of attack
(14)
A schematic of the scenario is presented in Figure 4. Final time is unknown. That means, this is a
'flee-final time' problem. There is no solution in the current literature for solving the 'free-final time'
problem for an envelope of initial conditions (other than the general method of dynamic
programming).
In order to solve this problem with neural networks, we transform it to one where altitude is
the independent variable. Through this step, we convert it to a problem where we can break it down
22
m
into several segments of altitude; this also allows us to reach the final desired altitude in all cases.
The initial conditions for this scenario are the possible final conditions from the termination of
scram jet.
This is a two-point boundary value problem where the initial conditions are known but the
final conditions are unknown. Usually, it is solved for a given set of initial conditions; however, in
this project we develop an adaptive critic-based solution which will solve the problem for an envelqpe
of initial conditions. By reformulating the model, we are able to remove altitude from the cost
function since the final condition in altitude is satisfied exactly.
The reformulated equations of motion with altitude, h as the independent variable, are given
by
dm/dh = T/gIp. 1/vsiny (15)
1 ldv/dh = mvslny - g / v (16)
COS t_ -
where the drag coefficient has been approximated with a parabolic drag polar with a least squares fit.
dy/dh TsinO_mv2siny+kl_lpv2S 1
In Eqns. (15-17), where lift coefficient CL has been approximated with a linear least squares fit.
23
CDo ,K 2,K l = constants
glocal acceleration due to gravity
In order to calculate the flight time, a fourth equation is added as,
dt/dhv sin y
For solutions with neural networks,
single-step discrete equations as:
(18)
we convert these nonlinear differential equations to
ink, I = m k -_Tk I ) Ahgklsp vksinYk
(19)
vk, t =v k+[(T kcos%- Dk)/m kvksinY k - gk/VklAh (2O)
Yk*l = yk +[I •T k sin % + Lk)/m k v 2 sm Yk + rk Vk Vk sm Yk]
1tk. i = t k +vksinYk
Ah
(21)
(22)
where
Ah
k
step size in altitude
stage
24
The corresponding Hamiltonian of the optimized problem is
Hk = )" mR + )'vk Vk" + )'_'k,Ykmk, 1 +i ",1 I . -I
where
Xk+!
Lagrangian multiplier for variable x at stage (k+l).
(23)
The propagation equations for the Lagrange's
differentiation of the Hamiltonian with respect to the states.
multipliers
They are:
are obtained by partial
X 0Hk , ]T- , x =- [m v,yxk _X k
(24)
Xm k
Ah
mk+l 2
m k
T k cos a k - D k
v k sin Yk
"XVk,I
T k sin a k --+Lk lS - Xy_., JVk staY k
(25)
"_'Yk =
Xv k Vk, 1
Ah
v k siny k Tk'nk" gk Isp
gk sinYk+
V k
+).Vk*l
T k cos a k
mkV k
I 2T ksina k+_'Yk., mk V 2
2gkcosy k
2
V k
Ah [ TkCOtYk ).
_"tk., + vksinyk [ gklsp mk+,
-(T kcosak - Dk)COtYk")'Vk-' mk
(TkSinak +Dk)COtYk/_Yk-t -- 2
mkV kV kr k
1
V k
2:k]1]
Vk
(26)
(27)
25
Note that %k+_is needed to solve for _k. The boundary conditions for the multiplier equations are
0J) (28)
Optimal control is obtained by partially differentiating the Hamiltonian with respect to the control.
In our case, angle of attack, cq is the control variable. We get
0Hk - 0 (29)
This gives
/_' Vk. I
_Yk*l
T k coso_ k + k I _ 9k Vk S = 0
V k
(30)
First, we solve for the control at the (N-l) 'h stage where N is the preselected number of stages.
That is, (aRer using small angle (_)) assumption
)'vN T_-t+kt_ py-Lvy-ls /vY-I = 0
(31)
26
Note that
/-vs = S2@N-VfD) (32)
Iys = S3(Yy-YFD). (33)
By substituting for IvN and Iyg in Eq. (31), we get
+ $3 N-Y_] TN-I÷kI_'PN-IVN-IS /Vs-1
(34)
We substitute for vs and YN in Eq. (34) in terms ofvs_ 1 and Ys._ by using propagation equations, Eq.
(25-27).
+ $3 [ YY-I
VN- 1 +
+
( 2) s t- TN - 1 CD D + k2 aN- 1 qN- 1 _ qN- I Ah
mN-1 VN_ 1 sinYN_ 1 VN_ 1
- reD] [ - (Trv_t + 2 k_ qS-_) _N_,]
/._.•,qN_ls.._l/ vN_l qN_l)mN_ 1VN-i sinyN_ _ rN-t VN-_
f'-' qN-1sI:°._1cOtYN- 11VN - 1
(35)
27
where the dynamic pressure qN-t is
1 2
qN-t - 10N-I VN-12
(36)
This leads to a cubic equation in _S-tas
T2T 3 o__t + (TIT 3 +T_T6) C_y_L + T 4 T_ =0 (37)
where
TI
TN-I - CDDqN-I S
mN - 1 VN - 1sin %_ x
qN-I
VN - 1
Ah- (38)
r 2
S 2 k2qN_ 1 S
mN-IVN-lsinYN-1
Ah (39)
T 3
T 4
- (TN_, +21qqs_,S )
S3[Yy_t + (VN-_I qN-t )cOtYN-,rN - 1 VN - I VN - 1
Ah-
(40)
(41)
Y s
T 6
TN-t + kl qN-1 S
VN - 1
$3 T5 / (mN-I sinYN-l vN-t)
28
(42)
(43)
Wecanobservethat all quantitiesareknownin termsof quantitiesat N-1. That is c_s., is available
asa feedbackcontrol basedonstatesatN-1.
For all other stages,k, we obtainthe expressionfor control in termsof the Lagrangian
multipliersat k + 1.
)'Yk., 1 Tk + kl qk S0_K -
_vk+i Vk Tk + 2k2 qk S
(44)
How do we construct the neural networks to solve this problem?
1. Solve for c_s._ in terms of my. L, vs. ,, Ys.I
Generate various _s-_ by changing ms. 1 , vs.1, Y,-z._•
Use a neural network to output as. 1 for m._.,, vs. t, Ys-, .... called _,,,., network.
[ We have optimal c_s. l now]
2. In order to solve for % (k=-0,1,2...N-2), of mk, vk, Yk, we need 3.m_., , 3.v_._ , )'Yk., ' So, use
)-s, ms-,, VN-I, YN-I and c_s. ' from step 1 to solve ford.ms, ._ , 3. N_, , and )_y__, using the ).-
(backward) propagation equations, Eq. (25-27). Train a neural network with ms.,, vs.,, Ys._
as inputs and ks-_ as output. Call this )-s-_ network.
[We have optimal ).s-_ now.]
How do we construct other networks?
3. Assume different values of ms. 2, vs.2, Ys-z and use a neural network to output e%,. This will
no, be optimal. Use rosa, vs.z, Ys-z, cos-2 in state equations, Eq. (19-21), to obtain m, v, y at
N-1. Use these states in 3.s. t network to output )-s-t. Use these 3._._ in optimal c_ equation,
Eq. (44), to compute (as.,) ,_,,_,. Continue this process till convergence.
[We have optimal _s-2 now.]
29
4. Assumedifferentvaluesof ms.2,vN.2,YN-2andusethemto get t_y.2 from aN-2 network. Use
all these in the state propagation equations to calculate states at N-1. Input these states in
)'s-_ network to get Xs. _. Use this XN._ and states and control at N-2 to find )-N-2 from the _,
propagation equations. Construct a ,ks.,, network to output ks. 2 with ms.2, vs.2, Ys-2 as inputs.
[We have optimal Ys-2 now.]
5. Assume different values of m_. 3, vs. 3, YN-3 and construct an aN. 3 network similar to ¢tN.2
network in step 3.
6. Construct a XN.3 network similar to Xs. 2 network in step 4.
Continue this process from k = N-l, N-2, ....0
How do we use these networks to generate optimal trajectory from given initial conditions?
Assume any mo, Vo,¥o and l_ [within the trained range]. Use q, neural network to find optimal t_ and
integrate till h for a_ network is reached. Use the m_, vt, y_ values to find t_t from the ct, neural
network and integrate till h2 is reached, and so on, till hr is reached.
Note that the forward intem'ation can be done in terms _)f_;im¢. and note that the Lagrange multiplier
network, used in the controller synthesis, is not needed now.
VI. NUMERICAL RESULTS
In order to verify the applicability of the adaptive critic approach to flexible trajectory
optimization, we used the rocket vehicle contained in a test case sent by the customer. We present
the results corresponding to two stages of neural-controlled trajectories from the burn-out of the
rocket in Figures 5-15. The desired end conditions are vf = 7617 ft/sec, yf = 16.636 deg., hf =
243,600 ft. In trying to match the final conditions, the values ofS_, S.,, and S3 are chosen to be 1,
30
1,and106.Thismeansthatwedesireto try andmatchthefinal flightpathanglemorecloselyrelated
to maximizationof final weightandmatchingthe desiredfnat velocity. Effect of changesto initial
flightpathanglearepresentedinFigures5-7. Wehavefixedtheinitial velocityandmassandchanged
theinitialflightpathangles. It canbeobservedfrom Figure5that afterfollowing differentpathsof
velocityin Stage1for thefirst 12.2seconds,all the l0 pathstry to converge;thesametrendcanbe
seenin Figure6 whichshowsthe flightpathanglehistories. Due to the relativeemphasison the
flightpathangle,wecanobservethat theflightpathanglesaremoreconvergentto thedesiredfinal
valuethanthevelocities.Theweighthistoryisalmostinvariantsincethe thrustisalmostconstant.
Figures8-I 1representthemass,flightpathangle,velocity,andaltitudehistorieswith timewherewe
changetheinitialmassin steps.Theeffectivenessof thisformulationis clearfrom theflightpathangle
historypresentedinFigure9. Eventhoughtheinitial step(dueto changesin mass)leadsto different
flightpathangles,thecontrolfrom thelaststagebringsthemveryclose.Althoughvelocitiesappear
divergent,it shouldbeobservedthattheyarescatteredcloseto thedesiredfinalvalue. Thealtitude
historyisverycloseto thesameinall thecasesasexpectedandsatisfiesthefinal condition. Figures
12-15representthestatevariablehistoriesdueto changesin thevelocities.Dueto thedivergence
of theflightpathanglevalueat theendof the first stage,thesecondstagevelocitiesshowapparent
deviationsfrom the desiredvaluesothat theresultingsecondstageflightpathanglescanbecloser
to thedesiredvalue. Theslightvariationsin thefinal altitudearedueto theforward integrationin
timewhichwe limitedto 20.4seconds.
31
VII. CONCLUSIONS
An approachto solving'free finaltime' problemswith anenvelopeof initial conditionshas
beenproposed.Thisapproachcalled'the adaptivecritic' consistsof two neuralnetworksat stage
developedinabackwardsweep.Afterdevelopment,onlythecontrollerisusedin forward integration
of trajectories.Numericalresultsfrom thelaststageof a launchvehicletrajectory(providedby the
customer)showthatthisapproachworkswell andcanbeusedin design.Furtherwork will involve
integrationwith POST3D,considerationof theotherphasesof flight etc.
VIII. ACKNOWLEDGMENT
We gratefullyacknowledgethepartialsupportprovidedbyNASA grantNAG-l-1728.
32
-- BIBLIOGRAPHY
[1
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
N. Wiener, Cybernetics: or Control and Communication #1 the Animal and the Machine.
Cambridge, MA: MIT Press, 1949.
W. Ashby, Design for a Brain. New York, NY: Wiley & Sons, 1952.
W. Ashby, An hztroduction to Cybernetics. New York, NY: Chapman & Hall, Ltd., 1957.
K. S. Fu, "Learning Control Systems," in Computer andbzformation Sciences (J. T. Tou and
R. H. Wilcox, eds.), pp. 318-343, Washington D. C.: Spartan Books, 1964.
J. Sklansky, "Learning Systems for Automatic Control," IEEE Transactions on Automatic
Control, Vol. AC-11, pp. 6-19, January 1966.
Z. Nikolic and K. Fu, "An Algorithm for Learning without External Supervision and Its
Application to Learning ControI Systems," [EEE Transactions on Automatic Control, Vol.
AC-11, pp. 414-422, July 1966.
Y, Tsypkin, "Optimization, Adaptation, and Learning in Automatic Systems," in Computer
andbformation Sciences-KK (J. T. Tou, ed.), pp. 15-32, New York, NY: Academic Press,
1967. Proceedings of the Second Symposium on Computer and Information Sciences held
at Battelle Memorial Institute, August 22-24, 1966.
Y. Tsypkin, "Self-Learning--What is it?," IEEE Transactions on Automatic Control, Vol.
AC-13, pp. 608-612, December 1968.
Y. Tsypkin, Adaptation and Learning in Automatic Systems. New York, NY: Academic
Press, 1971.
J. Mendel and R. McLaren, "Reinforcement Learning Control and Pattern Recognition
Systems," in Adaptive, Lectrnmg, cmd Pattern RecogT#tion Systems: Theory and AppBcations
(J. Mendel and K. S. Fu, eds.), pp. 287-318, New York, NY: Academic Press, 1970.
A. Barto, "Connectionist Learning for Control: An Overview," in Neural Networks for
Control(W. T. M. l-I, R. Sutton, and P. Werbos, eds.), ch. 1, pp. 5-58, Cambridge, MA: MIT
Press, 1990.
B. Widrow, N. Gupta, and S. Maitra, "Punish/Reward: Learning with a Critic in Adaptive
Threshold Systems," IEEE Transactions on Systems, Matt., and Cybernetics, Vol. SMC-3,
pp. 455-465, September 1973.
33
[13]
[14]
[15]
[16]
[17]
[18]
[19]
A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike Adaptive Elements That Can
Solve Difficult Learning Control Problems," IEEE Transactions on Systems, Matt., and
Cybernetics, Vol. SMC-13, pp. 834-846, September/October 1983.
R. Sutton, Temporal Credit AssigvmTent m Re#forcement Learning. Phi9 thesis, University
of Massachusetts, Amherst, 1984. This is the origin of the Adaptive Heuristic Critic
algorithm.
P. Werbos, "Backpropagation and Neurocontrol: A Review and Prospectus," in Proceedings
of the International Joint Conference on Neural Networks, Voi. I, (Washington D.C.), pp.
209-216, June 18-22, 1989.
J. Jameson, "A Neurocontroller Based on Model Feedback and the Adaptive Heuristic
Critic," in Proceedings of the bTternational Joint Conference on Neural Networks, (San
Diego, CA), pp. II37-II44, IEEE, June 1990.
D. A. Sofge and D. A. White, "Neural Network Based Process Optimization and Control,"
in Proceedings of the 29th Conference on Decision and Control, (New York, NY), pp. 3270-
3276, IEEE, December 1990.
J. Albus, "A New Approach to Manipulator Control: The Cerebellar Model Articulated
Controller (CMAC)," Transactions of the ASME Journal of Dynamic Systems, Meas_lrement
and Control, Vol. 97, pp. 220-227, September 1975.
C. Watkins, Learning with Delayed Rewards. Phi) Thesis, Cambridge University, 1989.
[20]
[21]
[22]
[23]
[24]
C. Watkins and P. Dayan, "Q-learning," Machine Learning, Vol. 8, No. 3-4, pp. 279-292,
1992.
S. Bradtke, "Reinforcement Learning Applied to Linear Quadratic Regulation," in Advances
in Neuralbforma#on Processing Systems, pp. 295-302, San Mateo, CA: Morgan Kaufmann
Publishers, 1993.
V. Gullapalli, "A Stochastic Reinforcement Learning Algorithm for Learning Real-Valued
Functions," Neural Networks, Vol. 3, pp. 671-992, 1990.
M. Stamenkovich, "An Application of Artificial Neural Networks for Autonomous Ship
Navigation Through a Channel," in Proceedings of the Vehicle Navigation & Information
Systems Conference, (Dearborn, M-I), pp. 475-481, October 20-23, 1991.
R. Shelton and J. Peterson, "Controlling a Truck with an Adaptive Critic CMAC Design,"
Simulation, Vol. 58, pp. 319-326, May 1992.
34
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
C. Tham and R. Prager, "Reinforcement Learning Methods for Multi-Linked Manipulator
Obstacle Avoidance and Control," Tech. Rep., Cambridge University Engineering
Department, Trumpington Street, Cambridge CB2 1PZ, UK, March 25, 1993.
D. Gachet, M. Salichs, and J. Pimentel, "Learning Emergent Tasks for an Autonomous
Mobile Robot," Tech. Rep., Dpto. Ingenieria, Universidad Carlos III de Madrid, Spain, 1994.
B. Bavarian, "Introduction to Neural Networks for Intelligent Control," IEEE Control
Systems Magazine, Vol. 8, pp. 3-7, April 1988.
W. Miller, R. Sutton, and P. Werbos (Eds.), Neural Networks for Control. Cambridge, MA:
The MIT Press, 1990.
D. White and D. Sofge (Eds.), Handbook of hTtelligent Control: Neural, Fuzzy, and Adaptive
Approaches, New York, NY: Van Nostrand Reinhold, 1992.
K. Hunt, D. Sbarbaro, R. Zbikowski, and P. Gawthrop, "Neural Networks for Control
Systems-A Survey," Automatica, Vol. 28, No. 6, pp. 1083-1112, 1992.
D. Psaltis, A. Sideris, and A. Yamamura, "A Multilayered Neural Network Controller,"
IEEE Control Systems Magazine, Vol. 8, pp. 17-21, April 1988.
A. Levin and K. Narendra, "Control of Nonlinear Dynamic Systems Using Neural Networks:
Controllability and Stabilization," 1EEE Transactions on Neural Networks, Vol. 4, pp. 192-
206, March 1993.
K. Narendra and K. Parthasarathy, "Identification and Control of Dynamical Systems Using
Neural Networks," 1EEE Transactions on Neural Networks, Vol. 1, pp. 4-27, March 1990.
B. Fernandez, A. Parlos, and W. Tsai, "Nonlinear Dynamic System Identification Using
Artificial Neural Networks (Anns)," in Proceedings of the International Joint Conference on
NeuralNetworks, Vol 2. (New York, NY), pp. 133-141, IEEE, June 17-21, 1990.
M. Polycarpou and P. A. Ioannou, "Identification and Control of Nonlinear Systems Using
Neural Network Models: Design and Stability Analysis," Tech. Rep. Report 91-09-01,
Electrical Engineering, University of Southern California, Sept. 1991.
A. Guez, J. Eilbert, and M. Kam, "Neural Network Architecture for Control," IEEE Control
Systems Magazine, Vol. 8, pp. 22-25, April 1988.
D. Hoskins, J. Hwang, and J. Vagners, "Iterative Inversion of Neural Networks and Its
Application to Adaptive Control," IEEE Transactions on Neural Networks, Vol. 3, pp. 292-
301, March 1992.
35
[38]
[39]
[40]
[41]
[42]
[43]
[44]
J. Hoskins and D. Himmelblau, "Process Control Via Artificial Neural Networks and
Reinforcement Learning," Computers and Chemical Engineering, Vol. 16, pp. 241-251, April1992.
W. Goldenthal and J. Farrell, "Application of Neural Networks to Automatic Control," in
Proceedings of the AIAA Conference on Guidance, Navigation, and ControL, (Portland,
OR), pp. 1108-1112, August 20-22 1990.
M.-S. Lan and S. Chand, "Solving Linear Quadratic Discrete-time Optimal Controls Using
Neural Networks," in Proceedings of the 29 'h Cot_erence on Decision and Control,
(Honolulu, HI), pp. 2770-2772, December 1990.
D. Tank and J. Hopfield, "Simple Neural Optimization Networks: An a/d Converter, Signal
Decision Circuit and a Linear Programming Circuit," IEEE Transactions on Circuits and
Systems, Vol. CAS-33, pp. 533-541, May 1986.
Y. Iguni, H. Sakai, and H. Tokumaru, "A Nonlinear Regulator Design in the Presence of
System Uncertainties using Multilayered Neural Networks," IEEE Transactions on Neural
Networks, Vol. 2, No. 4, pp. 410-417, 1991.
A. Bouzerdoum and T. Pattison, "Neural Network for Quadratic Optimization with Bound
Constraints," 1EEE Transactions on Neural Networks, Vol. 4, pp. 293-304, March 1993.
J. Antony and L. Acar, "Real Time Nonlinear Optimal Control Using Neural Networks," in
Proceedings of the American Control Cotference, Vol. 3, (Baltimore, MD), pp. 2926-2930,
June 29-July 1, 1994.
m
36
C_'_rZ'ZC
_f_QDEL Dq':r_ K'y
: ..... > TARGET= _A@:)x(:)
Figure I: Adaptive Critic for Control
37
I
i
X
c_
S
c-b
g\
L
cO
cn_
i
°_
w°_
J
v
V
Z_ v>4
v
c"0
V
X
c-O
L
V
XV
-,-I
0
_JZ
0.,-t
0
io
-,-I
38
m
X
<9 --
.,4
0
Z
0
.,-4
,o
39
Local VerticalT
LL_ _ _/, vV
__ Local Horizontal
\ \ \_ _ /_ Center of the earth
Figure 4: Schematic of the Trajectory
Optimization Scenario
40
7600
7550
o 7500
.m
o0
> 7450
7400
73500 2 4 6 8
Figure 5 :
0 12 14 16
Time (sec)
Velocity History
(with changes in initial gamma)
18 2O
4[
18"2I
18
17.8 ........................ ........................ : ..........................
17.6
_17.4 .........
"U
17.2 ..............................EE
© 17
16.8 ................. ........ ......... : .........
16.6 ............... ........................... :......
16.4 .................. : .................................................. : ............................
16.2 I , , , r0 2 4 6
Figure 6:
8 10 12 14 16 18
Time (sec)
Gamma History
(with changes in initial gamma)
2O
42
x 10 410.3
10.2 .........................................................................
10.1 ..............................................................
10 ........
o719.6 ........................................................................
9.5 ..................................................................... .........
9.4 ' ' _ '
-_ 9.9mv
f-CYl
9.8
0 2 4 6
Figure 7:
8 10 12 14 16 18
Time (sec)
Weight History
(with changes in initial gamma)
20
43
10.3x 104
10.2
10.1
10
%"9.9
1v
c-G3
9.8
9.7
9.6
9.5
9.40
Figure 8:
I ! t ' 1 I
2 4 6 8 10 12 14 16 18 20
Time (sec)
Weight History
(with changes in initial mass)
44
18
17.5
03d3
v
E
17
16.50 2 4 6
Figure 9:
8 10 2 14 16
_me (sec)
Gamma History
(with changes in initial mass)
18 20
z_5
7620
7600
7580 ....................... ................................. _................
7560 .........................................................................
"6"7540 ................................................
_7520 ...........................................................
0
> 7500 ....................
7480
7460
744O
74200 2 4 6 8
Figure i0:
10 2 14 16
Time (sec)
Velocity History
(with changes in initial mass)
18 2O
_6
2.5
2.4
2.3
v
_2
.-AM
<
2.1
2
1.90
x 10 s
: : : : :
F l
2 4 6 8 10 12 14 16 18 20
Time (sec)
Altitude History
(with changes in initial mass)
Figure II:
47
18
0-_
17.8
17.6 -
17.41
_3'u 17.2 ....................
EE
©17
16.8
16.6 ................ : .......... ..................................
16.4- ..........................................................................
16.2 L , , i r ' ,0 2 4 6 8
Figure 12:
10 12 14 16 18
Time (sec)
Gamma History
(with changes in initial velocity)
2O
48
x 10 4.10.3 , ! b
10.2 .................. ......... ................. .................. : ........................
10.1 ........................ ......... : ....... .......................................
10
%-.-Q 9.9
c_
9.8
9.7
9.6 ......... :......... : ........ :................... :.......... ' ......... i ..........................
9.5 ............... ....................................... : .....................
9.4 , , i !0 2 4 6 8
Figure 13:
10 12 14 16 18
Time (sec)
Weight History(with changes in initial velocity)
2O
_9
7650
7600
7550
75oo
07400
7400
7350
7300 L0 4 6
Figure 14:
8 10 12 14 16 18 20
Time (sec)
Velocity History(with changes in initial velocity)
50
2.5x lO s
0 2 4 6
Figure 15:
8 10 12 14 16 18
Time (sec)
Altitude History
(with changes in initial velocity)
2O
51
APPENDLX
52
APPENDIX
A Class of Modified Hopfield Networks
for Aircraft Identification and Control
Jie Shen S.N. Balakrishnan"
Department of Mechanical and Aerospace Engineering
and Engineering Mechanics
University of Missouri-Rolla
Rolla, MO 65401
(573)341-4675
Abstract
This paper presentsa classofmodifiedHopfieldneuralnetworks and theiruse in
solvingaircraftoptimalcontroland identificationproblems.This classofnetworksconsistsofparallelrecurrentnetworks which have variabledimensions that can be
changed tofitthe problemsunder considezal;ion.Ithas a structureto implement an
inversetransformationthatisessentialforembedding optimalcontrolgainsequences.
Equilibriumsolutionsaxe discussed.Energy minimizationofthe networks leadstoidentificationofthe system parameters.Numerical resultsareprovided toidenti_v
the dynamics ofan aircraft,and the correspondingoptimalcontroliscalculatedon-
line.Comparison of the neuralnetwork solutionswith point-wiseoptimalcontrol
using LQiW.formulationforthismuitivaxiablecontrolproblem shows nearidencica/
resultsthroughoutthe trajectories,
I Introduction
There has been a spurt of activities in the area of
artificial neural networks (ANN) during the last ten
years. For a survey of the ANN work done in the
areas of identification and control, see bibliography.
There are two types of networks used in almost allANN applications. The first is the more widespreadfeedforward network and the second is a less un-
derstood recurrent network. The feedforward net-
works where data flow is unidirectional are essen-
tially static; the recurrent networks, on the otherhand, are based on feedback connections. Due to
feedback connections, the recurrent networks are
better suited for control problems which are based
on closed-loop solutions.
In this paper, a variation of the Hopfield net-
work is proposed. Compared to the classic Hopfieid
network, it keeps the characteristic of energy min-
imization, which is used to minimize the identifi-
cation errors. The mean-square error is used as a
performance criterion in system identification, andis formulated in an energy form to utilize the net-
work functionality. Based on the equilibrium analy-sis, these networks can perform an inverse transfor-mation on matrices and other auxiliary mathemat-
ical operations. This feature allows the networksto give out optimal control gain sequences based on
the identified system parameters. In addtion, this
class of networks has more degrees of freedom thanthe classic Hopfield networks. The network archkec-
ture can be augmented according to the problems athand.
The modified Hopfield network is analyzed in sec-
tion 2. Its identification application is presented in
section 3, while the control application is in section
4. Both the principles and examples are given in
"Associate Fellow, AIAA (to whom all correspondence should be sent)
I
American Instituteof Aeronautics and Astronautics
eachind_viduMsection.Conclusionsarepresentedin section5.
2 Nlodified Hopfietd iN-etworks
2.1 Stability
T_e modified Hopfie[d network isa varianto[ the
da_sicad Hopfield network. Fig (1) shows ks basicfeatures.
We will demonstrate its stability by analyzing its
dynamics and using energ-y _ncfion. The network
has two clusters of neurons. The right part of the
networks is characterized by outputs @i which are
nonlinea_ functionsf of theirs_a_eu_
% = f(_,) (1)
where
uj =iw#vi-b_, j=t,2 .... ,m (2)i=].
with bj the exogenous input current, and v_ the out-
pu_ of the left cluster of amplifiers. Conductance wijconnects the output of the j'thneuron to the input
of the i'thneuron, which are indicatedinFig (i) as
I.
The leftpar_ of the networks ischaracterizedby
the dynamics. The amplifiershave input conduc-
lancesand capacitancesdenoted asgiand c_,respec-
tive!y.They both represen_ the a_mpiifiers'particle
input impedance and are responsibleforthe appro-
priatedme-doma/n behavior of the entirenetwork.
._.t_he same time, we assume that _he responsetime
of _(uj) isnegligiblysmall compared to that of the
ampkLfiersg(u_).
Under these assumptions, f(irchhoff'slaw _ves
US
d_ _-C_ d--?= -a_ - Gim - y__w#_j,
:=_ (3)
(i = ].,__.... ,_)
where Gi denotes _he sum of all conduc_ances con-
nec:ed to the input of _he i_h neuron and is equal
tO
Gi _ag, --fi _i] (4)1_--L
and ai isthe exogenous knput current.
Using Equation (1), the above formula can beexpressed as follows
j=t k=_
(i = t, <..., n) (5)
We now d&,ne _he followingLiapunou fi.mc.ion
as an energy function Efor the modified Hopfieidne_worka
E(_) a_u_ + F w_u_ - bjk=l 7=I
+_a, _-_(_)d_, (_)
Define
dF(:)/(_)= __- (r)
The components of the g-rad[ent vector of the as-
sumed energD" [unc_ion (6) can be expressed by find-
ing its derivatives as follows
OE(v) i ±-- = ai + Giai + w_if( w_jv_ - b_):=_ _=_ (8)
The tkne derivative of the energT _nction can
now be expressed using the above equations
dE
dt -_ al + u_Gi + fujii=t 7:1
fi dvi du{= - C, _-T" _-7
-- - i c'_-t (Vd k at /i=t
(9)
Since C, > 0, and g-t(vi) is a monotonically La-
creasing .;unction n, :he sum on the righ_ sight of (9)is aormegal:ive, and there/ore w e have dE�dr <_ O,unless d_i/dt = O, in which case dE�dr = O. Thismeans that the evolution of dynamic sys=em (5) in
state space always seeks the minima of the energysurface E. Imega=ioa of Eqs. (5) and (6) shows
tha_ _he outputs v i do followgTadJen_descen_ pathson the E surface.
2
American Institute of Aeronautics and Astronautics
----o-----------O
_-_b--_T --,_---._I °' i :'
\¢/ ....
; I
1 I
°i
,Qio
Fiomare h .Modified Hopfie[d Networks
2.2 Solution
In order _o get the anal_ic expression for the con-
verged value of the networks, we assume small sig-
nals and that they work in the linear region of the
amplkier. 'Note that in _he above derivation, there is
no di_,erence if.-redenote the connection matrices in
_he leftand right adjoint subnets separately. These
connection matrices are no<king but the weights w_.
Le_ _he righ_ connection matrix be DI, and the lef_
connec:ion matrix be Do_, the stability conclusion
stillholds. [inder these mild assumptions, and with
ffrchho_s law, we can have a relation in a matrix
form as
dl/
C de - a- GU - D_Q (i0)
q = K (D V-b)
= Ko(KID?U - b) (II)
where a and b are the exogenous inouts of the ad-
join_ networks G and F respectively. U is the inpu_
to G and V is the output: of G. _,Vealso assume
tha_ all aa:npiiSergains Kt fn G are equal. Similarly
_he ga/ns of aanpki_ers in F are K2. K1 and /(? are
sca/axs. Suhsutu=e Equation (Ii) into Equation (I0)
to get
CdI7dt
- a - GU -
Dr_ K_.(K:D,.U - b) (12)
= -(G + ._',.KtD_rD:)U
+KtD_rb - a (13)
When the networ "_
dIJ/dt = 0, and
V = K:U
= /DrD: + GKt K, )\
reach equilibrium,
2.3 Discussion
Equation (14) _ves the general solution for the mod-
i._ed Hopaeld networks. Compared with the classi-
cal Hopfieid networks, an obvious /ea_ure is tha_ i_
involves more parameters. We may _nd some ap-
plications in which these parameters can be taken
advantage of. Also some of them can be avoided
depending upon _he desized objective.
Note we get two fat:ors involved in the averse
operation. As a resut_, the s_ru_ure of _his .kind of
recurren_ networks is quite flemble, while the clas-
sical Flopfieid is self-recurrent, that is, i_ feeds back
itsov,-n output; the wariadon is mutually recurrent,
that is, [_ feeds back :he outputs of ks _wo-adjoin_
3
American tnsu_ute of Aeronautics and Astronautics
pans. This architecturecanbeexpandedfurther,_-ithease to three or four subnets or several layers
needed. Some special appiicacions may need that
computationa/relationship, but it is not needed for
the application considered here.
The dimensions of parameters a, b, DI and Do.
depend on the applications. KL and K2 also can be
desired to provide appropriate magnkudes. If Kt
IS large, then G and a will both have [es3 effect on
the output V or ig-norab[e. If we want a have rea-
sonable influence in _he expression while G should
not, then we design Ko large, and determine [(i ac-
cording to the requirements on a.
and (.)r is the transpose of macrt'<. (see, R;_.ol, Bib-
liography)
E= T!/or ;_q(:)r_q(t)dt_
troT1 •= _ [(x- A,x- B,u) r
•(i - A,x - B,u)dt
In order to facilitate the deriv'at[on, we expand
the kems in the factors of the ener=_/ function, and
utilize the trace identities to sknp[[_.
3 System Identification
3.1 Problem Formulation
The proposed structure for system identification in
the time domain is shown in Fig (2). The dynam-
ics of a linear pLant (to be identified) are defined by
the usual equations, where .4p and Bp are unknown
matrices and z and u are the state and control re-
spectively.
.%= Apx + Bpu (15)
The dynamic equation of the system mode[ de-
pends on e, which is the error vector between actual
system s=a_es x and eszimated values y.
_"= A,(e, t)x + B,(e, t)u - Ke (16)
Therefore, the error dynamics equation is a func-
tion of state and control.
= (Ap -- A,)x + (Bp - B,)u + Ke
(17)
The goal is to minimize simultaneously square-
error rates of all stares utilizinga Hopfield network.
To ensure global convergence of the paxameters, the
enero_y function of the network mus_ be quadratic
in terms of the parameter errors, (Ap- A.,) and
(Bp -B,). However, the error rates e in Eq. (IT)
axe functions of the parameter errors and the s_ate
errors. The state error depends on y, which, L_ turn,
is inHuenced by A, and B,. Hence, an ener_ func-
tion based on _ will have a recurrent relation with
A., and B,. To avoid this, we use the follow'ragen-
ergy hmc_ion, _here tr defines the trace of a marrL_,
E = _r
+ tr
4- tr
4-
A,
i xuTd tA, T [r
J0
fT T ]
B, T)
))
(19)
Equation (19) is quadratic in terms of A, and
B,. SubstitutLng ApX 4- Bpu for i m Eq. (19) in-
dicates that E is also a quadratic function of the
parameter errors. Based on Eq. (19), we can pro-
=='ram a Hopfie[d network chat has neurons with their
states representing dhYerenc elements of the A, and
B, matrices. From the convergence properties of the
Hopfietd network, the equilibrium state is achieved
when the paxzial derivatives OE/OA, and _E/OB,
are zero. We use the following identities_o find the
parziai derivatives of E.
t_ (,4_BAr) = 2,-ta (2o)&4
O__.tr(ABD) = BrD r (21)bA
This results in the following, where A_ and B:
are optimum solutions of the estimation problem.
American Insutute of Aeronautics and Aszronau=ics
. AI_GLK 0 P AT-FACK , / ""
Y" FUGHT _ATH A'_tGLE" i• %'U._OCIT"Y _._/""
a • PrT_H ,t_GLEV.'"
q - PFrCH P.ATE ...%//
Figure 3: Schematic of Lon_tudinal FHght
De6ne,
[ ,Jl = ¥
"xx T 0 0 0 xu 0 0
0 _x T 0 0 0 xu 0
0 0 XX T 0 0 0 XZL
0 0 0 XX T 0 0 0
ux r 0 0 0 u 2 0 0
0 ux r 0 0 0 u 2 0
0 0 ux r 0 0 0 42
0 0 0 ux r 0 0 0
0
0
0
XU
dt0
0
0
I_2
C d=¥ =3×(2g)
With these as weights and biases of _he networks,
a_j, and bj can be solved _hrough Eqs. (27) and (28).
Derivation of [wu] and [al]assumes tha_ the neuron
input conductance, Gi, islow enough so that _he the
second term in Eq. (3) can be negiected.
3.2 Numerical Example
We presen_ a representative numerical example to
va/ida_e the capacities of the modiEed Kop6e[d ne_-
works. The oden=a_ion of an aircraftinvolving [on=_.-
_udina/dynamics is shown in Fig (3). The linearized
equauions of motion of an aircraf_ in a vertical plane
are _ven by
._ =Ax + Bu (30)
where, _he elements of :he stare space x are
x=[_' _ 8 q]r (31)
The matrhx .% represents the dynamic s_ability
derivatives and [s _ven by
--0.0148 -13.88
-0.00019 -0.34
"% = 0 0
0.00005 -4.S
-32.2 0
0 1
0 1
0 -05
The matrLx B represents _he control derivatives and
is _ven by-I.I
-0.ii
B_ = 0
-8.74
The control vzriabie u represents elevator deflection.
Fig (4) shows :he simulation results of the system
identification. These flgmres represen_ only A_tz,
A_2, B_2, and B_ histories;simL1ar results can be
obtained fo_ other elements of :he A_ and B_ matri-
ces. From _.henumerical results shown in Fig (4), [t
is clear cha_ the network is able _o identify system
parameters very well.
4 Optimal Control Application
4.1 Problem Formuladon
ie_ _he plant. _o be con_roLled be described by the
linear equation
=_+t = A_z_= + B_u_ (32)
w/_h =_ E I%= and u_ E IZm. The associated perfor-
mance index is the quad:anc ffmction
l T !'_-7"_ _" _"
- - _= (33)
6
American [rmd_ute of Aeronautics and Astronautics
:ii : rE:iilI
0 I 2 _ _ _ _ ? 8 _ IQ
-3
= ,, L ,
1"_'_*( = ,,¢:)
Figure 4: Identification History
de_ned over the time [nterval of [nteres_ [i,N]. Note
that both the plant and the ¢os_-weighdng matrices
can be time-vaxy/ng. The {nitia/plane s_a_e is _ven
as z_. \Ve assume thac Q_:, R_ and SN are symmetric
positive semidefnke matrices, and is addition thag
Ia_i# 0 _'or all k.
The objective [s to fund the control sequence
u{, u___,. .. ,uN-_ to minimize J_.
To solve _his linear quadratic regq.t[a_or(LQR)
pcob[em, we bezin with the }-IaxnJ/tonianfunction
(3_)
and the stationaxity condition
0 = OJ--! = R_,_ + B[_\_+_ (37)
This procedure will finallylead to the control,
where the Kalman gain K_ is given by
K,= (B/S,+,B_+a_)-_B_S_÷_.&_
Then we can ge_ the szate and cos_ate equations
-- = A_=.z_+ B_u_ (35)
In _erms of the R/cacti variable S_, now
S_ = r
7
American InsziCuce of Aeronau=ic_ and As=ronau=ics
(3S)
(39)
L
Q
Figure 5: Simulation plo:
In the application where the control in_er_-a[is
finite,S_v will be _ven. Alternatively use Equa-
tion (38) and (39), we will gec a series of/<'e. The
gMn matrix /<_ v,illgenerally be _ime-var-ying even
when the matrices .4_, B_, O;= and E_ are all con-
stan_. But ifthe control interval isinfinite,the above
formu/arion need to be changed a little.
4.2 _'etwork Solution/Implementation
%Ve brieflydiscuss the recuzrent network solution for
optimal gain sequence.
Based on the recursions in Equations (38) and
(89), the most commonly encoun=ered opera=ions are
scalar and outer product vector multip[icadons and
macrt,c-vector multiplication. Bu_ the crucial oper-
ation here is the reverse to gec the Kalman gain.
The modiEed Hop6eld networks contain both Lu-
variant and variable parameters. Invarian_ parame-
ters are fuxed [n _he neuron-compu=ing model, while
v-ariableparameters can be modified. By compa__.ng
Eqs. (38) and (39) with the s=able outpu= of the ne_.-
work Eq. (14), ifwe se= D_ T = B_rS_+_, Do- = B_,
G = R_, and b = A_, a = 0, the networkK, W,give us the Kaknan sequence..-is we know, i= [s noc
di;_cu/_ for the circuks to achieve the multiplication
of two signais. However, since D_ and Do are con-
nection conductances, can _hey be changed by ocher
sig-na[s like S_ S_ _ z and B_ ?
The answer is a voltag_con_ro[led s_icch. A
voltage-controlled switch can be implemented usin_
a single _eldoe_'ecu or MOS transistor operating in
the resistive(ohmic, also called linear) reg'.on. So, all
the slg-naisare preferred _o be voltage signals. The
system parameters A_ and .9_ are generally the ouc-
pu_s of identification modules which are convenient
_.o be _ven out as voltages. The optimal control
formulation does no_ [im/_ _he X_, S_, O_ and R_
matrices to be constants and _he modl_ed [-Iop_eld
HopEeld network doesn't [Lrnk its capacities either.
Tlme-v-_g A_, S_ e_c. are easy to be feed into
the nec as voltage sig-nalsto be used in =he compu-
tations.
4.3 Numerical Example
%Ve consider the synr.hesis of an optimal lon_udi-
nai autopL[o_ in _his section. The performance index
in _h/s application is an infinite-dine quadratic cos_
func:ion. The niazinizing control isexpec:ed to drive
the deviations of _he [on_udinal dv-na._xucs[n pi:ck
angie 8, piuch race q, forward ve[ocky u', and andle
of attack a _o zero.
Ame_,/can Institute of Aeronau=ics and .Ls_ronautics
I
2P ..... J
l I
:L'I ̧: :- • [ ..... i ¸ " : ........ :............. i......
P _
- 0 1 2 1 _ 5 G 7 3 3 lO
Figure 6: Control His_ouI
The s?'s_em parameters are she same as identifi-
cation. The performance index has the form
//j = (×rq× + ) dt (ao)
where Q, and R are appropriate weighdn_ matrices.
We seIec_ R = 91.32 and
"10.37 0
0 0.0004
Q = 0 0.0016
0 0
0 0
0.0016 0
7.25 0
0 14.84
The simulation plot is shown isFig (5). The con-
trols which are calculated by networks, compared
,with LQR results are shown [n Fig (6). The sta_es
_rajec_ories are shown in F[_ (7). The controls are
applied at '2seconds.
5 Conclusion
A class of modified HopEeid necwor_ has been pre-
sented to solve para.me_er identificationand opc£mal
control problems. The architectures are designed
to suit an ener=o-ym_inim/zacion for system iden=i_-
cauon and a _ypical opcunal contro[ algorithm for
syscem controL Shrnilax to the Hopfield he=work,
_he stability of these modified networks [s guarAn-
teed. Bu_ _hey provide more de_rees of freedom and
5exibiiity _o accommodate differen_ applications. A
four-dimensional aircraft con=to[ problem is iden-
"_Lfiedand optimal control is obtained as illus=ra-
_ions of these approaches. Future work on this _opLc
wiil investigate the robustness of such network con-
trollersand _he use of these methods for ocher rele-
va.a_ apphcauons.
A CKNO WL ED G _VIEIV'T
This s_udy was partially funded by
NSF Grant ECS-9313946, the Missouri
Department of Economic Developmen_
Cen_er [or Advanced Teci'mo[o=o-y Pro-
_am and by NASA Grant NAGI-1728.
Bibliography
i. Balakhshnan, S.N. and Weft, R..D., "Neuro-
control: A literature_urvey', _fathi Compu_.
,_{odelEng, Vo[. 23, :qo. 1/2 pp. i01-I17, 1996.
2. Hun_, N.F., Sbarba_o, D., Zbikowski, I<. and
Gaw-_h.rop, P.J., "Neural Networks [or Control
System - A Survey,:' Automatic.a, Vo[. 28, No.
6, pp. i083-II12, 1992.
3. Hop6eld, J.J., and D. W. Tank. 1986. _'Com-
puung w%h Neural Circuits: A Mode[," Sci-
enc_ 233: 625-633.
4.._[ftler, W.T., Suttom I:LS. and \Verbos, P.J.
_Yeural _Ve_works /or" Contrv_, M_IT Press,
Caznbrid=_e, MA, 1990.
9
American [nstituce of Aeronautics and As=ronau=ics
'° /1 J//_ 4;- .............. • •
_I- .... :'r ': I :_ ................................. " J
I 2 3 • S _ ? ,,3 ; 10
i
1-20 f ..............
..30 i0 1
0
:O r
I
20[ .................................. f_ ]
101- ......... i _. . .........
!:
i
3 4 i _ 7 t _ 10
Fig-ureT: Sta:esTrajectories
5. Whi:e, D.A. and Serge, D.A., f-/andbook of In-
teiligent Control - .VeuraI, .z',azzy, and Adap-
tive Approaches, Van Nos_rand Reinhold, New
York, 1992.
6. R2_[, J._q.., Parameter estimation o� state
space models by recurrent neurat networks, lEEPrec.-Control Theo_" Appl., Vol. 142, No. 2,
pptl4118, March 1995
7. Naxendra, K.S, and Paxzhasarathy, K., "Iden-tilcarion and Control of Dynamical Systems
Using Neural Networks', IEEE Trans. onzVeuraI :Vetworks, VoL 1, No. 1, pp. _26,
March, 1990.
8. DARPA .Veurat _Vetwork S_udy, Faiffa.x, Vir-
gin.ta: AFCEA [.n_. ?re._s, 1988
9. Kopfie[d, J.J., '_T_he Effectiveness of .knalog_-e'Neural N'e_work' Hardware,".Vetwork 1: 27-
10.
!1.
40, 1990
Kopfieid, J.J., ';Neurons with Graded Re-
sponse Have Coilec:ive Computational Pro_er_ies Like Those of Two Sta_e Neurons," Prec.
_Vationa_ Academy o� Sc:ences 81: 3088-3092,
1984
Hopfie[d, J.J., "Neural Networks and Physical
Sysrems with Emergent Celiac=ire Compu=a-tional Abilities," Prec. #,rational Academy o/Sc.ences 79: 2554-2558, i982
12. Speciaf Section on :Veural _Vetworks for Con-
trot Systems, IEEE Con_. Sys. May., Vol. 9,
No. 3, pp. 25-59, April 1989.
I3. Special issue on _Veural zVetworics in Contro_
Systems, I'EEE Con_. Sys. >tag., VoL i0,No.
3, pp 3-87,April 1990.
!0
American Insu_u=e of Aeronautics and As_ronau=ics
I4. Kamp,Y., andHaster,M., Recnrsive :Venrat ._Vet-works /or Assoc:e_ive 5'fernorS', Chiches_er, U.K.:
John Wiley & Sons, I990
15. Friedlaad, B., Control $Tjstem Design, .McGraw-Hili
Book Company, 1986
16. Simullnk, D_jnemic Sl/stem Simulation Software
Fc>.r the 2( Windsw S_s_em, Tb.e Ninth Works Inc.,
1993
11
American Ins=i_ute of Aeronautics and Astronautics