Nonlinear Network Structures for Optimal Control

63
Slide 1 Cheng Tao & Frank L. Lewis Cheng Tao & Frank L. Lewis Advanced Controls & Sensors Group Automation & Robotics Research Institute (ARRI) The University of Texas at Arlington Nonlinear Network Structures for Optimal Control

Transcript of Nonlinear Network Structures for Optimal Control

Page 1: Nonlinear Network Structures for Optimal Control

Slide 1

Cheng Tao & Frank L. LewisCheng Tao & Frank L. LewisAdvanced Controls & Sensors Group

Automation & Robotics Research Institute (ARRI)The University of Texas at Arlington

Nonlinear Network Structures for Optimal Control

Page 2: Nonlinear Network Structures for Optimal Control

Neural Network Solution for Fixed-Final Time

Optimal Control of Nonlinear Systems

Automation & Robotics Research Institute (ARRI)The University of Texas at Arlington

ECC 07 Kos

Cheng TaoFrank L. Lewis

Page 3: Nonlinear Network Structures for Optimal Control

Slide 3

qdRobot System[Λ I]

Robust ControlTerm

q

v(t)

e

PD Tracking Loop

τrKv

Neural Network Robot Controller

^

qd

f(x)

Nonlinear Inner Loop

..

Feedforward Loop

Universal Approximation Property

Problem- Nonlinear in the NN weights sothat standard proof techniques do not work

Feedback linearization

Easy to implement with a few more lines of codeLearning feature allows for on-line updates to NN memory as dynamics changeHandles unmodelled dynamics, disturbances, actuator problems such as frictionNN universal basis property means no regression matrix is neededNonlinear controller allows faster & more precise motion

Page 4: Nonlinear Network Structures for Optimal Control

Slide 4

4 US Patents Sponsored by NSF- Paul Werbos

ARO- Randy Zachery

Page 5: Nonlinear Network Structures for Optimal Control

Slide 5

Problem- Nonlinear in the NN weights sothat standard proof techniques do not work

New book by Jay Farrell and Marios Polycarpou

Adaptive Approximation Based Control

Page 6: Nonlinear Network Structures for Optimal Control

Slide 6

Cell Homeostasis The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostatis, and has only limited energy to do so.

Cellular Metabolism

Permeability control of the cell membrane

http://www.accessexcellence.org/RC/VL/GG/index.html

Optimality in Biological Systems

Page 7: Nonlinear Network Structures for Optimal Control

Slide 7

2. Neural Network Solution of Optimal Design Equations – 2002-2006

Nearly Optimal ControlBased on HJ Optimal Design EquationsKnown system dynamicsPreliminary Off-line tuning

1. Neural Networks for Feedback Control – 1995-2002

Based on FB Control ApproachUnknown system dynamicsOn-line tuningNN- FB lin., sing. pert., backstepping, force control, dynamic inversion, etc.

3. Approximate Dynamic Programming – 2006-

Nearly Optimal ControlBased on recursive equation for the optimal valueUsually Known system dynamics (except Q learning)

The Goal – unknown dynamicsOn-line tuningOptimal Adaptive Control

ARRI Research Roadmap in Neural Networks

Extended adaptive control to NLIP systemsNo regression matrix

Nearly optimal solution ofcontrols design equations.No canonical form needed.

Extend adaptive control toyield OPTIMAL controllers.No canonical form needed.

Page 8: Nonlinear Network Structures for Optimal Control

Slide 8

Objective and Significance

Provide a tool to solve finite-horizon continuous-time optimal control problems for nonlinear systems.

Continuous time finite horizon optimal control problems appear applications in which people use model predictive control (receding horizon control).

Page 9: Nonlinear Network Structures for Optimal Control

Slide 9

Outline:

1. Fixed-Final Time Optimal Control of Nonlinear Systems Using Neural Network HJB Approach

2. Neural Network Solution for Finite-Final Time H-Infinity State Feedback Control

3. Neural Network Solution for Fixed-Final time Constrained Optimal Control

• This research was supported by NSF grant ECS-0140490 and ARO grant DAAD 19-02-1-0366.

Page 10: Nonlinear Network Structures for Optimal Control

Slide 10

Review of Related work and Motivation

Approximate HJB solutions

Munos et. al [65](Gradient descent approaches)

Kim, Lewis and Dawson [47](NNs)

Huang and Lin [44](Taylor series expansion)

NN applications to an optimal control

Miller [63](NNs for control)

Parisini and Zoppoli [70](Infinite horizon)

Constrained-input optimization

Sussmann, Sontag and yang [84]

Bernstein [15]

Dolphus [33]

Abu-Khalaf, M [1](Infinity horizon)

Unconstrained policy iteration with finite-time horizon

Beard[11]

Page 11: Nonlinear Network Structures for Optimal Control

Slide 11

with , positive definite on

where , , and the input

Background on Fixed-Final-Time HJB Optimal Control

Nonlinear dynamical system

)()()( tuxgxfx +=

nx ℜ∈ nxf ℜ∈)( mnxg ×ℜ∈)( ( ) mRtu ∈.

(1)

It is desired to find the control that minimizes a generalized nonquadratic functionalu

[ ]∫ ++= ft

tff dtuWxQttxttxV0

)()()),(()),(( 00 φ

)(xQ )(uW Ω

(2)

Page 12: Nonlinear Network Structures for Optimal Control

Slide 12

An infinitesimal equivalent to (2) is

( ) ( ) ( ) ( ) ( )( )tuxgxfx

txVLt

txV T

+⎟⎠⎞

⎜⎝⎛

∂∂

+=∂

∂−

,,

where . This is a time-varying partial differential equation with the cost function for any given and is solved backwards in time from .

( ) ( )uWxQL += ( )txV ,( )tu ftt =

By setting in (2) its boundary condition isftt =0

( )( ) ( )( )ffff ttxttxV ,, φ=

(3)

(4)

Background on Fixed-Final-Time HJB Optimal Control

Page 13: Nonlinear Network Structures for Optimal Control

Slide 13

According to Bellman’s optimality principle, the optimal cost is given by

( )( )

( ) ( ) ( ) ( )( )⎟⎟

⎜⎜

⎛+⎟⎟

⎞⎜⎜⎝

∂∂

+=∂

∂− xuxgxf

xtxVL

ttxV

T

tu

** ,min

,

which yields the optimal control

where is the optimal value function.( )txV ,*

( ) ( ) ( )x

txVxgRxu T

∂∂

−= −*

1* ,21

Substituting (6) into (5) yields the well-known time-varying Hamilton-Jacobi-Bellman (HJB) equation

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 0,,41,, *

1***

=∂

∂∂

∂−+

∂∂

+∂

∂ −

xtxVxgRxg

xtxVxQxf

xtxV

ttxV T

T

(5)

(6)

(7)

Background on Fixed-Final-Time HJB Optimal Control

Page 14: Nonlinear Network Structures for Optimal Control

Slide 14

Then (5) becomes

(8)

If this HJB equation can be solved for the value function , then the optimal control is

( )txV ,

Background on Fixed-Final-Time HJB Optimal Control

( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) 0,,41

,,,

*1

*

**

=∂

∂∂

∂−

+∂

∂+

∂∂

=

xtxVxgRxg

xtxV

xQxfx

txVt

txVtxVHJB

TT

( ) ( ) ( )x

txVxgRxu T

∂∂

−= −*

1* ,21

Page 15: Nonlinear Network Structures for Optimal Control

Slide 15

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

NN Approximation of the Cost Function ( )txV ,

Using the following equation to approximate for on a compactset

[ ]fttt ,0∈nℜ⊂Ω

( ) ( ) ( ) ( ) ( )xtwxtwtxV LTL

L

jjjL σσ == ∑

=1, (9)

The NN weights are and is the number of hidden-layer neurons.

is the vector of activation function.

is the vector of NN weights.

( )tw j L

( ) ( ) ( ) ( )[ ] TLL xxxx σσσ ...21≡σ

( ) ( ) ( ) ( )[ ]TLL twtwtwt ...21≡w

( )txV ,

In Sandberg [78], it is shown that NNs with time-varying weights can be used to uniformly approximate continuous time-varying functions.

Page 16: Nonlinear Network Structures for Optimal Control

Slide 16

Note:

The set is selected to be independent. Then without loss of generality, they can be assumed to be orthonormal, i.e. select equivalent basis functions to that are also orthonormal. The orthonormality of the set on implies that if a function then

( )xjσ( )xjσ

( ){ }∞

1xjσ Ω

( ) ( )Ω∈ 2, Ltxψ

( ) ( ) ( ) ( )∑∞

=1

,,,j

jj xxtxtx σσψψ

where is inner product.∫ΩΩ⋅= gdxfgf ,

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Page 17: Nonlinear Network Structures for Optimal Control

Slide 17

Note that( ) ( ) ( ) ( ) ( )txt

xx

xtxV

LTLL

TLL wσw

σ∇≡

∂∂

=∂

∂ , (10)

where is the Jacobian , and that( )xLσ∇ ( ) xxL ∂∂σ

( ) ( ) ( )xtt

txVL

TL

L σw=∂

∂ ,(11)

Therefore approximating by uniformly in in the HJB equation (8) results in

( )txV , ( )txVL ,

(12)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( )

( ) ( )txexQ

txxgRxgxt

xfxtxt

L

LTL

TL

TL

LTLL

TL

,41 1

=−

+

∇−−

− wσσw

σwσw

Page 18: Nonlinear Network Structures for Optimal Control

Slide 18

or

( ) ( ) ( ) ( )txextwtxVHJB L

L

jjjL ,,

1

=⎟⎟⎠

⎞⎜⎜⎝

⎛= ∑

=

σ (13)

where is a residual equation error. The corresponding optimal control input is( )txeL ,

(14)

To find the least-squares solution for , the method of weighted residuals is used( )tLw

( )( ) ( ) 0,,,

=∂

Ω

txettxe

LL

L

w

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

( ) ( ) ( )x

txVxgRxu T

∂∂

−= −*

1* ,21 ( ) ( )txgR L

TL

T wσ∇= −1

Page 19: Nonlinear Network Structures for Optimal Control

Slide 19

(15)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Ω

Ω

Ω

−−

Ω

Ω

Ω

⋅−

∇⋅

∇⋅+

⋅∇⋅−

=

)(),()(),(

)(),()(

)()()(41

)(),(

)()(),()()(),(

)(

1

11

1

xxQxx

xtxg

Rxgxtxx

txxfxxx

t

LLL

LLTL

T

LTL

LL

LLLLL

L

σσσ

σwσ

σwσσ

wσσσσ

w

Boundary condition ( )( ) ( )( ) ( ) ( )( )fLfTLffff txtttxttxV σw== ,, φ

Page 20: Nonlinear Network Structures for Optimal Control

Slide 20

Lemma 1: Convergence of Approximate HJB Equation

Lemma 2: Convergence of NN Weights

Lemma 3: Convergence of Approximate Value Function

Lemma 4: Convergence of Value Function Gradient

Lemma 5: Convergence of Control Inputs

Lemma 6: Convergence of State Trajectory

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Lemmas Proved:

Page 21: Nonlinear Network Structures for Optimal Control

Slide 21

Lemma 1: Convergence of Approximate HJB Equation.

( ) ( ) ( )∑=

=L

jj

TjL xtwtxV

1, σ ( )( ) ( ) 0,, =

ΩxtxVHJB LL σ ( ) 0, =

ΩLfL tV σLet satisfy and

( ) ( ) ( )∑∞

==

1,

j jT

j xtctxV σ ( ) ( ) ( ) ( )[ ]TLL tctctct ...21≡c

( )( ) 0, =txVHJB ( ) ( )( )fff ttxtxV ,, φ=

Let and satisfy

and

( )( ) 0, →txVHJB L Ω LThen on as increases.

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Page 22: Nonlinear Network Structures for Optimal Control

Slide 22

1. Calculate using eq. (13).( )( ) ( )Ω

xtxVHJB jL σ,,

0, =Ωjk σσ2. Apply if to simplify .( )( ) ( )

ΩxtxVHJB jL σ,,

3. Prove as increases,( )( ) 0, →txVHJB L L

Here .( )( ) ( )( ) ( ) ( )∑∞

= Ω=

1,,,

j jjLL xxtxVHJBtxVHJB σσ

Outline of proof of Lemma 1

jk ≠

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Page 23: Nonlinear Network Structures for Optimal Control

Slide 23

Optimal Algorithm Based on NN Approximation

A mesh of points over the integration region can be introduced on 0Ω

( ) ( )[ ]TLL xxAp1 xx |......| σσ=

( ) ( ) ( ) ( )[ ]TLL xfxxfxBp1 xx |......| σσ=

( ) ( )[ ]TxQxQDp1 xx |...|=

where in represents the number of points of the mesh.p px

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

( )( )

( ) ( )( )

T

TL

TL

TL

TL

xxgRxgx

xxgRxgxC

⎥⎥⎥⎥

⎢⎢⎢⎢

∇∇

∇∇=

p

1

x1

x1

|)()(41

......|)()()(41

σσ

σσ

Page 24: Nonlinear Network Structures for Optimal Control

Slide 24

(16)

then

(17)

This is a nonlinear ODE that can easily be integrated backwards using final condition to find the least-squares optimal NN weights.( )fL tw

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

DAAAtCtAAA

BAtAAtTT

LTL

TT

TL

TL

11

1

)()()()(

)()()(−−

−+

−=

ww

ww

( ) ( ) ( ) ( ) 0=−+⋅−⋅− DAtCtAtBAtAA TL

TL

TL

TL

T wwww

Page 25: Nonlinear Network Structures for Optimal Control

Slide 25

Numerical Examples

a) Linear System

(18)

To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system

(19)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

1 1 2 1

2 1 2 2

2 35 6 2

x x x ux x x u

= + += + +

( ) 223212

21121, xwxxwxwxxV ++=

Page 26: Nonlinear Network Structures for Optimal Control

Slide 26

Fig. 1 Linear System Weights Fig. 2 State Trajectory of Linear System

Fig. 3 Optimal NN Control Law

0 1 2 3 4 5 6 7 8 9 100

1

2

3

4

5

6

7

8

9

10

Time

Wei

ght

Linear system weights

w1w2w3

0 1 2 3 4 5 6 7 8 9 10-1

-0.5

0

0.5

1

1.5

2

Time

Sta

tes

State Trajectory

x1x2

0 1 2 3 4 5 6 7 8 9 10-30

-25

-20

-15

-10

-5

0

5

Time

Con

trol I

nput

Optimal Control

u1u2

Page 27: Nonlinear Network Structures for Optimal Control

Slide 27

b) Nonlinear Chained System

213

22

11

uxxuxux

===

(20)

( )

33221

33220

33119

321183

31172

3116

2321153

22114

322113

23

2212

23

2111

22

2110

439

428

417

326315214233

222

211321 ,,

xxwxxw

xxwxxwxxwxxwxxxwxxxw

xxxwxxwxxwxxwxwxwxw

xxwxxwxxwxwxwxwxxxV

++

++++++

+++++++

+++++=

Smooth approximating function

(21)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Page 28: Nonlinear Network Structures for Optimal Control

Slide 28

0 5 10 15 20 25 30-4

-2

0

2

4

6

8

10

Time

W

NN Weights

Fig. 4 Nonlinear System Weights0 5 10 15 20 25

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

Time

Sta

tes

State Trajectory

x1x2x3

Fig. 5 State Trajectory of Nonlinear System

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

0 0.5 1 1.5 2 2.5 3-120

-100

-80

-60

-40

-20

0

20Optimal Control

Time

Con

trol I

nput

u1u2

Fig. 6 Optimal NN Control Law

Page 29: Nonlinear Network Structures for Optimal Control

Slide 29

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Fixed-Final-Time HJI Optimal Control

1. Based on solving a related Hamilton-Jacobi-Isaacs equation of the correspondingfinite-horizon zero-sum game.

2. The game value function is approximated by a neural network with time-varyingweights.

3. It is shown that the neural network approximation converges uniformly to thegame-value function and the resulting nearly optimal constrained feedbackcontroller provides closed-loop stability and bounded gain. 2L

Page 30: Nonlinear Network Structures for Optimal Control

Slide 30

2 2Tz h h u= +

2

0

2

0

2

0

2

0

2

)(

)(

)(

)(γ≤

+=

∫∞

dttd

dtuhh

dttd

dttz T

System

),(

)()()(

uxzxy

dxkuxgxfx

ψ==

++=

)(ylu =

d

u

z

y control

Performance output

Measuredoutput

disturbance

where

Find control u(t) so that

For all L2 disturbancesAnd a prescribed gain γ2

L2 Gain Problem

H-Infinity Control Using Neural Networks

Zero-Sum differential Nash game

Page 31: Nonlinear Network Structures for Optimal Control

Slide 31

Hamiltonian function

( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) 222,,,, tdtuxhxhtdxktuxgxfx

txVdupxH TT

γ−++++∂

∂=

Minimizing the Hamiltonian of the optimal control problem with regard to and gives

u d

( ) ( ) ( ) 02, * =+∂

∂ tux

txVxgT ( ) ( ) ( ) 02, *2 =−∂

∂ tdx

txVxkT γ

( ) ( ) ( )dx

txdVxgtu T ,21* −= ( ) ( ) ( )

dxtxdVxktd T ,

21

2*

γ=

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Page 32: Nonlinear Network Structures for Optimal Control

Slide 32

Hamilton-Jacobi-Isaacs equation

So

Boundary condition ( )( ) ( )( )ffff ttxttxV ,, φ=

Here ( ) ( ) ( ) ( ) ( ) ( )TTT xkxkxgxgxgxg 241

41ˆˆ

γ−=

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

( ) ( ) ( ) ( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( ) 0

,,

2*22*

**

=−++

++∂

∂+

∂∂

tdtuxhxh

tdxktuxgxfx

txVt

txV

T

T

γ

( )( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) 0,ˆˆ,

,,,

**

***

=+∂

∂∂

∂−

∂∂

+∂

∂=

xhxhx

txVxgxgx

txV

fx

txVt

txVtxVHJI

TTT

T

Page 33: Nonlinear Network Structures for Optimal Control

Slide 33

Cannot solve HJI !!

CT Policy Iteration for H-Infinity Control

Consistency equationFor Value Function

Successive Solution- Algorithm 1:Let γ be prescribed and fixed.

0u a stabilizing control with region of asymptotic stability 0Ω

1. Outer loop- update controlInitial disturbance 00 =d

2. Inner loop- update disturbanceSolve Value Equation

( ) 0)()(2)( 2

0

=−++++∂

∂∫ − iTiu

TTj

Tj

i

dddhhkdgufx

V j

γννφ

Inner loop update disturbance

xVxkd j

iTi

∂∂

=+ )(21

21

γgo to 2.Iterate i until convergence to jVd ∞∞ , with RAS j

∞Ω

Outer loop update control action

⎟⎟⎠

⎞⎜⎜⎝

⎛∂

∂−=

+ xVxgu jT

j )(21

1 φ

Go to 1.Iterate j until convergence to ∞

∞∞ Vu , , with RAS ∞

∞Ω

Murad Abu KhalafInfinite Horizon Case

Page 34: Nonlinear Network Structures for Optimal Control

Slide 34

Results for this Algorithm

For this to occur it is required that 0* Ω⊆Ω

The algorithm converges to )(*),(*,),(* 0000 ΩΩΩΩ duV

the optimal solution on the RAS 0Ω

Sometimes the algorithm converges to the optimal HJI solution V*, *Ω , u*, d*

For every iteration on the disturbance di one hasj

ij

i VV 1+≤ the value function increasesj

ij

i 1+Ω⊇Ω the RAS decreases

For every iteration on the control uj one has1+

∞∞ ≥ jj VV the value function decreases

1+∞∞ Ω⊆Ω jj the RAS does not decrease

Converges to available storage

Murad Abu Khalaf

Page 35: Nonlinear Network Structures for Optimal Control

Slide 35

Hamilton-Jacobi-Isaacs equation

Boundary condition ( )( ) ( )( )ffff ttxttxV ,, φ=

Here ( ) ( ) ( ) ( ) ( ) ( )TTT xkxkxgxgxgxg 241

41ˆˆ

γ−=

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )* * * *

* , , , ,ˆ ˆ, 0

T TT TV x t V x t V x t V x t

HJI V x t f g x g x h x h xt x x x

∂ ∂ ∂ ∂= + − + =

∂ ∂ ∂ ∂

Page 36: Nonlinear Network Structures for Optimal Control

Slide 36

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

NN Approximation of the Cost Function ( )txV ,

( ) ( ) ( ) ( ) ( )xtwxtwtxV LTL

L

jjjL σσ == ∑

=1,

The NN weights are and is the number of hidden-layer neurons.

is the vector of activation function.

is the vector of NN weights.

( )tw j L

( ) ( ) ( ) ( )[ ] TLL xxxx σσσ ...21≡σ

( ) ( ) ( ) ( )[ ]TLL twtwtwt ...21≡w

In Sandberg [78], it is shown that NNs with time-varying weights can be used to uniformly approximate continuous time-varying functions.

Page 37: Nonlinear Network Structures for Optimal Control

Slide 37

Note that( ) ( ) ( ) ( ) ( )txt

xx

xtxV

LTLL

TLL wσw

σ∇≡

∂∂

=∂

∂ ,

( )L x∇ =σ ( ) xxL ∂∂σ

( ) ( ) ( )xtt

txVL

TL

L σw=∂

∂ ,

HJI becomes

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )11 ,4

T TL L L L

T T TL L L L L

t x t x f x

t x g x R g x x t Q x e x t−

− − ∇

+ − =

w σ w σ

w σ σ w

This turns a PDE into an ODE for the NN weights

Page 38: Nonlinear Network Structures for Optimal Control

Slide 38

Lemma 1: Convergence of Approximate HJI Equation

Lemma 2: Convergence of NN Weights

Lemma 3: Convergence of Approximate Value Function

Lemma 4: Convergence of Value Function Gradient

Lemma 5: Convergence of Control Inputs

Lemma 6: Convergence of State Trajectory

Nonlinear Fixed-Final-Time HJI Solution by NN Least-Squares Approximation

Lemmas Proved:

Stability for enough hidden layer neurons

As number of NN hidden layer neurons L → ∞

Conv. in Sobolev space

Page 39: Nonlinear Network Structures for Optimal Control

Slide 39

( ) ( ) ( ) ( )txextwtxVHJB L

L

jjjL ,,

1=⎟⎟

⎞⎜⎜⎝

⎛= ∑

=

σ

The corresponding optimal control input is

To find the least-squares solution for , the method of weighted residuals is used( )tLw

( )( ) ( ) 0,,,

=∂

Ω

txettxe

LL

L

w

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

( ) ( ) ( )x

txVxgRxu T

∂∂

−= −*

1* ,21 ( ) ( )txgR L

TL

T wσ∇= −1

Page 40: Nonlinear Network Structures for Optimal Control

Slide 40

Neural-network-based nearly optimal FEEDBACK control law.

This uses preliminary off-line tuning to solve HJI equation using NN.Dynamics must be known.

Page 41: Nonlinear Network Structures for Optimal Control

Slide 41

NN Least-Squares Approximate HJI Solution

( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( )txhhxx

xtxgxgxtxx

txxfxxx

t

LLT

LL

LLTL

TL

TLLL

LLLLL

L

wσσσ

σwσσwσσ

wσσσσ

w

⋅⋅−

∇∇⋅+

⋅∇⋅−

=

Ω

Ω

Ω

Ω

Ω

Ω

,,

,ˆˆ,

,,

1

1

1

Boundary condition .( )( ) ( )( )ffff ttxttxV ,, φ=

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

NN approx has converted a PDE into an ODE for the NN weightsc.f. Mech Eng assumed mode shapes method for flexible systems

Page 42: Nonlinear Network Structures for Optimal Control

Slide 42

Optimal Algorithm Based on NN Approximation

A mesh of points over the integration region can be introduced on 0Ω

( ) ( )[ ]TLL xxAp1 xx |......| σσ=

( ) ( ) ( ) ( )[ ]TLL xfxxfxBp1 xx |......| σσ=

where represents the number of points of the mesh.p

( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )( )[ ]TTL

TL

TL

TL xxgxgxxxgxgxC

p1 xx |ˆˆ......|ˆˆ σσσσ ∇∇∇∇=

[ ]TDp1 x

Tx

T |hh......|hh=

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Page 43: Nonlinear Network Structures for Optimal Control

Slide 43

( ) ( ) ( ) ( ) ( ) ( ) ( ) DAAAtCtAAABAtAAt TTL

TL

TTTL

TL

111 −−−−+−= wwww

This is a nonlinear ODE that can easily be integrated backwards using final condition to find the least-squares optimal NN weights.( )fL tw

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Page 44: Nonlinear Network Structures for Optimal Control

Slide 44

Numerical Examples

a) Linear System

(18)

To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system

(19)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

1 1 2 1

2 1 2 2

2 35 6 2

x x x ux x x u

= + += + +

( ) 223212

21121, xwxxwxwxxV ++=

Page 45: Nonlinear Network Structures for Optimal Control

Slide 45

Fig. 1 Linear System Weights Fig. 2 State Trajectory of Linear System

Fig. 3 Optimal NN Control Law

0 1 2 3 4 5 6 7 8 9 100

1

2

3

4

5

6

7

8

9

10

Time

Wei

ght

Linear system weights

w1w2w3

0 1 2 3 4 5 6 7 8 9 10-1

-0.5

0

0.5

1

1.5

2

Time

Sta

tes

State Trajectory

x1x2

0 1 2 3 4 5 6 7 8 9 10-30

-25

-20

-15

-10

-5

0

5

Time

Con

trol I

nput

Optimal Control

u1u2

Page 46: Nonlinear Network Structures for Optimal Control

Slide 46

b) Nonlinear Chained System

213

22

11

uxxuxux

===

(20)

( )

33221

33220

33119

321183

31172

3116

2321153

22114

322113

23

2212

23

2111

22

2110

439

428

417

326315214233

222

211321 ,,

xxwxxw

xxwxxwxxwxxwxxxwxxxw

xxxwxxwxxwxxwxwxwxw

xxwxxwxxwxwxwxwxxxV

++

++++++

+++++++

+++++=

Smooth approximating function

(21)

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

Page 47: Nonlinear Network Structures for Optimal Control

Slide 47

0 5 10 15 20 25 30-4

-2

0

2

4

6

8

10

Time

W

NN Weights

Fig. 4 Nonlinear System Weights0 5 10 15 20 25

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

Time

Sta

tes

State Trajectory

x1x2x3

Fig. 5 State Trajectory of Nonlinear System

Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation

0 0.5 1 1.5 2 2.5 3-120

-100

-80

-60

-40

-20

0

20Optimal Control

Time

Con

trol I

nput

u1u2

Fig. 6 Optimal NN Control Law

Fix using time-varying transformations–Zhihua Qu

Page 48: Nonlinear Network Structures for Optimal Control

Slide 48

Simulation-Benchmark Problem

Fig. 8 Rotational actuator to control a translational oscillator.

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Page 49: Nonlinear Network Structures for Optimal Control

Slide 49

( ) ( ) ( ) ( )tdxkuxgxfx ++= 2≤u

224

23

22

21 1.01.01.0

qT uxxxxzz ++++=

( )( )2.0

2=

++=

mMmeImeε 10=γ

( ) T

xxxxx

xxxxx

xf ⎥⎦

⎤⎢⎣

−−

−+−

=3

223

2413

43

223

241

2 cos1sincos

cos1sin

εεε

εε

T

xxx

g ⎥⎦

⎤⎢⎣

−−−

=3

223

223

cos110

cos1cos

0εε

ε

T

xx

xk ⎥

⎤⎢⎣

−−

−=

322

3

322 cos1

cos0

cos110

εε

ε

Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control

Page 50: Nonlinear Network Structures for Optimal Control

Slide 50

0 10 20 30 40 50 60 70 80 90 100-3

-2

-1

0

1

2

3x 1,x

3

Time in seconds

Nearly Optimal Controller State Trajectories

rtheta

Fig. 9 , State Trajectoriesr θ

0 10 20 30 40 50 60 70 80 90 100-1

-0.5

0

0.5

1

1.5

x 2,x4

Time in seconds

Nearly Optimal Controller State Trajectories

rdotthetadot

Fig. 10 , State Trajectoriesr θ

0 10 20 30 40 50 60 70 80 90 100-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

cont

rol

Time in seconds

Nearly Optimal Controller

Fig. 11 Control Input( )tu0 10 20 30 40 50 60 70 80 90 100

0

2

4

6

8

10

12

14

16

Atte

nuat

ion

Time in seconds

Nearly Optimal Controller Cost

Fig. 12 Disturbance Attenuation

Page 51: Nonlinear Network Structures for Optimal Control

Slide 51

Neural Network Solution for Fixed-Final time Constrained Optimal Control

When the control input is constrained by a saturated function .To guarantee bounded controls, [1][46] introduced a generalized nonquadratic functional

( )⋅φ

( ) ( )∫ −=u T RdvvuW0

2 φ

( ) ( ) ( )[ ] Tmvvv φφ 1=φ

( ) ( ) ( )[ ]muuu 11

11 −−− = φφφ

where , , and is a bounded one-to-one function that belongs to and .

mv ℜ∈ mℜ∈φ ( )⋅φ( )1≥pC p ( )Ω2L

Moreover, it is a monotonic odd function with its first derivative bounded by a constant .M

(22)

Page 52: Nonlinear Network Structures for Optimal Control

Slide 52

-3 -2 -1 0 1 2 30

1

2

3

4

5

6

7

8

9

10

u

WFig. 14 Nonquadratic Cost

( ) ( )∫ −⋅=u T RdvAvAuW0

tanh2

Fig. 13 Hyperbolic tangent function

-5 -4 -3 -2 -1 0 1 2 3 4 5-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

( )xy tanh=

Neural Network Solution for Fixed-Final time Constrained Optimal Control

Page 53: Nonlinear Network Structures for Optimal Control

Slide 53

When (22) is used, (5) becomes

( )( )

( ) ( ) ( ) ( ) ( ) ( )( )⎟⎟

⎜⎜

⎛+

∂∂

++=∂

∂− ∫ − xuxgtxf

xtxVRdvvxQ

ttxV

Tu T

tu,,2min, *

0

*

φ

Minimizing the Hamiltonian of the optimal control problem with regard to givesu

( ) ( ) ( ) 02, *1*

=+∂

∂ − ux

txVxg T φ

so

( ) ( ) ( )⎟⎟⎠

⎞⎜⎜⎝

∂∂

−= −

xtxVxgRxu T

*1* ,

21φ mUu ℜ⊂∈ (23)

Neural Network Solution for Fixed-Final time Constrained Optimal Control

Page 54: Nonlinear Network Structures for Optimal Control

Slide 54

HJB equation

( )( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) 0,21,2

,,,

*1

*

0

***

=+⎟⎟⎠

⎞⎜⎜⎝

∂∂

⋅⋅∂

∂−+

∂∂

+∂

∂=

−−∫ xQx

txVxgRxgx

txVRdvv

xfx

txVt

txVtxVHJB

TT

u T

T

φφ(24)

If this HJB equation can be solved for the value function , then (24) gives the optimal constrained control.

( )txV ,

Neural Network Solution for Fixed-Final time Constrained Optimal Control

Page 55: Nonlinear Network Structures for Optimal Control

Slide 55

So that

( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( )Ω

Ω

Ω

−−

Ω

Ω

−−

Ω

Ω

Ω

⋅−

⎟⎠⎞

⎜⎝⎛ ∇⋅⋅∇⋅+

⋅∇⋅−=

xxQxx

xtxxgRxgxtxx

xRdvvxx

txxfxxxt

LLL

LLTL

TL

TLLL

L

u TLL

LLLLLL

σσσ

σwσφσwσσ

σφσσ

wσσσσw

,,

,21,

,2,

,,

1

11

0

1

1

(25)

Neural Network Solution for Fixed-Final time Constrained Optimal Control

Page 56: Nonlinear Network Structures for Optimal Control

Slide 56

(12) can be converted to

( ) ( ) ( ) 0=−+−−− EAtDACAtBAtAA TL

TTL

TL

T www (26)

then

( ) ( ) ( ) ( )( ) ( ) ( ) EAAAtDAAA

AAAtBAAAtTT

LTT

TTL

TTL

11

11

−−

−−

−+

−−=

w

ww(27)

This is a nonlinear ODE that can easily be integrated backwards using final condition to find the least-squares optimal NN weights.( )fL tw

Neural Network Solution for Fixed-Final time Constrained Optimal Control

Optimal Algorithm Based on NN Approximation

Page 57: Nonlinear Network Structures for Optimal Control

Slide 57

Numerical Examples

a) Linear System

133

2212

3211 2

uxxuxxxxxxx

+=+−=++=

(28)

To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system

(29)( ) 326315214233

222

21121 , xxwxxwxxwxwxwxwxxV +++++=

Neural Network Solution for Fixed-Final time Constrained Optimal Control

51 ≤u 202 ≤u

Page 58: Nonlinear Network Structures for Optimal Control

Slide 58

0 5 10 15 20 25 300

10

20

30

40

50

60

70

80

Time

W

w1w2w3w4w5w6

Fig. 15 Constrained Linear System Weights

0 5 10 15 20 25 30-4

-3

-2

-1

0

1

2

3

Time

Sta

tes

State Trajectory

x1x2x3

Fig. 16 State Trajectory of Linear System with Bounds

0 5 10 15 20 25 30-10

-5

0

5

10

15

20Optimal Control with Bounds

Time

Con

trol I

nput

u1u2

Fig. 17 Optimal NN Control Law with Bounds

Page 59: Nonlinear Network Structures for Optimal Control

Slide 59

b) Nonlinear Chained System

213

22

11

uxxuxux

===

(30)

( )

33221

33220

33119

321183

31172

3116

2321153

22114

322113

23

2212

23

2111

22

2110

439

428

417

326315214233

222

211321 ,,

xxwxxw

xxwxxwxxwxxwxxxwxxxw

xxxwxxwxxwxxwxwxwxw

xxwxxwxxwxwxwxwxxxV

++

++++++

+++++++

+++++=

Selecting the smooth approximating function

(31)

Neural Network Solution for Fixed-Final time Constrained Optimal Control

11 ≤u 22 ≤u

Page 60: Nonlinear Network Structures for Optimal Control

Slide 60

0 5 10 15 20 25 30-4

-2

0

2

4

6

8

10

Time

W

NN Weights

Fig. 18 Nonlinear System Weights0 5 10 15 20 25

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

Time

Sta

tes

State Trajectory

x1x2x3

Fig. 19 State Trajectory of Nonlinear System

Neural Network Solution for Fixed-Final time Constrained Optimal Control

0 5 10 15 20 25-2

-1.5

-1

-0.5

0

0.5Optimal Control with Constrains

Time

Con

trol I

nput

u1u2

Fig. 20 Optimal NN Constrained Control Law

Page 61: Nonlinear Network Structures for Optimal Control

Slide 61

0 10 20 30 40 50 60 70 80 90 100-6

-5

-4

-3

-2

-1

0

1

2

3x 1,x

3

Time in seconds

Nearly Optimal Controller State Trajectories

rtheta

r θFig. 21 State Trajectories

0 10 20 30 40 50 60 70 80 90 100-2

-1.5

-1

-0.5

0

0.5

1

1.5

x 2,x4

Time in seconds

Nearly Optimal Controller State Trajectories

rdotthetadot

r θFig. 22 State Trajectories

0 10 20 30 40 50 60 70 80 90 100-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

cont

rol

Time in seconds

Nearly Optimal Controller with Constrains

Fig. 23 Control Input( )tu0 10 20 30 40 50 60 70 80 90 100

0

1

2

3

4

5

6

7

8

9

10

Atte

nuat

ion

Time in seconds

Nearly Optimal Controller Cost with Constrains

Fig. 24 Disturbance Attenuation

C) Simulation-Benchmark Problem

Page 62: Nonlinear Network Structures for Optimal Control

Slide 62

Overview of the Method

Neural networks are used to approximately solve the finite-horizon optimal state feedback control problem

The method is based on solving a related Hamilton-Jacobi equation of the corresponding finite-horizon problem

Transform the problem into solving an ODE equation backwards in time.

Neural network approximation converges uniformly to the function and the resulting controller provides closed-loop stability.

The result is a nearly exact feedback controller with time-varyingcoefficients.

No policy iteration needed.

Page 63: Nonlinear Network Structures for Optimal Control

Slide 63