Stochastic and Adaptive Optimal Control
Robert Stengel Optimal Control and Estimation, MAE 546
Princeton University, 2018
Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
• Nonlinear systems with random inputs and perfect measurements
• Stochastic neighboring-optimal control
• Linear-quadratic (LQ) optimal control with perfect measurements
• Adaptive critic control
• Information sets and expected cost
Nonlinear Systems with Random Inputs and Perfect Measurements
Inputs and initial conditions are uncertain, but the state can be measured without error
$\dot{x}(t) = f[x(t), u(t), w(t), t], \qquad z(t) = x(t)$

$E[x(0)] = x(0), \qquad E\left\{[x(0) - x(0)][x(0) - x(0)]^T\right\} = 0$

Assume that random disturbance effects are small and additive:

$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$

$E[w(t)] = 0, \qquad E[w(t)w^T(\tau)] = W(t)\delta(t - \tau)$
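This additive white-noise model can be exercised in simulation by Euler-Maruyama discretization, in which the disturbance contributes an increment of covariance $W\,\Delta t$ per step. A minimal sketch follows; the function names, dimensions, and noise intensity are illustrative assumptions, not part of the lecture, and $W$ is assumed positive definite so a Cholesky factor exists:

```python
import numpy as np

def simulate(f, L, W, x0, control, dt, n_steps, rng):
    """Euler-Maruyama simulation of xdot = f(x,u,t) + L w(t),
    where w(t) is zero-mean white noise with spectral density W."""
    x = np.array(x0, dtype=float)
    history = [x.copy()]
    chol_W = np.linalg.cholesky(W)            # W = chol_W @ chol_W.T
    for k in range(n_steps):
        t = k * dt
        u = control(x, t)
        # White noise enters as an increment with covariance W * dt
        dw = chol_W @ rng.standard_normal(W.shape[0]) * np.sqrt(dt)
        x = x + f(x, u, t) * dt + L @ dw      # disturbance is additive
        history.append(x.copy())
    return np.array(history)
```

For example, a damped scalar system with no control: `simulate(lambda x, u, t: -x + u, np.eye(1), 0.1 * np.eye(1), [0.0], lambda x, t: np.zeros(1), 0.01, 1000, np.random.default_rng(0))`.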
Cost Must Be an Expected Value

• Deterministic cost function cannot be minimized because
– disturbance effect on state cannot be predicted
– state and control are random variables

$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt$

• However, the expected value of a deterministic cost function can be minimized:

$\min_{u(t)} J = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt\right\}$
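With no closed form available in general, $E[J]$ is commonly approximated by Monte Carlo averaging of the deterministic cost over disturbance realizations. A hedged sketch, assuming quadratic terminal and running costs and a `run_trial` callable (e.g., built on the `simulate` helper above) that returns one sample cost:

```python
import numpy as np

def quadratic_cost(xs, us, dt, Q, R, S_f):
    """Sample value of J = phi[x(tf)] + integral of L[x,u] dt
    for quadratic terminal and running costs (rectangle rule)."""
    terminal = 0.5 * xs[-1] @ S_f @ xs[-1]
    running = dt * sum(0.5 * (x @ Q @ x + u @ R @ u)
                       for x, u in zip(xs[:-1], us))
    return terminal + running

def expected_cost(run_trial, n_trials=500):
    """Monte Carlo estimate of E[J]: average sample costs over
    independent disturbance realizations; run_trial(rng) -> J."""
    costs = [run_trial(np.random.default_rng(seed)) for seed in range(n_trials)]
    return float(np.mean(costs))
```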
Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) $E[\lambda(t_f)] = E\left\{\dfrac{\partial \phi[x(t_f)]}{\partial x}\right\}^T$

2) $E[\dot{\lambda}(t)] = -E\left\{\dfrac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x}\right\}^T$

3) $E\left\{\dfrac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u}\right\} = 0$
Stochastic Value Function for a Nonlinear System

Expected values of terminal and integral cost are well defined. Apply the Hamilton-Jacobi-Bellman (HJB) equation to expected terminal and integral costs.

Principle of Optimality: optimal expected value function at $t_1$

$V^*(t_1) = E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau\right\}$

$= \min_u E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\, d\tau\right\}$
Rate of Change of the Value Function

Total time-derivative of $V^*$:

$\left.\dfrac{dV^*}{dt}\right|_{t=t_1} = -E\left\{L[x^*(t_1), u^*(t_1)]\right\}$

$x(t)$ and $u(t)$ can be known precisely; therefore

$\left.\dfrac{dV^*}{dt}\right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$
Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$\dfrac{dV^*}{dt} = E\left[\dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}\dot{x}\right]$

Incremental change in the value function, expanded to second degree:

$\Delta V^* = \dfrac{dV^*}{dt}\Delta t = E\left[\dfrac{\partial V^*}{\partial t}\Delta t + \dfrac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \dfrac{1}{2}\dot{x}^T\dfrac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots\right]$

$= E\left\{\dfrac{\partial V^*}{\partial t}\Delta t + \dfrac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right]\Delta t + \dfrac{1}{2}\left[f(\cdot) + Lw(\cdot)\right]^T\dfrac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\Delta t^2 + \cdots\right\}$

Next: cancel $\Delta t$
Introduction of the Trace

The trace of a matrix product is a scalar and is invariant under cyclic permutation:

$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$

$\mathrm{Tr}(x^TQx) = \mathrm{Tr}(xx^TQ) = \mathrm{Tr}(Qxx^T), \qquad \dim[\mathrm{Tr}(\cdot)] = 1 \times 1$
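The cyclic-permutation identity is easy to confirm numerically; a quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
Q = rng.standard_normal((3, 3))

quad = x @ Q @ x                                        # scalar x'Qx
assert np.isclose(quad, np.trace(np.outer(x, x) @ Q))   # Tr(x x' Q)
assert np.isclose(quad, np.trace(Q @ np.outer(x, x)))   # Tr(Q x x')
```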
Applying the trace to the second-degree term:

$\dfrac{dV^*}{dt} \approx E\left\{\dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \dfrac{1}{2}\mathrm{Tr}\left[\left[f(\cdot) + Lw(\cdot)\right]^T\dfrac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\right]\Delta t\right\}$

$= E\left\{\dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\left[f(\cdot) + Lw(\cdot)\right]^T\right]\Delta t\right\}$
Toward the Stochastic HJB Equation

$\dfrac{dV^*}{dt} = E\left\{\dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}\left[f(\cdot) + Lw(\cdot)\right] + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\left[f(\cdot) + Lw(\cdot)\right]^T\right]\Delta t\right\}$

Because $x(t)$ and $u(t)$ can be measured, the first two terms can be taken outside the expectation:

$= \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}f(\cdot) + E\left\{\dfrac{\partial V^*}{\partial x}Lw(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}\left[f(\cdot) + Lw(\cdot)\right]\left[f(\cdot) + Lw(\cdot)\right]^T\right]\Delta t\right\}$
Toward the Stochastic HJB Equation

The disturbance is assumed to be zero-mean white noise, uncorrelated with the dynamics:

$E[w(t)] = 0, \qquad E[f(\cdot)w^T(\tau)] = 0$

$E[w(t)w^T(\tau)] = W(t)\delta(t - \tau) \;\Rightarrow\; E[w(t)w^T(\tau)]\Delta t \xrightarrow{\Delta t \to 0} W(t), \qquad E[f(\cdot)f^T(\cdot)]\Delta t \xrightarrow{\Delta t \to 0} 0$

$\dfrac{dV^*}{dt} = \dfrac{\partial V^*}{\partial t} + \dfrac{\partial V^*}{\partial x}f(\cdot) + \dfrac{1}{2}\lim_{\Delta t \to 0}\mathrm{Tr}\left\{\dfrac{\partial^2 V^*}{\partial x^2}\left[E\left[f(\cdot)f^T(\cdot)\right]\Delta t + L\,E\left[w(t)w^T(\tau)\right]L^T\Delta t\right]\right\}$

$= \dfrac{\partial V^*}{\partial t}(t) + \dfrac{\partial V^*}{\partial x}(t)f(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}(t)L(t)W(t)L^T(t)\right]$

The uncertain disturbance input can only increase the value function's rate of change.
Stochastic Principle of Optimality (Perfect Measurements)

$\dfrac{dV^*}{dt} = \dfrac{\partial V^*}{\partial t}(t) + \dfrac{\partial V^*}{\partial x}(t)f(\cdot) + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}(t)L(t)W(t)L^T(t)\right]$

• Solve for the partial derivative, $\partial V^*/\partial t$
• Substitute for the total derivative, $dV^*/dt = -L(x^*, u^*)$
• Expected value of the HJB equation gives the stochastic HJB equation:

$\dfrac{\partial V^*}{\partial t}(t) = -\min_u E\left\{-\dfrac{dV^*}{dt} + \dfrac{\partial V^*}{\partial x}(t)f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}(t)L(t)W(t)L^T(t)\right]\right\}$

$= -\min_u E\left\{L[x^*(t), u(t), t] + \dfrac{\partial V^*}{\partial x}(t)f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}(t)L(t)W(t)L^T(t)\right]\right\}$

Boundary (terminal) condition: $V^*(t_f) = E\left\{\phi(t_f)\right\}$
Observations of Stochastic Principle of Optimality (Perfect Measurements)

$\dfrac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \dfrac{\partial V^*}{\partial x}(t)f[x^*(t), u(t), t] + \dfrac{1}{2}\mathrm{Tr}\left[\dfrac{\partial^2 V^*}{\partial x^2}(t)L(t)W(t)L^T(t)\right]\right\}$

• Control has no effect on the disturbance input
• Criterion for optimality is the same as for the deterministic case
• Disturbance uncertainty increases the magnitude of the total optimal value function, $E[V^*(0)]$
Adaptive Critic Controller
• Nonlinear control law, c, takes the general form

$u(t) = c[x(t), a, y^*(t)]$

$x(t)$: state
$a$: parameters defining an operating point
$y^*(t)$: command input

• On-line adaptive critic controller
– Nonlinear control law ("action network")
– "Criticizes" non-optimal performance via "critic network"
• Adapts control gains to improve performance, respond to failures, and accommodate parameter variation
• Adapts cost model to improve performance evaluation
Overview of Adaptive Critic Designs

Adaptive Critic Iteration Cycle: Modular Description of Typical Iteration Cycle

[Figures: Ferrari, Stengel, "Model-Based Adaptive Critic Designs," in Learning and Approximate Dynamic Programming, Si et al., ed., 2004]
Gain Scheduled Controller

• Design PI-LQ controllers (LQR with integral compensation) that satisfy requirements at N operating points
• Scheduling variable, a, identifies each operating point

$u(t_k) = C_F(a_i)y^*(t_k) + C_B(a_i)\Delta x(t_k) + C_I(a_i)\int_{t_{k-1}}^{t_k}\Delta y(\tau)\,d\tau \approx c[x(t_k), a_i, y^*(t_k)], \quad i = 1, \ldots, N$
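A minimal sketch of this gain-scheduled law in Python: the gain tables `CF`, `CB`, `CI` (lists of matrices indexed by operating point) are hypothetical placeholders, and nearest-point scheduling stands in for the interpolation an actual design would use:

```python
import numpy as np

class GainScheduledPI:
    """PI-LQ law: u_k = CF(a_i) y*_k + CB(a_i) dx_k + CI(a_i) * integral(dy)."""
    def __init__(self, points, CF, CB, CI):
        self.points = np.asarray(points)   # scheduling-variable values a_i
        self.CF, self.CB, self.CI = CF, CB, CI
        self.integral = None

    def __call__(self, a, y_star, dx, dy, dt):
        i = int(np.argmin(np.abs(self.points - a)))   # nearest operating point
        if self.integral is None:
            self.integral = np.zeros_like(dy)
        self.integral = self.integral + dy * dt       # running integral of dy
        return self.CF[i] @ y_star + self.CB[i] @ dx + self.CI[i] @ self.integral
```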
Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

$u(t_k) = NN_F\left[y^*(t_k), a(t_k)\right] + NN_B\left[x(t_k), a(t_k)\right] + NN_I\left[\int\Delta y(\tau)\,d\tau,\; a(t_k)\right] = c[x(t_k), a, y^*(t_k)]$
Sigmoidal Neuron Input-Output Characteristic

Logistic sigmoid function:

$u = s(r) = \dfrac{1}{1 + e^{-r}}$  or  $u = s(r) = \dfrac{e^r - 1}{e^r + 1} = \tanh\dfrac{r}{2}$

Sigmoid with two inputs, one output:

$u = s(r) = \dfrac{1}{1 + e^{-(w_1 r_1 + w_2 r_2 + b)}}$
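Both forms are one line of code; the assertion below confirms the tanh identity stated above:

```python
import numpy as np

def logistic(r):
    """Logistic sigmoid s(r) = 1 / (1 + exp(-r))."""
    return 1.0 / (1.0 + np.exp(-r))

def neuron(r, w, b):
    """Sigmoidal neuron with input vector r, weights w, bias b:
    u = s(w1*r1 + w2*r2 + b) for the two-input case."""
    return logistic(np.dot(w, r) + b)

# The (e^r - 1)/(e^r + 1) form is exactly tanh(r/2):
r = 0.7
assert np.isclose((np.exp(r) - 1) / (np.exp(r) + 1), np.tanh(r / 2))
```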
Algebraically Trained Neural Control Law

• Algebraic training of neural networks produces an exact fit of PI-LQ control gains and trim conditions at N operating points
– Interpolation and gain scheduling via neural networks
– One node per operating point in each neural network
– See Ferrari, Stengel, JGCD, 2002 [on Blackboard]
On-line Optimization of Adaptive Critic Neural Network Controller

The critic adapts neural network weights to improve performance using approximate dynamic programming (Ferrari, Stengel, JGCD, 2004).
Heuristic Dynamic Programming Adaptive Critic

• Dual-heuristic dynamic programming adaptive critic for the receding-horizon optimization problem

$V\left[x_a(t_k)\right] = L\left[x_a(t_k), u(t_k)\right] + V\left[x_a(t_{k+1})\right]$

$\dfrac{\partial V}{\partial u} = \dfrac{\partial L}{\partial u} + \dfrac{\partial V}{\partial x_a}\dfrac{\partial x_a}{\partial u} = 0$

$\dfrac{\partial V\left[x_a(t)\right]}{\partial x_a(t)} = NN_C\left[x_a(t), a(t)\right]$

• Critic and action (i.e., control) networks adapted concurrently
• LQ-PI cost function applied to the nonlinear problem
• Modified resilient backpropagation for neural network training

Ferrari, Stengel, JGCD, 2004
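A schematic of one dual-heuristic update, not the Ferrari-Stengel implementation: the critic approximates the cost gradient $\lambda = \partial V/\partial x_a$, and its target follows from differentiating $V[x_a(t_k)] = L + V[x_a(t_{k+1})]$ through the model. All interfaces here (the `model`, the `fit` methods, and the Jacobian functions `fx`, `fu`) are assumed placeholders:

```python
def dhp_update(xa, model, critic, action, dLdx, dLdu, fx, fu, step=0.1):
    """One dual-heuristic-programming iteration (sketch).
    critic(xa) approximates dV/dxa; action(xa) approximates u = c(xa)."""
    u = action(xa)
    xa_next = model(xa, u)                 # one-step state prediction
    lam_next = critic(xa_next)             # dV/dxa at the next step
    # Critic target from V_k = L_k + V_{k+1}, differentiated w.r.t. xa
    lam_target = dLdx(xa, u) + fx(xa, u).T @ lam_next
    critic.fit(xa, lam_target)
    # Action update from the optimality condition dL/du + (dxa'/du)' dV/dxa' = 0
    grad_u = dLdu(xa, u) + fu(xa, u).T @ lam_next
    action.fit(xa, u - step * grad_u)      # illustrative gradient step
    return u
```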
Action Network On-line Training

Train the action network, at time t, holding the critic parameters fixed.

[Block diagram: aircraft model (transition matrices, state prediction on 2nd time step), utility function derivatives, and the optimality condition drive target generation for NN_A from inputs xa(t) and a(t), with NN_C held fixed]

Critic Network On-line Training

Train the critic network, at time t, holding the action parameters fixed.

[Block diagram: NN_C(old), NN_A, the aircraft model, and utility function derivatives drive target generation of the target cost gradient for NN_C from inputs xa(t) and a(t)]
Adaptive Critic Flight Simulation

[Simulation results: 70-deg bank angle command with multiple failures (thrust, stabilator, rudder); Ferrari, Stengel, JGCD, 2004]
Stochastic Linear-Quadratic Optimal Control
Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

$V(t_o) = E\left\{\phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\,d\tau\right\}$

$= \dfrac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix}x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix}Q(t) & M(t)\\ M^T(t) & R(t)\end{bmatrix}\begin{bmatrix}x(t)\\ u(t)\end{bmatrix}dt\right\}$

Linear dynamic constraint:

$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$
Components of the LQ Value Function

The quadratic value function has two parts:

$V(t) = \dfrac{1}{2}x^T(t)S(t)x(t) + v(t)$

Certainty-equivalent value function:

$V_{CE}(t) \triangleq \dfrac{1}{2}x^T(t)S(t)x(t)$

Stochastic value function increment:

$v(t) = \dfrac{1}{2}\int_t^{t_f}\mathrm{Tr}\left[S(\tau)L(\tau)W(\tau)L^T(\tau)\right]d\tau$
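Given $S(t)$ on a time grid (e.g., from the Riccati integration sketched later in these notes), $v(t)$ is a simple backward quadrature. A sketch assuming constant L and W and a uniform grid:

```python
import numpy as np

def stochastic_increment(S_traj, L, W, dt):
    """v(t) = 0.5 * integral from t to tf of Tr[S(tau) L W L'] dtau,
    approximated by a rectangle rule over the grid of S_traj."""
    integrand = np.array([0.5 * np.trace(S @ L @ W @ L.T) for S in S_traj])
    # Reverse cumulative sum gives the integral from each grid time to tf
    return np.cumsum(integrand[::-1])[::-1] * dt
```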
Value Function Gradient and Hessian

Certainty-equivalent value function:

$V_{CE}(t) \triangleq \dfrac{1}{2}x^T(t)S(t)x(t)$

Gradient with respect to the state:

$\dfrac{\partial V}{\partial x}(t) = x^T(t)S(t)$

Hessian with respect to the state:

$\dfrac{\partial^2 V}{\partial x^2}(t) = S(t)$
Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

$\dfrac{\partial V^*}{\partial t} = -\min_u E\left[\dfrac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \dfrac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$

$= -\min_u\left[\dfrac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \dfrac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$

Terminal condition:

$V(t_f) = \dfrac{1}{2}x^T(t_f)S(t_f)x(t_f)$
Optimal Control Law (Perfect Measurements)

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

$\dfrac{\partial(\partial V/\partial t)}{\partial u} = 0 = \left(x^TM + u^TR\right) + x^TSG$

Solve for u, obtaining the feedback control law:

$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$
LQ Optimal Control Law (Perfect Measurements)

$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$

Zero-mean white-noise disturbance has no effect on the structure or gains of the LQ feedback control law.
Matrix Riccati Equation for Control

Substitute the optimal control law, $u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t)$, into the HJB equation:

$\dfrac{1}{2}x^T\dot{S}x + \dot{v} = \dfrac{1}{2}x^T\left[-\left(Q - MR^{-1}M^T\right) - \left(F - GR^{-1}M^T\right)^TS - S\left(F - GR^{-1}M^T\right) + SGR^{-1}G^TS\right]x - \dfrac{1}{2}\mathrm{Tr}\left(SLWL^T\right)$

Matching quadratic terms, the matrix Riccati equation provides S(t):

$\dot{S}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^TS(t) - S(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f) = \phi_{xx}(t_f)$

Matching the remaining terms, the stochastic increment satisfies

$\dot{v}(t) = -\dfrac{1}{2}\mathrm{Tr}\left[S(t)L(t)W(t)L^T(t)\right]$

• The stochastic value function increases the cost due to the disturbance
• However, its calculation is independent of the Riccati equation
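A minimal fixed-step backward integration of this Riccati equation, with constant illustrative matrices and M = 0 so the cross terms drop out (a production implementation would use a proper ODE solver):

```python
import numpy as np

def riccati_control(F, G, Q, R, Sf, tf, n):
    """Integrate Sdot = -Q - F'S - SF + S G R^-1 G' S backward from
    S(tf) = Sf with n Euler steps. Returns S and the gain C = R^-1 G' S
    on the time grid, index 0 corresponding to t = 0."""
    dt = tf / n
    Rinv = np.linalg.inv(R)
    S = Sf.copy()
    S_traj, C_traj = [S.copy()], [Rinv @ G.T @ S]
    for _ in range(n):
        Sdot = -Q - F.T @ S - S @ F + S @ G @ Rinv @ G.T @ S
        S = S - Sdot * dt                  # step backward in time
        S_traj.append(S.copy())
        C_traj.append(Rinv @ G.T @ S)
    # Lists were built from tf backward; reverse so index 0 is t = 0
    return S_traj[::-1], C_traj[::-1]
```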
Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

$J = \dfrac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_o}^{t_f}\begin{bmatrix}x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix}Q(t) & 0\\ 0 & R(t)\end{bmatrix}\begin{bmatrix}x(t)\\ u(t)\end{bmatrix}dt\right\}$

$= \dfrac{1}{2}\mathrm{Tr}\left\{S(t_f)E\left[x(t_f)x^T(t_f)\right] + \int_{t_o}^{t_f}\left(Q(t)E\left[x(t)x^T(t)\right] + R(t)E\left[u(t)u^T(t)\right]\right)dt\right\}$

or

$J = \dfrac{1}{2}\mathrm{Tr}\left\{S(t_f)P(t_f) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t)\right]dt\right\}$

where

$P(t) \triangleq E\left[x(t)x^T(t)\right], \qquad U(t) \triangleq E\left[u(t)u^T(t)\right]$
Optimal Control Covariance

Optimal control vector:

$u(t) = -C(t)\hat{x}(t)$

Optimal control covariance:

$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)P(t)S(t)G(t)R^{-1}(t)$
Express Cost with State and Adjoint Covariance Dynamics

Integration by parts expresses $S(t_f)P(t_f)$:

$\left.S(t)P(t)\right|_{t_o}^{t_f} = \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$

$S(t_f)P(t_f) = S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$

Rewrite the cost function to incorporate the initial cost:

$J = \dfrac{1}{2}\mathrm{Tr}\left\{S(t_o)P(t_o) + \int_{t_o}^{t_f}\left[Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt\right\}$
Evolution of State and Adjoint Covariance Matrices (No Control)

$u(t) = 0; \qquad U(t) = 0$

State covariance response to random disturbance and initial uncertainty:

$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \qquad P(t_o)\ \text{given}$

Adjoint covariance response to state weighting and terminal cost weight:

$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \qquad S(t_f)\ \text{given}$
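Both equations are linear (Lyapunov-type) and integrate directly. A sketch with constant matrices, using forward Euler for P and a backward sweep for S:

```python
import numpy as np

def propagate_covariances(F, L, W, Q, P0, Sf, tf, n):
    """No-control case: Pdot = FP + PF' + LWL' integrated forward from
    P(t0) = P0; Sdot = -F'S - SF - Q integrated backward from S(tf) = Sf."""
    dt = tf / n
    P, S = P0.copy(), Sf.copy()
    P_traj, S_traj = [P.copy()], [S.copy()]
    for _ in range(n):
        P = P + (F @ P + P @ F.T + L @ W @ L.T) * dt   # forward in time
        P_traj.append(P.copy())
        S = S - (-F.T @ S - S @ F - Q) * dt            # backward in time
        S_traj.append(S.copy())
    # S_traj was built from tf backward; reverse so index 0 is t0
    return P_traj, S_traj[::-1]
```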
Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to random disturbance and initial uncertainty, dependent on S(t) through the gain C(t):

$\dot{P}(t) = \left[F(t) - G(t)C(t)\right]P(t) + P(t)\left[F(t) - G(t)C(t)\right]^T + L(t)W(t)L^T(t)$

Adjoint covariance response to state weighting and terminal cost weight, independent of P(t):

$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t)$
Total Cost With and Without Control

With no control:

$J_{no\ control} = \dfrac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f}S(t)L(t)W(t)L^T(t)\,dt\right]$

With optimal control, the equation for the cost is the same:

$J_{optimal\ control} = \dfrac{1}{2}\mathrm{Tr}\left[S(t_o)P(t_o) + \int_{t_o}^{t_f}S(t)L(t)W(t)L^T(t)\,dt\right]$

... but the evolutions of S(t) and S(t_o) are different in each case.
Information Sets and Expected Cost
The Information Set, I

• Sigma algebra (Wikipedia definitions)
– The collection of sets over which a measure is defined
– The collection of events that can be assigned probabilities
– A measurable space

• Information available at the current time, $t_1$
– All measurements from the initial time, $t_o$
– All control commands from the initial time

$I[t_o, t_1] = \left\{z[t_o, t_1], u[t_o, t_1]\right\}$

• Plus available model structure, parameters, and statistics

$I[t_o, t_1] = \left\{z[t_o, t_1], u[t_o, t_1], f(\cdot), Q, R, \ldots\right\}$
A Derived Information Set, I_D

• Measurements may be directly useful, e.g.,
– Displays
– Simple feedback control
• ... or they may require processing, e.g.,
– Transformation
– Estimation
• Example of a derived information set: history of mean and covariance from a state estimator

$I_D[t_o, t_1] = \left\{\hat{x}[t_o, t_1], P[t_o, t_1], u[t_o, t_1]\right\}$
Additional Derived Information Sets

• Markov derived information set: most current mean and covariance from a state estimator

$I_{MD}(t_1) = \left\{\hat{x}(t_1), P(t_1), u(t_1)\right\}$

• Multiple-model derived information set: parallel estimates of current mean, covariance, and hypothesis probability mass function

$I_{MM}(t_1) = \left\{\left[\hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A)\right], \left[\hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B)\right], \ldots\right\}$
Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time; hence, it requires the entire information set, extending from $t_o$ to $t_f$:

$I[t_o, t_f]$

• Separate the information set into knowable and future sets:

$I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]$

• The knowable set has been received
• The future set is based on current knowledge of the model, current estimates, and statistics of future uncertainties
Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

$E\left[x(t)\,|\,I_D\right] = \hat{x}(t)$

$E\left\{\left[x(t) - \hat{x}(t)\right]\left[x(t) - \hat{x}(t)\right]^T |\,I_D\right\} = P(t)$

... where the conditional expected values are estimates from an optimal filter.
Dependence of the Stochastic Cost Function on the Information Set

$J = \dfrac{1}{2}E\left\{E\left[\mathrm{Tr}\left(S(t_f)x(t_f)x^T(t_f)\right)|\,I_D\right] + \int_0^{t_f}E\left\{\mathrm{Tr}\left[Qx(t)x^T(t)\right]\right\}dt + \int_0^{t_f}E\left\{\mathrm{Tr}\left[Ru(t)u^T(t)\right]\right\}dt\right\}$

Expand the state covariance:

$P(t) = E\left\{\left[x(t) - \hat{x}(t)\right]\left[x(t) - \hat{x}(t)\right]^T |\,I_D\right\} = E\left\{\left[x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t)\right]|\,I_D\right\}$

$E\left\{x(t)\hat{x}^T(t)\,|\,I_D\right\} = E\left\{\hat{x}(t)x^T(t)\,|\,I_D\right\} = \hat{x}(t)\hat{x}^T(t)$

$P(t) = E\left\{x(t)x^T(t)\,|\,I_D\right\} - \hat{x}(t)\hat{x}^T(t)$

or

$E\left\{x(t)x^T(t)\,|\,I_D\right\} = P(t) + \hat{x}(t)\hat{x}^T(t)$

... where the conditional expected values are obtained from an optimal filter.
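A quick numerical check of the identity $E[xx^T|I_D] = P + \hat{x}\hat{x}^T$, sampling x about an assumed conditional mean and covariance (the numbers are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
x_hat = np.array([1.0, -2.0])                  # assumed conditional mean
P = np.array([[0.5, 0.1], [0.1, 0.3]])         # assumed conditional covariance

# Sample x ~ N(x_hat, P) and compare E[x x'] with P + x_hat x_hat'
xs = rng.multivariate_normal(x_hat, P, size=200_000)
E_xx = xs.T @ xs / len(xs)
assert np.allclose(E_xx, P + np.outer(x_hat, x_hat), atol=2e-2)
```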
Certainty-Equivalent and Stochastic Incremental Costs

The cost function has two parts: a certainty-equivalent cost and a stochastic increment cost.

$J = \dfrac{1}{2}E\left\{\mathrm{Tr}\left\{S(t_f)\left[P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f)\right]\right\} + \int_0^{t_f}\mathrm{Tr}\left\{Q\left[P(t) + \hat{x}(t)\hat{x}^T(t)\right]\right\}dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\} \triangleq J_{CE} + J_S$

$J_{CE} = \dfrac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\hat{x}(t_f)\hat{x}^T(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\hat{x}(t)\hat{x}^T(t)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\}$

$J_S = \dfrac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)P(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[QP(t)\right]dt\right\}$
Expected Cost of the Trajectory

Optimized cost function:

$V^*(t_o) \triangleq J^*(t_f) = E\left\{\phi\left[x^*(t_f)\right] + \int_{t_o}^{t_f}L\left[x^*(\tau), u^*(\tau)\right]d\tau\right\}$

Law of total expectation:

$E(\xi) = E\left(\xi\,|\,I[t_o, t_1]\right)\Pr\left\{I[t_o, t_1]\right\} + E\left(\xi\,|\,I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\} = E\left[E\left(\xi\,|\,I\right)\right]$

Because the past is established at $t_1$:

$E(J^*) = E\left(J^*\,|\,I[t_o, t_1]\right)[1] + E\left(J^*\,|\,I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\} = E\left(J^*\,|\,I[t_o, t_1]\right) + E\left(J^*\,|\,I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\}$
Expected Cost of the Trajectory for Planning

For planning and real-time control, $t_1 \leq t_f$, the first term is known and the second is estimated:

$E(J^*) = \underbrace{E\left(J^*\,|\,I[t_o, t_1]\right)}_{\text{Known}} + \underbrace{E\left(J^*\,|\,I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\}}_{\text{Estimated}}$

For post-trajectory analysis (with perfect measurements):

$E(J^*) = E\left(J^*\,|\,I[t_o, t_f]\right)\Pr\left\{I[t_o, t_f]\right\} = E\left(J^*\,|\,I[t_o, t_f]\right)[1]$
Dual Control (Fel'dbaum, 1965)

• Nonlinear system
– Uncertain system parameters to be estimated
– Parameter estimation can be aided by test inputs
• Approach: minimize the value function with three increments
– Nominal control
– Cautious control
– Probing control

$\min_u V^* = \min_u\left(V^*_{nominal} + V^*_{cautious} + V^*_{probing}\right)$

Estimation and control calculations are coupled and necessarily recursive.
Next Time: Linear-Quadratic-Gaussian Regulators