Katerina Fragkiadaki - GitHub Pages · 2018. 1. 9.
iLQR
Deep Reinforcement Learning and Control
Katerina Fragkiadaki
Carnegie Mellon School of Computer Science
Optimal Control (Open Loop)

• The optimal control problem:

$$\min_{x,u} \sum_{t=0}^{T} c_t(x_t, u_t)$$
$$\text{s.t.}\quad x_0 = \bar{x}_0, \qquad x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1$$
Optimal Control (Open Loop)

• The optimal control problem:

$$\min_{x,u} \sum_{t=0}^{T} c_t(x_t, u_t)$$
$$\text{s.t.}\quad x_0 = \bar{x}_0, \qquad x_{t+1} = f(x_t, u_t), \quad t = 0, \dots, T-1$$

• Solution: a sequence of controls $u$ and the resulting state sequence $x$
• In general a non-convex optimization problem; it can be solved with sequential convex programming (SCP): https://stanford.edu/class/ee364b/lectures/seq_slides.pdf
Optimal Control (Closed Loop, a.k.a. MPC) = "Model Predictive Control"

Given: $\bar{x}_0$
For $t = 0, 1, 2, \dots, T$:
• Solve
$$\min_{x,u} \sum_{k=t}^{T} c_k(x_k, u_k) \quad \text{s.t.}\quad x_{k+1} = f(x_k, u_k)\ \ \forall k \in \{t, t+1, \dots, T-1\}, \qquad x_t = \bar{x}_t$$
• Execute $u_t$
• Observe the resulting state $x_{t+1}$
• Initialize with the solution from $t-1$ to solve fast at time $t$
Shooting methods vs collocation methods

Collocation method: optimize over actions and states, with constraints:

$$\min_{u_1, \dots, u_T,\, x_1, \dots, x_T} \sum_{t=1}^{T} c(x_t, u_t) \quad \text{s.t.}\quad x_t = f(x_{t-1}, u_{t-1})$$

Diagram: Sergey Levine

Shooting method: optimize over actions only:

$$\min_{u_1, \dots, u_T} c(x_1, u_1) + c(f(x_1, u_1), u_2) + \dots + c(f(f(\dots)\dots), u_T)$$

Diagram: Sergey Levine

• The states $x$ are not needed as decision variables: every control sequence $u$ induces (through the dynamics) a state sequence $x$, from which the cost can be computed.
• However, it is not clear how to initialize the controls in a way that nudges the trajectory towards a goal state.
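The difference between the two formulations can be sketched in a few lines. This is a minimal illustration on a hypothetical scalar system (the dynamics and cost below are assumptions, not from the slides): shooting computes states internally from the controls, while collocation treats states as free variables tied to the controls by constraints. On a dynamically feasible pair, both objectives agree.

```python
def f(x, u):                      # toy linear dynamics (illustrative)
    return 0.9 * x + 0.5 * u

def c(x, u):                      # quadratic cost around target x* = 1
    return (x - 1.0) ** 2 + 0.1 * u ** 2

def rollout(x1, us):
    """Integrate the dynamics forward from x1 under the control sequence us."""
    xs = [x1]
    for u in us[:-1]:
        xs.append(f(xs[-1], u))
    return xs                     # one state per control: x_t paired with u_t

def shooting_cost(x1, us):
    """Shooting: optimize over actions only; states come from the rollout."""
    return sum(c(x, u) for x, u in zip(rollout(x1, us), us))

def collocation_cost(xs, us):
    """Collocation: states are decision variables, tied to u by constraints."""
    return sum(c(x, u) for x, u in zip(xs, us))

us = [0.3, -0.1, 0.2]
xs = rollout(1.5, us)
# On a dynamically feasible (xs, us) pair the two formulations agree:
print(abs(shooting_cost(1.5, us) - collocation_cost(xs, us)))  # 0.0
```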
Bellman's Curse of Dimensionality

• n-dimensional state space
• Number of states grows exponentially in n (for a fixed number of discretization levels per coordinate)
• In practice, discretization is considered computationally feasible only up to 5- or 6-dimensional state spaces, even when using:
• Variable resolution discretization
• Highly optimized implementations
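The exponential growth is easy to make concrete (the number of levels per coordinate below is an illustrative choice):

```python
# Number of discrete states k**n for k levels per coordinate: exponential in n.
k = 10                       # discretization levels per coordinate (illustrative)
sizes = {n: k ** n for n in range(1, 7)}
print(sizes[6])              # a 6-dimensional grid already has 10**6 = 1,000,000 cells
```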
Linear case: LQR

• Very special case: optimal control for linear dynamical systems and quadratic cost (a.k.a. the LQ setting)
• Can solve the continuous state-space optimal control problem exactly
• Running time: O(Tn³)
Linear dynamics: Newtonian Dynamics

$$x_{t+1} = x_t + \Delta t\, \dot{x}_t + \Delta t^2 F_x$$
$$y_{t+1} = y_t + \Delta t\, \dot{y}_t + \Delta t^2 F_y$$
$$\dot{x}_{t+1} = \dot{x}_t + \Delta t\, F_x$$
$$\dot{y}_{t+1} = \dot{y}_t + \Delta t\, F_y$$
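The discretized Newtonian update above is straightforward to simulate. A minimal sketch (the time step and force values are illustrative; note it uses the slide's $\Delta t^2 F$ position term rather than the $\tfrac{1}{2}\Delta t^2 F$ of exact integration):

```python
import numpy as np

dt = 0.1
F = np.array([1.0, 0.0])                # constant (mass-normalized) force, illustrative
pos, vel = np.zeros(2), np.zeros(2)

for _ in range(10):                     # the discretized Newtonian update from the slide
    pos = pos + dt * vel + dt**2 * F    # position update
    vel = vel + dt * F                  # velocity update

print(pos, vel)
```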
What is the state x?

In most robotic tasks, the state is hand-engineered and includes:
• positions and velocities of the robotic joints
• position and velocity of the object being manipulated

Both are known: the robot knows its own state, and we perceive the state of the objects in the world. In tasks where we do not even want to bother with object state, we just concatenate the robotic state across multiple time steps to implicitly infer the interaction (e.g., collision with the object).
What is the cost?

$$c(x_t, u_t) = \|x_t - x^*\| + \lambda \|u_t\|$$

• $x^*$ is the target state
• In the final time step, you can add a term with higher weight (final cost):

$$c(x_T, u_T) = 2\left(\|x_T - x^*\| + \lambda \|u_T\|\right)$$

• For object manipulation, $x^*$ includes not only the desired pose of the end effector but also the desired pose of the objects
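The cost above translates directly into code. A small sketch (the target state and the penalty weight λ are illustrative placeholders):

```python
import numpy as np

lam = 0.01                               # control penalty weight (illustrative)
x_star = np.array([1.0, 0.0])            # target state (illustrative)

def cost(x, u, terminal=False):
    """Running cost ||x - x*|| + lam * ||u||; weighted 2x at the final step."""
    c = np.linalg.norm(x - x_star) + lam * np.linalg.norm(u)
    return 2.0 * c if terminal else c

print(cost(np.zeros(2), np.zeros(2)))          # distance to target: 1.0
print(cost(np.zeros(2), np.zeros(2), True))    # terminal, doubled: 2.0
```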
Linear case: LQR

Definitions:
• $Q(x_t, u_t)$: optimal action value function, the optimal cost-to-go at state $x_t$ as a function of $u_t$, assuming we act optimally past step $t$
• $V(x_t)$: optimal state value function, the optimal cost-to-go from state $x_t$
• $x_0$: the initial state, known and given

$$V(x_t) = \min_{u_t} Q(x_t, u_t)$$
Principle of Optimality

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)
Linear case: LQR

Value iteration: backward propagation. Start from $u_T$ and work backwards.

• Write the cost matrices for the last time step.
• $Q(x_T, u_T)$ is a quadratic, so set its derivative to zero to find the minimizing $u_T$.
• Remember: $V(x_t) = \min_{u_t} Q(x_t, u_t)$.
• Substituting the minimizer $u_T$ into $Q(x_T, u_T)$ gives us $V(x_T)$: the optimal cost-to-go as a function of the final state.
Linear case: LQR

We propagate the optimal value function backwards, as in value iteration:

$$q^*(s, a) = r(s, a) + \gamma \sum_{s' \in S} T(s' \mid s, a)\, v^*(s')$$

• $Q(x_{T-1}, u_{T-1})$ = (immediate cost at $T-1$) + (best cost-to-go $V(x_T)$): the cost terms are quadratic and the dynamics are linear.
• We can eliminate $x_T$ by writing $V(x_T)$ only in terms of quantities at $T-1$, substituting the linear dynamics for $x_T$.
• We have now written the optimal action value function $Q(x_{T-1}, u_{T-1})$ only in terms of $x_{T-1}, u_{T-1}$.
• Take the derivative to find the minimizing $u_{T-1}$, and repeat backwards in time.
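Since the equations themselves live in the slide figures, the backward step can be written out in one standard notation (following Sergey Levine's lecture style; treat the exact symbols as an assumption rather than the slides' own):

```latex
% Quadratic cost at the last step, in block form:
Q(x_T, u_T) = \mathrm{const} + \frac{1}{2}
\begin{bmatrix} x_T \\ u_T \end{bmatrix}^\top C_T
\begin{bmatrix} x_T \\ u_T \end{bmatrix}
+ \begin{bmatrix} x_T \\ u_T \end{bmatrix}^\top c_T,
\qquad
C_T = \begin{bmatrix} C_{x_T,x_T} & C_{x_T,u_T} \\ C_{u_T,x_T} & C_{u_T,u_T} \end{bmatrix}

% Setting the derivative to zero (a quadratic in u_T):
\nabla_{u_T} Q(x_T, u_T) = C_{u_T,x_T}\, x_T + C_{u_T,u_T}\, u_T + c_{u_T} = 0
\;\Rightarrow\;
u_T = K_T x_T + k_T,
\quad
K_T = -C_{u_T,u_T}^{-1} C_{u_T,x_T},
\quad
k_T = -C_{u_T,u_T}^{-1} c_{u_T}

% Substituting the minimizer back in gives a quadratic
% V(x_T) = \min_{u_T} Q(x_T, u_T) in x_T alone.
```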
Linear case: LQR

Backward recursion:

Diagram: Sergey Levine
Linear case: LQR

Backward recursion: compute the feedback gains from $t = T$ down to $t = 0$.
We know $x_0$! Forward recursion: roll the controls forward from $t = 0$ to $t = T$.
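The two passes can be sketched with the standard finite-horizon LQR recursion. This is an assumption-laden sketch: the exact cost matrices on the slides are in the figures, so the dynamics, Q, and R below are illustrative (a double integrator driven to the origin), not the slides' own.

```python
import numpy as np

dt, T = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])           # double-integrator dynamics
B = np.array([[dt**2], [dt]])
Q = np.eye(2)                                    # state cost (illustrative)
R = np.array([[0.01]])                           # control cost (illustrative)

# Backward recursion: V(x_t) = x_t' P_t x_t with policy u_t = K_t x_t.
P = Q.copy()                                     # value at the final step
Ks = []
for _ in range(T):
    # Minimizer of the quadratic Q-function at this step:
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Plug the minimizer back in to get the new value matrix:
    P = Q + A.T @ P @ A + A.T @ P @ B @ K
    Ks.append(K)
Ks.reverse()                                     # Ks[t] is the gain at time t

# Forward recursion: we know x_0, so roll the feedback policy forward.
x = np.array([[1.0], [0.0]])                     # start 1 unit from the origin
for t in range(T):
    u = Ks[t] @ x
    x = A @ x + B @ u
print(float(x[0]))                               # position driven towards 0
```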
Nonlinear case: DDP/iterative LQR

Non-linear case: use iterative approximations!

• First-order Taylor expansion of the dynamics around a trajectory $\hat{x}_t, \hat{u}_t,\ t = 1, \dots, T$
• Second-order Taylor expansion of the cost around the same trajectory
Nonlinear case: DDP/iterative LQR

Iterative LQR (i-LQR)

Initialization: Given $x_0$, pick a random control sequence $u_0, \dots, u_T$ and obtain the corresponding state sequence $x_0, \dots, x_T$.

$$u_t = \hat{u}_t + K_t(x_t - \hat{x}_t) + k_t \quad \forall t$$
Nonlinear case: DDP/iterative LQR

Iterative LQR (i-LQR)

Repeat:
• Linear-quadratic approximation around $\hat{x}, \hat{u}$
• Find $\Delta u_t,\ t = 1, \dots, T$ so that $\hat{u}_t + \Delta u_t$ minimizes the approximation
• Go to the new trajectory $\hat{x}' = \hat{x} + \Delta x_t$, $\hat{u}' = \hat{u} + \Delta u_t$

$$u_t = \hat{u}_t + K_t(x_t - \hat{x}_t) + k_t \quad \forall t$$
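The full loop fits in a short scalar sketch. Everything concrete here is an assumption for illustration: a hypothetical pendulum-like system, quadratic cost weights, and a simple backtracking line search (which the next slide motivates). The backward pass uses the standard Gauss-Newton Q-function expansion rather than full DDP.

```python
import math

dt, T, x0, x_star = 0.1, 30, 0.0, 1.0
r = 0.1                                          # control penalty (illustrative)

def f(x, u):                                     # nonlinear dynamics (illustrative)
    return x + dt * (u - math.sin(x))

def rollout(us):
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs

def total_cost(xs, us):
    run = sum((x - x_star) ** 2 + r * u ** 2 for x, u in zip(xs[:-1], us))
    return run + (xs[-1] - x_star) ** 2          # terminal cost

us = [0.0] * T                                   # initial control sequence
xs = rollout(us)
cost0 = total_cost(xs, us)

for _ in range(20):                              # i-LQR iterations
    # Backward pass on the linear-quadratic approximation around (xs, us).
    Vx, Vxx = 2 * (xs[-1] - x_star), 2.0         # terminal value expansion
    ks, Ks = [0.0] * T, [0.0] * T
    for t in reversed(range(T)):
        A, B = 1 - dt * math.cos(xs[t]), dt      # dynamics Jacobians df/dx, df/du
        Qx = 2 * (xs[t] - x_star) + A * Vx
        Qu = 2 * r * us[t] + B * Vx
        Qxx = 2.0 + A * Vxx * A
        Quu = 2 * r + B * Vxx * B
        Qux = B * Vxx * A
        ks[t], Ks[t] = -Qu / Quu, -Qux / Quu     # open-loop and feedback terms
        Vx = Qx + Ks[t] * Quu * ks[t] + Ks[t] * Qu + Qux * ks[t]
        Vxx = Qxx + Ks[t] * Quu * Ks[t] + 2 * Ks[t] * Qux
    # Forward pass with the real dynamics and a simple line search on alpha.
    for alpha in (1.0, 0.5, 0.25, 0.1):
        x, new_us = x0, []
        for t in range(T):
            u = us[t] + alpha * ks[t] + Ks[t] * (x - xs[t])
            new_us.append(u)
            x = f(x, u)
        new_xs = rollout(new_us)
        if total_cost(new_xs, new_us) < total_cost(xs, us):
            xs, us = new_xs, new_us
            break

print(total_cost(xs, us), abs(xs[-1] - x_star))
```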
Nonlinear case: DDP/iterative LQR

• The quadratic approximation is invalid too far away from the reference trajectory.
• Instead of finding the argmin, do a line search for $\alpha$:
• Run the forward pass with the real nonlinear dynamics and

$$u_t = \hat{u}_t + K_t(x_t - \hat{x}_t) + \alpha k_t$$
Nonlinear case: DDP/iterative LQR

• So far we have been planning (e.g., 100 steps ahead) and then closing our eyes, hoping our modeling was accurate enough.
• At convergence of iLQR and DDP, we end up with a linearization around the (state, input) trajectory the algorithm converged to.
• In practice, the system may not be on this trajectory, due to perturbations / the initial state being off / the dynamics model being off / …
• Can we handle such noise better?
Model Predictive Control

• Yes! If we close the loop: model predictive control!
• Solution: at time $t$, when asked to generate the control input $u_t$, we re-solve the control problem using iLQR or DDP over time steps $t$ through $T$.

Case study: nonlinear model-predictive control

• Re-planning the entire trajectory is often impractical → in practice: re-plan over a horizon $H$ (receding horizon control).
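The receding-horizon loop can be sketched as: plan over a short horizon, execute only the first action, observe, and re-plan. For brevity the inner solver below is random shooting rather than the iLQR/DDP the slides use, and the dynamics, horizon, and weights are illustrative assumptions.

```python
import math, random

random.seed(0)
dt, H, x_star = 0.1, 10, 1.0

def f(x, u):                              # true nonlinear dynamics (illustrative)
    return x + dt * (u - math.sin(x))

def horizon_cost(x, us):                  # cost over the planning horizon H
    total = 0.0
    for u in us:
        total += (x - x_star) ** 2 + 0.1 * u ** 2
        x = f(x, u)
    return total

x = 0.0
for step in range(60):                    # closed loop: plan, execute, observe
    candidates = [[random.uniform(-2, 2) for _ in range(H)] for _ in range(200)]
    best = min(candidates, key=lambda us: horizon_cost(x, us))
    x = f(x, best[0])                     # execute only the first action, then re-plan
print(abs(x - x_star))                    # state held near the target
```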
i-LQR: When it works

Cost: $\|x_t - x^*\|$

The direction for minimizing the cost (from $x_t$ towards the target $x^*$) is informative, so the local search succeeds.
i-LQR: When it doesn't work

Cost: $\|x_t - x^*\|$

Due to the discontinuities of contact, the local search fails! Solution? Initialize using a human demonstration instead of a random control sequence!

Learning Dexterous Manipulation Policies from Experience and Imitation, Kumar, Gupta, Todorov, Levine 2016
Local models

Time varying linear dynamics: reference trajectory $\hat{x}_t, \hat{u}_t,\ t = 1, \dots, T$
Local models

Time varying linear dynamics: learn time-varying linear dynamics $A_t, B_t$ around the reference trajectory $\hat{x}_t, \hat{u}_t,\ t = 1, \dots, T$:

$$f(x_t, u_t) \approx A_t x_t + B_t u_t, \qquad A_t = \frac{df}{dx_t}, \quad B_t = \frac{df}{du_t}$$
Local models

Time varying linear dynamics

How do I get the data to fit the linear dynamics at each time step? We execute the controller $u_t$ at state $x_t$ to explore how the world works in the vicinity of the reference trajectory!
Discrete and Continuous version
Fitting Dynamics (1): Compute derivatives of the "true" nonlinear dynamics analytically

• We may not have such analytic nonlinear dynamics equations available
• Very limiting under modeling errors
• Complicated derivations
Fitting Dynamics (2): Finite Differences

We need 2 samples per state dimension.
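A central-difference Jacobian makes the "2 samples per dimension" count concrete: one plus- and one minus-perturbation per input coordinate. The toy dynamics below are an illustrative assumption.

```python
import numpy as np

def f(z):                                    # z = [x; u], toy nonlinear dynamics
    x, u = z[:2], z[2:]
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * (u[0] - np.sin(x[0]))])

def finite_diff_jacobian(fun, z, eps=1e-5):
    """Central differences: 2 evaluations per input dimension."""
    J = np.zeros((len(fun(z)), len(z)))
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (fun(z + dz) - fun(z - dz)) / (2 * eps)
    return J

z = np.array([0.3, -0.2, 0.5])               # linearization point (x, u)
J = finite_diff_jacobian(f, z)
A, B = J[:, :2], J[:, 2:]                    # split into df/dx and df/du
print(A, B)
```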
Fitting Dynamics (3): Linear regression

$$x_{t+1} \approx A x_t + B u_t + D$$

• Use linear regression to fit $A, B, D$ to samples $(x_t, u_t, x_{t+1})$
• Use GMM priors, as described in the lecture
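Fitting A, B, D is one least-squares solve over stacked regressors $[x_t, u_t, 1]$. A minimal sketch on synthetic data (the "true" system below is an assumption used only to generate samples):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
D_true = np.array([0.05, 0.0])

X = rng.normal(size=(100, 2))                 # sampled states x_t
U = rng.normal(size=(100, 1))                 # sampled controls u_t
Y = X @ A_true.T + U @ B_true.T + D_true      # next states x_{t+1} (noise-free)

Z = np.hstack([X, U, np.ones((100, 1))])      # regressors [x_t, u_t, 1]
W, *_ = np.linalg.lstsq(Z, Y, rcond=None)     # solve min ||Z W - Y||
A_fit, B_fit, D_fit = W[:2].T, W[2:3].T, W[3] # unpack A, B and the offset D
print(np.round(A_fit, 3))
```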
Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, Levine and Abbeel 2014

Bayesian linear dynamics fitting

• Fit a global Gaussian mixture model using all samples $(x_t, u_t, x_{t+1})$ from all iterations and time steps → prior.
• Use the current samples (from this iteration) to obtain a Gaussian posterior over $(x_t, u_t, x_{t+1})$, which you condition to obtain $p(x_{t+1} \mid x_t, u_t)$.
• Such a prior needs 4 to 8 times fewer samples, despite not being accurate enough by itself.
• Posterior over the mean and covariance $\mu, \Sigma$, where the empirical means and covariances enter together with an inverse-Wishart prior with parameters $\Phi, \mu_0, n_0, m$.
One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors, Fu et al.

• Fit a global model of the dynamics with a neural network, using all samples $(x_t, u_t, x_{t+1})$ from all iterations and time steps, and across multiple manipulation tasks → multi-task learning.
• Use model predictive control with iLQR to compute the policy at every time step.
• The state is the robotic arm configuration; the cost depends on a desired end-effector pose. No object is involved in the state.
Local models

Time varying linear dynamics

We iteratively fit the dynamics and update the policy. Why is this iteration important? So that the region (state, action distribution) in which our dynamics are estimated is similar to the one our policy visits (last lecture).
Local models

Fitting time varying linear dynamics

• Can we further improve sample complexity? Right now each sample $(x_t, u_t, x_{t+1})$ contributes to only one linear model fit.
• Instead of linear regression, use Bayesian linear regression!
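The idea can be sketched with the conjugate Gaussian update for the dynamics weights: a prior (here a simple identity-dynamics prior with known noise, an illustrative simplification of the GMM/inverse-Wishart machinery above) shrinks the per-time-step fit, so fewer samples are needed at each step.

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])

Z = rng.normal(size=(5, 2))                   # only 5 samples at this time step
Y = Z @ A_true.T + 0.01 * rng.normal(size=(5, 2))

sigma2, tau2 = 0.01 ** 2, 1.0                 # noise variance, prior variance (assumed known)
prior_mean = np.eye(2)                        # prior: identity dynamics

# Standard conjugate posterior mean for the weights W in Y ~ Z W:
# (Z'Z / sigma2 + I / tau2)^{-1} (Z'Y / sigma2 + W0 / tau2)
precision = Z.T @ Z / sigma2 + np.eye(2) / tau2
W_post = np.linalg.solve(precision, Z.T @ Y / sigma2 + prior_mean.T / tau2)
A_post = W_post.T                             # posterior estimate of the dynamics matrix
print(np.round(A_post, 2))
```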