PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8....
Optimal Control and Planning
CS 285
Instructor: Sergey Levine, UC Berkeley
Today’s Lecture
1. Introduction to model-based reinforcement learning
2. What if we know the dynamics? How can we make decisions?
3. Stochastic optimization methods
4. Monte Carlo tree search (MCTS)
5. Trajectory optimization
• Goals:
  • Understand how we can perform planning with known dynamics models in discrete and continuous spaces
Recap: the reinforcement learning objective
Recap: model-free reinforcement learning
assume this is unknown
don’t even attempt to learn it
What if we knew the transition dynamics?
• Often we do know the dynamics
  1. Games (e.g., Atari games, chess, Go)
  2. Easily modeled systems (e.g., navigating a car)
  3. Simulated environments (e.g., simulated robots, video games)
• Often we can learn the dynamics
  1. System identification – fit unknown parameters of a known model
  2. Learning – fit a general-purpose model to observed transition data
Does knowing the dynamics make things easier?
Often, yes!
Model-based reinforcement learning
1. Model-based reinforcement learning: learn the transition dynamics, then figure out how to choose actions
2. Today: how can we make decisions if we know the dynamics?
  a. How can we choose actions under perfect knowledge of the system dynamics?
  b. Optimal control, trajectory optimization, planning
3. Next week: how can we learn unknown dynamics?
4. How can we then also learn policies? (e.g. by imitating optimal control)
policy
system dynamics
The objective
1. run away
2. ignore
3. pet
The deterministic case
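One common way to write the deterministic planning problem is sketched here. The notation is an assumption, not taken from the text above: reward $r$, deterministic dynamics $f$, and horizon $T$.

```latex
\mathbf{a}_1, \dots, \mathbf{a}_T
  = \arg\max_{\mathbf{a}_1, \dots, \mathbf{a}_T} \sum_{t=1}^{T} r(\mathbf{s}_t, \mathbf{a}_t)
  \quad \text{s.t.} \quad \mathbf{s}_{t+1} = f(\mathbf{s}_t, \mathbf{a}_t)
```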
The stochastic open-loop case
why is this suboptimal?
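A hedged sketch of the open-loop objective under stochastic dynamics, with assumed notation (transition distribution $p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t)$, reward $r$, horizon $T$):

```latex
\begin{aligned}
p(\mathbf{s}_1, \dots, \mathbf{s}_T \mid \mathbf{a}_1, \dots, \mathbf{a}_T)
  &= p(\mathbf{s}_1) \prod_{t=1}^{T-1} p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t) \\
\mathbf{a}_1, \dots, \mathbf{a}_T
  &= \arg\max_{\mathbf{a}_1, \dots, \mathbf{a}_T}
     E\left[ \sum_{t=1}^{T} r(\mathbf{s}_t, \mathbf{a}_t) \,\middle|\, \mathbf{a}_1, \dots, \mathbf{a}_T \right]
\end{aligned}
```

One reading of the question: the whole action sequence is committed before any future state is observed, so the plan cannot react to information that is only revealed later.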
Aside: terminology
what is this “loop”?
closed-loop vs. open-loop
only sent at t = 1, then it’s one-way!
The stochastic closed-loop case
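In the closed-loop case the decision variable is a policy rather than a fixed action sequence. A minimal sketch with assumed notation (policy $\pi(\mathbf{a}_t \mid \mathbf{s}_t)$, trajectory $\tau$):

```latex
\begin{aligned}
p(\mathbf{s}_1, \mathbf{a}_1, \dots, \mathbf{s}_T, \mathbf{a}_T)
  &= p(\mathbf{s}_1) \prod_{t=1}^{T} \pi(\mathbf{a}_t \mid \mathbf{s}_t)
     \prod_{t=1}^{T-1} p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t) \\
\pi &= \arg\max_{\pi} E_{\tau \sim p(\tau)} \left[ \sum_{t=1}^{T} r(\mathbf{s}_t, \mathbf{a}_t) \right]
\end{aligned}
```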
Open-Loop Planning
But for now, open-loop planning
Stochastic optimization
simplest method: guess & check (the “random shooting method”)
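Guess and check can be sketched in a few lines. This is an illustrative version, not the course's reference code: the objective `J`, the uniform sampling range, and all parameter names are assumptions made for the example.

```python
import numpy as np

def random_shooting(J, horizon, action_dim, num_samples=1000,
                    low=-1.0, high=1.0, rng=None):
    """Sample action sequences at random and keep the best one under J."""
    rng = np.random.default_rng() if rng is None else rng
    # Guess: draw num_samples candidate action sequences uniformly at random.
    candidates = rng.uniform(low, high, size=(num_samples, horizon, action_dim))
    # Check: score every candidate with the (assumed) objective J.
    scores = np.array([J(A) for A in candidates])
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Toy usage: the hypothetical objective prefers actions close to 0.5.
plan, score = random_shooting(
    lambda A: -np.sum((A - 0.5) ** 2),
    horizon=3, action_dim=2, num_samples=500,
    rng=np.random.default_rng(0),
)
```

Because every candidate is scored independently, the loop over candidates can be batched or parallelized.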
Cross-entropy method (CEM)
can we do better?
typically use Gaussian distribution
see also: CMA-ES (sort of like CEM with momentum)
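A minimal CEM sketch with a diagonal Gaussian sampling distribution follows. The objective `J`, the elite fraction, and the iteration count are illustrative assumptions rather than values from the lecture.

```python
import numpy as np

def cem(J, dim, num_samples=200, num_elites=20, iterations=30,
        init_std=1.0, rng=None):
    """Cross-entropy method: repeatedly refit a Gaussian to the best samples."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    for _ in range(iterations):
        # 1. Sample candidates from the current Gaussian.
        samples = rng.normal(mean, std, size=(num_samples, dim))
        # 2. Evaluate them with the (assumed) objective J.
        scores = np.array([J(a) for a in samples])
        # 3. Keep the elites, i.e. the highest-scoring samples.
        elites = samples[np.argsort(scores)[-num_elites:]]
        # 4. Refit the Gaussian to the elites (small floor keeps std positive).
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean

# Toy usage: maximize a concave quadratic with a known optimum.
target = np.array([0.5, -1.0, 0.25])
estimate = cem(lambda a: -np.sum((a - target) ** 2), dim=3,
               rng=np.random.default_rng(0))
```

For planning, `dim` would be the flattened length of the action sequence (horizon times action dimension).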
What’s the problem?
1. Very harsh dimensionality limit
2. Only open-loop planning
What’s the upside?
1. Very fast if parallelized
2. Extremely simple
Discrete case: Monte Carlo tree search (MCTS)
e.g., random policy
+10 +15
Q = 10, N = 1
Q = 12, N = 1
Q = 16, N = 1
Q = 22, N = 2; Q = 38, N = 3
Q = 10, N = 1
Q = 12, N = 1
Q = 22, N = 2; Q = 30, N = 3
Q = 8, N = 1
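The bookkeeping above (Q as a running sum of rollout returns, N as a visit count) can be sketched as a small UCT-style search. Everything here is an illustrative assumption: the UCB-style score, the constant `c`, the callback names (`step`, `rollout_return`, `is_terminal`), and the toy problem in which the return is only defined at terminal states.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.Q = 0.0  # sum of rollout returns that passed through this node
        self.N = 0    # number of visits

def mcts(root_state, actions, step, rollout_return, is_terminal,
         num_iters=200, c=1.0, rng=None):
    rng = rng or random.Random()
    root = Node(root_state)
    for _ in range(num_iters):
        node = root
        # 1. Tree policy: expand an untried action, else descend by UCB score.
        while not is_terminal(node.state):
            tried = {ch.action for ch in node.children}
            untried = [a for a in actions if a not in tried]
            if untried:
                a = rng.choice(untried)
                child = Node(step(node.state, a), parent=node, action=a)
                node.children.append(child)
                node = child
                break
            parent = node
            node = max(parent.children,
                       key=lambda ch: ch.Q / ch.N
                       + 2 * c * math.sqrt(2 * math.log(parent.N) / ch.N))
        # 2. Default policy: estimate the leaf's value with a rollout.
        value = rollout_return(node.state, rng)
        # 3. Backup: update Q and N on the path from the leaf to the root.
        while node is not None:
            node.Q += value
            node.N += 1
            node = node.parent
    best = max(root.children, key=lambda ch: ch.Q / ch.N)
    return best.action, root

# Toy problem (hypothetical): choose 3 bits; the first bit is worth 10.
DEPTH = 3
def step(state, a):
    return state + (a,)
def is_terminal(state):
    return len(state) == DEPTH
def rollout_return(state, rng):
    while not is_terminal(state):  # random default policy
        state = step(state, rng.choice([0, 1]))
    return 10 * state[0] + sum(state[1:])

action, root = mcts((), [0, 1], step, rollout_return, is_terminal,
                    num_iters=200, rng=random.Random(0))
```

The final action is chosen by average return Q / N at the root, which is one common convention; choosing the most-visited child is another.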
Additional reading
1. Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree Search Methods.
  • Survey of MCTS methods and basic summary.
Trajectory Optimization with Derivatives
Can we use derivatives?
Shooting methods vs collocation
shooting method: optimize over actions only
collocation method: optimize over actions and states, with constraints
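The two formulations can be contrasted in symbols. The notation is assumed (control-style states $\mathbf{x}_t$, actions $\mathbf{u}_t$, cost $c$, dynamics $f$). In a shooting method the states are eliminated by substituting the dynamics, while collocation keeps them as variables tied together by constraints.

```latex
\begin{aligned}
\text{shooting:} \quad
  & \min_{\mathbf{u}_1, \dots, \mathbf{u}_T}
    c(\mathbf{x}_1, \mathbf{u}_1) + c(f(\mathbf{x}_1, \mathbf{u}_1), \mathbf{u}_2)
    + \dots + c(f(f(\dots) \dots), \mathbf{u}_T) \\
\text{collocation:} \quad
  & \min_{\mathbf{u}_1, \dots, \mathbf{u}_T,\, \mathbf{x}_1, \dots, \mathbf{x}_T}
    \sum_{t=1}^{T} c(\mathbf{x}_t, \mathbf{u}_t)
    \quad \text{s.t.} \quad \mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_{t-1})
\end{aligned}
```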
Linear case: LQR
linear dynamics, quadratic cost
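A hedged sketch of the finite-horizon LQR backward recursion and forward rollout follows. It assumes time-invariant linear dynamics x' = F [x; u] + f and cost 0.5 [x; u]^T C [x; u] + [x; u]^T c; the symbols F, f, C, c and the double-integrator example are assumptions made for illustration.

```python
import numpy as np

def lqr(F, f, C, c, T, n):
    """Backward pass. n is the state dimension; returns gains for t = 1..T."""
    V, v = np.zeros((n, n)), np.zeros(n)  # value function beyond the horizon
    Ks, ks = [], []
    for _ in range(T):
        # Quadratic Q-function: this step's cost plus the next-step value.
        Q = C + F.T @ V @ F
        q = c + F.T @ V @ f + F.T @ v
        Qxx, Qxu, Qux, Quu = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        # Minimizing over u gives a linear controller u = K x + k.
        K = -np.linalg.solve(Quu, Qux)
        k = -np.linalg.solve(Quu, qu)
        # Substitute the controller back in to get the value function.
        V = Qxx + Qxu @ K + K.T @ Qux + K.T @ Quu @ K
        v = qx + Qxu @ k + K.T @ qu + K.T @ Quu @ k
        Ks.append(K)
        ks.append(k)
    return Ks[::-1], ks[::-1]  # reorder from t = T..1 to t = 1..T

def rollout(x0, F, f, Ks, ks):
    """Forward pass: apply u_t = K_t x_t + k_t through the linear dynamics."""
    xs, us = [x0], []
    for K, k in zip(Ks, ks):
        u = K @ xs[-1] + k
        us.append(u)
        xs.append(F @ np.concatenate([xs[-1], u]) + f)
    return np.array(xs), np.array(us)

# Toy double integrator: state (position, velocity), action = acceleration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
F, f = np.hstack([A, B]), np.zeros(2)
C, c = np.diag([1.0, 1.0, 0.1]), np.zeros(3)
Ks, ks = lqr(F, f, C, c, T=50, n=2)
xs, us = rollout(np.array([1.0, 0.0]), F, f, Ks, ks)
```

Using `np.linalg.solve` instead of forming the inverse of `Quu` explicitly is a numerical choice of this sketch.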
LQR for Stochastic and Nonlinear Systems
Stochastic dynamics
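As a hedged illustration with assumed notation, consider linear dynamics perturbed by additive Gaussian noise. A standard result is that the optimal linear feedback law is unchanged, provided actions are chosen in closed loop from the observed state.

```latex
\begin{aligned}
p(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{u}_t)
  &= \mathcal{N}\!\left( \mathbf{F}_t
     \begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix} + \mathbf{f}_t,\;
     \Sigma_t \right) \\
\mathbf{u}_t &= \mathbf{K}_t \mathbf{x}_t + \mathbf{k}_t
\end{aligned}
```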
The stochastic closed-loop case
Nonlinear case: DDP/iterative LQR
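The core idea of iterative LQR can be sketched as a local approximation around a nominal trajectory $(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)$. The notation is assumed: a first-order expansion of the dynamics and a second-order expansion of the cost, in terms of the deviations $\delta\mathbf{x}_t = \mathbf{x}_t - \hat{\mathbf{x}}_t$ and $\delta\mathbf{u}_t = \mathbf{u}_t - \hat{\mathbf{u}}_t$.

```latex
\begin{aligned}
f(\mathbf{x}_t, \mathbf{u}_t)
  &\approx f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)
   + \nabla_{\mathbf{x}_t, \mathbf{u}_t} f(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)
     \begin{bmatrix} \delta\mathbf{x}_t \\ \delta\mathbf{u}_t \end{bmatrix} \\
c(\mathbf{x}_t, \mathbf{u}_t)
  &\approx c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)
   + \nabla_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)^{\top}
     \begin{bmatrix} \delta\mathbf{x}_t \\ \delta\mathbf{u}_t \end{bmatrix}
   + \frac{1}{2}
     \begin{bmatrix} \delta\mathbf{x}_t \\ \delta\mathbf{u}_t \end{bmatrix}^{\top}
     \nabla^2_{\mathbf{x}_t, \mathbf{u}_t} c(\hat{\mathbf{x}}_t, \hat{\mathbf{u}}_t)
     \begin{bmatrix} \delta\mathbf{x}_t \\ \delta\mathbf{u}_t \end{bmatrix}
\end{aligned}
```

LQR is then run on the deviations, the resulting controller is rolled out through the true nonlinear dynamics to obtain a new nominal trajectory, and the process repeats until convergence.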
Case Study and Additional Readings
Case study: nonlinear model-predictive control
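The replanning loop behind model-predictive control can be sketched as follows. This is an illustration only: it swaps in a simple random-shooting planner where a trajectory optimizer such as iterative LQR would normally sit, and the 1-D dynamics, reward, and parameter names are hypothetical.

```python
import numpy as np

def plan_random_shooting(state, dynamics, reward, horizon, num_samples, rng):
    """Stand-in planner: return the best of num_samples random action plans."""
    best_return, best_plan = -np.inf, None
    for _ in range(num_samples):
        plan = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state, 0.0
        for a in plan:  # simulate the plan with the known model
            s = dynamics(s, a)
            total += reward(s, a)
        if total > best_return:
            best_return, best_plan = total, plan
    return best_plan

def mpc(state, dynamics, reward, steps, horizon=5, num_samples=200, rng=None):
    """At every step: observe the state, replan, execute only the first action."""
    rng = np.random.default_rng() if rng is None else rng
    trajectory = [state]
    for _ in range(steps):
        plan = plan_random_shooting(state, dynamics, reward,
                                    horizon, num_samples, rng)
        state = dynamics(state, plan[0])  # the rest of the plan is discarded
        trajectory.append(state)
    return trajectory

# Toy usage: drive a 1-D point from s = 3 toward the origin.
trajectory = mpc(3.0,
                 dynamics=lambda s, a: s + 0.5 * a,
                 reward=lambda s, a: -s ** 2,
                 steps=30, rng=np.random.default_rng(0))
```

Replanning from the observed state at every step is what turns an open-loop planner into a closed-loop controller.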
Additional reading
1. Mayne, Jacobson. (1970). Differential dynamic programming.
  • Original differential dynamic programming algorithm.
2. Tassa, Erez, Todorov. (2012). Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization.
  • Practical guide for implementing non-linear iterative LQR.
3. Levine, Abbeel. (2014). Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics.
  • Probabilistic formulation and trust region alternative to deterministic line search.
What’s wrong with known dynamics?
Next time: learning the dynamics model