Value iteration networks
Transcript of Value iteration networks
![Page 1: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/1.jpg)
Value Iteration Networks
Aviv Tamar, Sergey Levine, and Pieter Abbeel
Presenter: Sungjoon Choi
arXiv:1602.02867v1 [cs.AI] 9 Feb 2016
![Page 2: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/2.jpg)
This paper can be used for
![Page 3: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/3.jpg)
Convolutional Networks
Today, we will see a very clever interpretation of the CNN! A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.
![Page 4: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/4.jpg)
Convolutional Networks
When it comes to image processing, CNNs are used almost everywhere!
![Page 5: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/5.jpg)
Structured Prediction?
Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.
![Page 6: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/6.jpg)
Path Planning?
![Page 7: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/7.jpg)
Why not just End to End?
![Page 8: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/8.jpg)
Is it Deep Q Learning?
No, it is different. DQN only models the Q-function with a CNN.
![Page 9: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/9.jpg)
Reinforcement Learning
What makes RL different from other methods?
We only get the reward at certain points, but we have to make a decision at every time step.
![Page 10: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/10.jpg)
RL: Value Iteration
So, we introduce the notion of value.
And of course, ways to find the value function.
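As a reminder of what "finding the value function" looks like, here is a minimal sketch of tabular value iteration in NumPy. The toy 4-state chain, the array layout (`P[a, s, s']`, `R[s, a]`), and the function name `value_iteration` are assumptions made for illustration; they are not taken from the paper or the slides.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative helper, not from the paper).

    P: (A, S, S) array, P[a, s, s2] = probability of moving s -> s2 under a.
    R: (S, A) array of immediate rewards.
    Returns the value function V (S,) and a greedy policy (S,).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Recompute Q from the final V so the greedy policy matches it.
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem: a chain of 4 states, actions 0 = left, 1 = right.
# State 3 is an absorbing goal; stepping right from state 2 pays reward 1.
n_states, n_actions = 4, 2
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states - 1):
    P[0, s, max(s - 1, 0)] = 1.0   # left (bumps into the wall at state 0)
    P[1, s, s + 1] = 1.0           # right
P[:, 3, 3] = 1.0                   # the goal is absorbing under both actions
R = np.zeros((n_states, n_actions))
R[2, 1] = 1.0

V, policy = value_iteration(P, R)
print(V)       # values decay by a factor of gamma per step away from the goal
print(policy)  # greedy action per state
```

The reward is only received at one transition, yet the resulting values tell the agent which way to move from every state, which is the point the slide makes.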
![Page 11: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/11.jpg)
Value Iteration via CNN?
This paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”
![Page 12: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/12.jpg)
Value Iteration via CNN?
![Page 13: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/13.jpg)
Value Iteration Block
![Page 14: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/14.jpg)
Value Iteration Block
The depth of the Q layer need not be the same as the number of actions.
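The block can be sketched in a few lines of NumPy: stack the reward map and the current value map, convolve to get a Q layer, then take the max over Q channels to get the next value map. This is only a sketch under assumptions: the helper names `conv2d_same` and `vi_block`, the 3x3 kernel size, the single-channel reward, and the hand-set weights are all hypothetical. In the paper the convolution weights are learned by backpropagation rather than fixed by hand.

```python
import numpy as np

def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, kh, kw), zero padded."""
    c_out, c_in, kh, kw = w.shape
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, H, W))
    for i in range(kh):
        for j in range(kw):
            # w[:, :, i, j] mixes input channels; the slice is the shifted input.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def vi_block(reward, w, n_iters):
    """Run n_iters (>= 1) rounds of: Q = conv([reward; value]), value = max_channel Q."""
    value = np.zeros_like(reward)
    q = None
    for _ in range(n_iters):
        q = conv2d_same(np.stack([reward, value]), w)  # shape (n_q, H, W)
        value = q.max(axis=0)  # max over Q channels stands in for max over actions
    return value, q

# Hand-set weights for illustration only: each Q channel adds the reward at the
# current cell to gamma times the value of one neighbouring cell.
gamma, n_q = 0.9, 4                       # n_q is a free design choice
offsets = [(0, 1), (2, 1), (1, 0), (1, 2)]  # up, down, left, right neighbour
w = np.zeros((n_q, 2, 3, 3))
for k, (i, j) in enumerate(offsets):
    w[k, 0, 1, 1] = 1.0    # reward channel, centre tap
    w[k, 1, i, j] = gamma  # value channel, neighbour tap

reward = np.zeros((5, 5))
reward[2, 2] = 1.0         # a single rewarding cell in the middle of the grid

value, q = vi_block(reward, w, n_iters=3)
print(q.shape)             # (n_q, 5, 5)
print(np.round(value, 2))  # value spreads outward from the rewarding cell
```

Because `n_q` is just the number of output channels of the convolution, nothing forces it to equal the number of actions in the real problem, which is the remark on this slide.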
![Page 15: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/15.jpg)
Value Iteration Network
VI Block
![Page 16: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/16.jpg)
Value Iteration Network
Or just a feature extraction stage. (I guess)
![Page 17: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/17.jpg)
Hierarchical VI Network
![Page 18: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/18.jpg)
Grid-World Experiment
![Page 19: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/19.jpg)
Grid-World Experiment
Input: Sequence of states (locations)
Output: Sequence of actions (controls)
![Page 20: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/20.jpg)
Grid-World Experiment
Value Iteration Network vs. Direct Policy Learning
![Page 21: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/21.jpg)
Mars Rover Navigation
![Page 22: Value iteration networks](https://reader036.fdocuments.net/reader036/viewer/2022070510/58ac5c6f1a28ab8e258b622f/html5/thumbnails/22.jpg)
Conclusion
A very clever idea: using a CNN as a building block for solving inverse reinforcement learning problems!
Make things differentiable and use deep networks; deep learning tools will take care of the rest.
Still at a conceptual level, but the potential is limitless.