Value iteration networks



Transcript of Value iteration networks

Page 1: Value iteration networks

Value Iteration Networks

Aviv Tamar, Sergey Levine, and Pieter Abbeel

Presenter: Sungjoon Choi

arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

Page 2: Value iteration networks

This paper can be used for

Page 3: Value iteration networks

Convolutional Networks

Today, we will see a very clever interpretation of CNNs! A CNN is not just an efficient feature extractor: this paper finds an analogy between the operations in a CNN and the value iteration algorithm in reinforcement learning.

Page 4: Value iteration networks

Convolutional Networks

When it comes to image processing, CNNs are used almost everywhere!

Page 5: Value iteration networks

Structured Prediction?

Structured prediction is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values.

Page 6: Value iteration networks

Path Planning?

Page 7: Value iteration networks

Why not just End to End?

Page 8: Value iteration networks

Is it Deep Q Learning?

No, it is different. DQN only models the Q-function with a CNN.

Page 9: Value iteration networks

Reinforcement Learning

What makes RL different from other methods?

We only get a reward at certain points, but we have to make a decision at every step.

Page 10: Value iteration networks

RL: Value Iteration

So, we introduce the notion of value.

And of course, ways to find the value function.
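One standard way to find the value function is value iteration itself. The following is a minimal sketch, not taken from the slides: the grid size, the rewards (`STEP_REWARD`, `GOAL_REWARD`), the discount `GAMMA`, and all function names are illustrative assumptions, and the transitions are assumed deterministic.

```python
# Minimal tabular value iteration on a tiny deterministic grid world.
# Grid size, rewards, and discount factor are illustrative assumptions.

GAMMA = 0.9            # discount factor (assumed)
STEP_REWARD = -1.0     # reward for every ordinary move (assumed)
GOAL_REWARD = 10.0     # reward for entering the goal cell (assumed)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, rows, cols):
    """Deterministic transition: move one cell, or stay put at a wall."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < rows and 0 <= c < cols:
        return (r, c)
    return state


def lookahead(values, state, action, goal):
    """One-step return: immediate reward plus discounted next-state value."""
    rows, cols = len(values), len(values[0])
    nr, nc = step(state, action, rows, cols)
    reward = GOAL_REWARD if (nr, nc) == goal else STEP_REWARD
    return reward + GAMMA * values[nr][nc]


def value_iteration(rows, cols, goal, tol=1e-6, max_iters=1000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(max_iters):
        new_values = [row[:] for row in values]
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue  # the goal is terminal and keeps value 0
                best = max(lookahead(values, (r, c), a, goal) for a in ACTIONS)
                delta = max(delta, abs(best - values[r][c]))
                new_values[r][c] = best
        values = new_values
        if delta < tol:
            break
    return values


def greedy_action(values, state, goal):
    """Pick the action with the highest one-step lookahead value."""
    return max(ACTIONS, key=lambda a: lookahead(values, state, a, goal))
```

For example, `value_iteration(3, 3, (2, 2))` returns a 3x3 table of values, and `greedy_action` reads a policy off that table: the agent decides at every step even though the large reward arrives only at the goal.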

Page 11: Value iteration networks

Value Iteration via CNN?

This paper says: “We introduce the value iteration network: a fully differentiable neural network with a planning module embedded within.”

Page 12: Value iteration networks

Value Iteration via CNN?

Page 13: Value iteration networks

Value Iteration Block

Page 14: Value iteration networks

Value Iteration Block

The depth of the Q layer need not be the same as the number of actions.
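The analogy behind the block can be sketched in a few lines: a convolution over the reward and value maps produces one Q channel per kernel, and a max over the channels produces the next value map. This is a rough sketch under stated assumptions, not the authors' implementation: the kernels below are hand-set to select one neighbouring cell each (a trained network would learn them), there are four Q channels only for convenience, and `GAMMA`, `make_kernel`, `conv2d_same`, and `vi_block` are hypothetical names.

```python
# Sketch of a value iteration block: convolution, then max over channels.
# Kernels are hand-set for illustration; a trained VIN would learn them.

GAMMA = 0.9  # discount folded into the value kernels (assumed)

# Each Q channel looks at one neighbouring cell (up, down, left, right).
NEIGHBOUR_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def make_kernel(offset, weight):
    """3x3 kernel with `weight` at the given neighbour offset, 0 elsewhere."""
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[offset[0] + 1][offset[1] + 1] = weight
    return kernel


def conv2d_same(image, kernel):
    """Zero-padded 3x3 cross-correlation, like a 'same' CNN convolution."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for kr in range(3):
                for kc in range(3):
                    rr, cc = r + kr - 1, c + kc - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[kr][kc] * image[rr][cc]
            out[r][c] = total
    return out


def vi_block(reward, num_iters):
    """Run `num_iters` rounds of (convolve to Q channels, max to value map)."""
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    reward_kernels = [make_kernel(o, 1.0) for o in NEIGHBOUR_OFFSETS]
    value_kernels = [make_kernel(o, GAMMA) for o in NEIGHBOUR_OFFSETS]
    for _ in range(num_iters):
        q_channels = []
        for w_r, w_v in zip(reward_kernels, value_kernels):
            from_reward = conv2d_same(reward, w_r)
            from_value = conv2d_same(value, w_v)
            # Q channel = contribution of reward map + contribution of value map
            q_channels.append([
                [a + b for a, b in zip(row_r, row_v)]
                for row_r, row_v in zip(from_reward, from_value)
            ])
        # Max over the channel dimension yields the next value map.
        value = [
            [max(q[r][c] for q in q_channels) for c in range(cols)]
            for r in range(rows)
        ]
    return value
```

Because the max runs over channels rather than over a fixed action set, the number of kernels is a free design choice, which is why the depth of the Q layer does not have to match the number of actions.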

Page 15: Value iteration networks

Value Iteration Network

VI Block

Page 16: Value iteration networks

Value Iteration Network

Or just a feature extraction stage. (I guess)

Page 17: Value iteration networks

Hierarchical VI Network

Page 18: Value iteration networks

Grid-World Experiment

Page 19: Value iteration networks

Grid-World Experiment

Input: Sequence of states (locations)

Output: Sequence of actions (controls)

Page 20: Value iteration networks

Grid-World Experiment

Value Iteration Network vs. Direct Policy Learning

Page 21: Value iteration networks

Mars Rover Navigation

Page 22: Value iteration networks

Conclusion

A very clever idea: using a CNN as a building block for solving inverse reinforcement learning problems!

Make things differentiable and use deep networks; deep learning tools will take care of the rest.

It is still at a conceptual level, but the potential is limitless.
