Using Hierarchical Reinforcement Learning to Balance Conflicting Sub-problems
By: Stephen Robertson
Supervisor: Phil Sterne
Presentation Outline
• Project Motivation
• Project Aim
• Rules of the Gridworld
• Flat Reinforcement Learning
• Feudal Reinforcement Learning
• State Variable Combination Approach
Project Motivation
• Reinforcement Learning is an attractive form of machine learning, but the curse of dimensionality makes it inefficient on complex problems
• Hierarchical Reinforcement Learning is a method for dealing with this curse of dimensionality
Project Aim
• Applying various Hierarchical Reinforcement Learning algorithms to a complex gridworld problem
• Comparing the various algorithms to each other and to flat Reinforcement Learning
Rules of the gridworld
• Possible Actions: Left, Right, Up, Down and Rest
• Collecting food and drink increases nourishment and hydration respectively
• After landing on the tree, the creature is carrying wood which it can use to repair its shelter
Rules of the gridworld
• Resting in a repaired shelter increases health in proportion to the shelter condition
• Landing on the lion decreases health and results in a direct punishment
• After every 4 steps, nourishment, hydration, and shelter condition decrease by 1. After 10 steps, health decreases by 1.
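The periodic decay rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Creature`, the starting values, and the floor at 0 are hypothetical, and "after 10 steps" is read here as "every 10 steps".

```python
from dataclasses import dataclass


@dataclass
class Creature:
    # Starting values are arbitrary placeholders, not taken from the slides.
    nourishment: int = 4
    hydration: int = 4
    shelter: int = 4
    health: int = 4
    steps: int = 0


def tick(c: Creature) -> None:
    """Advance one time step and apply the periodic decay rules."""
    c.steps += 1
    if c.steps % 4 == 0:
        # Every 4 steps the three needs drop by 1 (assumed floor of 0).
        c.nourishment = max(0, c.nourishment - 1)
        c.hydration = max(0, c.hydration - 1)
        c.shelter = max(0, c.shelter - 1)
    if c.steps % 10 == 0:
        # Assumption: health decays every 10 steps, not just once.
        c.health = max(0, c.health - 1)
```

The sketch leaves out food, drink, wood, the lion and resting; those would raise or lower the same fields when the creature lands on the relevant square.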
Flat Reinforcement Learning
• Sarsa with eligibility traces was used
• To get Flat Reinforcement Learning working, the task needed to be simplified slightly
• Limited to a 6x6 gridworld
• Nourishment, Hydration, Health and Shelter Condition reduced to 5 discrete levels each
• Total states: 6 x 6 x 5 x 5 x 5 x 5 x 2 = 45000
• Manageable
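The state count can be checked, and turned into a flat table index, with a short Python sketch. The ordering of the dimensions and the reading of the final factor of 2 as a "carrying wood" flag are assumptions; the slides give only the product.

```python
import math

GRID = 6     # 6x6 gridworld
LEVELS = 5   # discrete levels per need
# Assumed order: x, y, nourishment, hydration, health, shelter, carrying wood.
DIMS = (GRID, GRID, LEVELS, LEVELS, LEVELS, LEVELS, 2)


def num_states(dims=DIMS):
    """Size of the flat state space: the product of all dimension sizes."""
    return math.prod(dims)


def state_index(state, dims=DIMS):
    """Map a tuple of state-variable values to a unique row index."""
    idx = 0
    for value, size in zip(state, dims):
        if not 0 <= value < size:
            raise ValueError(f"value {value} out of range for size {size}")
        idx = idx * size + value
    return idx


print(num_states())  # 45000
```

With the five actions listed earlier (Left, Right, Up, Down, Rest), a flat action-value table would hold 45000 x 5 = 225000 entries.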
Flat Reinforcement Learning
• The given task requires a large amount of exploration in order to find the optimal solution
• Total exploration at first, decreasing gradually until finally total exploitation
• Optimistic initialisation of tables to maximum possible reward of 6400 encourages efficient exploration
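A minimal tabular sketch of these ideas follows: optimistic initialisation, an exploration rate that decays from total exploration to total exploitation, and one Sarsa update with accumulating eligibility traces. Only the value 6400 comes from the slides. The linear decay schedule, the function names, and the parameters `alpha`, `gamma` and `lam` are assumptions chosen for illustration, not the settings used in the project.

```python
import random

OPTIMISTIC_VALUE = 6400.0  # maximum possible reward quoted on the slide


def make_table(n_states, n_actions, init=OPTIMISTIC_VALUE):
    """Table of n_states x n_actions entries, all set to `init`."""
    return [[init] * n_actions for _ in range(n_states)]


def epsilon_at(episode, n_episodes):
    """Assumed linear schedule: 1.0 at the first episode, 0.0 at the last."""
    if n_episodes <= 1:
        return 0.0
    return max(0.0, 1.0 - episode / (n_episodes - 1))


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice over one row of the action-value table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def sarsa_lambda_step(Q, E, s, a, r, s2, a2, alpha=0.1, gamma=0.9, lam=0.8):
    """One Sarsa(lambda) update with accumulating traces; returns the TD error."""
    delta = r + gamma * Q[s2][a2] - Q[s][a]
    E[s][a] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] *= gamma * lam  # decay every trace after use
    return delta
```

Here `Q` would be built with `make_table(n_states, n_actions)` and the trace table `E` with `make_table(n_states, n_actions, init=0.0)`, with `E` reset at the start of each episode. Because every untried action starts at the optimistic value, greedy selection is drawn towards actions that have not yet been tried.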
Flat Reinforcement Learning Results
Feudal Reinforcement Learning
• Needs to be modified for the given problem
• In the simple maze problem, state variables change independently, and don’t change by more than 1
• In the simple maze problem, high level actions can be defined as the same as low level actions
Feudal Reinforcement Learning
• The main difficulty with the complex problem is that high level actions are hard to define
• State variables can change simultaneously and by more than one, e.g. the creature can move to the left and fully satisfy hunger in one step, changing two state variables simultaneously
• High level actions are therefore defined as desired high level states
Feudal Reinforcement Learning Results
• Feudal reinforcement learning failed horribly
State Variable Combination Approach
• In a problem with conflicting sub-problems, sub-problems tend to be defined by a limited set of state variables
• Sub-agents are created, each in charge of a limited set of state variables
• Some sub-agents will be inherently equipped to solve a sub-problem
• Some sub-agents will not hold any useful information
• By incorporating all possible combinations, we minimise the amount of designer intervention
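Enumerating every combination of state variables is straightforward with `itertools.combinations`. The variable names below are hypothetical (the slides do not list them), and treating position as a single variable is an assumption; the point is only that each non-empty subset defines the view of one sub-agent.

```python
from itertools import combinations

# Hypothetical names; position is treated as one variable here.
STATE_VARIABLES = (
    "position", "nourishment", "hydration",
    "health", "shelter", "carrying_wood",
)


def all_subsets(variables):
    """Every non-empty combination of variables, one per sub-agent."""
    return [
        combo
        for k in range(1, len(variables) + 1)
        for combo in combinations(variables, k)
    ]


def project(state, subset):
    """Restrict a full state (a dict) to the variables one sub-agent sees."""
    return tuple(state[name] for name in subset)


subsets = all_subsets(STATE_VARIABLES)
print(len(subsets))  # 63, i.e. 2**6 - 1
```

For example, the sub-agent for `("position", "nourishment")` sees only where the creature is and how hungry it is, which is plausibly enough to learn a food-seeking behaviour, while a sub-agent that sees only `("shelter",)` may hold no useful information on its own.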
Examples of Sub-agents
Choosing between sub-agents
• If the sub-agent which predicts the highest possible reward for a given state is obeyed, the best action should be chosen
• The problem with this is that some sub-agents which do not hold any useful information might falsely predict a high reward
• Reliability of sub-agents also needs to be taken into account
• This is achieved by keeping track of the variance of predicted rewards
• High Variance = Unreliable Prediction
• Low Variance = Reliable Prediction
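One way to combine a predicted reward with its reliability is sketched below. The slides do not say how the variance enters the decision, so the scoring rule (prediction minus a multiple of the standard deviation), the `caution` parameter, and all names here are assumptions for illustration rather than the method used in the project.

```python
import math


class RunningStats:
    """Running mean and population variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)

    @property
    def variance(self):
        # With fewer than two samples, treat the prediction as unreliable.
        return self._m2 / self.n if self.n > 1 else float("inf")


def select_agent(predictions, caution=1.0):
    """Pick the sub-agent with the best variance-penalised prediction.

    `predictions` maps a sub-agent name to (predicted_reward, variance).
    """
    def score(item):
        predicted, variance = item[1]
        if caution == 0:
            return predicted  # ignore reliability entirely
        return predicted - caution * math.sqrt(variance)

    return max(predictions.items(), key=score)[0]
```

With `select_agent({"steady": (10.0, 0.0), "noisy": (20.0, 400.0)})` the steady sub-agent wins, because the noisy one's higher prediction is offset by its standard deviation of 20; setting `caution=0` reduces the rule to obeying the highest prediction.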
Results
Questions?