Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and...
Transcript of Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and...
![Page 1: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/1.jpg)
Advanced Prediction Models
Deep Learning, Graphical Models and Reinforcement Learning
![Page 2: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/2.jpg)
Today’s Outline
• Complex Decisions
• Reinforcement Learning Basics
• Markov Decision Process
• (State Action) Value Function
• Q Learning Algorithm
2
![Page 3: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/3.jpg)
Complex Decisions
3
![Page 4: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/4.jpg)
Complex Decisions Making is Everywhere
Optimal Control/Engineering
Machine Learning/AI
Neuroscience/Psychology Economics/Operations Research
RL
Control
• Fly drones• Autonomous driving
Operations
• Retain customers, UX• Inventory management
Logistics
• Schedule transportation• Resource allocation
Games
• Chess, Go, Atari
![Page 5: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/5.jpg)
Complex Decisions Making is Everywhere
Credit: Sebastien Bubeck
Control
• Fly drones
• Autonomous driving
Operations
• Retain customers, UX
• Inventory management
Logistics
• Schedule transportation
• Resource allocation
Games
• Chess, Go, Atari
![Page 6: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/6.jpg)
Complex Decision Making can be addressed using RL
61Reference: technologyreview.com/s/603501/10-breakthrough-technologies-2017-reinforcement-learning/
March/April 2017 Issue
![Page 7: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/7.jpg)
Playing Atari Using RL (2013)
1Figure: Defazio Graepel, Atari Learning Environment
![Page 8: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/8.jpg)
1Reference: DeepMind, March 2016
AlphaGo Conquers Go (2016)
![Page 9: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/9.jpg)
• Videos
9
![Page 10: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/10.jpg)
Need for Reinforcement Learning
101Reference: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c
Non-exogenous change of states/contexts
![Page 11: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/11.jpg)
Questions?
11
![Page 12: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/12.jpg)
Today’s Outline
• Complex Decisions
• Reinforcement Learning Basics
• Markov Decision Process
• (State Action) Value Function
• Q Learning Algorithm
12
![Page 13: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/13.jpg)
RL Overview
• Reinforcement Learning (RL) addresses a version of the problem of sequential decision making
• Ingredients:
• There is an environment
• Within which, an agent takes actions
• This action influences the future
• Agent gets a (potentially delayed) feedback signal
• How to select actions to maximize total reward?
• RL provides several sound answers to this question
![Page 14: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/14.jpg)
The Environment
• Sees Agent’s action !" and generates an observation #"$% and a reward &"$%
• Subscript ' indexes time. Current observation #" is called state
• Assume the future (at times ' + 1, ' + 2,… .) is independent of the past (… , ' − 2, ' − 1) given the present ('): this is called the Markov assumption
• Assume everything relevant is observed
1 #"$% #" = 1(#"$%|#%, #4, … #")
![Page 15: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/15.jpg)
The Agent
• Agent observes !"#$, &"#$ and these are not i.i.d. across time
• Agent’s objective is to maximize expected total future reward E[!"#$ + *!"#+ +⋯]
• Agent’s actions affect what it sees in the future (&"#$)
• Maybe better to trade off current reward !"#$ to gain more rewards in the future
*: Discount Factor
![Page 16: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/16.jpg)
The Reward
1Reference: David Silver, 2015
![Page 17: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/17.jpg)
The Goal
1Reference: David Silver, 2015
![Page 18: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/18.jpg)
The Interactions
• Pictorially
Agent Environment
!", $"
%"
Agent Environment
!"&', $"&'
%"&'
Agent Environment
!"&(, $"&(
%"&(
![Page 19: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/19.jpg)
The Interactions
• Pictorially
Agent Environment
!", $"
%"
Agent Environment
!"&', $"&'
%"&'
Agent Environment
!"&(, $"&(
%"&(
![Page 20: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/20.jpg)
The Interactions
• Pictorially
Agent Environment
!", $"
%"
Agent Environment
!"&', $"&'
%"&'
Agent Environment
!"&(, $"&(
%"&(
![Page 21: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/21.jpg)
RL versus other Machine Learning Settings
21
1Reference: David Silver, 2015
![Page 22: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/22.jpg)
RL versus other Machine Learning Settings
22
1Reference: Joelle Pineau, DLSS 2016
![Page 23: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/23.jpg)
Components of an RL Agent
23
1Reference: David Silver, 2015
![Page 24: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/24.jpg)
Components of RL: Policy
24
1Reference: David Silver, 2015
![Page 25: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/25.jpg)
Components of RL: Policy
251Reference: David Silver, 2015
![Page 26: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/26.jpg)
Components of RL: Policy
261Reference: David Silver, 2015
![Page 27: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/27.jpg)
Components of RL: Value Function
27
1Reference: David Silver, 2015
![Page 28: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/28.jpg)
Components of RL: Value Function
28
1Reference: David Silver, 2015
![Page 29: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/29.jpg)
Components of RL: Model
291Reference: David Silver, 2015
![Page 30: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/30.jpg)
Components of RL: Model
30
1Reference: David Silver, 2015
![Page 31: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/31.jpg)
Questions?
31
![Page 32: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/32.jpg)
Today’s Outline
• Complex Decisions
• Reinforcement Learning Basics
• Markov Decision Process
• (State Action) Value Function
• Q Learning Algorithm
32
![Page 33: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/33.jpg)
Components of RL: MDP Framework
33
• We will now revisit these components formally
• Policy !(#|%)
• Value function '((%)
• Model )**+, and ℛ*
,
• In the framework of Markov Decision Processes
• And then we will address the question of optimizingfor the best ! in realistic environments
![Page 34: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/34.jpg)
Towards a Markov Decision Process
• MDPs are a useful way to describe the RL problem
• MDPs can be understood via the following progression
• Start with a Markov Chain
• State transitions happen autonomously
• Add Rewards
• Becomes a Markov Reward Process
• Add Actions that influences state transitions
• Becomes a Markov Decision Process
1Reference: David Silver, 2015
![Page 35: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/35.jpg)
Markov Chain/Process
1Reference: David Silver, 2015
![Page 36: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/36.jpg)
Example Markov Chain
1Reference: David Silver, 2015
![Page 37: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/37.jpg)
Example Markov Chain
1Reference: David Silver, 2015
![Page 38: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/38.jpg)
Example Markov Chain
1Reference: David Silver, 2015
![Page 39: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/39.jpg)
Markov Chain with Rewards
1Reference: David Silver, 2015
![Page 40: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/40.jpg)
Markov Chain with Rewards
1Reference: David Silver, 2015
![Page 41: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/41.jpg)
Markov Chain with Rewards
1Reference: David Silver, 2015
![Page 42: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/42.jpg)
Example Markov Reward Process
1Reference: David Silver, 2015
![Page 43: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/43.jpg)
Recursions in Markov Reward Process
1Reference: David Silver, 2015
![Page 44: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/44.jpg)
Recursions in Markov Reward Process
1Reference: David Silver, 2015
![Page 45: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/45.jpg)
1Reference: David Silver, 2015
Markov Decision Process
![Page 46: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/46.jpg)
Example Markov Decision Process
1Reference: David Silver, 2015
![Page 47: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/47.jpg)
Markov Decision Process: Policy
• Now that we have introduced actions, we can discuss policies again
• Recall
1Reference: David Silver, 2015
![Page 48: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/48.jpg)
MDP is an MRP for a Fixed Policy
1Reference: David Silver, 2015
![Page 49: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/49.jpg)
MDP is an MRP for a Fixed Policy
1Reference: David Silver, 2015
![Page 50: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/50.jpg)
Markov Decision Process: Value Function• We can also talk about the value function(s)
1Reference: David Silver, 2015
![Page 51: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/51.jpg)
Markov Decision Process: Value Function• We can also talk about the value function(s)
1Reference: David Silver, 2015
![Page 52: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/52.jpg)
Recursions in MDP
1Reference: David Silver, 2015
*Also called the Bellman Expectation Equations
![Page 53: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/53.jpg)
Recursions in MDP
1Reference: David Silver, 2015
*Also called the Bellman Expectation Equations
![Page 54: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/54.jpg)
Markov Decision Process: Objective
1Reference: David Silver, 2015
![Page 55: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/55.jpg)
Markov Decision Process: Objective
1Reference: David Silver, 2015
*Also called the Bellman Optimality Equation
![Page 56: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/56.jpg)
Markov Decision Process: Optimal Policy
1Reference: David Silver, 2015
![Page 57: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/57.jpg)
Questions?
57
![Page 58: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/58.jpg)
Today’s Outline
• Complex Decisions
• Reinforcement Learning Basics
• Markov Decision Process
• (State Action) Value Function
• Q Learning Algorithm
58
![Page 59: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/59.jpg)
Finding the Best Policy
• Need to be able to do two things ideally
• Prediction:
• For a given policy, evaluate how good it is
• Compute !"($, &)
• Control:
• And make an improvement from (
• We will focus on the Q Learning algorithm
• It does prediction and control ‘simultaneously’
1Reference: David Silver, 2015
![Page 60: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/60.jpg)
Intuition for an Iterative Algorithm
1Reference: David Silver, 2015
![Page 61: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/61.jpg)
Intuition for an Iterative Algorithm
1Reference: David Silver, 2015
![Page 62: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/62.jpg)
The Q Learning Algorithm
• If we know the model
• Turn the Bellman Optimality Equation into an iterative update
• This is called Value Iteration
1Reference: David Silver, 2015
![Page 63: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/63.jpg)
The Q Learning Algorithm
• If we do not know the model
• Do sampling to get an incremental iterative update
• Choose next actions to ensure exploration
1Reference: David Silver, 2015
![Page 64: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/64.jpg)
The Q Learning Algorithm
• If we do not know the model
• Do sampling to get an incremental iterative update
• Choose next actions to ensure exploration
1Reference: David Silver, 2015
![Page 65: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/65.jpg)
The Q Learning Algorithm
• If we do not know the model
• Do sampling to get an incremental iterative update
• Choose next actions to ensure exploration
1Reference: David Silver, 2015
![Page 66: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/66.jpg)
The Q Learning Algorithm
• If we do not know the model
• Do sampling to get an incremental iterative update
• Choose next actions to ensure exploration
1Reference: David Silver, 2015
![Page 67: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/67.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 68: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/68.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 69: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/69.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 70: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/70.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 71: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/71.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 72: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/72.jpg)
The Q Learning Algorithm
• Initialize !, which is a table of size #states×#actions
• Start at state #$
• For % = 1,2,3, … .
• Take -. chosen uniformly at random with probability /
• Take argmax5∈7 !(9., :) with probability 1 − /
• Update Q: • ! 9., -. = ! 9., -. + >.(?.@$ + Amax
5∈7! 9.@$, : − !(9., -.))
• Parameter / is the exploration parameter
• Parameter >. is the learning rate
• Under appropriate assumptions1, lim.→E
! = !∗
Temporal difference error
1Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992
Explore
Exploit
![Page 73: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/73.jpg)
The Q Learning Algorithm: Recap
• Bellman Optimality Equation gives rise to the Q-Value Iteration algorithm
• Making this algorithm incremental, sampled and adding !-greedy exploration gives Q Learning Algorithm
73
1Reference: David Silver, 2015
![Page 74: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/74.jpg)
Questions?
74
![Page 75: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/75.jpg)
Summary
• RL is a great framework to make agents intelligent
• Specify goals and provide feedback
• Many challenges still remain: exciting opportunity to contribute towards next generation of artificially intelligent and autonomous agents.
• In the next lecture, we will see that deep learning function approximation based RL agents show promise in large complex tasks: representations matter!• Applications such as • Self-driving cars• Intelligent virtual agents
75
![Page 76: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/76.jpg)
Appendix
76
![Page 77: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/77.jpg)
Sample Exam Questions
• What is the difference between a Markov Chain and a Markov Reward Process?
• What is the difference between a Markov Chain and a Markov Decision Process?
• Why is exploration needed in the reinforcement learning setting?
• What does the optimal state-action value function signify?
• What are the two objects (distributions) of an RL model?
• What is the difference between supervised learning and reinforcement learning?
77
![Page 78: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/78.jpg)
Additional Resources
• An Introduction to Reinforcement Learning by Richard Sutton and Andrew Barto• http://incompleteideas.net/sutton/book/the-book.html
• Course on Reinforcement Learning by David Silver at UCL (includes video lectures)• http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
• Research Papers• Deep RL collection: https://github.com/junhyukoh/deep-
reinforcement-learning-papers• [MKSRVBGRFOPBSAKKWLH2015] Mnih et al. Human-level
control through deep reinforcement learning. Nature, 518:529–533, 2015.
• [SHMGSDSAPLDGNKSLLKGH2016] Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529: 484–489, 2016.
78
![Page 79: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/79.jpg)
Cons of RL
• Reinforcement Learning requires experiencing the environment many many times
• This is because it is a trial and error based approach
• Impractical for many complex tasks
• Unless one has access to simulators where an RL agent can practice a billon times
79
![Page 80: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/80.jpg)
RL versus other Machine Learning Settings
80
• There is a notion of exploration and exploitation, similar to Multi-armed bandits and Contextual bandits
• Key difference: actions influence future contexts
1Reference: David Silver, 2015
![Page 81: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/81.jpg)
RL versus other Sequential Decision Making Settings
81
1Reference: David Silver, 2015
![Page 82: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/82.jpg)
Types of RL Agents
82
1Reference: David Silver, 2015
• There are many ways to design them, so we roughly categorize then as below:
![Page 83: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/83.jpg)
Relating the Two Value Functions I
1Reference: David Silver, 2015
![Page 84: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/84.jpg)
Relating the Two Value Functions II
1Reference: David Silver, 2015
![Page 85: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/85.jpg)
Recursion in MDP: Value Function Version
1Reference: David Silver, 2015
![Page 86: Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning](https://reader036.fdocuments.net/reader036/viewer/2022063012/5fc99b073f3db94e3d282d17/html5/thumbnails/86.jpg)
Relating Policy and Value Function
1Reference: David Silver, 2015