Introduction to Machine Learning
Lecture 21: Reinforcement Learning
Albert Orriols i Puig
http://www.albertorriols.net
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-18
Supervised learning
Data classification: labeled data; build a model that covers the whole space
Unsupervised learning
Clustering: unlabeled data; group similar objects
Association rule analysis: unlabeled data; get the most frequent/important associations
Today’s Agenda
Introduction
Reinforcement Learning
Some examples before going further
Introduction
What does reinforcement learning aim at?
Learning from interaction with the environment
Goal-directed learning
[Figure: agent and environment exchanging state and action while the agent pursues a goal]
Learning what to do and what effect it has
Trial-and-error search and delayed reward
Introduction
Learn reactive behaviors
Behaviors as a mapping between perceptions and actions
The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future.
Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
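One common way to balance this dilemma is an ε-greedy rule. It is not introduced on these slides, so treat the sketch below as an illustration only. With probability ε the agent explores by picking a random action. Otherwise it exploits the action with the highest current estimate. The function name `epsilon_greedy`, the action names and the value estimates are hypothetical.

```python
import random

def epsilon_greedy(action_values, epsilon, rng=random):
    """Choose an action from {action: estimated value}.

    With probability epsilon the agent explores (uniformly random action);
    otherwise it exploits (action with the highest current estimate).
    """
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)                 # explore
    return max(actions, key=action_values.get)     # exploit

# Hypothetical value estimates for three actions.
estimates = {"left": 0.2, "right": 0.7, "stay": 0.1}
print(epsilon_greedy(estimates, epsilon=0.0))      # pure exploitation -> right
print(epsilon_greedy(estimates, epsilon=0.1))      # usually right, sometimes random
```

Setting ε = 0 gives pure exploitation and ε = 1 gives pure exploration. As the slide says, neither extreme on its own solves the task.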
How Can We Learn It?
1. Look-up tables
2. Neural networks
3. Rules
4. Finite automata
Example look-up table:
Perception   Action
State 1      Action 1
State 2      Action 2
State 3      Action 3
…            …
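Here is a minimal sketch of option 1, a look-up table that maps each perceived state to an action. The state and action names are placeholders. The fallback `default` action for unseen perceptions is an assumption of this sketch and does not come from the slide.

```python
# A look-up table policy: a plain mapping from perceived state to action.
# State and action names are hypothetical placeholders.
policy_table = {
    "state_1": "action_1",
    "state_2": "action_2",
    "state_3": "action_3",
}

def act(perception, table, default="action_1"):
    """Return the action stored for this perception (default if unseen)."""
    return table.get(perception, default)

print(act("state_2", policy_table))  # -> action_2
```

A table needs one entry per state, so it only suits small, discrete state spaces. The other representations (neural networks, rules, automata) can generalize across states instead.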
Reinforcement Learning
Reinforcement Learning
[Figure: agent-environment loop with state, action, and reward function]
Reward function: $r : S \to \mathbb{R}$ or $r : S \times A \to \mathbb{R}$
Agent and environment interact at discrete time steps t = 0, 1, 2, …
The agent
observes the state at step t: $s_t \in S$
produces an action at step t: $a_t \in A(s_t)$
gets the resulting reward: $r_{t+1} \in \mathbb{R}$
moves to the next state $s_{t+1}$
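The four steps above can be written as a generic interaction loop. The sketch assumes a hypothetical environment interface `env_step(s, a) -> (r, s_next, done)` and a toy corridor environment. Neither comes from the lecture. The loop records the trace as tuples $(s_t, a_t, r_{t+1})$.

```python
def run_episode(env_step, policy, start_state, max_steps):
    """Generic agent-environment loop at discrete steps t = 0, 1, 2, ...

    env_step(s, a) -> (r, s_next, done) is a hypothetical environment interface.
    Returns the trace [(s_t, a_t, r_{t+1}), ...].
    """
    trace = []
    s = start_state
    for _ in range(max_steps):
        a = policy(s)                       # agent produces a_t in A(s_t)
        r, s_next, done = env_step(s, a)    # environment returns r_{t+1} and s_{t+1}
        trace.append((s, a, r))
        s = s_next                          # move to the next state
        if done:
            break
    return trace

# Toy environment (hypothetical): states 0..3 on a line; reaching 3 ends the episode.
def toy_step(s, a):
    s_next = max(0, min(3, s + (1 if a == "right" else -1)))
    reward = 1.0 if s_next == 3 else 0.0
    return reward, s_next, s_next == 3

trace = run_episode(toy_step, lambda s: "right", start_state=0, max_steps=10)
print(trace)  # [(0, 'right', 0.0), (1, 'right', 0.0), (2, 'right', 1.0)]
```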
Reinforcement Learning
[Figure: agent-environment loop with state $s_t$, action $a_t$, and reward $r_t$]
Trace of a trial: $\dots, s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, \dots$
Agent goal:
Maximize the total amount of reward it receives
That means maximizing not only the immediate reward, but also the cumulative reward in the long run
Example of RL: Recycling robot
State: charge level of the battery
Actions: look for cans, wait for a can, go recharge
Reward: positive for finding cans, negative for running out of battery
More precisely…
Restricting to Markov decision processes (MDPs):
Finite set of situations (states)
Finite set of actions
Transition probabilities
Reward probabilities
This means that:
The agent needs complete information about the world
State $s_{t+1}$ depends only on state $s_t$ and action $a_t$
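The sketch below shows a small finite MDP as a transition table keyed by (state, action). It makes the Markov property concrete: sampling the next state looks only at the current $(s_t, a_t)$ and never at the earlier history. The states "A"/"B", the actions and the numbers are all hypothetical. For simplicity each outcome carries a fixed reward, whereas the slide allows reward probabilities in general.

```python
import random

# Transition model T[(s, a)] = list of (s_next, probability, reward).
# States, actions, probabilities and rewards here are hypothetical.
T = {
    ("A", "go"):   [("B", 0.9, 1.0), ("A", 0.1, 0.0)],
    ("A", "stay"): [("A", 1.0, 0.0)],
    ("B", "go"):   [("A", 1.0, 0.0)],
    ("B", "stay"): [("B", 1.0, 0.5)],
}

def step(s, a, rng):
    """Sample (s_next, reward). Markov property: only (s, a) is consulted,
    never the earlier history of states and actions."""
    outcomes = T[(s, a)]
    weights = [p for _, p, _ in outcomes]
    s_next, _, r = rng.choices(outcomes, weights=weights, k=1)[0]
    return s_next, r

# Every (s, a) row must be a probability distribution over next states.
for key, outcomes in T.items():
    assert abs(sum(p for _, p, _ in outcomes) - 1.0) < 1e-9, key

rng = random.Random(0)
print(step("A", "go", rng))
```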
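Recycling Robot Example
[Figure: transition graph of the recycling robot; each arc is labeled with (probability, reward)]
(state, action) → next state: probability, reward
(high, search) → high: $\alpha$, $R^{\text{search}}$
(high, search) → low: $1-\alpha$, $R^{\text{search}}$
(high, wait) → high: $1$, $R^{\text{wait}}$
(low, search) → low: $\beta$, $R^{\text{search}}$
(low, search) → high: $1-\beta$, $-3$
(low, wait) → low: $1$, $R^{\text{wait}}$
(low, recharge) → high: $1$, $0$
$S = \{\text{high}, \text{low}\}$
$A(\text{high}) = \{\text{search}, \text{wait}\}$
$A(\text{low}) = \{\text{search}, \text{wait}, \text{recharge}\}$
$R^{\text{search}}$: expected number of cans while searching
$R^{\text{wait}}$: expected number of cans while waiting
$R^{\text{search}} > R^{\text{wait}}$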
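The recycling robot's transition graph can be encoded as a table and queried for the one-step expected reward $\sum_{s'} T(s,a,s')\, r(s,a,s')$. The structure (arcs, the $-3$ penalty, the $0$ reward for recharging) follows the diagram. The slides keep $\alpha$, $\beta$, $R^{\text{search}}$ and $R^{\text{wait}}$ symbolic, so the numbers below are hypothetical and chosen only for illustration.

```python
def recycling_robot(alpha, beta, r_search, r_wait):
    """Transition table {(s, a): [(s_next, prob, reward), ...]} of the
    recycling robot. Numeric arguments are free parameters, not given on the slides."""
    assert r_search > r_wait, "the slides assume R_search > R_wait"
    return {
        ("high", "search"):  [("high", alpha, r_search), ("low", 1 - alpha, r_search)],
        ("high", "wait"):    [("high", 1.0, r_wait)],
        ("low", "search"):   [("low", beta, r_search), ("high", 1 - beta, -3.0)],
        ("low", "wait"):     [("low", 1.0, r_wait)],
        ("low", "recharge"): [("high", 1.0, 0.0)],
    }

def expected_reward(model, s, a):
    """One-step expected reward: sum over s' of T(s, a, s') * r(s, a, s')."""
    return sum(p * r for _, p, r in model[(s, a)])

# Hypothetical parameter values, chosen only for illustration.
robot = recycling_robot(alpha=0.8, beta=0.5, r_search=2.0, r_wait=1.0)
print(expected_reward(robot, "low", "search"))  # 0.5*2.0 + 0.5*(-3.0) -> -0.5
```

With these values, searching on a low battery has a lower expected one-step reward than waiting. That is the trade-off the diagram is meant to capture.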
Breaking the Markovian Property
Possible problems that do not satisfy the MDP assumptions:
When actions and states are not finite
Solution: discretize the sets of actions and states
When transition probabilities do not depend only on the current state
Possible solution: represent states as structures built up over time from sequences of sensations
This is a POMDP (partially observable MDP); use POMDP algorithms to solve these problems
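Both workarounds can be sketched briefly. `discretize` maps a continuous value into one of a fixed number of equal-width cells. `HistoryState` builds a state from the last k sensations. The equal-width binning and the fixed window length k are assumptions of this sketch. A sliding window is only a crude approximation, not a full POMDP algorithm.

```python
from collections import deque

def discretize(x, low, high, n_bins):
    """Map a continuous value in [low, high] to an integer cell 0..n_bins-1."""
    x = min(max(x, low), high)                     # clip to the valid range
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)                    # x == high goes to the last cell

class HistoryState:
    """State = tuple of the last k observations (k is a hypothetical choice)."""

    def __init__(self, k):
        self.buffer = deque(maxlen=k)

    def update(self, observation):
        self.buffer.append(observation)
        return tuple(self.buffer)                  # hashable, usable as a table key

print(discretize(0.25, 0.0, 1.0, 4))               # -> 1
history = HistoryState(k=2)
history.update("wall")
print(history.update("open"))                      # -> ('wall', 'open')
```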
Elements of Reinforcement Learning
Elements of RL
Policy: what to do
Reward: what is good
Value: what is good because it predicts reward
Model: what follows what
Components of an RL Agent
Policy (behavior): a mapping from states to actions, $\pi^* : S \to A$
Reward: local reward at time step t, $r_t$
Model: probability of the transition from state s to s' when executing action a, $T(s, a, s')$
The transition probabilities depend only on these parameters
The model is not known by the agent
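The agent does not know $T(s,a,s')$, but it could estimate it from its own experience by counting. The standard maximum-likelihood estimate is $\hat T(s,a,s') = \text{count}(s,a,s') / \text{count}(s,a)$. This is only an illustration; the slides do not say that the agent learns a model. The experience tuples below are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(transitions):
    """Maximum-likelihood estimate of T(s, a, s') from observed (s, a, s') triples:
    T_hat(s, a, s') = count(s, a, s') / count(s, a)."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    T_hat = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        for s_next, n in nexts.items():
            T_hat[(s, a, s_next)] = n / total
    return T_hat

# Hypothetical experience collected by the agent.
experience = [("s1", "a", "s2"), ("s1", "a", "s2"), ("s1", "a", "s3"), ("s2", "b", "s1")]
T_hat = estimate_transitions(experience)
print(T_hat[("s1", "a", "s2")])  # -> 0.6666666666666666
```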
Components of an RL Agent
Value functions
$V^\pi(s)$: long-term reward estimate from state s when following policy π
$Q^\pi(s, a)$: long-term reward estimate from state s when executing action a and then following policy π
A simple example: a maze
Note that the agent does not know its own position. It can only perceive what is in the surrounding cells.
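This sketch computes $V^\pi$ and $Q^\pi$ for a toy deterministic corridor. It uses the relation $Q^\pi(s,a) = r + \gamma V^\pi(s')$, where $s'$ is the state reached by taking a. Several assumptions here are not on the slides. The 5-cell corridor, the +1 reward for entering the goal, γ = 0.9 and the "always right" policy are all hypothetical. Unlike the slide's maze, the state is also assumed to be fully observed.

```python
GAMMA = 0.9          # discount factor (hypothetical value)
GOAL = 4             # cells 0..4 in a corridor; cell 4 is terminal (hypothetical maze)

def step(s, a):
    """Deterministic corridor: move one cell, +1 reward on entering the goal."""
    s_next = max(0, min(GOAL, s + (1 if a == "right" else -1)))
    return (1.0 if s_next == GOAL else 0.0), s_next

def policy(s):
    return "right"   # the fixed policy pi being evaluated

def v_pi(s, horizon=100):
    """V^pi(s): discounted return from s when always following pi.
    Dynamics are deterministic, so one rollout gives the exact value."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        if s == GOAL:
            break
        r, s = step(s, policy(s))
        total += discount * r
        discount *= GAMMA
    return total

def q_pi(s, a):
    """Q^pi(s, a): take action a first, then follow pi."""
    r, s_next = step(s, a)
    return r + GAMMA * v_pi(s_next)

print(v_pi(3), v_pi(2))      # 1.0 0.9
print(q_pi(2, "left"))       # approximately 0.729 (= 0.9 ** 3)
```

With stochastic dynamics, a single rollout would no longer suffice. One would then average many rollouts or use the model directly.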
Pursuing the goal: Maximize long-term reward
Goals and Rewards
OK, but I need to maximize my long-term reward. How do I get the long-term reward?
The long-term reward is defined in terms of the goal of the agent
The agent receives a local reward at each time step
How? Intuitive idea: sum all the rewards obtained so far
Problem: the sum can grow without bound in non-ending tasks
Goals and Rewards
How can we deal with non-ending tasks?
Weighted addition of local rewards
The parameter γ (0 < γ < 1) is the discount factor
Trace: $\dots, s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, \dots$
Note the bias toward immediate rewards
If you want to avoid it, set γ close to 1
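The weighted sum is assumed here to be the standard discounted return $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$, since the slide's own formula did not survive extraction. The sketch contrasts it with the plain sum on a hypothetical never-ending task that yields reward 1 every step, truncated at growing horizons. The discounted sum stays below $1/(1-\gamma)$.

```python
def total_return(rewards):
    """Undiscounted sum r_{t+1} + r_{t+2} + ... (unbounded in non-ending tasks)."""
    return sum(rewards)

def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k+1}, with 0 < gamma < 1."""
    g = 0.0
    for r in reversed(rewards):   # backwards accumulation: g <- r + gamma * g
        g = r + gamma * g
    return g

# A never-ending task truncated at increasing horizons, reward 1 per step.
for horizon in (10, 100, 1000):
    ones = [1.0] * horizon
    print(horizon, total_return(ones), round(discounted_return(ones, 0.9), 4))
# The plain sum grows with the horizon; the discounted sum approaches 1 / (1 - 0.9) = 10.
```

The backwards loop avoids computing powers of γ explicitly. Each reward is weighted by one more factor of γ than the reward before it.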
Some examples
Pole Balancing
Balance the pole
The cart can move forward and backward
Avoid failure: the pole falling beyond a certain critical angle, or the cart hitting the end of the track
Reward: -1 upon failure; return $-\gamma^k$ for failure k steps ahead
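The return $-\gamma^k$ follows from the discounted sum if the only nonzero reward is the -1 at failure. This sketch assumes all earlier rewards are 0, which the slide implies but does not state. The further away the failure, the closer the return is to 0, so the agent prefers to delay failure for as long as possible.

```python
def discounted_return(rewards, gamma):
    """sum_k gamma^k * r_{t+k+1}, computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def pole_return(k, gamma):
    """Return when failure (reward -1) happens k steps ahead and all
    earlier rewards are 0 (an assumption about the unstated rewards)."""
    return discounted_return([0.0] * k + [-1.0], gamma)

for k in (0, 10, 100):
    print(k, pole_return(k, 0.9))   # equals -0.9 ** k up to rounding
```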
Mountain Car Problem
Objective: get to the top of the hill as quickly as possible
State definition: car position and speed
Actions: forward, reverse, none
Reward: -1 for each step in which the car is not yet on the top of the hill
Return: minus the number of steps before reaching the top of the hill
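The slides do not give the car's dynamics. This sketch assumes the ones commonly used for this task, as described in Sutton and Barto's textbook: $v_{t+1} = \text{clip}(v_t + 0.001\,a_t - 0.0025\cos(3x_t))$ and $x_{t+1} = \text{clip}(x_t + v_{t+1})$, with $x \in [-1.2, 0.5]$ and $v \in [-0.07, 0.07]$. The policy `push_with_velocity` is a hypothetical hand-written rule, not a learned one. The point is the reward: -1 per step makes the return exactly minus the number of steps.

```python
import math

def mountain_car_episode(policy, x=-0.5, v=0.0, max_steps=1000):
    """Simulate assumed mountain-car dynamics with reward -1 per step
    until the car reaches the top (x >= 0.5). Returns (steps, total_reward)."""
    total_reward, steps = 0.0, 0
    while x < 0.5 and steps < max_steps:
        a = policy(x, v)                  # a in {-1 (reverse), 0 (none), +1 (forward)}
        v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), -0.07), 0.07)
        x = min(max(x + v, -1.2), 0.5)
        if x == -1.2:
            v = 0.0                       # the car stops at the left boundary
        total_reward -= 1.0               # -1 for every step not yet at the top
        steps += 1
    return steps, total_reward

# Hypothetical hand-written policy: push in the direction the car is already moving.
def push_with_velocity(x, v):
    return 1 if v >= 0 else -1

steps, ret = mountain_car_episode(push_with_velocity)
print(steps, ret)   # the return equals minus the number of steps
```

The engine alone cannot push the car straight up the hill, which is why this policy swings back and forth to build up momentum first.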
Next Class
How to learn the policies