Reinforcement Learning: A Beginner’s Tutorial

This is a presentation of a Reinforcement Learning tutorial for beginners that I worked on.

Page 1: Reinforcement Learning: A Beginner’s Tutorial

REINFORCEMENT LEARNING: A Beginner’s Tutorial

By: Omar Enayet

(Presentation Version)

Page 2: The Problem

Page 3: Agent-Environment Interface

Page 4: Environment Model

Page 5: Goals & Rewards

Page 6: Returns
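
The slide gives no formula. In the usual discounted formulation (as in Sutton and Barto), the return after time $t$ is $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$, where $0 \le \gamma \le 1$ is the discount rate. The Python sketch below computes it for a finished, finite list of rewards. The function name `discounted_return` is hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted return R_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ...

    rewards[k] is assumed to hold r_{t+k+1}. Working backwards applies
    R_t = r_{t+1} + gamma * R_{t+1}, so no powers of gamma are needed.
    """
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


if __name__ == "__main__":
    # 1 + 0.5*0 + 0.25*2 = 1.5
    print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))
```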

Page 7: Credit-Assignment Problem

Page 8: Markov Decision Process

An MDP is defined by the tuple $\langle S, A, p, r, \gamma \rangle$:

• $S$: set of states of the environment
• $A(s)$: set of actions possible in state $s$
• $p^a_{ss'}$: probability of transition from $s$ to $s'$ when executing $a$
• $r^a_s$: expected reward when executing $a$ in $s$
• $\gamma$: discount rate for expected reward

Assumption: discrete time $t = 0, 1, 2, \ldots$

Interaction sequence: $\ldots, s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, \ldots$
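
A tabular MDP fits in a few dictionaries. The sketch below shows one possible representation, not the tutorial's own. The class `MDP`, the helper `rollout`, the toy states and the assumption that rewards are deterministic (equal to $r^a_s$) are all illustrative. `rollout` produces the $s_t, a_t, r_{t+1}, s_{t+1}, \ldots$ sequence of the agent-environment interaction.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str


@dataclass
class MDP:
    """Tabular MDP <S, A, p, r, gamma> (hypothetical toy representation)."""
    states: List[State]                                # S
    actions: Dict[State, List[Action]]                 # A(s)
    p: Dict[Tuple[State, Action], Dict[State, float]]  # p[s, a][s'] = p^a_{ss'}
    r: Dict[Tuple[State, Action], float]               # r[s, a] = r^a_s
    gamma: float                                       # discount rate

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[float, State]:
        """Sample s' with probability p^a_{ss'}; the reward is assumed
        deterministic and equal to its expectation r^a_s."""
        succ = self.p[s, a]
        s_next = rng.choices(list(succ), weights=list(succ.values()))[0]
        return self.r[s, a], s_next


def rollout(mdp: MDP, s0: State, policy: Callable[[State], Action],
            steps: int, rng: random.Random) -> list:
    """Return the trajectory [s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, ...]."""
    traj: list = [s0]
    s = s0
    for _ in range(steps):
        a = policy(s)
        reward, s = mdp.step(s, a, rng)
        traj += [a, reward, s]
    return traj


# Hypothetical two-state example.
toy = MDP(
    states=["low", "high"],
    actions={"low": ["wait", "work"], "high": ["wait", "work"]},
    p={("low", "wait"): {"low": 1.0},
       ("low", "work"): {"high": 0.8, "low": 0.2},
       ("high", "wait"): {"high": 0.5, "low": 0.5},
       ("high", "work"): {"high": 1.0}},
    r={("low", "wait"): 0.0, ("low", "work"): -1.0,
       ("high", "wait"): 1.0, ("high", "work"): 2.0},
    gamma=0.9,
)

if __name__ == "__main__":
    print(rollout(toy, "low", lambda s: "work", steps=3, rng=random.Random(0)))
```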

Page 9: Value Functions

Page 10: Value Functions

Page 11: Value Functions
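
The state-value and action-value functions of a policy $\pi$ are usually defined as expected returns. They satisfy the Bellman equations below, given here in their standard forms using the $\langle S, A, p, r, \gamma \rangle$ notation of the MDP slide. $R_t = \sum_{k \ge 0} \gamma^k r_{t+k+1}$ is the discounted return. $\pi(s,a)$, the probability that $\pi$ selects $a$ in $s$, is assumed notation.

```latex
\begin{aligned}
V^{\pi}(s) &= E_{\pi}\{R_t \mid s_t = s\}
            = \sum_{a \in A(s)} \pi(s,a)\Big[r^a_s + \gamma \sum_{s'} p^a_{ss'}\, V^{\pi}(s')\Big] \\
Q^{\pi}(s,a) &= E_{\pi}\{R_t \mid s_t = s,\ a_t = a\}
              = r^a_s + \gamma \sum_{s'} p^a_{ss'} \sum_{a' \in A(s')} \pi(s',a')\, Q^{\pi}(s',a')
\end{aligned}
```

The two functions are linked by $V^{\pi}(s) = \sum_{a} \pi(s,a)\, Q^{\pi}(s,a)$.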

Page 12: Optimal Value Functions
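
The optimal value functions are the largest values achievable by any policy. In the $\langle S, A, p, r, \gamma \rangle$ notation of the MDP slide, they satisfy the Bellman optimality equations below. These are the standard forms and are not taken from the slides. Any policy that is greedy with respect to $Q^*$, i.e. one that picks $\arg\max_a Q^*(s,a)$ in every state, is optimal.

```latex
\begin{aligned}
V^{*}(s) &= \max_{\pi} V^{\pi}(s)
          = \max_{a \in A(s)} \Big[r^a_s + \gamma \sum_{s'} p^a_{ss'}\, V^{*}(s')\Big] \\
Q^{*}(s,a) &= \max_{\pi} Q^{\pi}(s,a)
            = r^a_s + \gamma \sum_{s'} p^a_{ss'} \max_{a' \in A(s')} Q^{*}(s',a')
\end{aligned}
```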

Page 13: Exploration-Exploitation Problem
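
A common way to balance exploration and exploitation is ε-greedy action selection. The agent exploits its current best estimate most of the time but picks a random action with a small probability ε. The minimal sketch below assumes action values are given as a list indexed by action. The name `epsilon_greedy` is hypothetical.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index.

    With probability epsilon: explore (uniformly random action).
    Otherwise: exploit (action with the highest estimate; ties go to the
    lowest index).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng) for _ in range(10_000)]
    # Roughly (1 - epsilon) + epsilon/3 of the picks should be the greedy action 1.
    print(picks.count(1) / len(picks))
```

Larger ε explores more but wastes more steps on actions already believed to be poor. A common refinement is to decay ε over time.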

Page 14: Policies

Page 15: Elementary Solution Methods

Page 16: Dynamic Programming
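
Dynamic programming computes value functions from a perfect model of the environment. One standard DP algorithm is value iteration, which sweeps the Bellman optimality backup over all states until the values stop changing. The minimal sketch below uses a hypothetical list-based encoding of $p^a_{ss'}$ and $r^a_s$ and a hypothetical two-state example.

```python
from typing import List, Tuple

# Hypothetical tabular encoding: P[s][a] is a list of (p^a_{ss'}, s') pairs
# and R[s][a] is the expected reward r^a_s. States and actions are indices.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def value_iteration(P: Transitions, R: Rewards, gamma: float,
                    theta: float = 1e-10) -> Tuple[List[float], List[int]]:
    """Sweep the Bellman optimality backup until the largest change < theta.

    Returns the value estimates and a policy that is greedy with respect to them.
    """
    n = len(P)
    V = [0.0] * n

    def backup(s: int, a: int) -> float:
        # One-step lookahead using the (perfect) model P, R.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        delta = 0.0
        for s in range(n):
            best = max(backup(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states in the sweep see it
        if delta < theta:
            break

    policy = [max(range(len(P[s])), key=lambda a, s=s: backup(s, a)) for s in range(n)]
    return V, policy


if __name__ == "__main__":
    # Hypothetical 2-state MDP: in state 1, action 0 pays 2 and stays put.
    P = [[[(1.0, 0)], [(1.0, 1)]],
         [[(1.0, 1)], [(1.0, 0)]]]
    R = [[0.0, 1.0],
         [2.0, 0.0]]
    print(value_iteration(P, R, gamma=0.5))
```

With $\gamma = 0.5$ the example converges to $V \approx [3, 4]$ and the greedy policy `[1, 0]`. Staying in state 1 is worth $2/(1-0.5) = 4$, and moving there from state 0 is worth $1 + 0.5 \cdot 4 = 3$.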

Page 17: Perfect Model

Page 18: Bootstrapping

Page 19: Generalized Policy Iteration
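
Generalized policy iteration describes two interacting processes that run until neither changes anything. Evaluation makes $V$ consistent with the current policy, and improvement makes the policy greedy with respect to $V$. Policy iteration is the most direct instance: it alternates full evaluation with greedy improvement. The sketch below assumes a perfect model in a hypothetical list-based encoding, where `P[s][a]` lists `(probability, next_state)` pairs and `R[s][a]` is $r^a_s$.

```python
from typing import List, Tuple

# Hypothetical encoding of a perfect model: P[s][a] lists (p^a_{ss'}, s') pairs,
# R[s][a] is the expected reward r^a_s; states and actions are integer indices.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def lookahead(P: Transitions, R: Rewards, V: List[float], gamma: float, s: int, a: int) -> float:
    """Expected value of taking a in s and then following the values V."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def policy_iteration(P: Transitions, R: Rewards, gamma: float,
                     theta: float = 1e-10) -> Tuple[List[float], List[int]]:
    n = len(P)
    policy = [0] * n  # arbitrary initial deterministic policy
    V = [0.0] * n
    while True:
        # Policy evaluation: sweep until V is (nearly) V^pi for the current policy.
        while True:
            delta = 0.0
            for s in range(n):
                v = lookahead(P, R, V, gamma, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n):
            q = [lookahead(P, R, V, gamma, s, a) for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-12:  # switch only on a strict improvement
                policy[s] = best
                stable = False
        if stable:
            return V, policy


if __name__ == "__main__":
    # Hypothetical 2-state MDP: in state 1, action 0 pays 2 and stays put.
    P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
    R = [[0.0, 1.0], [2.0, 0.0]]
    print(policy_iteration(P, R, gamma=0.5))
```

Improvement switches an action only on a strict gain, so the loop cannot cycle between equally good policies.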

Page 20: Efficiency of DP

Page 21: Monte-Carlo Methods

Page 22: Episodic Return

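Page 23: Advantages over DP

• No model: MC learns from sample episodes of experience rather than from a full model of the environment
• Works with a simulation or with only part of a model
• Can focus on a small subset of states
• Less harmed by violations of the Markov property, since MC estimates do not bootstrap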

Page 24: First-Visit vs. Every-Visit
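
Monte-Carlo prediction estimates $V(s)$ by averaging the returns observed after visits to $s$. First-visit MC averages only the return that follows the first visit to $s$ in each episode. Every-visit MC averages the returns that follow all visits. The sketch below covers both and assumes episodes are recorded in a hypothetical `(state, next_reward)` list format.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Tuple

# Hypothetical episode format: [(s_t, r_{t+1}), ...] for t = 0, ..., T-1.
Episode = List[Tuple[Hashable, float]]


def mc_prediction(episodes: List[Episode], gamma: float,
                  first_visit: bool = True) -> Dict[Hashable, float]:
    """Estimate V(s) as the average of the returns observed after visits to s."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for episode in episodes:
        # Return R_t following every time step, computed backwards.
        returns = [0.0] * len(episode)
        ret = 0.0
        for t in reversed(range(len(episode))):
            ret = episode[t][1] + gamma * ret
            returns[t] = ret
        seen = set()
        for t, (s, _) in enumerate(episode):
            if first_visit and s in seen:
                continue  # first-visit MC ignores later visits in the same episode
            seen.add(s)
            totals[s] += returns[t]
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


if __name__ == "__main__":
    ep = [("A", 1.0), ("B", 0.0), ("A", 1.0)]
    print(mc_prediction([ep], gamma=1.0, first_visit=True))   # {'A': 2.0, 'B': 1.0}
    print(mc_prediction([ep], gamma=1.0, first_visit=False))  # {'A': 1.5, 'B': 1.0}
```

In the example, state A is visited twice. First-visit MC uses only the return 2 from time 0. Every-visit MC also averages in the return 1 from time 2.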

Page 25: On-Policy vs. Off-Policy

Page 26: Action Values Instead of State Values

Page 27: Temporal-Difference Learning
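
TD learning updates an estimate from the next reward and the current estimate of the next state (bootstrapping). It needs no model and does not wait for the end of the episode. The simplest form, TD(0), uses $V(s_t) \leftarrow V(s_t) + \alpha[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$. The minimal tabular sketch below assumes a step size $\alpha$, a hypothetical function name and a convention that `None` marks a terminal next state.

```python
from collections.abc import Hashable
from typing import Dict, Optional


def td0_update(V: Dict[Hashable, float], s: Hashable, r: float,
               s_next: Optional[Hashable], alpha: float, gamma: float) -> float:
    """One TD(0) update: V(s) <- V(s) + alpha * [r + gamma*V(s') - V(s)].

    s_next=None marks a terminal transition, whose value is taken as 0.
    Unseen states default to 0. Returns the TD error.
    """
    v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
    v_s = V.get(s, 0.0)
    td_error = r + gamma * v_next - v_s
    V[s] = v_s + alpha * td_error
    return td_error


if __name__ == "__main__":
    V = {"A": 0.0, "B": 1.0}
    err = td0_update(V, "A", r=0.5, s_next="B", alpha=0.5, gamma=0.9)
    print(err, V["A"])  # V(A) moves halfway toward the target 0.5 + 0.9*1.0
```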

Page 28: Advantages of TD Learning

Page 29: SARSA (On-Policy)
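
SARSA learns action values from the quintuple $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$ using $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha[r_{t+1} + \gamma Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)]$. It is on-policy because $a_{t+1}$ is the action the behavior policy actually takes next. The tabular sketch below uses ε-greedy behavior. The environment interface `env_step(s, a) -> (r, s_next, done)` and the corridor example are hypothetical.

```python
import random
from collections import defaultdict
from collections.abc import Hashable
from typing import Callable, List, Tuple

# Hypothetical environment interface: env_step(s, a) -> (r, s_next, done).
EnvStep = Callable[[Hashable, Hashable], Tuple[float, Hashable, bool]]


def epsilon_greedy(Q, s, actions: List, epsilon: float, rng: random.Random):
    """Random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[s, a] for a in actions)
    return rng.choice([a for a in actions if Q[s, a] == best])


def sarsa_episode(env_step: EnvStep, s0, actions: List, Q, alpha: float, gamma: float,
                  epsilon: float, rng: random.Random, max_steps: int = 1000) -> None:
    """One episode of tabular SARSA, updating the defaultdict Q in place."""
    s = s0
    a = epsilon_greedy(Q, s, actions, epsilon, rng)
    for _ in range(max_steps):
        r, s_next, done = env_step(s, a)
        if done:
            Q[s, a] += alpha * (r - Q[s, a])  # terminal: the target is just r
            return
        a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)  # action actually taken next
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        s, a = s_next, a_next


if __name__ == "__main__":
    # Hypothetical corridor: states 0..3, actions -1/+1, reward 1 on reaching state 3.
    def corridor(s, a):
        s_next = min(max(s + a, 0), 3)
        return (1.0 if s_next == 3 else 0.0), s_next, s_next == 3

    Q = defaultdict(float)
    rng = random.Random(0)
    for _ in range(500):
        sarsa_episode(corridor, 0, [-1, 1], Q, alpha=0.1, gamma=0.9, epsilon=0.1, rng=rng)
    print({s: max([-1, 1], key=lambda a: Q[s, a]) for s in range(3)})  # greedy action per state
```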

Page 30: Q-Learning (Off-Policy)
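
Q-learning replaces SARSA's sampled next action with the greedy one: $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha[r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)]$. The update targets the greedy policy regardless of what the behavior policy does, which is what makes it off-policy. The tabular sketch below assumes a hypothetical interface `env_step(s, a) -> (r, s_next, done)`.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[s, a] for a in actions)
    return rng.choice([a for a in actions if Q[s, a] == best])


def q_learning_episode(env_step, s0, actions, Q, alpha, gamma, epsilon, rng, max_steps=1000):
    """One episode of tabular Q-learning, updating the defaultdict Q in place."""
    s = s0
    for _ in range(max_steps):
        a = epsilon_greedy(Q, s, actions, epsilon, rng)
        r, s_next, done = env_step(s, a)
        # Off-policy target: the best next action, not the one behavior will take.
        target = r if done else r + gamma * max(Q[s_next, b] for b in actions)
        Q[s, a] += alpha * (target - Q[s, a])
        if done:
            return
        s = s_next


if __name__ == "__main__":
    # Hypothetical two-step task: 0 -> 1 pays 0, then 1 -> terminal pays 1.
    def two_step(s, a):
        return (0.0, 1, False) if s == 0 else (1.0, 2, True)

    Q = defaultdict(float)
    rng = random.Random(0)
    for _ in range(200):
        q_learning_episode(two_step, 0, ["x", "y"], Q, alpha=0.5, gamma=0.9, epsilon=1.0, rng=rng)
    print(dict(Q))
```

Even with fully random behavior (`epsilon=1.0`), the targets use $\max_b Q(s', b)$. The estimates therefore still move toward the values of the greedy policy.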

Page 32: Actor-Critic Methods (On-Policy)
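
Actor-critic methods keep the policy (the actor) separate from a value estimate (the critic). In a common tabular form, the critic's TD error $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$ updates both $V$ and a preference $p(s_t, a_t)$. The actor then picks actions by a softmax over these preferences. The sketch below works under those assumptions. The step sizes `alpha` and `beta` and the one-step example task are illustrative.

```python
import math
import random
from collections import defaultdict
from collections.abc import Hashable
from typing import List


def softmax_action(prefs, s: Hashable, actions: List, rng: random.Random):
    """Actor: sample a with probability exp(p(s,a)) / sum_b exp(p(s,b))."""
    m = max(prefs[s, a] for a in actions)  # shift by the max for numerical stability
    weights = [math.exp(prefs[s, a] - m) for a in actions]
    return rng.choices(actions, weights=weights)[0]


def actor_critic_update(V, prefs, s, a, r: float, s_next, done: bool,
                        alpha: float, beta: float, gamma: float) -> float:
    """Critic computes the TD error and updates V; the actor shifts p(s,a) by it."""
    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
    V[s] += alpha * delta          # critic: TD(0) on state values
    prefs[s, a] += beta * delta    # actor: favor a if it beat the critic's estimate
    return delta


if __name__ == "__main__":
    # Hypothetical one-step task: from state 0, "right" pays 1 and "left" pays 0.
    V, prefs, rng = defaultdict(float), defaultdict(float), random.Random(0)
    for _ in range(2000):
        a = softmax_action(prefs, 0, ["left", "right"], rng)
        r = 1.0 if a == "right" else 0.0
        actor_critic_update(V, prefs, 0, a, r, None, True, alpha=0.1, beta=0.1, gamma=0.9)
    print(prefs[0, "right"] > prefs[0, "left"])
```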

Page 33: R-Learning (Off-Policy)

Objective: maximize the average expected reward per time step.
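
R-learning, as presented in Sutton and Barto's book, is an off-policy control method for continuing tasks without discounting. It keeps an estimate $\rho$ of the average reward per step and learns relative action values: $Q(s,a) \leftarrow Q(s,a) + \alpha[r - \rho + \max_{a'} Q(s',a') - Q(s,a)]$. Only when $a$ is greedy in $s$ does it also update $\rho \leftarrow \rho + \beta[r - \rho + \max_{a'} Q(s',a') - \max_{a} Q(s,a)]$. The tabular sketch below uses a hypothetical one-state example with a fixed exploratory behavior policy.

```python
import random
from collections import defaultdict
from collections.abc import Hashable
from typing import List


def r_learning_update(Q, rho: float, s: Hashable, a: Hashable, r: float,
                      s_next: Hashable, actions: List, alpha: float, beta: float) -> float:
    """One R-learning step on the defaultdict Q; returns the updated rho.

    Values are relative to the average reward rho, so there is no discount rate.
    """
    best_next = max(Q[s_next, b] for b in actions)
    Q[s, a] += alpha * (r - rho + best_next - Q[s, a])
    best_here = max(Q[s, b] for b in actions)
    if Q[s, a] == best_here:  # update rho only when a is greedy in s
        rho += beta * (r - rho + best_next - best_here)
    return rho


if __name__ == "__main__":
    # Hypothetical continuing task with one state: action "x" pays 1, "y" pays 0.
    Q, rho, rng = defaultdict(float), 0.0, random.Random(0)
    for _ in range(2000):
        a = rng.choice(["x", "y"]) if rng.random() < 0.1 else "x"  # exploratory behavior
        r = 1.0 if a == "x" else 0.0
        rho = r_learning_update(Q, rho, 0, a, r, 0, ["x", "y"], alpha=0.5, beta=0.1)
    print(round(rho, 3))  # approaches 1.0, the average reward of always choosing "x"
```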

Page 34: Eligibility Traces
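
Eligibility traces give TD methods a short memory of recently visited states, so a single TD error also updates the states that led up to it. The sketch below uses the backward view of TD($\lambda$) with accumulating traces. Each visit adds 1 to $e(s)$, every state is updated by $\alpha \delta_t e(s)$, and all traces then decay by $\gamma\lambda$. It handles one recorded episode in a hypothetical transition format.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

# Hypothetical transition format: (s, r, s_next, done) for each step of one episode.
Transition = Tuple[Hashable, float, Optional[Hashable], bool]


def td_lambda_episode(V: Dict[Hashable, float], episode: List[Transition],
                      alpha: float, gamma: float, lam: float) -> None:
    """Tabular TD(lambda) prediction (backward view, accumulating traces)."""
    e: Dict[Hashable, float] = defaultdict(float)
    for s, r, s_next, done in episode:
        v_next = 0.0 if done else V.get(s_next, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)
        e[s] += 1.0  # accumulating trace for the state just visited
        for x in e:
            V[x] = V.get(x, 0.0) + alpha * delta * e[x]
            e[x] *= gamma * lam  # decay every trace


if __name__ == "__main__":
    episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
    for lam in (0.0, 1.0):
        V = {}
        td_lambda_episode(V, episode, alpha=0.5, gamma=1.0, lam=lam)
        print(lam, V)
```

With $\lambda = 0$ the method reduces to TD(0), and the final reward changes only $V(B)$. With $\lambda = 1$ the same TD error also reaches the earlier state $A$, spreading credit back over the episode much as a Monte-Carlo update would.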

Page 37: References

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Bradford Books, 1998.

Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates. Monte Carlo. 2003.

Slides for reading with: Omar Enayet. Reinforcement Learning: A Beginner’s Tutorial. 2009.