Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic...

Lecture 5: Windy Frozen Lake
Nondeterministic world!
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim

Windy Frozen Lake


Deterministic VS Stochastic (nondeterministic)

• In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions.

• Stochastic models possess some inherent randomness.

  - The same set of parameter values and initial conditions will lead to an ensemble of different outputs.
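The difference can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the 4x4 grid, the function names `move` and `windy_move`, and the slip probability of 0.5 are all hypothetical, and the slip rule is not claimed to match the actual Frozen Lake dynamics.

```python
import random

# Hypothetical 4x4 grid; a state is (row, col), an action is a unit move.
ACTIONS = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}
SIZE = 4

def move(state, action):
    """Deterministic step: the same (state, action) always gives the same next state."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    return (row, col)

def windy_move(state, action, rng, slip_prob=0.5):
    """Stochastic step: with probability slip_prob (an assumed value) the wind
    replaces the chosen action with a uniformly random one."""
    if rng.random() < slip_prob:
        action = rng.choice(sorted(ACTIONS))
    return move(state, action)

rng = random.Random(0)
start = (0, 0)

# Same state and action, repeated many times.
deterministic_outcomes = {move(start, "right") for _ in range(1000)}
stochastic_outcomes = {windy_move(start, "right", rng) for _ in range(1000)}

print("deterministic:", deterministic_outcomes)  # a single next state
print("stochastic:   ", stochastic_outcomes)     # several possible next states
```

The deterministic function yields one outcome for a given input, while the stochastic one yields an ensemble of outcomes for that same input.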


Deterministic


Stochastic (non-deterministic)


Stochastic (non-deterministic) worlds

• Unfortunately, our Q-learning (for deterministic worlds) does not work anymore

• Why not?


Our previous Q-learning does not work

Score over time: 0.0165


Why does it not work in stochastic (non-deterministic) worlds?


Stochastic (non-deterministic) world

• Solution?

  - Listen to Q(s') just a little bit

  - Update Q(s) a little bit (learning rate)

• Like our life mentors

  - Don't just listen to and follow one mentor

  - Need to listen to many mentors


http://m.kauppalehti.fi/uutiset/your-career-needs-many-mentors--not-just-one/gp3Q4rTp


Stochastic (non-deterministic) world


Learning incrementally

• Learning rate, α

  - α = 0.1

$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$$

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]$$


Learning with learning rate

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]$$

$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$$
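To see why the learning rate helps, the two updates on this slide can be compared on a noisy target. The sketch below is a toy illustration, not the lecture's lab code: the function names and the target distribution (1.0 about a third of the time, otherwise 0.0) are assumptions made for the example.

```python
import random

def overwrite_update(q_old, target):
    """Deterministic-world rule: Q(s, a) <- target. The old estimate is discarded."""
    return target

def blended_update(q_old, target, alpha=0.1):
    """Rule with learning rate: Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * target."""
    return (1 - alpha) * q_old + alpha * target

rng = random.Random(0)
q_overwrite = 0.0
q_blended = 0.0

for _ in range(2000):
    # Hypothetical noisy sample of r + gamma * max_a' Q(s', a'):
    # 1.0 about a third of the time, otherwise 0.0.
    target = 1.0 if rng.random() < 1 / 3 else 0.0
    q_overwrite = overwrite_update(q_overwrite, target)
    q_blended = blended_update(q_blended, target)

print("overwrite:", q_overwrite)  # always equal to the most recent sample
print("blended:  ", q_blended)    # a running average of recent samples
```

The overwriting rule ends up wherever the last sample happened to land, so it only "listens to one mentor". The blended rule moves a small step toward each sample, so its value reflects many samples at once.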


Learning with learning rate

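$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$$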
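The first two forms on this slide are algebraically the same update, and setting α = 1 recovers the third. A quick numeric check, using hypothetical function names and arbitrary example values:

```python
import math

def update_blend(q, r, max_next_q, alpha, gamma):
    """Q <- (1 - alpha) * Q + alpha * [r + gamma * max_a' Q(s', a')]"""
    return (1 - alpha) * q + alpha * (r + gamma * max_next_q)

def update_error(q, r, max_next_q, alpha, gamma):
    """Q <- Q + alpha * [r + gamma * max_a' Q(s', a') - Q]"""
    error = r + gamma * max_next_q - q
    return q + alpha * error

# Arbitrary example values, chosen only for illustration.
q, r, max_next_q, gamma = 0.5, 1.0, 0.8, 0.9

blend = update_blend(q, r, max_next_q, alpha=0.1, gamma=gamma)
error_form = update_error(q, r, max_next_q, alpha=0.1, gamma=gamma)
print(blend, error_form)  # both forms give the same number

# With alpha = 1 the old estimate drops out and only the target remains.
full_step = update_blend(q, r, max_next_q, alpha=1.0, gamma=gamma)
print(full_step, r + gamma * max_next_q)
```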


Q-learning algorithm

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') \right]$$
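A minimal tabular sketch of this update in a stochastic world might look as follows. To stay self-contained it does not use OpenAI Gym. The environment is a hypothetical five-cell windy corridor, and the slip probability, epsilon-greedy exploration, step cap, and hyperparameter values are all assumptions for the example rather than the lecture's settings.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (assumed layout)
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right
SLIP_PROB = 0.2     # assumed chance that the wind reverses the chosen move

def step(state, action_idx, rng):
    """Stochastic transition: the chosen move is sometimes reversed."""
    delta = ACTIONS[action_idx]
    if rng.random() < SLIP_PROB:
        delta = -delta
    next_state = min(max(state + delta, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so every episode terminates
            # Epsilon-greedy action choice, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, rng)
            # Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * [r + gamma * max_a' Q(s', a')]
            target = reward + gamma * max(q[next_state])
            q[state][action] = (1 - alpha) * q[state][action] + alpha * target
            state = next_state
            if done:
                break
    return q

q_table = train()
for state, (q_left, q_right) in enumerate(q_table):
    print(state, round(q_left, 3), round(q_right, 3))
```

The goal cell is never updated because the episode ends there, so its row stays at zero and contributes nothing to the target of the final transition.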


Convergence

Machine Learning, Tom Mitchell, 1997

$$\hat{Q}(s, a) \leftarrow (1 - \alpha)\, \hat{Q}(s, a) + \alpha \left[ r + \gamma \max_{a'} \hat{Q}(s', a') \right]$$


Next

Lab: Stochastic worlds