raining windy cloudy snowing raining windy cloudysunny snowing.
Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic...
Transcript of Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic...
![Page 1: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/1.jpg)
Lecture 5: Windy Frozen LakeNondeterministic world!
Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>
![Page 2: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/2.jpg)
S
Windy Frozen Lake
![Page 3: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/3.jpg)
Deterministic VS Stochastic (nondeterministic)
• In deterministic models the output of the model is fully determined by the parameter values and the initial conditions initial conditions
• Stochastic models possess some inherent randomness. - The same set of parameter values and initial conditions will lead to an
ensemble of different outputs.
![Page 4: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/4.jpg)
Deterministic
![Page 5: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/5.jpg)
Stochastic (non-deterministic)
![Page 6: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/6.jpg)
Stochastic (non-deterministic) worlds
• Unfortunately, our Q-learning (for deterministic worlds) does not work anymore
• Why not?
![Page 7: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/7.jpg)
Our previous Q-learning does not work
Score over time: 0.0165
![Page 8: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/8.jpg)
Why does not work in stochastic (non-deterministic) worlds?
a
s
![Page 9: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/9.jpg)
Stochastic (non-deterministic) world
• Solution?- Listen to Q (s`) (just a little bit)
- Update Q(s) little bit (learning rate)
• Like our life mentors- Don’t just listen and follow one mentor
- Need to listen from many mentors
![Page 10: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/10.jpg)
http://m.kauppalehti.fi/uutiset/your-career-needs-many-mentors--not-just-one/gp3Q4rTp
![Page 11: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/11.jpg)
Stochastic (non-deterministic) world
a
s
![Page 12: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/12.jpg)
Learning incrementally
• Learning rate, -
Q(s, a) r + �max
a0Q(s0, a0)
↵ = 0.1↵
Q(s, a) Q(s, a) + [r + �max
a0Q(s0, a0)]
![Page 13: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/13.jpg)
Learning with learning rate
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
Q(s, a) r + �max
a0Q(s0, a0)
![Page 14: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/14.jpg)
Learning with learning rate
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
Q(s, a) Q(s, a) + ↵[r + �max
a0Q(s0, a0)�Q(s, a)]
Q(s, a) r + �max
a0Q(s0, a0)
![Page 15: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/15.jpg)
Q-learning algorithm
Q(s, a) (1� ↵)Q(s, a) + ↵[r + �max
a0Q(s0, a0)]
![Page 16: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/16.jpg)
Convergence
Machine Learning, Tom Mitchell, 1997
ˆQ(s, a) (1� ↵) ˆQ(s, a) + ↵[r + �max
a0ˆQ(s0, a0)]
![Page 17: Lecture 5: Windy Frozen Lake Nondeterministic world! · 2017-10-02 · Deterministic VS Stochastic (nondeterministic) •In deterministic models the output of the model is fully determined](https://reader034.fdocuments.net/reader034/viewer/2022042918/5f5f56057b1c9255ab36ea3e/html5/thumbnails/17.jpg)
Next
Lab: Stochastic worlds