Stochastic Planning in Games: an AlphaGo Case-studyTommaso Macrì | 23.03.20 | 18 AlphaGo: the 4...
Transcript of Stochastic Planning in Games: an AlphaGo Case-studyTommaso Macrì | 23.03.20 | 18 AlphaGo: the 4...
||
Tommaso Macrì
23.03.20 1
Stochastic Planning in Games: an AlphaGo Case-study
Tommaso Macrì
|| 23.03.20Tommaso Macrì 2
AlphaGo beats 18-time world champion Lee Sedol 4 games to 1
|| 23.03.20Tommaso Macrì 3
The Game of Go
|| 23.03.20Tommaso Macrì 4
The Game of Go, RulesAim of the game is to surround more territory than the opponent
|| 23.03.20Tommaso Macrì 5
Comparison with Chess
• ~250 legal moves per position• ~150 moves per game
• ~35 legal moves per position• ~80 moves per game
|| 23.03.20Tommaso Macrì 6
Artificial Intelligence perspective
Go game has challenged artificial intelligence researchers for many decades
|| 23.03.20Tommaso Macrì 7
Mastering the Game of Go: a Major Breakthrough for AI
2016
|| 23.03.20Tommaso Macrì 8
Planning and RL
|| 23.03.20Tommaso Macrì 9
Planning vs Learning
In Learning we use Experience Generated by the Environment (not simulated)
In Planning, we use the simulated experience to update the value function and policy
|| 23.03.20Tommaso Macrì 10
Planning vs Learning: the Dyna-Q example
Direct RL
Planning
Reinforcement Learning, Sutton et al.
|| 23.03.20Tommaso Macrì 11
Planning vs Learning: pros and cons
Direct vs Undirect Learning
Reinforcement Learning, Sutton et al.
|| 23.03.20Tommaso Macrì 12
MCTS(Monte Carlo Tree Search)
|| 23.03.20Tommaso Macrì 13
Monte Carlo Tree Search
Reinforcement Learning, Sutton et al.
|| 23.03.20Tommaso Macrì 14
Monte Carlo Tree Search
https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
|| 23.03.20Tommaso Macrì 15
The AlphaGo Breakthrough
|| 23.03.20Tommaso Macrì 16
AlphaGo
2016
|| 23.03.20Tommaso Macrì 17
AlphaGo
• Go is a perfect information game.• Compute optimal value function?
• Tree search = b^d• b = 250; d = 150
• Exhaustive search is infeasible
|| 23.03.20Tommaso Macrì 18
AlphaGo: the 4 Neural Networks
Forward Propagation: 2µs 3ms
Accuracy: 57%24%
Mastering the game of go with deep neural networks and tree search, David Silver et al.
|| 23.03.20Tommaso Macrì 19
AlphaGo: multiple outputs for the policy and single for the value
Mastering the game of go with deep neural networks and tree search, David Silver et al.
|| 23.03.20Tommaso Macrì 20
AlphaGo: combining Policy Network and Value Network with MCTS
Mastering the game of go with deep neural networks and tree search, David Silver et al.
|| 23.03.20Tommaso Macrì 21
AlphaGo Zero
2017
|| 23.03.20Tommaso Macrì 22
AlphaGo Zero: Only one Neural Network for Policy and Value
|| 23.03.20Tommaso Macrì 23
AlphaGo Zero: Using MCTS to select moves throughout self-play
Mastering the game of go without human knowledge, David Silver et al.
|| 23.03.20Tommaso Macrì 24
Alpha Zero2018
|| 23.03.20Tommaso Macrì 25
Alpha Zero: Learning and MCTS do not assume symmetry Alpha Go and Alpha Go Zero assumed symmetry for:• Training data augmentation• Bias removal in Monte Carlo evaluations
Alpha Zero considers also the “Drawn” outcome
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, David Silver et al.
|| 23.03.20Tommaso Macrì 26
Alpha Zero: The Algorithm ArchitectureAlpha Go Zero, Alpha Zero: • one Neural Network (Policy + Value)• Monte Carlo Tree Search
Alpha Go Zero• Wait for an iteration to conclude to update NN• Compare the new policy to the best
Alpha Zero• Update the NN continuously• Always generate Self-Play with the latest NN
|| 23.03.20Tommaso Macrì 27
Results: AlphaGo beats the European Go Champion
Mastering the game of go with deep neural networks and tree search, David Silver et al.
|| 23.03.20Tommaso Macrì 28
AlphaGo Zero: better performances than AlphaGo
Mastering the game of go without human knowledge, David Silver et al.
|| 23.03.20Tommaso Macrì 29
AlphaGo Zero: Learns human expert moves and beyond
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, David Silver et al.
|| 23.03.20Tommaso Macrì 30
Alpha Zero: program applied to Chess and Shogi
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, David Silver et al.
|| 23.03.20Tommaso Macrì 31
Conclusions
https://deepmind.com/blog/article/alphazero-shedding-new-light-grand-games-chess-shogi-and-go
Stochastic Planning in Games: an AlphaGo Case-studyTommaso Macrì