AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch...
Transcript of AlphaGo andMonteCarloTreeSearch - University of Washington5/30/17 1 AlphaGo andMonteCarloTreeSearch...
5/30/17
1
AlphaGo and Monte Carlo Tree Search
Machine Learning for Big Data CSE547/STAT548, University of Washington
Sham Kakade
1©Sham Kakade 2017
A feat previously thought to be (a decade?) away!!!
9/28/16
12
2011
25
http://www.youtube.com/watch?v=WFR3lOm_xhE
2016
26
AlphaGo deep RL defeats Lee Sedol (4-1)
5/30/17
2
Chess vs. Alpha Go
1997, AI named “Deep Blue” beat chess world champion.
Search space: 𝑏": 𝑏 = 35, 𝑑 = 80Search space: 𝑏": 𝑏 = 250, 𝑑 = 150
Key Points
• Given current game state, – how to decide next action “a”?– How to evaluate each legal action?
5/30/17
3
Monte Carlo Tree Search (MCTS)• A popular heuristic search algorithm for game play– By lots of simulations and select the most visited action.
Evaluation
Node value: Wining rate
Key point: how to calculate this valueOne playout to simulate the game
Monte Carlo Tree Search (MCTS)• AlphaGo
– Key point: how to calculate the value for each action (path)
5/30/17
4
The visited number for this edge
The mean value of all simulations passing through it.
Winning rate
Winning rate is predicted by deep reinforcement learning andpredicted by fast rollout playout
Action probability predicted by supervised and reinforcement learning
V(s) is estimated in two different ways.
(Fast) Methods to evaluate V
30 million games
30 million self-‐play games
5/30/17
5
Thanks!
• ML for big data: – lots of different methods/challenges in the wild.– Participate in the ML community.
• Posters: – Looking forward to the poster session.
• Have a great summer!
• Some slides modified from: prlab.tudelft.nl/sites/default/files/deepmind_go_nature.pptx
Acknowledgments
©Sham Kakade 2017