Transcript of slides: Game tree complexity - ICML · Evaluating AlphaGo against computers (16 pages)

Page 1: Title slide

Page 2:

Game tree complexity = b^d (b = breadth, the number of legal moves per position; d = depth, the length of the game)

Why is Go hard for computers to play?

Brute-force search is intractable:

1. The search space is huge

2. It is “impossible” for computers to evaluate who is winning
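To make the b^d formula concrete, the figures commonly cited for chess and Go can be plugged in (approximate values, assumed here purely for illustration: b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go):

```python
import math

# Approximate breadth b and depth d, commonly cited figures used only
# to illustrate the b^d game tree complexity formula.
games = {"chess": (35, 80), "go": (250, 150)}

for name, (b, d) in games.items():
    digits = int(d * math.log10(b)) + 1   # decimal digits in b**d
    print(f"{name}: b^d is on the order of 10^{digits - 1}")
```

With these figures the Go tree has on the order of 10^359 leaves versus roughly 10^123 for chess, which is why exhaustive search is hopeless.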

Page 3:

Convolutional neural network

Page 4:

Value network: position s → evaluation v(s)
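As a minimal sketch of this interface (not the real 12-layer network), a value head can be written as a function that squashes a score into [-1, 1] with tanh and reads it as v(s), the predicted game outcome from position s; `features` and `w` here are hypothetical stand-ins:

```python
import numpy as np

def value_head(features: np.ndarray, w: np.ndarray) -> float:
    # Squash a linear score into (-1, 1); v(s) near +1 means the
    # current player is predicted to win, near -1 to lose.
    return float(np.tanh(features @ w))

rng = np.random.default_rng(0)
features = rng.normal(size=16)   # placeholder encoding of position s
w = rng.normal(size=16)          # placeholder parameters
v = value_head(features, w)      # v(s) in (-1, 1)
```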

Page 5:

Policy network: position s → move probabilities p(a|s)
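The policy output can be sketched as a softmax over one logit per board point, giving a proper probability distribution p(a|s) over the 361 intersections of the 19x19 board (the logits below are random placeholders, not real network outputs):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: subtract the max before exponentiating.
    z = np.exp(logits - logits.max())
    return z / z.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=19 * 19)   # placeholder final-layer logits

p = softmax(logits)                 # p(a|s): distribution over 361 moves
best_move = int(p.argmax())         # index of the most probable move
```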

Page 6:

Neural network training pipeline

Human expert positions → supervised learning policy network → reinforcement learning policy network → self-play data → value network

Page 7:

Supervised learning of policy networks

Policy network: 12-layer convolutional neural network

Training data: 30M positions from human expert games (KGS 5+ dan)

Training algorithm: maximise likelihood by stochastic gradient descent

Training time: 4 weeks on 50 GPUs using Google Cloud

Results: 57% accuracy on held-out test data (state of the art was 44%)
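The "maximise likelihood by stochastic gradient descent" step can be sketched with a linear softmax policy standing in for the CNN; `W`, `s`, and the move index `a` are illustrative placeholders, not AlphaGo's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_moves = 8, 361

def sl_step(W, s, a, lr=0.1):
    """One stochastic gradient ascent step on log p(a|s) for a softmax
    policy with logits s @ W (a linear stand-in for the 12-layer CNN)."""
    logits = s @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = np.outer(s, -p)   # d log p(a|s) / dW = s (outer) (onehot(a) - p)
    grad[:, a] += s
    return W + lr * grad

W = np.zeros((n_features, n_moves))
s = rng.normal(size=n_features)   # features of one expert position
a = 42                            # the expert's recorded move (illustrative)
W = sl_step(W, s, a)              # nudges p(a|s) up for the expert's move
```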

Page 8:

Reinforcement learning of policy networks

Policy network: 12-layer convolutional neural network

Training data: games of self-play between policy networks

Training algorithm: maximise wins z by policy gradient reinforcement learning

Training time: 1 week on 50 GPUs using Google Cloud

Results: 80% win rate vs the supervised learning network. The raw network plays at roughly 3 amateur dan.
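The policy gradient step differs from the supervised one mainly in its learning signal: the log-likelihood gradient of each self-play move is scaled by the game outcome z (+1 win, -1 loss), so moves from won games are reinforced and moves from lost games suppressed. A REINFORCE-style sketch with the same hypothetical linear stand-in as above:

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_moves = 8, 361

def policy_gradient_step(W, s, a, z, lr=0.1):
    """One REINFORCE-style update (a sketch, not AlphaGo's exact code):
    the log-likelihood gradient for self-play move a, scaled by outcome z."""
    logits = s @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = np.outer(s, -p)   # d log p(a|s) / dW = s (outer) (onehot(a) - p)
    grad[:, a] += s
    return W + lr * z * grad

W = np.zeros((n_features, n_moves))
s = rng.normal(size=n_features)               # one self-play position
W_win = policy_gradient_step(W, s, a=7, z=+1)  # move 7 led to a win
W_loss = policy_gradient_step(W, s, a=7, z=-1) # move 7 led to a loss
```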

Page 9:

Reinforcement learning of value networks

Value network: 12-layer convolutional neural network

Training data: 30 million games of self-play

Training algorithm: minimise MSE by stochastic gradient descent

Training time: 1 week on 50 GPUs using Google Cloud

Results: the first strong position evaluation function; previously thought impossible
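The "minimise MSE" step is plain regression toward the self-play outcome z in {-1, +1}. A sketch with a linear tanh model standing in for the value network (`w`, `s` hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def mse_step(w, s, z, lr=0.05):
    """One SGD step minimising (v(s) - z)^2, where v(s) = tanh(s @ w)
    is a linear stand-in for the value network and z = ±1 the outcome."""
    v = np.tanh(s @ w)
    grad = 2.0 * (v - z) * (1.0 - v**2) * s   # chain rule through tanh
    return w - lr * grad

s = rng.normal(size=16)   # features of one self-play position
z = 1.0                   # that game was a win
w = np.zeros(16)
w = mse_step(w, s, z)     # prediction moves toward the outcome
```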

Page 10:

Exhaustive search

Page 11:

Reducing depth with value network

Page 12:

Reducing breadth with policy network
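The two reductions can be combined in a toy depth-limited negamax: the value function truncates depth (evaluating a leaf instead of searching further), and the policy prior prunes breadth to the top-k moves at each node. Everything below (the integer-state game, `value_fn`, `policy_fn`) is a hypothetical stand-in for illustration:

```python
def search(state, depth, value_fn, policy_fn, moves_fn, play_fn, k=3):
    """Depth-limited negamax: value_fn plays the role of the value
    network (cuts depth), the policy prior cuts breadth to top-k moves."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return value_fn(state)        # evaluate instead of searching deeper
    prior = policy_fn(state)          # p(a|s) from the policy network
    top = sorted(moves, key=lambda a: -prior[a])[:k]
    # Negamax: our value is minus the opponent's best reply.
    return max(-search(play_fn(state, a), depth - 1,
                       value_fn, policy_fn, moves_fn, play_fn, k)
               for a in top)

# Toy stand-in game: states are ints, two moves, arbitrary leaf values.
result = search(
    state=1, depth=2,
    value_fn=lambda s: float(s % 3 - 1),
    policy_fn=lambda s: {0: 0.5, 1: 0.5},
    moves_fn=lambda s: [0, 1],
    play_fn=lambda s, a: 2 * s + a,
)
```

AlphaGo's actual search is Monte-Carlo tree search guided by these networks; the sketch above only shows how a value function bounds depth and a policy prior bounds breadth.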

Page 13:

Evaluating AlphaGo against computers

[Figure: bar chart of Elo ratings (0 to 4500) of Go programs: Gnu Go, Fuego, Pachi, Crazy Stone, Zen, AlphaGo (Nature v13), and AlphaGo (Seoul v18), calibrated against beginner kyu (k), amateur dan (d), and professional dan (p) ranks.]

Page 14:

[Figure: ladder of computer programs, calibration matches, and human players:

AlphaGo (Mar 2016) beats Lee Sedol (9p), top player of the past decade: DeepMind challenge match, 4-1

AlphaGo (Oct 2015) beats Fan Hui (2p), 3-times reigning European Champion: Nature match, 5-0

Crazy Stone and Zen beat amateur humans (KGS calibration)]

Page 15:

What’s Next?

Page 16:

Demis Hassabis