Game tree complexity - ICML · Evaluating AlphaGo against computers Crazy Stone Zen 3000 2500 2000...

Game tree complexity = bd

Brute force search intractable:

1. Search space is huge

2. “Impossible” for computers to evaluate who is winning

Why is Go hard for computers to play?

Convolutional neural network

Value networkEvaluation

Position

Policy network Move probabilities

Position

p (a|s)

Neural network training pipeline

Human expertpositions

Supervised Learningpolicy network

Self-play data Value networkReinforcement Learningpolicy network

Policy network: 12 layer convolutional neural network

Training data: 30M positions from human expert games (KGS 5+ dan)

Training algorithm: maximise likelihood by stochastic gradient descent

Training time: 4 weeks on 50 GPUs using Google Cloud

Results: 57% accuracy on held out test data (state-of-the art was 44%)

Supervised learning of policy networks

Policy network: 12 layer convolutional neural network

Training data: games of self-play between policy network

Training algorithm: maximise wins z by policy gradient reinforcement learning

Training time: 1 week on 50 GPUs using Google Cloud

Results: 80% vs supervised learning. Raw network ~3 amateur dan.

Reinforcement learning of policy networks

Value network: 12 layer convolutional neural network

Training data: 30 million games of self-play

Training algorithm: minimise MSE by stochastic gradient descent

Training time: 1 week on 50 GPUs using Google Cloud

Results: First strong position evaluation function - previously thought impossible

Reinforcement learning of value networks

Exhaustive search

Reducing depth with value network

Reducing breadth with policy network

Evaluating AlphaGo against computers

Crazy Stone

AlphaGo (N

ature v13)

4500 AlphaGo (Seoul v18)

9p7p5p3p1p9d

Professional dan (p)

Amateur

dan (d)Beginner kyu (k)

AlphaGo (Mar 2016)

AlphaGo (Oct 2015)

Crazy Stone and Zen

Lee Sedol (9p)Top player of past decade

Fan Hui (2p)3-times reigning Euro Champion

Amateur humans

Computer Programs Calibration Human Players

Nature match

DeepMind challenge match

What’s Next?

Demis Hassabis

Game tree complexity - ICML · Evaluating AlphaGo against computers Crazy Stone Zen 3000 2500 2000...

Documents

Transcript of Game tree complexity - ICML · Evaluating AlphaGo against computers Crazy Stone Zen 3000 2500 2000...

Idigoras y Pachi autonomías

Research Introspection “ICML does ICML”

Manual Pren i Veli Ya Pachi 2

ICML 2014 CLUB - Online Clustering of Bandits Poster, 31st ICML, JMLR

AlphaGo 囲碁AI Master 〜AlphaGoから何を学ぶのか〜

Lubrication Certification Requirements (ICML MLA)strategicreliabilitysolutions.com/wp-content/uploads/...Lubrication Certification Requirements (ICML MLA) 4 ICML – MLA I Level I,

Tema1 biomoleculas inorganica pachi 2003

How AlphaGo Works

ATRL (ICML)

Power point trabajo de lengua pachi

Learning to Groove ICML

Cuaderno de pachi

AlphaGo vs AlphaGo€¦ · · 2016-09-12AlphaGo vs AlphaGo Game 3: “Freedom” ... strategy fashioned over time through many selfplay games. ... compensates for Black's losses

DeepBlue, AlphaGo, and AI?

Go in numbers€¦ · Evaluating Seoul AlphaGo against computers Crazy Stone Zen 3000 2500 2000 1500 1000 500 0 3500 AlphaGo (Nature v13) Pachi Fuego Gnu Go 4000 4500 AlphaGo (Seoul

Find Pachi Video Proposal

AlphaGo in Depth

Pachi martinez-presentacion-spam

AlphaGo Paper

AlphaGo Zero 解説