
AI techniques for the game of Go

Erik van der Werf

Universiteit Maastricht / ReSound Algorithm R&D


Contents

Introduction
Searching techniques
  The Capture Game
  Solving Go on Small Boards
Learning techniques
  Move Prediction
  Learning to Score
  Predicting Life & Death
  Estimating Potential Territory
Summary of results
Conclusions

The game of Go

Deceptively simple rules:
  Black and White move in turns
  A move places a stone on the board
  Surrounded stones are captured
  Direct repetition is forbidden (ko rule)
  The game is over when both players pass
  The player controlling the most intersections wins

Some basic terminology

Block - connected stones of one colour (no diagonal connections)

Liberty - adjacent empty intersection

Eye - surrounded region providing a safe liberty

Group - stones of one colour controlling a local region

Alive - group that cannot be captured

Dead - group that can eventually be captured
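To make the terminology concrete, here is a minimal sketch (not the author's code) that extracts a block and its liberties with a flood fill; the board encoding (+1 black, -1 white, 0 empty) and the helper name are assumptions for illustration.

```python
def block_and_liberties(board, start):
    """Return the block containing `start` and the block's set of liberties."""
    size = len(board)
    colour = board[start[0]][start[1]]
    assert colour != 0, "start must be an occupied intersection"
    block, liberties, stack = set(), set(), [start]
    while stack:
        x, y = stack.pop()
        if (x, y) in block:
            continue
        block.add((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size:
                if board[nx][ny] == 0:
                    liberties.add((nx, ny))      # adjacent empty point = liberty
                elif board[nx][ny] == colour:
                    stack.append((nx, ny))       # same-colour neighbour joins the block
    return block, liberties

# A block whose liberty set is empty is captured and removed from the board.
```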

Computer Go

Even the best Go programs have no chance against strong amateurs

Human players are superior in areas such as:
  pattern recognition
  spatial reasoning
  learning

Computer programs

[Figure: playing-strength scale from 20 kyu (student) through 1-9 dan up to master and professional level, spaced in handicap stones, locating the level of computer programs.]

Playing strength

29 stones handicap

Problem statement

How can Artificial Intelligence techniques be used to improve the strength of Go programs?

We focused on

Searching techniques & Learning techniques

Searching techniques

Very successful for other board games
Evaluate positions by 'thinking ahead'
Research topics:
  Recognizing positions 'that are irrelevant'
  Fast heuristic evaluations
  Provably correct knowledge
  Move ordering (the best moves first)
  Re-use of partial results from the search process

The Capture Game

Simplified version of Go:
  First to capture a stone wins the game
  Passing not allowed
  Detecting final positions is trivial (unlike normal Go)

Search method:
  Iterative deepening
  Principal variation search
  Enhanced transposition table
  Move ordering using shared tables for both colours for the killer and history heuristics
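The search skeleton named above can be sketched briefly. The following is an assumed illustration of iterative deepening around a principal variation search with a simplified transposition table; the game interface (key, legal_moves, play, undo, is_terminal, evaluate) is hypothetical, and the enhanced transposition table and killer/history ordering of the actual program are not reproduced.

```python
INF = 10**9

def pvs(game, depth, alpha, beta, table):
    entry = table.get(game.key())
    if entry is not None and entry[0] >= depth:
        return entry[1]                          # transposition-table hit (simplified: no bound flags)
    if depth == 0 or game.is_terminal():
        return game.evaluate()                   # heuristic evaluation at the search horizon
    best = -INF
    for i, move in enumerate(game.legal_moves()):   # move ordering matters greatly for PVS
        game.play(move)
        if i == 0:
            score = -pvs(game, depth - 1, -beta, -alpha, table)
        else:
            # search later moves with a null window; re-search only if they fail high
            score = -pvs(game, depth - 1, -alpha - 1, -alpha, table)
            if alpha < score < beta:
                score = -pvs(game, depth - 1, -beta, -score, table)
        game.undo(move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                                # beta cutoff
    table[game.key()] = (depth, best)
    return best

def iterative_deepening(game, max_depth):
    table = {}
    for depth in range(1, max_depth + 1):        # shallow iterations seed the table for deeper ones
        value = pvs(game, depth, -INF, INF, table)
    return value
```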

Heuristic evaluation for the capture game

Based on four principles:

1. Maximize liberties
2. Maximize territory
3. Connect stones
4. Make eyes

Low order liberties (max. distance 3)

Euler number (objects – holes)

Fast computation using a bit-board representation
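The Euler number (objects minus holes) can be computed from local 2x2 patterns, which is what makes a fast bit-board implementation possible. The sketch below uses Gray's quad-count formula on a NumPy 0/1 stone map rather than the author's bit-board code; it is an illustration, not the original implementation.

```python
import numpy as np

def euler_number(stones):
    """stones: 2-D 0/1 array; returns #objects - #holes for 4-connected blocks."""
    img = np.pad(stones.astype(int), 1)          # background border around the board
    q = (img[:-1, :-1] + img[:-1, 1:] +          # number of stones in every 2x2 quad
         img[1:, :-1] + img[1:, 1:])
    diag = ((img[:-1, :-1] == img[1:, 1:]) &     # the two diagonal 2-stone quad patterns
            (img[:-1, 1:] == img[1:, :-1]) &
            (img[:-1, :-1] != img[:-1, 1:]))
    q1 = np.sum(q == 1)                          # quads with exactly one stone
    q3 = np.sum(q == 3)                          # quads with exactly three stones
    qd = np.sum(diag)                            # diagonal quads (two stones, corner to corner)
    return (q1 - q3 + 2 * qd) // 4               # 4-connectivity formula (no diagonal connections)
```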

Solutions for the Capture Game

All boards up to 5x5 were solved
The winner is decided by board-size parity
Will initiative take over at 6x6?

Board   Winner   Depth   Time (s)   Nodes (log10)
2x2     W        4       0          1.8
3x3     B        7       0          3.2
4x4     W        14      1          5.7
5x5     B        19      395        8.4
6x6     ?        >23     >10^6      >12

Solution for 5x5 (Black wins)

Solution for 4x4 (White wins)

Solutions for the Capture Game on 6x6

Starting position   Winner   Depth     Nodes (log10)   Time (s)
Stable              Black    26 (+5)   11              8.3x10^5 (10 days)
Crosscut            Black    15 (+4)   8.0             185

Initiative takes over at 6x6

Solving Go on Small Boards

Iterative deepening
Principal variation search
Enhanced transposition table
Exploit board symmetry
Internal unconditional bounds
Effective move ordering

Evaluation function:
  Heuristic component - similar to the capture game
  Provably correct component - Benson's algorithm for recognizing unconditional life, extended with detection of unconditional territory

Recognizing Unconditional Territory

1. Find regions surrounded by unconditionally alive stones of one colour
2. Find the interior of the regions (eyespace)
3. Remove false eyes
4. Contract the eyespace around defender stones
5. Count maximum sure liberties (MSL)

MSL < 2: unconditional territory. Otherwise: play it out.

Solutions for Small Boards

Board   Result   Depth   Time      Nodes (log10)
2x2     draw     5       n.a.      2.1
3x3     B+9      11      n.a.      3.5
4x4     B+2      21      3.3 (s)   5.8
5x5     B+25     23      2.7 (h)   9.2

Value of opening moves on 5x5

[Figure: boards showing the game value of opening at (3,3), (3,2), and (2,2)]

Learning techniques

Successful in several related domains
Heuristic knowledge can be 'learned' from analysis of human games
Research topics:
  Representation & generalization
  Learn maximally from a limited number of examples
  Pros and cons of different architectures
  Clever use of available domain knowledge

Move prediction

Many moves in Go conform to local patterns and can be played almost reflexively
Train an MLP network to rank moves
Use move pairs {expert, random} extracted from human game records
Training attempts to rank expert moves above random moves:

Error(v_e, v_r) = (v_e - v_r)^2   if v_e < v_r
                  0               otherwise

where v_e is the network's value for the expert move and v_r its value for the random move.

[Figure: prediction error (%) as a function of the ROI surface (points)]
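A rough sketch of this pairwise training scheme follows, with assumed details (network size, optimiser, and feature dimensionality are all placeholders): the loss is zero whenever the expert move already outranks the random move, and (v_e - v_r)^2 otherwise, matching the error above.

```python
import torch
import torch.nn as nn

class MoveValueNet(nn.Module):
    """Tiny MLP scoring a single candidate move from its feature vector (illustrative sizes)."""
    def __init__(self, n_features, n_hidden=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh(),
                                 nn.Linear(n_hidden, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def pair_loss(model, expert_feats, random_feats):
    """Zero if the expert move is ranked above the random move, (v_e - v_r)^2 otherwise."""
    v_e = model(expert_feats)                     # value of the expert move
    v_r = model(random_feats)                     # value of the random move
    return torch.clamp(v_r - v_e, min=0).pow(2).mean()

# Usage: sample {expert, random} move pairs from game records and minimise pair_loss.
model = MoveValueNet(n_features=64)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
expert = torch.randn(32, 64)                      # placeholder feature batch
rand = torch.randn(32, 64)
loss = pair_loss(model, expert, rand)
opt.zero_grad(); loss.backward(); opt.step()
```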

Move Prediction - Representation

Selection of raw features:
• Edge
• Liberties
• Captures
• Last move
• Stones
• Ko
• Liberties after
• Nearest stones

Remove symmetry by canonical ordering & colour reversal

The high-dimensional representation suffers from the curse of dimensionality
=> Apply linear feature extraction to reduce dimensionality
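Symmetry removal by canonical ordering can be illustrated with a small sketch: generate the eight rotated/mirrored variants of a local pattern, reverse colours so that one fixed colour is to move, and keep the lexicographically smallest variant. The encoding and the choice of normalised colour are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def canonical_pattern(pattern, to_move):
    """Canonical representative of a square local pattern.

    pattern: square array with +1 (black), -1 (white), 0 (empty).
    to_move: +1 or -1; colours are reversed so the canonical form is always Black to move.
    """
    if to_move == -1:
        pattern = -pattern                        # colour reversal
    candidates = []
    p = pattern
    for _ in range(4):                            # four rotations ...
        candidates.append(p)
        candidates.append(p[:, ::-1])             # ... each with its mirror image
        p = np.rot90(p)
    # lexicographic comparison of the flattened variants picks one unique representative
    return min(candidates, key=lambda a: tuple(a.ravel()))
```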

Move Prediction - Feature Extraction

Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
  Standard techniques, sub-optimal for ranking

Move-Pair Analysis (MPA)
  Linear projection maximizing the expected quadratic distance between pairs
  Weakness: ignores global features

Modified Eigenspace Separation Transform (MEST)
  Linear projection onto the eigenvectors with the largest absolute eigenvalues of the correlation difference matrix

Good results using a combination of MEST & MPA
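A rough NumPy sketch of MEST as described above: project onto the eigenvectors with the largest absolute eigenvalues of the difference between the two classes' correlation (second-moment) matrices. The exact normalisation used in the thesis is not reproduced; this is an assumed formulation.

```python
import numpy as np

def mest(expert_X, random_X, n_components):
    """Modified Eigenspace Separation Transform (sketch).

    expert_X, random_X: (n_samples, n_features) arrays for the two classes.
    Returns a (n_features, n_components) projection matrix.
    """
    c_expert = expert_X.T @ expert_X / len(expert_X)   # class correlation matrices
    c_random = random_X.T @ random_X / len(random_X)
    diff = c_expert - c_random                         # correlation difference matrix
    eigvals, eigvecs = np.linalg.eigh(diff)            # symmetric matrix, eigh is appropriate
    order = np.argsort(-np.abs(eigvals))               # largest absolute eigenvalues first
    return eigvecs[:, order[:n_components]]

# usage: low_dim_features = features @ mest(expert_feats, random_feats, 15)
```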

Human & Computer Performance Compared

Player   Game 1   Game 2   Game 3   Average
3 dan    96.7     91.5     89.5     92.4
2 dan    95.8     95.0     97.0     95.9
2 kyu    95.0     91.5     92.5     92.9
MP*      90.0     89.4     89.5     89.6
2 kyu    87.5     90.8     n.a.     89.3
5 kyu    87.5     84.4     85.0     85.5
8 kyu    87.5     85.1     86.5     86.3
13 kyu   83.3     75.2     82.7     80.2
14 kyu   76.7     83.0     80.5     80.2
15 kyu   80.0     73.8     82.0     78.4

Black must choose between two red intersections

Performance on professional 19×19 games

Ranking   Performance
First     25%
Top-3     45%
Top-20    80%

[Figure: cumulative performance (%) as a function of the number of ranked moves]

Learning to Score

Using archives of (online) Go servers, such as NNGS, for machine learning is non-trivial because of:

1. Missing information: only a single numeric result is given; the status of individual board points is not available
2. Unfinished games: humans resign early or do not even finish the game at all
3. Bad moves

To overcome 1 & 2, we need reliable final scores
A large dataset was created: 18k labelled final 9x9 positions
Several tricks were used to identify dubious scores
A few thousand positions were scored/verified manually

The scoring method

1. Classify life & death for all blocks2. Remove dead blocks3. Mark empty intersections using flood-

fills or distance to nearest remaining colour

4. (Optional) recursively update representation to take adjacent block status into account; return to 1
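A minimal sketch of step 3 in its flood-fill variant (assumed board encoding as before, not the author's code): after dead blocks are removed, each maximal empty region that borders stones of only one colour is counted as that colour's area, and regions touching both colours stay neutral.

```python
def mark_empty_regions(board):
    """Area-scoring step: label each empty region by the single colour it borders."""
    size = len(board)
    owner = [[0] * size for _ in range(size)]     # +1 black area, -1 white area, 0 neutral
    seen = set()
    for sx in range(size):
        for sy in range(size):
            if board[sx][sy] != 0 or (sx, sy) in seen:
                continue
            region, colours, stack = [], set(), [(sx, sy)]
            seen.add((sx, sy))
            while stack:
                x, y = stack.pop()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < size and 0 <= ny < size:
                        if board[nx][ny] == 0:
                            if (nx, ny) not in seen:
                                seen.add((nx, ny))
                                stack.append((nx, ny))
                        else:
                            colours.add(board[nx][ny])   # colour of a bordering stone
            if len(colours) == 1:                 # region surrounded by one colour only
                c = colours.pop()
                for x, y in region:
                    owner[x][y] = c
    return owner
```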

Blocks to Classify

For final positions there are 3 types of blocks:

1. Alive (O): at the border of its own territory
2. Dead (X): inside the opponent's territory
3. Irrelevant (?): removal does not change the area score

We only train on blocks of type 1 and 2!

[Figure: example final position with irrelevant blocks marked '?']

Representation of the blocks

Direct features of the block:
• Size
• Perimeter
• Adjacent opponent stones
• 1st-, 2nd-, 3rd-order liberties
• Protected liberties
• Auto-atari liberties
• Adjacent opponent blocks
• Local majority (MD < 3)
• Centre of mass
• Bounding box size

Adjacent fully accessible CERs:
• Number of regions
• Size
• Perimeter
• Split points

Adjacent partially accessible CERs:
• Number of partially accessible regions
• Accessible size
• Accessible perimeter
• Inaccessible size
• Inaccessible perimeter
• Inaccessible split points

Disputed territory:
• Direct liberties of the block in disputed territory
• Liberties of all friendly blocks in disputed territory
• Liberties of all enemy blocks in disputed territory

Directly adjacent eyespace:
• Size
• Perimeter

Optimistic chain:
• Number of blocks
• Size
• Perimeter
• Split points
• Adjacent CERs
• Adjacent CERs with eyespace
• Adjacent CERs, fully accessible from at least 1 block
• Size of adjacent eyespace
• Perimeter of adjacent eyespace
• External opponent liberties

Opponent blocks (3x):
  (1) Weakest directly adjacent opponent block (weakest = block with the fewest liberties)
  (2) 2nd weakest directly adjacent opponent block
  (3) Weakest opponent block adjacent to or sharing liberties with the block's optimistic chain
  For each of these:
• Perimeter
• Liberties
• Shared liberties
• Split points
• Perimeter of adjacent eyespace

Recursive features:
• Predicted value of the strongest adjacent friendly block
• Predicted value of the weakest adjacent opponent block
• Predicted value of the second weakest adjacent opponent block
• Average predicted value of the weakest opponent block's optimistic chain
• Adjacent eyespace size of the weakest opponent block's optimistic chain
• Adjacent eyespace perimeter of the weakest opponent block's optimistic chain

Scoring Performance

Blocks (direct/recursive classification)

Training size (blocks)   Direct error (%)   2-step error (%)   3-step error (%)   4-step error (%)
1,000                    1.93               1.60               1.52               1.48
10,000                   1.09               0.76               0.74               0.72
100,000                  0.68               0.43               0.38               0.37

Full board (4-step recursive classification):
  Incorrect score: 1.1% - better than the average rated NNGS player (~7 kyu)
  Incorrect winner: 0.5% - comparable to the average NNGS player
  Average absolute score difference: 0.15 points

Life & Death during the game

Predict whether blocks of stones can be captured

Perfect predictions not possible in non-final positions!

Approximate the a posteriori probability that a block will be alive at the end of the game

4 block types:
  The first 3 types are identified from the final position (as before)
  4th type: blocks captured during the game -> dead
  Irrelevant blocks are not used during training!

Representation extended with 5 additional features:
  Player to move, ko, distance to ko, number of black/white stones on the board

[Figure legend: black blocks, 50% alive]

Performance over the game

MLP, 25 hidden units, 175,000 training examples
Average prediction error: 11.7%
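As a hedged illustration of such a predictor, the sketch below substitutes scikit-learn for the original implementation: an MLP with 25 hidden units is trained on block feature vectors, and its class probability is used as the block's probability of being alive. The placeholder arrays stand in for the real features and labels from the representation slides.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: rows are block feature vectors, labels are
# 1 = alive at the end of the game, 0 = dead/captured.
X_train = np.random.rand(1000, 40)
y_train = np.random.randint(0, 2, size=1000)

life_net = MLPClassifier(hidden_layer_sizes=(25,), max_iter=500)
life_net.fit(X_train, y_train)

# P(alive) per block; this probability also feeds the territory estimators that follow.
p_alive = life_net.predict_proba(X_train[:5])[:, 1]
```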

Estimating Potential Territory

Why estimate territory?
  1. For predicting the score (potential territory)
     Main purpose: to build an evaluation function
     May also be used to adjust strategy (e.g., play safe when ahead)
  2. To detect safe regions (secure territory)
     Main purpose: forward pruning (risky unless provably correct)

Our main focus is on (1) potential territory

We investigate:
  Direct methods, known or derived from the literature
  ML methods, trained on game records
  Enhancements with (heuristic) knowledge of life & death

Direct methods

1. Explicit control
2. Direct control
3. Distance-based control (a minimal sketch follows this list)
4. Influence-based control (~ numerical dilations)
5. Bouzy's method (numerical dilations + erosions)
6. Combinations 5+3 or 5+4

Enhancements use knowledge of life & death to remove dead stones (or reverse their colour)

ML methods

Simple representation - intersections in the ROI:
  Colour {+1 black, -1 white, 0 empty}

Enhanced representation - intersections in the ROI:
  Colour x Prob.(Alive)
  Edge
  Colour of nearest stone
  Colour of nearest living stone

Prob.(Alive) is obtained from a pre-trained MLP

[Figure: network mapping the features to a predicted colour: +1 sure black, 0 neutral, -1 sure white]

Performance at various levels of confidence

Predicting the winner (percentage correct)

Predicting the score (absolute error)

Summary: Searching Techniques

The capture game
  Simplified Go rules (whoever captures the first stone wins)
  Boards up to 6x6 solved

Go on small boards
  Normal Go rules
  First program in the world to have solved 5x5 Go

Perfect solutions up to ~30 intersections
Heuristic knowledge required for larger boards

Summary: Learning Techniques 1

Move prediction
  Very good results (strong kyu level)
  Strong play is possible with a limited selection of moves

Scoring final positions
  Excellent classification
  Reliable training data

Summary: Learning Techniques 2

Predicting life and death
  Good results
  The most important ingredient for accurate evaluation of positions during the game

Estimating potential territory
  Comparison of non-learning and learning methods
  Best results with learning methods

Conclusions

Knowledge is the most important ingredient for improving Go programs

Searching techniques
  Provably correct knowledge suffices for solving small problems up to ~30 intersections
  Heuristic knowledge is essential for larger problems

Learning techniques
  Heuristic knowledge is learned quite well from games
  Learned heuristic knowledge is at least at the level of reasonably strong kyu players

Questions?

More information at: http://erikvanderwerf.tengen.nl/

Email: tengen.nl@erik