Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing...
Transcript of Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing...
![Page 1: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/1.jpg)
CIG07 Hawaii Honolulu 1
Old-fashioned Computer Go vs Monte-Carlo Go
Bruno BouzyParis Descartes University, France
CIG07 Tutorial April 1st 2007
Honolulu, Hawaii
![Page 2: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/2.jpg)
CIG07 Hawaii Honolulu2
Outline
Computer Go (CG) overviewRules of the gameHistory and main obstaclesBest programs and competitions
Classical approach: divide and conquer Conceptual evaluation functionGlobal move generationCombinatorial Game Theory
New approach: Monte-Carlo Tree Search (MCTS)Simple approach: depth-1 Monte-CarloMCTSUCT
Adaptations of UCT 9x9 boardsScaling up to 19x19 boardsParallelization
Future of Computer Go
![Page 3: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/3.jpg)
CIG07 Hawaii Honolulu3
Rules overview through a game(opening 1)
Black and White move alternately by putting one stone on an intersection of the board.
![Page 4: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/4.jpg)
CIG07 Hawaii Honolulu4
Rules overview through a game(opening 2)
Black and White aims at surrounding large « zones »
![Page 5: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/5.jpg)
CIG07 Hawaii Honolulu5
Rules overview through a game(atari 1)
A white stone is put into « atari » : it has only one liberty (empty intersection) left.
![Page 6: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/6.jpg)
CIG07 Hawaii Honolulu6
Rules overview through a game(defense)
White plays to connect the one-liberty stone yielding a four-stone white string with 5 liberties.
![Page 7: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/7.jpg)
CIG07 Hawaii Honolulu7
Rules overview through a game(atari 2)
It is White’s turn. One black stone is atari.
![Page 8: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/8.jpg)
CIG07 Hawaii Honolulu8
Rules overview through a game(capture 1)
White plays on the last liberty of the black stone which isremoved
![Page 9: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/9.jpg)
CIG07 Hawaii Honolulu9
Rules overview through a game(human end of game)
The game ends when the two players pass.In such position, only experienced players can pass.
![Page 10: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/10.jpg)
CIG07 Hawaii Honolulu10
Rules overview through a game(contestation 1)
White contests the black « territory » by playing inside.Black answers, aiming at capturing the invading stone.
![Page 11: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/11.jpg)
CIG07 Hawaii Honolulu11
Rules overview through a game(contestation 2)
White contests black territory, but the 3-stone white string has one liberty left
![Page 12: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/12.jpg)
CIG07 Hawaii Honolulu12
Rules overview through a game(follow up 1)
Black has captured the 3-stone white string
![Page 13: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/13.jpg)
CIG07 Hawaii Honolulu13
Rules overview through a game(follow up 2)
White is short on liberties…
![Page 14: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/14.jpg)
CIG07 Hawaii Honolulu14
Rules overview through a game(follow up 3)
Black suppresses the last liberty of the 9-stone stringConsequently, the white string is removed
![Page 15: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/15.jpg)
CIG07 Hawaii Honolulu15
Rules overview through a game(follow up 4)
Contestation is going on on both sides. White has captured four black stones
![Page 16: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/16.jpg)
CIG07 Hawaii Honolulu16
Rules overview through a game(concrete end of game)
The board is covered with either stones or « eyes »The two players pass
Black: 44
White: 37
Komi: 7.5
White wins…
by 0.5 point!
![Page 17: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/17.jpg)
CIG07 Hawaii Honolulu17
History (1/2)
First go program (Lefkovitz 1960)First machine learning work (Remus 1963)Zobrist hashing (Zobrist 1969)First two computer go PhD thesis
Potential function (Zobrist 1970)Heuristic analysis of Go trees (Ryder 1970)
First-program architectures: influence-function basedSmall boards (Thorpe & Walden 1964)Interim2 program (Wilcox 1979)G2 program (Fotland 1986)Life and death (Benson 1988)Pattern-based program: Goliath (Boon 1990)
![Page 18: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/18.jpg)
CIG07 Hawaii Honolulu18
History (2/2)
Combinatorial Game Theory (CGT)ONAG (Conway 1976), Winning ways (Conway & al 1982)Mathematical Go (Berlekamp 1991)Go as a sum of local games (Muller 1995)
Machine learningAutomatic acquisition of tactical rules (Cazenave 1996)Neural network-based evaluation function (Enzenberger 1996)
Cognitive modelling(Bouzy 1995)(Yoshikawa & al 1997)
![Page 19: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/19.jpg)
CIG07 Hawaii Honolulu19
Main obstacles (1/2)
CG witnesses AI improvements1994: Chinook beat Marion Tinsley (Checkers)1997: Deep Blue beat Kasparov (Chess)1998: Logistello >> best human (Othello)(Schaeffer, van den Herik 2002)
Combinatorial complexityB: branching factor, L: game length,BL estimation :Go (10400) > Chess(10123) > Othello(1058) > Checkers(1032)
![Page 20: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/20.jpg)
CIG07 Hawaii Honolulu20
Main obstacles (2/2)
2 main obstacles :Global tree search impossibleNon terminal position evaluation hard
Medium level (10th kyu)
Huge effort since 1990 :Evaluation function,Break down the position into sub-positions (Conway, Berlekamp),Local tree searches,pattern-matching, knowledge bases.
![Page 21: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/21.jpg)
CIG07 Hawaii Honolulu21
Competitions
Ing Cup (1987-2001)FOST Cup(1995-1999)Gifu Challenge (2001-)Computer Olympiads (1990;2000-)Monthly KGS tournaments (2005-)Computer Go ladder (Pettersen 1994-)Yearly continental tournaments
AmericanEuropean
CGOS (Computer Go Operating System 9x9)
![Page 22: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/22.jpg)
CIG07 Hawaii Honolulu22
Best 19x19 programs
Go++Go++IngIng, Gifu, FOST, Olympiads, Gifu, FOST, Olympiads
HandtalkHandtalk (=(=GoemateGoemate))IngIng, FOST, Olympiads, FOST, Olympiads
KCC KCC IgoIgoFOST, GifuFOST, Gifu
HarukaHaruka??
Many Faces of GoMany Faces of GoIngIng
Go IntellectGo IntellectIngIng, Olympiads, Olympiads
GNU GoGNU GoOlympiadsOlympiads
![Page 23: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/23.jpg)
CIG07 Hawaii Honolulu23
Indigo
Indigowww.math-info.univ-paris5.fr/~bouzy/INDIGO.html
International competitions since 2003:Computer Olympiads:
2003: 9x9: 4/10, 19x19: 5/11 2004: 9x9: 4/9, 19x19: 3/5 (bronze) ☺2005: 9x9: 3/9 (bronze) ☺, 19x19: 4/72006: 19x19: 3/6 (bronze) ☺
Kiseido Go Server (KGS):« open » and « formal » tournaments.
Gifu Challenge:2006: 19x19: 3/17 ☺
CGOS 9x9
![Page 24: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/24.jpg)
CIG07 Hawaii Honolulu24
End of the overview
Computer Go (CG) overviewRules of the gameHistory and main obstaclesBest programs and competitions
Classical approach: divide and conquer Conceptual evaluation functionGlobal move generationCombinatorial Game Theory
New approach: Monte-Carlo Tree Search (MCTS)Simple approach: depth-1 Monte-CarloMCTS UCT
Adaptations of UCT 9x9 boardsScaling up to 19x19 boardsParallelization
Future of Computer Go
![Page 25: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/25.jpg)
CIG07 Hawaii Honolulu25
Divide-and-conquer approach (start)
Break-downWhole game (win/loss; score)Goal-oriented sub-games String capture (shicho)
Connections, Dividers, Eyes, Life and DeathLocal searches
Alfa-beta and enhancementsPN-search, Abstract Proof Search, lambda-search
Local resultsCombinatorial-Game-Theory-based
Main feature:If Black plays first, if White plays first
(>, <, *, 0, {a|b}, …)Global Move choice
Depth-0 global search:Temperature-based: *, {a|b}
Shallow global search
![Page 26: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/26.jpg)
CIG07 Hawaii Honolulu26
A Go position
![Page 27: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/27.jpg)
CIG07 Hawaii Honolulu27
Basic concepts, local searches, and combinatorial games (1/2)
Block capture
|| 0
First player wins
![Page 28: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/28.jpg)
CIG07 Hawaii Honolulu28
Basic concepts, local searches, and combinatorial games (2/2)
Connections:
>0 >0
|| 0
Dividers:
|| 0
12 3
4
2 1
113
2 1
1 1
![Page 29: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/29.jpg)
CIG07 Hawaii Honolulu29
Influence function
Based on dilation (and erosion)
![Page 30: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/30.jpg)
CIG07 Hawaii Honolulu30
Group building
Initialisation:Group = string
Influence function:Group = connected compound
Process:Groups are merged with connector >
Result:
![Page 31: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/31.jpg)
CIG07 Hawaii Honolulu31
Group status
Unstable groups:
Dead group:
![Page 32: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/32.jpg)
CIG07 Hawaii Honolulu32
Conceptual Evaluation Function pseudo-code
While dead groups are being detected,
perform the inversion and aggregation processes
Return the sum of
the “value” of each intersection of the board
(+1 for Black, and –1 for White)
![Page 33: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/33.jpg)
CIG07 Hawaii Honolulu33
A Go position conceptual evaluation
![Page 34: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/34.jpg)
CIG07 Hawaii Honolulu34
A Go position
![Page 35: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/35.jpg)
CIG07 Hawaii Honolulu35
Local move generation
Depends on the abstraction level
Pattern-based
X
X XY
X
![Page 36: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/36.jpg)
CIG07 Hawaii Honolulu36
« Quiet » global move generation
AB
CD E
FG
H
I
![Page 37: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/37.jpg)
CIG07 Hawaii Honolulu37
« Fight-oriented » global move generation
C
AB
E
D
F
G
![Page 38: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/38.jpg)
CIG07 Hawaii Honolulu38
Divide and conquer approach (move choice)
Two strategies using the divide and conquer approach
Depth-0 strategy, global move evaluation
Local tree searches result based
Domain-dependent knowledge
No conceptual evaluation
GNU Go, Explorer, Handtalk (?)
Shallow global tree search using a conceptual evaluation function
Many Faces of Go, Go Intellect, Indigo2002.
![Page 39: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/39.jpg)
CIG07 Hawaii Honolulu39
Divide and conquer approach (+ and -)
UpsidesFeasible on current computersLocal search « precision »Local result accuracy based on anticipationFast execution
DownsidesThe breakdown-stage is not proved to be correctBased on domain-dependent knowledgeThe sub-games are not independentTwo-goal-oriented moves are hardly consideredData structure updating complexity
![Page 40: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/40.jpg)
CIG07 Hawaii Honolulu40
End of “classical” part
Computer Go (CG) overviewRules of the gameHistory and main obstaclesBest programs and competitions
Classical approach: divide and conquer Conceptual evaluation functionGlobal move generationCombinatorial Game Theory
New approach: Monte-Carlo Tree Search (MCTS)
Simple approach: depth-1 Monte-CarloMCTS, UCT
Adaptations of UCT 9x9 boardsScaling up to 19x19 boardsParallelization
Future of Computer Go
![Page 41: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/41.jpg)
CIG07 Hawaii Honolulu41
Monte Carlo and Computer games (start)
Games containing elements of chance:
Backgammon (Tesauro 1989-),
Games with hidden information:
Poker (Billings & al. 2002),
Scrabble (Sheppard 2002).
![Page 42: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/42.jpg)
CIG07 Hawaii Honolulu42
Monte Carlo and complete information games
(Abramson 1990) model of terminal node evaluation based on simulations
Applied to 6x6 Othello
(Brügmann 1993) simulated annealing
Two move sequences (one used by Black, one used by White)
« all-moves-as-first » heuristic
Gobble
![Page 43: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/43.jpg)
CIG07 Hawaii Honolulu43
Monte-Carlo and Go
Past and recent history(Brugmann 1993), (Bouzy & Helmstetter 2003) ,Min-max and MC Go (Bouzy 2004),Knowledge and MC Go (Bouzy 2005),UCT (Kocsis & Szepesvari 2006),UCT-like (Coulom 2006),
Quantitative assessment:σ (9x9) ~= 351 point precision: N ~= 1,000 (68%), 4,000 (95%)5,000 up to 10,000 9x9 games / second (2 GHz)few MC evaluations / second
![Page 44: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/44.jpg)
CIG07 Hawaii Honolulu44
Evaluation:Launch N random gamesEvaluation = mean of terminal position evaluations
Depth-one greedy algorithm:For each move,
Launch N random games starting with this move Evaluation = mean of terminal position evaluations
Play the move with the best meanComplexity:
Monte Carlo: O(NBL)Tree search: O(BL)
Monte Carlo and Computer Games (basic)
![Page 45: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/45.jpg)
CIG07 Hawaii Honolulu45
An explicit terminal position
The board is covered with either stones or « eyes »The score is easy to compute
![Page 46: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/46.jpg)
CIG07 Hawaii Honolulu46
Monte-Carlo and Computer Games (strategies)
Greedy algorithm improvement: confidence interval update[m - Rσ/N1/2, m + Rσ/N1/2 ]R: parameter.
Progressive pruning strategy :First move choice: randomly,Prune move inferior to the best move,(Billings al 2002, Sheppard 2002, Bouzy & Helmstetter ACG10 2003)
Upper bound strategy:First move choice : argmax (m + Rσ/N1/2 ),No pruningIntEstim (Kaelbling 1993), UCB (Auer & al 2002)
Lower bound strategy
![Page 47: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/47.jpg)
CIG07 Hawaii Honolulu47
Progressive Pruning strategy
Are there unpromising moves ?
Move 1
Move 2
Current best
Move 3
Move 4
Can be prunedMove value
![Page 48: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/48.jpg)
CIG07 Hawaii Honolulu48
Monte-Carlo and Computer Games (pruning strategy)
The root is expanded
Example
Random games are launched on child nodes
![Page 49: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/49.jpg)
CIG07 Hawaii Honolulu49
Monte-Carlo and Computer Games (pruning strategy)
After several games, some child nodes are pruned
Example
![Page 50: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/50.jpg)
CIG07 Hawaii Honolulu50
Monte-Carlo and Computer Games (pruning strategy)
After other random games, one move is left…And the algorithm stops.
Example
![Page 51: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/51.jpg)
CIG07 Hawaii Honolulu51
Upper bound strategy (1/5)
Which move to select ?
Move 1
Move 2Current best mean
Move 3Current best upper bound
Move 4
Move value
![Page 52: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/52.jpg)
CIG07 Hawaii Honolulu52
Upper bound strategy (2/5)
The « best » move has received a GOOD reward:
Move 1
Move 2Current best mean
Move 3STILL Current best upper bound
Move 4
Move value
![Page 53: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/53.jpg)
CIG07 Hawaii Honolulu53
Upper bound strategy (3/5)
The « best » move receives GOOD REWARDS ON AVERAGE:
Move 1NEW current best upper bound
Move 2Current best mean
Move 3Old best upper boundIts upper bound slightly decreases
Move 4
Move value
![Page 54: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/54.jpg)
CIG07 Hawaii Honolulu54
Upper bound strategy (4/5)
The « best » move has received a BAD reward:
Move 1NEW current best upper bound
Move 2Current best mean
Move 3Old best upper boundIts mean value has merely decreased.
Move 4
Move value
![Page 55: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/55.jpg)
CIG07 Hawaii Honolulu55
Upper bound strategy (5/5)
Even if fhe « best » move receives good rewards…It does not stay the best.
If fhe « best » move receives bad rewards…It does not stay the best.
ConclusionUpper bound strategy favours exploration.« Optimistic under uncertainty ».Can be used when losingUsed in UCT
![Page 56: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/56.jpg)
CIG07 Hawaii Honolulu56
Lower bound strategy (1/5)
Which move to select ?
Move 1
Move 2Current best lower bound
Move 3
Move 4
Move value
![Page 57: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/57.jpg)
CIG07 Hawaii Honolulu57
Lower bound strategy (2/5)
The « best » move has received a GOOD reward:
Move 1
Move 2STILL Current best lower boundIts mean value has merely increased
Move 3
Move 4
Move value
![Page 58: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/58.jpg)
CIG07 Hawaii Honolulu58
Lower bound strategy (3/5)
The « best » move receives GOOD REWARDS:
Move 1
Move 2STILL Current best lower boundIts upper bound slightly increases
Move 3
Move 4Move value
![Page 59: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/59.jpg)
CIG07 Hawaii Honolulu59
Lower bound strategy (4/5)
The « best » move has received sufficiently BAD rewards:
Move 1NEW current best lower bound
Move 2Old best lower boundIts mean value has decreased
Move 3
Move 4
Move value
![Page 60: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/60.jpg)
CIG07 Hawaii Honolulu60
Lower bound strategy (5/5)
The « best » move does not stay the best…
… only if it receives bad rewards
Conclusion
Lower bound strategy favours exploitation.
« Pessimistic under uncertainty ».
Can be used when winning.
Not used in UCT.
![Page 61: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/61.jpg)
CIG07 Hawaii Honolulu61
Depth-one Monte-Carlo Go (pros and cons)
Results:Move quality increases with computer power ☺Robust evaluation ☺Global (statistical) search ☺
Way of playing:Good global sense ☺, local tactical weakness –
Easy to program ☺Rules of the games only, No break down of the position into sub-positions, No conceptual evaluation function.
![Page 62: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/62.jpg)
CIG07 Hawaii Honolulu62
Multi-Armed Bandit Problem (1/2)
(Berry & Fristedt 1985, Sutton & Barto 1998, Auer & al 2002)
A player plays the Multi-armed bandit problemHe selects an arm to pushStochastic reward depending on the selected armFor each arm, the reward distribution is unknownGoal: maximize the cumulated reward over timeExploitation vs exploration dilemma
Main algorithmsε-greedy, Softmax,IntEstim (Kaelbling 1993)UCB (Auer & al 2002)POKER (Vermorel 2005)
![Page 63: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/63.jpg)
CIG07 Hawaii Honolulu63
Multi-Armed Bandit Problem (2/2)
MCTS & MAB similaritiesAction choiceStochastic reward (0 1 or numerical)Goal: choose the best action
MCTS & MAB: two main differences
Online or offline reward ?MAB: cumulated online rewardMCG: offline
Online rewards counts nothingReward provided later by the game outcome
MCG: Superposition of MAB problems1 MAB problem = 1 tree node
![Page 64: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/64.jpg)
CIG07 Hawaii Honolulu64
Monte-Carlo Tree Search (MCTS) (start)
Goal: appropriate integration of MC and TS
TS: alfa-beta like algorithms, best-first algorithmsMC: uncertainty management
UCT: UCB for Trees (Kocsis & Szepesvari 2006)
Spirit: superpositions of UCB (Auer & al 2002)Downside: Tree growing left unspecified
MCTS frameworkMove selection (Chaslot & al) (Coulom 2006)Backpropagation (Chaslot & al) (Coulom 2006)Expansion (Chaslot & al) (Coulom 2006)Simulation (Bouzy 2005) (Wang & Gelly 2007)
![Page 65: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/65.jpg)
CIG07 Hawaii Honolulu65
Move Selection in UCT
UCB (Auer & al 2002)
Move eval = mean + C * sqrt(log(t)/s)
= Upper Confidence interval Bound
t: number of simulations of the parent node
s: number of simulations of the child node
![Page 66: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/66.jpg)
CIG07 Hawaii Honolulu66
Backpropagation
Node evaluation:“Average” back-up = average over simulations going through this node“Min-Max” back-up = Max (resp Min) evaluations over child nodes“Robust max” = Max number of simulations going through this node
Good properties of MCTS:With “average” back-up and UCT move selection,
the root evaluation converges to the “min-max” evaluation when the number of simulations goes to infinity
“Average” back-up is used at every node
“Robust max” can be used at the end of the process to complete properly
![Page 67: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/67.jpg)
CIG07 Hawaii Honolulu67
Node expansion and management
Strategy
Every nodes in the simulation --
One node per simulation
Few nodes per simulation according to domain dependent probabilities
Use of a Transposition Table (TT)
Merge sets of samples to obtain a better precision
When hash collision: link the nodes in a list
![Page 68: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/68.jpg)
CIG07 Hawaii Honolulu68
MCTS():While time,
PlayOutTreeBasedGame (list)outcome = PlayOutRandomGame()Update nodes (list, outcome)
Play the move with the best mean
PlayOutTreeBasedGame (list)node = getNode(position)While node do
Add node to list.M = Select move (node)Play move (M)node = getNode(position)
node = new Node()Add node to list.
Monte-Carlo Tree Search (pseudo-code)
![Page 69: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/69.jpg)
CIG07 Hawaii Honolulu69
Upper Confidence for Trees (UCT)(1)
A first random game is launched, and its outcome is kept carefully
1
![Page 70: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/70.jpg)
CIG07 Hawaii Honolulu70
Upper Confidence for Trees (UCT)(2)
A first child node is created.
![Page 71: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/71.jpg)
CIG07 Hawaii Honolulu71
Upper Confidence for Trees (UCT)(3)
The outcome of the random game is backed up.
1
1
![Page 72: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/72.jpg)
CIG07 Hawaii Honolulu72
Upper Confidence for Trees (UCT)(4)
At the root, unexplored moves still exist.
A second game is launched, starting with an unexplored move.
0
1
1
![Page 73: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/73.jpg)
CIG07 Hawaii Honolulu73
Upper Confidence for Trees (UCT)(5)
A second node is created and the outcome is backed-up to compute means.
1/2
10
![Page 74: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/74.jpg)
CIG07 Hawaii Honolulu74
Upper Confidence for Trees (UCT)(6)
All legal moves are explored, the corresponding nodes are created, and their means computed.
2/4
1 0 01
![Page 75: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/75.jpg)
CIG07 Hawaii Honolulu75
Upper Confidence for Trees (UCT)(7)
For the next iteration, a node is greedily selected with the UCT move selection rule:
Move eval = mean + C * sqrt(log(t)/s)
2/4
1 0 01
(In the continuation of this example, for a simplicity reason, let us consider C=0).
![Page 76: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/76.jpg)
CIG07 Hawaii Honolulu76
Upper Confidence for Trees (UCT)(8)
A random game starts from this node.
0.5
1 0 01
2/4
1 0 01
0
![Page 77: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/77.jpg)
CIG07 Hawaii Honolulu77
Upper Confidence for Trees (UCT)(9)
A node is created.
2/5
1 0 01/2
0
![Page 78: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/78.jpg)
CIG07 Hawaii Honolulu78
Upper Confidence for Trees (UCT)(10)
The process repeats…
2/6
1/2 0 01/2
00
![Page 79: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/79.jpg)
CIG07 Hawaii Honolulu79
Upper Confidence for Trees (UCT)(11)
… several times …
3/7
1/2 0 02/3
00 1
![Page 80: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/80.jpg)
CIG07 Hawaii Honolulu80
Upper Confidence for Trees (UCT)(12)
… several times …
3/8
1/2 0 02/4
0/20 1
0
![Page 81: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/81.jpg)
CIG07 Hawaii Honolulu81
Upper Confidence for Trees (UCT)(13)
… in a best first manner …
3/9
1/3 0 02/4
0/20 1
0
0
![Page 82: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/82.jpg)
CIG07 Hawaii Honolulu82
Upper Confidence for Trees (UCT)(14)
… until timeout.
4/10
1/3 0 03/5
1/30 10
0 1
![Page 83: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/83.jpg)
CIG07 Hawaii Honolulu83
Half of “part two”
Computer Go (CG) overviewRules of the gameHistory and main obstaclesBest programs and competitions
Classical approach: divide and conquer Conceptual evaluation functionGlobal move generationCombinatorial Game Theory
New approach: Monte-Carlo Tree Search (MCTS)Simple approach: depth-1 Monte-CarloMCTSUCT
Adaptations of UCT 9x9 boardsScaling up to 19x19 boardsParallelization
Future of Computer Go
![Page 84: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/84.jpg)
CIG07 Hawaii Honolulu84
Adaptations of UCT
The “adaptations” are various...
UCT formula tuning (C tuning, “UCB-tuned”)Exploration-exploitation balanceOutcome = Territory score or win-loss information ?Doubling the random game numberTransposition Table
Have or not have, Keep or not keepUpdate nodes of transposed sequences
Use grand-parent informationSimulated games
Capture, 3x3 patterns, Last-move heuristic, Move number, «Mercy» rule
Speeding upOptimizing the random gamesPonderingMulti-processor computersDistribution over a (local) network
![Page 85: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/85.jpg)
CIG07 Hawaii Honolulu85
Assessing an adaptation
Self-playFirst and easy testFew hundred games per night% of winsRisk of evolving into a wrong direction
Against one differently designed programGNU Go 3.6Open source with GTP (Go Text Protocol)Few hundred games per night% of winsRisk of over-fitting
Against several differently designed programsCGOS (Computer Go Operating System)Real testELO rating improvement9x9Slow process
![Page 86: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/86.jpg)
CIG07 Hawaii Honolulu86
CGOS rankings on 9x9
ELO ratings on 6 march 2007MoGo 3.2 2320MoGo 3.4 10k 2150Lazarus 2090Zen 2050AntiGo 2030Valkyria 2020MoGo 3.4 3k 2000Irene (=Indigo) 1970MonteGnu 1950firstGo 1920NeuroGo 1860GnuGo 1850Aya 1820…Raw UCT 1600?…AnchorMan 1500…Raw MC 1200?…ReadyToGo 1000?…
![Page 87: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/87.jpg)
CIG07 Hawaii Honolulu87
Move selection formula tuning
Using UCB
Move eval = mean + C * sqrt(log(t)/s)
What is the best value of C ?
Result: 60-40%
Using “UCB-tuned” (Auer & al 2002)The formula uses the variance V:
Move eval = mean + sqrt(log(t)*min(1/4,V)/s)
Result: “substantially better” (Wang & Gelly 2007)
No need to tune C
![Page 88: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/88.jpg)
CIG07 Hawaii Honolulu88
Exploration vs exploitation
General ideaExplore
at the beginning of the process, or when losing
Exploit near the end, or when winning
Argmax over the child nodes with their...
Mean valueNumber of random games performed (i.e. « robust-max »)Result: Mean value vs robust-max = +5%
Diminishing C linearly in the remaining time
Inspired by (Vermorel & al 2005)Result: +5%
![Page 89: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/89.jpg)
CIG07 Hawaii Honolulu89
Which kind of outcome ?
2 kinds of outcomes
Win-Loss Information (WLI): 0 or 1Territory Score: integer between -81 and +81Combination of Both TS + Bonus*WLI
Resulting statistical information
WLI: probability of winning ++TS: territory expectation
ResultsAgainst GNU Go
TS: 0%WLI: +15%TS+WLI: +17% (with bonus = 45)
![Page 90: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/90.jpg)
CIG07 Hawaii Honolulu90
The diminishing return experiment
Doubling the number of simulations
N = 100,000
Results:
2N vs N: 60-40%
4N vs 2N: 58-42%
![Page 91: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/91.jpg)
CIG07 Hawaii Honolulu91
Transposition table (1)
Have or not have ?
Zobrist number
TT access time << random simulation timeHashTable collision solved with a linked list or records
Interest: merging two node information for the same positionUnion of samplesMean value refined
Result: 60-40%
Keep or not keep TT info from one move to the next ?
Result: 70-30%
![Page 92: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/92.jpg)
CIG07 Hawaii Honolulu92
Transposition table (2a)
Update nodes of transposed sequences
If no capture occurs in a sequence of moves, then Black moves could have been played in a twist orderWhite moves as well
There are « many » sequences that are transposed from the sequence actually played out
Up: one simulation updates much more nodes that the nodes the actual sequence gets through
Down: most of these « transposed » nodes do not existIf you create them: memory explosion occursIf you don't: the effect is lowered.
Result: 65-35%
![Page 93: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/93.jpg)
CIG07 Hawaii Honolulu93
Transposition table (2b)
Which nodes to update ?
Actual
Sequence:
ACBD
Nodes:
Virtual
Sequences:
BCAD, ADBC, BDAC
Nodes:
B
A B
B
DD
D C
CC
AA
![Page 94: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/94.jpg)
CIG07 Hawaii Honolulu94
Grand-parent information (1/2)
Mentioned by (Wang & Gelly 2007)
A move is associated to an intersection
Use statistical information available in nodes associated to thesame intersection
For...
Initializing mean values
Ordering the node expansion
Result: 52-48%
![Page 95: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/95.jpg)
CIG07 Hawaii Honolulu95
Grandparent information (2/2)
Given its ancestors, estimate the value of a new node ?
Idea:move B’ is similar to move B because of their same locationnew.value = this.value +uncle.value – grandFather.value
BA
B’
Cthis
new
grandFather
father uncle
![Page 96: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/96.jpg)
CIG07 Hawaii Honolulu96
Improvement of simulated games (1/3)
Pseudo-random games:Instead of being generated with a uniform probability,Moves are generated with a probability depending on specificdomain-dependent knowledge
Liberties of string in « atari »: Patterns 3x3:
Pseudo-random games look like go,Computed means are more significant than before ☺
![Page 97: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/97.jpg)
CIG07 Hawaii Honolulu97
Improvement of simulated games (2/3)
Features of a Pseudo-Random (PR) player3x3 pattern urgency table38 patterns (empty intersection at the center)25 dispositions with the edge#patterns = 250,000Urgency « atari »
“Automatic” playerReinforcement Learning experiments(Bouzy & Chaslot 2006)
![Page 98: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/98.jpg)
CIG07 Hawaii Honolulu98
Improvement of simulated games (3/3)
Insert knolwedge within random games:
high urgency for...
Capturing-escaping Result: 55-45%
Moves advised by 3x3 patterns Result: 60-40%
Moves located near the last move
(in the 3x3 neighbourhood)
(Wang & Gelly 2007)
Result: 60-40%
![Page 99: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/99.jpg)
CIG07 Hawaii Honolulu99
The « mercy » rule
(Hillis 2006)
Interrupt the game when the difference of captured stones is greater than a threshold
Up: random games are shortened with some confidence
Result: 51-49%
![Page 100: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/100.jpg)
CIG07 Hawaii Honolulu100
Speeding up the random games (1)
Full random on current desktop computer
50,000 rgps (Lew 2006) an exception !
20,000 rgps (commonly eared)
10,000 rgps (my program!)
Pseudo-random (with patterns and few knowledge)
5,000 rgps (my program)
Optimizing performance with profiling
Rough optimization is worthwhile
![Page 101: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/101.jpg)
CIG07 Hawaii Honolulu101
Speeding up the random games (2)
PonderingThink on the opponent timeResult: 55-45%
Parallelization on a multi-processor computerShared memory: UCT tree = TTTT locked with a semaphoreResult: 2 proc vs 1 proc : 58-42%
Parallelization over a network of computersLike the Chessbrain project (Frayn & Justiniano)One “server” manages the UCT treeN “clients” perform random gamesCommunication with messagesResult: not yet available!
![Page 102: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/102.jpg)
CIG07 Hawaii Honolulu102
While time do,
PlayOutTreeBasedGame (list)
outcome = PlayOutRandomGame()
Update nodes (list, outcome)
Play the move with the best mean
Light processes using TT
Parallelizing MCTS
Heavy and stand-aloneprocess using board information
and not the TT
![Page 103: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/103.jpg)
CIG07 Hawaii Honolulu103
Scaling up to 19x19 boards
Knowledge-based move generation
At every nodes in the tree
Local MC-searches
Restrict the random game to a « zone »
How to define zones ?Statically with domain-dependent knowledge
Result: 30-70%Statistically: proper appoach, but how ?
Warning: avoid the difficulties of the breaking-down approach
Parallelization
The promising approach
![Page 104: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/104.jpg)
CIG07 Hawaii Honolulu104
Summing up the enhancements
DetailsUCT formula tuning 60-40Exploration-exploitation balance 55-45Proba of winning vs territory expect. 65-45Transposition Table
Have or not have 60-40Keep or not keep 70-30Update nodes of transposed sequences 65-35
Use grand-parent information 52-48Simulated games
Capture, 3x3 patterns 60-40Last-move 60-40« Mercy » rule 51-49
Speeding upOptimizing the random games 60-40Pondering 51-49Multi-processor computers 58-42Distribution over a network ?
Total 99-1 ?
![Page 105: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/105.jpg)
CIG07 Hawaii Honolulu105
Almost already the end
Computer Go (CG) overviewRules of the gameHistory and main obstaclesBest programs and competitions
Classical approach: divide and conquer Conceptual evaluation functionGlobal move generationCombinatorial Game Theory
New approach: Monte-Carlo Tree Search (MCTS)Simple approach: depth-1 Monte-CarloMCTSUCT
Adaptations of UCT 9x9 boardsScaling up to 19x19 boardsParallelization
Future of Computer Go
![Page 106: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/106.jpg)
CIG07 Hawaii Honolulu106
Current results
9x9 Go: the best programs on CGOS and KGS are MCTS based
MoGo (Wang & Gelly), CrazyStone (Coulom),Valkyria (Persson), AntGo (Hillis), Indigo (Bouzy)NeuroGo (Enzenberger) is the exception
13x13 Go: ? medium interestMoGo, GNU GoOld-fashioned programs does not play
19x19 Go: the best programs are still old-fashioned
Old-fashioned go programs, GNU GoMoGo is catching up (regular successes on KGS)
![Page 107: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/107.jpg)
CIG07 Hawaii Honolulu107
Perspectives on 19x19 (1/2)
To what extent MCTS programs may surpass old-fashioned program ?
Are old-fashioned go programs all old-fashioned ?Go++ is one of the best programIs Go++ Old-fashioned or MCTS based ?
Can old-fashioned programs improve in the near future ?
Is MoGo strength mainly due to MCTS approach or to the skill of their authors ?
9x9 CGOS: MoGo is far ahead the other MCTS programs
![Page 108: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/108.jpg)
CIG07 Hawaii Honolulu108
Perspectives on 19x19 (2/2)
To what extent MCTS programs may surpass old-fashioned program ?
Is the break-down approach mandatory for scaling up MCTS up to 19x19 ?
-> rather NO
The parallelization question: may we easily distribute MCTS over a network ?
-> rather YES
![Page 109: Old-fashioned Computer Go vs Monte-Carlo Go · 2007. 7. 12. · Black answers, aiming at capturing the invading stone. CIG07 Hawaii Honolulu 11 ... Connections, Dividers, Eyes, Life](https://reader033.fdocuments.net/reader033/viewer/2022060920/60ac71c485571d6c6900ea4d/html5/thumbnails/109.jpg)
CIG07 Hawaii Honolulu109
Thank you for your attention...
http://cgos.boardspace.net/
http://www.math-info.univ-paris5.fr/~bouzy/
http://www.reiss.demon.co.uk/webgo/compgo.htm
http://computer-go.softopia.or.jp/gifu2006/
http://www.lri.fr/~gelly/MoGo.htm
http://www.cs.ualberta.ca/~emarkus/compgo_biblio/
http://remi.coulom.free.fr/CrazyStone/
My page
Go4++
Crazy Stone
Mogo
Gifu Challenge
Computer Olympiads
On line computer go bibliography
CGOS
http://www.cs.unimaas.nl/Olympiad2006/