Game Playing ECE457 Applied Artificial Intelligence Spring 2007 Lecture #5.
-
Upload
ursula-gregory -
Category
Documents
-
view
218 -
download
0
Transcript of Game Playing ECE457 Applied Artificial Intelligence Spring 2007 Lecture #5.
Game Playing
ECE457 Applied Artificial IntelligenceSpring 2007 Lecture #5
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 2
Outline Types of games Playing a perfect game
Minimax search Alpha-beta pruning
Playing an imperfect game Real-time Imperfect information Chance
Russell & Norvig, chapter 6
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 3
Game Problems Games are well-defined search problems…
Well-defined board configurations (states) Limited set of well-defined moves (actions) Well-defined victory conditions (goal) Values assigned to pieces, moves, outcomes
(cost) …that are hard to solve by searching
A search tree for chess has an average branching factor of 35
An average chess game lasts for 50 moves per player (ply)
The average search tree has 35100 nodes!
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 4
Game Problems The opponent
He wants to win and make our agent lose
We have no control over his actions He prevents us from reaching the
optimal solution Introduces uncertainty in the search
We don’t know what moves the opponent will do
We will assume “perfect play” behaviour
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 5
Types of Games
Deterministic Chance
Perfect informatio
n
ChessCheckers
Go
Backgammon
Monopoly
Imperfect informatio
n
StrategoBattleship
BridgePoker
Scrabble
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 6
Game-Playing Strategy Our agent and the opponent play
sequentially We assume the opponent plays
perfectly Our agent cannot get to the
optimal goal The opponent won’t allow it
Our agent must find the best achievable goal
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 7
Minimax Algorithm Payoff (utility) function assigns a value
to each leaf node in the tree Value then propagates up to non-leaf nodes
Two players MAX wants to maximise payoff MIN wants to minimise payoff MAX is the player currently looking for a
move (i.e. at root of tree) Payoff function
Simple 1 = win / 0 = draw / -1 = lose Complex for different victory conditions Win/lose for MAX
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 8
Minimax Algorithm
X XX
X
OX O X O
X O X X O
X
X O
X
…
…
…
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 9
Minimax Algorithm
MAX
MIN
MAX 3 18 5 56 1 15 42
-12
-5
3 1 -12
3
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 10
Minimax Algorithm Game of Nim
Initial state: 7 matches in a pile Each player must divide a pile into two
non-empty unequal piles Player who can’t do that, loses
Payoff +1 win, -1 loss
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 11+1 (max wins)
Minimax Algorithm7
4-35-26-1
3-3-13-2-24-2-15-1-1
2-2-2-13-2-1-14-1-1-1
2-2-1-1-13-1-1-1-1
2-1-1-1-1-1
MAX
MIN
MAX
MIN
MAX
MIN
-1 (max loses)
+1 (max wins)
+1
+1
+1
-1
-1-1
-1
+1
-1-1-1
The value of each node is the value of the best leaf the current player (MAX or MIN) can reach.
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 12
Minimax Algorithm Generate entire game tree Compute payoff of leaf nodes For each non-leaf node, from the
lowest in the tree to the root If MAX level, then assign value of the
child with maximum payoff If MAX level, then assign value of the
child with minimum payoff At the root, select action with
maximum payoff
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 13
Minimax Algorithm Complete, if tree is finite Optimal against a perfect
opponent Time complexity = O(bm) Space complexity = O(bm) But remember, b and m can be
huge For chess, b ≈ 35 and m ≈ 100
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 14
Alpha-Beta Pruning MAX take the max of its children
MIN gives each child the min of its children
max(min(3,18,5),min(1,15,42),min(56,-12,-5))
We don’t need to compute the values of all the grandchildren! Only until we find a value lower than
the highest child’s value
max(min(3,18,5),min(1,?,?),min(56,-12,?))
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 15
Alpha-Beta Pruning Maintain values and
is the maximum value that MAX is assured of at any point in the search
is the minimum value that MIN is assured of at any point in the search
Both computed using payoff propagated through the tree
Start with = - and = As the search goes on, the number of
possible values of and decreases When
Current path is not the result of best play by both players, so no need to explore further
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 16
Alpha-Beta Pruning
MAX
MIN
MAX 3 18 5 56 1 -12
3 1 -12
3
X X X
[-, ] [-, ]
[-, ]
[-, 3][3, 3]
[-, ]
[3, ]
[-, 1]
[-, 56][-, -12]
[3, 3][3, 56]
[, ]
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 17
Alpha-Beta Pruning Called as “rootvalue = Evaluate(root, -, )”Evaluate(node, , ) If node is leaf
Return payoff If node is MAX
v = - For each child of node
v = max( v, Evaluate(child, , ) Break if v = max(, v)
Return v If node is MIN
v = For each child of node
v = min( v, Evaluate(child, , ) ) Break if v = min(, v)
Return v
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 18
Alpha-Beta Pruning Efficiency dependant on ordering of
children Will check each of MAX’s children until finding
one with a value higher than beta Will check each of MIN’s children until finding
one with a value lower than alpha Use heuristics to order the nodes to check
Check the highest-value children first for MAX Check the lowest-value children first for MIN
Good ordering can reduce time complexity to O(bd/2) Random ordering gives roughly O(b3d/4) Minimax is O(bd)
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 19
Imperfect Play Real-time or time constraints Chance Hidden information
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 20
Real-Time Games Sometimes we can’t search the
entire tree Real-time games Time constraints (playing against a
clock) Tree too big (e.g. chess)
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 21
Real-Time Games Evaluation function
Estimate value of a non-leaf node in the tree
Cut off search at a given level
Chess: count value of pieces, available moves, board configurations, …
X
X
O
X
O X
<
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 22
Real-Time Minimax Algorithm Generate entire game tree down to
maximum number of ply Evaluate lowest nodes For each non-leaf node, from the lowest
in the tree to the root If MAX level, then assign value of the child
with maximum payoff If MAX level, then assign value of the child
with minimum payoff At the root, select action with maximum
payoff
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 23
Real-Time Alpha-Beta Pruning Called as “rootvalue = Evaluate(root, -, )”Evaluate(node, , ) If node is at lowest level
Return evaluation If node is MAX
v = - For each child of node
v = max( v, Evaluate(child, , ) Break if v = max(, v)
Return v If node is MIN
v = For each child of node
v = min( v, Evaluate(child, , ) ) Break if v = min(, v)
Return v
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 24
Real-Time Games: Problems Non-quiescent positions
Some board configurations cause value to change wildly
Solved with quiescence search Expand non-quiescent boards deeper, until you
reach stable “quiescent” boards Horizon effect
A “singular” move is considerably better than all others
But a damaging unavoidable move is (or can be pushed) just beyond the search depth limit (the “horizon”)
Solved with singular extension Expand singular state deeper
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 25
Games of Chance Minimax requires planning for
upcoming moves If moves depend on dice rolls,
random draws, etc., planning is impossible
We need to add all possible outcomes in the tree!
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 26
Recall
3 18 5 56 1 15 42
-12
-5
3 1 -12
3
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 27
Expectiminimax
0.80.15
0.05
0.80.15
0.05
0.80.15
0.05
163 -7 1 25 -8 -12 -25 58
4.45 4.15 -10.45
4.45MAX has already rolled the dice and has three possible moves
Then, MIN rolls the dice
And MIN picks an action based on the roll result
There are three possible outcomes to the roll
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 28
Problems with Expectiminimax
0.80.15
0.05
0.80.15
0.05
0.80.15
0.05
163 -7 1 25 -8 -12 -25 800
4.45 4.15 26.65
26.65
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 29
Problems with Expectiminimax Time complexity: O(bmnm)
n is the number of possible outcomes of a chance node
Recall: minimax is O(bm) Trees can grow very large very
quickly Minimax & pruning limits search to likely
sequences of actions given perfect play With randomness, there is no likely
sequence of actions
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 30
Imperfect Information Algorithms so far require knowing
everything about the game In some games, information about the
opponent is hidden Cards in poker, pieces in Stratego, etc.
We could approximate hidden information to random events The probability that the opponent has a
flush, the probability that a piece is a bomb, etc.
Then use expectiminimax to get best action
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 31
Imperfect Information List all possible outcomes, then
average best action overall Can lead to irrational behaviour! Possible cases:
Road 1 leads to money, road 2-a leads to gold, road 2-b leads to death (rational action is road 2, then a)
Road 1 leads to money, road 2-a leads to death, road 2-b leads to gold (rational action is road 2, then b)
But the real situation is: Road 1 leads to money, road 2 leads
to gold or death (rational action is road 1)
1 2
a b
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 32
Imperfect Information It’s a useful approximation, but it’s not
exact! Hidden information not the same as random
events Need to handle information
Gather information Plan based on what information we will have at
a given point in the future Leads to more rational behaviour
Acting to gain information Acting to give information to partners Acting to conceal information from the
opponents We will learn to do that later in the course
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 33
IBM Deep Blue First chess computer to defeat
a reigning world champion (Garry Kasparov) under normal chess tournament constraints in 1997
Relied on brute hardware search power 30 processors for the search 480 custom VLSI chess
processors for move generation and ordering, and leaf node evaluation
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 34
IBM Deep Blue Searched a minimax tree
100-200M states per second, maximum 330M Average 6 to 16 ply, maximum 40 ply Decide which moves are worth expanding,
giving priority to singular expansion and chess threats
Null-window alpha-beta pruning Alpha-beta pruning but limited to a “window”
of moves rather than the entire tree Faster and easier to implement on hardware Approximate, can only returns bounds on the
minimax value Allows for a highly non-uniform, more
selective and human-like search of the tree
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 35
IBM Deep Blue Two board evaluation heuristics Fast evaluation to get a quick
approximate value Considers piece position value
Slow evaluation to get an exact value Considers 8,000 features Includes common chess concepts and
specific Kasparov strategies Features have programmable weights
learned automatically from 700,000 grandmaster games and fine-tuned manually by a chess grandmaster
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 36
Assumptions Utility-based agent Environment
Fully observable Deterministic Sequential Static Discrete / Continuous Single agent
ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 37
Assumptions Updated Utility-based agent Environment
Fully observable / Partially observable (approximation)
Deterministic / Strategic / Stochastic Sequential Static / Semi-dynamic Discrete / Continuous Single agent / Multi-agent