An Introduction to Dynamic Games


A. Haurie

J. B. Krawczyk

    Contents

Chapter I. Foreword
    I.1. What are dynamic games?
    I.2. Origins of this book
    I.3. What is different in this presentation

Part 1. Foundations of Classical Game Theory

Chapter II. Elements of Classical Game Theory
    II.1. Basic concepts of game theory
    II.2. Games in extensive form
    II.3. Additional concepts about information
    II.4. Games in normal form
    II.5. Exercises

Chapter III. Solution Concepts for Noncooperative Games
    III.1. Introduction
    III.2. Matrix games
    III.3. Bimatrix games
    III.4. Concave m-person games
    III.5. Correlated equilibria
    III.6. Bayesian equilibrium with incomplete information
    III.7. Appendix on the Kakutani fixed-point theorem
    III.8. Exercises

Chapter IV. Cournot and Network Equilibria
    IV.1. Cournot equilibrium
    IV.2. Flows on networks
    IV.3. Optimization and equilibria on networks
    IV.4. A convergence result

Part 2. Repeated and Sequential Games

Chapter V. Repeated Games and Memory Strategies
    V.1. Repeating a game in normal form
    V.2. Folk theorem
    V.3. Collusive equilibrium in a repeated Cournot game
    V.4. Exercises

Chapter VI. Shapley's Zero-Sum Markov Game
    VI.1. Process and rewards dynamics
    VI.2. Information structure and strategies
    VI.3. Shapley-Denardo operator formalism

Chapter VII. Nonzero-sum Markov and Sequential Games
    VII.1. Sequential games with discrete state and action sets
    VII.2. Sequential games on Borel spaces
    VII.3. Application to a stochastic duopoly model

Index

Bibliography


    CHAPTER I

    Foreword

I.1. What are dynamic games?

Dynamic games are mathematical models of the interaction between independent agents who are controlling a dynamical system. Such situations occur in military conflicts (e.g., a duel between a bomber and a jet fighter), in economic competition (e.g., investments in R&D for computer companies) and in parlor games (chess, bridge). These examples concern dynamical systems, since the actions of the agents (also called players) influence the evolution over time of the state of a system (the position and velocity of aircraft, the capital of know-how of Hi-Tech firms, the positions of the remaining pieces on a chess board, etc.). The difficulty in deciding what should be the behavior of these agents stems from the fact that each action an agent takes at a given time will influence the reaction of the opponent(s) at a later time. These notes are intended to present the basic concepts and models which have been proposed in the burgeoning literature on game theory for a representation of these dynamic interactions.

    I.2. Origins of this book

These notes are based on several courses on Dynamic Games taught by the authors, in different universities or summer schools, to a variety of students in engineering, economics and management science. The notes also use some documents prepared in cooperation with other authors, in particular B. Tolwinski [63] and D. Carlson.

These notes are written for control engineers, economists or management scientists interested in the analysis of multi-agent optimization problems, with a particular emphasis on the modeling of competitive economic situations. The level of mathematics involved in the presentation will not go beyond what is expected to be known by a student specializing in control engineering, quantitative economics or management science. These notes are aimed at last-year undergraduate and first-year graduate students.

Control engineers will certainly observe that we present dynamic games as an extension of optimal control, whereas economists will also see that dynamic games are only a particular aspect of the classical theory of games, which is considered to have been launched by J. Von Neumann and O. Morgenstern in the 1940s1.

1 The book [66] is an important milestone in the history of Game Theory.


The economic models of imperfect competition that we shall repeatedly use as motivating examples have a more ancient origin, since they are all variations on the original Cournot model [10], proposed in the mid-19th century. An interesting domain of application of dynamic games, which is described in these notes, relates to environmental management. The conflict situations occurring in fisheries exploitation by multiple agents, or in policy coordination for achieving global environmental objectives (e.g., in the control of a possible global warming effect), are well captured in the realm of this theory.

The objects studied in this book will be dynamic. The term dynamic comes from the Greek δυναμικός [powerful] and δύναμις [power, strength] and means "of or pertaining to force producing motion"2. In an everyday context, dynamic is an attribute of a phenomenon that undergoes a time-evolution. So, in broad terms, dynamic systems are systems that change in time. They may evolve endogenously, like economies or populations, or change their position and velocity, like a car. In the first part of these notes, the dynamic models presented are in discrete time. This means that the mathematical description of the dynamics uses difference equations in the deterministic context and discrete time Markov processes in the stochastic one. In the second part of these notes, the models will use a continuous time paradigm, where the mathematical tools representing dynamics are differential equations and diffusion processes.

2 Interestingly, dynasty comes from the same root. See the Oxford English Dictionary.

Therefore the first part of the notes should be accessible, and attractive, to students who have not done advanced mathematics. However, the second part involves some developments which have been written for readers with a stronger mathematical background.

    I.3. What is different in this presentation

A course on Dynamic Games, accessible to both control engineering and economics or management science students, requires a specialized textbook. Since we emphasize the detailed description of the dynamics of some specific systems controlled by the players, we have to present rather sophisticated mathematical notions related to control theory. This presentation of the dynamics must also be accompanied by an introduction to the specific mathematical concepts of game theory. The originality of our approach is in the mixing of these two branches of applied mathematics.

There are many good books on classical game theory. A nonexhaustive list includes [47], [58], [59], [3], and more recently [22], [19] and [40]. However, they do not introduce the reader to the most general dynamic games. There is a classic book [4] that covers the dynamic game paradigms extensively; however, readers without a strong mathematical background will probably find that book difficult. This text is therefore a modest attempt to bridge the gap.


    Part 1

    Foundations of Classical Game Theory


    CHAPTER II

    Elements of Classical Game Theory

Dynamic games constitute a subclass of the mathematical models studied in what is usually called game theory. It is therefore proper to start our exposition with those basic concepts of classical game theory which provide the fundamental thread of the theory of dynamic games. For an exhaustive treatment of most of the definitions of classical game theory see, e.g., [47], [58], [22], [19] and [40].

    II.1. Basic concepts of game theory

In a game we deal with many concepts that relate to the interactions between agents. Below we provide a short and incomplete list of those concepts, which will be further discussed and explained in this chapter.

Players. They compete in the game. A player1 can be an individual or a set of individuals (a team, a corporation, a political party, a nation, a pilot of an aircraft, a captain of a submarine, etc.).

A move or a decision is a player's action. In the terminology of control theory2, a move is the implementation of a player's control.

Information. Games will be said to have an information structure or pattern depending on what the players know about the game and its history when they decide their moves. The information structure can vary considerably. For some games, the players do not remember what their own and their opponents' actions have been. In other games, the players can remember the current state of the game (a concept to be elucidated later) but not the history of moves that led to this state. In other cases, some players may not know who the competitors are or even what the rules of the game are (imperfect and incomplete information, for sure). Finally, there are games where each player has perfect and complete information, i.e., everything concerning the game and its history is known to each player.

1 Political correctness promotes the usage of the gender-inclusive pronouns they and their. However, in games, we will frequently have to address an individual player's action and distinguish it from a collective action taken by a set of several players. As far as we know, in English, this distinction is only possible through usage of the traditional grammar gender-exclusive pronouns: the possessive his, her and the personal he, she. In this book, to avoid confusion, we will refer to a singular genderless agent as he and to the agent's possessions as his.

2 We refer to control theory since, as said earlier, dynamic games can be viewed as a mixture of control and game paradigms.


A player's pure strategy is a rule that associates a player's move with the information available to him at the time when he decides which move to choose.

A player's mixed strategy is a probability measure on the space of his pure strategies. We can also view a mixed strategy as a random draw of a pure strategy.

A player's behavioral strategy is a rule which defines a random draw of the admissible move as a function of the information available3. These strategies are intimately linked with mixed strategies, and it was proved early on [33] that the two concepts coincide for many games.

3 A similar concept has been introduced in control theory under the name of relaxed controls.

Payoffs are real numbers measuring the desirability of the possible outcomes of the game, e.g., the amounts of money the players may win or lose. Other names for payoffs are: rewards, performance indices or criteria, utility measures, etc.

The above list refers to elements of games in relatively imprecise, common language terms. More rigorous definitions can be given for the above notions. For this we formulate a game in the realm of decision analysis, where decision trees give a representation of the dependence of outcomes on actions and uncertainties. This will be done in the next section.

    II.2. Games in extensive form

A game in extensive form is defined on a graph. A graph is a set of nodes connected by arcs, as shown in Figure II.1. In a game tree, the nodes indicate game positions that correspond to precise histories of play. The arcs correspond to the possible actions of the player who has the right to move in a given position. To be meaningful, the graph representing a game must have the structure of a tree: a graph where all nodes are connected but there are no cycles. In a tree there is a single node without a parent, called the root, and a set of nodes without descendants, the leaves. There is always a single path from the root to any leaf. The tree represents a sequence of actions and random perturbations which influence the outcome of a game played by a set of players.

FIGURE II.1. Node and arcs

FIGURE II.2. A tree

II.2.1. Description of moves, information and randomness. A game in extensive form is described by a set of players, which includes a particular player called Nature, who always plays randomly. A set of positions of the game corresponds to the nodes of the tree. There is a unique move history leading from the root node to each game position represented by a node. At each node one particular player has the right to move, i.e., he has to select a possible action from an admissible set represented by the arcs emanating from the node; see Figure II.2.

The information that each player disposes of at the nodes where he has to select an action defines the information structure of the game. In general, the player may not know exactly at which node of the tree the game is currently located. More exactly, his information is of the following form:

he knows that the current position of the game is within a given subset of nodes; however, he does not know which specific node it is.

This situation will be represented in a game tree as follows:

FIGURE II.3. Information set

In Figure II.3 a set of nodes is linked by a dotted line. This will be used to denote an information set. Notice that the same number of arcs emanates from each node of the set. The player selects an arc, knowing that the node is in the information set but ignoring which particular node in this set has been reached.

When a player selects a move, this corresponds to selecting an arc of the graph, which defines a transition to a new node, where another player has to select his move, etc. Among the players, Nature plays randomly, i.e., Nature's moves are selected at random.

The game has a stopping rule described by the terminal nodes of the tree (the leaves). Then the players are paid their rewards, also called payoffs.

Figure II.4 shows the extensive form of a two-player, one-stage game with simultaneous moves and a random intervention of Nature. We also say that this game has the simultaneous move information structure. It corresponds to a situation where Player 2 does not know which action has been selected by Player 1, and vice versa. In this figure the node marked D1 corresponds to the move of Player 1 and the nodes marked D2 correspond to the move of Player 2.

The information of the second player is represented by the dotted line linking the nodes. It says that Player 2 does not know what action has been chosen by Player 1. The nodes marked E correspond to Nature's move. In this particular case we assume that the three possible elementary events are equiprobable. The nodes represented by dark circles are the terminal nodes, where the game stops and the payoffs are collected.

FIGURE II.4. A game in extensive form

This representation of games is inspired by parlor games like chess, poker, bridge, etc., which can, at least theoretically, be correctly described in this framework. In such a context, the randomness of Nature's play is the representation of the card or dice draws realized in the course of the game.

The extensive form indeed provides a very detailed description of the game. We stress that if the player knows the node at which the game is located, he knows not only the current state of the game but also remembers the entire game history.

However, the extensive form is rather impractical for analyzing even simple games, because the size of the tree increases very fast with the number of steps. An attempt to provide a complete description of a complex game like bridge, using the extensive form, would lead to a combinatorial explosion. Nevertheless, the extensive form is useful in conceptualizing the dynamic structure of a game. The ordering of the sequence of moves, highlighted by the extensive form, is present in most games.

There is another drawback of the extensive form description. To be represented as nodes and arcs, the histories and actions have to be finite or enumerable. Yet in many models we want to deal with actions and histories that are continuous variables. For such models, we need different methods of problem description. Dynamic games will provide us with such methods. Like the extensive form, dynamic game theory is about the sequencing of actions and reactions. In dynamic games, however, different mathematical tools are used for the representation of the game dynamics. In particular, differential and/or difference equations are utilized to represent dynamic processes with continuous state and action spaces. To a certain extent, dynamic games do not suffer from many of the extensive form's deficiencies.

II.2.2. Comparing random perspectives. Due to Nature's randomness, the players will have to compare, and choose from, different random perspectives in their decision making. The fundamental decision structure is described in Figure II.5. If the player chooses action a1 he faces a random perspective with expected value 100. If he chooses action a2 he faces a sure gain of 100. If the player is risk neutral he will be indifferent between the two actions. If he is risk averse he will choose action a2; if he is a risk lover he will choose action a1. In order to represent the attitude toward risk of a decision maker, Von Neumann and Morgenstern introduced the concept of cardinal utility [66]. If one accepts the axioms4 of utility theory, then a rational player should take the action which leads to the random perspective with the highest expected utility. This will be called the principle of maximization of expected utility.

4 There are several classical axioms (see e.g., [40]) formalizing the properties of a rational agent's utility function. To avoid the introduction of too many new symbols, some of the axioms will be formulated in colloquial rather than mathematical form.

(1) Completeness. Between two utility measures u1 and u2, either u1 ≥ u2 or u1 ≤ u2.

(2) Transitivity. Between three utility measures u1, u2 and u3, if u1 ≥ u2 and u2 ≥ u3, then u1 ≥ u3.

(3) Relevance. Only the possible actions are relevant to the decision maker.

(4) Monotonicity, or "a higher probability of the better outcome is always better", i.e., if u1 > u2 and 0 ≤ β < α ≤ 1, then α u1 + (1 - α) u2 > β u1 + (1 - β) u2.

(5) Continuity. If u1 > u2 and u2 > u3, then there exists a number α such that 0 ≤ α ≤ 1 and u2 = α u1 + (1 - α) u3.

(6) Substitution (several axioms). Suppose the decision maker has to choose between two alternative actions to be taken after one of two possible events occurs. If in each event he prefers the first action, then he must also prefer it before he learns which event has occurred.

(7) Interest. There is always something of interest that can happen.

If the above axioms are jointly satisfied, the existence of a utility function is guaranteed. In the rest of this book we will assume that this is the case and that agents are endowed with such utility functions (referred to as VNM utility functions), which they maximize. It is in this sense that the subjects treated in this book are rational agents.

FIGURE II.5. Decision in uncertainty (action a1: a lottery paying 0, 100 or 200 with probability 1/3 each, expected value 100; action a2: a sure gain of 100)

This solves the problem of comparing random perspectives. However, this also introduces a new way to play the game. A player can set up a random experiment in order to generate his decision. Since he uses utility functions, the principle of maximization of expected utility permits him to compare deterministic action choices with random ones.

As a final reminder of the foundations of utility theory, let us recall that the Von Neumann-Morgenstern utility function is defined up to an affine transformation5 of rewards. This says that the player's choices will not be affected if the utilities are modified through an affine transformation.

5 An affine transformation is of the form y = a + bx.
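This principle is easy to illustrate numerically. The following minimal Python sketch compares the two actions of Figure II.5; the square-root utility used for the risk-averse player is only an illustrative choice of a concave VNM utility:

    import math

    # Action a1: the lottery of Figure II.5, assumed to pay 0, 100 or 200 with
    # probability 1/3 each (expected value 100); action a2: a sure gain of 100.
    lottery = [(1/3, 0.0), (1/3, 100.0), (1/3, 200.0)]
    sure_gain = 100.0

    def expected_utility(u):
        return sum(p * u(z) for p, z in lottery)

    u_neutral = lambda z: z            # linear utility: risk neutral
    u_averse = lambda z: math.sqrt(z)  # concave utility: risk averse (illustrative)

    print(expected_utility(u_neutral), u_neutral(sure_gain))  # 100.0 vs 100.0
    print(expected_utility(u_averse), u_averse(sure_gain))    # ~8.05 vs 10.0

The risk-neutral player is indifferent between the two actions, while the risk-averse player strictly prefers the sure gain a2, as stated above.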

    II.3. Additional concepts about information

What is known by the players who interact in a game is of paramount importance for what can be considered a solution to the game. Here, we refer briefly to the concepts of complete and perfect information and to other types of information patterns.



II.3.1. Complete and perfect information. The information structure of a game indicates what is known by each player at the time the game starts and at each of his moves.

Complete vs incomplete information. Let us consider first the information available to the players when they enter a game play. A player has complete information if he knows:

- who the players are,
- the set of actions available to all players,
- all players' possible outcomes.

A game with common knowledge is a game where all players have complete information and all players know that the other players have complete information. This situation is sometimes called the symmetric information case.

Perfect vs imperfect information. We consider now the information available to a player when he decides about a specific move. In a game defined in extensive form, if each information set consists of just one node, then we say that the players have perfect information. If this is not the case, the game is one of imperfect information.

EXAMPLE II.3.1. A game with simultaneous moves, e.g., the one shown in Figure II.4, is of imperfect information.

II.3.2. Perfect recall. If the information structure is such that a player can always remember all the past moves he has selected, and the information he has obtained from and about the other players, then the game is one of perfect recall. Otherwise it is one of imperfect recall.

II.3.3. Commitment. A commitment is an action taken by a player that is binding on him and that is known to the other players. In making a commitment a player can persuade the other players to take actions that are favorable to him. To be effective, commitments have to be credible. A particular class of commitments are threats.

II.3.4. Binding agreement. Binding agreements are restrictions on the possible actions decided upon by two or more players, with a binding contract that forces the implementation of the agreement. Usually, to be binding, an agreement requires an outside authority that can monitor the agreement at no cost and impose on violators sanctions so severe that cheating is prevented.


    II.4. Games in normal form

II.4.1. Playing games through strategies. Let M = {1, . . . , m} be the set of players. A pure strategy γj for Player j is a mapping which transforms the information available to Player j at a decision node, i.e., a position of the game where he is making a move, into his set of admissible actions. We call the m-tuple γ = (γj)j=1,...,m a strategy vector. Once a strategy is selected by each player, the strategy vector is defined and the game is played as if it were controlled by an automaton6.

An outcome, expressed in terms of utility to Player j, j ∈ M, is associated with each strategy vector γ. If we denote by Γj the set of strategies for Player j, then the game can be represented by the m mappings

    Vj : Γ1 × · · · × Γj × · · · × Γm → IR,   j ∈ M,

that associate a unique (expected utility) outcome Vj(γ) for each player j ∈ M with a given strategy vector γ in Γ1 × · · · × Γm. One then says that the game is defined in normal or strategic form.

II.4.2. From extensive form to strategic or normal form. We consider a simple two-player game, called matching pennies7. The rules of the game are as follows:

The game is played over two stages. At the first stage each player chooses head (H) or tail (T) without knowing the other player's choice. Then they reveal their choices to one another. If the coins do not match, Player 1 wins $5 and Player 2 wins -$5. If the coins match, Player 2 wins $5 and Player 1 wins -$5. At the second stage, the player who lost at stage 1 has the choice of either stopping the game (Q, for quit) or playing another penny matching round with the same type of payoffs as in the first stage. So, his second stage choices are (Q, H, T).

This game is represented in extensive form in Figure II.6. A dotted line connects the different nodes forming an information set for a player. The player who has the move is indicated on top of the graph.

In Table II.1 we have identified the 12 different strategies that can be used by each of the two players in the game of matching pennies. Each player moves twice. In the first move the players have no information; in the second move they know what the choices made at the first stage have been.

In this table, each line describes the possible actions of each player, given the information available to this player. For example, line 1 tells us that if Player 1 has played H in round 1 and Player 2 has played H in round 1, then Player 1, knowing this information, would play Q at round 2. If Player 1 has played H in round 1 and Player 2 has played T in round 1, then Player 1, knowing this information, would play H at round 2. Clearly this describes a possible course of action for Player 1.

6 The idea of playing games through the use of automata will be discussed in more detail when we present the folk theorem for repeated games in Part 2.

7 This example is borrowed from [22].


FIGURE II.6. The extensive form tree of the matching pennies game

The 12 possible courses of action are listed in Table II.1. The situation is similar (actually, symmetrical) for Player 2.

Table II.2 represents the payoff matrix obtained by Player 1 when both players choose one of the 12 possible strategies.


Strategies of Player 1 (1st move; 2nd move if Player 2 has played H; if T) and of Player 2 (1st move; 2nd move if Player 1 has played H; if T):

          Player 1           Player 2
      1st   if H   if T   1st   if H   if T
  1    H     Q      H      H     H      Q
  2    H     Q      T      H     T      Q
  3    H     H      H      H     H      H
  4    H     H      T      H     T      H
  5    H     T      H      H     H      T
  6    H     T      T      H     T      T
  7    T     H      Q      T     Q      H
  8    T     T      Q      T     Q      T
  9    T     H      H      T     H      H
 10    T     T      H      T     H      T
 11    T     H      T      T     T      H
 12    T     T      T      T     T      T

TABLE II.1. List of strategies

        1    2    3    4    5    6    7    8    9   10   11   12

  1    -5   -5   -5   -5   -5   -5    5    5    0    0   10   10
  2    -5   -5   -5   -5   -5   -5    5    5   10   10    0    0
  3   -10    0  -10    0  -10    0    5    5    0    0   10   10
  4   -10    0  -10    0  -10    0    5    5   10   10    0    0
  5     0  -10    0  -10    0  -10    5    5    0    0   10   10
  6     0  -10    0  -10    0  -10    5    5   10   10    0    0
  7     5    5    0    0   10   10   -5   -5   -5   -5   -5   -5
  8     5    5   10   10    0    0   -5   -5   -5   -5   -5   -5
  9     5    5    0    0   10   10  -10    0  -10    0  -10    0
 10     5    5   10   10    0    0  -10    0  -10    0  -10    0
 11     5    5    0    0   10   10    0  -10    0  -10    0  -10
 12     5    5   10   10    0    0    0  -10    0  -10    0  -10

TABLE II.2. Payoff matrix for Player 1

So, the extensive form game given in Figure II.6, played with the strategies indicated above, has now been represented through a 12 × 12 payoff matrix for Player 1 (Table II.2). Since what is gained by Player 1 is lost by Player 2, we say this is a zero-sum game. Hence there is no need here to repeat the payoff matrix construction for Player 2, since it is the negative of the previous one. In a more general situation, i.e., a nonzero-sum game, a specific payoff matrix has to be constructed for each player. The two payoff matrices will be of the same dimensions, where the numbers of rows and columns correspond to the numbers of strategies available to Players 1 and 2 respectively.


II.4.3. Mixed and behavior strategies.

II.4.3.1. Mixing strategies. Since a player evaluates outcomes according to his VNM utility function (remember the axioms of Section II.2.2), he can envision mixing strategies by selecting one of them randomly, according to a lottery that he will define. This introduces one supplementary chance move into the game description.

For example, if Player j has p pure strategies γjk, k = 1, . . . , p, he can select the strategy he will play through a lottery which gives probability xjk to the pure strategy γjk, k = 1, . . . , p. Now the possible choices of action by Player j are elements of the set of all the probability distributions

    Xj = { xj = (xjk)k=1,...,p | xjk ≥ 0, Σ_{k=1}^p xjk = 1 }.

We note that the set Xj is compact and convex in IR^p. This is important for proving the existence of solutions to these games (see Chapter III).

II.4.3.2. Behavior strategies. A behavior strategy is defined as a mapping which associates, with the information available to Player j at a decision node where he is making a move, a probability distribution over his set of actions.

The difference between mixed and behavior strategies is subtle. In a mixed strategy, the player considers the set of possible strategies and picks one, at random, according to a carefully designed lottery. In a behavior strategy, the player decides at each decision node according to a carefully designed lottery; this lottery, however, is contingent upon the information available at this node. In summary, we can say that a behavior strategy is a strategy that includes randomness at each decision node. A famous theorem [33], which we give without proof, establishes that these two ways of introducing randomness into the choice of actions are equivalent in a large class of games.

THEOREM II.4.1. In an extensive game of perfect recall, all mixed strategies can be represented as behavior strategies.

    II.5. Exercises

II.5.1. A game is given in extensive form in Figure II.7. Present the game in normal form.


FIGURE II.7. A game in extensive form: Player 1 first chooses action 1 or 2; Player 2 then chooses action 1, 2 or 3; the resulting payoff pairs are (1,-1), (2,-2), (3,-3) after Player 1's action 1, and (4,-7), (5,-5), (6,-6) after his action 2.

II.5.2. Consider the payoff matrix given in Table II.2. If you know that it represents the payoff matrix of a two-player game, can you create a unique game tree and formulate a unique corresponding extensive form of the game?


    CHAPTER III

    Solution Concepts for Noncooperative Games

    III.1. Introduction

To speak of a solution concept for a game, one needs to deal with the game described in strategic or normal form. A solution to an m-player game will thus be a set of strategy vectors that have attractive properties, expressed in terms of the payoffs received by the players.

It should be clear from Chapter II that a game theory problem can admit different solutions depending on how the game is defined and, in particular, on what information the players dispose of. In this chapter we propose and discuss different solution concepts for games described in normal form. We shall mainly be interested in noncooperative games, i.e., situations where the players select their strategies independently.

Recall that an m-person game in normal form is defined by the following data: {M, (Γj), (Vj), for j ∈ M}, where M = {1, 2, . . . , m} is the set of players. For each player j ∈ M, Γj is the set of strategies (also called the strategy space). The symbol Vj, j ∈ M, denotes the payoff function that assigns a real number Vj(γ) to a strategy vector γ ∈ Γ1 × Γ2 × · · · × Γm. We shall study different classes of games in normal form.

The first category consists of the so-called two-player zero-sum matrix games, which describe conflict situations where there are two players and each of them has a finite choice of pure strategies. Moreover, what one player gains the other player loses, which explains why these games are called zero-sum.

The second category also consists of two-player games, again with a finite pure strategy set for each player, but now the payoffs are not zero-sum. These are the nonzero-sum matrix games, or bimatrix games.

The third category consists of concave games, where the number of players can be more than two and the assumption of action-space finiteness is dropped. This category encompasses the previous classes of matrix and bimatrix games. For concave games we will be able to prove nice existence, uniqueness and stability results for a noncooperative game solution concept called equilibrium.


    III.2. Matrix games

    III.2.1. Security levels.

DEFINITION III.2.1. A game is zero-sum if the sum of the players' payoffs is always zero. Otherwise the game is nonzero-sum. A two-player zero-sum game is also called a duel.

DEFINITION III.2.2. A two-player zero-sum game in which each player has only a finite number of actions to choose from is called a matrix game.

Let us explore how matrix games can be solved. We number the players 1 and 2, respectively. Conventionally, Player 1 is the maximizer and has m (pure) strategies, say i = 1, 2, . . . , m, and Player 2 is the minimizer and has n strategies to choose from, say j = 1, 2, . . . , n. If Player 1 chooses strategy i while Player 2 picks strategy j, then Player 2 pays Player 1 the amount aij.1 The set of all possible payoffs that Player 1 can obtain is represented in the form of the m × n matrix A with entries aij, for i = 1, 2, . . . , m and j = 1, 2, . . . , n. Now, the element in the i-th row and j-th column of the matrix A corresponds to the amount that Player 2 will pay Player 1 if the latter chooses strategy i and the former chooses strategy j. Thus one can say that, in the game under consideration, Player 1 (the maximizer) selects rows of A while Player 2 (the minimizer) selects columns of that matrix. As the result of the play, as said above, Player 2 pays Player 1 the amount of money specified by the element of the matrix in the selected row and column.

EXAMPLE III.2.1. Consider a game defined by the following matrix:

    [ 3    1    8 ]
    [ 4   10    0 ]

What strategy should a rational player select?

A first line of reasoning is to consider the players' security levels. It is easy to see that if Player 1 chooses the first row then, whatever Player 2 does, Player 1 will get a payoff of at least 1 (util2). By choosing the second row, on the other hand, Player 1 risks getting 0. Similarly, by choosing the first column Player 2 ensures that he will not have to pay more than 4, while the choice of the second or third column may cost him 10 or 8, respectively. Thus we say that Player 1's security level is 1, which is ensured by the choice of the first row, while Player 2's security level is 4, ensured by the choice of the first column. Notice that

    1 = max_i min_j aij

and

    4 = min_j max_i aij.

1 Negative payments are allowed. We could also have said that Player 1 receives the amount aij and Player 2 receives the amount -aij.

2 A util is the utility unit.


From this observation, the strategy which ensures that Player 1 will get at least the payoff equal to his security level is called his maximin strategy. Symmetrically, the strategy which ensures that Player 2 will not have to pay more than his security level is called his minimax strategy.
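Security levels are mechanical to compute. A minimal Python sketch for the matrix of Example III.2.1:

    import numpy as np

    A = np.array([[3, 1, 8],
                  [4, 10, 0]])

    row_mins = A.min(axis=1)   # worst payoff of each row, for Player 1
    col_maxs = A.max(axis=0)   # worst payment of each column, for Player 2

    maximin = row_mins.max()   # Player 1's security level: 1 (first row)
    minimax = col_maxs.min()   # Player 2's security level: 4 (first column)
    print(maximin, minimax)    # 1 4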

LEMMA III.2.1. In any matrix game the following inequality holds:

(III.1)    max_i min_j aij ≤ min_j max_i aij.

Proof: The proof of this result is based on the remark that, since both security levels are achievable, they necessarily satisfy the inequality (III.1). More precisely, let (i*, j*) and (i', j') be defined by

(III.2)    a_{i*j*} = max_i min_j aij

and

(III.3)    a_{i'j'} = min_j max_i aij,

respectively. Now consider the payoff a_{i*j'}. For any k and l, one has

(III.4)    min_j a_{kj} ≤ a_{kl} ≤ max_i a_{il}.

Then, by construction, and applying (III.4) with k = i* and l = j', we get

    max_i (min_j aij) = min_j a_{i*j} ≤ a_{i*j'} ≤ max_i a_{ij'} = min_j (max_i aij).

QED.

An important observation is that if Player 1 has to move first, and Player 2 acts having seen the move made by Player 1, then the maximin strategy is Player 1's best choice, which leads to the payoff equal to 1. If the situation is reversed and it is Player 2 who moves first, then his best choice will be the minimax strategy and he will have to pay 4. Now the question is what happens if the players move simultaneously. A careful study of the example shows that, when the players move simultaneously, the minimax and maximin strategies are not satisfactory solutions to this game. Notice that the players may try to improve their payoffs by anticipating each other's strategy. As a result we will see a process which, in some cases, does not converge to any stable solution. Such an instability occurs, for example, in the matrix game that we have introduced in Example III.2.1.

    Consider now another example.

EXAMPLE III.2.2. Let the matrix game A be given as follows:

    [ 10   -15    20 ]
    [ 20   -30    40 ]
    [ 30   -45    60 ]

Can we find satisfactory strategy pairs?


It is easy to see that

    max_i min_j aij = max{-15, -30, -45} = -15

and

    min_j max_i aij = min{30, -15, 60} = -15,

and that the pair of maximin and minimax strategies is given by

    (i*, j*) = (1, 2).

That means that Player 1 should choose the first row while Player 2 should select the second column, which will lead to the payoff equal to -15.

In the above example, we can see that the players' maximin and minimax strategies solve the game, in the sense that the players will be best off if they use these strategies.

III.2.2. Saddle points. Let us explore in more depth this class of strategies that has solved the above zero-sum matrix game.

DEFINITION III.2.3. If, in a matrix game A = [aij], i = 1, . . . , m, j = 1, . . . , n, there exists a pair (i*, j*) such that, for all i = 1, . . . , m and j = 1, . . . , n,

(III.5)    a_{ij*} ≤ a_{i*j*} ≤ a_{i*j},

we say that the pair (i*, j*) is a saddle point in pure strategies for the matrix game.

As an immediate consequence of that definition we obtain that, at a saddle point of a zero-sum game, the security levels of the two players are equal, i.e.,

    max_i min_j aij = min_j max_i aij = a_{i*j*}.

What is less obvious is the fact that, if the security levels are equal, then there exists a saddle point.

LEMMA III.2.2. If, in a matrix game, the following holds:

    max_i min_j aij = min_j max_i aij = v,

then the game admits a saddle point in pure strategies.

Proof: Let i* and j* be strategies that yield the security level payoff v for Player 1 (respectively, Player 2). We thus have, for all i = 1, . . . , m and j = 1, . . . , n,

(III.6)    a_{i*j} ≥ min_j a_{i*j} = max_i min_j aij,

(III.7)    a_{ij*} ≤ max_i a_{ij*} = min_j max_i aij.

Since

    max_i min_j aij = min_j max_i aij = v,

the right-hand sides of (III.6) and (III.7) are both equal to v, so a_{ij*} ≤ v ≤ a_{i*j} for all i and j. Taking i = i* in (III.7) and j = j* in (III.6) gives a_{i*j*} = v, and the pair (i*, j*) therefore satisfies the saddle point condition (III.5). QED

III.2.3. Mixed strategies. The set of mixed strategies of Player 1 constitutes a simplex3 in the space IR^m. This is illustrated in Figure III.1 for m = 3. Similarly, the set of mixed strategies of Player 2 is a simplex in IR^n.

3 A simplex is, by construction, the smallest closed convex set that contains n + 1 points in IR^n.

FIGURE III.1. The simplex of mixed strategies

The interpretation of a mixed strategy, say x, is that Player 1 chooses his pure strategy i with probability xi, i = 1, 2, . . . , m. Since the two lotteries defining the random draws are independent events, the joint probability that the strategy pair (i, j) be selected is given by xi yj. Therefore, with each pair of mixed strategies (x, y) we can associate an expected payoff given by the bilinear form in x and y (where the superscript T denotes the transposition operator on a matrix):

    Σ_{i=1}^m Σ_{j=1}^n xi yj aij = x^T A y.
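In code the expected payoff is just this bilinear product; a short check with hypothetical mixed strategies for the matrix of Example III.2.1:

    import numpy as np

    A = np.array([[3, 1, 8], [4, 10, 0]])
    x = np.array([0.5, 0.5])        # a hypothetical mixed strategy of Player 1
    y = np.array([0.2, 0.3, 0.5])   # a hypothetical mixed strategy of Player 2
    # sum_i sum_j x_i y_j a_ij = x'Ay
    print(x @ A @ y)                # 4.35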

One of the first important results of game theory, proved in [65], is the following theorem.

THEOREM III.2.1. Any matrix game has a saddle point in the class of mixed strategies, i.e., there exist probability vectors x* and y* such that

    max_x min_y x^T A y = min_y max_x x^T A y = (x*)^T A y* = v*,

where v* is called the value of the game.

We shall not repeat the complex proof given by von Neumann. Instead we will show that the search for saddle points can be formulated as a linear programming problem (LP). A well-known duality property in LP implies the saddle point existence result.


III.2.4. Algorithms for the computation of saddle points. Saddle points in mixed strategies can be obtained as solutions of specific linear programs. It is easy to show that, for any matrix game, the following two relations hold4:

(III.8)    v* = max_x min_y x^T A y = max_x min_j Σ_{i=1}^m xi aij

and

(III.9)    z* = min_y max_x x^T A y = min_y max_i Σ_{j=1}^n yj aij.

These two relations imply that the value of the matrix game can be obtained by solving either of the following two linear programs:

(1) Primal problem

    max v
    subject to  v ≤ Σ_{i=1}^m xi aij,   j = 1, 2, . . . , n
                1 = Σ_{i=1}^m xi
                xi ≥ 0,   i = 1, 2, . . . , m

(2) Dual problem

    min z
    subject to  z ≥ Σ_{j=1}^n yj aij,   i = 1, 2, . . . , m
                1 = Σ_{j=1}^n yj
                yj ≥ 0,   j = 1, 2, . . . , n

The following theorem relates the two programs to each other.

THEOREM III.2.2 (von Neumann [65]). Any finite two-person zero-sum matrix game A has a value.

4 Actually this is already a linear programming result. When the vector x is given, the expression min_y x^T A y, with the simplex constraints, i.e., y ≥ 0 and Σ_j yj = 1, defines a linear program. The solution of that LP can always be found at an extreme point of the admissible set. An extreme point of the simplex corresponds to one yj = 1 with the other components equal to 0. Therefore, since Player 1 selects his mixed strategy x expecting the opponent to define his best reply, he can restrict the search for the best reply to the opponent's set of pure strategies.


Proof: The value v* of the zero-sum matrix game A is obtained as the common optimal value of the following pair of dual linear programming problems. The respective optimal programs define the saddle point mixed strategies.

    Primal                          Dual
    max v                           min z
    subject to  x^T A ≥ v 1^T       subject to  A y ≤ z 1
                x^T 1 = 1                       1^T y = 1
                x ≥ 0                           y ≥ 0

where 1 = (1, . . . , 1)^T denotes a vector of appropriate dimension with all components equal to 1. One needs to solve only one of the programs. The primal and dual solutions give a pair of saddle point strategies. QED
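Either program is readily solved with an off-the-shelf LP solver. A minimal sketch of the primal problem using scipy.optimize.linprog (which minimizes, so we maximize v by minimizing -v), applied to the game of Example III.2.1:

    import numpy as np
    from scipy.optimize import linprog

    def solve_matrix_game(A):
        # Primal LP: max v  s.t.  sum_i x_i a_ij >= v for each column j,
        # sum_i x_i = 1, x >= 0.  Decision variables: (x_1, ..., x_m, v).
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        c = np.zeros(m + 1)
        c[-1] = -1.0                                # minimize -v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - sum_i x_i a_ij <= 0
        b_ub = np.zeros(n)
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
        b_eq = [1.0]
        bounds = [(0, None)] * m + [(None, None)]   # x >= 0, v free
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:m], -res.fun                  # saddle point x* and value v*

    x, v = solve_matrix_game([[3, 1, 8], [4, 10, 0]])
    print(x, v)   # approximately [0.444 0.556] and 3.556

The computed value 32/9 ≈ 3.56 lies between the security levels 1 and 4 found earlier, as it must.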

REMARK III.2.2. Simple n × n games can be solved more easily (see [47]). Suppose A is an n × n matrix game which does not have a saddle point in pure strategies. The players' unique saddle point mixed strategies and the game value are given by:

(III.10)    x* = (1^T A^D) / (1^T A^D 1)

(III.11)    y* = (A^D 1) / (1^T A^D 1)

(III.12)    v* = det A / (1^T A^D 1)

where A^D is the adjoint matrix of A, det A the determinant of A, and 1 the vector of ones as before.

    Let us illustrate the usefulness of the above formulae on the following example.

EXAMPLE III.2.3. We want to solve the matrix game

    [  1    0 ]
    [ -1    2 ].

The game, obviously, has no saddle point (in pure strategies). The adjoint A^D is

    [ 2    0 ]
    [ 1    1 ].


and 1^T A^D = [3 1], A^D 1 = [2 2]^T, 1^T A^D 1 = 4, det A = 2. Hence the best mixed strategies for the players are

    x* = (3/4, 1/4),    y* = (1/2, 1/2),

and the value of the play is

    v* = 1/2.

In other words, in the long run Player 1 is supposed to win 0.5 if he uses the first row 75% of the time and the second row 25% of the time. Player 2's best strategy will be to use the first and the second column 50% of the time each, which ensures him a loss of (only) 0.5; using other strategies he is supposed to lose more.
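Formulae (III.10)-(III.12) translate directly into code. The sketch below obtains the adjoint through A^D = (det A) A^{-1}, which is valid whenever A is invertible, as in this example:

    import numpy as np

    def saddle_by_adjoint(A):
        # Implements (III.10)-(III.12) for an n x n game with no pure saddle point.
        A = np.asarray(A, dtype=float)
        one = np.ones(A.shape[0])
        AD = np.linalg.det(A) * np.linalg.inv(A)   # adjoint (adjugate) matrix A^D
        denom = one @ AD @ one                     # 1'A^D 1
        x = one @ AD / denom                       # Player 1's mixed strategy
        y = AD @ one / denom                       # Player 2's mixed strategy
        v = np.linalg.det(A) / denom               # value of the game
        return x, y, v

    print(saddle_by_adjoint([[1, 0],
                             [-1, 2]]))   # ([0.75, 0.25], [0.5, 0.5], 0.5)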

    III.3. Bimatrix games

III.3.1. Best reply strategies. We shall now extend the theory that we developed for matrix games to the case of nonzero-sum games. A bimatrix game conveniently represents a two-person nonzero-sum game where each player has a finite set of possible pure strategies. In a bimatrix game there are two players, say Player 1 and Player 2, who have m and n pure strategies to choose from, respectively. Now, if the players select a pair of pure strategies, say (i, j), then Player 1 obtains the payoff aij and Player 2 obtains bij, where aij and bij are some given numbers. The payoffs for the two players corresponding to all possible combinations of pure strategies can be represented by two m × n payoff matrices A and B (whence the name) with entries aij and bij, respectively. When aij + bij = 0, the game is a zero-sum matrix game; otherwise, the game is nonzero-sum. As aij and bij are the players' payoff matrix entries, this conclusion agrees with Definition III.2.1.

In Remark III.2.1, in the context of (zero-sum) matrix games, we noticed that a pair of saddle point strategies constitutes an equilibrium, since no player can improve his payoff by a unilateral strategic change. Each player has therefore chosen the best reply to the strategy of the opponent. We now examine whether the concept of best reply strategies can also be used to define a solution to a bimatrix game.

EXAMPLE III.3.1. Consider the bimatrix game defined by the following matrices:

    A = [ 52   44   44 ]      B = [ 50   44   41 ]
        [ 42   46   39 ]          [ 42   49   43 ]

Examine whether there are strategy pairs that constitute best replies to each other.


III.3.3. Shortcomings of the Nash equilibrium concept.

III.3.3.1. Multiple equilibria. As noticed in Example III.3.1, a bimatrix game may have several equilibria in pure strategies. There may be additional equilibria in mixed strategies as well. The nonuniqueness of Nash equilibria for bimatrix games is a serious theoretical and practical problem. In Example III.3.1 one equilibrium strictly dominates the other equilibrium, i.e., gives both players higher payoffs. Thus, it can be argued that, even without any consultations, the players will naturally pick the strategy pair corresponding to matrix entry (i, j) = (1, 1). However, it is easy to define examples where the situation is not so clear.

EXAMPLE III.3.2. Consider the following bimatrix game:

    [ (2, 1)   (0, 0) ]
    [ (0, 0)   (1, 2) ]

It is easy to see that this game5 has two equilibria (in pure strategies), neither of which dominates the other. Moreover, Player 1 will obviously prefer the solution (1, 1), while Player 2 would rather have (2, 2). It is difficult to decide how this game should be played if the players are to arrive at their decisions independently of one another.

III.3.3.2. The prisoner's dilemma. There is a famous example of a bimatrix game that is used in many contexts to argue that the Nash equilibrium solution is not always a good solution to a noncooperative game.

EXAMPLE III.3.3. Suppose that two suspects are held on the suspicion of committing a serious crime. Each of them can be convicted only if the other provides evidence against him; otherwise he will be convicted as guilty of a lesser charge. However, by agreeing to give evidence against the other guy, a suspect can shorten his sentence by half. Of course, the prisoners are held in separate cells and cannot communicate with each other. The situation is as described in Table III.1, with the entries giving the length of the prison sentence for each suspect in every possible situation. Notice that, in this case, the players are assumed to minimize rather than maximize the outcome of the play.

                           Suspect II:
    Suspect I:             refuses      agrees to testify
    refuses                (2, 2)       (10, 1)
    agrees to testify      (1, 10)      (5, 5)

TABLE III.1. The prisoner's dilemma.

5 The above example is classical in game theory and known as the battle-of-the-sexes game. In an American and rather sexist context, the rows represent the woman's choices between going to the theater and the football match, while the columns are the man's choices between the same events. In fact, we can well understand what a mixed-strategy solution means for this example: the couple will be happy if they go to the theater and the match in alternate weeks.


The unique Nash equilibrium of this game is given by the pair of pure strategies (agree-to-testify, agree-to-testify), with the outcome that both suspects will spend five years in prison. This outcome is strictly dominated by the strategy pair (refuse-to-testify, refuse-to-testify), which however is not an equilibrium and thus is not a realistic solution of the problem when the players cannot make binding agreements.

The above example shows that Nash equilibria can result in outcomes that are very far from efficient.

III.3.4. Algorithms for the computation of Nash equilibria in bimatrix games. Linear programming is closely associated with the characterization and computation of saddle points in matrix games. For bimatrix games one has to rely on algorithms solving either quadratic programming or complementarity problems, which we define below. There are also a few algorithms (see [3], [47]) which permit us to find an equilibrium of simple bimatrix games. We will show one for a 2 × 2 bimatrix game and then introduce the quadratic programming [38] and complementarity problem [34] formulations.

III.3.4.1. Equilibrium computation in a 2 × 2 bimatrix game. For a simple 2 × 2 bimatrix game one can easily find a mixed strategy equilibrium, as shown in the following example.

EXAMPLE III.3.4. Consider the game with the payoff matrix given below:

    [ (1, 0)        (0, 1) ]
    [ (1/2, 1/3)    (1, 0) ]

Compute a mixed strategy equilibrium.

First, notice that this game has no pure strategy equilibrium.

Assume Player 2 chooses his equilibrium strategy y (i.e., 100y% of the time use the first column, 100(1 - y)% of the time use the second column) in such a way that Player 1 (in equilibrium) will get as much payoff using the first row as using the second row, i.e.,

    1·y + 0·(1 - y) = (1/2)·y + 1·(1 - y).

This is true for y = 2/3.

Symmetrically, assume Player 1 uses a strategy x (i.e., 100x% of the time use the first row, 100(1 - x)% of the time use the second row) such that Player 2 will get as much payoff using the first column as using the second column, i.e.,

    0·x + (1/3)·(1 - x) = 1·x + 0·(1 - x).

This is true for x = 1/4. The players' payoffs will be, respectively, 2/3 and 1/4. Then the pair of mixed strategies

    ((x, 1 - x), (y, 1 - y)) = ((1/4, 3/4), (2/3, 1/3))

is an equilibrium in mixed strategies.
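The indifference computation generalizes to any 2 × 2 bimatrix game possessing a fully mixed equilibrium. A minimal sketch (the closed-form expressions below assume the denominators are nonzero, i.e., that a fully mixed equilibrium exists):

    import numpy as np

    def mixed_2x2(A, B):
        # Player 2's y makes Player 1 indifferent between his two rows;
        # Player 1's x makes Player 2 indifferent between his two columns.
        A, B = np.asarray(A, float), np.asarray(B, float)
        y = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
        x = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1])
        return (x, 1 - x), (y, 1 - y)

    A = [[1, 0], [0.5, 1]]   # Player 1's payoffs in Example III.3.4
    B = [[0, 1], [1/3, 0]]   # Player 2's payoffs
    print(mixed_2x2(A, B))   # ((0.25, 0.75), (0.666..., 0.333...))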

III.3.4.2. Links between quadratic programming and Nash equilibria in bimatrix games. Mangasarian and Stone (1964) proved the following result, which links quadratic programming with the search for equilibria in bimatrix games. Consider a bimatrix game (A, B). We associate with it the quadratic program

(III.13)    max [ x^T A y + x^T B y - v1 - v2 ]
            s.t.
(III.14)    A y ≤ v1 1_m
(III.15)    B^T x ≤ v2 1_n
(III.16)    x, y ≥ 0
(III.17)    x^T 1_m = 1
(III.18)    y^T 1_n = 1
(III.19)    v1, v2 ∈ IR.

LEMMA III.3.1. The following two assertions are equivalent:

(i) (x*, y*, v1*, v2*) is a solution to the quadratic programming problem (III.13)-(III.19);

(ii) (x*, y*) is an equilibrium for the bimatrix game.

Proof: From the constraints it follows that x^T A y ≤ v1 and x^T B y ≤ v2 for any feasible (x, y, v1, v2). Hence the maximum of the program is at most 0. Assume that (x*, y*) is an equilibrium for the bimatrix game. Then the quadruple

    (x*, y*, v1* = x*^T A y*, v2* = x*^T B y*)

is feasible, i.e., satisfies (III.14)-(III.19); moreover, it gives the value 0 to the objective function (III.13). Hence the equilibrium defines a solution to the quadratic programming problem (III.13)-(III.19).

Conversely, let (x*, y*, v1*, v2*) be a solution to the quadratic programming problem (III.13)-(III.19). We know that an equilibrium exists for a bimatrix game (Nash theorem). We know that this equilibrium is a solution to the quadratic programming problem (III.13)-(III.19) with optimal value 0. Hence the optimal program (x*, y*, v1*, v2*) must also give the value 0 to the objective function and thus be such that

(III.20)    x*^T A y* + x*^T B y* = v1* + v2*.


For any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1 we have, by (III.17) and (III.18),

    x^T A y* ≤ v1*
    x*^T B y ≤ v2*.

In particular we must have

    x*^T A y* ≤ v1*
    x*^T B y* ≤ v2*.

These two conditions, together with (III.20), imply

    x*^T A y* = v1*
    x*^T B y* = v2*.

Therefore we can conclude that, for any x ≥ 0 and y ≥ 0 such that x^T 1_m = 1 and y^T 1_n = 1,

    x^T A y* ≤ x*^T A y*
    x*^T B y ≤ x*^T B y*,

and hence (x*, y*) is a Nash equilibrium for the bimatrix game. QED
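The Mangasarian-Stone program can be handed to a general nonlinear solver. The following sketch uses scipy.optimize.minimize with SLSQP on the prisoner's dilemma of Example III.3.3, with the sentence lengths converted into payoffs by a change of sign; since SLSQP is a local method, several starting points may be needed in games with multiple equilibria:

    import numpy as np
    from scipy.optimize import minimize

    # Payoffs = minus the sentence lengths of Table III.1.
    A = -np.array([[2.0, 10.0], [1.0, 5.0]])   # Suspect I
    B = -np.array([[2.0, 1.0], [10.0, 5.0]])   # Suspect II
    m, n = A.shape

    def split(z):
        return z[:m], z[m:m + n], z[m + n], z[m + n + 1]

    def neg_objective(z):          # minimize -(x'Ay + x'By - v1 - v2)
        x, y, v1, v2 = split(z)
        return -(x @ A @ y + x @ B @ y - v1 - v2)

    def ineq(z):                   # (III.14)-(III.15): Ay <= v1 1, B'x <= v2 1
        x, y, v1, v2 = split(z)
        return np.concatenate([v1 - A @ y, v2 - B.T @ x])

    def eq(z):                     # (III.17)-(III.18): probabilities sum to 1
        x, y, _, _ = split(z)
        return [x.sum() - 1.0, y.sum() - 1.0]

    z0 = np.concatenate([np.full(m, 1/m), np.full(n, 1/n), [A.max(), B.max()]])
    res = minimize(neg_objective, z0, method="SLSQP",
                   bounds=[(0, None)] * (m + n) + [(None, None)] * 2,
                   constraints=[{"type": "ineq", "fun": ineq},
                                {"type": "eq", "fun": eq}])
    x, y, v1, v2 = split(res.x)
    print(np.round(x, 3), np.round(y, 3), -res.fun)
    # Expect x = [0, 1], y = [0, 1] (agree-to-testify) and an optimum of 0.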

III.3.4.3. A complementarity problem formulation. We have seen that the search for equilibria can be done by solving a quadratic programming problem. Here we show that the solution of a bimatrix game can also be obtained as the solution of a complementarity problem.

There is no loss of generality in assuming that the payoff matrices are m × n and have only positive entries (A > 0 and B > 0). This is not restrictive, since VNM utilities are defined up to an increasing affine transformation. A strategy for Player 1 is defined as a vector x ∈ IR^m that satisfies

x ≥ 0              (III.21)
x^T 1_m = 1        (III.22)

and similarly for Player 2

y ≥ 0              (III.23)
y^T 1_n = 1.       (III.24)

It is easily shown that a pair (x, y) satisfying (III.21)-(III.24) is an equilibrium iff

(x^T A y) 1_m ≥ A y              (III.25)
(x^T B y) 1_n ≥ B^T x,

i.e., iff the equilibrium condition is satisfied against pure strategy alternatives only (recall that A > 0 and B > 0).


Consider the following set of constraints, with v_1 ∈ IR and v_2 ∈ IR:

v_1 1_m ≥ A y                      (III.26)
v_2 1_n ≥ B^T x

together with

x^T (A y − v_1 1_m) = 0
y^T (B^T x − v_2 1_n) = 0.

The last two relations are called complementarity constraints. For mixed strategies (x, y) satisfying (III.21)-(III.24) they simplify to x^T A y = v_1 and x^T B y = v_2. This shows that the above system (III.26) of constraints is equivalent to the system (III.25).

Define s_1 = x/v_2 and s_2 = y/v_1 and introduce slack variables u_1 and u_2; the system of constraints (III.21)-(III.24) and (III.26) can then be rewritten as

(u_1; u_2) = (1_m; 1_n) − [0  A; B^T  0] (s_1; s_2)      (III.27)
0 = (u_1; u_2)^T (s_1; s_2)                              (III.28)
0 ≤ (u_1; u_2)                                           (III.29)
0 ≤ (s_1; s_2),                                          (III.30)

where (a; b) denotes the stacked column vector and [0 A; B^T 0] the corresponding block matrix.

Introducing the four obvious new variables u = (u_1; u_2), s = (s_1; s_2), q = (1_m; 1_n) and M = −[0 A; B^T 0] permits us to rewrite (III.27)-(III.30) in the generic formulation

u = q + M s      (III.31)
0 = u^T s        (III.32)
u ≥ 0            (III.33)
s ≥ 0            (III.34)

of a so-called linear complementarity problem.

A pivoting algorithm ([34], [35]) has been proposed to solve such problems. This algorithm also applies to quadratic programming, which confirms that solving a bimatrix game is of the same level of difficulty as solving a quadratic programming problem.

REMARK III.3.2. Once we obtain a solution (s_1, s_2) to (III.27)-(III.30), we have to reconstruct the equilibrium strategies through the formulae

x* = s_1 / (s_1^T 1_m)       (III.35)
y* = s_2 / (s_2^T 1_n).      (III.36)
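To see the construction at work (an added verification sketch; the variable names are ours), one can plug the equilibrium of the 2 × 2 example computed earlier into (III.27)-(III.30), after shifting the payoffs so that A > 0 and B > 0, and check the reconstruction formulae (III.35)-(III.36):

    import numpy as np

    # The example game shifted by +1 so that A > 0 and B > 0; VNM utilities
    # are defined up to an increasing affine transformation, so the
    # equilibrium is unchanged.
    A = np.array([[1.0, 0.0], [0.5, 1.0]]) + 1.0
    B = np.array([[0.0, 1.0], [1.0/3.0, 0.0]]) + 1.0
    x = np.array([0.25, 0.75])     # equilibrium strategies found earlier
    y = np.array([2.0/3, 1.0/3])
    v1, v2 = x @ A @ y, x @ B @ y  # equilibrium payoffs

    s = np.concatenate([x / v2, y / v1])          # s = (s1; s2)
    q = np.ones(4)                                # q = (1_m; 1_n)
    M = -np.block([[np.zeros((2, 2)), A],
                   [B.T, np.zeros((2, 2))]])
    u = q + M @ s                                 # (III.31)
    print(u, u @ s)    # u >= 0 and u^T s = 0; here u = 0 because every
                       # pure strategy is in the support
    print(s[:2] / s[:2].sum(), s[2:] / s[2:].sum())  # (III.35)-(III.36)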

    III.4. Concave m-person games

The nonuniqueness of equilibria in bimatrix games, and a fortiori in m-player matrix games, poses a delicate problem. If there are many equilibria, in a situation where one assumes that the players cannot communicate or enter into preplay negotiations,


how will a given player choose among the different strategies corresponding to the different equilibrium candidates? In single-agent optimization theory we know that strict concavity of the (maximized) objective function, together with compactness and convexity of the constraint set, leads to existence and uniqueness of the solution. The following question thus arises:

Can we generalize the mathematical programming approach to a situation where the optimization criterion is a Nash-Cournot equilibrium? Can we then give sufficient conditions for existence and uniqueness of an equilibrium solution?

The answer has been given by Rosen in a seminal paper [54] dealing with concave m-person games.

A concave m-person game is described in terms of individual strategies, represented by vectors in compact subsets of Euclidean spaces (IR^{m_j} for Player j), and payoffs represented, for each player, by a continuous function which is concave w.r.t. his own strategic variable. This is a generalization of the concept of mixed strategies introduced in the previous sections. Indeed, in a matrix or a bimatrix game the mixed strategies of a player are represented as elements of a simplex, i.e., a compact convex set, and the payoffs are bilinear or multilinear forms of the strategies; hence, for each player the payoff is concave w.r.t. his own strategic variable. This structure is thus generalized in two ways: (i) the strategies can be vectors constrained to be in a more general compact convex set and (ii) the payoffs are represented by more general continuous-concave functions.

Let us thus introduce the following game in strategic form:

Each player j ∈ M = {1, ..., m} controls an action u_j ∈ U_j, where U_j is a compact convex subset of IR^{m_j}, with m_j a given integer. Player j receives a payoff ψ_j(u_1, ..., u_j, ..., u_m) that depends on the actions chosen by all the players. One assumes that the reward function ψ_j : U_1 × ⋯ × U_m → IR is continuous in each u_i and concave in u_j.

A coupled constraint is defined as a proper subset U of U_1 × ⋯ × U_m. The constraint is that the joint action u = (u_1, ..., u_m) must be in U.

DEFINITION III.4.1. An equilibrium, under the coupled constraint U, is defined as a decision m-tuple (u*_1, ..., u*_j, ..., u*_m) ∈ U such that for each player j ∈ M

ψ_j(u*_1, ..., u*_j, ..., u*_m) ≥ ψ_j(u*_1, ..., u_j, ..., u*_m)      (III.37)

for all u_j ∈ U_j s.t. (u*_1, ..., u_j, ..., u*_m) ∈ U.               (III.38)

REMARK III.4.1. The consideration of a coupled constraint is a new feature. Now each player's strategy space may depend on the strategies of the other players. This may look awkward in the context of noncooperative games, where the players cannot enter into communication or cannot coordinate their actions. However, the concept is


mathematically well defined. We shall see later on that it fits very well some interesting aspects of environmental management: one can think, for example, of a global emission constraint imposed on a finite set of firms competing on the same market. This environmental example will be further developed in forthcoming chapters.

III.4.1. Existence of coupled equilibria.

DEFINITION III.4.2. A coupled equilibrium is a vector u* such that, for each player j ∈ M,

ψ_j(u*) = max_{u_j} {ψ_j(u*_1, ..., u_j, ..., u*_m) | (u*_1, ..., u_j, ..., u*_m) ∈ U}.      (III.39)

At such a point no player can improve his payoff by a unilateral change in his strategy which keeps the combined vector in U.

Let us first show that an equilibrium is actually defined through a fixed-point condition. For that purpose we introduce a so-called global reaction function θ : U × U → IR (parameterized by the weighting r) defined by

θ(u, v, r) = Σ_{j=1}^{m} r_j ψ_j(u_1, ..., v_j, ..., u_m),      (III.40)

where the coefficients r_j > 0, j = 1, ..., m, are arbitrary given positive weights. The precise role of this weighting scheme will be explained later; for the moment we could as well take r_j ≡ 1. Notice that, even if u and v are in U, the combined vectors (u_1, ..., v_j, ..., u_m) are elements of the larger set U_1 × ⋯ × U_m. This function is continuous in u and concave in v for every fixed u. We call it a reaction function since the vector v can be interpreted as composed of the reactions of the different players to the given vector u. This function is helpful as shown in the following result.

LEMMA III.4.1. Let u* ∈ U be such that

θ(u*, u*, r) = max_{u ∈ U} θ(u*, u, r).      (III.41)

Then u* is a coupled equilibrium.

Proof: Assume u* satisfies (III.41) but is not a coupled equilibrium, i.e., does not satisfy (III.39). Then, for one player, say ℓ, there would exist a vector

ũ = (u*_1, ..., ũ_ℓ, ..., u*_m) ∈ U

such that

ψ_ℓ(u*_1, ..., ũ_ℓ, ..., u*_m) > ψ_ℓ(u*).

Then we would also have θ(u*, ũ, r) > θ(u*, u*, r), which is a contradiction to (III.41). QED

This result has two important consequences.

(1) It shows that proving the existence of an equilibrium reduces to proving that a fixed point exists for an appropriately defined reaction mapping (u* is the best reply to u* in (III.41));


(2) it associates with an equilibrium an implicit maximization problem, defined in (III.41). We say that this problem is implicit since it is defined in terms of the very solution u* that it characterizes.

To make the fixed-point argument more precise we introduce a coupled reaction mapping.

DEFINITION III.4.3. The point-to-set mapping

Γ(u, r) = {v | θ(u, v, r) = max_{w ∈ U} θ(u, w, r)}      (III.42)

is called the coupled reaction mapping associated with the positive weighting r. A fixed point of Γ(·, r) is a vector u* such that u* ∈ Γ(u*, r).

By Lemma III.4.1 a fixed point of Γ(·, r) is a coupled equilibrium.

THEOREM III.4.1. For any positive weighting r there exists a fixed point of Γ(·, r), i.e., a point u* s.t. u* ∈ Γ(u*, r). Hence a coupled equilibrium exists.

Proof: The proof is based on the Kakutani fixed-point theorem given in the Appendix of Section III.7. One is required to show that the point-to-set mapping Γ is upper semicontinuous. This is an easy consequence of the concavity of the game and the compactness of all the constraint sets U_j, j = 1, ..., m, and U. QED

REMARK III.4.2. This existence theorem is very close, in spirit, to the theorem of Nash. It uses a fixed-point result which is topological and not constructive, i.e., it does not provide a computational method. However, the definition of a normalized equilibrium, introduced by Rosen, establishes a link between mathematical programming and concave games with coupled constraints.

III.4.2. Normalized equilibria.

III.4.2.1. Kuhn-Tucker multipliers. Suppose that the coupled constraint u ∈ U appearing in (III.39) can be defined by a set of inequalities

h_k(u) ≥ 0,   k = 1, ..., p,      (III.43)

where h_k : U_1 × ⋯ × U_m → IR, k = 1, ..., p, are given concave functions. Let us further assume that the payoff functions ψ_j(·) as well as the constraint functions h_k(·) are continuously differentiable and satisfy the constraint qualification conditions, so that Kuhn-Tucker multipliers exist for each of the implicit single-agent optimization problems defined below.

Assume all players other than Player j use their strategies u*_i, i ∈ M, i ≠ j. Then the equilibrium conditions (III.37)-(III.39) define a single-agent optimization problem with concave objective function and convex compact admissible set. As usual, we denote [u*^{−j}, u_j] the decision vector in which all players i other than j play u*_i while


Player j uses u_j. Under the assumed constraint qualification there exists a vector of Kuhn-Tucker multipliers λ_j = (λ_{jk})_{k=1,...,p} such that the Lagrangean

L_j([u*^{−j}, u_j], λ_j) = ψ_j([u*^{−j}, u_j]) + Σ_{k=1,...,p} λ_{jk} h_k([u*^{−j}, u_j])      (III.44)

verifies, at the optimum,

0 = ∂/∂u_j L_j([u*^{−j}, u*_j], λ_j)                   (III.45)
0 ≤ λ_j                                                (III.46)
0 = λ_{jk} h_k([u*^{−j}, u*_j]),   k = 1, ..., p.      (III.47)

DEFINITION III.4.4. We say that the equilibrium is normalized if the different multipliers λ_j, j ∈ M, are colinear with a common vector λ_0, namely

λ_j = (1/r_j) λ_0,      (III.48)

where the coefficients r_j > 0, j = 1, ..., m, are the weights given to the players.

Actually, this common multiplier λ_0 is associated with the implicit mathematical programming problem

max_{u ∈ U} θ(u*, u, r),      (III.49)

to which we associate the Lagrangean

L_0(u, λ_0) = Σ_{j ∈ M} r_j ψ_j([u*^{−j}, u_j]) + Σ_{k=1,...,p} λ_{0k} h_k(u)      (III.50)

and the first-order necessary conditions

0 = ∂/∂u_j { r_j ψ_j(u*) + Σ_{k=1,...,p} λ_{0k} h_k(u*) },   j ∈ M      (III.51)
0 ≤ λ_0                                                                 (III.52)
0 = λ_{0k} h_k(u*),   k = 1, ..., p.                                    (III.53)
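To illustrate conditions (III.48) and (III.51)-(III.53), consider the following small worked example (added here for concreteness; it is not part of Rosen's original development). Two players have payoffs ψ_j(u) = u_j (2 − u_1 − u_2), with u_j ≥ 0, and share the coupled constraint h(u) = 1 − u_1 − u_2 ≥ 0. The unconstrained Nash equilibrium u_1 = u_2 = 2/3 violates this constraint. For a weighting r, conditions (III.51)-(III.53) with an active constraint read

r_j (2 − 2u*_j − u*_{−j}) = λ_0,   j = 1, 2,      u*_1 + u*_2 = 1,   λ_0 ≥ 0.

For r = (1, 1) this gives u*_1 = u*_2 = 1/2 and λ_0 = 1/2 > 0, so each player faces the multiplier λ_j = λ_0/r_j = 1/2. For r = (2, 1) one obtains instead λ_0 = 2/3 and u* = (2/3, 1/3): the player with the larger weight faces the smaller effective multiplier λ_j = λ_0/r_j and receives the larger share of the common resource.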

III.4.2.2. An economic interpretation. A multiplier, in a mathematical programming framework, can be interpreted as a marginal cost associated with the right-hand side of the constraint. More precisely, it indicates the sensitivity of the optimal solution to marginal changes in this right-hand side. The multiplier also permits a price decentralization in the sense that, through an ad hoc pricing mechanism, the optimizing agent is induced to satisfy the constraints. In a normalized equilibrium, the shadow-cost interpretation is not so apparent; however, the price decomposition principle is still valid. Once the common multiplier has been defined, with the associated weighting r_j > 0, j = 1, ..., m, the coupled constraint will be satisfied by


equilibrium-seeking players when they use as payoffs the Lagrangeans

L_j([u^{−j}, u_j], λ_j) = ψ_j([u^{−j}, u_j]) + (1/r_j) Σ_{k=1,...,p} λ_{0k} h_k([u^{−j}, u_j]),   j = 1, ..., m.      (III.54)

The common multiplier thus permits an implicit pricing of the common constraint so that it remains compatible with the equilibrium structure. For this result to be useful, uniqueness of the normalized equilibrium associated with a given weighting r_j > 0, j = 1, ..., m, is required. In a mathematical programming framework, uniqueness of an optimum results from strict concavity of the objective function to be maximized. In a game structure, uniqueness of the equilibrium will result from a more stringent strict concavity requirement, called by Rosen strict diagonal concavity.

III.4.3. Uniqueness of equilibrium. Let us consider the so-called pseudo-gradient, defined as the stacked vector

g(u, r) = ( r_1 ∇_{u_1} ψ_1(u) ; r_2 ∇_{u_2} ψ_2(u) ; ⋯ ; r_m ∇_{u_m} ψ_m(u) ).      (III.55)

We notice that this expression is composed of the partial gradients of the different payoffs with respect to the decision variables of the corresponding players. We also consider the function

σ(u, r) = Σ_{j=1}^{m} r_j ψ_j(u).      (III.56)

DEFINITION III.4.5. The function σ(u, r) is diagonally strictly concave on U if, for every u^1 and u^2 in U, the following holds:

(u^2 − u^1)^T g(u^1, r) + (u^1 − u^2)^T g(u^2, r) > 0.      (III.57)

A sufficient condition for σ(u, r) to be diagonally strictly concave is that the symmetric matrix [G(u, r) + G(u, r)^T] be negative definite for every u in U, where G(u, r) is the Jacobian of g(u, r) with respect to u.

THEOREM III.4.2. If σ(u, r) is diagonally strictly concave on the convex set U, with the assumptions ensuring the existence of Kuhn-Tucker multipliers, then for every r > 0 there exists a unique normalized equilibrium.

Proof: We sketch below the proof given by Rosen [54]. Assume that for some r > 0 we have two normalized equilibria u^1 and u^2. Then we must have

h(u^1) ≥ 0      (III.58)
h(u^2) ≥ 0      (III.59)


and there exist multipliers λ^1 ≥ 0, λ^2 ≥ 0 such that

(λ^1)^T h(u^1) = 0      (III.60)
(λ^2)^T h(u^2) = 0      (III.61)

and for which the following holds true for each player j ∈ M:

r_j ∇_{u_j} ψ_j(u^1) + (λ^1)^T ∇_{u_j} h(u^1) = 0      (III.62)
r_j ∇_{u_j} ψ_j(u^2) + (λ^2)^T ∇_{u_j} h(u^2) = 0.     (III.63)

We multiply (III.62) by (u^2 − u^1)^T and (III.63) by (u^1 − u^2)^T and sum over j ∈ M to obtain an expression β + α = 0 where, due to the concavity of the h_k and the conditions (III.58)-(III.61),

β = Σ_{j∈M} Σ_{k=1}^{p} { λ^1_k (u^2 − u^1)^T ∇_{u_j} h_k(u^1) + λ^2_k (u^1 − u^2)^T ∇_{u_j} h_k(u^2) }
  ≥ (λ^1)^T [h(u^2) − h(u^1)] + (λ^2)^T [h(u^1) − h(u^2)]
  = (λ^1)^T h(u^2) + (λ^2)^T h(u^1) ≥ 0,      (III.64)

and

α = Σ_{j∈M} r_j [ (u^2 − u^1)^T ∇_{u_j} ψ_j(u^1) + (u^1 − u^2)^T ∇_{u_j} ψ_j(u^2) ].      (III.65)

Since σ(u, r) is diagonally strictly concave we have α > 0, which contradicts β + α = 0. QED

III.4.4. A numerical technique. The diagonal strict concavity property that yielded the uniqueness result of Theorem III.4.2 also provides an interesting extension of the gradient method for the computation of the equilibrium. The basic idea is to project, at each step ℓ, the pseudo-gradient g(u^ℓ, r) onto the constraint set U = {u : h(u) ≥ 0} (let us call ḡ(u^ℓ, r) this projection) and to proceed through the usual steepest-ascent step

u^{ℓ+1} = u^ℓ + τ_ℓ ḡ(u^ℓ, r).

Rosen shows that at each step the step size τ_ℓ > 0 can be chosen small enough to obtain a reduction of the norm of the projected gradient. This yields convergence of the procedure toward the unique equilibrium.
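A minimal sketch of this procedure (added here; it assumes a two-player Cournot-like game with decoupled box constraints, so that the projection reduces to componentwise clipping, and it uses a fixed small step size instead of Rosen's adaptive rule):

    import numpy as np

    # Duopoly with payoffs psi_j(q) = q_j * (a - b*(q1 + q2)) - c*q_j,
    # q_j in [0, qmax]. Here G + G^T = [[-4b, -2b], [-2b, -4b]] is negative
    # definite, so sigma is diagonally strictly concave (Definition III.4.5)
    # and the equilibrium is unique.
    a, b, c, qmax = 10.0, 1.0, 1.0, 10.0

    def pseudo_gradient(q):
        # j-th component: d psi_j / d q_j = a - c - 2*b*q[j] - b*q[1 - j]
        return np.array([a - c - 2*b*q[0] - b*q[1],
                         a - c - 2*b*q[1] - b*q[0]])

    q = np.array([0.0, qmax])   # an arbitrary feasible starting point
    tau = 0.1                   # fixed step size (Rosen chooses it adaptively)
    for _ in range(200):
        q = np.clip(q + tau * pseudo_gradient(q), 0.0, qmax)  # ascent + projection

    print(q)   # approaches the unique equilibrium (a - c)/(3b) = 3 per player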

III.4.5. A variational inequality formulation.

THEOREM III.4.3. Under Assumptions IV.1.1 the vector q* = (q*_1, ..., q*_m) is a Nash-Cournot equilibrium if and only if it satisfies the following variational inequality:

(q − q*)^T g(q*) ≤ 0   for all admissible q,      (III.66)

where g(q*) is the pseudo-gradient at q* with weighting r = 1.


Proof: Apply the first-order necessary and sufficient optimality conditions for each player and aggregate them to obtain (III.66). QED

REMARK III.4.3. The diagonal strict concavity assumption is then equivalent to the property of strong monotonicity of the operator −g(q) in the parlance of variational inequality theory.

    III.5. Correlated equilibria

Aumann [2] has proposed a mechanism that permits the players to enter into preplay arrangements so that their strategy choices can be correlated, instead of being independently chosen, while keeping an equilibrium property. He called this type of solution a correlated equilibrium.

III.5.1. Example of a game with correlated equilibria.

EXAMPLE III.5.1. This example was initially proposed by Aumann [2]. Consider a simple bimatrix game defined as follows:

       c1      c2
r1    5, 1    0, 0
r2    4, 4    1, 5

This game has two pure strategy equilibria, (r1, c1) and (r2, c2), and a mixed strategy equilibrium where each player puts the same probability 0.5 on each possible pure strategy. The respective outcomes are shown in Table III.2. Now, if the players agree to play

(r1, c1) : 5, 1
(r2, c2) : 1, 5
((0.5, 0.5), (0.5, 0.5)) : 2.5, 2.5

TABLE III.2. Outcomes of the three equilibria

by jointly observing a coin flip and playing (r1, c1) if the result is "head" and (r2, c2) if it is "tail", then they expect the outcome (3, 3), which is a convex combination of the two pure equilibrium outcomes.

By letting a coin flip decide which equilibrium to play, a new type of equilibrium has been introduced that yields an outcome located in the convex hull of

    the set of Nash equilibrium outcomes of the bimatrix game. In Figure III.2 we have

    represented these different outcomes. The three full circles represent the three Nash

    equilibrium outcomes. The triangle defined by these three points is the convex hull of

    the Nash equilibrium outcomes. The empty circle represents the outcome obtained by

    agreeing to play according to the coin flip mechanism.

It is easy to see that this way of playing the game defines an equilibrium for the extensive game shown in Figure III.3. This is an expanded version of the initial game where,


[Figure III.2 here]

FIGURE III.2. The convex hull of Nash equilibria

in a preliminary stage, Nature decides randomly the signal that will be observed by the players.⁶ The result of the coin flip is public information, in the sense that

[Figure III.3 here: Nature first flips the coin (head or tail, probability 0.5 each); the players then play the original bimatrix game, Player 2's information sets being indicated by dotted lines.]

FIGURE III.3. Extensive game formulation with Nature playing first

    it is shared by all the players. One can easily check that the correlated equilibrium is

    a Nash equilibrium for the expanded game.

⁶Dotted lines represent the information sets of Player 2.


       c1      c2
r1    1/3     0
r2    1/3     1/3

FIGURE III.4. Probabilities of signals

    We now push the example one step further by assuming that the players agree to

    play according to the following mechanism: A random device selects one cell in the

    game matrix with the probabilities shown in Figure III.4.

When a cell is selected, each player is told to play the corresponding pure strategy. The trick is that a player is told what to play but is not told what the recommendation to the other player is. The information received by each player is not public anymore. More precisely, the three possible signals are (r1, c1), (r2, c1), (r2, c2). When Player 1 receives the signal "play r2" he knows that with probability 1/2 the other player has been told "play c1" and with probability 1/2 the other player has been told "play c2". When Player 1 receives the signal "play r1" he knows that with probability 1 the other player has been told "play c1". Consider now what Player 1 can do if he assumes that the other player plays according to the recommendation. If

Player 1 has been told "play r2" and if he plays so, he expects

(1/2) · 4 + (1/2) · 1 = 2.5;

if he plays r1 instead he expects

(1/2) · 5 + (1/2) · 0 = 2.5,

so he cannot improve his expected reward. If Player 1 has been told "play r1" and if he plays so, he expects 5, whereas if he plays r2 he expects 4. So, for Player 1, obeying the recommendation is the best reply to Player 2's behavior when Player 2 himself plays according to the suggestion of the signalling scheme. Now we can repeat the

verification for Player 2. If he has been told "play c1" he expects

(1/2) · 1 + (1/2) · 4 = 2.5;

if he plays c2 instead he expects

(1/2) · 0 + (1/2) · 5 = 2.5,

so he cannot improve. If he has been told "play c2" he expects 5, whereas if he plays c1 instead he expects 4, so he is better off with the suggested play. So we have checked that an equilibrium property holds for this way of playing the game. All in all, each

player expects

(1/3) · 5 + (1/3) · 1 + (1/3) · 4 = 3 + 1/3

from a game played in this way. This is illustrated in Figure III.5, where the black spade shows the expected outcome of this mode of play. Aumann called it a correlated equilibrium. Indeed, we can now mix these equilibria and still keep the correlated


[Figure III.5 here]

FIGURE III.5. The dominating correlated equilibrium

equilibrium property, as indicated by the dotted line on Figure III.5; also we can construct an expanded game in extensive form for which the correlated equilibrium constructed as above defines a Nash equilibrium (see Exercise 3.6).

In the above example we have seen that, by expanding the game via the adjunction of a first stage where Nature plays and gives information to the players, a new class of equilibria can be reached that dominate, in the outcome space, some of the original Nash equilibria. If the random device gives information which is common to all players, then it permits a mixing of the different pure strategy Nash equilibria and the outcome is in the convex hull of the Nash equilibrium outcomes. If the random device gives information which may differ from one player to the other, then the correlated equilibrium can have an outcome which lies outside the convex hull of Nash equilibrium outcomes.

III.5.2. A general definition of correlated equilibria. Let us give a general definition of a correlated equilibrium in an m-player normal form game. We shall actually give two definitions. The first one describes the construct of an expanded game with a random device distributing some pre-play information to the players. The second definition, which is valid for m-matrix games, is much simpler although equivalent.

III.5.2.1. Nash equilibrium in an expanded game. Assume that a game is described in normal form, with m players j = 1, ..., m, their respective strategy sets Σ_j and payoffs V_j(σ_1, ..., σ_j, ..., σ_m). This will be called the original normal form game.

Assume the players may enter into a phase of pre-play communication during which they design a correlation device that will randomly provide a signal, called the proposed mode of play. Let E = {1, 2, ..., L} be the finite set of the possible modes of play. The correlation device will propose the mode of play ε with probability π(ε). The device will then give the different players some information about the proposed mode of play. More precisely, let H_j be a class of subsets of E, called the information


structure of Player j. When the mode of play ε has been selected, Player j receives an information signal denoted h_j(ε) ∈ H_j. Now, we associate with each player j a meta strategy, denoted γ_j : H_j → Σ_j, that determines a strategy for the original normal form game on the basis of the information received. All this construct is summarized by the data (E, {π(ε)}_{ε∈E}, {h_j(ε) ∈ H_j}_{j∈M, ε∈E}, {γ_j : H_j → Σ_j}_{j∈M}) that defines an expanded game.

DEFINITION III.5.1. The data (E, {π(ε)}_{ε∈E}, {h_j(ε) ∈ H_j}_{j∈M, ε∈E}, {γ*_j : H_j → Σ_j}_{j∈M}) defines a correlated equilibrium of the original normal form game if it is a Nash equilibrium for the expanded game, i.e., if no player can improve his expected payoff by changing his meta strategy unilaterally, i.e., by playing γ_j(h_j(ε)) instead of γ*_j(h_j(ε)) when he receives the signal h_j(ε) ∈ H_j:

Σ_{ε∈E} π(ε) V_j([γ*_j(h_j(ε)), γ*_{M−j}(h_{M−j}(ε))]) ≥ Σ_{ε∈E} π(ε) V_j([γ_j(h_j(ε)), γ*_{M−j}(h_{M−j}(ε))]).      (III.67)

III.5.2.2. An equivalent definition for m-matrix games. In the case of an m-matrix game, the definition given above can be replaced with the following one, which is much simpler.

DEFINITION III.5.2. A correlated equilibrium is a probability distribution π(s) over the set of pure strategies S = S_1 × S_2 × ⋯ × S_m such that, for every player j and any mapping ζ_j : S_j → S_j, the following holds:

Σ_{s∈S} π(s) V_j([s_j, s_{M−j}]) ≥ Σ_{s∈S} π(s) V_j([ζ_j(s_j), s_{M−j}]).      (III.68)
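Definition III.5.2 is easy to check by machine. The following added sketch verifies (III.68) for Aumann's example with the signal probabilities of Figure III.4, enumerating every deviation map ζ_j:

    from itertools import product

    # (row, col) -> (V1, V2): Aumann's bimatrix game of Example III.5.1.
    payoff = {(0, 0): (5, 1), (0, 1): (0, 0),
              (1, 0): (4, 4), (1, 1): (1, 5)}
    # Signal distribution of Figure III.4 (rows r1, r2 and columns c1, c2).
    pi = {(0, 0): 1/3, (0, 1): 0.0, (1, 0): 1/3, (1, 1): 1/3}

    def expected(j, zeta):
        # Player j plays zeta[s_j] when told s_j; the other player obeys.
        total = 0.0
        for s, p in pi.items():
            t = list(s)
            t[j] = zeta[s[j]]
            total += p * payoff[tuple(t)][j]
        return total

    obey = {0: 0, 1: 1}
    for j in (0, 1):
        for dev in product((0, 1), repeat=2):   # all maps {0,1} -> {0,1}
            assert expected(j, obey) >= expected(j, dict(enumerate(dev))) - 1e-12
    print("correlated equilibrium; expected payoff:", expected(0, obey))  # 10/3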

    III.6. Bayesian equilibrium with incomplete information

Up to now we have considered only games where each player knows everything concerning the rules, the players' types (i.e., their payoff functions, their strategy sets), etc. We were dealing with games of complete information. In this section we look at a particular class of games with incomplete information and we explore the class of so-called Bayesian equilibria.

III.6.1. Example of a game with unknown type for a player. In a game of incomplete information, some players do not know exactly what the characteristics of the other players are. For example, in a two-player game, Player 2 may not know exactly what the payoff function of Player 1 is.

EXAMPLE III.6.1. Consider the case where Player 1 could be of two types, called θ¹ and θ² respectively. We define below two bimatrix games, corresponding to the two possible types respectively:


[Figure III.6 here: Nature first selects Player 1's type (θ¹ with probability p1, θ² with probability p2); Player 1 then chooses r1 or r2 knowing his type, while Player 2 chooses c1 or c2 without observing either the type or Player 1's move.]

FIGURE III.6. Extensive game formulation of the game of incomplete information

Game 1 (type θ¹):
       c1       c2
r1    0, −1    2, 0
r2    2, 1     3, 0

Game 2 (type θ²):
       c1        c2
r1    1.5, −1   3.5, 0
r2    2, 1      3, 0

If Player 1 is of type θ¹, then bimatrix game 1 is played; if Player 1 is of type θ², then bimatrix game 2 is played. The problem is that Player 2 does not know the type of Player 1.

III.6.2. Reformulation as a game with imperfect information. Harsanyi, in [25], has proposed a transformation of a game of incomplete information into a game with imperfect information. For Example III.6.1 this transformation introduces a preliminary chance move, played by Nature, which decides randomly the type θⁱ of Player 1. The probability of each type, denoted p1 and p2 = 1 − p1 respectively, represents the beliefs of Player 2, given here in terms of prior probabilities, about facing a player of type θ¹ or θ². One assumes that Player 1 also knows these beliefs; the prior probabilities are thus common knowledge. The information structure⁷ in the associated extensive game, shown in Figure III.6, indicates that Player 1 knows his type when

⁷The dotted line in Figure III.6 represents the information set of Player 2.


deciding, whereas Player 2 observes neither the type nor, in this game of simultaneous moves, the action chosen by Player 1. Call x_i (respectively 1 − x_i) the probability of choosing r1 (respectively r2) by Player 1 when he implements a mixed strategy, knowing that he is of type θⁱ. Call y (respectively 1 − y) the probability of choosing c1 (respectively c2) by Player 2 when he implements a mixed strategy.

We can define the optimal response of Player 1 to the mixed strategy (y, 1 − y) of Player 2 by solving⁸

max_{i=1,2} { a^1_{i1} y + a^1_{i2} (1 − y) }   if the type is θ¹,

max_{i=1,2} { a^2_{i1} y + a^2_{i2} (1 − y) }   if the type is θ².

We can define the optimal response of Player 2 to the pair of mixed strategies (x_i, 1 − x_i), i = 1, 2, of Player 1 by solving

max_{j=1,2} { p_1 (x_1 b^1_{1j} + (1 − x_1) b^1_{2j}) + p_2 (x_2 b^2_{1j} + (1 − x_2) b^2_{2j}) }.

Let us rewrite these conditions with the data of the game illustrated in Figure III.6. First consider the reaction function of Player 1:

θ¹ :  max { 0 · y + 2(1 − y),  2y + 3(1 − y) }
θ² :  max { 1.5y + 3.5(1 − y),  2y + 3(1 − y) }.
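A quick numerical look at these two comparisons (an added sketch) confirms what Figure III.7 displays:

    import numpy as np

    # Player 1's expected payoffs against y = P(Player 2 plays c1).
    for y in np.linspace(0.0, 1.0, 5):
        r1_t1, r2_t1 = 2 - 2*y, 3 - y          # type theta^1
        r1_t2, r2_t2 = 3.5 - 2*y, 3 - y        # type theta^2
        print(f"y={y:.2f}: type 1 plays {'r2' if r2_t1 > r1_t1 else 'r1'}, "
              f"type 2 plays {'r1' if r1_t2 > r2_t2 else 'r2'}")
    # Type theta^1 always prefers r2, since (3 - y) - (2 - 2y) = 1 + y > 0;
    # type theta^2 prefers r1 for y < 1/2 and r2 for y > 1/2.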

We draw in Figure III.7 the lines corresponding to these comparisons between two linear functions. We observe that, if Player 1's type is θ¹, he will always choose r2,

[Figure III.7 here: Player 1's expected payoffs from r1 and r2, plotted as functions of y ∈ [0, 1], for each of the two types.]