
Int J Game Theory (1998) 27:93-109

Pure equilibria of repeated games with public observation

Tristan Tomala*

CERMSEM, Université Paris-I Panthéon-Sorbonne, 90 rue de Tolbiac, 75013 Paris, France (e-mail: tomala@univ-paris1.fr)

Received August 1995/Final version May 1997

Abstract. The set of payoffs associated with pure uniform equilibria of a repeated game with public observation is characterized in terms of the one-shot game. The key to the result is, first, a study of the effect of undetectable deviations and, second, the definition of new types of punishments using approachability techniques.

Key words: Repeated games, folk theorem, public signals

1. Introduction

A repeated game with public observation is an infinite repetition of a one-shot game in which, after each stage, all players observe the same signal, a deterministic function of the joint action. The players rely on all the information collected along the play to decide what to do at the current stage. Their objective is to maximize their long-run average payoff. The case where all players observe all joint actions chosen at previous stages is called the standard case and leads to the well-known folk theorem (for a survey, see Sorin, 1992). This theorem states that an outcome of the repeated game is supported by an equilibrium if and only if it is feasible and individually rational.

In a context of non-observable actions, the notions of feasibility and individual rationality have to be reconsidered. Through the study of the two-player case and of particular information functions, Lehrer (1989, 1990, 1992) showed that feasibility is not the right requirement. At equilibrium, all players must play in such a way that they do not have, at any stage, a profitable and undetectable deviation. In case of a detectable deviation, a player is punished down to his minmax level. This provides a generalization of the folk theorem in cases where there is no doubt about the identity of the deviator. In the n-player case, Lehrer (1990) studies repeated games with semi-standard information, i.e. where the set of actions of player i is endowed with a partition and, when player i chooses an action, the element of the partition to which this action belongs is publicly announced. In this case, each player announces a signal which depends on his own action only. Thus, if a deviation is detected, all players detect it simultaneously (because the signals are public) and the identity of the deviator is publicly known. Another leading paper in the field of repeated games with public signals is Fudenberg et al. (1994). They show, in a discounted setup, that if every deviation can be statistically detected and if the deviator can be identified (again statistically), the folk theorem generalizes to such a situation. The main goal of the present paper is to consider information structures for which some deviations may be undetectable and for which, even when a deviation is detected, the identity of the deviator may be unknown.

* I wish to thank Prof. J. Abdou for his supervision.

With a general public signal, when a deviation is detected, several players may be responsible for it. Consider for example a repeated oligopolistic competition with more than three firms, where after each stage a firm observes the total supply in the market. If this supply is found to be different from the one prescribed by the strategy, a single firm will not know which of its opponents deviated. The question is then to define the right punishments, which will no longer be of minmax type. In Tomala (1996), we proposed a solution for specific public signals. The main idea was to describe punishments directed against several players at the same time. We generalize these ideas here.

The main goal of this paper is to characterize the set of pure uniform equilibrium payoffs. We first look at undetectable deviations and their implications in terms of payoffs. The method follows Lehrer (1990). Let $\mathcal{D}$ be the set of payoff vectors $u$ for which there is a sequence of joint actions $x_p$ such that the associated payoff vector $g(x_p)$ converges to $u$ and such that, for each player, the maximal gain from deviating from $x_p$ without being detected goes to zero as $p$ goes to infinity. An equilibrium payoff has to be a convex combination of payoffs in $\mathcal{D}$.

Second, since a deviation may be attributed to a subset of players, we define, for each subset of players $S$, the set of payoffs $V(S)$ generated by strategies which can be attributed to any member of $S$, i.e. such that, for any sequence of observations, each player in $S$ has a strategy that induces these observations. For each payoff vector $u$ in $V(S)$, we design a punishing strategy that holds each player $i$ in $S$ down to a quantity less than his prescribed payoff $u^i$. This construction uses Blackwell's (1956) approachability theory and its use in the context of repeated games with non-observable actions by Kohlberg (1973).

These definitions enable us to build equilibrium strategies, combining this modified notion of feasibility with these punishments.

The paper considers pure strategies and compact action sets. This setup captures the case of finite action sets and pure strategies, as well as the case of finite action sets, behavior strategies, and observation of the lottery on signals. This type of assumption has already been made in the literature (Benoit and Krishna, 1985). There is a technical reason for restricting ourselves to pure strategies. An essential technique in undiscounted repeated games with non-observable actions is the use of statistical inference. A basic element of our result is the computation, after a detectable deviation, of the set of potential deviators. With mixed strategies, it would be necessary to make statistical inferences about this set. This seems to be quite difficult with a general information function.

The second section of this paper presents the formal model. In section 3, we give properties of undetectable deviations and provide a generalization of the notion of feasibility following Lehrer (1990). Section 4 is devoted to the definition of the punishments. The main theorem is given in section 5, together with the equilibrium strategy. The last section is devoted to concluding remarks.

2. The model

Let $G$ be a strategic game, $G = (N; (X^i)_{i \in N}; (g^i)_{i \in N})$, where $N = \{1, \dots, n\}$ is a finite set of players, $X^i$ is the set of actions for player $i$, $i \in N$, and $g^i : \prod_{i \in N} X^i \to \mathbb{R}$ is the payoff function for player $i$. The product $\prod_{i \in N} X^i$ is denoted by $X$ and $g : X \to \mathbb{R}^N$ denotes the vector payoff function $(g^i)_{i \in N}$. All action sets are assumed to be compact topological spaces and $g$ is assumed to be continuous. Without loss of generality, we assume $\forall i \in N$, $\forall x \in X$, $g^i(x) \geq 0$.

Let $\ell$ be a public signalling function, i.e. $\ell : X \to A$, where $A$ is the set of public signals. $A$ is assumed to be compact and $\ell$ to be continuous.

Given a strategic game $G$ and a signalling function $\ell$, we define an infinitely repeated game $\Gamma_\infty(\ell)$. At each stage, the players play the game $G$ and, if the joint action $x_t$ is played at stage $t$, all players are told $\ell(x_t)$. Each player remembers what he observed at previous stages. Let $H_t$, $t \geq 1$, be the Cartesian product of $A$, $t$ times with itself, and let $H_0$ be a singleton. $H_t$ is the set of public histories (i.e. available to all players) up to time $t$. Let $H = \bigcup_{t \geq 0} H_t$ be the set of all histories and let $H_\infty$ be the set of all infinite sequences of public signals. A pure strategy for player $i$ in $\Gamma_\infty(\ell)$ is a mapping $\sigma^i : H \to X^i$. We may write $\sigma^i_t$ for the restriction of $\sigma^i$ to $H_{t-1}$. Let $\Sigma^i$ be the set of pure strategies for player $i$ in $\Gamma_\infty(\ell)$.

Every joint pure strategy $\sigma = (\sigma^i)_{i \in N}$ induces in a natural way a unique play $\chi(\sigma) = (x_t(\sigma))_{t \geq 1}$, where $x_t(\sigma)$ is the joint action played at stage $t$. We denote by $\ell(\sigma)$ the string of public signals induced by $\sigma$, i.e. $\ell(\sigma) = (a_t(\sigma))_{t \geq 1}$, where $a_t(\sigma) = \ell(x_t(\sigma))$. Let $g^i_t(\sigma) = g^i(x_t(\sigma))$ be the payoff for player $i$ at stage $t$.

We define $\gamma^i_T(\sigma) = \frac{1}{T} \sum_{t=1}^{T} g^i_t(\sigma)$, the average payoff for player $i$ at stage $T$. When $\gamma^i_T(\sigma)$ converges as $T$ goes to infinity, we denote the limit by $\gamma^i(\sigma)$. Let $\gamma(\sigma) = (\gamma^i(\sigma))_{i \in N}$ if all $\gamma^i(\sigma)$'s are defined. In the sequel, $-i$ will stand for all players but $i$. If $\sigma$ is a joint strategy, $\sigma^{-i}$ is $(\sigma^j)_{j \neq i}$ and if $x$ is a joint action, $x^{-i}$ is $(x^j)_{j \neq i}$.

The following equilibrium notion is due to Sorin (for details see Sorin, 1992). An $N$-tuple of strategies $\sigma$ is a uniform equilibrium if:

i) $\gamma(\sigma)$ is well defined, and
ii) $\forall \varepsilon > 0$, $\exists T_0$ such that $\forall T \geq T_0$, $\forall i \in N$, $\forall \tau^i \in \Sigma^i$, $\gamma^i_T(\tau^i, \sigma^{-i}) \leq \gamma^i_T(\sigma) + \varepsilon$.


Let $E_\infty(\ell)$ be the set of uniform equilibrium payoffs, i.e. $E_\infty(\ell) = \{\gamma(\sigma) \mid \sigma \text{ is a uniform equilibrium}\}$. When no confusion may arise, we will simply write $E_\infty$. The main issue of this paper is to describe $E_\infty(\ell)$ as a function of the basic data $G$ and $\ell$.

Remark 2.1: The strategies in a repeated game with imperfect monitoring usually depend on the signals observed by the player as well as on his own previous moves. The strategies we have defined are often called public strategies (see e.g. Fudenberg et al., 1994). However, since we deal with pure strategies only, this is no restriction. Our repeated game is effectively of perfect recall (Dalkey, 1953): knowing his strategy and the previous public signals, a player can compute the moves he played. Hence, we do not need to assume explicitly that he remembers them.

Remark also that, knowing the public history and the joint strategy that is to be played, a player knows what his opponents are going to play after this history, provided they are not deviating. Thus, since we look at Nash equilibria, where only unilateral deviations are considered, a deviating player always knows what his opponents are going to play at the next stage. This remark will be useful for defining some deviations in the sequel.

3. Undetectable deviations

In this section, we will investigate the properties of the equilibrium strategies on the equilibrium path. This will lead to a generalization of the notion of feasibility. The main point is that, with imperfect monitoring, players can deviate from their equilibrium actions without changing the signal. Such deviations are called undetectable. At equilibrium, an undetectable deviation must not be profitable; therefore it is necessary that each player plays an (almost) best response among his undetectable deviations at each stage. The following definitions will be useful to handle this phenomenon.

Definition 3.1. $\forall x \in X$, $Y^i(x) = \{y^i \in X^i \mid \ell(y^i, x^{-i}) = \ell(x)\}$.

$Y^i(x)$ is the set of actions of player $i$ which induce, against $x^{-i}$, the same signal as $x^i$. A deviation in which player $i$ plays $y^i \in Y^i(x)$ when $x$ is to be played is undetectable. We now define the set of feasible payoffs that can be obtained by joint moves from which undetectable deviations are almost not profitable.

Definition 3.2.
i) Let $\mathcal{D}_\varepsilon$ be the set of payoff vectors $u$ for which there is $x \in X$ with $g(x) = u$ such that for all $i$, $\max_{y^i \in Y^i(x)} g^i(y^i, x^{-i}) - g^i(x) \leq \varepsilon$.
ii) Let $\mathcal{D}$ be the set of payoff vectors $u$ for which there is a sequence of positive numbers $(\varepsilon_t)_{t \geq 1}$ with $\lim_t \varepsilon_t = 0$ and a sequence of vectors $(u_t)_{t \geq 1}$ such that $u_t$ is in $\mathcal{D}_{\varepsilon_t}$ and $\lim_t u_t = u$.

A payoff vector is in $\mathcal{D}_\varepsilon$ if it can be obtained by a joint action from which undetectable deviations are profitable by at most $\varepsilon$. A payoff in $\mathcal{D}$ is a limit of payoffs in $\mathcal{D}_\varepsilon$ as $\varepsilon$ goes to zero. In the general case, $\mathcal{D}$ does not coincide with $\mathcal{D}_0$.

Remarks 3.3:
1) $\mathcal{D}$ is a compact subset of $g(X)$.
2) In the finite case, $\mathcal{D} = g(D)$, where $D$ is the set of points $x \in X$ such that $\forall i \in N$, $g^i(x) = \max_{y^i \in Y^i(x)} g^i(y^i, x^{-i})$, i.e. $\mathcal{D}$ is equal to $\mathcal{D}_0$. However, this is not true in the general compact case since the function $x \mapsto \max_{y^i \in Y^i(x)} g^i(y^i, x^{-i})$ need not be continuous.
3) If $x$ is a Nash equilibrium of $G$, then $g(x) \in \mathcal{D}$.

Proposition 3.4. $E_\infty \subset \mathrm{co}\,\mathcal{D}$.

We provide here an example of a two-player repeated game where $\mathrm{co}\,\mathcal{D} \subset \mathrm{co}\,g(X)$ and $\mathrm{co}\,\mathcal{D} \neq \mathrm{co}\,g(X)$. This example is taken from Lehrer (1990).

Example 3.5: The one-shot game is:

        y1     y2     y3
x1     2,2    5,1    7,0
x2     1,5    4,4    6,3
x3     0,7    3,6    5,5

with the following information function:

        y1     y2     y3
x1      a      c      c
x2      b      d      d
x3      b      d      d

By simple computation, we find $\mathrm{co}\,\mathcal{D} = \mathrm{co}\{(2,2); (1,5); (5,1); (4,4)\}$.
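For readers who want to check this computation mechanically, here is a small Python sketch (ours, not part of the original paper; the function names are hypothetical) that enumerates, for the finite game of Example 3.5, the joint actions from which no undetectable deviation is profitable and collects the corresponding payoff vectors.

```python
from itertools import product

# Payoffs g(x, y) = (g1, g2) and public signal l(x, y) for Example 3.5.
G = {
    ("x1", "y1"): (2, 2), ("x1", "y2"): (5, 1), ("x1", "y3"): (7, 0),
    ("x2", "y1"): (1, 5), ("x2", "y2"): (4, 4), ("x2", "y3"): (6, 3),
    ("x3", "y1"): (0, 7), ("x3", "y2"): (3, 6), ("x3", "y3"): (5, 5),
}
L = {
    ("x1", "y1"): "a", ("x1", "y2"): "c", ("x1", "y3"): "c",
    ("x2", "y1"): "b", ("x2", "y2"): "d", ("x2", "y3"): "d",
    ("x3", "y1"): "b", ("x3", "y2"): "d", ("x3", "y3"): "d",
}
ROWS = ["x1", "x2", "x3"]
COLS = ["y1", "y2", "y3"]

def Y1(x, y):
    # Undetectable deviations of player 1 at (x, y): rows giving the same signal against y.
    return [r for r in ROWS if L[(r, y)] == L[(x, y)]]

def Y2(x, y):
    # Undetectable deviations of player 2 at (x, y).
    return [c for c in COLS if L[(x, c)] == L[(x, y)]]

def in_D0(x, y):
    # No undetectable deviation is profitable for either player.
    best1 = max(G[(r, y)][0] for r in Y1(x, y))
    best2 = max(G[(x, c)][1] for c in Y2(x, y))
    return best1 <= G[(x, y)][0] and best2 <= G[(x, y)][1]

D0_payoffs = sorted({G[(x, y)] for x, y in product(ROWS, COLS) if in_D0(x, y)})
print(D0_payoffs)   # [(1, 5), (2, 2), (4, 4), (5, 1)]
```

The output agrees with the generators above; (7,0), for instance, is excluded because player 2 can switch from $y_3$ to $y_2$ without changing the signal $c$ and gain 1.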

The proof of proposition 3.4 follows Lehrer (1990). We start with a lemma.

Lemma 3.6: Let $U$ be an open neighborhood of $\mathcal{D}$. There is an $\varepsilon > 0$ such that, $\forall u \in g(X) \setminus U$, $\forall x \in g^{-1}(u)$, $\exists i \in N$, $\exists y^i \in Y^i(x)$ such that $g^i(y^i, x^{-i}) > g^i(x) + \varepsilon$.

Proof: Otherwise there is a sequence $(x_p)_{p \geq 1}$ with, $\forall p$, $g(x_p) \in g(X) \setminus U$, and such that $\forall i \in N$, $\max_{y^i \in Y^i(x_p)} g^i(y^i, x_p^{-i}) \leq g^i(x_p) + 1/p$. Since $g(X) \setminus U$ is compact, $g(x_p)$ has a convergent subsequence. Let $u$ be its limit; we have $u \in \mathcal{D}$, a contradiction. □

Proof of Proposition 3.4: Let $u = \gamma(\sigma)$, with $\sigma$ a uniform equilibrium. Assume $u \notin \mathrm{co}\,\mathcal{D}$. We will find $i \in N$ and $\tau^i$ a profitable deviation for player $i$.


Let $K$ be a hyperplane separating $u$ and $\mathrm{co}\,\mathcal{D}$ such that $K$ divides $\mathbb{R}^N$ into two parts, $K^-$ and $K^+$, with the following properties: $K^+$ is open, $\mathrm{co}\,\mathcal{D} \subset K^+$, $d(u, K) > 0$, $d(\mathrm{co}\,\mathcal{D}, K) > 0$, where $d(\cdot, \cdot)$ is the Euclidean distance on $\mathbb{R}^N$.

Let $M = \{t \mid g_t(\sigma) \in K^-\}$. $M$ has positive lower density, i.e. $\liminf_T |M \cap \{1, \dots, T\}| / T > 0$, since otherwise one could find a subsequence of $\gamma_T(\sigma)$ converging in $K^+$. Since $K^+$ is an open set such that $\mathrm{co}\,\mathcal{D} \subset K^+$, from lemma 3.6, $\exists \varepsilon > 0$ such that $\forall t \in M$, $\exists i \in N$ and $\exists y^i_t \in Y^i(x_t(\sigma))$ with $g^i(y^i_t, x_t^{-i}(\sigma)) > g^i(x_t(\sigma)) + \varepsilon$.

Following Lehrer (1990), we partition $M$ into $n$ subsets $(M_i)_{i \in N}$. $M_i$ is defined as follows: $t \in M_i$ if and only if,

- $t \notin M_j$ for all $j < i$, and $\exists y^i_t \in Y^i(x_t(\sigma))$ such that $g^i(y^i_t, x_t^{-i}(\sigma)) > g^i(x_t(\sigma)) + \varepsilon$.

This clearly defines a partition of $M$. Moreover, it is easy to see that there is $i$ in $N$ such that $M_i$ has positive upper density, i.e. $\limsup_T |M_i \cap \{1, \dots, T\}| / T = \eta > 0$. Without loss of generality, we assume $i = 1$. We can now define $\tau^1$, a profitable strategy of player 1, as follows:

- At any stage $t \in M_1$, $\tau^1$ plays against $x_t^{-1}(\sigma)$ a $y^1_t$ such that $g^1(y^1_t, x_t^{-1}(\sigma)) > g^1(x_t(\sigma)) + \varepsilon$;
- $\tau^1$ plays like $\sigma^1$ in all other cases.

From the definition of $\tau^1$, we have $\ell(\tau^1, \sigma^{-1}) = \ell(\sigma)$, thus $\tau^1$ does not affect the signals and the other players do not change their moves. Hence $\forall t \geq 1$, $x_t^{-1}(\tau^1, \sigma^{-1}) = x_t^{-1}(\sigma)$. Thus $\forall t \geq 1$, $g^1_t(\tau^1, \sigma^{-1}) \geq g^1_t(\sigma)$, and $\forall t \in M_1$, $g^1_t(\tau^1, \sigma^{-1}) \geq g^1_t(\sigma) + \varepsilon$. It follows that $\limsup_T \gamma^1_T(\tau^1, \sigma^{-1}) \geq u^1 + \varepsilon\eta$, which contradicts $\sigma$ being a uniform equilibrium. □

We now prove that, for every point $u$ in $\mathrm{co}\,\mathcal{D}$, there is a strategy $\sigma$ with payoff $u$ such that no player has a profitable undetectable deviation, i.e. no player can benefit from deviating without changing the signals.

Proposition 3.7. For all $u$ in $\mathrm{co}\,\mathcal{D}$, there is $\sigma$ with $\gamma(\sigma) = u$ which has the following property: $\forall \varepsilon > 0$, $\exists T_0$ such that $\forall T \geq T_0$, $\forall i \in N$, $\forall \tau^i$ with $\ell(\tau^i, \sigma^{-i}) = \ell(\sigma)$, one has $\gamma^i_T(\tau^i, \sigma^{-i}) \leq \gamma^i_T(\sigma) + \varepsilon$.

Proof: Let $u \in \mathrm{co}\,\mathcal{D}$. There are $(\lambda_q)_{q=1}^{n+1}$, $\lambda_q \geq 0$, $\sum_q \lambda_q = 1$, and sequences $(x_{q,p})_{p \geq 1}$ with $\lim_p \{\max_{y^i \in Y^i(x_{q,p})} g^i(y^i, x_{q,p}^{-i}) - g^i(x_{q,p})\} = 0$ for each $i$, such that $u = \sum_q \lambda_q \lim_p g(x_{q,p})$. Divide the set of stages $\mathbb{N}^*$ into consecutive segments $B_p$ such that the $p$-th segment $B_p$ has length $p^2$. Each $B_p$ is then divided into blocks $C_{q,p}$ such that $\bigl||C_{q,p}|/|B_p| - \lambda_q\bigr| \leq 1/p$. The strategy $\sigma$ is defined so that the players play $x_{q,p}$ on $C_{q,p}$, independently of the history, i.e.

$t \in C_{q,p} \Rightarrow \forall h_{t-1} \in H_{t-1},\ \sigma^i_t(h_{t-1}) = x^i_{q,p}$.


i) $\gamma(\sigma) = u$. The average payoff on the block $B_p$ is $\sum_q (|C_{q,p}|/|B_p|)\, g(x_{q,p})$, which converges to $u$. Since the length of $B_p$ relative to the total length of its predecessors goes to zero, and since the payoffs are uniformly bounded, we have $\lim_T \|\gamma_T(\sigma) - \gamma_{\varphi(T)}(\sigma)\| = 0$, where $\varphi(T) = \sum_{p=1}^{p(T)} |B_p|$ with $p(T)$ chosen such that $T \in B_{p(T)+1}$. $\gamma_{\varphi(T)}(\sigma)$ is an average of the average payoffs on the $B_p$'s. Hence it converges to $u$, and thus $\gamma(\sigma) = u$.
ii) Let $\tau^i$ be a strategy of player $i$ with $\ell(\tau^i, \sigma^{-i}) = \ell(\sigma)$. The payoff for player $i$ at stage $t$ is at most $\max_{y^i \in Y^i(x_t(\sigma))} g^i(y^i, x_t^{-i}(\sigma))$. For $t \geq 1$, there are $q, p$ such that $t \in C_{q,p}$. Let $e_t = \max_{y^i \in Y^i(x_{q,p})} g^i(y^i, x_{q,p}^{-i}) - g^i(x_{q,p})$. From the choice of $x_{q,p}$, $\lim_t e_t = 0$. Hence, $\forall t \geq 1$, $g^i_t(\tau^i, \sigma^{-i}) \leq g^i_t(\sigma) + e_t$. The result follows. □
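The block structure used in this proof is easy to make concrete. The following sketch is our own illustration (the helper name is hypothetical): it maps a stage $t$ to the pair $(p, q)$ with $t \in C_{q,p}$, given weights $\lambda_1, \dots, \lambda_{n+1}$; the rounding of $|C_{q,p}|$ is arbitrary, which is enough for the $1/p$ accuracy required above.

```python
def block_of_stage(t, lambdas):
    """Return (p, q) with t in C_{q,p}: B_p has length p**2 and is split into
    consecutive sub-blocks C_{q,p} whose relative lengths approximate lambdas."""
    assert t >= 1 and abs(sum(lambdas) - 1.0) < 1e-9
    p, start = 1, 1
    while t >= start + p * p:          # find the segment B_p containing t
        start += p * p
        p += 1
    offset = t - start                  # position of t inside B_p (0-based)
    # split B_p into sub-blocks of (rounded) lengths lambda_q * p^2
    lengths = [round(l * p * p) for l in lambdas]
    lengths[-1] = p * p - sum(lengths[:-1])   # make the lengths sum to |B_p| exactly
    for q, length in enumerate(lengths, start=1):
        if offset < length:
            return p, q
        offset -= length
    return p, len(lambdas)              # numerical safety fallback

# Example: stage 30 lies in B_4 (stages 15..30), so p = 4 here.
print(block_of_stage(30, [0.5, 0.25, 0.25]))
```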

We have just defined strategies from which no player has a profitable undetectable deviation. In the sequel, we will define punishments. The previous strategy together with the punishments will form an equilibrium. We shall distinguish between the two-player case and the case of at least three players. The specificity of the two-player case is the following: whatever the signalling structure, when player $i$ plays a detectable deviation, all his opponents know who the deviator is. This is not the case with at least three players.

4. Punishments for $|N| \geq 3$

The previous section was devoted to the study of undetectable deviations and to strategies from which no such deviation is profitable. Let us now consider detectable deviations. Suppose that along the play, at a given stage, the public signal is different from the one planned by the initial strategy. In the general case, this deviation can be attributed to several players. These players are the potential deviators. In an equilibrium strategy, all these players have to be punished. If the punishment missed one of them, he could deviate in order to trigger the punishing phase and gain through this procedure. We see then that punishments will be directed against a subset of players.

For each subset $S$ of $N$ (we prefer not to use the word coalition, since no coalitional power is involved here), we will define the payoff set $V(S)$ that any player $i$, $i \in S$, can force by playing a strategy which can be attributed to any member of $S$. For $S$ a non-empty subset of $N$, denote $X^S = \prod_{i \in S} X^i$, and $X^N = X$. We now fix such an $S$.

Definition 4.1. $\forall x \in X$, $Z_S(x) = \{z \in X^S \mid \forall (i, j) \in S \times S,\ \ell(z^i, x^{-i}) = \ell(z^j, x^{-j})\}$.

$Z_S(x)$ is the set of $S$-tuples of actions such that deviating from $x^i$ to $z^i$ induces, against $x$, the same signal for all $i$ in $S$. This defines a compact-valued correspondence. In the following, $Z^i_S(x)$ will denote the projection on $X^i$ of $Z_S(x)$, i.e. $Z^i_S(x)$ is the set of one-shot deviations of player $i$ against $x$ which can be attributed to any member of $S$. The fundamental property of these sets is the following. When $x$ has to be played and $i$ deviates from $x^i$ to $z^i$ in $Z^i_S(x)$, any other player $j$ in $S$ could have induced the same signal. Thus all players of $S$ are equally suspected. Along the play, the identity of the deviator will be progressively revealed (but not always completely).
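For a finite game, $Z_S(x)$ and its projections can be enumerated directly from Definition 4.1. The sketch below is our own code (the data structures are assumptions, not the paper's): it takes the signal function as a dictionary over joint actions and returns the $S$-tuples whose unilateral deviations from $x$ all produce the same public signal.

```python
from itertools import product

def Z_S(signal, action_sets, x, S):
    """Z_S(x): S-tuples z such that, for all i, j in S, deviating alone
    from x to z^i or to z^j yields the same public signal.
    signal: dict mapping joint-action tuples to signals,
    action_sets: list of action lists (one per player),
    x: joint action (tuple), S: list of player indices."""
    def unilateral(i, zi):
        y = list(x)
        y[i] = zi
        return signal[tuple(y)]
    result = []
    for z in product(*(action_sets[i] for i in S)):
        signals = {unilateral(i, zi) for i, zi in zip(S, z)}
        if len(signals) == 1:            # all members of S induce the same signal
            result.append(z)
    return result

def Z_S_proj(signal, action_sets, x, S, i):
    # Projection Z_S^i(x): the i-components of the tuples in Z_S(x).
    pos = S.index(i)
    return sorted({z[pos] for z in Z_S(signal, action_sets, x, S)})
```

Consistently with the remarks below, the prescribed tuple $x^S$ itself is always among the enumerated tuples, and for $S = \{i\}$ one recovers the whole action set $X^i$.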


Remarks 4.2: 1) $\forall x \in X$, $Z_{\{i\}}(x) = X^i$. 2) $\forall x \in X$, $\prod_{i \in S} Y^i(x) \subset Z_S(x)$.

Notation 4.3: For $x \in X$ and $z$ in $Z_S(x)$, denote $f^i_S(x, z) = g^i(z^i, x^{-i})$, and $f_S(x, z) = (f^i_S(x, z))_{i \in S}$.

Let now $\zeta_S$ be a selection of $Z_S$ (we may write $\zeta_S \in Z_S$ for short) and $\zeta^i_S$ its projection on $X^i$. Such a selection $\zeta^i_S$ represents a way of choosing a one-shot deviation as a function of the joint action prescribed by the strategy. It induces a deviation in the repeated game such that any other player $j$ in $S$ could deviate and induce the same string of signals. We will thus study the effect of the deviation of player $i$ which, at each stage and for all $x$, plays $\zeta^i_S(x)$ against $x$.

Definition 4.4.
1) $V(S, \zeta_S) = \overline{\mathrm{co}}\{f_S(x, \zeta_S(x)) \mid x \in X\} + \mathbb{R}^S_+ \times \mathbb{R}^{N \setminus S}$.
2) $V(S) = \bigcap_{\zeta_S \in Z_S} V(S, \zeta_S)$.
($\overline{\mathrm{co}}$ denotes the closed convex hull and $\mathbb{R}^S_+$ is the set of vectors in $\mathbb{R}^S$ with non-negative coordinates.)

To complete the definitions, let $V(\emptyset) = \mathbb{R}^N$.

$V(S, \zeta_S)$ is the closed convex hull of the set of vectors $u$ for which there is $x$ in $X$ such that for all $i$ in $S$, $g^i(\zeta^i_S(x), x^{-i}) \leq u^i$. This means that if $u$ is the desired equilibrium payoff, playing such an $x$ will punish any member of $S$ if he plays $\zeta^i_S(x)$ instead of $x^i$. Thus $V(S)$ represents the set of payoffs for which the deviations of the type "playing $z^i$ in $Z^i_S(x)$ instead of $x^i$" are punishable.

These sets can be seen as a generalized notion of individual rationality appropriate to this context. In the full-monitoring case, it is enough to consider individually rational payoffs, i.e. payoffs where each player receives at least his minmax level. In case of a unilateral deviation, the deviator is identified and punished to his minmax level forever. Here, since some deviations can be attributed to several players, minmax punishments do not support equilibria anymore. However, all equilibrium payoffs are individually rational in the classical sense. We can check this by looking at the sets $V(\{i\})$.

Remark 4.5: 1) Let $x$ be a Nash equilibrium of $G$; then $\forall S \subset N$, $g(x) \in V(S)$. 2) $\forall i \in N$, $V(\{i\}) = \{u \in \mathbb{R}^N \mid u^i \geq v^i\}$, where $v^i = \min_{x^{-i}} \max_{x^i} g^i(x^i, x^{-i})$.

The first point is obvious. For the second point, consider the selection of $Z_{\{i\}}(x)$ which is a best reply to $x^{-i}$. The result follows.
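Remark 4.5(2) involves the pure minmax level $v^i = \min_{x^{-i}} \max_{x^i} g^i(x^i, x^{-i})$. For a finite game it can be computed by brute force, as in this small sketch (ours, with hypothetical argument names):

```python
from itertools import product

def pure_minmax(payoff_i, action_sets, i):
    """v^i = min over opponents' joint actions of player i's best-reply payoff.
    payoff_i: dict mapping joint-action tuples to player i's payoff,
    action_sets: list of action lists, i: index of the punished player."""
    others = [a for j, a in enumerate(action_sets) if j != i]
    best_replies = []
    for x_minus_i in product(*others):
        payoffs = []
        for xi in action_sets[i]:
            joint = list(x_minus_i)
            joint.insert(i, xi)          # rebuild the joint action in player order
            payoffs.append(payoff_i[tuple(joint)])
        best_replies.append(max(payoffs))
    return min(best_replies)
```

In the game of Example 3.5, for instance, this brute-force computation gives $v^1 = v^2 = 2$.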

Proposition 4.6. $\forall S \subset N$, $E_\infty \subset V(S)$.

Example 4.7: This example is a three-player repeated game where the set $V(\{1,2\})$ is not trivial, and $V(\{1,3\}) = V(\{2,3\}) = \mathrm{co}\,g(X) + \mathbb{R}^2_+ \times \mathbb{R}$.

Each player has three actions: $X^1 = \{x_1, x_2, x_3\}$, $X^2 = \{y_1, y_2, y_3\}$, $X^3 = \{z_1, z_2, z_3\}$. We assume that $g^3 \equiv 0$, $g(\cdot, \cdot, z_2) = (0, 4, 0)$, $g(\cdot, \cdot, z_3) = (4, 0, 0)$, and furthermore, $\ell(\cdot, \cdot, z_2) = \ell(\cdot, \cdot, z_3) = c_0$, a constant. When player 3 plays $z_1$, the payoffs of players 1 and 2 are:


        y1     y2     y3
x1     1,1    2,1    1,2
x2     2,1    1,2    2,1
x3     1,2    2,1    1,2


and the signal is given by:

        y1     y2     y3
x1      a      b      c
x2      b      c      b
x3      c      b      c

$Z_{\{1,2\}}(x_1, y_1, z_1)$ is computed as follows. Remark that $\ell(x_1, y_2, z_1) = \ell(x_2, y_1, z_1) = b$ and $\ell(x_1, y_3, z_1) = \ell(x_3, y_1, z_1) = c$; thus,

$Z_{\{1,2\}}(x_1, y_1, z_1) = \{(x_2, y_2); (x_3, y_3)\}$

and so on for all possible $(x, y, z)$. $V(\{1,2\})$ is then easy to compute and we find

$V(\{1,2\}) = \mathrm{co}\{(2,1), (1,2)\} + \mathbb{R}^2_+ \times \mathbb{R}$

Note that the payoff (1,1) is feasible and individually rational but cannot be an equilibrium payoff. Remark also that in this example, the signal reveals the payoff vector. This situation is not typical but leads to simpler computations.

Proof of Proposition 4.6: Let $u \in E_\infty$, and $\sigma$ a uniform equilibrium with payoff $u$. For $S$ a non-empty subset of $N$ and $\zeta_S$ a selection of $Z_S$, we will prove that $u \in V(S, \zeta_S)$. For $i \in S$, let $\tau^i$ be the strategy of player $i$ such that,

$\forall h \in H,\ \tau^i(h) = \zeta^i_S(\sigma(h))$

Since $\zeta_S$ is a selection of $Z_S$, we have, for all $h$ in $H$ and $i, j$ in $S$, $\ell(\tau^i(h), \sigma^{-i}(h)) = \ell(\tau^j(h), \sigma^{-j}(h))$. Thus, the deviation $\tau^i$ of player $i$ induces against $\sigma$ the same string of signals as the deviation $\tau^j$ of player $j$. Therefore, $\forall (i,j) \in S \times S$, $\ell(\tau^i, \sigma^{-i}) = \ell(\tau^j, \sigma^{-j})$. Hence, for all $(i,j) \in S \times S$, any player $k \neq i$, $k \neq j$ cannot tell whether player $i$ is playing $\tau^i$ or player $j$ is playing $\tau^j$. He will thus reply in the same way under both deviations. Thus, $\forall t \geq 0$, $x^k_t(\tau^i, \sigma^{-i}) = x^k_t(\tau^j, \sigma^{-j})$. It follows that there is a string of joint actions $(x_t)_{t \geq 1}$ such that $\forall i \in S$, $\chi(\tau^i, \sigma^{-i}) = (\zeta^i_S(x_t), x_t^{-i})_{t \geq 1}$. We then have, $\forall i \in S$,

$\gamma^i_T(\tau^i, \sigma^{-i}) = \frac{1}{T} \sum_{t=1}^{T} g^i(\zeta^i_S(x_t), x_t^{-i}) = \frac{1}{T} \sum_{t=1}^{T} f^i_S(x_t, \zeta_S(x_t))$


Since $\sigma$ is a uniform equilibrium, $\forall \varepsilon > 0$, $\exists T_0$ such that $\forall T \geq T_0$, $\forall i \in S$, $\gamma^i_T(\tau^i, \sigma^{-i}) \leq u^i + \varepsilon$. Let $u^S + \varepsilon^S$ be the vector of $\mathbb{R}^S$ whose $i$-th component is $u^i + \varepsilon$. The previous inequality says that $\frac{1}{T} \sum_{t=1}^{T} f_S(x_t, \zeta_S(x_t)) \leq u^S + \varepsilon^S$ (here, $\leq$ is the coordinatewise order on $\mathbb{R}^S$). Thus, $u^S + \varepsilon^S \in \mathrm{co}\{f_S(x, \zeta_S(x)) \mid x \in X\} + \mathbb{R}^S_+$. Since this holds for all $\varepsilon > 0$, $u^S \in \overline{\mathrm{co}}\{f_S(x, \zeta_S(x)) \mid x \in X\} + \mathbb{R}^S_+$, which concludes the proof. □

In this proof, we consider two players $i$ and $j$ and a player $k \neq i$, $k \neq j$. Player $k$ cannot tell which of $i$ or $j$ is deviating. This is specific to games with at least three players. Actually, the two-player case is the only one where the identity of the deviator is always completely revealed.

We will now use the sets $V(S)$ to define punishments directed against all members of $S$. These punishments will play a major role in the construction of the equilibrium strategy.

Notation 4.8: For $S \subset N$, denote $\Sigma^S = \prod_{i \in S} \Sigma^i$ and, for $\sigma$ an $N$-tuple of strategies,

$\mathcal{T}_S(\sigma) = \{\tau^S \in \Sigma^S \mid \forall (i,j) \in S \times S,\ \ell(\tau^i, \sigma^{-i}) = \ell(\tau^j, \sigma^{-j})\}$

Let $\mathcal{T}^i_S(\sigma)$ be the projection of $\mathcal{T}_S(\sigma)$ on $\Sigma^i$. $\mathcal{T}^i_S(\sigma)$ is the set of strategies of player $i$ such that, against $\sigma$, the public information revealed along the play about the identity of the deviator says that all members of $S$ are suspected and that none of them will be exonerated at any stage.

Proposition 4.9. Let $S$ be a non-empty subset of $N$. For all $u$ in $V(S)$, there is $\sigma$ such that, $\forall \varepsilon > 0$, $\exists T_0$ with $\forall T \geq T_0$, $\forall i \in S$, $\forall \tau^i \in \mathcal{T}^i_S(\sigma)$, $\gamma^i_T(\tau^i, \sigma^{-i}) \leq u^i + \varepsilon$.

This means that if a deviation is detected and if $S$ is found to be the set of potential deviators, any single player in $S$ will be punished by $\sigma$ unless he reveals more information about his identity. The principle is to evaluate, at each stage, the maximal average payoff (compatible with the observations) received by each potential deviator. We will then try to control this quantity and push it down to a level less than the equilibrium payoff.

The construction of this strategy is deeply inspired by the optimal strategy of the uninformed player in zero-sum repeated games with incomplete information. We follow Kohlberg (1973) and apply the approachability principles of Blackwell (1956). Like Kohlberg (1973), we have to adapt the approaching strategy to games with non-observable payoffs. In the context of zero-sum repeated games with incomplete information, Kohlberg considers, in each state of the world, the maximum payoff compatible with the observations and applies approachability to this payoff vector. This is the key to the definition of an optimal strategy for the uninformed player. Although it would have been nice to apply Blackwell's theorem to an auxiliary game, we will redefine the approaching strategy and follow Blackwell's proof with a slight modification. There are two reasons for that. First, we deal only with pure strategies (while Blackwell dealt with mixed strategies) and second, the definition of the auxiliary game is unclear. We will approach the set $u^S - \mathbb{R}^S_+$, and all players will be requested to help (even the members of $S$). What we know is that a member of $S$ may disobey, but we do not know which member. It is thus difficult to define who should be the approaching player in an auxiliary two-player game with vector payoffs. However, the usual techniques of approachability adapt very well to our setup. For details on approachability, the reader is referred to Mertens et al. (1994); we follow the proof of theorem 4.3, p. 125. We first prove a lemma which shows that we are essentially under the hypotheses of this theorem, i.e. that for $u$ in $V(S)$, the set $u^S - \mathbb{R}^S_+$ is approachable.

We now fix $S$ and $u \in V(S)$. We write $u$ for $u^S$ to keep notations light.

Notation 4.10:
1) For $u \in \mathbb{R}^S$, $C(u) = u - \mathbb{R}^S_+$.
2) For $x \in X$, $R(x) = \mathrm{co}\{f_S(x, z_S) \mid z_S \in Z_S(x)\}$.
3) For $\varphi \notin C(u)$, let $w(\varphi)$ be the closest point to $\varphi$ in $C(u)$.

For all $\lambda$ in $]0,1]$, let $K_\lambda$ be the hyperplane through $\xi_\lambda = \lambda\varphi + (1 - \lambda)w(\varphi)$ and perpendicular to the segment $[\varphi, w(\varphi)]$. Let $K^+_\lambda$ be the closed half-space bounded by $K_\lambda$ such that $\varphi \in K^+_\lambda$, and let $K^-_\lambda$ be its complement.

Remark 4.11: If $v \in K^+_\lambda$, then $v + \mathbb{R}^S_+ \subset K^+_\lambda$.

This is because $K_\lambda$ separates $\varphi$ from $u - \mathbb{R}^S_+$ and the normal direction $\varphi - w(\varphi)$ has non-negative coordinates.

Lemma 4.12. $\forall \varphi \notin C(u)$, $\forall \lambda \in ]0,1]$, $\exists x \in X$ such that $R(x) \subset K^-_\lambda$.

Proof: Otherwise, $\exists \lambda \in ]0,1]$ such that $\forall x \in X$, $R(x) \cap K^+_\lambda \neq \emptyset$. Thus, there is $\zeta_S$, a selection of $Z_S$, such that $\forall x \in X$, $f_S(x, \zeta_S(x)) \in K^+_\lambda$. Hence,

$\mathrm{co}\{f_S(x, \zeta_S(x)) \mid x \in X\} \subset K^+_\lambda$

and from remark 4.11,

$\mathrm{co}\{f_S(x, \zeta_S(x)) \mid x \in X\} + \mathbb{R}^S_+ \subset K^+_\lambda$

Since $u \in \overline{\mathrm{co}}\{f_S(x, \zeta_S(x)) \mid x \in X\} + \mathbb{R}^S_+$ and $K^+_\lambda$ is closed, this implies $u \in K^+_\lambda$. This contradicts $u \in C(u) \subset K^-_\lambda$. □

We turn now to the description of the approaching strategy. As in Kohlberg (1973), we have to consider the maximum payoff compatible with the observations.

Notation 4.13:
1) For $x \in X$, $A_S(x) = \{a \in A \mid a = \ell(z^i, x^{-i}),\ i \in S,\ z^i \in Z^i_S(x)\}$.
2) $\forall x \in X$, $\forall a \in A_S(x)$, $\forall i \in S$, $\varphi^i(x, a) = \max\{g^i(z^i, x^{-i}) \mid z^i \in Z^i_S(x),\ a = \ell(z^i, x^{-i})\}$. We denote $\varphi(x, a) = (\varphi^i(x, a))_{i \in S}$. (We omit the dependence of $\varphi$ on $S$ for simplicity.)

$A_S(x)$ is the set of attainable signals if $x$ is to be played and if a player $i$ in $S$ deviates to some $z^i$ in $Z^i_S(x)$. $\varphi^i(x, a)$ is the maximal payoff for player $i$ if he deviates when $x$ is to be played and if $a$ in $A_S(x)$ is observed. By continuity of the payoffs and since the set $\{z^i \in Z^i_S(x) \mid a = \ell(z^i, x^{-i})\}$ is compact, there is $z \in Z_S(x)$ such that $\varphi(x, a) = f_S(x, z)$.
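In the finite case, the quantities $A_S(x)$ and $\varphi^i(x, a)$ of Notation 4.13 can be enumerated directly. The sketch below is ours (it reuses the hypothetical Z_S helper from the earlier sketch and assumes payoffs stored per player; it is not code from the paper):

```python
def attainable_signals(signal, action_sets, x, S, g):
    """Return a dict: for each attainable signal a in A_S(x), the map
    i -> phi^i(x, a), i.e. the maximal payoff each member of S could get
    by a deviation in Z_S^i(x) inducing the signal a."""
    tuples = Z_S(signal, action_sets, x, S)      # from the earlier sketch
    phi = {}                                     # signal -> {player: max payoff}
    for z in tuples:
        for pos, i in enumerate(S):
            y = list(x)
            y[i] = z[pos]
            a = signal[tuple(y)]
            payoff = g[i][tuple(y)]              # g: payoff dictionaries, one per player
            row = phi.setdefault(a, {})
            row[i] = max(row.get(i, float("-inf")), payoff)
    return phi
```

Note that, by construction of $Z_S(x)$, every signal returned here is attainable by every member of $S$, so each inner dictionary has one entry per player of $S$.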


Proof of Proposition 4.9: We construct the strategy inductively.
- The first joint move $x_1$ is arbitrary.
- Suppose that the first $T$ moves $x_1, \dots, x_T$ have been chosen and that $a_1, \dots, a_T$ have been observed, with $a_t \in A_S(x_t)$, $\forall t \leq T$. Let $\varphi_T(x_1, \dots, x_T; a_1, \dots, a_T) = \frac{1}{T} \sum_{t=1}^{T} \varphi(x_t, a_t)$.
  - If $\varphi_T(x_1, \dots, x_T; a_1, \dots, a_T) \in C(u)$, $x_{T+1}$ is arbitrary;
  - otherwise, $x_{T+1}$ is such that $R(x_{T+1}) \subset K^-_{1/(T+1)}$.
- If at a given stage $t$ we observe $a_t \notin A_S(x_t)$, from that stage on the strategy is arbitrary.

Let $i \in S$ and $\tau^i \in \mathcal{T}^i_S(\sigma)$. Let $\varphi_T$ be the sequence of $\varphi_T(x_1, \dots, x_T; a_1, \dots, a_T)$ induced by $(\tau^i, \sigma^{-i})$. Let $w_T$ be the closest point to $\varphi_T$ in $C(u)$, and let $\pi_T = \frac{1}{T+1}\varphi_T + \bigl(1 - \frac{1}{T+1}\bigr)w_T$. Denote $\delta_T = d(\varphi_T, C(u))$. We will prove that $\delta_T$ goes to zero uniformly with respect to the strategy $\tau^i$. Again we follow Mertens et al. (1994). $\langle \cdot\,;\cdot \rangle$ denotes the usual inner product on $\mathbb{R}^S$ and $\|\cdot\|$ is the Euclidean norm. Let $M = \max_{x \in X} \|g(x)\|$.

We have $\delta^2_{T+1} \leq \|\varphi_{T+1} - w_T\|^2$ and $\|\varphi_{T+1} - w_T\|^2 = \|\varphi_T - w_T\|^2 + 2\langle \varphi_T - w_T ; \varphi_{T+1} - \varphi_T \rangle + \|\varphi_{T+1} - \varphi_T\|^2$. Since $\varphi_{T+1} - \varphi_T = [(\varphi(x_{T+1}, a_{T+1}) - w_T) - (\varphi_T - w_T)]/(T+1)$ and $\|\varphi_{T+1} - \varphi_T\| \leq 2M/(T+1)$, we get,

$\delta^2_{T+1} \leq \Bigl(1 - \frac{2}{T+1}\Bigr)\delta^2_T + \frac{4M^2}{(T+1)^2} + \frac{2}{T+1}\bigl\langle \varphi(x_{T+1}, a_{T+1}) - w_T ; \varphi_T - w_T \bigr\rangle$

We have $\langle \varphi(x_{T+1}, a_{T+1}) - \pi_T ; \varphi_T - w_T \rangle \leq 0$ (by the choice of $x_{T+1}$, since $\varphi(x_{T+1}, a_{T+1}) \in R(x_{T+1}) \subset K^-_{1/(T+1)}$ when $\varphi_T \notin C(u)$, and trivially when $\varphi_T \in C(u)$). Thus $\langle \varphi(x_{T+1}, a_{T+1}) - w_T ; \varphi_T - w_T \rangle \leq \langle \pi_T - w_T ; \varphi_T - w_T \rangle = \frac{1}{T+1}\langle \varphi_T - w_T ; \varphi_T - w_T \rangle = \frac{\delta^2_T}{T+1}$. It follows that

$\delta^2_{T+1} \leq \delta^2_T\Bigl[1 - \frac{2}{T+1} + \frac{2}{(T+1)^2}\Bigr] + \frac{4M^2}{(T+1)^2}$

It is then easily proved that there is a sequence $e_T$ with $\lim_T e_T = 0$ such that $\forall i \in S$, $\forall \tau^i \in \mathcal{T}^i_S(\sigma)$, $\delta_T \leq e_T$. This sequence depends only on $u$ and $S$. □
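To make this last step explicit (this is our own completion, not text from the paper): let $D$ be any uniform bound on $\delta_T$, which exists because payoffs are bounded, and set $C = \max\{\delta_1^2,\ 4M^2 + 2D^2\}$. Then, by induction, $\delta^2_T \leq C/T$ for all $T \geq 1$, since the recursion above gives

$$\delta^2_{T+1} \leq \Bigl(1 - \tfrac{2}{T+1}\Bigr)\frac{C}{T} + \frac{4M^2 + 2D^2}{(T+1)^2} \leq C\,\frac{T-1}{T(T+1)} + \frac{C}{(T+1)^2} = C\,\frac{T^2 + T - 1}{T(T+1)^2} \leq \frac{C}{T+1}.$$

Hence one may take $e_T = \sqrt{C/T}$, uniformly over $\tau^i \in \mathcal{T}^i_S(\sigma)$. Since, for $\tau^i \in \mathcal{T}^i_S(\sigma)$, the payoff actually received by the deviator $i$ at stage $t$ is at most $\varphi^i(x_t, a_t)$, and since the projection $w_T$ of $\varphi_T$ on $C(u)$ satisfies $w_T \leq u$ coordinatewise, one gets $\gamma^i_T(\tau^i, \sigma^{-i}) \leq \varphi^i_T \leq u^i + \delta_T \leq u^i + e_T$, which is the statement of the proposition.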

Notation 4.14: For $S$ a non-empty subset of $N$, we choose a strategy $\sigma(S)$ as in proposition 4.9.

5. The main theorem

Theorem 5.1.

1) If $|N| = 2$, then

$E_\infty = \mathrm{co}\,\mathcal{D} \cap \bigcap_{i \in N} V(\{i\})$

2) If $|N| \geq 3$, then

$E_\infty = \mathrm{co}\,\mathcal{D} \cap \bigcap_{S \subset N} V(S)$


Remark 5.2: The existence problem.

Although the question of existence of equilibria is unusual in repeated games with complete information, it arises when dealing with pure strategies. For example, the repetition of a one-shot game which has no Nash equilibrium, with trivial observation, has no equilibrium. This of course never happens in the case of finite action sets, behavior strategies, and observation of the lottery on signals. We have not found a characterization of the games and signals for which $E_\infty$ is non-empty, but we give two sufficient conditions for the repeated game to have an equilibrium. The first one deals with the payoffs, the second one with the signals.

Condition 1: If $G$ has a Nash equilibrium, then $E_\infty \neq \emptyset$. The proof of this fact is standard.

Condition 2: If there is $x \in X$ for which $\forall i \in N$, $Y^i(x) = \{x^i\}$ and, $\forall S \subset N$ with $|S| \geq 2$, $Z_S(x) = \{x^S\}$, and if $g(x)$ is individually rational (i.e. $g(x) \in \bigcap_{i \in N} V(\{i\})$), then $E_\infty \neq \emptyset$.

In terms of the signal, this means that there is an $x$ such that $g(x)$ is individually rational and:

$\forall i$, $\forall y^i \neq x^i$, $\ell(y^i, x^{-i}) \neq \ell(x)$

and,

$\forall i, j$, $i \neq j$, $\ell(z^i, x^{-i}) = \ell(z^j, x^{-j}) \Rightarrow z^i = x^i$ and $z^j = x^j$

It is easy to see that, for such an $x$, $g(x) \in \mathcal{D}$ and that $\forall S \subset N$, $g(x) \in V(S)$. This means that $E_\infty$ is non-empty if there is a joint action $x$ giving an individually rational payoff and for which every deviation is detectable and the deviator is identified.
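Condition 2 is straightforward to verify mechanically on a finite game. The sketch below is ours (hypothetical helper names, same data structures as the earlier sketches); it checks, for a given joint action $x$, that every unilateral deviation changes the signal and that no two distinct players can induce the same off-path signal.

```python
def satisfies_condition_2_signals(signal, action_sets, x):
    """Check the two signal requirements of Condition 2 at the joint action x:
    (i) any unilateral deviation changes the public signal;
    (ii) two different players can never induce the same signal by deviating."""
    n = len(action_sets)
    def dev_signal(i, yi):
        y = list(x)
        y[i] = yi
        return signal[tuple(y)]
    # (i) every deviation is detectable
    for i in range(n):
        for yi in action_sets[i]:
            if yi != x[i] and dev_signal(i, yi) == signal[tuple(x)]:
                return False
    # (ii) the deviator is identified: equal signals force both players to conform
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for yi in action_sets[i]:
                for yj in action_sets[j]:
                    if dev_signal(i, yi) == dev_signal(j, yj) and not (yi == x[i] and yj == x[j]):
                        return False
    return True
```

Together with a check that $g(x)$ dominates the minmax vector computed earlier, this decides whether the joint action $x$ witnesses Condition 2.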

Example 5.3:
1) Standard information. We say that the information is standard if $\ell$ is one-to-one, i.e. each player is fully informed of the joint action. Then $E_\infty = \mathrm{co}\,g(X) \cap \bigcap_{i \in N} V(\{i\})$, which is the content of the folk theorem. Just remark that, if $|S| \geq 2$, $\forall x$, $Z_S(x) = \{x^S\}$.
2) Trivial information. The information is trivial if $\ell$ is constant. Then $E_\infty = \mathrm{co}\,g(\mathrm{nash}(G))$, where $\mathrm{nash}(G)$ is the set of Nash equilibria of the one-shot game $G$.

According to propositions 3.4 and 4.6, we only need to construct the equilibrium strategies to complete the proof of the theorem. In the two-player case, the inclusion of $E_\infty$ in $\bigcap_{i \in N} V(\{i\})$, which is the set of individually rational payoffs, is proved by standard arguments (see e.g. Lehrer, 1989).

5.1. The equilibrium strategy for at least three players

Let $u \in \mathrm{co}\,\mathcal{D} \cap \bigcap_{S \neq \emptyset} V(S)$. From proposition 3.7, there is $\sigma$, a joint strategy with payoff $u$, such that there are no profitable undetectable deviations. We shall specify this strategy outside the equilibrium path. Let $(x_t)_{t \geq 1}$ be the main path induced by $\sigma$ and $(a_t)_{t \geq 1}$ be the associated sequence of signals. All players should stick to this path as long as the right signals are observed. Suppose that a deviation is detected, i.e. that there is a first stage $t_1$ such that the observed signal is $b_{t_1} \neq a_{t_1}$.

Notation 5.4: For $x \in X$ and $b \in A$, denote

$N(x, b) = \{i \in N \mid \exists y^i \in X^i,\ \ell(y^i, x^{-i}) = b\}$

If $x$ is to be played and if $b$ is observed instead of $\ell(x)$, $N(x, b)$ represents the set of players that have a deviation inducing this signal.

Let $S_1$ be $N(x_{t_1}, b_{t_1})$. Each player is then requested to play his component of the strategy $\sigma(S_1)[h_{t_1}]$, where $\sigma(S_1)[h_{t_1}]$ is the punishing strategy against $S_1$ defined in proposition 4.9, in the subgame following $h_{t_1} = (a_1, \dots, a_{t_1 - 1}, b_{t_1})$, the observed history up to $t_1$. All players should adhere to this strategy as long as they do not get more information about the identity of the deviator than that he belongs to $S_1$ (in fact each non-deviating member of $S_1$ knows a little more, since he knows that he himself did not deviate, but he has no means to communicate this fact to the other players). More precisely, for $t > t_1$, if $x_t$ is requested by the strategy and if $b_t$ is observed, the players stick to $\sigma(S_1)$ as long as $b_t \in A_{S_1}(x_t)$.

Let $t_2 = \inf\{t > t_1 \mid b_t \notin A_{S_1}(x_t)\}$. Compute $N(x_{t_2}, b_{t_2})$ and put $S_2 = N(x_{t_2}, b_{t_2}) \cap S_1$. Note that, since $b_{t_2} \notin A_{S_1}(x_{t_2})$, there is at least one player in $S_1$ who cannot induce $b_{t_2}$. Therefore, $S_2$ is a strict subset of $S_1$. The players are then requested to play $\sigma(S_2)[h_{t_2}]$, where $h_{t_2}$ is the history $(a_1, \dots, a_{t_1 - 1}, b_{t_1}, \dots, b_{t_2})$, as long as they do not learn more about the identity of the deviator. When some new information appears, the strategy switches to $\sigma(S_3)[h_{t_3}]$, and so on.

When a player deviates and is detected, he implicates a subset of players (the set of potential deviators) and all members of this subset will be punished. The deviator then chooses either to accept this punishment or to reveal more about his identity. In the first case, his deviation is not profitable, and in the second case, the punishments are directed against a smaller subset of players to which he still belongs. Even if the deviator never accepts the punishment and always prefers to reveal more information, since there are finitely many players, the maximal information about his identity will be revealed after finitely many stages. The strategy then plays like one of the $\sigma(S)$'s forever and the deviator is effectively punished.

Precisely, for every player $i$ and every detectable strategy $\tau^i$, the above description defines a sequence of subsets of players as follows:

- $S_1 = N(x_{t_1}, b_{t_1})$, with $t_1 = \inf\{t \geq 1 \mid a_t(\tau^i, \sigma^{-i}) = b_t \neq a_t\}$,
- $S_m = \bigcap_{k=1}^{m} N(x_{t_k}, b_{t_k})$, with $t_m = \inf\{t > t_{m-1} \mid b_t \notin A_{S_{m-1}}(x_t)\}$.

The strategy $\sigma$ plays $\sigma(S_m)[h_{t_m}]$ from $t_m + 1$ to $t_{m+1}$. The sequence $S_m$ is stationary for $m \geq n = |N|$.

A stage $t_m$ will be called a "switching time". Along every play, at most $n$ switching times occur.
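The bookkeeping of the suspect sets $S_m$ along a play can be written down directly from Notation 5.4 and the definition of $A_S(x)$. Here is a sketch (ours; it reuses the hypothetical Z_S helper from section 4 and assumes a finite game) that, given the actions prescribed by the strategy and the signals actually observed, returns the successive switching times and suspect sets.

```python
def suspects_along_play(signal, action_sets, prescribed, planned_signals, observed):
    """Return the list of (switching time, suspect set S_m).
    prescribed[t]: joint action requested at stage t (0-based lists),
    planned_signals[t]: signal the strategy expects, observed[t]: signal seen."""
    n = len(action_sets)

    def put(x, i, yi):
        y = list(x)
        y[i] = yi
        return y

    def N_xb(x, b):      # Notation 5.4: players having some deviation inducing b
        return {i for i in range(n)
                for yi in action_sets[i] if signal[tuple(put(x, i, yi))] == b}

    def A_S(x, S):       # attainable signals for the suspect set S (Notation 4.13)
        return {signal[tuple(put(x, i, z[pos]))]
                for z in Z_S(signal, action_sets, x, sorted(S))
                for pos, i in enumerate(sorted(S))}

    history, S = [], None
    for t, (x, a, b) in enumerate(zip(prescribed, planned_signals, observed)):
        if S is None:
            if b != a:                      # first detected deviation
                S = N_xb(x, b)
                history.append((t, set(S)))
        elif b not in A_S(x, S):            # new information: shrink the suspect set
            S = S & N_xb(x, b)
            history.append((t, set(S)))
    return history
```

Each recorded pair is a switching time together with the new suspect set; consistently with the text above, the suspect set can only shrink, so it changes at most $n$ times.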


We now prove that $\sigma$ is a uniform equilibrium. Let $\tau^i$ be a strategy of player $i$. Recall the definition of the blocks $B_p$ in the proof of proposition 3.7. Let $\beta_p(\tau^i, \sigma^{-i})$ be the average payoff of player $i$ on the block $B_p$ under $(\tau^i, \sigma^{-i})$. Let $P$ be the set of $p$'s such that there is no switching time in $B_p$. From the properties of the main path defined in proposition 3.7 and of the $\sigma(S)$'s, there is a sequence $\eta_p$, depending on $u^i$ only, with $\lim \eta_p = 0$, such that for all $p$ in $P$, $\beta_p(\tau^i, \sigma^{-i}) \leq u^i + \eta_p$. This is because $|B_p|$ goes to infinity with $p$. Since there is no switching time in $B_p$, either the main path crosses $B_p$ entirely, or on $B_p$ one and only one of the strategies $\sigma(S)$ is played. For all these strategies, for $p$ large enough, the average payoff does not exceed $u^i$ by much.

For $T \geq 1$, let $p(T)$ be the number of blocks completed before $T$, i.e. such that $T \in B_{p(T)+1}$, and let $k(T) = \sum_{p=1}^{p(T)} |B_p|$. We will use that $\lim k(T)/T = 1$. Recall that $M = \max_{x \in X} \|g(x)\| \geq \max_{x \in X} |g^i(x)|$. We have then,

$\gamma^i_T(\tau^i, \sigma^{-i}) \leq \frac{(T - k(T))\,M + \sum_{t=1}^{k(T)} g^i_t(\tau^i, \sigma^{-i})}{(T - k(T)) + k(T)}$

thus,

$\gamma^i_T(\tau^i, \sigma^{-i}) \leq \frac{T - k(T)}{T}\,M + \frac{1}{k(T)} \sum_{t=1}^{k(T)} g^i_t(\tau^i, \sigma^{-i})$

The first term converges to zero. Let $Q = \{p \in P \mid p \leq p(T)\}$ and let $m(T) = \sum_{p \in Q} |B_p|$. Let $B = \bigcup_{p \in Q} B_p$. Since there are at most $n$ switching times, $\lim m(T)/k(T) = 1$. We then find that,

$\frac{1}{k(T)} \sum_{t=1}^{k(T)} g^i_t(\tau^i, \sigma^{-i}) \leq \frac{k(T) - m(T)}{k(T)}\,M + \frac{1}{m(T)} \sum_{t \in B} g^i_t(\tau^i, \sigma^{-i})$

The length of any block containing a switching time is at most $p(T)^2$. Thus, $k(T) - m(T) \leq n\,p(T)^2$. Hence,

$\frac{1}{k(T)} \sum_{t=1}^{k(T)} g^i_t(\tau^i, \sigma^{-i}) \leq \frac{n\,p(T)^2}{k(T)}\,M + \frac{1}{m(T)} \sum_{t \in B} g^i_t(\tau^i, \sigma^{-i})$

Again the first term converges to zero. We have then,

$\frac{1}{m(T)} \sum_{t \in B} g^i_t(\tau^i, \sigma^{-i}) = \frac{1}{m(T)} \sum_{p \in Q} |B_p|\,\beta_p(\tau^i, \sigma^{-i}) \leq u^i + \frac{1}{m(T)} \sum_{p \in Q} |B_p|\,\eta_p$

and,


$\gamma^i_T(\tau^i, \sigma^{-i}) \leq \frac{T - k(T)}{T}\,M + \frac{n\,p(T)^2}{k(T)}\,M + u^i + \frac{1}{m(T)} \sum_{p \in Q} |B_p|\,\eta_p$

Since $\lim \eta_p = 0$, the proof is complete. □

5.2. The equilibrium strategy for two players

The description of the equilibrium strategy is much easier in the two-player case, since when a deviation is detected, the identity of the deviator is known to all the players. Hence, for $u \in \mathrm{co}\,\mathcal{D} \cap \bigcap_{i \in N} V(\{i\})$, take a strategy $\sigma$ as in proposition 3.7. There are no profitable undetectable deviations. To complete the description of the strategy, when a player notices a deviation, he punishes his opponent down to his minmax level $v^i$ forever (recall remark 4.5). The proof that this is a uniform equilibrium is by now standard in repeated games.

6. Concluding remarks

6.1. Public equilibria of games with general observation

A repeated game with general observation specifies, for each player, a signalling function $\ell^i : X \to A^i$, where $A^i$ is the set of signals for player $i$. The set of histories of length $t$ for player $i$ is then $H^i_t$, the product of $A^i$, $t$ times with itself. Let $H^i = \bigcup_{t \geq 0} H^i_t$; a strategy for player $i$ is a mapping $\sigma^i : H^i \to X^i$.

From a system of signals $(\ell^i)_{i \in N}$ one can define a public signal which is the public part of the system $(\ell^i)_{i \in N}$ (see Sorin, 1995). This is done through the finest $\sigma$-algebra that is included in the information of each player. This public signal is the maximal one that can be given to all players, in addition to their private signals, without increasing their information. Let $\ell$ be this public signal. A strategy for player $i$ in the repeated game $\Gamma_\infty((\ell^i)_{i \in N})$ is public if it depends on the public signal only. A public (uniform) equilibrium is a profile of public strategies such that no player has a profitable deviation (in the sense of uniform equilibria) which is a public strategy. A public equilibrium is an equilibrium (Sorin, 1995). Let us denote by $E^*_\infty((\ell^i)_{i \in N})$ the set of payoff vectors associated with public equilibria of $\Gamma_\infty((\ell^i)_{i \in N})$. Obviously, $E^*_\infty((\ell^i)_{i \in N}) = E_\infty(\ell)$. Thus, our main theorem provides a formula for $E^*_\infty((\ell^i)_{i \in N})$ as a function of the public part of the signals.

6.2. Subgame perfect equilibria

As usual in undiscounted infinitely repeated games, the requirement of perfection does not change the set of equilibrium payoffs (see e.g. Sorin, 1992). A simple modification of the equilibrium strategy turns it into a perfect equilibrium. The key argument is that a punishing phase should last only a finite number of stages, after which the play goes back to the equilibrium path. It is thus harmless to punish. To ensure that punishments of deviations from the equilibrium path are effective, just proceed in the following way: if the deviation takes place at stage $n$, punish until the average payoff of every suspected player is within $1/n$ of the equilibrium payoff. This occurs after a finite number of periods because of the strong convergence properties of our strategy.

6.3. Open problems

- The expression of the sets V(S) is quite complicated. Although we did not find a simpler expression, a simplification is highly desirable.

- How can such results be generalized to mixed strategies? Up to now, the biggest problem seems to be the definition of a set of suspects when the strategies are mixed on the equilibrium path.

- Giving a similar result for discounted games seems possible as long as good approachability properties are available for low discount factors. However, the problem of finitely repeated games is much more intricate. Benoit and Krishna (1987) give a sufficient condition for the set of Nash equilibrium payoffs of the T-fold repeated game to converge to $E_\infty$ as $T$ goes to infinity: namely, for each player $i$ there is a Nash equilibrium of the one-shot game with payoff vector $e(i)$ for which $e^i(i) > v^i$. The natural generalization of this condition would be that, for each subset of players $S$, there is a one-shot Nash equilibrium payoff $e(S)$ lying in the interior of $V(S)$, but this does not seem to be sufficient. A challenging problem is thus to find a sufficient condition for the set of Nash equilibrium payoffs of the T-fold repeated game to converge to $E_\infty$.

References

Benoit J-P, Krishna V (1985) Finitely repeated games. Econometrica 53:905-922

Benoit J-P, Krishna V (1987) Nash equilibria of finitely repeated games. Int. J. Game Theory 16:197-204

Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6:1-8

Dalkey N (1953) Equivalence of information patterns and essentially determinate games. In: Kuhn HW, Tucker AW (eds.) Contributions to the theory of games, vol. 2, Princeton University Press

Fudenberg D, Levine D, Maskin E (1994) The folk theorem with imperfect public information. Econometrica 62:997-1039

Kohlberg E (1973) Optimal strategies in repeated games with incomplete information. Int. J. Game Theory 4:7-24

Lehrer E (1989) Lower equilibrium payoffs in repeated games with non-observable actions. Int. J. Game Theory 18:57-89

Lehrer E (1990) Nash equilibria of n-player repeated games with semi-standard information. Int. J. Game Theory 19:191-217

Lehrer E (1992) Two player repeated games with non-observable actions and observable payoffs. Math. Oper. Res. 17:200-224

Mertens JF, Sorin S, Zamir S (1994) Repeated games, part A: background material. CORE Discussion Paper 9420

Sorin S (1992) Repeated games with complete information. In: Aumann RJ, Hart S (eds.) Handbook of game theory with economic applications, vol. 1, North-Holland, pp. 71-107

Sorin S (1995) Cooperation through repetition 1: Complete information. Working paper 9501, THEMA, Université Paris 10, France

Tomala T (1996) Nash equilibria of repeated games with observable payoff vector. Mimeo