QR 38 3/15/07, Repeated Games I I.The PD II.Infinitely repeated PD III.Patterns of cooperation.


QR 38

3/15/07, Repeated Games I

I. The PD

II. Infinitely repeated PD

III. Patterns of cooperation

I. The Prisoners’ Dilemma

• The PD is a mixed-motive game, in which the players can gain from cooperation but have an incentive to defect.

• Players can be labeled cheaters or cooperators.

• The equilibrium of the one-shot game is for both to defect. This leaves both unhappy; it is not Pareto optimal.

OPEC quotas

              Quota     Overproduce
Quota         60, 60    36, 70
Overproduce   70, 36    50, 50

Players: Kuwait and Saudi Arabia. In each cell the row player's payoff is listed first.
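The claim that mutual defection is the only equilibrium of the one-shot game can be checked directly against the OPEC payoffs. The following is a minimal sketch, not part of the original slides; the function name `pure_nash_equilibria` is illustrative, and it assumes the first number in each cell belongs to the row player.

```python
# Payoffs from the OPEC table: (row player's payoff, column player's payoff).
ACTIONS = ["Quota", "Overproduce"]
PAYOFFS = {
    ("Quota", "Quota"): (60, 60),
    ("Quota", "Overproduce"): (36, 70),
    ("Overproduce", "Quota"): (70, 36),
    ("Overproduce", "Overproduce"): (50, 50),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every action pair where neither player gains by deviating alone."""
    equilibria = []
    for row in actions:
        for col in actions:
            row_payoff, col_payoff = payoffs[(row, col)]
            # Row player compares against its other actions, holding col fixed.
            row_is_best = all(payoffs[(alt, col)][0] <= row_payoff for alt in actions)
            # Column player compares against its other actions, holding row fixed.
            col_is_best = all(payoffs[(row, alt)][1] <= col_payoff for alt in actions)
            if row_is_best and col_is_best:
                equilibria.append((row, col))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
print(equilibria)  # [('Overproduce', 'Overproduce')]
```

Both players would earn 60 rather than 50 by sticking to their quotas, which is why the equilibrium is not Pareto optimal.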

The PD

• OPEC is a classic cartel example.

• It shows that mutual cooperation does not have to be socially desirable, even though it is beneficial for the players themselves.

• PDs may arise throughout social interaction: economic collusion, trade protection, arms control, alliance competition, ethnic conflict, regions competing for investment.

The PD

• Political theorists have also studied the PD, e.g., Hobbes’ argument that the state of nature is a war of all against all.

So looking for ways to “solve” the PD has occupied many analysts. Some mechanisms simply make the PD disappear:

• Allow for external enforcement or binding commitments (Schelling)

• Find some way to know what the other player will do on a given move

• Change payoffs

Solving the PD

More interesting is how the PD can be solved on its own terms.

Most solutions focus on the importance of repetition, which allows for a strategy of reciprocity.

• The basic idea is that cheating today may lead to the collapse of cooperation in the future.

• If the value of future cooperation is high enough, this threat can deter cheating today.

Solving the PD

If repetition works to solve the PD, cooperation is tacit and self-enforcing.

• Consider a finitely-repeated game first. Can use rollback to find the equilibrium.

• Say that OPEC members know they will be playing for only three months.

• The game represents profits each month.

Finitely-repeated PD

• First consider what will happen in month 3.

– Cheating is the dominant strategy, so everyone will cheat in this month.

• If everyone will cheat in month 3, there is no incentive to cooperate in month 2.

• And so no incentive to cooperate in month 1 either.

• In a finitely-repeated PD, the equilibrium is no cooperation.

• This holds for any PD repeated a known number of times.
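The rollback argument can be sketched in code for the three-month OPEC example. This is an illustrative sketch under stated assumptions: profits are simply added across months with no discounting, and the names `pure_equilibria` and `rollback` are hypothetical helpers, not anything from the course materials.

```python
ACTIONS = ["Quota", "Overproduce"]

# One month's profits: (row player's payoff, column player's payoff).
STAGE = {
    ("Quota", "Quota"): (60, 60),
    ("Quota", "Overproduce"): (36, 70),
    ("Overproduce", "Quota"): (70, 36),
    ("Overproduce", "Overproduce"): (50, 50),
}


def pure_equilibria(game):
    """Action pairs where neither player can gain by deviating alone."""
    found = []
    for row in ACTIONS:
        for col in ACTIONS:
            row_ok = all(game[(alt, col)][0] <= game[(row, col)][0] for alt in ACTIONS)
            col_ok = all(game[(row, alt)][1] <= game[(row, col)][1] for alt in ACTIONS)
            if row_ok and col_ok:
                found.append((row, col))
    return found


def rollback(months):
    """Solve the last month first, then work backward to month 1."""
    continuation = (0, 0)  # profits still to come after the final month
    plan = {}
    for month in range(months, 0, -1):
        # Later play is already pinned down, so it adds the same constant
        # to every cell of this month's game (no discounting in this sketch).
        game = {
            cell: (pay[0] + continuation[0], pay[1] + continuation[1])
            for cell, pay in STAGE.items()
        }
        (equilibrium,) = pure_equilibria(game)  # exactly one in every month
        plan[month] = equilibrium
        continuation = game[equilibrium]
    return plan, continuation


plan, totals = rollback(3)
for month in sorted(plan):
    print(month, plan[month])  # both overproduce in every month
print(totals)  # (150, 150)
```

Because future play adds the same amount to every cell, overproducing stays dominant in each month, which is the logic behind the no-cooperation result.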

II. Infinitely-repeated PD

Infinite repetition leads to different results, because the players can take the history of the game into account when deciding what to do in any particular round.

• Since they don’t know when the game will end, there is no final round to start from, so players cannot use rollback.

Strategies in the infinitely-repeated PD

If play in one round takes into account behavior in previous rounds, the players are using contingent strategies.

A common example is trigger strategies: players cooperate as long as the other does, but any defection triggers a period of punishment.

• The punishment period lasts for a specified number of rounds, and the player punishing plays defect during this period.

Trigger strategies

Two common trigger strategies:

• Grim trigger

• Tit-for-tat (TFT)

Consider the OPEC game. What happens to Kuwait’s payoff if it cheats and SA is playing TFT?

• Kuwait will gain 10 on the first round, then lose 10 on the next round (assuming it continues to cheat).
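Kuwait's round-by-round profits against a tit-for-tat opponent can be traced with a short simulation. This is a sketch only: moves are coded "C" (keep to the quota) and "D" (overproduce), and the helper names are illustrative.

```python
# Kuwait's profit, indexed by (Kuwait's move, Saudi Arabia's move).
KUWAIT_PAYOFF = {
    ("C", "C"): 60,
    ("C", "D"): 36,
    ("D", "C"): 70,
    ("D", "D"): 50,
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def play_against_tft(kuwait_moves):
    """Kuwait's per-round profits when Saudi Arabia plays tit-for-tat."""
    kuwait_history = []
    profits = []
    for move in kuwait_moves:
        saudi_move = tit_for_tat(kuwait_history)
        profits.append(KUWAIT_PAYOFF[(move, saudi_move)])
        kuwait_history.append(move)
    return profits


keep_cheating = play_against_tft(["D", "D", "D", "D"])
cheat_once = play_against_tft(["D", "C", "C", "C"])
print(keep_cheating)  # [70, 50, 50, 50]
print(cheat_once)     # [70, 36, 60, 60]
```

Measured against the cooperative profit of 60, continued cheating gains 10 and then loses 10 every round, while a single defection gains 10 and then loses 24 in the one punishment round.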

Value of cheating

To calculate whether it is worth cheating, need to know what weight to put on present versus future gains.

• Generally, a gain today is worth more than a gain tomorrow.

– Money made today can be invested, earn interest.

– Gains in future may be uncertain.

Value of cheating

• The relative importance of the present and future becomes central.

• Players’ optimal strategies will depend on how much they care about the future; how far-sighted they are.

• Thinking about this in economic terms, we need to calculate the rate of return (r) on an investment.

• Note that we need more than ordinal payoffs to make the following calculations.

Present values

First, consider whether it is worth cheating only once against TFT.

• In OPEC example, this would lead to an immediate gain of 10, but a loss of 24 on the next round.

Need to find the present value (PV) of 24: how much earned today is worth the same as 24 earned in the future.

Present values

• Let PV denote the amount you gain immediately by cheating.

• Worth cheating if this is more than the present value of future losses.

• In this case, want to find the conditions under which it is worthwhile to gain 10 today while losing 24 on the next round.

• Cheating will be worthwhile if the immediate amount gained plus return on this investment is greater than 24.

Present values

• The immediate amount gained plus return on investment is PV + r(PV).

• So it is worthwhile to cheat if:

PV + r(PV) > 24

PV > 24/(1+r)

• r is like an interest rate.

• Here, only worth cheating if 10>24/(1+r); r>1.4 (140% interest)

Present values

• What we have just done is to calculate the present value of 24.

• The PV of 24 depends on the rate of return.
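The one-time cheating comparison can be reproduced numerically. The sketch below assumes, as in the slides, that the loss of 24 arrives exactly one round later; the function names are illustrative.

```python
import math


def present_value(amount, r, periods=1):
    """Value today of `amount` received `periods` rounds from now at rate r."""
    return amount / (1 + r) ** periods


def worth_cheating_once(gain, loss, r):
    """Cheat once only if the immediate gain beats the discounted loss."""
    return gain > present_value(loss, r)


# Break-even rate: 10 = 24 / (1 + r)  =>  r = 24 / 10 - 1
break_even_r = 24 / 10 - 1

for r in (0.05, 1.0, 1.5):
    print(r, round(present_value(24, r), 2), worth_cheating_once(10, 24, r))
```

At ordinary rates of return (for example r = 0.05) the discounted loss of about 22.86 far exceeds the gain of 10, so cheating once against TFT does not pay unless r is above 1.4.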

Cheating forever

Second, consider whether it is worth cheating forever.

• Now, losses are incurred over an infinite horizon.

• Lose 10 each week after the initial gain.

• Lose an infinite sum of 10s, but the PV is smaller each week; future losses are discounted.

Cheating forever

• As above, next week’s loss (10) is worth 10/(1+r) from today’s perspective.

• The loss in the first week of continued cheating is 10/(1+r).

• Week 2: 10/(1+r)^2.

• Week 3: 10/(1+r)^3.

• Week n: 10/(1+r)^n.

• Lose an infinite sum of 10’s, but PV of this smaller each week.

Cheating forever

• So, by cheating forever, you lose the sum from n=1 to n=infinity of 10/(1+r)^n, where n indicates the round number.

• r is the rate of return; assume it is positive.

• Then 1/(1+r) < 1; this is known as δ (delta), the discount factor.

Cheating forever

• By the rule for infinite geometric sums, this sum converges to the value 10/r.

• So worthwhile to cheat forever if 10 > 10/r, i.e., r > 1.

• Some infinite sums converge, as each term gets smaller.

• The partial sums approach, but never reach, the value of the sum.

• Some infinite sums don’t converge.
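The convergence of the discounted losses to 10/r can be checked by adding up a finite number of terms. This is a numerical sketch with illustrative function names; the rate r = 0.25 is an arbitrary example value, not one from the slides.

```python
import math


def pv_of_losses(loss, r, rounds):
    """Partial sum of loss/(1+r)^n for n = 1..rounds."""
    return sum(loss / (1 + r) ** n for n in range(1, rounds + 1))


def worth_cheating_forever(gain, loss, r):
    """Compare the immediate gain with the limit loss/r of the infinite sum."""
    return gain > loss / r


r = 0.25
limit = 10 / r  # closed form of the infinite sum
for rounds in (1, 5, 20, 200):
    print(rounds, round(pv_of_losses(10, r, rounds), 4))
print("limit:", limit)
```

The partial sums climb toward 40 without exceeding it, and the decision rule reproduces the threshold r > 1 for the OPEC numbers.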

Discounting in international relations

What does the rate of return mean in IR?

• Interest rate

• Chance the game will end

• D&S distinguish between r and R; R incorporates the chance p that the game continues. Treat these as the same, for our purposes – both are the rate of return.
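One common way to fold the continuation probability into the rate of return is to require 1/(1+R) = p/(1+r), so that a payoff next round is discounted both for interest and for the chance that the round never happens. The sketch below assumes that definition; the slides do not spell it out, so treat the formula and the function names as assumptions.

```python
import math


def effective_rate(r, p):
    """Effective rate R satisfying 1/(1+R) = p/(1+r) (assumed definition)."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return (1 + r) / p - 1


def discount_factor(rate):
    """Weight placed on a payoff one round ahead."""
    return 1 / (1 + rate)


R = effective_rate(r=0.05, p=0.9)
print(round(R, 4), round(discount_factor(R), 4))
```

Under this assumption a lower chance of continuation raises R, so it makes the future matter less in exactly the way a higher interest rate does.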

General formula

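• Generalize the discussion by using symbols for payoffs: C (both cooperate), D (both defect), H (high: defecting against a cooperator), L (low: cooperating against a defector).

             Cooperate   Defect
Cooperate    C, C        L, H
Defect       H, L        D, D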

General formula

• One-time gain from cheating is H-C

• Loss from being punished after cheating once is C-L

• Per-period loss for perpetual cheating is C-D

• Algebra same as earlier

General formula

• Cheat once against TFT if

(H-C)>(C-L)/(1+R)

R>((C-L)/(H-C))-1

• Cheat forever if R>(C-D)/(H-C)
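The two general conditions can be wrapped in small functions and checked against the OPEC numbers (H = 70, C = 60, D = 50, L = 36). This is a sketch with illustrative function names; it assumes H > C > D > L, as in any PD.

```python
import math


def cheat_once_threshold(H, C, L):
    """Cheating once against TFT pays only if R exceeds this value."""
    return (C - L) / (H - C) - 1


def cheat_forever_threshold(H, C, D):
    """Cheating forever pays only if R exceeds this value."""
    return (C - D) / (H - C)


once = cheat_once_threshold(H=70, C=60, L=36)
forever = cheat_forever_threshold(H=70, C=60, D=50)
print(round(once, 2), round(forever, 2))  # 1.4 1.0
```

Both thresholds match the values found earlier for OPEC, r > 1.4 and r > 1.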

When to cheat?

So the calculation about cheating depends on the immediate gain from cheating, future costs, and the shadow of the future.

• Can think of the shadow of the future as the discount factor (1/(1+r)), the importance of the future relative to the present.

• When the rate of return is high, this is close to 0; when it is low, close to 1.
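The relation between the rate of return and the discount factor can be tabulated in a few lines. The rates below are example values chosen for illustration; 1.0 and 1.4 are the OPEC cheating thresholds restated as discount factors.

```python
import math


def discount_factor(r):
    """Shadow of the future: weight on next round's payoff, 1/(1+r)."""
    return 1 / (1 + r)


for r in (0.05, 1.0, 1.4, 9.0):
    print(r, round(discount_factor(r), 3))
```

In discount-factor terms, the OPEC conditions for cheating (r > 1 and r > 1.4) become discount factors below 0.5 and below about 0.417.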

When to cheat?

• Immediate gain from cheating likely to be high if cheating is not detected immediately.

• So monitoring becomes important to sustain agreements to cooperate.

• International institutions do a lot of monitoring. Then cooperation can become self-enforcing (if states care about the future).

III. Patterns of cooperation

Applying these ideas: Axelrod’s tournament.

• Note different notation: w is the “discount parameter,” the same as the discount factor.

• Round-robin tournament, 200 rounds.

• Asked participants to submit strategies in order to study their properties.

Axelrod’s tournament

Results:

• No strategy is always best; it depends on what strategies the others are using.

• Nice strategies don’t defect first; they can be the best choice if others are willing to cooperate

• TFT robust because it is nice, provocable, and forgiving

Axelrod’s tournament

• TFT did best in the first round, and again in the second round when everyone knew the first round’s results.
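A toy round-robin shows how a strategy's score depends on the field it faces. The sketch below is not Axelrod's tournament: it assumes a hypothetical field of four strategies, uses the OPEC payoffs, plays 200 rounds per match without discounting, and lets each strategy meet a copy of itself once.

```python
from itertools import combinations_with_replacement

# Payoff to a player, indexed by (own move, opponent's move).
PAYOFF = {("C", "C"): 60, ("C", "D"): 36, ("D", "C"): 70, ("D", "D"): 50}


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"


def grim_trigger(mine, theirs):
    return "D" if "D" in theirs else "C"


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def play_match(strat_a, strat_b, rounds=200):
    """Total payoffs to each side over a fixed number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=200):
    """Every pairing once, including each strategy against its own copy."""
    totals = {name: 0 for name in strategies}
    for name_a, name_b in combinations_with_replacement(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b], rounds)
        totals[name_a] += score_a
        if name_a != name_b:
            totals[name_b] += score_b
    return totals


FIELD = {
    "TFT": tit_for_tat,
    "GRIM": grim_trigger,
    "ALL D": always_defect,
    "ALL C": always_cooperate,
}
scores = round_robin(FIELD)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, score)
```

In this small field the two nice, provocable strategies tie for first, while ALL D profits mainly from exploiting ALL C; removing ALL C from the field lowers ALL D's standing further.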

Evolutionary approach: a population of players using a particular strategy. A strategy can be invaded if a new strategy does better than the existing population. If it cannot be invaded, the strategy is collectively stable.

TFT is collectively stable as long as w is high enough. However, ALL D is also collectively stable.
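The invasion test can be sketched with discounted payoffs for the OPEC game. The code assumes payoffs in round n are weighted by w^n (starting from n = 0) and considers only two candidate invaders, ALL D entering a TFT population and TFT entering an ALL D population; a complete stability check would have to consider every possible invader.

```python
# OPEC payoffs in the general notation.
H, C, D, L = 70, 60, 50, 36


def value_tft_vs_tft(w):
    """Cooperate every round: C + wC + w^2 C + ..."""
    return C / (1 - w)


def value_alld_vs_tft(w):
    """Exploit TFT once, then mutual defection forever."""
    return H + w * D / (1 - w)


def value_alld_vs_alld(w):
    """Mutual defection every round."""
    return D / (1 - w)


def value_tft_vs_alld(w):
    """Be exploited once, then mutual defection forever."""
    return L + w * D / (1 - w)


def tft_resists_alld(w):
    """ALL D cannot do better than the TFT natives do with each other."""
    return value_tft_vs_tft(w) >= value_alld_vs_tft(w)


def alld_resists_tft(w):
    """TFT cannot do better than the ALL D natives do with each other."""
    return value_alld_vs_alld(w) >= value_tft_vs_alld(w)


for w in (0.2, 0.4, 0.6, 0.9):
    print(w, tft_resists_alld(w), alld_resists_tft(w))
```

With these payoffs TFT holds off ALL D once w is above 0.5, which corresponds to the r < 1 condition for cheating forever, while ALL D holds off a lone TFT entrant at every value of w.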