Game Theory: Experiments

Teck H. Ho

University of California, Berkeley

September, 2006



Outline

Motivation and Modeling Philosophy

Mutual Consistency: p-Beauty Contest

Instant Convergence: Median-Effort Game

Self-Interest: Ultimatum Game

Summary



Motivation

Nash equilibrium and its refinements: Dominant theories in economics and business for predicting behaviors in games.

Subjects do not play Nash in many one-shot games.

Behaviors do not converge to Nash with repeated interactions in some games.

Multiplicity problem (e.g., coordination games).

Modeling subject heterogeneity really matters in games.



Standard Assumptions in Equilibrium Analysis

Assumptions | Nash Equilibrium | Cognitive Hierarchy | Experience-Weighted Attraction Learning | Social Preferences

"Solution" Method

Strategic Thinking X X X X

Best Response X X X X

Mutual Consistency X X

Utility Function

Self-Interest X X X

Instant Convergence X X X



Modeling Philosophy

Simple (Economics)

General (Economics)

Precise (Economics)

Empirically disciplined (Psychology)

“the empirical background of economic science is definitely inadequate...it would have been absurd in physics to expect Kepler and Newton without Tycho Brahe” (von Neumann & Morgenstern ‘44)

“Without having a broad set of facts on which to theorize, there is a certain danger of spending too much time on models that are mathematically elegant, yet have little connection to actual behavior. At present our empirical knowledge is inadequate...” (Eric Van Damme ‘95)



Outline

Motivation and Modeling Philosophy

Mutual Consistency: p-Beauty Contest

Instant Convergence: Median-Effort Game

Self-Interest: Ultimatum Game

Summary



Example 1: p-Beauty Contest

n players

Every player simultaneously chooses a number from 0 to 100

Compute the group average

Define Target Number to be 0.7 times the group average

The winner is the player whose number is the closest to the Target Number

The prize to the winner is US$20

Ho, Camerer, and Weigelt (AER, 1998)
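The rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `beauty_contest_winner` and the tie rule (all closest players are returned) are assumptions, not part of the original experiment description.

```python
def beauty_contest_winner(choices, p=0.7):
    """Return (target, winner indices) for one round of the p-beauty contest.

    choices: numbers in [0, 100], one per player.
    p: multiplier applied to the group average (0.7 in the slides).
    Tie handling (returning every closest player) is an assumption.
    """
    if not choices:
        raise ValueError("need at least one player")
    if any(c < 0 or c > 100 for c in choices):
        raise ValueError("choices must lie in [0, 100]")
    # Target Number = p times the group average
    target = p * sum(choices) / len(choices)
    # The winner is whoever is closest to the target
    best = min(abs(c - target) for c in choices)
    winners = [i for i, c in enumerate(choices) if abs(c - target) == best]
    return target, winners


# Hypothetical group of four players
target, winners = beauty_contest_winner([50, 35, 20, 10])
print(target, winners)
```

With these hypothetical choices the group average is 28.75, so the target is 0.7 times that, and the player who chose 20 is closest.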



Ingredients of an Economic Experiment

Abstract: Task is abstract so that we do not introduce auxiliary preferences

Saliency: Rewards must be associated with subjects’ actions

Dominance: Payoffs dominate thinking costs (i.e. mistakes are not cheap)



Economic vs. Psychological Experiments

DIMENSION | ECONOMIC EXPERIMENTS | PSYCHOLOGICAL EXPERIMENTS

LANGUAGE | MOSTLY QUANTITATIVE | MOSTLY VERBAL

TASK | ABSTRACT | CONTEXT SPECIFIC

SUBJECT'S MOTIVATION | MONEY (PERFORMANCE-BASED) | COURSE REQUIREMENT / FIXED FEE

THEORY PREDICTION | MOSTLY POINT | MOSTLY DIRECTIONAL



A Sample of CEOs

David Baltimore, President, California Institute of Technology

Donald L. Bren, Chairman of the Board, The Irvine Company

Eli Broad, Chairman, SunAmerica Inc.

Lounette M. Dyer, Chairman, Silk Route Technology

David D. Ho, Director, The Aaron Diamond AIDS Research Center

Gordon E. Moore, Chairman Emeritus, Intel Corporation

Stephen A. Ross, Co-Chairman, Roll and Ross Asset Mgt Corp

Sally K. Ride, President, Imaginary Lines, Inc., and Hibben Professor of Physics



Results in various p-BC contests

Subject Pool | Group Size | Mean

CEOs | 20 | 37.9

80 year olds | 33 | 37.0

Economics PhDs | 16 | 27.4

Portfolio Managers | 26 | 24.3

Game Theorists | 27-54 | 19.1



The Cognitive Hierarchy (CH) Model

People are different and have different decision rules

Modeling heterogeneity (i.e., distribution of types of players). Types of players are denoted by levels 0, 1, 2, 3,…,

Modeling decision rule of each type

Camerer, Ho, and Chong (2004, QJE)



Modeling Decision Rule

Proportion of k-step is f(k)

Step 0 choose randomly

k-step thinkers know proportions f(0),...f(k-1)

Form beliefs and best-respond based on beliefs

Iterative and no need to solve a fixed point

$$g_k(h) = \frac{f(h)}{\sum_{h'=0}^{k-1} f(h')} \quad \text{for } h < k$$
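The belief rule can be illustrated with a short sketch: a k-step thinker takes the true proportions of the lower levels and renormalizes them to sum to one. The function name `beliefs` and the example frequencies are hypothetical, chosen only for illustration.

```python
def beliefs(f, k):
    """Beliefs g_k(h) of a k-step thinker over lower levels h = 0..k-1.

    f: list of true proportions f(0), f(1), ...
    The k-step thinker only knows f(0)..f(k-1) and renormalizes them.
    """
    if k < 1:
        raise ValueError("level 0 randomizes and holds no beliefs")
    lower = f[:k]
    total = sum(lower)
    return [x / total for x in lower]


f = [0.4, 0.3, 0.2, 0.1]  # hypothetical type distribution
g2 = beliefs(f, 2)        # level 2 believes in levels 0 and 1 only
print(g2)
```

Because each level conditions only on lower levels, the beliefs can be computed one level at a time, which is why no fixed point has to be solved.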



Choices of players in p-Beauty Contest

Level 0 chooses randomly with an average of 50

Level 1 assumes all others are level 0 and best responds with a choice of 35

Level 2 assumes others are levels 0 and 1. If f(0) = f(1), then level 2 will choose a value of

$$0.7 \cdot \left(\tfrac{1}{2} \cdot 50 + \tfrac{1}{2} \cdot 35\right) = 29.75$$
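The level-by-level reasoning above can be sketched as an iteration. This is a minimal sketch under two assumptions not stated on the slide: each player ignores the effect of their own number on the group average, and level 0 is summarized by its mean of 50. The function name `ch_choices` is hypothetical.

```python
def ch_choices(f, p=0.7, level0_mean=50.0):
    """Choices of levels 0..len(f)-1 in the p-beauty contest.

    f: proportions of each level (only ratios among lower levels matter).
    Assumes a player neglects the impact of their own choice on the average.
    """
    choices = [level0_mean]
    for k in range(1, len(f)):
        total = sum(f[:k])
        # Expected average of the others, using normalized beliefs over levels < k
        expected_avg = sum(f[h] * choices[h] for h in range(k)) / total
        # Best response is p times the expected average
        choices.append(p * expected_avg)
    return choices


# Hypothetical distribution with f(0) = f(1), as in the slide's level-2 example
choices = ch_choices([0.25, 0.25, 0.25, 0.25])
print(choices)
```

With f(0) = f(1), the second and third entries reproduce the slide's values of 35 and 29.75.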



Modeling Heterogeneity, f(k)

A1: $\frac{f(k)}{f(k-1)} = \frac{\tau}{k}$

sharp drop-off due to increasing difficulty in simulating others’ behaviors

A2: f(0) + f(1) = 2f(2)



Implications

A1 ⇒ Poisson distribution $f(k) = \frac{e^{-\tau} \cdot \tau^k}{k!}$ with mean and variance = τ

A1, A2 ⇒ Poisson, τ=1.618 (golden ratio Φ).
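The step from A1 and A2 to the golden ratio can be written out explicitly. The derivation below is a sketch that uses only the two assumptions as stated, with A1 read as $f(k)/f(k-1) = \tau/k$:

```latex
% A1 applied repeatedly, then normalized so that the f(k) sum to one:
f(k) = f(0)\,\frac{\tau^k}{k!}, \qquad
\sum_{k \ge 0} f(k) = 1 \;\Rightarrow\; f(0) = e^{-\tau}.

% A2 with the Poisson form, after cancelling the common factor e^{-\tau}:
f(0) + f(1) = 2 f(2)
\;\Rightarrow\; 1 + \tau = \tau^2
\;\Rightarrow\; \tau = \frac{1 + \sqrt{5}}{2} \approx 1.618 .
```

The negative root of $\tau^2 - \tau - 1 = 0$ is discarded because τ is a mean number of thinking steps and must be positive.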



Poisson Distribution

f(k) with mean step of thinking τ: $f(k) = \frac{e^{-\tau} \cdot \tau^k}{k!}$

[Figure: Poisson distributions for various τ (τ=1, τ=1.5, τ=2); horizontal axis: number of steps (0 to 6); vertical axis: frequency (0 to 0.4)]
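The frequencies behind such a plot can be tabulated directly from the Poisson formula. The snippet below is a small sketch using only the standard library; the helper name `poisson_pmf` is an assumption.

```python
import math


def poisson_pmf(k, tau):
    """f(k) = exp(-tau) * tau**k / k!, the share of k-step thinkers."""
    return math.exp(-tau) * tau ** k / math.factorial(k)


# Tabulate steps 0..6 for the three values of tau named in the figure
for tau in (1.0, 1.5, 2.0):
    row = [round(poisson_pmf(k, tau), 3) for k in range(7)]
    print(f"tau={tau}: {row}")
```

Larger τ shifts mass toward higher thinking steps, which is the pattern the figure is meant to show.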



Theoretical Properties of CH Model

Theory: as τ → ∞, the CH prediction converges to Nash equilibrium in (weakly) dominance solvable games (e.g., p-Beauty Contest)

Advantages over Nash equilibrium

Can “solve” multiplicity problem (also picks one statistical distribution)

Sensible interpretation of mixed strategies (de facto purification)



Results in various p-BC games

Subject Pool | Group Size | Mean | Error (Nash) | Error (CH) | τ

CEOs | 20 | 37.9 | -37.9 | -0.1 | 1.0

80 year olds | 33 | 37.0 | -37.0 | -0.1 | 1.1

Economics PhDs | 16 | 27.4 | -27.4 | 0.0 | 2.3

Portfolio Managers | 26 | 24.3 | -24.3 | 0.1 | 2.8

Game Theorists | 27-54 | 19.1 | -19.1 | 0.0 | 3.7



Example 2: Median-Effort Game



Actual Choices in Median-Effort Game Over Time

[Figure 1a: Actual choice frequencies, plotted by Strategy (1 to 7) and Period (1 to 10); frequency axis from 0.000 to 0.600]

Van Huyck, Battalio, and Beil (1990)



The learning setting

In games where each player is aware of the payoff table

Player i’s strategy space consists of discrete choices indexed by j (e.g., 1, 2,…, 7)

The game is repeated for several rounds

At each round, all players observed:

Strategy or action history of all other players

Own payoff history



Research question

To develop a good descriptive model of adaptive learning to predict the probability $P_{ij}(t)$ of player i (i=1,…,n) choosing strategy j at round t in any game



Criteria of a “good” model

Satisfies plausible principles of behavior (i.e., conformity with other sciences such as psychology)

Fits and predicts choice behavior well

Ability to generate new insights



“Laws” of Effects in Learning

Law of actual effect: successes increase the probability of chosen strategies

Teck is more likely than Colin to “stick” to his previous choice (other things being equal)

Law of simulated effect: strategies with simulated successes will be chosen more often

Colin is more likely to switch to T than M

Law of diminishing effect: Incremental effect of reinforcement diminishes over time

$1 has more impact in round 2 than in round 7.

Row player’s payoff table

      L    R
T     8    8
M     5   10
B     4    9

Colin chose B and received 4

Teck chose M and received 5

Their opponents chose L



EWA Learning and its Special Cases

Choice reinforcement learning (Thorndike, 1911; Bush and Mosteller, 1955; Herrnstein, 1970; Arthur, 1991; Erev and Roth, 1998): successful strategies played again

Belief-based Learning (Cournot, 1838; Brown, 1951; Fudenberg and Levine 1998): form beliefs based on opponents’ action history and choose according to expected payoffs

EWA learning nests both as special cases



Assumptions of Reinforcement and Belief Learning

Reinforcement learning ignores simulated effect

Belief learning predicts actual and simulated effects are equally strong

EWA learning allows for a positive (and smaller than actual) simulated effect



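The EWA Model

Initial attractions and experience (i.e., $A_{ij}(0)$, $N(0)$)

Updating rules

$$A_{ij}(t) = \begin{cases} \dfrac{\phi \cdot N(t-1) \cdot A_{ij}(t-1) + \pi_i(s_{ij}, s_{-i}(t))}{N(t)} & \text{if } s_{ij} = s_i(t) \\[2ex] \dfrac{\phi \cdot N(t-1) \cdot A_{ij}(t-1) + \delta \cdot \pi_i(s_{ij}, s_{-i}(t))}{N(t)} & \text{if } s_{ij} \neq s_i(t) \end{cases}$$

$$N(t) = \rho \cdot N(t-1) + 1$$

Choice probabilities

$$P_{ij}(t+1) = \frac{e^{\lambda \cdot A_{ij}(t)}}{\sum_{k=1}^{m_i} e^{\lambda \cdot A_{ik}(t)}}$$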

Camerer and Ho (Econometrica, 1999)
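The updating rule and the logit choice rule can be sketched in code. This is a minimal illustration, not the authors' estimation code: the function names, the parameter values, and the two-strategy example (payoffs 8 and 4 against L, with the player having chosen the second strategy) are assumptions picked for readability.

```python
import math


def ewa_update(A, N, chosen, payoffs, phi, delta, rho):
    """One EWA step.

    A: attractions from the previous round, N: previous experience weight.
    chosen: index of the strategy actually played this round.
    payoffs[j]: payoff strategy j earned (or would have earned) this round.
    """
    N_new = rho * N + 1.0
    A_new = []
    for j, (a, pay) in enumerate(zip(A, payoffs)):
        # Chosen strategy gets full weight; foregone payoffs are weighted by delta
        weight = 1.0 if j == chosen else delta
        A_new.append((phi * N * a + weight * pay) / N_new)
    return A_new, N_new


def choice_probs(A, lam):
    """Logit choice probabilities; the max is subtracted for numerical stability."""
    m = max(A)
    exps = [math.exp(lam * (a - m)) for a in A]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: strategies (T, B), player chose B, opponent chose L
A1, N1 = ewa_update([0.0, 0.0], 1.0, chosen=1, payoffs=[8.0, 4.0],
                    phi=0.9, delta=0.5, rho=0.9)
P2 = choice_probs(A1, lam=1.0)
print(A1, N1, P2)
```

In this example the foregone payoff of 8 is discounted by δ = 0.5, so it reinforces T exactly as much as the realized payoff of 4 reinforces B.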



EWA Model and Laws of Effects

Law of actual effect: successes increase the probability of chosen strategies (positive incremental reinforcement increases attraction and hence probability)

Law of simulated effect: strategies with simulated successes will be chosen more often ( δ > 0 )

Law of diminishing effect: Incremental effect of reinforcement diminishes over time ( N(t) >= N(t-1) )

$$A_{ij}(t) = \begin{cases} \dfrac{\phi \cdot N(t-1) \cdot A_{ij}(t-1) + \pi_i(s_{ij}, s_{-i}(t))}{N(t)} & \text{if } s_{ij} = s_i(t) \\[2ex] \dfrac{\phi \cdot N(t-1) \cdot A_{ij}(t-1) + \delta \cdot \pi_i(s_{ij}, s_{-i}(t))}{N(t)} & \text{if } s_{ij} \neq s_i(t) \end{cases}$$

$$N(t) = \rho \cdot N(t-1) + 1$$



The EWA model: An Example

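History: Period 1 = (B, L)

Row player’s payoff table

      L    R
T     8    9
B     4   10

Period 0: $A_T(0)$, $A_B(0)$

Period 1:

$$A_T(1) = \frac{\phi \cdot N(0) \cdot A_T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}$$

$$A_B(1) = \frac{\phi \cdot N(0) \cdot A_B(0) + 4}{\rho \cdot N(0) + 1}$$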


Reinforcement Model: An Example

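History: Period 1 = (B, L)

Row player’s payoff table

      L    R
T     8    9
B     4   10

Period 0: $R_T(0)$, $R_B(0)$

Period 1:

$$R_T(1) = \phi \cdot R_T(0)$$

$$R_B(1) = \phi \cdot R_B(0) + 4$$

EWA:

$$A_T(1) = \frac{\phi \cdot N(0) \cdot A_T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}$$

$$A_B(1) = \frac{\phi \cdot N(0) \cdot A_B(0) + 4}{\rho \cdot N(0) + 1}$$

If $\delta = 0$, $\rho = 0$, $N(0) = 1$ $\Rightarrow$ EWA = RF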
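The reduction of EWA to choice reinforcement can be checked numerically. The sketch below assumes the update formulas as reconstructed from this example; the function names, the initial values and φ = 0.8 are hypothetical.

```python
def ewa_update(A, N, chosen, payoffs, phi, delta, rho):
    """One EWA step: returns updated attractions and experience weight."""
    N_new = rho * N + 1.0
    A_new = [
        (phi * N * a + (1.0 if j == chosen else delta) * pay) / N_new
        for j, (a, pay) in enumerate(zip(A, payoffs))
    ]
    return A_new, N_new


def reinforcement_update(R, chosen, payoff, phi):
    """Cumulative choice reinforcement: only the chosen strategy is reinforced."""
    return [phi * r + (payoff if j == chosen else 0.0) for j, r in enumerate(R)]


# Strategies (T, B); period-1 history (B, L): the player chose B and earned 4,
# while T would have earned 8. Initial values are hypothetical.
R0 = [2.0, 3.0]
phi = 0.8
ewa, N1 = ewa_update(R0, 1.0, chosen=1, payoffs=[8.0, 4.0],
                     phi=phi, delta=0.0, rho=0.0)
rf = reinforcement_update(R0, chosen=1, payoff=4.0, phi=phi)
print(ewa, rf)
```

With δ = 0 the foregone payoff of 8 is ignored, and with ρ = 0 and N(0) = 1 the denominator stays at 1, so both updates produce the same numbers.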



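Belief-based (BB) model: An Example

Row player’s payoff table

      L    R
T     8    9
B     4   10

Period 0:

$$N(0) = N_L(0) + N_R(0)$$

$$B_L(0) = \frac{N_L(0)}{N(0)}, \qquad B_R(0) = \frac{N_R(0)}{N(0)}$$

$$E_T(0) = 8 \cdot B_L(0) + 9 \cdot B_R(0) = \frac{8 \cdot N_L(0) + 9 \cdot N_R(0)}{N(0)}$$

$$E_B(0) = 4 \cdot B_L(0) + 10 \cdot B_R(0) = \frac{4 \cdot N_L(0) + 10 \cdot N_R(0)}{N(0)}$$

Period 1:

$$B_L(1) = \frac{\rho \cdot N_L(0) + 1}{\rho \cdot N(0) + 1}, \qquad B_R(1) = \frac{\rho \cdot N_R(0) + 0}{\rho \cdot N(0) + 1}$$

$$E_T(1) = 8 \cdot B_L(1) + 9 \cdot B_R(1) = \frac{\rho \cdot N(0) \cdot E_T(0) + 8}{\rho \cdot N(0) + 1}$$

$$E_B(1) = 4 \cdot B_L(1) + 10 \cdot B_R(1) = \frac{\rho \cdot N(0) \cdot E_B(0) + 4}{\rho \cdot N(0) + 1}$$

Bayesian Learning with Dirichlet priors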



Relationship between Belief-based (BB) and EWA Learning Models

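BB:

$$B_L(1) = \frac{\rho \cdot N_L(0) + 1}{\rho \cdot N(0) + 1}, \qquad B_R(1) = \frac{\rho \cdot N_R(0) + 0}{\rho \cdot N(0) + 1}$$

$$E_T(1) = 8 \cdot B_L(1) + 9 \cdot B_R(1) = \frac{\rho \cdot N(0) \cdot E_T(0) + 8}{\rho \cdot N(0) + 1}$$

$$E_B(1) = 4 \cdot B_L(1) + 10 \cdot B_R(1) = \frac{\rho \cdot N(0) \cdot E_B(0) + 4}{\rho \cdot N(0) + 1}$$

EWA:

$$A_T(1) = \frac{\phi \cdot N(0) \cdot A_T(0) + \delta \cdot 8}{\rho \cdot N(0) + 1}$$

$$A_B(1) = \frac{\phi \cdot N(0) \cdot A_B(0) + 4}{\rho \cdot N(0) + 1}$$

If $\delta = 1$, $\rho = \phi$ $\Rightarrow$ EWA = BB

BB ⊂ EWA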
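The equivalence can be verified numerically for the two-strategy example. In this sketch the prior counts and ρ = 0.9 are hypothetical, and the EWA attractions are initialized at the period-0 expected payoffs with N(0) equal to the total prior count, which is the initialization under which the two models coincide.

```python
# Row player's payoffs from the example: rows T, B; columns L, R
PAYOFF = {"T": {"L": 8.0, "R": 9.0}, "B": {"L": 4.0, "R": 10.0}}


def expected_payoffs(n_L, n_R):
    """Expected payoff of each row strategy given belief counts on L and R."""
    total = n_L + n_R
    return {s: (row["L"] * n_L + row["R"] * n_R) / total
            for s, row in PAYOFF.items()}


rho = 0.9
n_L, n_R = 2.0, 1.0          # hypothetical prior counts N_L(0), N_R(0)
N0 = n_L + n_R
E0 = expected_payoffs(n_L, n_R)

# Belief learning: decay old counts by rho, add one observation of L
E1_bb = expected_payoffs(rho * n_L + 1.0, rho * n_R)

# EWA with delta = 1 and phi = rho, starting from A(0) = E(0), N(0) = N0.
# Every strategy is reinforced by its payoff against the observed action L.
E1_ewa = {s: (rho * N0 * E0[s] + PAYOFF[s]["L"]) / (rho * N0 + 1.0)
          for s in PAYOFF}

print(E1_bb, E1_ewa)
```

Both routes give the same period-1 values, which illustrates why updating beliefs and then computing expected payoffs is the same as directly reinforcing all strategies with full weight on foregone payoffs.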



Model Interpretation

Simulation or attention parameter (δ): measures the degree of sensitivity to foregone payoffs

Exploitation parameter ($\kappa = \frac{\phi - \rho}{\phi}$): measures how rapidly players lock in to a strategy (average versus cumulative)

Stationarity or motion parameter (φ): measures players’ perception of the degree of stationarity of the environment



Model Interpretation

[Figure: special cases of EWA labeled in the parameter space: Cournot, Weighted Fictitious Play, Fictitious Play, Average Reinforcement, Cumulative Reinforcement]



New Insight

Reinforcement and belief learning were thought to be fundamentally different for 50 years.

For instance, “….in rote [reinforcement] learning success and failure directly influence the choice probabilities. … Belief learning is very different. Here experiences strengthen or weaken beliefs. Belief learning has only an indirect influence on behavior.” (Selten, 1991)

EWA shows that belief and reinforcement learning are related and special kinds of EWA learning



Actual versus EWA Model Frequencies

[Figure 1d: EWA model frequencies, shown alongside the actual choice frequencies; axes: Strategy (1 to 7), Period (1 to 10), frequency (0.000 to 0.600)]



Estimation and Results

Game: Median Action (M=378)

Model | No. of Parameters | Calibration LL | Calibration AIC | Calibration BIC | Calibration ρ² | Validation LL | Validation MSD

1-Segment

Random Choice | 0 | -677.29 | -677.29 | -677.29 | 0.0000 | -315.24 | 0.1217

Choice Reinforcement | 8 | -341.70 | -349.70 | -365.44 | 0.4837 | -80.27 | 0.0301

Belief-based | 9 | -438.74 | -447.74 | -465.45 | 0.3389 | -113.90 | 0.0519

EWA | 11 | -309.30 | -320.30 | -341.94* | 0.5271 | -41.05 | 0.0185

2-Segment

Random | 0 | -677.29 | -677.29 | -677.29 | 0.0000 | -315.24 | 0.1217

Choice Reinforcement | 17 | -331.25 | -348.25 | -381.70 | 0.4858 | -66.32 | 0.0245

Belief-based | 19 | -379.24 | -398.24 | -435.62 | 0.4120 | -70.31 | 0.0250

EWA | 23 | -290.25 | -313.25* | -358.51 | 0.5374* | -34.79* | 0.0139*
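The calibration columns appear to be linked by simple formulas. The sketch below uses definitions inferred from the reported numbers rather than stated on the slide: AIC = LL − k, BIC = LL − (k/2)·ln(M), and ρ² = 1 − AIC/LL of random choice. They reproduce the 1-segment rows up to rounding, but should be read as an assumption.

```python
import math

M = 378               # sample size reported for the Median Action game
LL_RANDOM = -677.29   # log-likelihood of the random-choice benchmark


def fit_criteria(ll, k, m=M, ll0=LL_RANDOM):
    """Return (AIC, BIC, rho^2) using the formulas inferred from the table.

    ll: calibration log-likelihood, k: number of free parameters.
    """
    aic = ll - k                        # penalize one unit per parameter
    bic = ll - 0.5 * k * math.log(m)    # penalty grows with the sample size
    rho2 = 1.0 - aic / ll0              # pseudo R-squared against random choice
    return aic, bic, rho2


# 1-segment Choice Reinforcement row: LL = -341.70 with 8 parameters
aic, bic, rho2 = fit_criteria(-341.70, 8)
print(round(aic, 2), round(bic, 2), round(rho2, 4))

# 1-segment EWA row: LL = -309.30 with 11 parameters
aic_ewa, bic_ewa, rho2_ewa = fit_criteria(-309.30, 11)
print(round(aic_ewa, 2), round(bic_ewa, 2), round(rho2_ewa, 4))
```

All three criteria are on the log-likelihood scale here, so larger (less negative) values indicate better fit after the penalty for extra parameters.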



Extensions

Heterogeneity (JMP, Camerer and Ho, 1999)

Payoff learning (EJ, Camerer, Ho, and Wang, 2006)

Sophistication and strategic teaching

Sophisticated learning (JET, Camerer, Ho, and Chong, 2002)

Reputation building (GEB, Chong, Camerer, and Ho, 2006)

EWA Lite (Self-tuning EWA learning) (JET, Ho, Camerer, and Chong, forthcoming)

Applications:

Auction markets (Book Chapter, Camerer, Ho, and Hsia, 2000)

Product Choice at Supermarkets (JMR, Ho and Chong, 2004)



Example 3: Ultimatum game

Proposer offers division of $10; responder accepts or rejects

Empirical Regularities:

There are very few offers above $5

Between 60-80% of the offers are between $4 and $5

There are almost no offers below $2

Low offers are frequently rejected and the probability of rejection decreases with the offer

Self-interest predicts that the proposer would offer 10 cents to the responder and that the latter would accept



Could Subjects in US Learn to be More Self-Interested Over Time? (Roth et al., 1991)



Does Stake Size Matter? (Slonim-Roth, 1998)



Do players learn to accept low offers at high stakes? No.



Ultimatum offer experimental sites

Henrich et al. (2001; 2005)



Ultimatum offers across 16 small societies (mean shaded, mode is largest circle…)

Mean offers: range 26%-58%



Low offers and low offers rejection rate



Modeling Challenge & Major Classes of Theories

The challenge is to have a general, precise, psychologically plausible model of social preferences

Three major classes of theories:

Distributional (inequity-aversion: Fehr-Schmidt, Bolton-Ockenfels, Charness-Rabin)

Reciprocal (Rabin)

Self-image (Levine, Bernheim, Rotemberg)



A Model of Social Preference (Charness and Rabin, 2002)

Below is a model that embeds the first 2 classes of theories. Player B’s utility is given as:

$$U_B(\pi_A, \pi_B) = (\rho \cdot r + \sigma \cdot s + \theta \cdot q) \cdot \pi_A + (1 - \rho \cdot r - \sigma \cdot s - \theta \cdot q) \cdot \pi_B$$

where $r = 1$ if $\pi_B > \pi_A$, and 0 otherwise; $s = 1$ if $\pi_B < \pi_A$, and 0 otherwise; and $q = -1$ if A has misbehaved, and 0 otherwise.

B’s utility is a weighted sum of her own monetary payoff and A’s payoff, where the weight placed on A’s payoff depends on whether A is getting a higher or lower payoff than B and whether A has behaved unfairly.



Parameters of the Model

The parameter θ models reciprocity

The parameters ρ and σ allow for a range of different distributional preferences

Difference aversion corresponds to σ < 0 < ρ < 1. (B likes money, and prefers that payoffs are equal, and prefers to lower A’s payoff when A does better than B)

Social-welfare preferences correspond to $1 \geq \rho \geq \sigma > 0$. (Subjects prefer more for themselves and the other person, but are more in favor of getting payoffs for themselves when they are behind than when they are ahead)

Competitive preferences correspond to $\sigma \leq \rho \leq 0$. (consistent with psychology of status, i.e., people like their payoffs to be high relative to others’ payoffs)
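To see how these parameters bear on ultimatum-game rejections, the utility function can be sketched in code with B as the responder. This is an illustration only: the function names, the parameter values, and the rule that a rejection leaves both players with zero payoff (hence zero utility, with ties resolved as acceptance) are assumptions.

```python
def utility_B(pi_A, pi_B, rho, sigma, theta=0.0, a_misbehaved=False):
    """Player B's utility as a weighted sum of A's and B's monetary payoffs."""
    r = 1 if pi_B > pi_A else 0       # B is ahead
    s = 1 if pi_B < pi_A else 0       # B is behind
    q = -1 if a_misbehaved else 0     # reciprocity term
    w = rho * r + sigma * s + theta * q   # weight on A's payoff
    return w * pi_A + (1 - w) * pi_B


def responder_accepts(offer, pie=10.0, **params):
    """Accept if utility from the split is at least the utility of (0, 0)."""
    return utility_B(pie - offer, offer, **params) >= 0.0


# Hypothetical difference-averse responder (sigma < 0 < rho < 1)
averse = {"rho": 0.4, "sigma": -0.5}
print(responder_accepts(2.0, **averse))   # low offer
print(responder_accepts(4.0, **averse))   # near-equal offer

# Purely self-interested responder (rho = sigma = 0)
print(responder_accepts(0.1, rho=0.0, sigma=0.0))
```

With σ = −0.5 the responder's utility from an offer x out of 10 is 2x − 5, so offers below 2.5 are rejected, while a self-interested responder accepts any positive offer.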



Estimated Parameters



What Can Neuroscience Contribute?

What does mental activity look like when mutual consistency condition is met (during equilibration)?

There is no room for skill difference in equilibrium analysis. In CH model, we have a precise concept of skill. Is it possible to create a map between mental activity and superior performance in a certain class of games?

How does mental activity tell us which learning theories are correct? EWA predicts both actual and simulated effects and that the former is stronger than the latter (mental activity at striatal-cortical loop versus cortex).

Is there a difference in mental activity when reciprocal or distributional fairness is invoked?

http://128.32.67.154/siq/default1.asp



Estimates of Mean Thinking Step τ



Figure 3b: Predicted Frequencies of Nash Equilibrium for Entry and Mixed Games

Nash: Theory vs. Data

[Scatter plot: Predicted Frequency versus Empirical Frequency, both axes from 0 to 1; fitted line y = 0.707x + 0.1011, R² = 0.4873]



Figure 2b: Predicted Frequencies of Cognitive Hierarchy Models for Entry and Mixed Games (common τ)

CH Model: Theory vs. Data

[Scatter plot: predicted frequency versus Empirical Frequency, both axes from 0 to 1; fitted line y = 0.8785x + 0.0419, R² = 0.8027]



Actual versus Belief-Based Model Frequencies

[Figure 1b: Belief-based model frequencies, shown alongside the actual choice frequencies; axes: Strategy (1 to 7), Period (1 to 10), frequency (0.000 to 0.600)]




Actual versus Reinforcement Model Frequencies

[Figure 1c: Choice reinforcement model frequencies, shown alongside the actual choice frequencies; axes: Strategy (1 to 7), Period (1 to 10), frequency (0.000 to 0.600)]




Fair offers correlate with market integration (top), payoffs to cooperation in everyday life (bottom)

MI: How much do people rely on market exchange in their daily lives?

PC: How important and how large is a group’s payoff from cooperation in economic production?