Meaningful Learning in Weighted Voting Games: An Experiment · meaningful learning with no feedback...

ソシオネットワーク戦略ディスカッションペーパーシリーズ ISSN 1884-9946

第 62 号 2018 年 6 月

RISS Discussion Paper Series

No.62 June, 2018

文部科学大臣認定共同利用・共同研究拠点

関西大学ソシオネットワーク戦略研究機構

The Research Institute for Socionetwork Strategies, Kansai University

Joint Usage / Research Center, MEXT, Japan

Suita, Osaka, 564-8680, Japan

URL: http://www.kansai-u.ac.jp/riss/index.html

e-mail: [email protected]

tel. 06-6368-1228

fax. 06-6330-3304

Supplementary Results for“Meaningful Learning

in Weighted Voting Games: An Experiment”

Naoki Watanabe

文部科学大臣認定共同利用・共同研究拠点

関西大学ソシオネットワーク戦略研究機構

The Research Institute for Socionetwork Strategies, Kansai University

Joint Usage / Research Center, MEXT, Japan

Suita, Osaka, 564-8680, Japan

URL: http://www.kansai-u.ac.jp/riss/index.html

e-mail: [email protected]

tel. 06-6368-1228

fax. 06-6330-3304

Supplementary Results for“Meaningful Learning

in Weighted Voting Games: An Experiment”

Naoki Watanabe

Supplementary Results for “Meaningful Learning

in Weighted Voting Games: An Experiment”∗

Naoki Watanabe†

March 19, 2020

Abstract

This paper supplements the experimental results presented by Guerci et al. (2017),

where subjects choose one of two weighted voting games (options), as follows. (1) It

was reconfirmed that without any payoff-related feedback information, subjects learned

to choose the option that generates higher expected payoffs for them (correct option)

and that they generalized what they had thought introspectively in a binary choice

problem to a similar but different one (meaningful learning). (2) The subjects who had

experienced a binary choice problem without any feedback information, in some peri-

ods when they were first faced with the similar but different binary choice problem in

which their meaningful learning was observed, took as a long time on average to make

their decision as those who received immediate feedback information. Those subjects

spent a significantly longer time than inexperienced subjects who were faced with the

same binary choice problem. (3) When subjects received immediate feedback informa-

tion about payoffs, in many cases, we could not reject the hypothesis that experienced

subjects did not take the win-stay-lose-shift strategy or we might not think that they

searched for the correct options with a certain rule.

Keywords: meaningful learning, bandit experiment, weighted voting, feedback infor-

mation, introspective thinking

JEL Classification: C91, D72, D83∗This paper is an outgrowth of the author’s previous work. The author wishes to thank his co-authors of

that paper, Eric Guerci and Nobuyuki Hanaki, for giving him their permission to reuse the data, and DanielFriedman, Midori Hirokawa, Yoichi Hizen, Tatsuya Kameda, and Naoko Nishimura for their comments onthat previous work which motivated him to provide these supplementary results. This research could nothave been completed without the research assistance of Yoichi Izunaga, Toru Suzuki, Hiroshi Tanaka, andYosuke Watanabe. Financial support from Foundation for the Fusion of Science and Technology (FOST),MEXT Grants-in-Aid 24330078 and 25380222, JSPS-ANR bilateral research grant “BECOA” (ANR-11-FRJA-0002), and Joint Usage/Research of ISER at Osaka University are gratefully acknowledged.

†Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi Kohoku, Yokohama, Kana-gawa 223-8526, Japan; E-mail: [email protected]; Phone: +81-45-564-2039

1

1 Introduction

A major question in experimental studies on learning in games is whether subjects can gen-

eralize what they have learned in a situation to other similar but different situations, which

is called “meaningful learning” (Rick and Weber, 2010), “transfer of learning” (Cooper

and Kagel, 2003, 2008), or “epiphany” (Dufwenberg et al., 2010).1 We investigate this

higher-order concept of learning in a situation related to weighted voting games and aim to

supplement the results shown in Guerci et al. (2017).

In weighted voting experiments (Montero et al., 2008; Guerci et al., 2014), the outcomes

varied largely from one round to another in each session, which implies that in those strategic

situations it would be too complex for subjects to make deep inference about the underlying

relationships between their nominal votes and expected payoffs. Guerci et al. (2017) thus

drastically simplified the experimental design to remove subjects’ learning through their

strategic interaction; in each session subjects choose one of two weighted voting games

(options) repeatedly and obtain their payoffs which are stochastically generated for each

choice they make according to a voting theory. The binary choice problems are different

between the first and second parts of the session, but the payoff generating function in

binary choice problems remains the same throughout the experiment. Subjects thus have a

chance to learn something underlying the situation they face in the first part and to apply

what they learned in the first part to their decision in the second part.

By focusing on individual learning in the way described above, Guerci et al. (2017)

observed that (i) subjects learned to choose the option that generates higher expected

payoffs for them (correct option) even without any payoff-related feedback information

and that (ii) there was statistically significant evidence of meaningful learning only for

the treatment with no payoff-related feedback information.2 In their sessions, however,

meaningful learning with no feedback information might be a consequence of inertia on

subjects’ choice, because the correct options were the same choice (Choice 2) in the first

and second parts of the sequences of binary choice problems. Thus, the first aim of this

paper is to confirm whether observation (ii) was not the consequence of the inertia on

subjects’ choice by changing the correct options between the two parts.

1In the context of learning across multiple games, where a game is played together with other games,“behavioral spillover” (Cason et al., 2012; Bednar et al., 2012) and “learning spillover” (Grimm and Mengel,2012; Mengel and Sciubba, 2014) are related but different concepts from meaningful learning, althoughpossible reasons underlying these spillovers vary and include meaningful learning as one of them.

2Weber (2003) and Rick and Weber (2010) observed that withholding feedback information promotedmeaningful learning in p-beauty contest games. Neugebauer et al. (2009), however, reported that subjectsdid not learn to play the dominant strategy in a public goods game unless they received feedback information.

2

The other aim of this paper is to explore what hindered subjects from meaningful learn-

ing or gaining a deeper insight on the relationship between their nominal votes and expected

payoffs when immediate feedback information was given to them. As noted in Guerci et al.

(2017), in the post-experimental questionnaire, some subjects who received feedback infor-

mation reported that they changed their choices when they received zero points, while they

did not when they received a non-zero amount of points. It would be plausible for subjects

to take this “win-stay-lose-shift” strategy (Nowak and Sigmund, 1993), when they had not

yet been convinced of what they had learned. On the other hand, if subjects neither took

that strategy nor randomized their choice in some way, then it would then be plausible for

us to infer that they surely learned something, although what they learned might be wrong.

Also, when it was difficult for subjects to choose an option, they would need a longer time

for their deep inference, whereas they might give up thinking in a few seconds and choose

options according to the win-stay-lose-shift strategy or randomize their choice in some way.

Without any payoff-related feedback information, however, subjects could not choose the

win-stay-lose-shift strategy, and thus a longer time for introspective thinking, for example,

might facilitate deeper inference without misleading or confusing information, if any.

This paper shows the following results, examining the data taken from the additional

sessions and reexamining the data used in Guerci et al. (2017). (1) It was reconfirmed

that without any feedback information, subjects learned to choose the correct option and

confirmed that in a sequence of binary choice problems, they meaningfully learned even if

the correct option were changed from Choice 1 in the first part to Choice 2 in the second part

in the sequence. (2) The subjects who had experienced a binary choice problem without

any feedback information took as a long time on average to make their decision as those

who received feedback information, in some periods when they were first faced with the

similar but different binary choice problem in which their meaningful learning was observed.

Those experienced subjects spent a significantly longer time than inexperienced subjects

who were faced with the same binary choice problem. (3) When subjects received immediate

feedback information about payoffs, in many cases, we could not reject the hypothesis that

experienced subjects did not take the win-stay-lose-shift strategy or we might not think

that they searched for the correct options with a certain rule.

Those results stated above implies that observation (ii) reported in Guerci et al. (2017)

was not the consequence of the inertia on subjects’ choice. There should be some fac-

tors that promoted subjects’ meaningful learning without any feedback information. In

weighted voting situations, however, feedback information about their payoffs might lead

false inference of subjects or confuse their inference.

3

The rest of this paper proceeds as follows. Section 2 describes the experimental design.

Section 3 presents the results, where we deal with length of time to choose, win-stay-

lose-shift strategy, and some randomness of subjects’ choice. Section 4 briefly reviews the

literature behind this research and summarizes the results of this paper for further research.

2 Experimental design

This paper reexamines the data used in Guerci et al. (2017) and also examines the data

taken from the additional sessions, and thus we need to review the experimental design

for all sessions conducted for that paper. Subsection 2.1 describes the session outline and

Subsection 2.2 explains the experimental procedure. The instructions given to subjects are

attached in the Appendix of Guerci et al. (2017).

2.1 Session outline

There are three treatments, each of which is explained at the end of this subsection. For

each treatment, there are some sessions. Each session consists of 60 periods. In each period,

the subject is asked to choose one of two four-member committees that will divide 120 points

among the members. Let N = {1, 2, 3, 4} be the set of the members (players) to join. A

committee (a weighted voting game) is represented by [q; v1, v2, v3, v4], where q is the quota

(the minimal number of votes required for an allocation to be adopted) and vi is the voting

weight (the number of votes) allocated to player i ∈ N . The subject acts as player 1 in

this experiment. Both committees have the same total number of votes, the same quota,

and the same number of votes for the subject. In each session, the subject faces one binary

choice problem for the first 40 periods, while in the following 20 periods the subjects face

another but different binary choice problem; for example, in the first 40 periods subjects

face a choice between [14; 5, 3, 7, 7] and [14; 5, 4, 6, 7], while in the following 20 periods they

are faced with [6; 1, 2, 3, 4] and [6; 1, 1, 4, 4]. Subjects are not asked to play the weighted

voting games that they choose themselves.3 Before the start of each session, subjects are

clearly informed that the other three members of the committees are all fictitious. Thus,

the experiment is regarded as a two-armed bandit experiment with contextual information

on voting situations related to the possible payoffs for subjects.4

3This point is to avoid the complexities mentioned at the beginning of the Introduction; subjects aresimultaneously learning to play a weighted voting game from interfering with the other subjects learningabout the underlying relationship between their nominal voting weights and their expected payoffs.

4In signaling games, Cooper and Kagel (2003, 2008) showed that the use of meaningful context serves asa catalyst for meaningful learning.

4

The payoff each subject obtains from his or her committee choice is determined by an

external function based on a power index in voting theory called DPI (Deegan and Packel,

1978). Given a weighted voting game, a non-empty subset S of N is called a coalition,

and a coalition is called a winning coalition if∑

i∈S vi ≥ q; otherwise, it is called a losing

coalition. A minimum winning coalition (MWC) is a winning coalition such that deviation

by any member of the coalition turns its status from winning to losing. In the experiment,

for each period, one MWC is drawn with equal probability from all the possible MWCs for

the committee that the subject has chosen. If the subject is a member of the drawn MWC,

then he or she receives an equal share of the total payoff with the other members.

The binary choice problems we use are shown in Table 1 (See Table 13 for more details.).5

We hereafter denote each MWC by the votes apportioned to its members. For example, the

MWCs in a committee [14; 5, 3, 7, 7] are written as (5, 3, 7), (5, 3, 7), and (7, 7). Suppose that

[14; 5, 3, 7, 7] is chosen by a subject (player 1) in a period. Then, player 1 has a 2/3 chance

of being on the MWCs and will receive 1/3 of the total payoff (120 points) for being on

each of the MWCs, while there is a 1/3 probability that player 1 will not be on the MWC

and will receive nothing. We did not explain this underlying payoff generating function

to our subjects; they were simply told that payoffs were determined based on a theory of

decision-making in committees. Note, however, that we use binary choice problems in which

the better committees for the subjects are the same regardless of whether we employ DPI

or other power indices such as BzI (Banzhaf, 1965) and SSI (Shapley and Shubik, 1954).6

Problem Choice 1 (Expected payoff) Choice 2 (Expected payoff)

A [14;5, 3, 7, 7] (120 × 2/3 × 1/3) [14;5, 4, 6, 7] (120 × 3/4 × 1/3)B [6;1, 2, 3, 4] (120 × 1/3 × 1/3) [6;1, 1, 4, 4] (120 × 2/3 × 1/3)C [14;3, 5, 6, 8] (120 × 2/3 ×1/3) [14;3, 6, 6, 7] (120 × 3/4 × 1/3)D [9;1, 3, 5, 6] (120 × 1/3 × 1/3) [9;1, 2, 6, 6] (120 × 2/3 × 1/3)

Table 1: Four binary choice problems used in the experiment. In each committee, subjects are all assignedto player 1, and the number of votes given to player 1 is shown in bold. Choice 2 generates a higher expectedpayoff for the subjects in all binary choice problems listed here.

5In terms of both expected payoffs and the set of MWCs available in each choice, Problems A and Care identical, and Problems B and D are identical. There is a crucial difference between the two options inProblem A ([14; 5, 3, 7, 7] vs. [14; 5, 4, 6, 7]) and Problem C ([14; 3, 5, 6, 8] vs. [14; 3, 6, 6, 7]). In each of thesetwo problems, one option has two “large” voters who can form an MWC on their own, whereas the otheroption does not. In Problem B ([6; 1, 2, 3, 4] vs. [6; 1, 1, 4, 4]) and Problem D ([9; 1, 3, 5, 6] vs. [9; 1, 2, 6, 6]),there is no such clear difference between the two options as there are two large voters who can form anMWC by themselves in both options.

6Guerci et al. (2017) adopted DPI as our payoff generating function because, in all the experimentsreported in Montero et al. (2008), Aleskerov et al. (2009), Esposito et al. (2012), Guerci et al. (2014), andWatanabe (2014), the most frequently observed winning coalitions were MWCs.

5

Subjects are faced with one of the following sequences of binary choice problems: A → B,

B → A, C → D, orD → C (the order of problems is indicated by the arrows), where the first

problem is used in the first 40 periods, and the second problem is used in the subsequent 20

periods. In Guerci et al. (2017), as shown in Table 1, the committee that generates a higher

expected payoff for subjects (correct option) is Choice 2 for all problems listed there. In

the additional sessions, however, the order of the correct options displayed on the subjects’

monitors was changed in each sequence of binary choice problems; the correct option is

Choice 1 in the first 40 periods and it is Choice 2 in the subsequent 20 periods.

We have three feedback treatments: (1) no feedback, (2) partial feedback, and (3) full

feedback. For the no-feedback treatment, subjects are not informed of any payoffs they

receive as the result of their committee choice until the session ends. For the partial-

feedback treatment, each subject is informed of his or her own payoff in the committee he

or she chooses. For the full-feedback treatment, after each choice subjects are informed of

the payoffs of all four players in the committee they chose. We impose a 30-second time

limit for the choice stage and a 10-second limit for the feedback stage, regardless of the

amount of feedback information. If a subject does not choose a committee within the 30

seconds of the choice stage, then he or she obtains zero points for that period. In this case,

regardless of the treatment, in the feedback stage the subject receives special feedback that

he or she has obtained zero points for the period because they failed to make a choice within

the time limit. This rule is clearly stated in the written instructions and explained aloud.

If a subject makes an early choice, say within the first 10 seconds of the choice stage,

then a waiting screen is shown until all the subjects in the session have made their de-

cisions. If all the subjects make their choice before the end of the 30-second time limit,

then they all enter the feedback stage. For the no-feedback treatment, during the 10-second

feedback stage subjects are shown a screen conveying the message “Please wait until the

experiment continues.” For the full-feedback and partial-feedback treatments, the relevant

payoff information is displayed during these 10 seconds.

2.2 Experimental procedure

The sessions whose results were reported in Guerci et al. (2017) were conducted at the

Institute of Social and Economic Research (ISER) at Osaka University in June 2014 and

at the University of Tsukuba in November 2014. There were 3 treatments, and for each

treatment there were 4 sequences of binary choice problems, as described above. Each

session in Osaka (June) involved 20 subjects and each session in Tsukuba (Nov) involved

6

10 subjects, and thus 360 subjects participated in those sessions. The sessions in Tsukuba

(Nov) were conducted for a robustness check to the results obtained in the sessions in Osaka

(June), and the main results were reported in Guerci et al. (2017) with the data pooled at

the two locations. For the no-feedback treatment, the additional sessions were conducted

also at the ISER in September 2014. Each session in Osaka (Sept) involved 10 subjects, and

thus we had 40 subjects in total there. The subjects were undergraduate students recruited

from all over the campus at each location, but third- or fourth-year economics majors were

excluded. Each subject participated once in this experiment, and he or she was randomly

assigned to a session in each month and at each location.

The experiment was computerized with z-Tree (Fischbacher, 2007). Each session lasted

around 60 minutes including the time for administering the instructions and the post-

experiment questionnaire. At the beginning of each session, subjects were provided with a

written instruction upon arrival, and then the experimenter read it aloud. No communi-

cation among subjects was allowed. Subjects were allowed to ask questions regarding the

instruction and they were given the answers which other subjects could hear. Thereafter,

any information available to the subjects was provided through their computer screens. We

followed other bandit experiments (Meyer and Shi, 1995; Hu et al., 2013) for the payment

scheme. In the instruction, each subject was informed that in addition to the show-up fee of

JPY 1000, he or she would receive payment according to the total points he or she obtained

over all 60 periods at a rate of 1 point = JPY 1. The average earning of our subjects was

JPY 2500 (about 18 USD in 2014.)

3 Results

Subsection 3.1 gives a brief overview of the data taken from the sessions in Osaka (Sept).

In Subsections 3.2 and 3.3, we reconfirm learning and meaningful learning of subjects who

participated in sessions for the no-feedback treatment by using the same data processing

methods as the ones used in Guerci et al. (2017). Subsection 3.4 examines all the data we

have and compares the length of time subjects took to determine their choice in sessions for

the no-feedback treatment with the one in sessions for the other treatments. Subsection 3.5

investigates whether subjects took the win-stay-lose shift strategy in sessions for the partial-

feedback and full-feedback treatments by reexamining the data used in Guerci et al. (2017).

In Subsection 3.6 statistically tests the randomness of subjects’ individual choice. In what

follows, we employ 5% significance level in rejecting the null hypotheses.7

7The data are available from the author upon request.

7

3.1 Overview of the data: New Sessions with no feedback information

No-fb(a) A→B (b) B→A

(c) C→D (d) D→C

Figure 1: Time series of the percentage of subjects who chose the correct option among those who chosewithin the time limit in the sessions conducted in Osaka (Sept). 10 out of 40 subjects failed to make a choicewithin the time limit at least once in the 60 periods. The correct option was Choice 1 in the first 40 periodsand was Choice 2 in the subsequent 20 periods.

Figure 1 presents the time series of the percentage of subjects in Osaka (Sept) who chose

the correct option among those who chose within the time limit in each of the four sequences

of binary choice problems; the dotted vertical line in each panel separates Period 40 and

Period 41 to indicate that the problems were different in the periods before and after the

line. In the additional sessions, as noted in Subsection 2.1, the correct option was Choice 1

in the first 40 periods, which was changed to Choice 2 in the subsequent 20 periods.

In Period 1, as is seen in Figure 1, subjects found the correct option easier in Problem A

(C) than in Problem B (D), which was reported also in Guerci et al. (2017) for all the three

feedback treatments. Further, we could not reject the null hypothesis that the percentage

of subjects who chose the correct option among those who chose within the time limit in

Period 1 were equal between the additional sessions and the original sessions conducted for

Guerci et al. (2017) in any of the four binary choice problems; the p-values for the two-sided

χ2 test are 0.526, 0.559, 0.853, and 0.673 in Problems A, B, C, and D, respectively. Thus,

it is not inappropriate to compare the new data with those used in Guerci et al. (2017) in

the following subsections, although we should continue to care about this point there.

8

The features of the new sessions are summarized in Table 2.8

Table 2: Features of the new sessions.

session sequence of show-up point-to- # of session avg. paymentno. binary choice fee (JPY) JPY ratio subj. date to subject

1 A → B 1000 1.0 10 Sept.24, 2014 25522 B → A 1000 1.0 10 Sept.24, 2014 24423 C → D 1000 1.0 10 Sept.25, 2014 26284 D → C 1000 1.0 10 Sept.25, 2014 2472

3.2 Learning the correct option in new sessions: Osaka (Sept)

In Subsectoins 3.2 and 3.3, we analyze the time series of individual choice. Let FRik denote

the relative frequency of periods in which subject i chose the correct option within the k-th

block of 5 consecutive periods, that is, from 5(k − 1) + 1 to 5k. FRi2 is, e.g., the number of

times subject i chose the correct option in Period 6 to Period 10, divided by 5. The change

in the relative frequencies that subject i chose the correct option between the l-th block

and the m-th block is defined as

∆FRil,m = FRi

l − FRim.

We hereafter drop superscript i from ∆FRil,m and simply refer to them as ∆FRl,m. For

each of four sequences of binary choice problems, we had 10 subjects in Osaka (Sept) who

participated in the treatment with no payoff-related feedback information, and thus we have

10 observations of ∆FRl,m for each sequence of binary choice problems.

We expect experience in introspective thinking to improve learning for choosing the

correct option even without any payoff-related feedback information. For each binary choice

problem, the p-value for the one-tailed signed-rank (SR) test for each feedback treatment

is thus reported in Table 3, where the null hypothesis is ∆FR2,1 ≤ 0 (∆FR8,1 ≤ 0) and the

alternative hypothesis is ∆FR2,1 > 0 (∆FR8,1 > 0).

As shown in Table 3, for all problems, values in ∆FR2,1 were not significantly greater

than zero but values in ∆FR8,1 were significantly greater than zero. Thus, we see that,

overall, the distributions of ∆FR8,1 lie towards the right of those for ∆FR2,1, which may

imply that more subjects learned to choose the correct option when they were given more

time to learn even without any payoff-related feedback information.

8A more detailed summary is available upon request.

9

Problem A Problem B Problem C Problem D

∆FR2,1 0.2948 0.1763 0.1990 0.3392∆FR8,1 0.0339 0.0463 0.0125 0.0142

Table 3: P-values of one-tailed SR test in the sessions for the no-feedback treatment.

Observation 1. It was reconfirmed that subjects learned to choose the option with higher

expected payoffs even without any feedback information regarding their payoffs.9

3.3 Meaningful learning (learning transfer) in new sessions: Osaka (Sept)

Guerci et al. (2017) considered that for each binary choice problem, meaningful learning

was observed if the following two criteria are both satisfied. We say that subjects are

experienced if they have completed their choice in the first 40 periods.

(a) There was a significant increase in number of subjects who chose the correct option

between Period 1 and Period 41.

(b) There was a significant increase from FR1 to FR9.

Note that in criteria (a) and (b), subjects who chose in Period 1 or for FR1 and those in

Period 41 or for FR9 are faced with different sequences of binary choice problems.

First, we consider whether criterion (a) is satisfied. The left-most column No-fb (10) in

Table 4 presents the numbers of subjects who chose the correct option in Period 1 and Period

41 in each binary choice problem in the additional sessions for the no-feedback treatment.

For reference, Table 4 also lists those numbers reported in Guerci et al. (2017) for the

no-feedback treatment, the partial-feedback treatment, and the full-feedback treatment,

respectively, in the columns No-fb (30), Part-fb (30), and Full-fb (30). The numbers of

subjects who participated in sessions for those treatments are noted in parentheses.

The p-values for the χ2 test are also reported in Table 4, where the null hypothesis

is that the percentages of inexperienced subjects and experienced subjects who chose the

correct options are the same when they first encountered the same problem. Significantly

more experienced subjects chose the correct options than inexperienced subjects for the

no-feedback treatment (No-fb (30)) in Problem B (p = 0.002), the additional sessions for

the no-feedback treatment (No-fb (10)) in Problem C (p = 0.019), and for the no-feedback

9Guerci et al. (2007) reported that in Problem B and Problem D, subjects learned to choose the optionwith higher expected payoffs even without any payoff-related feedback information.

10

No-fb No-fb Part-fb Full-fb(10) (30) (30) (30)

Problem A

Inexperienced 7 20 25 22Experienced 7 15 16 17

p-value (χ2) 1.000 0.190 0.012 0.176

Problem B


p-value (χ2) 0.653 0.002 0.176 0.781

Problem C


p-value (χ2) 0.019 0.260 1.000 1.000

Problem D


p-value (χ2) 0.025 <0.001 0.003 0.114

Table 4: Numbers of subjects who chose the correct option in Period 1 and Period 41 for the binary choiceproblems used in the additional and the original sessions for the no-feedback treatment (No-fb (10), No-fb(30)), the partial-feedback treatment (Part-fb(30)), and the full-feedback treatment (Full-fb (30)). For eachtreatment, the number of subjects who participated in sessions for the treatment is noted in parentheses. Thep-values for χ2 tests are reported for comparison between inexperienced and experienced subjects. Guerciet al. (2017) reported the p-values for No-fb (30), part-fb (30), and Full-fb (30).

(No-fb (10), No-fb (30)) and the partial-feedback treatments in Problem D (p = 0.025,

p < 0.001, and p = 0.003, respectively). Thus, those five cases satisfied criterion (a).

Next, we consider whether criterion (b) is satisfied by comparing the relative frequency in

which inexperienced subject i chose the correct option within the first block of 5 consecutive

periods (Periods 1 to 5) with the relative frequency in which experienced subject j chose

the correct option within the first block of 5 consecutive periods when they faced the same

problem within the ninth block of 5 consecutive periods (Periods 41 to 45). Table 5 lists

p-values for the Mann-Whitney U-test (MW) test as well as the Kolmogorov-Smirnov test

(KS) of the distributions of FR1 (inexperienced subjects) and FR9 (experienced subjects)

for each feedback treatment in each binary choice problem, where the null hypothesis is

FR1 = FR9. For No-fb (10), the p-values are not calculated with the MW test and the KS

11


Problem A

p-value (perm, MW) 0.445 0.867 0.854 0.547p-value (KS) 0.913 0.889 0.283

Problem B


Problem C


Problem D

p-value (perm, MW) 0.001 <0.001 0.264 0.791p-value (KS) <0.001 0.103 0.990

Table 5: P-values for testing FR1 = FR9. Guerci et al. (2017) reported the p-values for the Mann-WhitneyU-test (MW) and the Kolmogorov-Smirnov test (KS). For No-fb (10), the p-values are not calculated withthe MW test and the KS test but with the permutation test (perm), because of the small sample size. InNo-fb (10), criteria (a) and (b) are satisfied only for Problem D.

test but with a permutation test (perm), because of the small sample size.

In sessions for the partial-feedback and full-feedback treatments, as shown in Table 5,

the p-values for the MW and KS tests suggest that there be no significant difference between

FR1 and FR9 for all problems. Thus, meaningful learning was not observed for the partial-

feedback and full-feedback treatments. In sessions for No-fb (30), however, there were

significant differences between FR1 and FR9 in Problem B (p = 0.049 for MW and 0.030 for

KS, respectively) and in Problem D (p < 0.001 for MW and < 0.001 for KS, respectively).

For Problems B and D, criteria (a) and (b) were thus both satisfied. Therefore, Guerci et al.

(2017) reported that meaningful learning was observed only for the no-feedback treatment

in Problems B and D.

In the additional sessions for No-fb (10), there was a significant difference between them

in Problem D (p = 0.001 for perm), whereas there was no such a difference in Problem B

(p = 0.376 for perm). For Problem C, criterion (a) was truly satisfied, as was confirmed

above but criterion (b) was not. Therefore, only for Problem D criteria (a) and (b) are both

satisfied. We conclude as follows.

12

Observation 2. Statistically significant evidence of meaningful learning was reconfirmed

in Problem D in the additional sessions for the no-feedback treatment.

Note that we could reconfirm meaningful learning in Problem D in the sessions for the

no-feedback treatment, even if the correct option was Choice 1 in the first 40 periods and

it was changed to Choice 2 in the subsequent 20 periods. Thus, meaningful learning for the

treatment with no payoff-related feedback information is not a consequence of inertia of the

subject’s choice. Subjects generalized what they had thought introspectively in a choice

problem to another similar but different one.

3.4 Length of time to choose

Observation 2 states that meaningful learning was reconfirmed in Problem D in the addi-

tional sessions for the no-feedback treatment. Then, what were the potential factors pro-

moting meaningful learning of subjects without any payoff-related feedback information?

This subsection considers the time that elapsed until subjects made their choice.

Table 6 presents the average length of time for subjects to make their choice in the

1st block and the 9th block of 5 consecutive periods for each binary choice problem in

sessions for each treatment, which is measured in seconds. For every binary choice problem,

numerically, (inexperienced or experienced) subjects spent longer time in sessions for Part-fb

(30) and Full-fb (30) than in sessions for No-fb (10) and No-fb (30).

The p-values for the Brunner-Munzel test are reported Table 6, where the null hypothesis

is that the length of time for inexperienced subjects and experienced subjects spent are on

average the same in the first 5 consecutive periods when they were faced with the same

problem. In each of Problems B, C, and D, experienced subjects spent a significantly

longer time than inexperienced subjects, regardless of the treatments. The p-values for the

Kruskal-Wallis test are also provided there, where the null hypothesis is that the length of

time for subjects to choose in the same 5 consecutive periods when they were first faced with

a binary choice problem are on average the same among all treatments. Only in Problems

C and D with which the experienced subjects were faced, there was no significant difference

in average length of time they spent among all treatments.

As shown in Table 7, the p-values for the Brunner-Munzel test for testing imply that

there was no significant difference in average length of time subjects spent between No-fb

(10) and No-fb (30) in every binary choice problem, regardless of whether subjects were

inexperienced or experienced. Thus, it is not inappropriate to compare the new data with

those used in Guerci et al. (2017). Also, there was no that difference significantly between

13

No-fb No-fb Part-fb Full-fb(10) (30) (30) (30) p-value (KW)

Problem A

Inexperienced 14.260 15.380 20.547 17.380 0.003Experienced 18.200 15.533 20.102 22.287 <0.001

p-value (BM) 0.086 0.812 0.816 <0.001

Problem B

Inexperienced 12.060 13.980 16.753 18.113 0.004Experienced 17.700 17.120 22.567 23.373 <0.001

p-value (BM) 0.002 0.039 <0.001 <0.001

Problem C

Inexperienced 14.600 12.247 17.907 16.213 <0.001Experienced 20.760 19.149 21.353 21.167 0.240

p-value (BM) 0.049 <0.001 0.004 <0.001

Problem D

Inexperienced 14.240 13.400 15.847 17.087 0.046Experienced 19.800 18.913 20.800 20.667 0.195

p-value (BM) 0.016 <0.001 <0.001 0.022

Table 6: Average length of time (in seconds) for subjects to choose in the 1st block (for inexperiencedsubjects) and the 9th block (for experienced subjects) of 5 consecutive periods in each binary choice problemin sessions for each treatment. The p-values for the two-sided Brunner-Munzel test are provided for thecomparison of average length of time between inexperienced subjects and experienced subjects who werefaced with the same binary choice problem in sessions for each treatment. The p-values for the Kruskal-Wallis test are also provided for testing whether there is no difference in average length of time subjectsspent in the same 5 consecutive periods among all treatments.

Part-fb (30) and Full-fb (30) in almost every case (except for the inexperienced subjects in

Problem A). In Problems A and B, however, subjects spent a significantly longer time in

sessions for Part-fb (30) than in sessions for No-fb (30), regardless of whether subjects were

inexperienced or experienced. Note that in Problem D, there was no significant difference

in average length of time experienced subjects spent between No-fb (30) and Part-fb (30).

To summarize all the above argument, we can conclude as follows.

Observation 3. (1) Subjects who experienced their choice in sessions for the no-feedback

treatment took as a long time on average to make their choice as those who made their

choice in sessions for the partial-feedback treatment and the full-feedback treatment, when

they were first faced with Problem D in the 9th block of 5 consecutive periods. (2) Those

14

No-fb (10) No-fb (30) Part-fb (30)vs No-fb (30) vs Part-fb (30) vs Full-fb (30)

Problem A

Inexperienced 0.506 0.001 0.044Experienced 0.159 0.006 0.206

Problem B

Inexperienced 0.663 0.038 0.322Experienced 0.851 <0.001 0.307

Problem C

Inexperienced 0.402 <0.001 0.303Experienced 0.456 0.080 0.711

Problem D

Inexperienced 0.471 0.068 0.400Experienced 0.970 0.075 0.966

Table 7: P-values for the two-sided Brunner-Munzel test for testing that there is no difference in averagelength of time subjects spent between two treatments.

experienced subjects spent a significantly longer time than inexperienced subjects who were

faced with problem D in the 1st block of 5 consecutive periods.

As for (1) in Observation 3, we can partly follow the above argument in another way.

Table 8 lists the p-values for the one-sided Steel test (Steel) for multiple comparison of

average length of time for subjects to choose observed in 1st block and 9th block of 5

consecutive periods; subjects in No-fb (10) spent as a long time on average as those in

No-fb (30), Part-fb (30), and Full-fb (30), respectively.

Note that in view of the statistical results shown in Tables 6, 7, and 8, the same argument

as described above can be applied to Problem C as well, although meaningful learning was

not observed in Problem C. Also note that Guerci et al. (2007) reported meaningful learning

in Problem B, although the KW test (Table 6) suggests that there was a significant difference

in average length of time experienced subjects spent in Problem B among No-fb (10), No-fb

(30), Part-fb (30), and Full-fb (30). Truly, Problems A and C (B and D) are identical

in terms of MWCs and expected payoffs, but the statistical tests mentioned here suggest

that numerically different appearances of the binary choice problems induce subjects to

choose different options. By using some other appropriate experimental design, we could

15

No-fb Part-fb Full-fbSteel (H1 :<) (30) (30) (30)

Problem A

Inexperienced No-fb (10) 0.402 0.008 0.066Experienced No-fb (10) 0.968 0.238 0.034

Problem B


Problem C


Problem D


Table 8: P-values for the one-sided Steel test for multiple comparison of average length of time for subjectsto choose observed in 1st block and 9th block of 5 consecutive periods, where the alternative hypothesis (H1)is that as compared to the control group No-fb (10), they spent a longer time on average in the treatmentgroup No-fb (30), Part-fb (30), and Full-fb (30), respectively.

have considered such a cognitive or psychological effect of those appearances on subjects’

meaningful learning and the length of time for introspective thinking.

3.5 Switching options

The statistical test results shown in Subsection 3.3 imply that meaningful learning was not

observed in sessions for both the partial-feedback and the full-feedback treatments. Then,

what hindered subjects from meaningfully learning in sessions for those treatments with

feedback information? As noted in Guerci et al. (2017), in the post-experimental ques-

tionnaire, some subjects in sessions for the partial-feedback and full-feedback treatments

reported that they changed their choices when they received zero points, while they did

not when they received a non-zero amount of points. If subjects had not yet been con-

vinced of what they had learned, then it would be plausible for them to take this type of

“win-stay-lose-shift” strategy (Nowak and Sigmund, 1993).

In this experiment, subjects received 0 points or 40 points (Table 13 in Subsection 3.6),

Tables 9 - 12 list the frequencies of observing 0 points and 40 points (freq), the frequencies of

16

switching choices immediately after observing 0 points and 40 points (switch), the ratios of

switch to freq (ratio), for the partial-feedback treatment and the full-feedback treatment in

the 1st (B1), 8th (B8), 9th (B9), and 12th (B12) blocks of 5 consecutive periods, respectively.

The p-value for the two-sided Fisher exact test is reported for each binary choice problem,

where the null hypothesis is that switching choices immediately after observing 0 points and

switching choices immediately after observing 40 points were equally likely to be observed.

In the following analysis, we confine attention to Problem B in sequence A→ B and Problem

D in sequence C → D for which meaningful learning was observed in sessions for No-fb (30).

First, consider the case of the partial-feedback treatment. It is not plausible that in

B1 and B8 subjects took the win-stay-lose-shift strategy in Problem C (p = 0.361 and

p = 0.290) and also not plausible that subjects who experienced Problem C from B1 to

B8 and encountered Problem D in B9 took the win-stay-lose-shift strategy (p = 0.068).

Moreover, the null hypothesis was not rejected also in B12 (p > 0.999). On the other hand,

it is inferred that in B9 subjects who experienced Problem A from B1 to B8 and encountered

Problem B took the win-stay-lose-shift strategy (p =< 0.001), where “lose-shift” was chosen

at about 40% (=0.397) after 0 points realized while “win-stay” was chosen at about 89%

(=1-0.113) after 40 points realized, although the null hypothesis was rejected in B12. Thus,

we have following observation.

Observation 4. For the partial-feedback treatment, it is not plausible that subjects took the

win-stay-lose-shift strategy in many periods for sequence C → D, although we can reject the

hypothesis that experienced subjects did not take that strategy in some consecutive periods

after they were first faced with Problem B.

It is not plausible that subjects learned something in Problem C (p = 0.768 for the

one-sided SR test for the null hypothesis ∆FR8,1 ≤ 0). We can thus understand that

experienced subjects had not learned anything sure with Problem C they could generalize

to Problem D, and in fact Guerci et al. (2017) reported that meaningful learning was not

observed in Problem D. As noted above, however, it is also not plausible that in Problem C

subjects took the win-stay-lose-shift strategy in B8, which might imply that they gave up

even learning at last. We continue to investigate this point in the next subsection.

Next, consider the case of the full-feedback treatment. It is not plausible that in B1

subjects took the win-stay-lose-shift strategy in Problem C (p = 0.111) and yet they did in

B8 in Problems A and C (p = 0.313 and p = 0.155), as is the case for the partial-feedback

treatment. It is, however, inferred that subjects who experienced Problem C from B1 to

B8 and encountered Problem D took the win-stay-lose-shift strategy (p =< 0.001) and

17

Table 9: Frequency of switching choices in B1: two-sided Fisher testPart-fb (30) Full-fb (30)

0 point 40 points 0 point 40 points

Problem A freq 43 77 Problem A freq 44 76switch 23 18 switch 25 19ratio 0.535 0.234 ratio 0.568 0.250p-value 0.001 p-value 0.001

B freq 53 66 B freq 65 55switch 30 15 switch 36 13ratio 0.566 0.227 ratio 0.554 0.236p-value <0.001 p-value 0.001

C freq 27 90 C freq 39 81switch 12 30 switch 20 28ratio 0.444 0.333 ratio 0.513 0.346p-value 0.361 p-value 0.111

D freq 57 62 D freq 62 57switch 37 21 switch 37 13ratio 0.649 0.339 ratio 0.597 0.228p-value 0.001 p-value <0.001



Problem A freq 45 105 Problem A freq 49 101switch 15 23 swith 15 22ratio 0.333 0.219 ratio 0.306 0.218p-value 0.155 p-value 0.313

B freq 59 91 B freq 72 78switch 20 10 switch 21 9ratio 0.339 0.120 ratio 0.292 0.115p-value 0.001 p-value 0.008



that they still took the win-stay-lose-shift strategy (p = 0.004) in B12. We can infer that

subjects who experienced Problem A from B1 to B8 and encountered Problem B took the

18



Problem A freq 28 92 Problem A freq 40 80switch 15 13 switch 12 12ratio 0.536 0.130 ratio 0.300 0.150p-value <0.001 p-value 0.088

B freq 58 62 B freq 55 65switch 23 7 switch 28 14ratio 0.397 0.113 ratio 0.509 0.215p-value <0.001 p-value 0.001

C freq 36 83 C freq 35 85switch 16 16 switch 20 18ratio 0.444 0.193 ratio 0.571 0.212p-value 0.007 p-value <0.001


win-stay-lose-shift strategy (p =< 0.001) in B9, although the null hypothesis was rejected

in B12. In short, we have the following observation.

Observation 5. For the full-feedback treatment, it is inferred that experienced subjects

took the win-stay-lose-shift strategy in Problem D, and we can reject the hypothesis that

experienced subjects did not take that strategy some consecutive periods after they were first

faced with Problem B.

19



Problem A freq 60 85 Problem A freq 63 87switch 23 11 switch 19 9ratio 0.383 0.129 ratio 0.302 0.104p-value <0.001 p-value 0.003

B freq 36 114 B freq 48 102switch 4 15 switch 10 12ratio 0.111 0.132 ratio 0.208 0.118p-value >0.999 p-value 0.215


D freq 41 108 D freq 45 105switch 9 25 switch 22 25ratio 0.220 0.232 ratio 0.489 0.238p-value >0.999 p-value 0.004

20

3.6 Random choice of runs

Observation 4 implies that we could not conclude that in sequence C→ D, the win-stay-lose-

shift strategy hindered subjects from meaningful learning in sessions for the partial-feedback

treatment. Subjects might take some more complicated way in search of the correct options.

In this subsection, we thus consider a possibility of subjects’ random choice of runs in some

sequences of options (not the randomness of options themselves chosen by subjects).

Problem A Choice 1 [14; 5, 3, 7, 7] Choice 2 [14; 5, 4, 6, 7]

(71, 72) (0, 0, 60, 60) (5, 4, 6) (40, 40, 40, 0)(5, 3, 71) (40, 40, 40, 0) (5, 4, 7) (40, 40, 0, 40)(5, 3, 72) (40, 40, 0, 40) (5, 6, 7) (40, 0, 40, 40)

(4, 6, 7) (0, 40, 40, 40)

Problem B Choice 1 [6; 1, 2, 3, 4] Choice 2 [6; 1, 1, 4, 4]

(2, 4) (0, 60, 0, 60) (41, 42) (0, 0, 60, 60)(3, 4) (0, 0, 60, 60) (11, 12, 41) (40, 40, 40, 0)(1, 2, 3) (40 40, 40, 0) (11, 12, 42) (40, 40, 0, 40)

Problem C Choice 1 [14; 3, 5, 6, 8] Choice 2 [14; 3, 6, 6, 7]

(6, 8) (0, 0, 60, 60) (61, 62) (40, 40, 40, 0)(3, 5, 6) (40, 40, 40, 0) (1, 2, 61) (40, 40, 0, 40)(3, 5, 8) (40, 40, 0, 40) (1, 2, 62) (40, 0, 40, 40)

(61, 62, 7) (0, 40, 40, 40)

Problem D Choice 1 [9; 1, 3, 5, 6] Choice 2 [9; 1, 2, 6, 6]

(3, 6) (0, 60, 0, 60) (61, 62) (0,0, 60, 60)(5, 6) (0, 0, 60, 60) (1, 2, 61) (40, 40, 40, 0)(1, 3, 5) (40, 40, 40, 0) (1, 2, 62) (40, 40, 0, 40)

Table 13: MWCs and payoff vectors. The MWCs in [14; 5, 3, 7, 7] are, e.g., written here as (5, 3, 71),(5, 3, 72), and (71, 72) by the votes apportioned to the members, not with references to the specific players.

Under the null hypothesis, the number of runs in a sequence of options is a random

variable. When the payoffs associated with options are stochastically determined by random

variables, the null hypothesis is not rejected as often in the runs test, even if the sequence of

options is generated by some systematic rule.10 Accordingly, we apply the following criterion

10Table 12 lists MWCs and payoff vectors for each choice in four binary choice problems. In Problem D,denote by 1-1, 1-2, and 1-3 payoff vectors (0, 60, 0, 60), (0, 0, 60, 60), and (40, 40, 40, 0), respectively whenChoice 1 is chosen and by 2-1, 2-2, and 2-3 (0, 0, 60, 60), (40, 40, 40, 0), and (40, 40, 0, 40), respectivelywhen Choice 2 is chosen. Assume that when Choice k (= 1, 2) is successively chosen, payoff vectors realize inthe order of k-1→ k-2 → k-3 → k-1 → · · · , and assume in addition that when the alternative option is oncechosen after observing payoff vector k-i (i=1, 2, 3) and then Choice k is chosen again, the order resumesfrom the payoff vector next to k-i. Given that Choice 1 is chosen at Period 1, the win-stay-lose-shift strategygenerates the following sequence of 20 choices: 1, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, . . .. The p-value forthe runs test applied to the sequence of those choices is 0.8391, and thus the null hypothesis was not rejected,although they were generated systematically with the win-stay-lose-shift strategy.

21

to our analysis; when it is inferred that subjects took the win-stay-lose-shift strategy, we

prioritize the inference over the result in the runs test. If the null hypothesis is rejected

in the runs test, then we infer that the runs in the sequence of options were chosen with a

certain rule for searching the correct options.


Problem A

Inexperienced 5 22 12 11Experienced 3 15 9 W 15

Problem B

Inexperienced 4 22 13 W 11 WExperienced 5 13 8 w+rand 8 w+rand

Problem C

Inexperienced 5 14 9 rand 12Experienced 4 18 10 W 13 W

Problem D

Inexperienced 7 21 12 W 7 WExperienced 5 15 7 rand 7 W

Table 14: Numbers of subjects each of whom is counted if the null hypothesis was rejected, where thenull hypothesis is that the number of runs in a sequence of options each subject chose in Periods 21-40(Inexperienced) or in Periods 41-60 (Experienced) is a random variable. The sample size is 10 in each caseof No-fb (10) and 30 in each case of No-fb (30), Part-fb (30), and Full-fb (30). “W” (“w”) indicates that wecan infer that the win-stay-lose-shift strategy was taken in both B1 and B8 or in both B9 and B12 (in B9).“rand” indicates that we conclude that subjects did not search the correct options with any certain rules.When it is inferred that subjects took the win-stay- lose-shift strategy, we prioritize the inference over theresult in the runs test.

Table 14 presents the numbers of subjects each of whom is counted if the null hypothesis

was rejected in the runs test for each binary choice problem in sessions for each treatment.

The runs test is applied to a sequence of options each identical subject chose and is not

used for testing the randomness of options themselves but the randomness of the number of

runs in a sequence of options. Thus, 5 consecutive periods in B1, B8, B9, and B12 are too

few as the samples; for each subject, the runs test was here applied to a sequence of options

the subject chose in a binary choice problem in Periods 21-40 and in another one in Periods

41-60, respectively. For each case in each binary choice problem, the sample size is 10 in

sessions for No-fb (10) and 30 in No-fb (30), Part-fb (30), and Full-fb (30), respectively.

22

As shown in Subsection 3.2, for all four binary choice problems we examined, it was

reconfirmed that subjects learned to choose the correct option in sessions for No-fb (10)

(Observation 1). Table 14 shows that in Problem B, 40% of inexperienced subjects searched

for the correct options with a certain rule in sessions for No-fb (10). Thus, we presume in

the following analysis that subjects did not search the correct options with any certain rules,

if 30% or less of subjects are counted for each of whom the null hypothesis was rejected in

the runs test and if it is not inferred that subjects took the win-stay-lose-shift strategy.

The first part of Observation 4 states that for the partial-feedback treatment, it is not

plausible that subjects took the win-stay-lose-shift strategy in many periods for sequence

C → D. Then, what happened? 9 inexperienced subjects in Problem C and 7 experienced

subjects in Problem D are counted for each of whom the null hypothesis was rejected in the

runs test. According to the criterion described above, we have the following observation.

Observation 6. In sessions for the partial-feedback treatment, it is inferred that in sequence

C → D, subjects did not search for the correct options with any certain rules.

As stated in the second parts of Observations 4 and 5, we can reject the hypothesis

that in Problem B, experienced subjects did not take the win-stay-lose-shift strategy in B9

for both the partial-feedback treatment and the full-feedback treatment. In B12, however,

we cannot reject that hypothesis. Moreover, 8 inexperienced subjects and 8 experienced

subjects are counted for each of whom the null hypothesis was rejected in the runs test.

According to the criterion described above, we have the following observation.

Observation 7. In Problem B, in sessions for both partial-feedback treatment and the full-

feedback treatment, it is inferred that experienced subjects took the win-stay-lose-shift strategy

in some consecutive periods after they were first faced with the binary choice problem but

finally they did not search for the correct options with a certain rule.

Table 14 indicates that when subjects received immediate feedback information about

payoffs, in many cases, we could not reject the null hypothesis that experienced subjects

did not take the win-stay-lose-shift strategy or we might not think that they searched for

the correct options with a certain rule. In the full-feedback treatment, subjects observe

that (i) payoffs are equally allocated among a subset of committee members, and (ii) these

subsets of committee members change from period to period. Thus, as compared to sessions

for the partial-feedback information, we expected it to be much easier to learn about the

underlying relationship between payoffs and votes in this treatment compared to the other

two treatments. In weighted voting situations, feedback information about their payoffs

might lead false inference of subjects or confuse their inference.

23

4 Final remarks

Standard models of learning in games need some feedback information with which people

learn something; providing feedback information promotes learning in the theory of rein-

forcement learning (e.g., Erev and Roth, 1998), belief-based learning (e.g., Cheung and

Friedman, 1997), and experience weighted attraction (EWA) learning (Camerer and Ho,

1999).11 In experiments of p-beauty-contest games, however, Weber (2003) and Rick and

Weber (2010) found that withholding feedback information promotes meaningful learning

in the sense that subjects learn to perform iterated dominance, although it is a higher-order

concept of learning than that the standard models of learning deal with.

Following Weber (2003) and Rick and Weber (2010), some researchers attempted to

reconfirm similar observations in various situations. Neugebauer et al. (2009) reported that

subjects do not learn to play the dominant strategy in a public goods game without feedback

information. In a bandit experiment with a context of weighted voting games, however,

Guerci et al. (2017) reported that even without any payoff-related feedback information,

subjects learned to choose the correct option and they generalized what they had thought

introspectively in a binary choice problem to another similar but different one. As a next

step for understanding human behavior of learning and calling for a new class of models of

learning that allow for introspective thinking, we needed to further explore what hindered

subjects from meaningful learning when immediate payoff-related feedback information was

given to them, and we should have figured out what factor might induce subjects to gain a

deeper insight on the relationship between their nominal votes and expected payoffs without

any feedback information.

In this paper we showed that subjects’ inference might be mistaken or confused when

feedback information was given to them in a context of weighted voting and that a longer

thinking time might induce subjects to deeply infer some underlying relationship between

votes and payoffs without misleading or confusing feedback information. Then, in some

way in another experiment, we should next confirm whether meaningful learning is observed

when subjects who do not choose options corresponding to immediate feedback information.

To further proceed with this strand of research, the cognitive ability of subjects will also

be taken into account as an important potential factor related to the possibility of their

meaningful learning. There might be some features we should capture with which subjects

think the binary choice problems in their mind.

11Arifovic et al. (2006) showed that those models fail to replicate human behavior in a repeated two-persongame, and Erev et al. (2010) reported that those models do not perform well in predicting how people behavein market entry games.

24

References

Aleskerov, F., Beliani, A., Pogorelskiy, K. (2009). Power and preferences: an experimental

approach. National Research University, Higher School of Economics, mimeo.

Arifovic, J., McKelvey, R.D., Pevnitskaya, S. (2006). An initial implementation of the Turing

tournament to learning in two person games. Games Econ. Behav. 57, 93-122.

Banzhaf, J.F. (1965). Weighted voting doesn’t work: a mathematical analysis. Rutgers Law

Review 19, 317-343.

Bednar, J., Chen, Y., Liu, T. X., Page, S. (2012). Behavioral spillovers and cognitive load

in multiple games: An experimental study. Games Econ. Behav. 74, 12-31.

Cason, T. N., Savikhin, A. C., Sheremeta, R. M. (2012). Behavioral spillovers in coordina-

tion games. European Economic Review 56, 233-245.

Camerer, C., Ho, T.-H. (1999). Experience-weighted attraction learning in normal form

games. Econometrica 67, 827-874.

Cheung, Y.-W., Friedman, D. (1997). Individual learning in normal form games: some

laboratory results. Games Econ. Behav. 19, 46-76.

Cooper, D.J., Kagel, J.H. (2003). Lessons learned: generalized learning across games. Amer.

Econ. Rev. 93, 202-207.

(2008). Learning and transfer in signaling games. Econ. Theory 34, 415-439.

Deegan, J., and Packel, E. (1978). A new index of power for simple n-person games. Int.

Jour. Game Theory 7, 113-123.

Dufwenberg, M., Sundaram, R., Butler, D. J. (2010). Epiphany in the game of 21. Jour.

Econ. Behav. Org. 75, 132-143.

Erev, I., Roth A.E. (1998). Predicting how people play games: reinforcement learning in

experimental games with unique, mixed strategy equilibria. Amer. Econ. Rev. 88, 848-81

Erev, I., Ert, E., Roth, A.E. (2010). A choice prediction competition for market entry games:

an introduction. Games 1, 117-136.

Esposito, G., Guerci, E., Lu, X., Hanaki, N., Watanabe, N. (2012). An experimental study

on “meaningful learning” in weighted voting games. mimeo., Aix-Marseille University.

25

Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Ex-

per. Econ. 10, 171-178.

Grimm, V., Mengel, F. (2012). An experiment on learning in a multiple games environment.

Jour. Econ. Theor. 147, 2220-2259

Guerci, E., Hanaki, N. Watanabe, N. (2017). Meaningful learning in weighted voting games:

an experiment. Theory. Decis. 83, 131-153.

Guerci, E., Hanaki, N. Watanabe, N., Esposito, G., Lu, X. (2014). A methodological note

on a weighted voting experiment. Soc. Choice Welfare 43, 827-850

Hu, Y., Kayaba, Y., Shum, M. (2013). Nonparametric learning rules from bandit experi-

ments. Games Econ. Behav. 81, 215-231.

Mengel, F., Sciubba E. (2014). Extrapolation and structural similarity in games. Econ. Let.

125, 381-385.

Meyer, R.J., Shi, Y. (1995). Sequential choice under ambiguity: intuitive solutions to the

armed-bandit problem. Manag. Sci. 41, 817-834.

Montero, M., Sefton, M., Zhang, P. (2008). Enlargement and the balance of power: an

experimental study. Soc. Choice Welfare 30, 69-87.

Neugebauer, T., Perote, J., Schmidt, U., Loos, M. (2008). Selfish-biased conditional cooper-

ation: on the decline of contributions in repeated public goods experiments. Jour. Econ.

Psy. 30, 52-60.

Nowak, M,A., Sigmund, K. (1993). A strategy of win stay, lose shift that outperforms

tit-for-tat in the prisoner’s-dilemma game. Nature 364, 56-58.

Rick, S., Weber, R.A. (2010). Meaningful learning and transfer of learning in games played

repeatedly without feedback. Games Econ. Behav. 68, 716-730.

Shapley, L.S., Shubik, M. (1954). A method for evaluating the distribution of power in a

committee system. Amer. Polit. Sci. Rev. 48, 787-792.

Watanabe, N. (2014). Coalition formation in a weighted voting experiment. Japanese Jour.

Electral Stud. 30, 56-67.

Weber, R.A. (2003). Learning with no feedback in a competitive guessing game. Games

Econ. Behav. 44, 134-144.

26

Meaningful Learning in Weighted Voting Games: An Experiment · meaningful learning with no feedback...

Documents

Transcript of Meaningful Learning in Weighted Voting Games: An Experiment · meaningful learning with no feedback...