Meaningful Learning in Weighted Voting Games: An Experiment · meaningful learning with no feedback...
Transcript of Meaningful Learning in Weighted Voting Games: An Experiment · meaningful learning with no feedback...
ソシオネットワーク戦略ディスカッションペーパーシリーズ ISSN 1884-9946
第 62 号 2018 年 6 月
RISS Discussion Paper Series
No.62 June, 2018
文部科学大臣認定 共同利用・共同研究拠点
関西大学ソシオネットワーク戦略研究機構
The Research Institute for Socionetwork Strategies, Kansai University
Joint Usage / Research Center, MEXT, Japan
Suita, Osaka, 564-8680, Japan
URL: http://www.kansai-u.ac.jp/riss/index.html
e-mail: [email protected]
tel. 06-6368-1228
fax. 06-6330-3304
Supplementary Results for“Meaningful Learning
in Weighted Voting Games: An Experiment”
Naoki Watanabe
文部科学大臣認定 共同利用・共同研究拠点
関西大学ソシオネットワーク戦略研究機構
The Research Institute for Socionetwork Strategies, Kansai University
Joint Usage / Research Center, MEXT, Japan
Suita, Osaka, 564-8680, Japan
URL: http://www.kansai-u.ac.jp/riss/index.html
e-mail: [email protected]
tel. 06-6368-1228
fax. 06-6330-3304
Supplementary Results for“Meaningful Learning
in Weighted Voting Games: An Experiment”
Naoki Watanabe
Supplementary Results for “Meaningful Learning
in Weighted Voting Games: An Experiment”∗
Naoki Watanabe†
March 19, 2020
Abstract
This paper supplements the experimental results presented by Guerci et al. (2017),
where subjects choose one of two weighted voting games (options), as follows. (1) It
was reconfirmed that without any payoff-related feedback information, subjects learned
to choose the option that generates higher expected payoffs for them (correct option)
and that they generalized what they had thought introspectively in a binary choice
problem to a similar but different one (meaningful learning). (2) The subjects who had
experienced a binary choice problem without any feedback information, in some peri-
ods when they were first faced with the similar but different binary choice problem in
which their meaningful learning was observed, took as a long time on average to make
their decision as those who received immediate feedback information. Those subjects
spent a significantly longer time than inexperienced subjects who were faced with the
same binary choice problem. (3) When subjects received immediate feedback informa-
tion about payoffs, in many cases, we could not reject the hypothesis that experienced
subjects did not take the win-stay-lose-shift strategy or we might not think that they
searched for the correct options with a certain rule.
Keywords: meaningful learning, bandit experiment, weighted voting, feedback infor-
mation, introspective thinking
JEL Classification: C91, D72, D83∗This paper is an outgrowth of the author’s previous work. The author wishes to thank his co-authors of
that paper, Eric Guerci and Nobuyuki Hanaki, for giving him their permission to reuse the data, and DanielFriedman, Midori Hirokawa, Yoichi Hizen, Tatsuya Kameda, and Naoko Nishimura for their comments onthat previous work which motivated him to provide these supplementary results. This research could nothave been completed without the research assistance of Yoichi Izunaga, Toru Suzuki, Hiroshi Tanaka, andYosuke Watanabe. Financial support from Foundation for the Fusion of Science and Technology (FOST),MEXT Grants-in-Aid 24330078 and 25380222, JSPS-ANR bilateral research grant “BECOA” (ANR-11-FRJA-0002), and Joint Usage/Research of ISER at Osaka University are gratefully acknowledged.
†Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi Kohoku, Yokohama, Kana-gawa 223-8526, Japan; E-mail: [email protected]; Phone: +81-45-564-2039
1
1 Introduction
A major question in experimental studies on learning in games is whether subjects can gen-
eralize what they have learned in a situation to other similar but different situations, which
is called “meaningful learning” (Rick and Weber, 2010), “transfer of learning” (Cooper
and Kagel, 2003, 2008), or “epiphany” (Dufwenberg et al., 2010).1 We investigate this
higher-order concept of learning in a situation related to weighted voting games and aim to
supplement the results shown in Guerci et al. (2017).
In weighted voting experiments (Montero et al., 2008; Guerci et al., 2014), the outcomes
varied largely from one round to another in each session, which implies that in those strategic
situations it would be too complex for subjects to make deep inference about the underlying
relationships between their nominal votes and expected payoffs. Guerci et al. (2017) thus
drastically simplified the experimental design to remove subjects’ learning through their
strategic interaction; in each session subjects choose one of two weighted voting games
(options) repeatedly and obtain their payoffs which are stochastically generated for each
choice they make according to a voting theory. The binary choice problems are different
between the first and second parts of the session, but the payoff generating function in
binary choice problems remains the same throughout the experiment. Subjects thus have a
chance to learn something underlying the situation they face in the first part and to apply
what they learned in the first part to their decision in the second part.
By focusing on individual learning in the way described above, Guerci et al. (2017)
observed that (i) subjects learned to choose the option that generates higher expected
payoffs for them (correct option) even without any payoff-related feedback information
and that (ii) there was statistically significant evidence of meaningful learning only for
the treatment with no payoff-related feedback information.2 In their sessions, however,
meaningful learning with no feedback information might be a consequence of inertia on
subjects’ choice, because the correct options were the same choice (Choice 2) in the first
and second parts of the sequences of binary choice problems. Thus, the first aim of this
paper is to confirm whether observation (ii) was not the consequence of the inertia on
subjects’ choice by changing the correct options between the two parts.
1In the context of learning across multiple games, where a game is played together with other games,“behavioral spillover” (Cason et al., 2012; Bednar et al., 2012) and “learning spillover” (Grimm and Mengel,2012; Mengel and Sciubba, 2014) are related but different concepts from meaningful learning, althoughpossible reasons underlying these spillovers vary and include meaningful learning as one of them.
2Weber (2003) and Rick and Weber (2010) observed that withholding feedback information promotedmeaningful learning in p-beauty contest games. Neugebauer et al. (2009), however, reported that subjectsdid not learn to play the dominant strategy in a public goods game unless they received feedback information.
2
The other aim of this paper is to explore what hindered subjects from meaningful learn-
ing or gaining a deeper insight on the relationship between their nominal votes and expected
payoffs when immediate feedback information was given to them. As noted in Guerci et al.
(2017), in the post-experimental questionnaire, some subjects who received feedback infor-
mation reported that they changed their choices when they received zero points, while they
did not when they received a non-zero amount of points. It would be plausible for subjects
to take this “win-stay-lose-shift” strategy (Nowak and Sigmund, 1993), when they had not
yet been convinced of what they had learned. On the other hand, if subjects neither took
that strategy nor randomized their choice in some way, then it would then be plausible for
us to infer that they surely learned something, although what they learned might be wrong.
Also, when it was difficult for subjects to choose an option, they would need a longer time
for their deep inference, whereas they might give up thinking in a few seconds and choose
options according to the win-stay-lose-shift strategy or randomize their choice in some way.
Without any payoff-related feedback information, however, subjects could not choose the
win-stay-lose-shift strategy, and thus a longer time for introspective thinking, for example,
might facilitate deeper inference without misleading or confusing information, if any.
This paper shows the following results, examining the data taken from the additional
sessions and reexamining the data used in Guerci et al. (2017). (1) It was reconfirmed
that without any feedback information, subjects learned to choose the correct option and
confirmed that in a sequence of binary choice problems, they meaningfully learned even if
the correct option were changed from Choice 1 in the first part to Choice 2 in the second part
in the sequence. (2) The subjects who had experienced a binary choice problem without
any feedback information took as a long time on average to make their decision as those
who received feedback information, in some periods when they were first faced with the
similar but different binary choice problem in which their meaningful learning was observed.
Those experienced subjects spent a significantly longer time than inexperienced subjects
who were faced with the same binary choice problem. (3) When subjects received immediate
feedback information about payoffs, in many cases, we could not reject the hypothesis that
experienced subjects did not take the win-stay-lose-shift strategy or we might not think
that they searched for the correct options with a certain rule.
Those results stated above implies that observation (ii) reported in Guerci et al. (2017)
was not the consequence of the inertia on subjects’ choice. There should be some fac-
tors that promoted subjects’ meaningful learning without any feedback information. In
weighted voting situations, however, feedback information about their payoffs might lead
false inference of subjects or confuse their inference.
3
The rest of this paper proceeds as follows. Section 2 describes the experimental design.
Section 3 presents the results, where we deal with length of time to choose, win-stay-
lose-shift strategy, and some randomness of subjects’ choice. Section 4 briefly reviews the
literature behind this research and summarizes the results of this paper for further research.
2 Experimental design
This paper reexamines the data used in Guerci et al. (2017) and also examines the data
taken from the additional sessions, and thus we need to review the experimental design
for all sessions conducted for that paper. Subsection 2.1 describes the session outline and
Subsection 2.2 explains the experimental procedure. The instructions given to subjects are
attached in the Appendix of Guerci et al. (2017).
2.1 Session outline
There are three treatments, each of which is explained at the end of this subsection. For
each treatment, there are some sessions. Each session consists of 60 periods. In each period,
the subject is asked to choose one of two four-member committees that will divide 120 points
among the members. Let N = {1, 2, 3, 4} be the set of the members (players) to join. A
committee (a weighted voting game) is represented by [q; v1, v2, v3, v4], where q is the quota
(the minimal number of votes required for an allocation to be adopted) and vi is the voting
weight (the number of votes) allocated to player i ∈ N . The subject acts as player 1 in
this experiment. Both committees have the same total number of votes, the same quota,
and the same number of votes for the subject. In each session, the subject faces one binary
choice problem for the first 40 periods, while in the following 20 periods the subjects face
another but different binary choice problem; for example, in the first 40 periods subjects
face a choice between [14; 5, 3, 7, 7] and [14; 5, 4, 6, 7], while in the following 20 periods they
are faced with [6; 1, 2, 3, 4] and [6; 1, 1, 4, 4]. Subjects are not asked to play the weighted
voting games that they choose themselves.3 Before the start of each session, subjects are
clearly informed that the other three members of the committees are all fictitious. Thus,
the experiment is regarded as a two-armed bandit experiment with contextual information
on voting situations related to the possible payoffs for subjects.4
3This point is to avoid the complexities mentioned at the beginning of the Introduction; subjects aresimultaneously learning to play a weighted voting game from interfering with the other subjects learningabout the underlying relationship between their nominal voting weights and their expected payoffs.
4In signaling games, Cooper and Kagel (2003, 2008) showed that the use of meaningful context serves asa catalyst for meaningful learning.
4
The payoff each subject obtains from his or her committee choice is determined by an
external function based on a power index in voting theory called DPI (Deegan and Packel,
1978). Given a weighted voting game, a non-empty subset S of N is called a coalition,
and a coalition is called a winning coalition if∑
i∈S vi ≥ q; otherwise, it is called a losing
coalition. A minimum winning coalition (MWC) is a winning coalition such that deviation
by any member of the coalition turns its status from winning to losing. In the experiment,
for each period, one MWC is drawn with equal probability from all the possible MWCs for
the committee that the subject has chosen. If the subject is a member of the drawn MWC,
then he or she receives an equal share of the total payoff with the other members.
The binary choice problems we use are shown in Table 1 (See Table 13 for more details.).5
We hereafter denote each MWC by the votes apportioned to its members. For example, the
MWCs in a committee [14; 5, 3, 7, 7] are written as (5, 3, 7), (5, 3, 7), and (7, 7). Suppose that
[14; 5, 3, 7, 7] is chosen by a subject (player 1) in a period. Then, player 1 has a 2/3 chance
of being on the MWCs and will receive 1/3 of the total payoff (120 points) for being on
each of the MWCs, while there is a 1/3 probability that player 1 will not be on the MWC
and will receive nothing. We did not explain this underlying payoff generating function
to our subjects; they were simply told that payoffs were determined based on a theory of
decision-making in committees. Note, however, that we use binary choice problems in which
the better committees for the subjects are the same regardless of whether we employ DPI
or other power indices such as BzI (Banzhaf, 1965) and SSI (Shapley and Shubik, 1954).6
Problem Choice 1 (Expected payoff) Choice 2 (Expected payoff)
A [14;5, 3, 7, 7] (120 × 2/3 × 1/3) [14;5, 4, 6, 7] (120 × 3/4 × 1/3)B [6;1, 2, 3, 4] (120 × 1/3 × 1/3) [6;1, 1, 4, 4] (120 × 2/3 × 1/3)C [14;3, 5, 6, 8] (120 × 2/3 ×1/3) [14;3, 6, 6, 7] (120 × 3/4 × 1/3)D [9;1, 3, 5, 6] (120 × 1/3 × 1/3) [9;1, 2, 6, 6] (120 × 2/3 × 1/3)
Table 1: Four binary choice problems used in the experiment. In each committee, subjects are all assignedto player 1, and the number of votes given to player 1 is shown in bold. Choice 2 generates a higher expectedpayoff for the subjects in all binary choice problems listed here.
5In terms of both expected payoffs and the set of MWCs available in each choice, Problems A and Care identical, and Problems B and D are identical. There is a crucial difference between the two options inProblem A ([14; 5, 3, 7, 7] vs. [14; 5, 4, 6, 7]) and Problem C ([14; 3, 5, 6, 8] vs. [14; 3, 6, 6, 7]). In each of thesetwo problems, one option has two “large” voters who can form an MWC on their own, whereas the otheroption does not. In Problem B ([6; 1, 2, 3, 4] vs. [6; 1, 1, 4, 4]) and Problem D ([9; 1, 3, 5, 6] vs. [9; 1, 2, 6, 6]),there is no such clear difference between the two options as there are two large voters who can form anMWC by themselves in both options.
6Guerci et al. (2017) adopted DPI as our payoff generating function because, in all the experimentsreported in Montero et al. (2008), Aleskerov et al. (2009), Esposito et al. (2012), Guerci et al. (2014), andWatanabe (2014), the most frequently observed winning coalitions were MWCs.
5
Subjects are faced with one of the following sequences of binary choice problems: A → B,
B → A, C → D, orD → C (the order of problems is indicated by the arrows), where the first
problem is used in the first 40 periods, and the second problem is used in the subsequent 20
periods. In Guerci et al. (2017), as shown in Table 1, the committee that generates a higher
expected payoff for subjects (correct option) is Choice 2 for all problems listed there. In
the additional sessions, however, the order of the correct options displayed on the subjects’
monitors was changed in each sequence of binary choice problems; the correct option is
Choice 1 in the first 40 periods and it is Choice 2 in the subsequent 20 periods.
We have three feedback treatments: (1) no feedback, (2) partial feedback, and (3) full
feedback. For the no-feedback treatment, subjects are not informed of any payoffs they
receive as the result of their committee choice until the session ends. For the partial-
feedback treatment, each subject is informed of his or her own payoff in the committee he
or she chooses. For the full-feedback treatment, after each choice subjects are informed of
the payoffs of all four players in the committee they chose. We impose a 30-second time
limit for the choice stage and a 10-second limit for the feedback stage, regardless of the
amount of feedback information. If a subject does not choose a committee within the 30
seconds of the choice stage, then he or she obtains zero points for that period. In this case,
regardless of the treatment, in the feedback stage the subject receives special feedback that
he or she has obtained zero points for the period because they failed to make a choice within
the time limit. This rule is clearly stated in the written instructions and explained aloud.
If a subject makes an early choice, say within the first 10 seconds of the choice stage,
then a waiting screen is shown until all the subjects in the session have made their de-
cisions. If all the subjects make their choice before the end of the 30-second time limit,
then they all enter the feedback stage. For the no-feedback treatment, during the 10-second
feedback stage subjects are shown a screen conveying the message “Please wait until the
experiment continues.” For the full-feedback and partial-feedback treatments, the relevant
payoff information is displayed during these 10 seconds.
2.2 Experimental procedure
The sessions whose results were reported in Guerci et al. (2017) were conducted at the
Institute of Social and Economic Research (ISER) at Osaka University in June 2014 and
at the University of Tsukuba in November 2014. There were 3 treatments, and for each
treatment there were 4 sequences of binary choice problems, as described above. Each
session in Osaka (June) involved 20 subjects and each session in Tsukuba (Nov) involved
6
10 subjects, and thus 360 subjects participated in those sessions. The sessions in Tsukuba
(Nov) were conducted for a robustness check to the results obtained in the sessions in Osaka
(June), and the main results were reported in Guerci et al. (2017) with the data pooled at
the two locations. For the no-feedback treatment, the additional sessions were conducted
also at the ISER in September 2014. Each session in Osaka (Sept) involved 10 subjects, and
thus we had 40 subjects in total there. The subjects were undergraduate students recruited
from all over the campus at each location, but third- or fourth-year economics majors were
excluded. Each subject participated once in this experiment, and he or she was randomly
assigned to a session in each month and at each location.
The experiment was computerized with z-Tree (Fischbacher, 2007). Each session lasted
around 60 minutes including the time for administering the instructions and the post-
experiment questionnaire. At the beginning of each session, subjects were provided with a
written instruction upon arrival, and then the experimenter read it aloud. No communi-
cation among subjects was allowed. Subjects were allowed to ask questions regarding the
instruction and they were given the answers which other subjects could hear. Thereafter,
any information available to the subjects was provided through their computer screens. We
followed other bandit experiments (Meyer and Shi, 1995; Hu et al., 2013) for the payment
scheme. In the instruction, each subject was informed that in addition to the show-up fee of
JPY 1000, he or she would receive payment according to the total points he or she obtained
over all 60 periods at a rate of 1 point = JPY 1. The average earning of our subjects was
JPY 2500 (about 18 USD in 2014.)
3 Results
Subsection 3.1 gives a brief overview of the data taken from the sessions in Osaka (Sept).
In Subsections 3.2 and 3.3, we reconfirm learning and meaningful learning of subjects who
participated in sessions for the no-feedback treatment by using the same data processing
methods as the ones used in Guerci et al. (2017). Subsection 3.4 examines all the data we
have and compares the length of time subjects took to determine their choice in sessions for
the no-feedback treatment with the one in sessions for the other treatments. Subsection 3.5
investigates whether subjects took the win-stay-lose shift strategy in sessions for the partial-
feedback and full-feedback treatments by reexamining the data used in Guerci et al. (2017).
In Subsection 3.6 statistically tests the randomness of subjects’ individual choice. In what
follows, we employ 5% significance level in rejecting the null hypotheses.7
7The data are available from the author upon request.
7
3.1 Overview of the data: New Sessions with no feedback information
No-fb(a) A→B (b) B→A
(c) C→D (d) D→C
Figure 1: Time series of the percentage of subjects who chose the correct option among those who chosewithin the time limit in the sessions conducted in Osaka (Sept). 10 out of 40 subjects failed to make a choicewithin the time limit at least once in the 60 periods. The correct option was Choice 1 in the first 40 periodsand was Choice 2 in the subsequent 20 periods.
Figure 1 presents the time series of the percentage of subjects in Osaka (Sept) who chose
the correct option among those who chose within the time limit in each of the four sequences
of binary choice problems; the dotted vertical line in each panel separates Period 40 and
Period 41 to indicate that the problems were different in the periods before and after the
line. In the additional sessions, as noted in Subsection 2.1, the correct option was Choice 1
in the first 40 periods, which was changed to Choice 2 in the subsequent 20 periods.
In Period 1, as is seen in Figure 1, subjects found the correct option easier in Problem A
(C) than in Problem B (D), which was reported also in Guerci et al. (2017) for all the three
feedback treatments. Further, we could not reject the null hypothesis that the percentage
of subjects who chose the correct option among those who chose within the time limit in
Period 1 were equal between the additional sessions and the original sessions conducted for
Guerci et al. (2017) in any of the four binary choice problems; the p-values for the two-sided
χ2 test are 0.526, 0.559, 0.853, and 0.673 in Problems A, B, C, and D, respectively. Thus,
it is not inappropriate to compare the new data with those used in Guerci et al. (2017) in
the following subsections, although we should continue to care about this point there.
8
The features of the new sessions are summarized in Table 2.8
Table 2: Features of the new sessions.
session sequence of show-up point-to- # of session avg. paymentno. binary choice fee (JPY) JPY ratio subj. date to subject
1 A → B 1000 1.0 10 Sept.24, 2014 25522 B → A 1000 1.0 10 Sept.24, 2014 24423 C → D 1000 1.0 10 Sept.25, 2014 26284 D → C 1000 1.0 10 Sept.25, 2014 2472
3.2 Learning the correct option in new sessions: Osaka (Sept)
In Subsectoins 3.2 and 3.3, we analyze the time series of individual choice. Let FRik denote
the relative frequency of periods in which subject i chose the correct option within the k-th
block of 5 consecutive periods, that is, from 5(k − 1) + 1 to 5k. FRi2 is, e.g., the number of
times subject i chose the correct option in Period 6 to Period 10, divided by 5. The change
in the relative frequencies that subject i chose the correct option between the l-th block
and the m-th block is defined as
∆FRil,m = FRi
l − FRim.
We hereafter drop superscript i from ∆FRil,m and simply refer to them as ∆FRl,m. For
each of four sequences of binary choice problems, we had 10 subjects in Osaka (Sept) who
participated in the treatment with no payoff-related feedback information, and thus we have
10 observations of ∆FRl,m for each sequence of binary choice problems.
We expect experience in introspective thinking to improve learning for choosing the
correct option even without any payoff-related feedback information. For each binary choice
problem, the p-value for the one-tailed signed-rank (SR) test for each feedback treatment
is thus reported in Table 3, where the null hypothesis is ∆FR2,1 ≤ 0 (∆FR8,1 ≤ 0) and the
alternative hypothesis is ∆FR2,1 > 0 (∆FR8,1 > 0).
As shown in Table 3, for all problems, values in ∆FR2,1 were not significantly greater
than zero but values in ∆FR8,1 were significantly greater than zero. Thus, we see that,
overall, the distributions of ∆FR8,1 lie towards the right of those for ∆FR2,1, which may
imply that more subjects learned to choose the correct option when they were given more
time to learn even without any payoff-related feedback information.
8A more detailed summary is available upon request.
9
Problem A Problem B Problem C Problem D
∆FR2,1 0.2948 0.1763 0.1990 0.3392∆FR8,1 0.0339 0.0463 0.0125 0.0142
Table 3: P-values of one-tailed SR test in the sessions for the no-feedback treatment.
Observation 1. It was reconfirmed that subjects learned to choose the option with higher
expected payoffs even without any feedback information regarding their payoffs.9
3.3 Meaningful learning (learning transfer) in new sessions: Osaka (Sept)
Guerci et al. (2017) considered that for each binary choice problem, meaningful learning
was observed if the following two criteria are both satisfied. We say that subjects are
experienced if they have completed their choice in the first 40 periods.
(a) There was a significant increase in number of subjects who chose the correct option
between Period 1 and Period 41.
(b) There was a significant increase from FR1 to FR9.
Note that in criteria (a) and (b), subjects who chose in Period 1 or for FR1 and those in
Period 41 or for FR9 are faced with different sequences of binary choice problems.
First, we consider whether criterion (a) is satisfied. The left-most column No-fb (10) in
Table 4 presents the numbers of subjects who chose the correct option in Period 1 and Period
41 in each binary choice problem in the additional sessions for the no-feedback treatment.
For reference, Table 4 also lists those numbers reported in Guerci et al. (2017) for the
no-feedback treatment, the partial-feedback treatment, and the full-feedback treatment,
respectively, in the columns No-fb (30), Part-fb (30), and Full-fb (30). The numbers of
subjects who participated in sessions for those treatments are noted in parentheses.
The p-values for the χ2 test are also reported in Table 4, where the null hypothesis
is that the percentages of inexperienced subjects and experienced subjects who chose the
correct options are the same when they first encountered the same problem. Significantly
more experienced subjects chose the correct options than inexperienced subjects for the
no-feedback treatment (No-fb (30)) in Problem B (p = 0.002), the additional sessions for
the no-feedback treatment (No-fb (10)) in Problem C (p = 0.019), and for the no-feedback
9Guerci et al. (2007) reported that in Problem B and Problem D, subjects learned to choose the optionwith higher expected payoffs even without any payoff-related feedback information.
10
No-fb No-fb Part-fb Full-fb(10) (30) (30) (30)
Problem A
Inexperienced 7 20 25 22Experienced 7 15 16 17
p-value (χ2) 1.000 0.190 0.012 0.176
Problem B
Inexperienced 4 9 13 9Experienced 5 21 8 10
p-value (χ2) 0.653 0.002 0.176 0.781
Problem C
Inexperienced 4 19 22 19Experienced 9 23 22 19
p-value (χ2) 0.019 0.260 1.000 1.000
Problem D
Inexperienced 2 8 5 9Experienced 7 21 16 15
p-value (χ2) 0.025 <0.001 0.003 0.114
Table 4: Numbers of subjects who chose the correct option in Period 1 and Period 41 for the binary choiceproblems used in the additional and the original sessions for the no-feedback treatment (No-fb (10), No-fb(30)), the partial-feedback treatment (Part-fb(30)), and the full-feedback treatment (Full-fb (30)). For eachtreatment, the number of subjects who participated in sessions for the treatment is noted in parentheses. Thep-values for χ2 tests are reported for comparison between inexperienced and experienced subjects. Guerciet al. (2017) reported the p-values for No-fb (30), part-fb (30), and Full-fb (30).
(No-fb (10), No-fb (30)) and the partial-feedback treatments in Problem D (p = 0.025,
p < 0.001, and p = 0.003, respectively). Thus, those five cases satisfied criterion (a).
Next, we consider whether criterion (b) is satisfied by comparing the relative frequency in
which inexperienced subject i chose the correct option within the first block of 5 consecutive
periods (Periods 1 to 5) with the relative frequency in which experienced subject j chose
the correct option within the first block of 5 consecutive periods when they faced the same
problem within the ninth block of 5 consecutive periods (Periods 41 to 45). Table 5 lists
p-values for the Mann-Whitney U-test (MW) test as well as the Kolmogorov-Smirnov test
(KS) of the distributions of FR1 (inexperienced subjects) and FR9 (experienced subjects)
for each feedback treatment in each binary choice problem, where the null hypothesis is
FR1 = FR9. For No-fb (10), the p-values are not calculated with the MW test and the KS
11
No-fb No-fb Part-fb Full-fb(10) (30) (30) (30)
Problem A
p-value (perm, MW) 0.445 0.867 0.854 0.547p-value (KS) 0.913 0.889 0.283
Problem B
p-value (perm, MW) 0.376 0.049 0.336 0.538p-value (KS) 0.030 0.299 0.297
Problem C
p-value (perm, MW) 0.107 0.052 0.453 0.994p-value (KS) 0.009 0.536 0.297
Problem D
p-value (perm, MW) 0.001 <0.001 0.264 0.791p-value (KS) <0.001 0.103 0.990
Table 5: P-values for testing FR1 = FR9. Guerci et al. (2017) reported the p-values for the Mann-WhitneyU-test (MW) and the Kolmogorov-Smirnov test (KS). For No-fb (10), the p-values are not calculated withthe MW test and the KS test but with the permutation test (perm), because of the small sample size. InNo-fb (10), criteria (a) and (b) are satisfied only for Problem D.
test but with a permutation test (perm), because of the small sample size.
In sessions for the partial-feedback and full-feedback treatments, as shown in Table 5,
the p-values for the MW and KS tests suggest that there be no significant difference between
FR1 and FR9 for all problems. Thus, meaningful learning was not observed for the partial-
feedback and full-feedback treatments. In sessions for No-fb (30), however, there were
significant differences between FR1 and FR9 in Problem B (p = 0.049 for MW and 0.030 for
KS, respectively) and in Problem D (p < 0.001 for MW and < 0.001 for KS, respectively).
For Problems B and D, criteria (a) and (b) were thus both satisfied. Therefore, Guerci et al.
(2017) reported that meaningful learning was observed only for the no-feedback treatment
in Problems B and D.
In the additional sessions for No-fb (10), there was a significant difference between them
in Problem D (p = 0.001 for perm), whereas there was no such a difference in Problem B
(p = 0.376 for perm). For Problem C, criterion (a) was truly satisfied, as was confirmed
above but criterion (b) was not. Therefore, only for Problem D criteria (a) and (b) are both
satisfied. We conclude as follows.
12
Observation 2. Statistically significant evidence of meaningful learning was reconfirmed
in Problem D in the additional sessions for the no-feedback treatment.
Note that we could reconfirm meaningful learning in Problem D in the sessions for the
no-feedback treatment, even if the correct option was Choice 1 in the first 40 periods and
it was changed to Choice 2 in the subsequent 20 periods. Thus, meaningful learning for the
treatment with no payoff-related feedback information is not a consequence of inertia of the
subject’s choice. Subjects generalized what they had thought introspectively in a choice
problem to another similar but different one.
3.4 Length of time to choose
Observation 2 states that meaningful learning was reconfirmed in Problem D in the addi-
tional sessions for the no-feedback treatment. Then, what were the potential factors pro-
moting meaningful learning of subjects without any payoff-related feedback information?
This subsection considers the time that elapsed until subjects made their choice.
Table 6 presents the average length of time for subjects to make their choice in the
1st block and the 9th block of 5 consecutive periods for each binary choice problem in
sessions for each treatment, which is measured in seconds. For every binary choice problem,
numerically, (inexperienced or experienced) subjects spent longer time in sessions for Part-fb
(30) and Full-fb (30) than in sessions for No-fb (10) and No-fb (30).
The p-values for the Brunner-Munzel test are reported Table 6, where the null hypothesis
is that the length of time for inexperienced subjects and experienced subjects spent are on
average the same in the first 5 consecutive periods when they were faced with the same
problem. In each of Problems B, C, and D, experienced subjects spent a significantly
longer time than inexperienced subjects, regardless of the treatments. The p-values for the
Kruskal-Wallis test are also provided there, where the null hypothesis is that the length of
time for subjects to choose in the same 5 consecutive periods when they were first faced with
a binary choice problem are on average the same among all treatments. Only in Problems
C and D with which the experienced subjects were faced, there was no significant difference
in average length of time they spent among all treatments.
As shown in Table 7, the p-values for the Brunner-Munzel test for testing imply that
there was no significant difference in average length of time subjects spent between No-fb
(10) and No-fb (30) in every binary choice problem, regardless of whether subjects were
inexperienced or experienced. Thus, it is not inappropriate to compare the new data with
those used in Guerci et al. (2017). Also, there was no that difference significantly between
13
No-fb No-fb Part-fb Full-fb(10) (30) (30) (30) p-value (KW)
Problem A
Inexperienced 14.260 15.380 20.547 17.380 0.003Experienced 18.200 15.533 20.102 22.287 <0.001
p-value (BM) 0.086 0.812 0.816 <0.001
Problem B
Inexperienced 12.060 13.980 16.753 18.113 0.004Experienced 17.700 17.120 22.567 23.373 <0.001
p-value (BM) 0.002 0.039 <0.001 <0.001
Problem C
Inexperienced 14.600 12.247 17.907 16.213 <0.001Experienced 20.760 19.149 21.353 21.167 0.240
p-value (BM) 0.049 <0.001 0.004 <0.001
Problem D
Inexperienced 14.240 13.400 15.847 17.087 0.046Experienced 19.800 18.913 20.800 20.667 0.195
p-value (BM) 0.016 <0.001 <0.001 0.022
Table 6: Average length of time (in seconds) for subjects to choose in the 1st block (for inexperiencedsubjects) and the 9th block (for experienced subjects) of 5 consecutive periods in each binary choice problemin sessions for each treatment. The p-values for the two-sided Brunner-Munzel test are provided for thecomparison of average length of time between inexperienced subjects and experienced subjects who werefaced with the same binary choice problem in sessions for each treatment. The p-values for the Kruskal-Wallis test are also provided for testing whether there is no difference in average length of time subjectsspent in the same 5 consecutive periods among all treatments.
Part-fb (30) and Full-fb (30) in almost every case (except for the inexperienced subjects in
Problem A). In Problems A and B, however, subjects spent a significantly longer time in
sessions for Part-fb (30) than in sessions for No-fb (30), regardless of whether subjects were
inexperienced or experienced. Note that in Problem D, there was no significant difference
in average length of time experienced subjects spent between No-fb (30) and Part-fb (30).
To summarize all the above argument, we can conclude as follows.
Observation 3. (1) Subjects who experienced their choice in sessions for the no-feedback
treatment took as a long time on average to make their choice as those who made their
choice in sessions for the partial-feedback treatment and the full-feedback treatment, when
they were first faced with Problem D in the 9th block of 5 consecutive periods. (2) Those
14
No-fb (10) No-fb (30) Part-fb (30)vs No-fb (30) vs Part-fb (30) vs Full-fb (30)
Problem A
Inexperienced 0.506 0.001 0.044Experienced 0.159 0.006 0.206
Problem B
Inexperienced 0.663 0.038 0.322Experienced 0.851 <0.001 0.307
Problem C
Inexperienced 0.402 <0.001 0.303Experienced 0.456 0.080 0.711
Problem D
Inexperienced 0.471 0.068 0.400Experienced 0.970 0.075 0.966
Table 7: P-values for the two-sided Brunner-Munzel test for testing that there is no difference in averagelength of time subjects spent between two treatments.
experienced subjects spent a significantly longer time than inexperienced subjects who were
faced with problem D in the 1st block of 5 consecutive periods.
As for (1) in Observation 3, we can partly follow the above argument in another way.
Table 8 lists the p-values for the one-sided Steel test (Steel) for multiple comparison of
average length of time for subjects to choose observed in 1st block and 9th block of 5
consecutive periods; subjects in No-fb (10) spent as a long time on average as those in
No-fb (30), Part-fb (30), and Full-fb (30), respectively.
Note that in view of the statistical results shown in Tables 6, 7, and 8, the same argument
as described above can be applied to Problem C as well, although meaningful learning was
not observed in Problem C. Also note that Guerci et al. (2007) reported meaningful learning
in Problem B, although the KW test (Table 6) suggests that there was a significant difference
in average length of time experienced subjects spent in Problem B among No-fb (10), No-fb
(30), Part-fb (30), and Full-fb (30). Truly, Problems A and C (B and D) are identical
in terms of MWCs and expected payoffs, but the statistical tests mentioned here suggest
that numerically different appearances of the binary choice problems induce subjects to
choose different options. By using some other appropriate experimental design, we could
15
No-fb Part-fb Full-fbSteel (H1 :<) (30) (30) (30)
Problem A
Inexperienced No-fb (10) 0.402 0.008 0.066Experienced No-fb (10) 0.968 0.238 0.034
Problem B
Inexperienced No-fb (10) 0.480 0.030 0.008Experienced No-fb (10) 0.604 0.004 0.004
Problem C
Inexperienced No-fb (10) 0.946 0.204 0.427Experienced No-fb (10) 0.916 0.620 0.315
Problem D
Inexperienced No-fb (10) 0.890 0.383 0.223Experienced No-fb (10) 0.661 0.078 0.287
Table 8: P-values for the one-sided Steel test for multiple comparison of average length of time for subjectsto choose observed in 1st block and 9th block of 5 consecutive periods, where the alternative hypothesis (H1)is that as compared to the control group No-fb (10), they spent a longer time on average in the treatmentgroup No-fb (30), Part-fb (30), and Full-fb (30), respectively.
have considered such a cognitive or psychological effect of those appearances on subjects’
meaningful learning and the length of time for introspective thinking.
3.5 Switching options
The statistical test results shown in Subsection 3.3 imply that meaningful learning was not
observed in sessions for both the partial-feedback and the full-feedback treatments. Then,
what hindered subjects from meaningfully learning in sessions for those treatments with
feedback information? As noted in Guerci et al. (2017), in the post-experimental ques-
tionnaire, some subjects in sessions for the partial-feedback and full-feedback treatments
reported that they changed their choices when they received zero points, while they did
not when they received a non-zero amount of points. If subjects had not yet been con-
vinced of what they had learned, then it would be plausible for them to take this type of
“win-stay-lose-shift” strategy (Nowak and Sigmund, 1993).
In this experiment, subjects received 0 points or 40 points (Table 13 in Subsection 3.6),
Tables 9 - 12 list the frequencies of observing 0 points and 40 points (freq), the frequencies of
16
switching choices immediately after observing 0 points and 40 points (switch), the ratios of
switch to freq (ratio), for the partial-feedback treatment and the full-feedback treatment in
the 1st (B1), 8th (B8), 9th (B9), and 12th (B12) blocks of 5 consecutive periods, respectively.
The p-value for the two-sided Fisher exact test is reported for each binary choice problem,
where the null hypothesis is that switching choices immediately after observing 0 points and
switching choices immediately after observing 40 points were equally likely to be observed.
In the following analysis, we confine attention to Problem B in sequence A→ B and Problem
D in sequence C → D for which meaningful learning was observed in sessions for No-fb (30).
First, consider the case of the partial-feedback treatment. It is not plausible that in
B1 and B8 subjects took the win-stay-lose-shift strategy in Problem C (p = 0.361 and
p = 0.290) and also not plausible that subjects who experienced Problem C from B1 to
B8 and encountered Problem D in B9 took the win-stay-lose-shift strategy (p = 0.068).
Moreover, the null hypothesis was not rejected also in B12 (p > 0.999). On the other hand,
it is inferred that in B9 subjects who experienced Problem A from B1 to B8 and encountered
Problem B took the win-stay-lose-shift strategy (p =< 0.001), where “lose-shift” was chosen
at about 40% (=0.397) after 0 points realized while “win-stay” was chosen at about 89%
(=1-0.113) after 40 points realized, although the null hypothesis was rejected in B12. Thus,
we have following observation.
Observation 4. For the partial-feedback treatment, it is not plausible that subjects took the
win-stay-lose-shift strategy in many periods for sequence C → D, although we can reject the
hypothesis that experienced subjects did not take that strategy in some consecutive periods
after they were first faced with Problem B.
It is not plausible that subjects learned something in Problem C (p = 0.768 for the
one-sided SR test for the null hypothesis ∆FR8,1 ≤ 0). We can thus understand that
experienced subjects had not learned anything sure with Problem C they could generalize
to Problem D, and in fact Guerci et al. (2017) reported that meaningful learning was not
observed in Problem D. As noted above, however, it is also not plausible that in Problem C
subjects took the win-stay-lose-shift strategy in B8, which might imply that they gave up
even learning at last. We continue to investigate this point in the next subsection.
Next, consider the case of the full-feedback treatment. It is not plausible that in B1
subjects took the win-stay-lose-shift strategy in Problem C (p = 0.111) and yet they did in
B8 in Problems A and C (p = 0.313 and p = 0.155), as is the case for the partial-feedback
treatment. It is, however, inferred that subjects who experienced Problem C from B1 to
B8 and encountered Problem D took the win-stay-lose-shift strategy (p =< 0.001) and
17
Table 9: Frequency of switching choices in B1: two-sided Fisher testPart-fb (30) Full-fb (30)
0 point 40 points 0 point 40 points
Problem A freq 43 77 Problem A freq 44 76switch 23 18 switch 25 19ratio 0.535 0.234 ratio 0.568 0.250p-value 0.001 p-value 0.001
B freq 53 66 B freq 65 55switch 30 15 switch 36 13ratio 0.566 0.227 ratio 0.554 0.236p-value <0.001 p-value 0.001
C freq 27 90 C freq 39 81switch 12 30 switch 20 28ratio 0.444 0.333 ratio 0.513 0.346p-value 0.361 p-value 0.111
D freq 57 62 D freq 62 57switch 37 21 switch 37 13ratio 0.649 0.339 ratio 0.597 0.228p-value 0.001 p-value <0.001
Table 10: Frequency of switching choices in B8: two-sided Fisher testPart-fb (30) Full-fb (30)
0 point 40 points 0 point 40 points
Problem A freq 45 105 Problem A freq 49 101switch 15 23 swith 15 22ratio 0.333 0.219 ratio 0.306 0.218p-value 0.155 p-value 0.313
B freq 59 91 B freq 72 78switch 20 10 switch 21 9ratio 0.339 0.120 ratio 0.292 0.115p-value 0.001 p-value 0.008
C freq 43 107 C freq 39 111switch 10 31 switch 12 18ratio 0.233 0.290 ratio 0.282 0.162p-value 0.547 p-value 0.063
D freq 52 97 D freq 62 57switch 15 13 switch 37 13ratio 0.289 0.134 ratio 0.452 0.210p-value 0.028 p-value <0.001
that they still took the win-stay-lose-shift strategy (p = 0.004) in B12. We can infer that
subjects who experienced Problem A from B1 to B8 and encountered Problem B took the
18
Table 11: Frequency of switching choices in B9: two-sided Fisher testPart-fb (30) Full-fb (30)
0 point 40 points 0 point 40 points
Problem A freq 28 92 Problem A freq 40 80switch 15 13 switch 12 12ratio 0.536 0.130 ratio 0.300 0.150p-value <0.001 p-value 0.088
B freq 58 62 B freq 55 65switch 23 7 switch 28 14ratio 0.397 0.113 ratio 0.509 0.215p-value <0.001 p-value 0.001
C freq 36 83 C freq 35 85switch 16 16 switch 20 18ratio 0.444 0.193 ratio 0.571 0.212p-value 0.007 p-value <0.001
D freq 54 66 D freq 52 68switch 20 14 switch 31 12ratio 0.370 0.212 ratio 0.566 0.177p-value 0.068 p-value <0.001
win-stay-lose-shift strategy (p =< 0.001) in B9, although the null hypothesis was rejected
in B12. In short, we have the following observation.
Observation 5. For the full-feedback treatment, it is inferred that experienced subjects
took the win-stay-lose-shift strategy in Problem D, and we can reject the hypothesis that
experienced subjects did not take that strategy some consecutive periods after they were first
faced with Problem B.
19
Table 12: Frequency of switching choices in B12: two-sided Fisher testPart-fb (30) Full-fb (30)
0 point 40 points 0 point 40 points
Problem A freq 60 85 Problem A freq 63 87switch 23 11 switch 19 9ratio 0.383 0.129 ratio 0.302 0.104p-value <0.001 p-value 0.003
B freq 36 114 B freq 48 102switch 4 15 switch 10 12ratio 0.111 0.132 ratio 0.208 0.118p-value >0.999 p-value 0.215
C freq 56 93 C freq 54 95switch 24 17 switch 18 11ratio 0.429 0.183 ratio 0.333 0.116p-value 0.002 p-value 0.002
D freq 41 108 D freq 45 105switch 9 25 switch 22 25ratio 0.220 0.232 ratio 0.489 0.238p-value >0.999 p-value 0.004
20
3.6 Random choice of runs
Observation 4 implies that we could not conclude that in sequence C→ D, the win-stay-lose-
shift strategy hindered subjects from meaningful learning in sessions for the partial-feedback
treatment. Subjects might take some more complicated way in search of the correct options.
In this subsection, we thus consider a possibility of subjects’ random choice of runs in some
sequences of options (not the randomness of options themselves chosen by subjects).
Problem A Choice 1 [14; 5, 3, 7, 7] Choice 2 [14; 5, 4, 6, 7]
(71, 72) (0, 0, 60, 60) (5, 4, 6) (40, 40, 40, 0)(5, 3, 71) (40, 40, 40, 0) (5, 4, 7) (40, 40, 0, 40)(5, 3, 72) (40, 40, 0, 40) (5, 6, 7) (40, 0, 40, 40)
(4, 6, 7) (0, 40, 40, 40)
Problem B Choice 1 [6; 1, 2, 3, 4] Choice 2 [6; 1, 1, 4, 4]
(2, 4) (0, 60, 0, 60) (41, 42) (0, 0, 60, 60)(3, 4) (0, 0, 60, 60) (11, 12, 41) (40, 40, 40, 0)(1, 2, 3) (40 40, 40, 0) (11, 12, 42) (40, 40, 0, 40)
Problem C Choice 1 [14; 3, 5, 6, 8] Choice 2 [14; 3, 6, 6, 7]
(6, 8) (0, 0, 60, 60) (61, 62) (40, 40, 40, 0)(3, 5, 6) (40, 40, 40, 0) (1, 2, 61) (40, 40, 0, 40)(3, 5, 8) (40, 40, 0, 40) (1, 2, 62) (40, 0, 40, 40)
(61, 62, 7) (0, 40, 40, 40)
Problem D Choice 1 [9; 1, 3, 5, 6] Choice 2 [9; 1, 2, 6, 6]
(3, 6) (0, 60, 0, 60) (61, 62) (0,0, 60, 60)(5, 6) (0, 0, 60, 60) (1, 2, 61) (40, 40, 40, 0)(1, 3, 5) (40, 40, 40, 0) (1, 2, 62) (40, 40, 0, 40)
Table 13: MWCs and payoff vectors. The MWCs in [14; 5, 3, 7, 7] are, e.g., written here as (5, 3, 71),(5, 3, 72), and (71, 72) by the votes apportioned to the members, not with references to the specific players.
Under the null hypothesis, the number of runs in a sequence of options is a random
variable. When the payoffs associated with options are stochastically determined by random
variables, the null hypothesis is not rejected as often in the runs test, even if the sequence of
options is generated by some systematic rule.10 Accordingly, we apply the following criterion
10Table 12 lists MWCs and payoff vectors for each choice in four binary choice problems. In Problem D,denote by 1-1, 1-2, and 1-3 payoff vectors (0, 60, 0, 60), (0, 0, 60, 60), and (40, 40, 40, 0), respectively whenChoice 1 is chosen and by 2-1, 2-2, and 2-3 (0, 0, 60, 60), (40, 40, 40, 0), and (40, 40, 0, 40), respectivelywhen Choice 2 is chosen. Assume that when Choice k (= 1, 2) is successively chosen, payoff vectors realize inthe order of k-1→ k-2 → k-3 → k-1 → · · · , and assume in addition that when the alternative option is oncechosen after observing payoff vector k-i (i=1, 2, 3) and then Choice k is chosen again, the order resumesfrom the payoff vector next to k-i. Given that Choice 1 is chosen at Period 1, the win-stay-lose-shift strategygenerates the following sequence of 20 choices: 1, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, . . .. The p-value forthe runs test applied to the sequence of those choices is 0.8391, and thus the null hypothesis was not rejected,although they were generated systematically with the win-stay-lose-shift strategy.
21
to our analysis; when it is inferred that subjects took the win-stay-lose-shift strategy, we
prioritize the inference over the result in the runs test. If the null hypothesis is rejected
in the runs test, then we infer that the runs in the sequence of options were chosen with a
certain rule for searching the correct options.
No-fb No-fb Part-fb Full-fb(10) (30) (30) (30)
Problem A
Inexperienced 5 22 12 11Experienced 3 15 9 W 15
Problem B
Inexperienced 4 22 13 W 11 WExperienced 5 13 8 w+rand 8 w+rand
Problem C
Inexperienced 5 14 9 rand 12Experienced 4 18 10 W 13 W
Problem D
Inexperienced 7 21 12 W 7 WExperienced 5 15 7 rand 7 W
Table 14: Numbers of subjects each of whom is counted if the null hypothesis was rejected, where thenull hypothesis is that the number of runs in a sequence of options each subject chose in Periods 21-40(Inexperienced) or in Periods 41-60 (Experienced) is a random variable. The sample size is 10 in each caseof No-fb (10) and 30 in each case of No-fb (30), Part-fb (30), and Full-fb (30). “W” (“w”) indicates that wecan infer that the win-stay-lose-shift strategy was taken in both B1 and B8 or in both B9 and B12 (in B9).“rand” indicates that we conclude that subjects did not search the correct options with any certain rules.When it is inferred that subjects took the win-stay- lose-shift strategy, we prioritize the inference over theresult in the runs test.
Table 14 presents the numbers of subjects each of whom is counted if the null hypothesis
was rejected in the runs test for each binary choice problem in sessions for each treatment.
The runs test is applied to a sequence of options each identical subject chose and is not
used for testing the randomness of options themselves but the randomness of the number of
runs in a sequence of options. Thus, 5 consecutive periods in B1, B8, B9, and B12 are too
few as the samples; for each subject, the runs test was here applied to a sequence of options
the subject chose in a binary choice problem in Periods 21-40 and in another one in Periods
41-60, respectively. For each case in each binary choice problem, the sample size is 10 in
sessions for No-fb (10) and 30 in No-fb (30), Part-fb (30), and Full-fb (30), respectively.
22
As shown in Subsection 3.2, for all four binary choice problems we examined, it was
reconfirmed that subjects learned to choose the correct option in sessions for No-fb (10)
(Observation 1). Table 14 shows that in Problem B, 40% of inexperienced subjects searched
for the correct options with a certain rule in sessions for No-fb (10). Thus, we presume in
the following analysis that subjects did not search the correct options with any certain rules,
if 30% or less of subjects are counted for each of whom the null hypothesis was rejected in
the runs test and if it is not inferred that subjects took the win-stay-lose-shift strategy.
The first part of Observation 4 states that for the partial-feedback treatment, it is not
plausible that subjects took the win-stay-lose-shift strategy in many periods for sequence
C → D. Then, what happened? 9 inexperienced subjects in Problem C and 7 experienced
subjects in Problem D are counted for each of whom the null hypothesis was rejected in the
runs test. According to the criterion described above, we have the following observation.
Observation 6. In sessions for the partial-feedback treatment, it is inferred that in sequence
C → D, subjects did not search for the correct options with any certain rules.
As stated in the second parts of Observations 4 and 5, we can reject the hypothesis
that in Problem B, experienced subjects did not take the win-stay-lose-shift strategy in B9
for both the partial-feedback treatment and the full-feedback treatment. In B12, however,
we cannot reject that hypothesis. Moreover, 8 inexperienced subjects and 8 experienced
subjects are counted for each of whom the null hypothesis was rejected in the runs test.
According to the criterion described above, we have the following observation.
Observation 7. In Problem B, in sessions for both partial-feedback treatment and the full-
feedback treatment, it is inferred that experienced subjects took the win-stay-lose-shift strategy
in some consecutive periods after they were first faced with the binary choice problem but
finally they did not search for the correct options with a certain rule.
Table 14 indicates that when subjects received immediate feedback information about
payoffs, in many cases, we could not reject the null hypothesis that experienced subjects
did not take the win-stay-lose-shift strategy or we might not think that they searched for
the correct options with a certain rule. In the full-feedback treatment, subjects observe
that (i) payoffs are equally allocated among a subset of committee members, and (ii) these
subsets of committee members change from period to period. Thus, as compared to sessions
for the partial-feedback information, we expected it to be much easier to learn about the
underlying relationship between payoffs and votes in this treatment compared to the other
two treatments. In weighted voting situations, feedback information about their payoffs
might lead false inference of subjects or confuse their inference.
23
4 Final remarks
Standard models of learning in games need some feedback information with which people
learn something; providing feedback information promotes learning in the theory of rein-
forcement learning (e.g., Erev and Roth, 1998), belief-based learning (e.g., Cheung and
Friedman, 1997), and experience weighted attraction (EWA) learning (Camerer and Ho,
1999).11 In experiments of p-beauty-contest games, however, Weber (2003) and Rick and
Weber (2010) found that withholding feedback information promotes meaningful learning
in the sense that subjects learn to perform iterated dominance, although it is a higher-order
concept of learning than that the standard models of learning deal with.
Following Weber (2003) and Rick and Weber (2010), some researchers attempted to
reconfirm similar observations in various situations. Neugebauer et al. (2009) reported that
subjects do not learn to play the dominant strategy in a public goods game without feedback
information. In a bandit experiment with a context of weighted voting games, however,
Guerci et al. (2017) reported that even without any payoff-related feedback information,
subjects learned to choose the correct option and they generalized what they had thought
introspectively in a binary choice problem to another similar but different one. As a next
step for understanding human behavior of learning and calling for a new class of models of
learning that allow for introspective thinking, we needed to further explore what hindered
subjects from meaningful learning when immediate payoff-related feedback information was
given to them, and we should have figured out what factor might induce subjects to gain a
deeper insight on the relationship between their nominal votes and expected payoffs without
any feedback information.
In this paper we showed that subjects’ inference might be mistaken or confused when
feedback information was given to them in a context of weighted voting and that a longer
thinking time might induce subjects to deeply infer some underlying relationship between
votes and payoffs without misleading or confusing feedback information. Then, in some
way in another experiment, we should next confirm whether meaningful learning is observed
when subjects who do not choose options corresponding to immediate feedback information.
To further proceed with this strand of research, the cognitive ability of subjects will also
be taken into account as an important potential factor related to the possibility of their
meaningful learning. There might be some features we should capture with which subjects
think the binary choice problems in their mind.
11Arifovic et al. (2006) showed that those models fail to replicate human behavior in a repeated two-persongame, and Erev et al. (2010) reported that those models do not perform well in predicting how people behavein market entry games.
24
References
Aleskerov, F., Beliani, A., Pogorelskiy, K. (2009). Power and preferences: an experimental
approach. National Research University, Higher School of Economics, mimeo.
Arifovic, J., McKelvey, R.D., Pevnitskaya, S. (2006). An initial implementation of the Turing
tournament to learning in two person games. Games Econ. Behav. 57, 93-122.
Banzhaf, J.F. (1965). Weighted voting doesn’t work: a mathematical analysis. Rutgers Law
Review 19, 317-343.
Bednar, J., Chen, Y., Liu, T. X., Page, S. (2012). Behavioral spillovers and cognitive load
in multiple games: An experimental study. Games Econ. Behav. 74, 12-31.
Cason, T. N., Savikhin, A. C., Sheremeta, R. M. (2012). Behavioral spillovers in coordina-
tion games. European Economic Review 56, 233-245.
Camerer, C., Ho, T.-H. (1999). Experience-weighted attraction learning in normal form
games. Econometrica 67, 827-874.
Cheung, Y.-W., Friedman, D. (1997). Individual learning in normal form games: some
laboratory results. Games Econ. Behav. 19, 46-76.
Cooper, D.J., Kagel, J.H. (2003). Lessons learned: generalized learning across games. Amer.
Econ. Rev. 93, 202-207.
(2008). Learning and transfer in signaling games. Econ. Theory 34, 415-439.
Deegan, J., and Packel, E. (1978). A new index of power for simple n-person games. Int.
Jour. Game Theory 7, 113-123.
Dufwenberg, M., Sundaram, R., Butler, D. J. (2010). Epiphany in the game of 21. Jour.
Econ. Behav. Org. 75, 132-143.
Erev, I., Roth A.E. (1998). Predicting how people play games: reinforcement learning in
experimental games with unique, mixed strategy equilibria. Amer. Econ. Rev. 88, 848-81
Erev, I., Ert, E., Roth, A.E. (2010). A choice prediction competition for market entry games:
an introduction. Games 1, 117-136.
Esposito, G., Guerci, E., Lu, X., Hanaki, N., Watanabe, N. (2012). An experimental study
on “meaningful learning” in weighted voting games. mimeo., Aix-Marseille University.
25
Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. Ex-
per. Econ. 10, 171-178.
Grimm, V., Mengel, F. (2012). An experiment on learning in a multiple games environment.
Jour. Econ. Theor. 147, 2220-2259
Guerci, E., Hanaki, N. Watanabe, N. (2017). Meaningful learning in weighted voting games:
an experiment. Theory. Decis. 83, 131-153.
Guerci, E., Hanaki, N. Watanabe, N., Esposito, G., Lu, X. (2014). A methodological note
on a weighted voting experiment. Soc. Choice Welfare 43, 827-850
Hu, Y., Kayaba, Y., Shum, M. (2013). Nonparametric learning rules from bandit experi-
ments. Games Econ. Behav. 81, 215-231.
Mengel, F., Sciubba E. (2014). Extrapolation and structural similarity in games. Econ. Let.
125, 381-385.
Meyer, R.J., Shi, Y. (1995). Sequential choice under ambiguity: intuitive solutions to the
armed-bandit problem. Manag. Sci. 41, 817-834.
Montero, M., Sefton, M., Zhang, P. (2008). Enlargement and the balance of power: an
experimental study. Soc. Choice Welfare 30, 69-87.
Neugebauer, T., Perote, J., Schmidt, U., Loos, M. (2008). Selfish-biased conditional cooper-
ation: on the decline of contributions in repeated public goods experiments. Jour. Econ.
Psy. 30, 52-60.
Nowak, M,A., Sigmund, K. (1993). A strategy of win stay, lose shift that outperforms
tit-for-tat in the prisoner’s-dilemma game. Nature 364, 56-58.
Rick, S., Weber, R.A. (2010). Meaningful learning and transfer of learning in games played
repeatedly without feedback. Games Econ. Behav. 68, 716-730.
Shapley, L.S., Shubik, M. (1954). A method for evaluating the distribution of power in a
committee system. Amer. Polit. Sci. Rev. 48, 787-792.
Watanabe, N. (2014). Coalition formation in a weighted voting experiment. Japanese Jour.
Electral Stud. 30, 56-67.
Weber, R.A. (2003). Learning with no feedback in a competitive guessing game. Games
Econ. Behav. 44, 134-144.
26