A one-shot bargaining strategy for dealing with multifarious opponents

Appl Intell
DOI 10.1007/s10489-013-0497-6


Shu-juan Ji · Chun-jin Zhang · Kwang-Mong Sim · Ho-fung Leung

© Springer Science+Business Media New York 2013

Abstract Bargaining is an effective paradigm for solving resource allocation problems. Factors such as the bounded rationality of negotiators, time constraints, incomplete information, and the complexity of dynamic environments make designing an optimal strategy for one-shot bargaining much harder than in the setting where all bargainers are assumed to be absolutely rational. Many prediction-based strategies have been explored, either assuming a finite number of opponent models or focusing on predicting the opponent’s reserve price, deadline, or the probabilities of different behaviors. Following the methods of estimating the opponent’s private information, this paper gives a strategy that improves the BLGAN strategy so that it adapts to various possible bargaining situations and deals with multifarious opponents. In addition, this paper compares the improved BLGAN strategy with related work. Experimental results show that the improved BLGAN strategy can outperform related ones when faced with various opponents, especially agents who frequently change their strategies for anti-learning.

S.-j. Ji
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, P.R. China

C.-j. Zhang (B)
Center of Modern Education, Shandong University of Science and Technology, Qingdao, P.R. China

K.-M. Sim
School of Computing, University of Kent, Chatham Maritime, Kent ME4 4AG, UK

H.-f. Leung
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, P.R. China

Keywords Bargain · Strategy · Heuristic method · Prediction · Experimental analysis

1 Introduction

Bargaining denotes the process in which two agents with disparate interests search for an agreement on some issues; the search process involves exchange of offers, relaxation of initial offers, and mutual concessions [1, 2]. It can also be viewed as a form of decision-making with two or more actively interacting agents who cannot make decisions independently (or achieve their goals unilaterally), and therefore must make concessions to achieve a compromise [3, 4]. Automated bargaining among software agents is required in many different contexts in which conflicts and differences need to be resolved, such as e-commerce [5, 6], supply chain management [7], resource co-allocation in Cloud computing [8–11] and Grid computing [3, 12–14].

According to the bargainers’ relationship, bargaining can be divided into two categories, i.e., repeated bargaining and one-shot bargaining. One-shot means that the bargainers encounter each other only once, i.e., the seller and the buyer will not meet each other after the bargaining process, no matter whether an agreement is reached or not. Therefore, bargainers have no historical interaction experience for reference and do not consider the possibility of long-term cooperation in the bargaining process. One-shot bargaining is an important category for both theoretical and practical reasons. This paper focuses on one-shot, one-issue (i.e., price) bargaining strategies.





S.-j. Ji et al.

As there is no historical interaction in one-shot bargaining, it is difficult to obtain the opponent’s private information with off-line learning methods, which are usually designed to learn from histories. Incomplete information, together with the variety of bargaining opponents (absolutely rational or boundedly rational), makes the design of an optimal one-shot bargaining strategy a tough task. Nevertheless, much research has been done on designing optimal or dominant bargaining strategies. Typically, Sandholm and Vulkan [15] explored strategies for one-shot one-issue deals under the assumption that all bargainers are absolutely rational even if reservation price and tolerance deadline are private to them. They pointed out that “Sit-and-Wait” is the unique sequential equilibrium strategy for bargainers in one-shot bargaining. “Sit-and-Wait” means that the absolutely rational seller and buyer insist on their initial offers until the smaller of the bargainers’ deadlines is reached (which one is smaller is unknown to the bargainers), at which point the agent whose deadline is smaller accepts the opponent’s offer if that price is within its acceptance range. Otherwise, no agreement can be reached.

Based on the belief that agents usually make concessions in bargaining, many heuristic bargaining strategies have been proposed. For example, Faratin et al. [16] classified agents’ negotiation strategies and tactics into time-dependent, resource-dependent and behavior-dependent ones and defined them formally. Based on the definitions of Faratin et al. [16], Hou [17] gave a nonlinear regression method to predict the opponent’s tactic family, tactic parameter and next offer. He also gave a heuristic method to estimate the opponent’s reservation price and deadline. Similarly, Lee et al. [18] proposed a method that adapts the factor of negotiation tactics according to the predicted next offers of the opponent. Brozostowski et al. [19] gave a difference-based offer prediction method. In addition, Sim et al. [20] presented the BLGAN algorithm, which uses a Bayesian method to learn the opponent’s reservation price and deadline, and uses a genetic algorithm to search for an optimal concession based on these estimated values. Notably, an international automated negotiating agent competition [21] is held annually for one-shot multi-issue bargaining. Although many prediction-based one-shot one-issue bargaining strategies have been proposed, their performances have not been compared with one another. Besides, most of these prediction-based strategies are designed under two assumptions. One is that all opponents offer according to the strategy and tactic models defined by Faratin et al. [16]; the other is that opponents make consecutive concessions. These assumptions limit the coverage of opponents, as many opponents do not offer according to any pattern, or even adopt strategies that prevent their private information from being learned.

The contribution of this paper is the design of a one-shot bargaining strategy that can deal with multifarious opponents, including (1) consecutively-conceding opponents, (2) sit-and-wait opponents, (3) opponents who concede according to fixed patterns that contain one or multiple “sit-and-wait” duration(s), and (4) opponents who concede without any fixed patterns. Besides, through three sets of experiments, we compare the strategy with other prediction-based strategies and reveal the characteristics of these strategies. The improved BLGAN strategy is equivalent to the BLGAN strategy in the first set of experiments because all the opponents in these experiments make consecutive concessions. In the second and third sets of experiments, the BLGAN strategy failed to reach agreements because some formulas in it are invalid when bargaining with opponents who may sit and wait during bargaining. In contrast, the improved BLGAN strategy is feasible and can achieve better performance than the compared strategies. That is because an agent who adopts the improved BLGAN strategy can swiftly switch from making an appropriate concession (when its opponent made a concession in the latest round) to sitting and waiting (when its opponent chose the sit-and-wait action in the latest round).

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 first defines the problem and the assumptions, then presents the preliminary elements used in bargaining. Section 4 gives the improved BLGAN strategy. Section 5 describes the designed experiments and measurements, and discusses the experimental results in detail. Section 6 concludes this paper and outlines possible future work.

2 Related work

Faratin et al. [16] classified agents’ negotiation strategies and tactics into time-dependent, resource-dependent and behavior-dependent (i.e., imitative) ones and defined them formally. Though these models cover almost all kinds of offer patterns of agents, Faratin et al. did not give methods for determining the parameters of these models. Much work has extended theirs. For example, Hou [17] extended the research of Faratin et al. [16] by giving a nonlinear-regression-based method for estimating the opponent’s tactics. His prediction mechanism can be divided into three steps. First, using the nonlinear regression method, the agent predicts which tactic family the opponent’s tactic belongs to. Under the assumption that the opponent adopts one form of the tactics defined in [16], the polynomial and exponential time-dependent tactics and the resource-dependent tactics are predicted simultaneously. If the opponent’s tactic is not of these three forms, it is assumed to be from the behavior-dependent family. Second, after predicting the opponent’s tactic, the agent estimates the parameters of the opponent’s time-dependent or resource-dependent tactic according to the previous offers received from the opponent. Third, based on the estimated values of these parameters, the agent predicts its opponent’s next offer. Moreover, a heuristic method is applied to estimate the opponent’s reservation price and deadline. Experimental results show that Hou’s [17] prediction mechanism is an effective online learning approach to eliciting an opponent’s private information (tactic family, tactic parameters, next offer, reservation price and deadline) with only the opponent’s previous offers. However, any agent who uses this method needs to maintain a knowledge base of opponents’ tactics. Moreover, Hou did not give any method for generating an appropriate response according to the predicted next offer, reservation price and deadline of the opponent. Prediction accuracy does not coincide with the ultimate aim of maximizing the agent’s utility in one-shot bargaining.

Based on Faratin et al.’s [16] work on making decisions according to an offer function parameterized by different tactic factors, Lee et al. [18] gave a tactic factor adaptation strategy. Using this strategy, an agent executes the following steps in each round of negotiation. First, under the assumption that the opponent offers according to a linear function and makes constant concessions in each round, the agent predicts the opponent’s offers in the following l rounds from the opponent’s two most recent offers. Second, the agent calculates the utility of these predicted offers and finds the round in which its utility may be maximized. Third, the agent adapts its original concession factor based on the round found in the previous step. Finally, the agent makes a concession according to this new concession factor in the current round. Experiments show that this factor adaptation strategy can outperform static factor strategies in average utility. This strategy is designed under the assumption that opponents offer in linear patterns and are willing to make similar concessions in the subsequent negotiation rounds. However, as only a small part of bargainers offer in these kinds of patterns, this strategy is not suitable for complex environments in which there are various kinds of opponents. Besides, this strategy is only suitable for the situation in which the bargained article’s value is discounted with time (i.e., the discount factor of this article’s utility must be larger than zero). That is because, when the discount factor is set to zero, the denominator of the concession factor adaptation formula given by Lee et al. [18] is zero.

Similarly, based on the previous offers received from the opponent, Brozostowski et al. [19] gave a difference-based prediction strategy as well as a concession method. The main idea of their bargaining strategy is as follows. First, under the assumption that the opponent adopts one strategy which combines the time-dependent tactics or the relative/absolute Tit-For-Tat tactics [16], the agent predicts its opponent’s offers by integrating the time-dependent difference-based approximation method and the behavior-dependent difference-based approximation method given by Brozostowski et al. [19]. Second, assuming that the opponent makes constant concessions in the following rounds, the agent calculates the offers of the next l rounds by linearly extending the approximated offer curve. Third, the agent makes concessions according to these predicted offers of the opponent. Experimental results show that this prediction strategy is better than all kinds of random static strategies in average utility. Similar to the limitation of Lee et al.’s strategy [18], the assumption in this work (i.e., the opponent adopts a strategy which combines the time-dependent tactics and relative/absolute Tit-For-Tat tactics) cannot cover all kinds of opponents’ offer patterns. Moreover, the prediction accuracy of the difference-based prediction method is affected by the number of historical offers. That is to say, the difference-based prediction is meaningful and helpful only when there is enough data. Otherwise, the prediction accuracy is reduced.

Similar to the above work on predicting the opponent’s information from historical offers, Sim and his colleagues [20] proposed the BLGAN algorithm, which integrates the Bayesian learning algorithm and the genetic algorithm to learn the opponent’s private information and search for an optimal counter proposal. The first step of this algorithm is to estimate the opponent’s reservation price using the Bayesian learning algorithm and to calculate the deadline according to this estimated reservation price. Second, the agent adjusts its offer parameter and generates its possible proposal according to this estimated information. Third, the genetic algorithm is used to find an optimal proposal that can maximize the agent’s expected utility. Finally, the possible proposal and the optimal proposal are compared. The better one is chosen as the counter offer and sent to the opponent. Four sets of experiments are executed to show that the integrated method is better than the Bayesian-based algorithm as well as the genetic algorithm. Different from the work above, agents who use the BLGAN algorithm do not need to maintain a knowledge base of opponents’ tactics. The BLGAN algorithm is designed under the assumption that the opponent’s reservation price and offer function follow the uniform distribution and the normal distribution respectively. However, in some cases of real bargaining, these two kinds of distributions are not common knowledge. Moreover, many agents’ reservation prices and offer functions do not comply with these distribution functions. Most importantly, the BLGAN algorithm is designed under the hypothesis that opponents make consecutive concessions. Hence, the formulas in [20] are designed without considering that the opponent may repeat the same offer during the bargaining process. This means that some of these formulas become invalid when the opponent sits and waits in some rounds of bargaining, because the denominators of these formulas are zero in these cases.

Since 2010, an international automated negotiating agent competition (ANAC) [21] has been held annually to provide a platform to compare and benchmark different state-of-the-art heuristic strategies developed for automated bilateral negotiation. In these competitions, many multi-attribute one-shot negotiation strategies are designed in multiple domains. Most of them focus on learning the opponent’s preference profile. However, the bargaining environment setting of these competitions is inconsistent with the environment constraints of this paper. First, the deadline in these competitions is common knowledge (3 real-time minutes) for bargainers, while the deadlines of bargainers in this paper are private knowledge. Second, in this paper, the bargained article or service does not devalue as time goes by, while many domains in ANAC are set with a discount factor. Third, in this paper we focus on one-issue (i.e., price) bargaining, while most of the agents in ANAC are designed for multi-issue domains.

3 One-shot bilateral bargaining

Bargaining (also called haggling) is a type of negotiation in which the buyer and seller of a good or service dispute the price which will be paid. According to the bargainers’ relationship, bargaining can be divided into two categories, i.e., repeated bargaining and one-shot bargaining. This paper focuses only on one-shot, one-issue (i.e., price) bargaining, in which each bargainer has an initial offer (abbr. IP), a reservation offer (abbr. RP) and a deadline. The initial offer of a seller is determined by the basic cost and the additional cost (logistics cost, labour cost, etc.) of the article or service. The initial offer of a buyer is determined by its estimation of the cost and the market price of the article or service. The reservation offer of a seller (buyer) is its lowest (highest) acceptable price. The range between a bargainer’s initial offer and reservation offer is called the acceptance range. The deadline is the maximal number of bargaining rounds that the bargainer accepts. Besides the initial offer, reservation offer and deadline, to maximize its individual gains, a bargainer also has an offer function that describes its offering characteristics or concession method, which is also called its bargaining strategy. Bargainers try their best to keep their reservation offer, deadline and bargaining strategy secret, because the more information the opponent knows, the greater the probability of being exploited by the opponent. Therefore, the reservation offer, deadline and bargaining strategy of bargainers are private knowledge. Based on these concepts, in this section, we first describe the bargaining environment, and then give a formal definition of bargaining. Subsequently, we illustrate the bargaining protocol.

The bargaining environment is characterized as follows:

• Only one seller and one buyer are involved in the bargaining process.

• The bargaining is one-shot and involves one issue (i.e., price).

• The value of the item or service does not decay as time goes by.

• The reservation price, deadline, and offer function of bargainers are private knowledge. Therefore, each bargainer can only observe the opponent’s past offers.

• One agent’s reservation offer is assumed to be inside the other’s acceptance range, and vice versa. This assumption is reasonable because research results show that if the opponent’s reservation offer is outside an agent’s acceptance range, this agent has no motivation to concede [1].

• The aim of the buyer (seller) is to buy (sell) the item or service at a price as low (high) as possible.

• In any round of bargaining, the buyer (seller) is not allowed to offer a price lower (higher) than the one it offered in the previous round.

• Any offer given by agents cannot be canceled or withdrawn.

Based on these environmental constraints, a bargaining is formally defined as follows.

Definition 1 A bargaining is a 10-tuple $B = (Ag, Ac_j, \mathit{Protocol}, RP_s, RP_b, P_s(t), P_b(t), S, \tau_j, D)$, where

• $Ag = \{s, b\}$ is a set of agents, where $s$ is the seller and $b$ is the buyer.

• $Ac_j = \{\mathit{accept}, \mathit{offer}, \mathit{quit}\}$ is a set of actions that can be chosen by agent $j$ in each round of the bargaining, where $j \in \{s, b\}$.

• $\mathit{Protocol}$ specifies the rules that govern the bargaining process. The alternating offer protocol is adopted in this paper, which will be illustrated later in this section.

• $RP_s$ is the reservation price of seller $s$.

• $RP_b$ is the reservation price of buyer $b$.

• $P_s(t)$ is the offer from seller $s$ to buyer $b$ in the $t$th ($t \ge 0$) round, which is delimited by the seller’s acceptance range. The initial price of $s$ is denoted as $IP_s$ ($IP_s = P_s(0)$).

• $P_b(t)$ is the offer from buyer $b$ to seller $s$ in the $t$th ($t \ge 0$) round, which is delimited by the buyer’s acceptance range. The initial offer of $b$ is denoted as $IP_b$ ($IP_b = P_b(0)$).

• $S$ is a set of bargaining strategies that can be used by agents. Each strategy is an offering function of the bargainer.

• $\tau_j$ represents the maximal number of rounds that agent $j$ is willing to bargain, where $j \in \{s, b\}$. $\tau_s$ and $\tau_b$ are generally independent of each other.

• $D$ is the deadline of the bargaining process, which can be regarded as the minimum of $\tau_s$ and $\tau_b$, i.e., $D = \min\{\tau_s, \tau_b\}$.

The bargaining protocol specifies the rules that govern the bargaining process, such as who can participate in the bargaining, the states of the process, and the events that change the state of the bargaining process. In the bargaining of this paper, either the seller or the buyer is selected randomly to make the first offer. The bargainers then alternate in making offers until an agreement is reached or one bargainer quits because its deadline has been reached. These rules are similar to the alternating offer protocol [10], which is usually used to enable a seller and a buyer to alternately make offers until an agreement is achieved or one bargainer quits. However, the execution conditions of the quit, accept and offer actions are different from those of the traditional alternating offer protocol. By modifying the execution conditions of these actions in the traditional alternating offer protocol, formula (1) defines the alternating offer protocol used in this paper:

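$$
Ac_x(t) =
\begin{cases}
\mathit{quit} & t \ge \tau_x \text{ and } P_{-x}(t-1) \notin [IP_x, RP_x] \\[4pt]
\mathit{accept}(P_{-x}(t-1)) & \big(P_{-x}(t-1) \ge P_x(t) \text{ and } x = s\big) \text{ or } \big(P_{-x}(t-1) \le P_x(t) \text{ and } x = b\big) \\
 & \text{or } \big(t = \tau_x \text{ and } P_{-x}(t-1) \in [IP_x, RP_x]\big) \\[4pt]
\mathit{offer}(P_x(t)) & \text{otherwise}
\end{cases}
\tag{1}
$$

where $x \in \{s, b\}$ is the seller or the buyer, $-x$ is the opponent of $x$, and $t \ge 1$.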
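As a minimal sketch, formula (1) can be read as the following Python function. The function and parameter names are illustrative and do not come from the original formulation. Two assumptions are made: the cases of formula (1) are evaluated from top to bottom, and the interval $[IP_x, RP_x]$ is treated as order-agnostic, since a seller's initial price is higher than its reservation price.

```python
from typing import Optional, Tuple


def in_acceptance_range(price: float, ip: float, rp: float) -> bool:
    # Assumption: [IP_x, RP_x] is read as [min(IP_x, RP_x), max(IP_x, RP_x)].
    return min(ip, rp) <= price <= max(ip, rp)


def choose_action(role: str, t: int, tau: int, ip: float, rp: float,
                  opp_last: float, own_next: float) -> Tuple[str, Optional[float]]:
    """Return (action, price) for agent `role` ('s' or 'b') in round t >= 1.

    opp_last is the opponent's offer P_{-x}(t-1); own_next is the offer P_x(t)
    that this agent's strategy would send next.
    """
    acceptable = in_acceptance_range(opp_last, ip, rp)

    # quit: own deadline reached and the opponent's last offer is unacceptable
    if t >= tau and not acceptable:
        return ("quit", None)

    # accept: the opponent's last offer is at least as good as our planned offer
    if role == "s":
        crosses = opp_last >= own_next
    else:
        crosses = opp_last <= own_next

    # accept as well when the deadline is hit with an acceptable offer on the table
    if crosses or (t == tau and acceptable):
        return ("accept", opp_last)

    # otherwise keep bargaining with the planned offer
    return ("offer", own_next)
```

For instance, a seller with $IP_s = 100$, $RP_s = 60$ and $\tau_s = 10$ that plans to offer 90 in round 3 keeps offering when the buyer's last offer is 70, but accepts immediately when it is 95.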

According to formula (1), a bargainer will not immediately accept the opponent’s offer even though it is within his/her acceptance range. Instead, he/she will delay the acceptance until his/her deadline is reached or until the seller’s (buyer’s) offer is not higher (not lower) than the buyer’s (seller’s) next offer. This approach is consistent with psychological research showing that humans delay their acceptances as late as possible [22]. Moreover, when the value of the article or service does not decay with time, this kind of concession has the following merits [23]: (a) the bargainer can keep the reservation offer as a backup, (b) the bargainer is more concerned for itself than for its opponent, and (c) the bargainer can avoid the winner’s curse.

As the seller and the buyer are selected randomly to make the first offer, without loss of generality, we assume that the seller offers first in this paper. After the seller offers $IP_s$ (i.e., $P_s(0)$), the buyer agent makes a counter-offer $IP_b$ (i.e., $P_b(0)$). Then it is the seller’s turn to choose the quit, accept or offer action according to formula (1). If the seller’s next offer is smaller than or equal to the buyer’s offer, the seller will accept the buyer’s offer. In that case, the bargaining ends with an agreement on the transaction price $P_b(0)$. Otherwise, if this condition does not hold until the seller’s deadline, the seller will quit the bargaining process. If the buyer’s offer is smaller than the one that the seller will offer next and the seller’s deadline has not been reached, the seller agent will make a counter-offer $P_s(1)$. Then it is the buyer’s turn to select the action of quitting, accepting or offering according to formula (1). Such a process continues until the deadline of this bargaining is reached, the seller’s next offer is smaller than or equal to the buyer’s offer, or the buyer’s next offer is larger than or equal to the seller’s offer before the deadline is reached.

4 A review of BLGAN bargaining strategy

Sim et al. [20] proposed the BLGAN algorithm, which integrates the Bayesian learning algorithm and the genetic algorithm to learn the opponent’s reservation price and deadline. This algorithm can be divided into four steps. First, the Bayesian learning algorithm is used to estimate the opponent’s reservation price. Based on the estimated reservation price, the possible deadline of the opponent is calculated. Second, based on the estimated reservation price and deadline, the agent adjusts its offer parameter (also called concession factor) and generates a possible next proposal. Third, the genetic algorithm is used to find an optimal proposal that can maximize the agent’s expected utility. Finally, the possible next proposal generated in the second step and the optimal proposal generated in the third step are compared. The better one is chosen as the counter offer and sent to the opponent. As the first three steps are the core of the BLGAN algorithm, the following subsections only describe the formulas of the first three steps. Section 4.1 restates the main idea and the calculation formulas of the first step. Section 4.2 illustrates the formulas of the second and the third steps.

4.1 Estimation of opponent’s reservation price and deadline

Recall from the bargaining environment specified in Sect. 3 that one agent’s reservation offer is inside the other agent’s acceptance range and vice versa. Consequently, this agent can assume that the opponent’s reservation price lies inside its own acceptance range, which can be denoted as $[\min(IP_x, RP_x), \max(IP_x, RP_x)]$, where $x$ ($x \in \{s, b\}$) represents this agent.

In [20], the Bayesian learning method is used for learning the opponent’s reservation price. Four concepts are adopted in the calculation process, i.e., the possible reservation prices of the opponent, and the prior probability, conditional probability, and posterior probability of each possible reservation price. The $i$th possible value of the opponent’s reservation price is denoted as $RP^i_{-x} \in [\min(IP_x, RP_x), \max(IP_x, RP_x)]$ ($x$ represents this agent and $-x$ represents the opponent agent). It is supposed that there are $H = \max(IP_x, RP_x) - \min(IP_x, RP_x)$ possible reservation prices of the opponent. As the possible reservation prices range from $\min(IP_x, RP_x)$ to $\max(IP_x, RP_x)$, we can rationally assume that they follow the uniform distribution. That is to say, $RP^1_{-x} = \min(IP_x, RP_x)$, $RP^2_{-x} = \min(IP_x, RP_x) + 1$, $RP^3_{-x} = \min(IP_x, RP_x) + 2$, ..., $RP^H_{-x} = \max(IP_x, RP_x)$. $P_t(RP^i_{-x})$ denotes the prior probability of the $i$th possible value of the opponent’s reservation price, $P(P_{-x}(t) \mid RP^i_{-x})$ represents the conditional probability of the opponent’s $t$th offer ($P_{-x}(t)$) given the $i$th possible value of the opponent’s reservation price, and $P(RP^i_{-x} \mid P_{-x}(t))$ represents the posterior probability of the $i$th possible value of the opponent’s reservation price, given that the opponent offers $P_{-x}(t)$ in the $t$th round. Formulas (2)–(4) define these concepts [20]:

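$$
P_t(RP^i_{-x}) =
\begin{cases}
1/H & \text{if } t = 0 \\
P(RP^i_{-x} \mid P_{-x}(t)) & \text{otherwise}
\end{cases}
\tag{2}
$$

$$
P(P_{-x}(t) \mid RP^i_{-x}) =
\frac{\dfrac{2}{\sqrt{2\pi}}\, e^{-\frac{(P_{-x}(t) - \mu_i)^2}{2}}}
     {\displaystyle\int_{\min(IP_x, RP_x)}^{\max(IP_x, RP_x)} \dfrac{2}{\sqrt{2\pi}}\, e^{-\frac{(P_{-x}(t) - \mu_i)^2}{2}}}
\tag{3}
$$

$$
P(RP^i_{-x} \mid P_{-x}(t)) =
\frac{P_{t-1}(RP^i_{-x}) \times P(P_{-x}(t) \mid RP^i_{-x})}
     {\sum_{i=1}^{H} P_{t-1}(RP^i_{-x}) \times P(P_{-x}(t) \mid RP^i_{-x})}
\tag{4}
$$

where $x \in \{s, b\}$, $-x$ is $x$’s opponent, $H = \max(IP_x, RP_x) - \min(IP_x, RP_x)$,

$$
\mu_i = RP^i_{-x} \times \left[1 + (-1)^{\beta} \times \alpha(t)\right],
\qquad
\beta =
\begin{cases}
1 & \text{if } x = s, \\
0 & \text{if } x = b,
\end{cases}
$$

$$
\alpha(t) =
\begin{cases}
\left|1 - IP_{-x} / IP_x\right| & \text{if } t = 0, \\[4pt]
\left|1 - P_{-x}(t) \times [1 - \alpha(t-1)] / P_{-x}(t-1)\right| & \text{if } t > 0 \text{ and } x = s, \\[4pt]
\left|1 - P_{-x}(t) \times [1 + \alpha(t-1)] / P_{-x}(t-1)\right| & \text{if } t > 0 \text{ and } x = b.
\end{cases}
$$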

According to formula (2), the probability of each possible reservation price of the opponent (i.e., $P_t(RP^i_{-x})$) is set to $1/H$ before bargaining begins (i.e., $t = 0$). Once the bargaining process begins, it is set to the posterior probability of the corresponding possible reservation price given that the opponent offers $P_{-x}(t)$ in the $t$th round.

The posterior probability of each possible reservation price of the opponent given that the opponent offers $P_{-x}(t)$ in the $t$th round (i.e., $P(RP^i_{-x} \mid P_{-x}(t))$) is calculated using the Bayesian updating rule defined in formula (4). The conditional probability of the opponent’s $t$th offer ($P_{-x}(t)$) given the $i$th possible value of the opponent’s reservation price in formula (3) (i.e., $P(P_{-x}(t) \mid RP^i_{-x})$) follows the normal distribution $N(\mu_i, 1)$.
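The update in formulas (2)–(4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of [20]: all function names are hypothetical, the candidate grid uses unit steps and includes both endpoints of the acceptance range, and the normalising integral of formula (3) is omitted, so only the Gaussian kernel of $N(\mu_i, 1)$ is used as the likelihood. The weights are computed in log space to avoid numerical underflow when an offer is far from $\mu_i$.

```python
import math
from typing import List


def candidate_prices(ip: float, rp: float) -> List[float]:
    # Assumption: unit-step grid from min(IP_x, RP_x) to max(IP_x, RP_x), inclusive.
    lo, hi = min(ip, rp), max(ip, rp)
    return [lo + k for k in range(int(round(hi - lo)) + 1)]


def initial_alpha(ip_x: float, ip_opp: float) -> float:
    # alpha(0) = |1 - IP_{-x} / IP_x|
    return abs(1.0 - ip_opp / ip_x)


def update_alpha(role: str, alpha_prev: float, opp_now: float, opp_prev: float) -> float:
    # alpha(t) for t > 0; the sign inside the bracket is "-" for x = s and "+" for x = b.
    sign = -1.0 if role == "s" else 1.0
    return abs(1.0 - opp_now * (1.0 + sign * alpha_prev) / opp_prev)


def bayes_update(prior: List[float], candidates: List[float],
                 opp_offer: float, role: str, alpha: float) -> List[float]:
    """Posterior over candidate reservation prices after observing opp_offer."""
    sign = -1.0 if role == "s" else 1.0  # (-1)^beta with beta = 1 for the seller
    log_w = []
    for p_i, rp_i in zip(prior, candidates):
        mu_i = rp_i * (1.0 + sign * alpha)
        log_prior = math.log(p_i) if p_i > 0.0 else float("-inf")
        # Gaussian kernel of N(mu_i, 1); constant factors cancel in formula (4).
        log_w.append(log_prior - 0.5 * (opp_offer - mu_i) ** 2)
    top = max(log_w)
    if top == float("-inf"):
        return list(prior)  # nothing to update against; keep the prior
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]
```

With a seller whose acceptance range is [60, 100], a uniform prior, $\alpha = 0.1$ and an observed buyer offer of 72, the posterior mass concentrates around the candidate 80, because $\mu_i = 80 \times 0.9 = 72$.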

Based on the concepts defined in formulas (2)–(4), formula (5) [20] defines the estimated value of the opponent’s reservation price. According to formula (5), the estimated reservation price of the opponent is the mathematical expectation of all possible reservation prices. Before bargaining begins (i.e., in the round $t = 0$), the expectation is the sum of the products of each possible reservation price of the opponent and its prior probability (which is calculated according to formula (2)). After the bargaining begins (i.e., $t > 0$), the expectation is the sum of the products of each possible reservation price of the opponent and its posterior probability given that the opponent offers $P_{-x}(t)$ in the $t$th round (i.e., $P(RP^i_{-x} \mid P_{-x}(t))$):

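$$
ERP_{-x}(t) =
\begin{cases}
\sum_{i=1}^{H} RP^i_{-x} \times P_t(RP^i_{-x}) & t = 0 \\[4pt]
\sum_{i=1}^{H} RP^i_{-x} \times P(RP^i_{-x} \mid P_{-x}(t)) & t > 0
\end{cases}
\tag{5}
$$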

where $H = \max(IP_x, RP_x) - \min(IP_x, RP_x)$, $RP^i_{-x} \in [\min(IP_x, RP_x), \max(IP_x, RP_x)]$ ($x$ represents this agent and $-x$ represents the opponent agent) is the $i$th possible value of the opponent’s reservation price, $P_t(RP^i_{-x})$ denotes the prior probability of the $i$th possible value of the opponent’s reservation price, and $P(RP^i_{-x} \mid P_{-x}(t))$ represents the posterior probability of the $i$th possible value of the opponent’s reservation price when the opponent offers $P_{-x}(t)$ in the $t$th round.
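Formula (5) is a plain probability-weighted average, which the following short Python sketch makes explicit. The function name is illustrative, and the unit-step grid that includes both endpoints of the acceptance range is an assumption.

```python
from typing import List


def expected_reservation_price(candidates: List[float], probs: List[float]) -> float:
    # ERP_{-x}(t): sum over i of RP^i_{-x} times its current probability
    return sum(rp_i * p_i for rp_i, p_i in zip(candidates, probs))


# Round t = 0: a uniform prior over an acceptance range of [60, 100]
# (hypothetical values, unit-step grid including both endpoints).
grid = [60.0 + k for k in range(41)]
uniform = [1.0 / len(grid)] * len(grid)
erp0 = expected_reservation_price(grid, uniform)
```

Under these assumptions the estimate at $t = 0$ is simply the midpoint of the agent's own acceptance range; it only moves away from the midpoint once posterior probabilities replace the uniform prior.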

Based on the estimated reservation price of the opponent, the opponent’s deadline can be estimated according to formula (6). The estimated deadline is determined by five parameters, i.e., the opponent’s last two offers $P_{-x}(t)$ and $P_{-x}(t-1)$, the current round of bargaining $t$, the estimated reservation price of the opponent $ERP_{-x}(t)$ and the estimated concession factor of the opponent $\lambda^t_{-x}$. The estimated concession factor of the opponent (see formula (7)) is a logarithmic function of five parameters, i.e., the opponent’s offers in the past three rounds, the current round of bargaining $t$, the previous round of bargaining $t-1$ and the estimated reservation price of the opponent:

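$$
\hat{\tau}^t_{-x} = \frac{t}{\left[\dfrac{P_{-x}(t) - P_{-x}(t-1)}{ERP_{-x}(t) - P_{-x}(t-1)}\right]^{1/\lambda^t_{-x}}}
\tag{6}
$$

$$
\lambda^t_{-x} = \log_{\left(\frac{t-1}{t}\right)}
\left(
\frac{P_{-x}(t-1) - P_{-x}(t-2)}{P_{-x}(t) - P_{-x}(t-1)}
\times
\frac{\left|ERP_{-x}(t) - P_{-x}(t-1)\right|}{\left|ERP_{-x}(t) - P_{-x}(t-2)\right|}
\right)
\tag{7}
$$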

where $P_{-x}(t)$ is the offer from the opponent in the $t$th round, $P_{-x}(t-1)$ and $P_{-x}(t-2)$ are the opponent’s offers in rounds $t-1$ and $t-2$ respectively, and $ERP_{-x}(t)$ is the estimated reservation price of the opponent that is made in the $t$th round.
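A hedged Python sketch of formulas (6) and (7) is given below. It reads formula (6) as $t$ divided by the bracketed ratio raised to the power $1/\lambda^t_{-x}$, which is an interpretation of the layout rather than something confirmed here. The function names and the convention of returning `None` are illustrative. The guards make explicit where the formulas are undefined, in particular when the opponent repeats its previous offer, i.e., $P_{-x}(t) = P_{-x}(t-1)$.

```python
import math
from typing import Optional


def estimate_concession_factor(t: int, p_t: float, p_t1: float, p_t2: float,
                               erp: float) -> Optional[float]:
    """Formula (7): log base (t-1)/t of the product of the two ratios."""
    if t < 2:
        return None  # the base (t-1)/t must lie strictly between 0 and 1
    step_now = p_t - p_t1
    far_gap = abs(erp - p_t2)
    if step_now == 0 or far_gap == 0:
        return None  # zero denominator, e.g. the opponent sat and waited
    arg = (p_t1 - p_t2) / step_now * abs(erp - p_t1) / far_gap
    if arg <= 0:
        return None  # logarithm undefined
    return math.log(arg) / math.log((t - 1) / t)


def estimate_deadline(t: int, p_t: float, p_t1: float, erp: float,
                      lam: float) -> Optional[float]:
    """Formula (6), read as t / ratio ** (1 / lam) (assumed interpretation)."""
    gap = erp - p_t1
    if gap == 0 or lam == 0:
        return None
    ratio = (p_t - p_t1) / gap
    if ratio <= 0:
        return None  # covers the sit-and-wait case p_t == p_t1
    return t / ratio ** (1.0 / lam)
```

With hypothetical buyer offers 50, 54, 60 in rounds 1 to 3 and an estimated reservation price of 90, the sketch yields a concession factor of about 1.26 and an estimated deadline later than round 3, whereas repeating the offer 54 in round 3 makes both estimates undefined.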


4.2 Generate the possible offers

After estimating the opponent’s reservation price and deadline, two possible offers are generated and compared in the BLGAN algorithm. One is generated by adjusting this agent’s concession factor according to the estimated reservation price and deadline of the opponent; the other is generated using the genetic algorithm. Formula (8) [20] shows the first offer generation function, which is defined with the parameters of this agent’s most recent offer in round $t-1$ (i.e., $P_x(t-1)$), this agent’s reservation price $RP_x$, this agent’s deadline $\tau_x$, the current round of bargaining $t$ and this agent’s own concession factor $\lambda^t_x$. Agent $x$’s concession factor $\lambda^t_x$ in the $t$th round is calculated according to formula (9) [20].

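$$
P_x(t) =
\begin{cases}
P_x(t-1) + \left|RP_x - P_x(t-1)\right| \times \left(\dfrac{1}{\tau_x - t + 1}\right)^{\lambda^t_x} & \text{if } x = b \\[8pt]
P_x(t-1) - \left|RP_x - P_x(t-1)\right| \times \left(\dfrac{1}{\tau_x - t + 1}\right)^{\lambda^t_x} & \text{if } x = s
\end{cases}
\tag{8}
$$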

where $\lambda^t_x$ is the concession factor of agent $x$ in the $t$th round, $P_x(t-1)$ is the offer of agent $x$ in round $t-1$, $RP_x$ is the reservation price of agent $x$, and $\tau_x$ is the deadline of agent $x$.

$$\lambda^t_x = \left| \log_{\frac{\hat{\tau}^t_{-x} - t + 1}{\tau_x - t + 1}} \left( \max\left\{0, \frac{ERP_{-x}(t) - P_x(t-1)}{RP_x - P_x(t-1)}\right\} \right) \right| \tag{9}$$

where $P_x(t-1)$ is the offer of agent $x$ in round $t-1$, $ERP_{-x}(t)$ is the estimated reservation price of the opponent in the $t$th round, $RP_x$ is the reservation price of agent $x$, and $\hat{\tau}^t_{-x}$ is the estimated deadline of the opponent in the $t$th round.

From formula (8), we can see that a bargainer's concession is determined by its own reservation price, its last offer, its deadline, the current bargaining round, and its concession factor. The concession factor of this agent (formula (9)) is a logarithmic function of the estimated deadline of the opponent $\hat{\tau}^t_{-x}$, the current bargaining round $t$, this agent's most recent offer $P_x(t-1)$, its reservation price $RP_x$, and the estimated reservation price of the opponent $ERP_{-x}(t)$. If $x = s$ and $P_s(t-1) \le ERP_{-s}(t)$, $\lambda^t_s$ is infinite because $RP_s < P_s(t-1)$ (where $t < \tau_s$) and hence $\max\left\{0, \frac{P_s(t-1) - ERP_{-s}(t)}{P_s(t-1) - RP_s}\right\} = 0$; this seller agent should therefore choose the Sit-and-Wait strategy. In contrast, if $x = s$ and $RP_s \ge ERP_{-s}(t)$, the best course of action for the seller agent is to terminate the negotiation immediately to avoid wasting computational resources on haggling in a fruitless negotiation [20]. The buyer agent reaches similar conclusions under the symmetric conditions on $P_b(t-1)$, $RP_b$ and $ERP_{-b}(t)$.
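As a rough illustration of formulas (8) and (9), the sketch below computes a concession factor and the resulting offer. The function names and the numbers in the example are hypothetical; the role labels "b" and "s" follow formula (8). Formula (9) is implemented as written, so a zero argument of the logarithm (the Sit-and-Wait case discussed above) raises an error rather than returning infinity.

```python
import math


def concession_factor(est_deadline, own_deadline, erp, last_offer, rp, t):
    """Formula (9): absolute value of a logarithm whose base compares the two remaining horizons."""
    base = (est_deadline - t + 1) / (own_deadline - t + 1)
    arg = max(0.0, (erp - last_offer) / (rp - last_offer))
    return abs(math.log(arg, base))  # raises ValueError when arg == 0


def next_offer(last_offer, rp, own_deadline, lam, t, role):
    """Formula (8): the buyer ("b") raises its offer, the seller ("s") lowers it."""
    step = abs(rp - last_offer) * (1.0 / (own_deadline - t + 1)) ** lam
    return last_offer + step if role == "b" else last_offer - step


# Illustrative numbers only: a buyer in round 10 whose last offer was 20.
lam = concession_factor(est_deadline=40, own_deadline=50, erp=30.0,
                        last_offer=20.0, rp=50.0, t=10)
print(lam, next_offer(20.0, 50.0, 50, lam, 10, "b"))
```

At $t = \tau_x$ the factor $1/(\tau_x - t + 1)$ equals 1, so the agent concedes its whole remaining gap regardless of $\lambda^t_x$.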

To optimize the agent's offer in the $t$th round, Sim et al. generate another possible offer using a genetic algorithm (GA), which is described in Fig. 1 [20]. The main idea of this genetic algorithm is to search for the optimal individual (i.e., the one whose fitness is maximal) in populations that are initialized in the search space or generated from it by crossover and mutation. The first step is to define the search space for each round of bargaining according to formula (10). Then the generation counter, the population size, and the population are initialized. The fitness function is defined in formula (11). After these preparations, the search is executed by repeating the crossover, mutation, and selection operations until the generation counter reaches the maximum number of generations.

After these two possible offers are calculated by formula (8) and by the genetic algorithm respectively, the agent selects the one with the larger fitness and offers it to its opponent in the $t$th round.
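The search loop of Fig. 1 can be sketched as follows. This is an illustrative reading under several assumptions, not the implementation of [20]: the linear utility, the arithmetic crossover, the uniform re-sampling mutation, the tournament size of 2, and the default of 50 generations are all choices made here for the example, and the search-space bounds `lo` and `hi` are passed in rather than derived from formula (10).

```python
import random


def fitness(offer, opp_offer, t, deadline, ip, rp, alpha=0.9):
    """Formula (11): time-weighted mix of own utility and closeness to the opponent's offer."""
    span = max(ip, rp) - min(ip, rp)
    utility = abs(rp - offer) / span              # assumed linear utility U
    weight = alpha * (1.0 - (t / deadline) ** 2)
    closeness = 1.0 - abs(offer - opp_offer) / span
    return weight * utility + (1.0 - weight) * closeness


def ga_offer(lo, hi, opp_offer, t, deadline, ip, rp, n=200, generations=50,
             crossover_rate=0.8, mutation_rate=0.1, rng=None):
    """Return the fittest offer found in [lo, hi] after the given number of generations."""
    rng = rng or random.Random(0)

    def score(p):
        return fitness(p, opp_offer, t, deadline, ip, rp)

    pop = [rng.uniform(lo, hi) for _ in range(n)]
    for _ in range(generations):
        temp = list(pop)                          # temporary population TPo(g)
        for i in range(0, n - 1, 2):
            if rng.random() < crossover_rate:     # arithmetic crossover (assumption)
                w = rng.random()
                a, b = temp[i], temp[i + 1]
                temp[i], temp[i + 1] = w * a + (1 - w) * b, w * b + (1 - w) * a
        # Mutation: replace an individual by a fresh uniform sample (assumption).
        temp = [rng.uniform(lo, hi) if rng.random() < mutation_rate else p for p in temp]
        pool = pop + temp
        # Tournament selection of size 2 over the old and the temporary population.
        pop = [max(rng.sample(pool, 2), key=score) for _ in range(n)]
    return max(pop, key=score)


best = ga_offer(lo=20.0, hi=30.0, opp_offer=35.0, t=10, deadline=50, ip=10.0, rp=50.0)
print(best, fitness(best, 35.0, 10, 50, 10.0, 50.0))
```

The fixed random seed only makes the sketch reproducible; any source of randomness could be passed through `rng`.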

5 Improvement of the BLGAN bargaining strategy

For reasons such as absolute rationality, fear of business loss, reciprocity heuristics [24], sense of fairness [25], self-satisfaction [26], bounded rationality, the belief that parties are likely to benefit from working together, faith in one's own problem-solving ability [27], or even the wish to gather more information about their opponents [28], software agents may concede in different manners. For instance, they may insist on their initial offer until the deadline, make consecutive concessions in each round of bargaining, or make concessions interleaved with periods of sitting and waiting. Software agents should be able to deal with all these kinds of opponents.

However, the BLGAN algorithm proposed by Sim et al. [20] is designed under the assumption that bargainers make consecutive concessions, which covers only part of the possible offer patterns. When bargainers insist on their initial offer until the deadline, or make concessions interleaved with sit-and-wait periods, some formulas in the BLGAN algorithm become invalid. For example, formula (6) is invalid when $P_{-x}(t-1) = P_{-x}(t)$ or $ERP_{-x}(t) = P_{-x}(t-1)$. If $P_{-x}(t-1) = P_{-x}(t)$, the denominator of the right-hand side of formula (6) is zero; if $ERP_{-x}(t) = P_{-x}(t-1)$, the denominator of $\frac{P_{-x}(t) - P_{-x}(t-1)}{ERP_{-x}(t) - P_{-x}(t-1)}$ is zero. Similarly, the definition of $\lambda^t_{-x}$ is invalid because a denominator in formula (7) is zero if $P_{-x}(t-1) = P_{-x}(t)$ or $ERP_{-x}(t) = P_{-x}(t-2)$. Besides, as the argument of the log function cannot be zero, formula (7) is also invalid if $P_{-x}(t-1) = P_{-x}(t-2)$ or $ERP_{-x}(t) = P_{-x}(t-1)$.

1. Define the search space of the $t$th round according to formula (10):

$$SP(t) = \left[\max\left(P_x(t-1),\, P_x(t-1) - \delta\right),\ \min\left(\min\left(P_{-x}(t-1),\, P_x(t)\right),\, RP_x\right)\right] \tag{10}$$

where $P_x(t-1)$ and $P_x(t)$ are the offers of agent $x$ in rounds $t-1$ and $t$ respectively, $P_{-x}(t-1)$ is the offer of agent $-x$ in round $t-1$, and $RP_x$ is the reservation price of agent $x$.

2. Set the generation counter $g = 0$.
3. Set the population size $N = 200$.
4. Initialize the population $Po(g)$.
5. Calculate the fitness of each individual in $Po(g)$ according to formula (11) with $\alpha = 0.9$:

$$\alpha \times \left(1 - \left(\frac{t}{\tau_x}\right)^2\right) \times U(Po(g)) + \left(1 - \alpha \times \left(1 - \left(\frac{t}{\tau_x}\right)^2\right)\right) \times \left(1 - \frac{|Po(g) - P_{-x}(t)|}{\max(IP_x, RP_x) - \min(IP_x, RP_x)}\right) \tag{11}$$

where $\tau_x$, $IP_x$ and $RP_x$ are the deadline, the initial price and the reservation price of agent $x$ respectively, $P_{-x}(t)$ is the offer of agent $-x$ in the $t$th round, and $U(Po(g))$ is the utility of $Po(g)$.

6. While $g$ is smaller than the maximum number of generations:
   a. Copy all the individuals to a temporary population $TPo(g)$.
   b. Perform crossover on $TPo(g)$ with a rate of 0.8.
   c. Perform mutation on $TPo(g)$ with a rate of 0.1.
   d. Calculate the fitness of each individual in $TPo(g)$ according to formula (11) with $\alpha = 0.9$.
   e. Use tournament selection to select $N$ individuals from $TPo(g)$ and $Po(g)$ to form a new population $Po(g)$.
   f. Increment $g$ by 1.
7. Return the best individual in the last generation.

Fig. 1 Genetic algorithm for generating the next offer [20]

To make the BLGAN algorithm applicable to all the possible offer patterns of the opponent, we improve this algorithm in this section. Using the improved BLGAN bargaining strategy, an agent executes the same steps in each round of bargaining: (1) it uses the Bayesian learning (BL) method to estimate the opponent's reservation price, and then estimates the opponent's deadline based on the estimated reservation price; (2) it generates a possible offer by adjusting its concession factor based on the estimated reservation price and deadline of the opponent; (3) it generates another possible offer using the genetic algorithm (GA); (4) it selects the better of these two offers as its offer in this round. The improvement mainly concerns the method of estimating the opponent's deadline, which is defined in formulas (12) and (13).

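$$\hat{\tau}^t_{-x} = \begin{cases} 2 \cdot \tau_x & \text{if } P_{-x}(t-1) = P_{-x}(t) \\[2ex] \dfrac{t}{\left[\dfrac{P_{-x}(t) - P_{-x}(t-1)}{ERP_{-x}(t) - P_{-x}(t-1)}\right]^{1/\lambda^t_{-x}}} & \text{otherwise} \end{cases} \tag{12}$$

$$\lambda^t_{-x} = \begin{cases} 20 & \text{if } ERP_{-x}(t) = P_{-x}(t-1) \text{ or } P_{-x}(t-1) = P_{-x}(t-2) \\[2ex] \log_{\frac{t-1}{t}}\left(\dfrac{P_{-x}(t-1) - P_{-x}(t-2)}{P_{-x}(t) - P_{-x}(t-1)} \times \dfrac{|ERP_{-x}(t) - P_{-x}(t-1)|}{|ERP_{-x}(t) - P_{-x}(t-2)|}\right) & \text{otherwise} \end{cases} \tag{13}$$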

where $P_{-x}(t)$ is the opponent's offer in the $t$th round, $P_{-x}(t-1)$ and $P_{-x}(t-2)$ are the opponent's offers in rounds $t-1$ and $t-2$ respectively, and $ERP_{-x}(t)$ is the estimate of the opponent's reservation price made in the $t$th round.

Formula (12) specifies that the estimated deadline of the opponent in the $t$th round is assumed to be twice this agent's own deadline if the opponent makes the same offer in the last two rounds. Otherwise, the estimated deadline of the opponent is calculated as in formula (6). Formula (12) thus improves formula (6) by adding a definition for the case $P_{-x}(t-1) = P_{-x}(t)$, in which formula (6) is invalid because the denominator equals zero. Similarly, formula (13) improves formula (7) by adding a definition for the cases in which the argument of the log function equals zero, i.e., if the opponent's estimated reservation price in the $t$th round equals the opponent's offer in round $t-1$ (i.e., $ERP_{-x}(t) = P_{-x}(t-1)$) or if the opponent's offer in round $t-1$ equals its offer in round $t-2$ (i.e., $P_{-x}(t-1) = P_{-x}(t-2)$). In these cases, the value of the log function is infinite. Since infinity cannot be represented in the computer, formula (13) uses a large number (i.e., 20) to represent the possibly infinite value of $\lambda^t_{-x}$. In short, the improvements in formulas (12) and (13) make up for the deficiencies (i.e., the invalid cases) of formulas (6) and (7).
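The guarded estimators of formulas (12) and (13) might be sketched as below. The function names and the constant name `LARGE_LAMBDA` are hypothetical, and offers are compared with exact equality, which is an assumption that suits repeated (identical) offers. Cases that formulas (12) and (13) do not name, such as $ERP_{-x}(t) = P_{-x}(t-2)$, are left unhandled here.

```python
import math

LARGE_LAMBDA = 20.0  # the value formula (13) uses in place of infinity


def improved_concession_factor(p_t, p_t1, p_t2, erp, t):
    """Formula (13): fall back to a large constant when the logarithm's argument would be zero."""
    if erp == p_t1 or p_t1 == p_t2:
        return LARGE_LAMBDA
    ratio = ((p_t1 - p_t2) / (p_t - p_t1)) * (abs(erp - p_t1) / abs(erp - p_t2))
    return math.log(ratio, (t - 1) / t)


def improved_deadline(p_t, p_t1, p_t2, erp, t, own_deadline):
    """Formula (12): twice this agent's own deadline if the opponent repeated its last offer."""
    if p_t1 == p_t:
        return 2 * own_deadline
    lam = improved_concession_factor(p_t, p_t1, p_t2, erp, t)
    share = (p_t - p_t1) / (erp - p_t1)
    return t / share ** (1.0 / lam)


# An opponent that sits and waits: the same offer in rounds t-1 and t.
print(improved_deadline(p_t=40.0, p_t1=40.0, p_t2=42.0, erp=10.0, t=5, own_deadline=50))
```

Checking the repeated-offer case first means the division by $P_{-x}(t) - P_{-x}(t-1)$ in formula (13) is never reached for such an opponent.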

Besides the formulas for estimating the opponent's deadline, we also extend the formula for this agent's concession factor (i.e., formula (9)) into formula (14). Formula (9) is extended because a computer cannot represent infinity.

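$$\lambda^t_x = \begin{cases} 20 & \text{if } \dfrac{P_x(t-1) - ERP_{-x}(t)}{P_x(t-1) - RP_x} \le 0 \\[2ex] \left| \log_{\frac{\hat{\tau}^t_{-x} - t + 1}{\tau_x - t + 1}} \left( \max\left\{0, \dfrac{P_x(t-1) - ERP_{-x}(t)}{P_x(t-1) - RP_x}\right\} \right) \right| & \text{otherwise} \end{cases} \tag{14}$$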

According to formula (14), this agent's concession factor $\lambda^t_x$ is assigned a large number (i.e., 20) when the argument of the log function equals zero. With this large concession factor, this agent's concession in the $t$th round (i.e., $|RP_x - P_x(t-1)| \times \left(\frac{1}{\tau_x - t + 1}\right)^{\lambda^t_x}$) is nearly zero, which means that this agent chooses the Sit-and-Wait strategy.
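A short sketch of formula (14) shows how small the concession becomes once the concession factor is set to 20. The function names and example numbers are hypothetical, and the case in which the base of the logarithm equals 1 (equal remaining horizons) is not covered by formula (14) and is left unhandled.

```python
import math

LARGE_LAMBDA = 20.0  # the value formula (14) uses in place of infinity


def guarded_concession_factor(est_deadline, own_deadline, erp, last_offer, rp, t):
    """Formula (14): return the large constant when the logarithm's argument is not positive."""
    share = (last_offer - erp) / (last_offer - rp)
    if share <= 0:
        return LARGE_LAMBDA
    base = (est_deadline - t + 1) / (own_deadline - t + 1)
    return abs(math.log(share, base))


def concession(last_offer, rp, own_deadline, lam, t):
    """Size of the step in formula (8)."""
    return abs(rp - last_offer) * (1.0 / (own_deadline - t + 1)) ** lam


# A seller whose last offer (40) is already below the buyer's estimated reservation price (45).
lam = guarded_concession_factor(est_deadline=40, own_deadline=50, erp=45.0,
                                last_offer=40.0, rp=10.0, t=10)
print(lam, concession(40.0, 10.0, 50, lam, 10))
```

The concession stays negligible only while $t < \tau_x$; in the final round the factor $1/(\tau_x - t + 1)$ is 1 and the exponent no longer matters.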

6 Experiments

To verify the performance of the improved BLGAN strategy, we select the Sit-and-Wait strategy, the Lee et al. strategy, and the optimal settings of the strategy proposed by Brozostowski et al. for comparison. These strategies are selected because all of them predict the opponent's deadline or reservation price and make concessions according to the prediction results. To compare their performance, three sets of experiments are designed and implemented, in which the compared strategies are adopted by buyers. These buyers bargain with a set of sellers in each set of experiments. The sellers in the first and second sets of experiments offer according to fixed patterns. In the first set, the buyers bargain with sellers who make consecutive concessions during bargaining. In the second set, the buyers deal with sellers who sit and wait for some rounds while making concessions. Although these two kinds of fixed-pattern sellers cover large categories of human bargainers, some people offer without any pattern. Therefore, in the third set of experiments, five kinds of chameleonic sellers who randomly choose their offering patterns are designed to bargain with the buyers who use the compared strategies. The following sections give the details of these experiments. Section 6.1 describes the design of the sellers in the three sets of experiments and the initialization of the variables and constants of the sellers, the buyers and the bargaining strategies. Section 6.2 specifies the benchmarks for evaluating the compared strategies. Finally, Sect. 6.3 summarizes the results obtained from the three sets of experiments and discusses the reasons for these results.

6.1 Experimental setting

Table 1 Configurations of sellers who offer consecutively

| Sellers | β | Eqn. | Sellers | β | Eqn. |
|---|---|---|---|---|---|
| 1 | 2 | (15) | 8 | 2 | (16) |
| 2 | 5 | (15) | 9 | 5 | (16) |
| 3 | 8 | (15) | 10 | 8 | (16) |
| 4 | 1 | (15) | 11 | 1 | (16) |
| 5 | 0.3 | (15) | 12 | 0.3 | (16) |
| 6 | 0.5 | (15) | 13 | 0.5 | (16) |
| 7 | 0.7 | (15) | 14 | 0.7 | (16) |

In the first set of experiments, we construct 14 kinds of sellers based on the bargaining tactics proposed by Faratin et al. [16] and Hou [17]. Taking time and the opponent's behavior as decision factors, these sellers (see Table 1) concede consecutively according to formula (15) or (16) during bargaining. Sellers 1–7, who offer according to formula (15), are the absolutely Tit-for-Tat (ATFT) ones, and sellers 8–14, who offer according to formula (16), are the relatively Tit-for-Tat (RTFT) ones. The Tit-for-Tat degree of these sellers is represented by the degree to which their behavior depends on their opponents' offers (i.e., 1 − BD). Besides the opponent's behavior, a seller's offers are also affected by time (negotiation rounds in this paper).

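$$P_s(t) = \left(IP_s - (IP_s - RP_s) \times (t/\tau_s)^{\beta}\right) \times BD + \left(P_s(t-1) - \left(P_b(t-1) - P_b(t-2)\right)\right) \times (1 - BD) \tag{15}$$

$$P_s(t) = \left(IP_s - (IP_s - RP_s) \times (t/\tau_s)^{\beta}\right) \times BD + \left(P_s(t-1) \times \left(P_b(t-2)/P_b(t-1)\right)\right) \times (1 - BD) \tag{16}$$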

where $\tau_s$ represents the maximal number of rounds that the seller is willing to bargain, $BD \in [0,1]$ is the weight of the seller's behavior dependence on its own offer, $1 - BD$ is the weight of the seller's behavior dependence on its opponent's offer, and $\beta$ is the factor that reflects the seller's time dependence.
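The two seller models can be written compactly as follows. This is a sketch with hypothetical function and argument names; it follows formulas (15) and (16) term by term and uses illustrative numbers.

```python
def time_dependent_offer(ip, rp, t, deadline, beta):
    """Time-dependent part shared by formulas (15) and (16)."""
    return ip - (ip - rp) * (t / deadline) ** beta


def atft_offer(ip, rp, t, deadline, beta, bd, prev_own, opp_prev, opp_prev2):
    """Formula (15): absolute Tit-for-Tat, mirroring the size of the buyer's last concession."""
    imitation = prev_own - (opp_prev - opp_prev2)
    return time_dependent_offer(ip, rp, t, deadline, beta) * bd + imitation * (1 - bd)


def rtft_offer(ip, rp, t, deadline, beta, bd, prev_own, opp_prev, opp_prev2):
    """Formula (16): relative Tit-for-Tat, mirroring the buyer's last concession proportionally."""
    imitation = prev_own * (opp_prev2 / opp_prev)
    return time_dependent_offer(ip, rp, t, deadline, beta) * bd + imitation * (1 - bd)


# Seller with IP = 50 and RP = 10 whose last offer was 30; the buyer moved from 18 to 20.
for bd in (0.0, 0.5, 1.0):
    print(bd,
          atft_offer(50.0, 10.0, 25, 50, 2, bd, 30.0, 20.0, 18.0),
          rtft_offer(50.0, 10.0, 25, 50, 2, bd, 30.0, 20.0, 18.0))
```

With BD = 1 the seller ignores the buyer and follows the time-dependent curve only; with BD = 0 it only imitates the buyer's last move.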

Table 2 lists the parameters of the sellers in the second set of experiments. Besides offering according to formula (15) or (16), sellers 15–20 also have one or two sit-and-wait duration(s) (see column “S.&W.” in Table 2). Like sellers 1–7, sellers 15, 17, and 19 offer according to formula (15); thus, they belong to the absolutely Tit-for-Tat (ATFT) sellers. In contrast, sellers 16, 18, and 20 are relatively Tit-for-Tat (RTFT) ones because they offer according to formula (16). Seller 21 is a special case: it is an absolute “Sit-and-Wait” seller because it insists on its initial offer until its deadline is reached.

Table 2 Configurations of sellers who offer consecutively with sit-and-wait duration(s)

| Sellers | β | Eqn. | S.&W. |
|---|---|---|---|
| 15 | 5 | (15) | [12, 17], [28, 38] |
| 16 | 1 | (16) | [12, 17], [28, 38] |
| 17 | 0.5 | (15) | [12, 17], [28, 38] |
| 18 | 0.5 | (16) | [7, 45] |
| 19 | 5 | (15) | [7, 45] |
| 20 | 1 | (16) | [7, 45] |
| 21 | Identical | | |

Besides these sellers who offer according to fixed functions, in the third set of experiments we construct 5 kinds of chameleonic seller agents (see Table 3), and repeat each bargaining process 200 times to rule out the influence of random strategy selection. Taking chameleonic seller 1 (denoted as Ch.seller1 in Table 3) as an example, the construction method of these chameleons is as follows. At the beginning of a bargaining process, chameleonic seller 1 randomly chooses a strategy from strategies 1–7 and strategies 8–14 with a 1:1 ratio. In other words, at the beginning of a bargaining process, chameleonic seller 1 randomly acts as seller 1, seller 2, . . . , seller 13 or seller 14 defined in Table 1. After offering according to the randomly chosen strategy for several rounds (10 is used in this paper), it randomly selects another strategy from 1–7 and 8–14 with a 1:1 ratio again. This cycle of choosing a strategy and offering continues until an agreement is achieved or the bargaining deadline is reached. The ratio 1:1 means that the probability of acting as one of sellers 1–7 is 50 %, and the probability of acting as one of sellers 8–14 is also 50 %. Moreover, within each group the sellers are chosen with equal probability. Therefore, the probability of acting as any one of sellers 1, 2, . . . , 14 is 1/14 (= 1/2 × 1/7). The other chameleonic sellers offer similarly. For example, since Ch.seller5 randomly chooses one strategy from strategies 1–7 : 8–14, 21 with a 1 : 2 ratio, the probability of choosing a strategy from 1–7 is 1/3, and the probability of choosing a strategy from 8–14 and 21 is 2/3. Besides, strategies 1–7 are chosen with equal probability; hence, the probability of choosing each of strategies 1, 2, . . . , 7 is 1/21 (= 1/3 × 1/7). Similarly, the probability of choosing each of strategies 8, 9, . . . , 14, and 21 is 1/12 (= 2/3 × 1/8).

Table 3 Configuration of chameleonic sellers

| Sellers | Strategies/Sellers | Random chosen ratio |
|---|---|---|
| Ch.seller1 | 1–7 : 8–14 | 1 : 1 |
| Ch.seller2 | 1–7 : 8–14 | 3 : 1 |
| Ch.seller3 | 1–14 : 21 | 1 : 1 |
| Ch.seller4 | 1–7 : 8–14, 21 | 2 : 1 |
| Ch.seller5 | 1–7 : 8–14, 21 | 1 : 2 |
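The per-strategy probabilities implied by the ratios of the chameleonic sellers can be checked with exact fractions. The helper below is a hypothetical illustration: it splits the probability mass between two groups according to the ratio and then evenly within each group, as described for Ch.seller1 and Ch.seller5.

```python
from fractions import Fraction


def strategy_probabilities(group_a, group_b, ratio_a, ratio_b):
    """Probability of each strategy when the groups are picked with ratio_a : ratio_b."""
    total = ratio_a + ratio_b
    probs = {}
    for group, share in ((group_a, ratio_a), (group_b, ratio_b)):
        for strategy in group:
            probs[strategy] = Fraction(share, total) / len(group)
    return probs


# Ch.seller1: strategies 1-7 versus 8-14 with ratio 1 : 1.
ch1 = strategy_probabilities(list(range(1, 8)), list(range(8, 15)), 1, 1)
# Ch.seller5: strategies 1-7 versus 8-14 and 21 with ratio 1 : 2.
ch5 = strategy_probabilities(list(range(1, 8)), list(range(8, 15)) + [21], 1, 2)
print(ch1[1], ch5[1], ch5[21], sum(ch5.values()))
```

Using `Fraction` avoids rounding, so the probabilities sum to exactly 1.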

The configuration of the related parameters and constants of the sellers and the buyers is listed in Table 4. They are initialized as follows. (1) For simplicity, BD is assigned 11 different values (i.e., 0, 0.02, 0.04, 0.04, 0.08, 0.1, 0.3, 0.5, 0.7, 0.9, and 1). Therefore, there are 154 (14 × 11) consecutively offering sellers, 77 (7 × 11) sellers whose concessions are interleaved with sit-and-wait periods, and 55 (5 × 11) chameleonic sellers who bargain with the buyers that adopt the compared strategies in these three sets of experiments. (2) The deadlines of the seller agents are set to 50 (i.e., $\tau_s = 50$), and the deadlines of the buyers are set to 50 (i.e., $\tau_b = 50$) and 49 (i.e., $\tau_b = 49$) respectively. So, for any pair of seller and buyer, two experiment cases (i.e., seller offers last and buyer offers last) are executed. (3) $P_s(0) = IP_s = 50$, $P_b(0) = IP_b = 10$, $P_s(50) = RP_s = 10$, $P_b(50) = RP_b = 50$. As the acceptance ranges of the seller and the buyer overlap, an agreement can always be reached between each pair of seller and buyer. (4) As the experimental environment in Sim et al. [20] is similar to that of this paper, we adopt the settings of [20]: the weight of the fitness function (denoted as α), the crossover rate of the genetic algorithm, and the mutation rate of the genetic algorithm are set to 0.9, 0.8 and 0.1 respectively. Besides, because Sim et al. [20] did not explicitly assign a value to the search space adaptation factor (denoted as δ), in this paper we assign δ the values 1, 5, 10, 20, 30, 40, and 50 respectively.

6.2 Performance measure

Since the aim of this paper is to improve an agent's ability to bargain with various kinds of opponents, individual gain is the factor that agents care about most. In this paper, we use two benchmarks to evaluate the individual gains of a pair of bargainers and the performance of a strategy.

Table 4 Configuration of parameters in experiments

| Parameters | Values | Parameters | Values |
|---|---|---|---|
| BD | 0, 0.02, 0.04, 0.04, 0.08, 0.1, 0.3, 0.5, 0.7, 0.9, 1 | δ in GA | 1, 5, 10, 20, 30, 40, 50 |
| τs | 50 | α in GA | 0.9 |
| τb | 50, 49 | crossover rate of GA | 0.8 |
| IPs/RPs | 50/10 | mutation rate of GA | 0.1 |
| IPb/RPb | 10/50 | | |

Table 5 Comparison between strategies when faced with consecutively conceding sellers

| Strategies | ATP |
|---|---|
| Sit-and-Wait strategy (S-W) | 15.35 |
| Strategy proposed by Brozostowski et al. (B) | 15.35 |
| Strategy proposed by Lee et al. (L) | 46.48 |
| Improved BLGAN strategy (S) | 15.35 |

Recall that a bargainer's deadline is private knowledge; thus, we can generally assume that, for a given pair of seller and buyer, the seller offering last and the buyer offering last are equally probable. Under this assumption, we first average the transaction price achieved when the buyer offers last and the one achieved when the seller offers last, and then take this average transaction price as a benchmark for evaluating the individual gains of this pair of bargainers (denoted as PATP). The higher the PATP is, the more the seller gains and the less the buyer gains.

To further analyze the overall performance of each bargaining strategy, we average the PATPs achieved over all pairs of sellers and buyers and take the result as another benchmark (denoted as ATP). As with PATP, the higher the ATP is, the more the seller gains and the less the buyer gains. Therefore, the best strategy for a buyer is the one that enables the buyer agent to achieve the lowest ATP, and the best strategy for a seller is the one that enables the seller to achieve the highest ATP.
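Both benchmarks are plain averages, as the short sketch below shows. The function names are hypothetical. As an example input, the sketch takes the five per-seller values of the “S” column of Table 7 as the PATPs being averaged, which is an assumption about how that table's “Average” row was obtained.

```python
from statistics import mean


def patp(price_buyer_last, price_seller_last):
    """PATP of one seller-buyer pair: mean of the two equally likely transaction prices."""
    return (price_buyer_last + price_seller_last) / 2


def atp(patps):
    """ATP of a strategy: mean of the PATPs over all seller-buyer pairs."""
    return mean(patps)


# Per-seller values of the "S" column in Table 7 (chameleonic sellers 1-5).
s_column = [15.19, 15.16, 15.47, 15.31, 15.27]
print(round(atp(s_column), 2))
```

A buyer prefers the strategy with the lowest ATP, a seller the one with the highest.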

6.3 Results and discussion

6.3.1 Results when bargaining with consecutively offering sellers

Table 5 lists the ATPs achieved in the first set of experiments, in which the buyers use the Sit-and-Wait strategy (abbreviated as S-W), the Lee et al. strategy (abbreviated as L), the optimal settings of the strategy proposed by Brozostowski et al. (abbreviated as B; the optimal configuration, tested by Ji and Leung [29], is wtb = 0 or ϕ ≤ 0.0005), and the improved BLGAN strategy (abbreviated as S) given in this paper. The sellers use the consecutive offering functions listed in Table 1. From Table 5, we can see that the buyers who use the “S-W”, “B”, and “S” strategies achieve the same ATP (i.e., 15.35), which is far smaller than the ATP (i.e., 46.48) achieved with the “L” strategy. Therefore, the “S-W”, “B”, and “S” strategies are equivalent to each other, and the “L” strategy is the worst of the four for a buyer who bargains with consecutively offering sellers. These results are caused by the different characteristics of the strategies, which the following observations and analyses summarize.

Observation 1: The value of δ in the improved BLGAN strategy has no effect on the bargaining results.

Analysis: When bargaining with sellers that make concessions consecutively (i.e., sellers 1–14 listed in Table 1), the buyers who use the improved BLGAN strategy achieve similar PATPs no matter what value δ is assigned.

Observation 2: The Sit-and-Wait strategy, the optimal configuration of the strategy proposed by Brozostowski et al., and the improved BLGAN strategy are equivalent to each other when bargaining with sellers who make consecutive concessions.

Analysis: Fig. 2 shows the PATP curves achieved in the first set of experiments. The curves in Fig. 2(a), (b) and (d) have the following properties.

(a) The curves in Fig. 2(a), (b) and (d) each fall into 7 groups. Whether the sellers are ATFT or RTFT ones, sellers with the same value of β achieve similar PATPs when bargaining with buyers who adopt the “S-W”, “B” or “S” strategy. In other words, the curves of sellers 1, 2, 3, 4, 5, 6, and 7 overlap with those of sellers 8, 9, 10, 11, 12, 13, and 14 respectively. That is because the agents who adopt these three strategies act similarly when bargaining with sellers that have the same time dependence (β) and the same behavior dependence (BD), whether the sellers are ATFT or RTFT ones.

(b) The larger the time dependence (i.e., β), the steeper the curve. As sellers 3 and 10 have the largest time dependence (i.e., β = 8), their curves are the steepest in these figures, followed by those of sellers 2 (or 9), 1 (or 8), 4 (or 11), 7 (or 14), 6 (or 13), and 5 (or 12). The curves of sellers 5 and 12 are the flattest. That is because time dependence directly affects sellers' concession-making patterns, as explained in Faratin et al. [16].

(c) All the bargainers achieve similar PATPs (i.e., 29.6) when the sellers' behavior dependence (BD) equals zero. As BD increases, the PATPs decrease gradually to nearly 10, which is the reservation price of the seller. That is because the buyers who use the “S-W”, “B” and “S” strategies can exploit sellers who are willing to concede regardless of whether their opponents concede.

Observation 3: The PATPs achieved when buyers use Lee et al.'s strategy are correspondingly higher than those achieved when buyers use the other strategies.

Analysis: The PATPs (shown in Fig. 2(c)) achieved when buyers use the strategy given by Lee et al. have the following characteristics.


Fig. 2 PATPs when buyers bargain with sellers 1–14. (a) Buyers use the Sit-and-Wait strategy; (b) buyers use the optimal strategy proposed by Brozostowski et al.; (c) buyers use the strategy proposed by Lee et al.; (d) buyers use the improved BLGAN strategy. Note: (1) The vertical axes of these figures show the transaction prices of each pair of bargainers averaged over the cases in which the seller offers last and the buyer offers last (i.e., PATP). (2) The vertical axes of (a), (b) and (d) range from 10 to 30, and that of (c) ranges from 20 to 50

(a) As in Fig. 2(a), (b) and (d), sellers that have the same time dependence (i.e., β) achieve similar PATP curves. That is because the agents who adopt the strategy proposed by Lee et al. act similarly when bargaining with sellers that have the same time dependence and the same behavior dependence, whether the sellers are ATFT or RTFT ones.

(b) The larger the time dependence of the sellers, the steeper the curve. Compared with the curves in Fig. 2(a), (b) and (d), the curves in Fig. 2(c) decrease slowly when the sellers' dependence on their own offers is smaller than 0.1 (i.e., BD < 0.1), and decrease sharply when BD > 0.1. That is because the buyer who uses the “L” strategy cannot exploit its opponents as effectively as buyers using the other three strategies do.

(c) When the sellers' BD equals zero, the PATP is 49.2, which is much larger than the PATP (i.e., 29.6) achieved with the other three strategies. Like the curves in Fig. 2(a), (b), and (d), the curves in Fig. 2(c) decrease as the sellers' BD increases, which shows that agents who adopt the “L” strategy can also exploit sellers who concede regardless of whether their opponents concede. However, their ability to exploit is not as good as that of the other strategies.

6.3.2 Results when bargaining with sellers who concede with sit-and-wait duration(s)

Table 6 Comparison between strategies when faced with sellers who sit and wait

| Strategies | ATP |
|---|---|
| Sit-and-Wait strategy (S-W) | 19.34 |
| Strategy proposed by Brozostowski et al. (B) | 19.34 |
| Strategy proposed by Lee et al. (L) | 47.06 |
| Improved BLGAN strategy (S) | 19.34 |

The results of the second set of experiments are summarized in Table 6, in which “S-W”, “B”, “L”, and “S” have the same meanings as in the first set of experiments. In this set of experiments, the buyers use the four compared strategies, and the sellers use the offering functions listed in Table 2. From Table 6, we can see that the buyers who use the “L” strategy achieve the highest ATP (i.e., 47.06). The other strategies let the buyers achieve the same ATP (i.e., 19.34). Therefore, the “L” strategy is the worst one for buyers who bargain with sellers that concede with one or more sit-and-wait duration(s). The following observations and analyses show the characteristics of these strategies and explain the reasons for these results.

Observation 4: The Sit-and-Wait strategy, the optimal configuration of the strategy proposed by Brozostowski et al., and the improved BLGAN strategy are equivalent when bargaining with sellers 15–21.

Analysis: The PATP curves between sellers 15–21 and the buyers who use these strategies have the following properties (see Fig. 3(a), (b) and (d)).

(a) The PATPs are affected by the seller's time dependence (i.e., β) and sit-and-wait durations. Given the same sit-and-wait duration(s), the larger the time dependence, the steeper the PATP curve. That can be seen from the curves of sellers 15, 16, 17 and the curves of sellers 18, 19, 20 in Fig. 3(a), (b) and (d). The PATPs of seller 17 are correspondingly larger than those of seller 16, whatever the seller's BD is, and the PATPs of sellers 16 and 15 are related in the same way. Besides, given the same time dependence, the longer the sit-and-wait duration(s), the flatter the curve. That can be seen from Fig. 4, in which the buyers adopt the improved BLGAN strategy. From Fig. 4(a), we can see that the PATPs of seller 19, seller 15 and seller 2 decrease with their sit-and-wait duration when the sellers' BD is smaller than 0.7. If the sellers' BD is larger than 0.8, they have similar PATPs. This phenomenon also exists in Fig. 4(b), (c) and (d), whether the sellers are ATFT or RTFT ones. These results are caused by the fact that the buyers who adopt these strategies cannot exploit sellers with sit-and-wait duration(s) as they do consecutively conceding sellers, especially when the seller's BD is smaller than 0.7.

(b) When BD = 0, all the PATPs are 29.6, regardless of which of these strategies (Sit-and-Wait, Brozostowski et al., or improved BLGAN) the buyers adopt, the number of sit-and-wait intervals, and the duration of each interval. The PATPs decrease as BD increases. These phenomena also appear in the first set of experiments.

(c) When bargaining with seller 21, the PATPs always keep the same value (i.e., 29.6) no matter what value BD is assigned.

Observation 5: The PATPs achieved when buyers use the strategy proposed by Lee et al. are correspondingly larger than those achieved when buyers use the other three strategies.

Analysis: When dealing with sellers who sit and wait, the PATPs (shown in Fig. 3(c)) achieved when buyers use the strategy given by Lee et al. have the following characteristics.

(a) The curves fall into 4 groups. The curves of sellers 15, 16, and 17 overlap with those of sellers 19, 20, and 18 respectively. The curve of seller 21 nearly overlaps with those of sellers 17 and 18. These groups correspond to the different values of the sellers' time dependence factor (i.e., β). The larger the time dependence is, the smaller the PATPs are. Therefore, the curves of seller 19 and seller 15 are the steepest curves in Fig. 3(c). It should be noted that the PATPs between seller 21 and buyers who adopt the “L” strategy are not affected by the seller's behavior dependence (i.e., BD). These results are caused by the fact that the sit-and-wait duration(s) of sellers 15–21 have no effect on the PATPs between the sit-and-wait sellers and the Lee et al. buyers.

(b) When the behavior dependence of sellers 15–21 equals zero, the PATP achieved when buyers use the “L” strategy is 49.2, which differs from the PATP (i.e., 29.6) achieved when buyers adopt the other three strategies. These results and their reasons are similar to those in the first set of experiments.

6.3.3 Results when dealing with chameleonic offering sellers

Table 7 summarizes the results of the third set of experiments, in which the labels “S-W”, “B”, “L” and “S” represent the “Sit-and-Wait” strategy, the optimal strategy of Brozostowski et al., the strategy of Lee et al. and the improved BLGAN strategy respectively. The columns of this table list the PATPs achieved when buyers use the four compared strategies. The rows of this table give the PATPs of the chameleonic sellers constructed according to Table 3. According to the “Average” row, the buyers who use the improved BLGAN strategy achieve the lowest ATP (i.e., 15.28). In contrast, the buyers who use the strategy given by Lee et al. achieve the highest ATP (i.e., 46.62). Therefore, the “S” strategy maximizes the buyers' profit. Besides, the “S-W” strategy is slightly better than the “B” strategy because buyers who use the former achieve a lower ATP. The following observations and analyses explain the results listed in Table 7 and the characteristics of these four strategies (see Fig. 5).


Fig. 3 PATPs when buyers bargain with sellers 15–21. (a) Buyers use the Sit-and-Wait strategy; (b) buyers use the optimal strategy proposed by Brozostowski et al.; (c) buyers use the strategy proposed by Lee et al.; (d) buyers use the improved BLGAN strategy. Note: (1) The vertical axes of these figures show the transaction prices of each pair of bargainers averaged over the cases in which the seller offers last and the buyer offers last (i.e., PATP). (2) The vertical axes of (a), (b) and (d) range from 10 to 30, and that of (c) ranges from 25 to 50

Observation 6: Even when bargaining with chameleonic sellers, the PATP curves of buyers using the improved BLGAN strategy are close to those of buyers using the Sit-and-Wait strategy and the optimal strategy proposed by Brozostowski et al., and all these curves are considerably lower than the curves of buyers using the strategy proposed by Lee et al.

Analysis: Recall the conclusion drawn from the first and second sets of experiments: the “S” strategy performs on a par with the “S-W” and “B” strategies, and considerably better than the “L” strategy, when bargaining with consecutively conceding sellers and with sellers that may sit and wait while making offers according to fixed patterns. Although the chameleonic sellers frequently change their concession-making patterns, these patterns are selected from the basic concession-making patterns such as 1–14 and 21. Hence, during the rounds in which sellers do not change their concession-making patterns, the PATP curves still exhibit the characteristics of fixed concession-making patterns (i.e., Observations 1–5 drawn from the first and second sets of experiments). The differences among the curves labeled “S-W”, “B”, “S” and “L” arise because these strategies have different prediction and concession mechanisms, which lead to different abilities in dealing with changes in sellers’ offering patterns. Because the ability of the “S” strategy to deal with changing concession-making patterns is close to that of the “S-W” and “B” strategies, its PATP curves are close to theirs. Moreover, as all three of these strategies deal with changing concession-making patterns better than the “L” strategy, their PATP curves are lower than those of the “L” strategy.

Fig. 4 PATPs when buyers bargain with sellers who have similar time dependence and different sit-and-wait duration(s). Note: (1) The vertical axes of these figures are the averaged transaction prices of each pair of bargainers (i.e., PATP) in the cases where the seller offers last and where the buyer offers last
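To make the notion of a chameleonic seller more concrete, the following Python sketch generates an offer sequence that switches between polynomial time-dependent concession (in the style of the tactics of Faratin et al. [16]) and sitting-and-waiting. It is only an illustration under stated assumptions: the function names, the switching schedule, the parameter values and the rule that the seller never retracts a concession are hypothetical and are not taken from the experiments; behavior-dependent tactics are omitted.

```python
from typing import List, Optional, Sequence, Tuple


def time_dependent_offer(t: int, deadline: int, initial_price: float,
                         reserve_price: float, beta: float) -> float:
    """Seller offer under a polynomial time-dependent tactic.

    The concession share alpha grows from 0 (at t = 0) to 1 (at the deadline);
    beta < 1 concedes slowly at first, beta > 1 concedes quickly at first.
    """
    ratio = min(t, deadline) / deadline
    alpha = ratio ** (1.0 / beta)
    return initial_price - alpha * (initial_price - reserve_price)


def chameleonic_offers(schedule: Sequence[Tuple[int, Optional[float]]],
                       deadline: int, initial_price: float,
                       reserve_price: float) -> List[float]:
    """Offers of a seller that changes its pattern according to `schedule`.

    `schedule` is a list of (start_round, beta) pairs sorted by start_round.
    beta = None means "sit and wait" (repeat the previous offer).
    Assumption of this sketch: the seller never raises its price again.
    """
    offers: List[float] = []
    current = initial_price
    for t in range(deadline + 1):
        beta: Optional[float] = None
        for start, candidate in schedule:
            if start <= t:
                beta = candidate  # the latest segment that has started wins
        if beta is not None:
            target = time_dependent_offer(t, deadline, initial_price,
                                          reserve_price, beta)
            current = min(current, target)
        offers.append(current)
    return offers


# Hypothetical schedule: linear concession, then sit-and-wait, then slow-start concession.
example = chameleonic_offers([(0, 1.0), (4, None), (7, 0.5)],
                             deadline=10, initial_price=30.0, reserve_price=10.0)
```

In this example the offers stay flat during rounds 4 to 6 and resume falling from round 7, which is the kind of pattern change that a buyer's prediction mechanism has to track.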

Observation 7: When bargaining with chameleonic sellers 1 and 2, who do not sit and wait during bargaining, the differences among the PATP curves of the improved BLGAN strategy, the Sit-and-Wait strategy and the strategy proposed by Brozostowski et al. are small. In contrast, when dealing with chameleonic sellers 3, 4 and 5, who may sit and wait during bargaining, the differences between the PATP curves of the improved BLGAN strategy and those of the other strategies are large.

Analysis: This is because the prediction and concession mechanisms of the “S” strategy enable buyers to swiftly adapt their concessions to the frequent changes in the sellers’ concession patterns. In contrast, agents that adopt the “S-W”, “B” or “L” strategy cannot keep up with the frequent alteration of sellers’ concession-making patterns, especially when they bargain with sellers who frequently switch between making concessions and not making concessions (sitting-and-waiting), e.g., chameleonic sellers 3–5.

Table 7 Average transaction prices when sellers are chameleons

Seller        S-W      B        L        S
Ch.seller1    15.30    15.31    46.45    15.19
Ch.seller2    15.31    15.38    46.67    15.16
Ch.seller3    16.42    16.55    46.69    15.47
Ch.seller4    16.07    16.13    46.64    15.31
Ch.seller5    16.20    16.17    46.65    15.27
Average       15.86    15.91    46.62    15.28

Based on the results obtained from the first, second and third sets of experiments, we can draw the following conclusions.

(1) When dealing with sellers who make consecutive concessions (i.e., sellers 1–14) and sellers who concede with one or more durations of sitting-and-waiting (i.e., sellers 15–20), the improved BLGAN strategy is as good as the Sit-and-Wait strategy and the strategy proposed by Brozostowski et al. All of them are better than the strategy proposed by Lee et al.

(2) When faced with sellers who sit and wait throughout the whole bargaining process (i.e., seller 21) and sellers whose behavior dependence is zero (i.e., BD = 0), the improved BLGAN strategy can intelligently and quickly make buyers choose the Sit-and-Wait action in response.

(3) When the buyers who adopt these strategies bargain with sellers that frequently vary their offering patterns (i.e., chameleonic sellers 1–5), the improved BLGAN strategy is better than the Sit-and-Wait strategy and the strategies proposed by Brozostowski et al. and Lee et al. This is because the improved BLGAN strategy enables buyers to swiftly adjust their actions between making reciprocal concessions and sitting-and-waiting.

Fig. 5 Averaged transaction prices (ATPs) between chameleonic sellers and the buyers who use the compared strategies
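The comparison behind conclusion (3) can be rechecked directly from the figures reported in Table 7. The short Python sketch below recomputes the column averages and, for each chameleonic seller, the price gap between the “S-W” strategy and the “S” strategy (a lower transaction price is better for the buyer). The variable and function names are illustrative choices; only the numbers come from Table 7.

```python
# Average transaction prices from Table 7 (rows: chameleonic sellers, columns: buyer strategies).
TABLE_7 = {
    "Ch.seller1": {"S-W": 15.30, "B": 15.31, "L": 46.45, "S": 15.19},
    "Ch.seller2": {"S-W": 15.31, "B": 15.38, "L": 46.67, "S": 15.16},
    "Ch.seller3": {"S-W": 16.42, "B": 16.55, "L": 46.69, "S": 15.47},
    "Ch.seller4": {"S-W": 16.07, "B": 16.13, "L": 46.64, "S": 15.31},
    "Ch.seller5": {"S-W": 16.20, "B": 16.17, "L": 46.65, "S": 15.27},
}
STRATEGIES = ("S-W", "B", "L", "S")


def column_means(table):
    """Mean transaction price of each buyer strategy over all sellers."""
    return {s: sum(row[s] for row in table.values()) / len(table)
            for s in STRATEGIES}


def best_strategy_per_seller(table):
    """Strategy with the lowest (best for the buyer) price against each seller."""
    return {seller: min(row, key=row.get) for seller, row in table.items()}


def gap_to_s(table, other="S-W"):
    """Price paid under `other` minus price paid under "S", per seller."""
    return {seller: row[other] - row["S"] for seller, row in table.items()}


means = column_means(TABLE_7)
best = best_strategy_per_seller(TABLE_7)
gaps = gap_to_s(TABLE_7)

for strategy in STRATEGIES:
    print(f"{strategy:>3}: {means[strategy]:.2f}")
for seller, gap in gaps.items():
    print(f"{seller}: S-W minus S = {gap:.2f}")
```

The recomputed averages agree with the last row of Table 7, and the gaps for sellers 3–5 (who may sit and wait) are several times larger than those for sellers 1–2, in line with Observation 7.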

7 Conclusions

In environments with incomplete information and varying degrees of rationality, designing an optimal bargaining strategy is a tough task. Although many bargaining strategies have been proposed, there is still room to improve the bargaining strategies of automated agents. Sim et al. [20] gave the BLGAN strategy and showed that it is an excellent strategy for dealing with opponents whose deadline, reservation price and rationality are unknown. However, the BLGAN strategy of Sim et al. [20] was only designed to deal with consecutively-conceding opponents. The contribution of this work is to improve the BLGAN strategy so that it can deal with multifarious opponents, including consecutively-conceding opponents, sitting-and-waiting opponents, opponents who concede with fixed patterns and opponents who concede without any certain pattern.

In this paper, we improve the formulas for estimating the opponent’s deadline and the formulas for adapting the concession factor in the BLGAN strategy by adding formulas that handle the case in which opponents may take the sit-and-wait action during bargaining. To further verify the performance of this improved strategy, three sets of experiments are designed and implemented. Experimental results show that the improved BLGAN strategy is as good as the Sit-and-Wait strategy and the strategy proposed by Brozostowski et al. when dealing with opponents who concede consecutively and opponents who use sit-and-wait in the bargaining process. In particular, the improved BLGAN strategy outperforms the compared ones when bargaining with the chameleonic sellers who frequently change their offer strategies for anti-learning.

Although we design 231 kinds of fixed offering sellers and 55 chameleon sellers in the three sets of experiments, all these sellers are constructed according to the tactics proposed by Faratin et al. [16], which are time- and behavior-dependent. Recently, Bahrammirzaee et al. [30] proposed another three kinds of bargaining tactics that adapt to changes in the environment (i.e., reservation interval, time deadline, and opponent’s behavior). The behavior of the improved BLGAN strategy when dealing with opponents who adopt these tactics is worth studying in the future. Besides, some new bargaining strategies [6, 31] have been proposed in recent months; therefore, the superiority of the improved BLGAN strategy when bargaining with agents who use these strategies needs to be verified in future work. Moreover, whether this strategy is suitable for situations where the value of the bargaining item decays with time is another problem that needs to be explored.

Acknowledgement This paper is supported by the Natural Science Foundation of China (Nos. 71240003, 71303140, 61170079, 61202152), the Natural Science Foundation of Shandong Province (No. ZR2012FM003), the National key basic research and development plan (973) of China (No. 2012CB724106, No. ZR2013FM023) and the Shandong Provincial International Cooperation Program for Excellent Lectures of 2009.

References

1. Rosenschein JS, Zlotkin G (1994) Rules of encounter: designing conventions for automated negotiation among computers. MIT Press, Cambridge, pp 1–171

2. Sim KM (2009) Unconventional negotiation: survey and new directions. In: Proceedings of the 9th international conference on electronic business, pp 901–907

3. Sim KM (2010) Grid resource negotiation: survey and new directions. IEEE Trans Syst Man Cybern, Part C, Appl Rev 40(3):245–257

4. Kersten G, Michalowski W, Szpakowicz S, Koperczak Z (1991) Restructurable representations of negotiation. Manag Sci 37(10):1269–1290

5. Lomuscio A, Wooldridge M, Jennings NR (2003) A classification scheme for negotiation in electronic commerce. Int J Group Decis Negotiation 12(1):31–56

6. Gwak JH, Sim KM (2013) A novel method for coevolving PS-optimizing negotiation strategies using improved diversity controlling EDAs. Appl Intell 38(3):384–417

7. Wu D, Baron O, Berman O (2009) Bargaining in competing supply chains with uncertainty. Eur J Oper Res 197(2):548–556

8. Sim KM (2012) Agent-based cloud computing. IEEE Trans Serv Comput 5(4):564–577

9. Sim KM (2013) Complex and concurrent negotiations for multiple interrelated e-markets. IEEE Trans Cybern 43(1):230–245

10. Sim KM (2010) Towards complex negotiation for cloud economy. Lect notes comput sci, vol 6104. Springer, Berlin, pp 395–406

11. Son S, Sim KM (2012) A price-timeslot negotiation for cloud service reservation. IEEE Trans Syst Man Cybern, Part B, Cybern 42(3):713–728

12. Sim KM, Shi B (2010) Concurrent negotiation and coordination for controlling grid resource co-allocation. IEEE Trans Syst Man Cybern, Part B, Cybern 40(2):753–766


13. Sim KM (2006) A survey of bargaining models for grid resource allocation. ACM SIGECOM: E-Commerce Exch 5(5):22–32

14. Sim KM (2006) G-commerce, market-driven g-negotiation agents and grid resource management. IEEE Trans Syst Man Cybern, Part B, Cybern 36(6):1381–1394

15. Sandholm T, Vulkan N (1999) Bargaining with deadlines. In: Proceeding of the national conference on artificial intelligence, pp 44–51

16. Faratin P, Sierra C, Jennings NR (1998) Negotiation decision functions for autonomous agents. Robot Auton Syst 24(3–4):158–182

17. Hou CM (2004) Modeling agents behavior in automated negotiation. Technical report. KMI-TR-144, Open University

18. Lee FM, Li LH, Chen PH (2005) A study on dynamic bargaining strategy under time constraints and with incomplete information. In: IEEE/WIC/ACM inter conf intelligent agent technology, pp 640–645

19. Brozostowski J, Kowalczyk R (2006) Predicting partner’s behavior in agent negotiation. In: 5th international joint conference on autonomous agents and multiagent systems, pp 355–361

20. Sim KM, Guo Y, Shi B (2009) BLGAN: Bayesian learning and genetic algorithm for supporting negotiation with incomplete information. IEEE Trans Syst Man Cybern, Part B, Cybern 39(1–2):198–211

21. ANAC (2010) Automated negotiating agents competition, http://mmi.tudelft.nl/negotiation/index.php/Automated_Negotiating_Agents_Competition_(ANAC). Accessed 18 July 2013

22. Raiffa H (1982) The art and science of negotiation. Harvard University Press, Cambridge, pp 1–300

23. Yang YP, Singhal S, Xu YJ (2009) Offer with choices and accept with delay: a win-win strategy model for agent based automated negotiation. In: 30th international conference in information systems. Paper 180

24. Malhotra D, Bazerman M (2008) Psychological influence in negotiation: an introduction long overdue. J Manag 34(3):509–531

25. Sun R (2009) Motivational representations within a computational cognitive architecture. Cogn Comput 1(1):91–103

26. Tauber EM (1972) Why do people shop? J Mark 36(4):46–49

27. Lewicki RJ, Litterer JA, Minton JM, Saunders DM (1994) Negotiation, 2nd edn. Irwin, Burr Ridge, pp 1–50

28. Baumeister RF, Zhang L, Vohs KD (2004) Gossip as cultural learning. Rev Gen Psychol 8:111–121

29. Ji SJ, Leung HF (2010) An adaptive prediction-regret driven strategy for bilateral bargaining. In: 22nd international conference on tools with artificial intelligence, vol 2, pp 11–14

30. Bahrammirzaee A, Chohra A, Madani K (2013) An adaptive approach for decision making tactics in automated negotiation. Appl Intell, Published online: 20 April 2013

31. Gwak JH, Sim KM (2013) An augmented EDA with dynamic diversity control and local neighborhood search for coevolution of optimal negotiation strategies. Appl Intell 38(4):600–619

Shu-juan Ji is an Associate Professor of the College of Information Science and Engineering in Shandong University of Science and Technology. She received her B.Sc., M.Sc. and Ph.D. degrees in Computer Software and Theory from Shandong University of Science and Technology, Qingdao, P.R. China.

Chun-jin Zhang is an Engineer of the Center of Modern Education in Shandong University of Science and Technology. He received his B.Sc. degree in Computer Science and Technology from the College of Information Science and Engineering at Shandong University of Science and Technology, and received his M.Sc. degree in Control Theory and Control Engineering from the College of Information and Electrical Engineering at Shandong University of Science and Technology, Qingdao, P.R. China.

Kwang-Mong Sim received the B.Sc. degree (Hon) (summa cum laude) from the University of Ottawa, and the M.Sc. and Ph.D. degrees from the University of Calgary. He is the Medway Chair and Professor of computer science at the University of Kent. He served as a Referee for several national research grant councils including the National Science Foundation in USA. Prof. Sim is the Associate Editor-in-Chief of the Springer’s Applied Intelligence Journal, an Associate Editor of the IEEE TRANSACTIONS ON CYBERNETICS, and was an Associate Editor of the IEEE TRANSACTIONS on SYSTEMS, MAN, AND CYBERNETICS, PART C. He is also the Guest Editor of five (IEEE) journal special issues in agent-based grid computing and automated negotiation, including a special issue on grid resource management in the IEEE Systems Journal.

Ho-fung Leung received his BSc and MPhil degrees in Computer Science from The Chinese University of Hong Kong, and his PhD degree and DIC (Diploma of Imperial College) in Computing from Imperial College London. He is now a Professor and the Chairman of the Department of Computer Science and Engineering at The Chinese University of Hong Kong. Professor Leung was the chairperson of ACM (Hong Kong Chapter) in 1998. He is a Chartered Fellow of the BCS, a Fellow of the HKIE, a Senior Member of both the ACM and the IEEE, and a full member of the HKCS. He is a Chartered Engineer registered by the Engineering Council.
