
(Carl Anderson & Tucker Balch, Eds.)

Proceedings

Supported in part by a grant from the National Science Foundation Digital Societies Program

PROCEEDINGS

of the

2nd International Workshop on the Mathematics and Algorithms of

Social Insects

Georgia Institute of Technology, Atlanta, GA 30332

December 15–17, 2003

Edited by Carl Anderson and Tucker Balch

Sponsored, in part, by the National Science Foundation

2nd International Workshop on the Mathematics and Algorithms of Social Insects

CONTENTS

PLENARY SPEAKERS

Ronald C. Arkin: Biologically inspired robot behavior . . . . . . . . . . 5

Eric Bonabeau: Swarm Intelligence . . . . . . . . . . 6

Guy Theraulaz: Aggregation dynamics, pattern formation and collective decision-making in pre-social and social insects . . . . . . . . . . 7

PAPERS

Carl Anderson: Linking Micro- to Macro-level Behavior in the Aggressor-Defender-Stalker Game . . . . . . . . . . 9

Eric Bonabeau, Pablo Funes & Belinda Orme: Exploratory Design of Swarms . . . . . . . . . . 17

Tim Brown: Modeling Behavioral Rules and Self-organization in New World Army Ant Swarms . . . . . . . . . . 25

Jerome Buhl, Jacques Gautrais, Jean-Louis Deneubourg, Pascale Kuntz & Guy Theraulaz: Simple Rules of Growth Can Account for the Complexity of Tunnelling Networks in the ant Messor sancta . . . . . . . . . . 33

Ivan D. Chase, Abhijit V. Deshmukh & Naga Krothapalli: How do Ants Decide Between Food Sources of Different Values? An evaluation of the Current Explanation and Associated Mathematical Models . . . . . . . . . . 41

Anna Dornhaus & Nigel R. Franks: Rules of decision making: trade-offs in collective house hunting . . . . . . . . . . 47

Adam Feldman & Tucker Balch: Automatic Identification of Bee Movement . . . . . . . . . . 53

Chris Jones & Maja J Mataric: Towards a Multi-Robot Coordination Formalism . . . . . . . . . . 60

Sanjay S. Joshi & Jeffrey C. Schank: Of Rats and Robots: a New Biorobotics Study of Norway Rat Pups . . . . . . . . . . 68

Franziska Klugl, Cornelia Triebig & Anna Dornhaus: Studying Task Allocation Mechanisms of Social Insects for Engineering Multi-Agent Systems . . . . . . . . . . 75

Kristina Lerman: A Model of Adaptation in Collaborative Multi-Agent Systems . . . . . . . . . . 83

Ling Li, Alcherio Martinoli & Yaser S. Abu-Mostafa: Diversity and Specialization in Collaborative Swarm Systems . . . . . . . . . . 91

Daniel Merkle & Martin Middendorf: Dynamic Polyethism in Social Insect Societies - a Simulation Study . . . . . . . . . . 99

— Page 1 —


Abubakr Muhammad & Magnus Egerstedt: Topology and Complexity of Formations . . . . . . . . . . 107

Sunil Nakrani & Craig Tovey: On Honey Bees and Dynamic Allocation in an Internet Server Colony . . . . . . . . . . 115

Daniel W. Palmer, Marc Kirschenbaum, Jon Murton, Ravi Vaidyanathan & Roger D. Quinn: Development of Collective Control Architectures for Small Quadruped Robots Based on Human Swarming Behavior . . . . . . . . . . 123

Liviu A. Panait & Sean Luke: Evolving Foraging Behaviors . . . . . . . . . . 131

Zhanna Reznikova & Boris Ryabko: In the Shadow of the Binary Tree: of Ants and Bits . . . . . . . . . . 139

Thomas Schmickl & Karl Crailsheim: Costs of Environmental Fluctuations and Benefits of Dynamic Decentralized Foraging Decisions in Honey Bees . . . . . . . . . . 145

Sam Scholes, Ana B. Sendova-Franks, Chris Melhuish & Matt Wilson: Evolution versus Engineering - The Collective Intelligence of Sorting; Size Matters . . . . . . . . . . 153

Patrick Ulam & Tucker Balch: Niche Selection for Foraging Tasks in Multi-Robot Teams Using Reinforcement Learning . . . . . . . . . . 161

Ashish Umre & Ian Wakeman: Cost/Benefit: Information Dissemination in Distributed Systems . . . . . . . . . . 168

POSTERS

Carl Anderson & Nigel R. Franks: Teamwork in Animals, Robots, and Humans . . . . . . . . . . 175

Emma Despland: Regulation of Activity Patterns in a Social Caterpillar . . . . . . . . . . 176

Anna Dornhaus, Franziska Klugl, Christoph Oechslein, Lars Chittka & Frank Puppe: Foraging Success, Recruitment Benefits and Spatial Resource Distribution . . . . . . . . . . 177

Nina Fefferman, Rebeca Rosengaus, Daniel Calleri, Marcio Pie & James Traniello: Modeling Disease Resistance Through Social Interactions in Termites . . . . . . . . . . 178

Root Gorelick, Susan M. Bertram, Peter R. Killeen & Jennifer H. Fewell: Using Normalized Mutual Entropy to Quantify Division of Labor . . . . . . . . . . 179

Justin Hayes: Evolving Swarm Intelligence Solutions for the Foraging Problem . . . . . . . . . . 180

Michael Jones: Adapting Negative Feedback in Honey Bee Forager Allocation to Parallel LTL Violation Discovery . . . . . . . . . . 181

Istvan Karsai, Gabor Balazsi & John W. Wenzel: Organization of Nest Construction via a Natural Substance: Models and Field Studies . . . . . . . . . . 182

Oran Kittithreerapronchai & Carl Anderson: Do Ants Paint Trucks Better Than Chickens? Markets Versus Response Thresholds for Distributed Dynamic Scheduling . . . . . . . . . . 183

Sean Luke, Gabriel Catalin Balan & Liviu Panait: MASON: A Java Multi-Agent Simulation Library . . . . . . . . . . 184

— Page 2 —


Dhruba Naug, Graham Davis & John Wenzel: The influence of group size and resource distribution on a group of central place foragers . . . . . . . . . . 185

Keith J. O’Hara: Navigation Networks: Biological Inspiration for Large-Scale Multi-Robot Navigation . . . . . . . . . . 186

Liviu Alexandru Panait & Sean Luke: Ant Foraging Revisited . . . . . . . . . . 187

Chris A. C. Parker & Hong Zhang: Implementing Collective Robotic Construction with Blind Bulldozing . . . . . . . . . . 188

H. Van Dyke Parunak, Peter Weinstein, Sven Brueckner & John Sauter: Hybrid Stigmergy for Information Extraction . . . . . . . . . . 189

Kevin M. Passino: Modeling, Analysis, and Biomimicry of Honey Bee Distributed Decision Making . . . . . . . . . . 190

Holger Scharpenberg & Robin F.A. Moritz: Retinue Behavior in Honeybee Colonies: Self-Organization or Pheromonal Control? . . . . . . . . . . 191

Ana B. Sendova-Franks: A Measure of Two-Dimensional Sortedness Based on Brood Sorting in Ants . . . . . . . . . . 192

Dylan A. Shell & Maja J Mataric: On the Use of the Term “Stigmergy” . . . . . . . . . . 193

Notes (blank pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194–196

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

— Page 3 —

PLENARY SPEAKERS


Biologically inspired robot behavior

(Photo © R. Menzel)

Ronald C. Arkin

Mobile Robot Laboratory, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280

http://www.cc.gatech.edu/aimosaic/faculty/arkin/

E-mail: [email protected]

Abstract

The Georgia Tech Mobile Robot Laboratory has been studying the biological basis of behavior regarding its application to robotics systems for over 15 years. This talk presents an overview of relevant research drawn from our schema-theoretic approach. Included are amphibian models of detour behavior, visuomotor control systems of the praying mantis, Tolman’s schematic sowbug, and most recently canine and human ethology as applied to Sony’s AIBO and QRIO. Many of the underlying organizational principles have been extended to robotic teams as part of our research for DARPA, including formation control, communication-sensitive behavior, and user-friendly mechanisms for mission specification.

— Page 5 —


Swarm Intelligence

Eric Bonabeau

Icosystem Corporation, 10 Fawcett St., Cambridge, MA 02138

http://www.icosystem.com

E-mail: [email protected]

About the Speaker

Eric Bonabeau is one of the most active and visible proponents of complexity theory and swarm intelligence as a means of solving high-dimensional, nonlinear problems and real-world bio-inspired applications. Trained as an engineer and physicist, he spent several years as the Interval postdoctoral fellow at the Santa Fe Institute (Santa Fe, NM). He is a prolific writer, having published more than 100 articles, many in top-ranking journals such as Science, Nature, Scientific American, Proc. Natl. Acad. Sci., and Harvard Business Review, and is Co-Editor-In-Chief of Advances in Complex Systems. In addition, he has coauthored three books (all with another of our plenary speakers, Guy Theraulaz): Self-Organization in Biological Systems (2001; Princeton University Press), Swarm Intelligence: From Natural to Artificial Systems (1999; Oxford University Press), and Intelligence collective (1994; Hermes, Paris). He is founder, chairman and chief scientific officer of Icosystem Corporation, a research organization based in Cambridge, Massachusetts. He coauthors a paper starting on page 17 of these proceedings.

— Page 6 —


Aggregation dynamics, pattern formation and collective decision-making in pre-social and social insects

Guy Theraulaz

Laboratoire d’ethologie et cognition animale, CNRS, ERS 2382, Universite Paul Sabatier,
118 route de Narbonne, 31062 Toulouse Cedex 4, France
http://cognition.ups-tlse.fr/_guy/guy.html

E-mail: [email protected]

About the Speaker

Guy Theraulaz, a research associate at the Centre National de la Recherche Scientifique (CNRS) in France, is based at the Research Center on Animal Cognition, Universite Paul Sabatier in Toulouse, where he heads the Collective Intelligence in Social Insects and Artificial Systems group. For many years, he has been a leading light in the field of swarm intelligence, primarily studying social insects but also working on distributed algorithms, e.g. for collective robotics, directly inspired by nature. His research focusses on the understanding of a broad spectrum of fascinating collective behaviors by studying, quantifying and then modeling the individual-level behaviors and interactions, thereby elucidating the mechanisms generating the emergent, group-level properties. He has published dozens of articles, many in top-ranked journals including Science, Nature, Scientific American and Proc. Natl. Acad. Sci., and has coauthored four books: Self-Organization in Biological Systems (2001; Princeton University Press), Swarm Intelligence: From Natural to Artificial Systems (1999; Oxford University Press), Auto-organisation et Comportement (1997; Hermes, Paris) and Intelligence Collective (1994; Hermes, Paris). In 1996, he was awarded the CNRS Bronze Medal for his scientific achievements. He coauthors a paper starting on page 33 of these proceedings.

— Page 7 —

PAPERS


Linking Micro- to Macro-level Behavior in the Aggressor-Defender-Stalker Game †

Carl Anderson
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

Current address: Icosystem Corporation, 10 Fawcett St., Cambridge, MA 02138, USA.
E-mail: [email protected]

Abstract

In many multiagent systems, small changes in individual-level rules may lead to very large changes at the group level. This phenomenon is striking in the “aggressor-defender game,” a simple participative game in which each participant randomly selects two others from the group (A and B). In the aggressor game, everyone tries to position themselves so that A is always between themselves and B. In the defender game, everyone tries to position themselves between A and B. Despite these exceedingly simple rules and the seemingly small difference between them, the two games exhibit very different dynamics. The aggressor game produces a highly dynamic group that rapidly expands over time whereas the defender game quickly collapses to a tight knot. I analyze these games and provide some insight as to how these two group-level behaviors arise, thereby linking the micro- and macro-levels. I also introduce and analyse a new, related and simpler game, the “stalker game,” in which each participant selects and pursues a single participant, and which also produces a collapsing group.

Keywords: swarming, self-organization, participative game, aggressor, defender, stalker

1 Introduction

In many multiagent systems, small changes in the rules and interactions at the individual level may lead to very large changes in group-level dynamics. The best way for students to appreciate this important property of complex systems is to experience it. A simple, clear and fun demonstration is provided by the “aggressor-defender” game. This is an extremely basic participative game dating at least to the Fratelli Theater Group at the 1999 Embracing Complexity conference, and recently promoted and developed by Bonabeau and colleagues (2002a,b; Bonabeau & Meyer, 2001; Bonabeau et al., 2003a,b; Funes et al., 2003). In the aggressor subgame, every participant selects two others at random from the group, say A and B, and attempts to position themselves so that A is always between themselves and B—imagine this as a defender A protecting you from an aggressor B. In the defender subgame, everyone tries to position themselves between A and B—imagine this as you defending A against B. Despite these simple rules, and the seemingly small differences between them, the two games exhibit very different dynamics. The aggressor game produces a highly dynamic group that expands over time whereas the defender game quickly collapses to a tight knot—see http://www.icosystem.com/game.htm for an online demonstration (see also Bonabeau et al., 2003a; pp. 17–24, this volume).
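The contrast between the two subgames can be reproduced with a few lines of simulation. The sketch below is my own illustration, not the author's implementation; the participant count, the capped step size, and the iteration count are arbitrary assumptions. Each participant fixes a random pair (A, B) once and then repeatedly steps toward its target point: the midpoint of A and B in the defender subgame, or the point M = 2A − B that places A midway between the participant and B in the aggressor subgame (the assumption later used for the aggressor analysis).

```python
import math
import random

def play(game, n=100, steps=200, step_size=0.2, seed=1):
    """One capped-speed run of a subgame; returns mean distance from centroid."""
    rng = random.Random(seed)
    # Start with n participants uniformly on the unit disk (rejection sampling).
    pos = []
    while len(pos) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            pos.append((x, y))
    # Each participant fixes two distinct others, A and B, once.
    pick = [rng.sample([j for j in range(n) if j != i], 2) for i in range(n)]
    for _ in range(steps):
        nxt = []
        for i, (x, y) in enumerate(pos):
            (ax, ay), (bx, by) = pos[pick[i][0]], pos[pick[i][1]]
            if game == "defender":      # head for M, the midpoint of A and B
                tx, ty = (ax + bx) / 2, (ay + by) / 2
            else:                       # aggressor: M = 2A - B puts A midway to B
                tx, ty = 2 * ax - bx, 2 * ay - by
            d = math.hypot(tx - x, ty - y)
            if d > 1e-9:                # step at most step_size towards the target
                f = min(step_size, d) / d
                x, y = x + f * (tx - x), y + f * (ty - y)
            nxt.append((x, y))
        pos = nxt                       # all participants move simultaneously
    cx = sum(p[0] for p in pos) / n
    cy = sum(p[1] for p in pos) / n
    return sum(math.hypot(px - cx, py - cy) for px, py in pos) / n
```

In runs like this, the defender group contracts toward a tight knot while the aggressor group spreads well beyond the original unit disk, matching the dynamics described above.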

The great challenge here (and in most other complex systems, essentially by definition) is to explain the link between the micro-level and the macro-level. That is, how do the individual-level rules, and the interactions among those individuals and the environment, give rise to the far more complex, emergent properties and system-level behavior? And, conversely, how do the group-level dynamics feed back to the lower level and affect those individuals? Taking an intuitive, geometrical approach, I provide some

† This paper is a condensed version of C. Anderson, in review, which provides a more detailed and in-depth mathematical treatment, and is available upon request from the author.

— Page 9 —


significant insight as to how these two group-level patterns occur, thus linking micro- to macro-level behavior. I also introduce and analyse a related, simpler game which I term the “stalker game,” in which each participant selects and pursues just a single participant and which also produces a collapsing group, as well as other more surprising behavior.

2 Defender game

Rather than deal with a finite number of people in an ill-defined space, we assume a well organized, infinite number of participants, now points, uniformly distributed on a disk of unit radius and center C (Figure 1). Consider random points A, B and U on the disk and assume that the game rule is that U attempts to head to the midpoint M of a line joining A and B. When ∠MUC, denoted α, is less than π/2 then U is moving closer to the center (at least initially). If α ≥ π/2, then U is heading away from the center. Showing that E(α) ≤ π/2 partially demonstrates that individuals tend towards the center.
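This claim about α can be probed by direct Monte Carlo sampling. The following sketch is my own check, not part of the paper, and the sample size is an arbitrary choice; it draws U, A and B uniformly on the disk and measures α, the angle between U's heading to the midpoint M and U's direction to the center C.

```python
import math
import random

def disk_point(rng):
    """Uniform point on the unit disk via rejection sampling."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def alpha_sample(rng):
    """Angle at U between the heading to M (midpoint of A and B) and C."""
    ux, uy = disk_point(rng)
    ax, ay = disk_point(rng)
    bx, by = disk_point(rng)
    mx, my = (ax + bx) / 2, (ay + by) / 2
    hx, hy = mx - ux, my - uy                  # heading towards M
    nh, nc = math.hypot(hx, hy), math.hypot(ux, uy)
    if nh < 1e-12 or nc < 1e-12:               # degenerate case: angle undefined
        return 0.0
    cosa = (-hx * ux - hy * uy) / (nh * nc)    # direction to C = (0,0) is -U
    return math.acos(max(-1.0, min(1.0, cosa)))

rng = random.Random(0)
n = 100_000
samples = [alpha_sample(rng) for _ in range(n)]
mean_alpha = sum(samples) / n
frac_toward = sum(a < math.pi / 2 for a in samples) / n
```

Both the sample mean of α and the fraction of headings with α < π/2 come out on the towards-the-center side, consistent with the argument above.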

Figure 1: The main components of the model: the focal individual (U), the complete set of participants (the disc with center C), and U’s two selections, A and B. In the defender game, U heads for M which is the midpoint of the line AB. Angle α represents U’s deviation from the center when heading to M.

The real goal here is to calculate the joint probability density function (hereafter, pdf) of M, thereby mapping the initial distribution of participants, that is, uniformly distributed across a disk, to the first iteration of the game. Such a mapping will reveal whether the game is expected to produce an initial expansion or contraction. Unfortunately, however, writing down an integral covering all possible positions of A, B and U does not easily yield a closed-form solution for M’s joint pdf. Thus, I develop a more geometrical approach.


Figure 2: The hatched area represents positions of M in which U is moving away from the center C. The non-shaded area of the circle is denoted T(rU).

The symmetry of the situation implies that the distribution must be rotationally invariant. Therefore, we need only consider a distance, say rU ∈ [0, 1], from the center along a typical radius rather than across a disk. For a particular rU, we take the line perpendicular to the radius at this point (that is, a line perpendicular to line CU). This line intersects the perimeter of the disk at positions D and E (Figure 2). The area beyond this line (that is, between rU and 1) is the area in which α > π/2 and the complementary area is where α ≤ π/2, hereafter area T(rU). Importantly, for all positive rU, area T(rU) ≥ 0.5 thereby indicating that movement towards the center is favored. Further, because of the rotational invariance, this is true whatever the true underlying distribution of M. A special case of interest is if M were uniform on a disk, which would occur

— Page 10 —


if the game rule were “choose one individual from the group at random and pursue them.” Thus, I predict that this game, which I term the “stalker game,” will also produce a collapsing group (see section 4).

2.1 Full joint pdf of M

As M’s joint pdf is rotationally invariant we can calculate an unscaled “radial pdf” along a typical radius—this provides the relative probability of U moving to that distance in the next iteration of the game. Then, this “pdf” is rotated about center C to produce a solid of revolution whose volume is used to normalize the whole joint pdf. First, we consider our focal M at rU = 0. Any point A on the disk will have some complement B that has its midpoint as M. Thus, one can consider that there are π different, valid sets of {A, B} with midpoint M. At the boundary, rU = 1, there are no valid solutions as A must coincide with B. For intermediate rU, we have a more complex situation as we lose some symmetry. For a given position of M, 0 < rU < 1, the region of possible positions of A for which we can find a B such that their midpoint is M is a pseudo-ellipse shown in figure 3—one can verify this by imagining a line centered at M and rotating it through angle α ∈ [0, 2π).

Figure 3: Given point M = (rU, 0), here rU = 0.7, the thick solid line bounds the region for which each point A has a complement B such that M is line AB’s midpoint.

Here, the “relative number of solutions,” denoted S(rU), is twice the shaded area in figure 2:

S(rU) = 2(π − T(rU)) = 2(cos⁻¹(rU) − rU√(1 − rU²)),   (1)

where rU ∈ [0, 1]. Thus, we have a function that gives us the relative heights of the joint pdf from rU = 0 to rU = 1. Next, we integrate about the center to get a solid of revolution. Letting V represent this solid’s volume, then

V = 4π ∫₀¹ rU (cos⁻¹(rU) − rU√(1 − rU²)) drU = π²/4.   (2)
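Equations 1 and 2 can be checked numerically. For a given M at distance rU, a point A is valid exactly when its complement B = 2M − A still lies on the disk, so the area in equation 1 can be estimated by Monte Carlo; equation 2 is then a one-dimensional quadrature. This is my own verification sketch, with rU = 0.7 and the sample sizes chosen arbitrarily.

```python
import math
import random

def S(r):
    """Equation 1: relative number of solutions at radius r."""
    return 2 * (math.acos(r) - r * math.sqrt(1 - r * r))

# Equation 1 check: for M = (r_u, 0), A is valid exactly when B = 2M - A is
# still on the unit disk; the area of valid positions of A is estimated below.
rng = random.Random(0)
r_u, hits, n = 0.7, 0, 200_000
for _ in range(n):
    while True:                            # uniform A on the unit disk
        ax, ay = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if ax * ax + ay * ay <= 1:
            break
    bx, by = 2 * r_u - ax, -ay             # the complement B = 2M - A
    if bx * bx + by * by <= 1:
        hits += 1
mc_area = math.pi * hits / n               # disk fraction times disk area

# Equation 2 check: midpoint-rule quadrature of 4*pi * int_0^1 r * S(r)/2 dr.
steps = 20_000
integral = sum((k + 0.5) / steps * S((k + 0.5) / steps) / 2 / steps
               for k in range(steps))
V = 4 * math.pi * integral                 # should be close to pi^2 / 4
```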

Therefore, in polar coordinates, the joint pdf of the first iteration of the defender game is

f(rU, θ) = (8/π²)(cos⁻¹(rU) − rU√(1 − rU²)),   (3)

where rU ∈ [0, 1] and θ ∈ [0, 2π). This is plotted in figure 4.
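As a quick numerical sanity check (mine, not part of the paper), equation 3 should integrate to one over the disk, and its mean radius should agree with a direct simulation of one defender step, i.e. the distance from C of the midpoint of two uniform disk points.

```python
import math
import random

def f_radial(r):
    """Equation 3 evaluated at radius r (it does not depend on theta)."""
    return (8 / math.pi ** 2) * (math.acos(r) - r * math.sqrt(1 - r * r))

# Quadrature: total mass of f over the disk, and the mean radius E[r].
steps = 20_000
total = mean_r = 0.0
for k in range(steps):
    r = (k + 0.5) / steps
    total += 2 * math.pi * r * f_radial(r) / steps
    mean_r += 2 * math.pi * r * r * f_radial(r) / steps

# Monte Carlo: one defender step, i.e. the midpoint of two uniform disk points.
rng = random.Random(0)

def disk_point():
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

n = 200_000
acc = 0.0
for _ in range(n):
    (ax, ay), (bx, by) = disk_point(), disk_point()
    acc += math.hypot((ax + bx) / 2, (ay + by) / 2)
mc_mean = acc / n
```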

2.2 Conclusion

The above calculations show that, distributionally, the defender game essentially maps an initial uniform distribution on a disk, i.e. a cylinder (radius 1, height 1/π), to a cone (radius 1, height 4/π). Clearly, after the first iteration, our typical individual U is not only likely to be closer to the center anyway, but its selections, A and B, will also be closer to the center. Thus, given the game rule, the distribution will condense even more in each subsequent iteration. While this is not a formal proof, it certainly provides significant insight as to why the group collapses.

— Page 11 —


Figure 4: Normalized radial pdf (equation 3) for the defender game.

3 Aggressor game

In the aggressor game, U tries to position him/herself so that A is between U and B. Sticking with M representing U’s intended position, I assume that A is the midpoint between M and B (Figure 5; this assumption is relaxed in another paper: C. Anderson, in prep.); in other words, the game rule is ‖M − A‖ = ‖A − B‖.


Figure 5: The aggressor model in which A is midway between B and M. M is no longer confined to the unit disk and therefore U is heading away from C whenever M is in the region above the line DUE (and hence α > π/2).

In this game, the joint pdf of M is not confined to the initial disk of unit radius. In the extreme, with A and B lying opposite each other on the perimeter of the disk, and because of the condition that ‖M − A‖ = ‖A − B‖, M may occur in a disk up to three units radius from C (hence, after each iteration, the bounding area of M’s joint pdf expands nine times). Thus, although the same properties of rotational invariance and T(rU) > 0.5 (that is, bias towards the center) hold, what is very different from the earlier defender scenario is that individuals are likely to overshoot the original unit disk. So, even though there may be some tendency towards the center and not outwards per se, the overshoot—that individuals may head through the center and head out many units radius from the center—means that the group will likely expand rapidly.
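The size of the overshoot can be quantified directly. Combining the volumes computed in section 3.1 (a flat part of volume π² inside the unit disk, out of a total volume of roughly 4π²), about a quarter of the intended positions M land inside the original disk, so roughly three quarters overshoot. The following Monte Carlo sketch is my own check of that reading, with an arbitrary sample size.

```python
import math
import random

rng = random.Random(0)

def disk_point():
    """Uniform point on the unit disk via rejection sampling."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

n = 200_000
outside = 0
max_r = 0.0
for _ in range(n):
    (ax, ay), (bx, by) = disk_point(), disk_point()
    r = math.hypot(2 * ax - bx, 2 * ay - by)   # |M| with M = 2A - B
    max_r = max(max_r, r)
    if r > 1:
        outside += 1
frac_outside = outside / n
```

No sampled M falls beyond radius 3, in line with the bounding disk described above.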

3.1 Full joint pdf of M

To obtain an explicit form for the joint pdf, I adopt the same geometrical approach as for the defender game: obtain a radial pdf representing the relative probability of finding M at a given distance, here with

— Page 12 —


rU ∈ [0, 3), and then normalize it with a solid of revolution. For rU ∈ [0, 1] any point B on the disk has a complement A for any point M on the disk. Therefore, the joint pdf is flat for rU ∈ [0, 1] with relative height π. For rU = 3, there is a single solution: A on the unit disk’s perimeter at the point that is closest to M, with B on the exact opposite side of the unit disk’s perimeter. As before, we can consider this as zero “solutions.” Finally, we must consider the more tricky intermediate case 1 < rU < 3.

Consider rU along the positive horizontal axis. We are interested in pairs of points on the disk that satisfy two conditions: first, that a line passing through those points also passes through M = (rU, 0), and second that ‖B − A‖ = ‖A − M‖ (where B is farther from M than A). For this horizontal line, the critical x-value is 2 − rU. That is, if B = (2 − rU, 0) then A must be on the margin of the disk, A = (1, 0), so that ‖B − A‖ = ‖A − M‖, which in this case equals rU − 1 (see figure 6). I call this the critical value because if B were any larger, A would have to be outside the disk. Thus, this critical value delineates the invalid set of solutions, to the right, and the valid ones to the left—whose area is the relative height of the pdf.

Figure 6: The unit disk and point (rU, 0). The dashed arc delineates valid solutions: to the right of this arc it is not possible to satisfy ‖B − A‖ = ‖A − M‖ with A remaining on the disk. Therefore, the relative height of the pdf for a given rU is the area within the unit disk but to the left of the arc.

The mathematics to calculate the area to the left of this arc is straightforward geometry but there is insufficient space here to provide details (but see C. Anderson, in review). The result, however, is that the relative number of solutions for rU ∈ (1, 3) is

S(rU) = 2 ∫₋₁^{(3−rU²)/2rU} √(1 − x²) dx + 2 ∫_{(3−rU²)/2rU}^{2−rU} [√((rU − x)²) √(4 − rU² − 2rUx − x²) / (2(rUx − 1))] (rU − √((rU² + rUx − 2)² / (rU − x)²)) dx.   (4)

Unfortunately, this integral is very messy. However, we can find this relative number of solutions for rU ∈ (1, 3) numerically and add in the simpler result for rU ∈ [0, 1] to obtain an unnormalized radial pdf. Finally, as before, we find the volume of the solid of revolution to normalize the curve so that it truly is a pdf:

V = π² + 2π ∫₁³ rU S(rU) drU ≈ 4π² to 7 decimal places.   (5)

(The first term, π², is the volume from rU ∈ [0, 1].) Therefore, in polar coordinates, the joint pdf for the first iteration of the aggressor game is

f(rU, θ) = S(rU)/V ≈ S(rU)/(4π²),   (6)

which is plotted in figure 7.

— Page 13 —


Figure 7: The normalized radial pdf of the aggressor game.

Table 1: Expected radial distances for three distributions: the initial uniform and the first iteration of the defender and aggressor games. The value for the defender is less than that for the uniform, and hence the group collapses, whereas the opposite is true for the aggressor game.

Distribution         Expected radial distance
Uniform on a disk    2/3 ≈ 0.667
Defender game        128/(90π) ≈ 0.452
Aggressor game       1.457

3.2 Conclusions

The aggressor game maps an initial uniform distribution on a disk (cylinder radius 1, height 1/π) to essentially a frustum of a cone (radius 3, height of cone 4π/3, height to cutoff of frustum 1/4π). On average, an individual will end up farther from the center than under a uniform-on-a-disk distribution, and may even end up as far as 3 units distance from C. Additional calculations (not shown, but see C. Anderson, in review) provide the expected radial distance for the three scenarios (Table 1). As would be expected from our above calculations, not only are the expected radial distances ordered defender < uniform < aggressor, but the expected value for the aggressor game is especially high, thus implying rapid expansion.
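The aggressor entry in Table 1 can be reproduced by simulating a single iteration: sample A and B uniformly on the disk, form M = 2A − B, and average ‖M‖. A sketch of my own, with the sample size an arbitrary choice:

```python
import math
import random

rng = random.Random(0)

def disk_point():
    """Uniform point on the unit disk via rejection sampling."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

n = 200_000
acc = 0.0
for _ in range(n):
    (ax, ay), (bx, by) = disk_point(), disk_point()
    acc += math.hypot(2 * ax - bx, 2 * ay - by)   # |M| with M = 2A - B
mean_r = acc / n
```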

4 Stalker game

Earlier, I defined the new “stalker game” as one in which each participant U selected and pursued a single participant from the group. Despite being so simple, these games (technically there are two forms: chase A or chase B) are surprisingly complex. While it is true that T(rU) > 0.5 for rU ∈ [0, 1]—implying group collapse—it is also true that the joint pdf will be uniform on a disk—implying group stability. The joint pdf is easy to calculate, either by integration or by the same logic used in the other two games: for any point M on the disk, any point A (or B) on the disk satisfies the game rule, implying π “solutions” for rU ∈ [0, 1]. Thus, the unscaled radial pdf is π and the joint pdf f(r, θ) is simply 1/π for r ∈ [0, 1] and θ ∈ [0, 2π).

If there is a one-to-one mapping between the initial uniform distribution on a disk and the same uniform distribution at the end of the iteration, does the game really generate group collapse and, if so, how? The

— Page 14 —


Figure 8: Stalker game. Each curve is a separate simulation of 100 participants and plots the area of the minimum convex polygon (MCP) of the group versus iteration number. Clearly, there is an initial collapse (complete collapse in 2 of the 20 replicates) and stability thereafter. In these simulations, individuals jump to M (see text).

answer is an interesting one: yes, it collapses, and yes, it is stable. Figure 8 plots minimum convex polygon (MCP: the smallest convex polygon that encompasses a set of locations in 2D space) area versus iteration number for 20 replicates. There is a distinct initial collapse for all replicates (complete in 2 cases) but stability thereafter. How can we explain these strange results?
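The kind of simulation behind Figure 8 can be re-created with a short script. This is a sketch, not the original implementation: the group size, iteration count and seed are illustrative choices, and the MCP area is computed here with Andrew's monotone chain followed by the shoelace formula.

```python
import random

def hull_area(points):
    """Area of the convex hull (Andrew's monotone chain + shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    hull = half(pts)[:-1] + half(reversed(pts))[:-1]
    m = len(hull)
    return 0.5 * abs(sum(hull[i][0]*hull[(i+1) % m][1] -
                         hull[(i+1) % m][0]*hull[i][1] for i in range(m)))

def stalker_game(n=100, iterations=50, seed=0):
    """Jump-to-M variant: each participant jumps onto its fixed partner's location."""
    rng = random.Random(seed)
    pos = []
    while len(pos) < n:  # uniform initial positions on the unit disk
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x*x + y*y <= 1:
            pos.append((x, y))
    partner = [rng.choice([j for j in range(n) if j != i]) for i in range(n)]
    areas = [hull_area(pos)]
    for _ in range(iterations):
        pos = [pos[partner[i]] for i in range(n)]  # simultaneous jumps
        areas.append(hull_area(pos))
    return areas

areas = stalker_game()
print(areas[0], areas[-1])  # MCP area is non-increasing; typically a sharp initial drop
```

Because each jump maps the occupied locations into a subset of themselves, the hull area can never grow, which is why the curves in Figure 8 only fall and then flatten.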

There are two different answers, depending upon an assumption, or rather an implementation detail, of the game: the rate at which individuals reassess their heading. Real participants of the aggressor-defender games continuously adjust their heading to account for the movements of their A and B, whereas in the simulation of figure 8, each participant essentially jumps to their intended location M simultaneously.

First, let us consider continuous reassessment. Suppose we have a finite group of participants; they each select their A (which is also their M), correct their heading, and a fraction of a second after they set off, we freeze-frame the game. What has happened to the size of the group? Our earlier result T(rU) > 0.5 tells us that individuals are more likely to head (roughly) towards the center of the disk than away. Therefore, our freeze-frame image will show a group smaller than the unit disk. Now we unfreeze the game and ask the participants to readjust their heading and set off again. No one will be heading outside this group, only within this reduced disk, and the distribution of headings across this reduced disk should be rotationally invariant (on average). Thus the same result holds true on this smaller disk: T(rU) > 0.5. As such, when the reassessment rate is high, the group dispersion gets smaller each iteration (but see below).

Now, we contemplate the other, more theoretical, extreme: instantaneous and simultaneous jumping from current location to intended location M. In this situation, we have a finite and fixed set of locations. Apart from the initial random locations and the random “partner” choices (A), this is a completely deterministic system. Thus, it may end up chaotic, in which case the MCP area of the group might fluctuate aperiodically, or it may become periodic, or come to a fixed-point solution. In figure 8 we observe the latter two outcomes (a chaotic system that produces exactly the same MCP area each iteration is highly unlikely). In the two cases that collapse completely, this is the fixed-point solution: all participants arrive at the same location (MCP area = 0). In the other 18 cases with stable MCP area, it seems that periodic behavior has arisen. This would imply that each participant moves around a circuit of locations. More detailed investigations (not shown) reveal that this is precisely what happens. (One problem with this scheme is that there is a high probability of fragmentation, that is, separate groups of participants chasing each other.) Why, though, should the system always collapse a little? A location at the perimeter would fail to be jumped onto if no one had selected (= was chasing) the participant at that location. For n participants, this probability is ((n−2)/(n−1))^(n−1). This probability quickly approaches its limit 1/e, which is reasonably large. Thus, even for small n, almost certainly some of the outer participants fail to be selected; at the first jump they move inwards and the locations they leave behind are never jumped upon, and so the area of the group reduces.
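The approach of this probability to its 1/e limit is quick, and easy to check numerically (the function name below is illustrative):

```python
import math

def p_unchased(n: int) -> float:
    """Probability that a given participant is chased by no one, when each of
    the other n-1 participants picks uniformly among their n-1 options."""
    return ((n - 2) / (n - 1)) ** (n - 1)

for n in (5, 10, 50, 1000):
    print(n, round(p_unchased(n), 4))
print(round(1 / math.e, 4))  # 0.3679, the limiting value
```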

5 Conclusions

My goal in this study was to provide some insight into the link between the micro level, that is, the individual-level rules, and the macro level, the group dynamics, of two simple spatial participative games. My geometrical approach, which yields some closed-form solutions, also leads to a new third game, the stalker game, which despite being incredibly simple also exhibits some rather interesting, and at first sight counterintuitive, behavior. In each case, we are able to provide some mapping between the initial distribution of participants and the joint pdf of their distribution after the first iteration. With such simple games, we can further argue (although I have not, unfortunately, yet provided formal proof) that similar mappings and feedback mechanisms will occur in subsequent iterations.

It seems that two pieces of information can hint strongly at what sort of behavior one might expect from the group. First, the probability of moving towards C, T(rU). Although both aggressor and defender games yielded probabilities greater than 0.5, this need not always be the case. For instance, a “repulsive” anti-stalker game, whereby U moves to be, say, three times as far from A as it currently is, might yield a probability of moving towards C much less than 0.5. Second, the expected radial distance. This metric indicates how far from C, on average, participants end up after this first iteration, and Table 1 shows significant differences among the three games. With an expected radial distance much greater than 1, as in the aggressor game, it is not surprising that such a rapid rate of expansion occurs. This analysis, therefore, might provide some hope of understanding the micro-macro level mappings in other, simple self-organized systems.

Acknowledgements

I am grateful to the Anderson/Interface Visiting Assistant Professorship in Natural Systems at ISyE for the opportunity to pursue this research. Thanks also to the students of courses ISyE 8800C and CS 8803 at Georgia Tech who showed me in vivo that the aggressor-defender game really does work.

References

Bonabeau, E. 2002a. Agent-based modeling: methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences 99: 7280–7287.

Bonabeau, E. 2002b. Predicting the unpredictable. Harvard Business Review 3: 109–116.

Bonabeau, E. & Meyer, C. 2001. Swarm intelligence: a whole new way to think about business. Harvard Business Review 5: 107–114.

Bonabeau, E., Funes, P., & Orme, B. 2003a. Exploratory design of swarms. Pages 17–24 in: Proceedings of the Second International Workshop on the Mathematics and Algorithms of Social Insects (C. Anderson & T. Balch, eds.), Georgia Institute of Technology.

Bonabeau, E., Hunt, C.W., & Gaudiano, P. 2003b. Agent-based modeling and designing novel decentralized command and control systems paradigms. Presented under the “Modeling and Simulation and Network-Centric Application Topics” for the 8th International Command and Control Research and Technology Symposium, June 17–19, 2003, National Defense University, Washington, DC.

Funes, P., Orme, B. & Bonabeau, E. 2003. Evolving emergent group behaviors for simple human agents. Pages 76–89 in: 7th European Conference on Artificial Life (ECAL 2003): Workshop and Tutorials (P. Dittrich & J.T. Kim, eds.), Dortmund, 14–17 September, 2003.


Exploratory Design Of Swarms

Eric Bonabeau, Pablo Funes, Belinda Orme

Icosystem Corporation, 10 Fawcett Street, Cambridge, MA 02138, USA. Corresponding author: [email protected]

Abstract

In order to fulfill the true promise of swarm intelligence, and more generally of decentralized, self-organizing intelligence, a major design problem has to be overcome. Designing the individual-level rules of behavior and interaction that will produce a desired collective pattern in a group of human or non-human agents is difficult because the group’s aggregate-level behavior may not be easy to predict or infer from the individuals’ rules. While the forward mapping from micro-rules to macro-behavior in self-organizing systems can be reconstructed using computational modeling techniques, the inverse problem of finding micro-rules that produce interesting macro-behavior poses significant challenges, all the more so as what constitutes “interesting” macro-behavior may not be known ahead of time. An exploratory design method, relying on interactive evolution, is described in this paper. We show how it can be used to discover new, “interesting” patterns of collective behavior when one does not know in advance what the system is capable of doing, a generic situation in swarm design.

Keywords: collective behavior, swarms, design, interactive evolution

1 Introduction

Designing the individual-level rules of behavior and interaction that will produce a desired collective pattern in a group of human or non-human agents is difficult because the group’s aggregate-level behavior may not be easy to predict or infer from the individuals’ rules. For example, the aggregate-level properties of a traffic jam (Helbing et al., 2000), a crowd evacuating a public space (Still, 1993) or the stock market (Palmer et al., 1994) cannot easily be derived from knowing the rules of behavior and interaction of drivers, people or investors. Agent-based modeling (ABM) (Reynolds, 1987; Epstein & Axtell, 1996; Axelrod, 1997; Bonabeau, 2000, 2002), or micro-simulation, is often the only way to capture the emergent properties resulting from the behavior and interactions of the group’s constituent units or agents, particularly but not only when the individual-level rules are discrete. While ABM is useful in producing aggregate-level patterns from individual-level rules, finding the appropriate rules still requires manual search and tinkering when (1) the collective-level patterns may be difficult to formalize into a mathematical detector and therefore the evaluation of a solution cannot be automated, and/or (2) the collective-level patterns made possible by the individual-level rules are not even known ahead of time. If the desired collective-level pattern is known and its detection can be automated, then traditional search and optimization algorithms can be utilized. This paper focuses on cases where (1) and/or (2) is true. We show that in such cases a technique originally developed to generate “interesting” images and pieces of art (Dawkins, 1987; Sims, 1991, 1992, 1993) can be used to design the individual-level rules of behavior and interaction to produce “interesting” collective-level patterns.

The technique (see Takagi, 2001 for a review) is a directed-search evolutionary algorithm which requires human input to evaluate the fitness of a collective-level pattern (here, the fitness might be how close the collective-level pattern is to the desired pattern, or how interesting the pattern is) and uses common evolutionary operators such as mutation and crossover (Forrest, 1993) to breed the individual-level rules that produced the fittest collective-level patterns. Using a simple example of a human game that can be played in small groups (Bonabeau, 2002; Bonabeau et al., 2003; Funes et al., 2003), we show that this approach is particularly powerful as an exploratory design technique, when the aggregate-level capabilities of the system are not known. Interactive evolutionary computation (IEC), as this technique is known, combines computational search with human evaluation (Takagi, 2001).

Some of the individual-level rules discovered using IEC are presented together with their corresponding striking, unexpected patterns. In itself, the discovery of the rules is important as it shows that it is possible to design simple rules to produce robust collective-level patterns (genetic robustness being a by-product of the evolutionary method: to be discovered, a fitness peak has to be reasonably stable under a number of mutations); in addition, the IEC technique used to discover these rules is very generic and makes it possible to systematically discover novel phenomena in self-organizing systems.

2 The aggressor-defender game

To illustrate the approach, a game of aggressors and defenders is used. Two rules can be used by the players:

• Rule #1: Pick two people A and B in the group. Then start moving in such a way as to always put B (your defender) between yourself and A (your aggressor).

• Rule #2: Pick two people A and B in the group. Then start moving in such a way as to always put yourself between A (the aggressor) and B (the defendee).

If, for example, everyone in the group follows Rule #1 and picks A and B randomly, the resulting collective-level pattern is chaotic, sometimes room-filling motion, often constrained by the walls, which leads to some wall following. If, on the other hand, everyone in the group follows Rule #2 and picks A and B randomly, the whole group collapses into a single cluster. These two simple versions of the game can easily be played with a group of 8 or more people (our experience includes playing the game with up to 400 people) and produce aggregate-level patterns that cannot easily be predicted from an examination of the individual rules. For example, in both versions there are exactly the same numbers of aggressors, defenders, and defendees. While it is difficult to predict the outcome of even the simplest versions of the game, Anderson (2003) [p. 9, this volume] has established elements of a mathematical proof for the two situations described above (all participants following the same rule and picking their A’s and B’s randomly). A complementary approach consists of simulating the mapping from micro-rules to macro-behavior through agent-based modeling (ABM) (Epstein & Axtell, 1996; Bonabeau, 2002). Figure 1 illustrates the patterns observed in the two simple cases described above when simulated using agent-based modeling. See Funes et al. (2003) for more detail on the ABM.
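The flavor of such a simulation can be conveyed with a bare-bones sketch. This is not the ABM of Funes et al. (2003): the synchronous update, fractional step size, unbounded arena (no walls) and group size are all simplifying assumptions. Run with Rule #2, the mean pairwise distance falls sharply, echoing the collapse into a single cluster:

```python
import math
import random

def step(pos, A, B, rule, eta=0.05):
    """One synchronous update: each agent nudges toward its rule's target point."""
    new = []
    for i, (x, y) in enumerate(pos):
        ax, ay = pos[A[i]]
        bx, by = pos[B[i]]
        if rule == 1:
            # Rule #1: keep B between you and A, so aim past B along the A->B ray.
            tx, ty = 2 * bx - ax, 2 * by - ay
        else:
            # Rule #2: put yourself between A and B, so aim at their midpoint.
            tx, ty = (ax + bx) / 2, (ay + by) / 2
        new.append((x + eta * (tx - x), y + eta * (ty - y)))
    return new

def mean_pair_dist(pos):
    n = len(pos)
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for i, p in enumerate(pos) for q in pos[i + 1:]) / (n * (n - 1) / 2)

rng = random.Random(42)
n = 40
pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
A = [rng.choice([j for j in range(n) if j != i]) for i in range(n)]
B = [rng.choice([j for j in range(n) if j not in (i, A[i])]) for i in range(n)]

d0 = mean_pair_dist(pos)
for _ in range(400):
    pos = step(pos, A, B, rule=2)
print(d0, mean_pair_dist(pos))  # under Rule #2 the group contracts markedly
```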

Figure 1: Micro-rules #1 (a) and #2 (b) lead to two dramatically different collective behaviors in reality and in the predictive agent-based simulation.


In some situations other than the two simple ones described above, we discovered by playing the game live that the aggregate-level pattern could be quite different from the above-mentioned patterns. For example, if by chance everyone picks the same person as aggressor, defender, or defendee, the resulting aggregate-level pattern is very different. In other situations the relationship graph has several disconnected components (although the probability of observing more than one component if participants pick A and B uniformly randomly tends to 0 as n^−3; C. Anderson, unpubl. ms), leading to the formation of several clusters. After these phenomena were observed in vivo, the corresponding micro-rules were simulated using ABM and could be re-created in silico, demonstrating the predictive power of ABM. The observation of these unexpected collective patterns of behavior prompted us to ask the following question: would it be possible to design social networks (characterized by relationship graphs: who interacts with whom, and what is the nature of the interaction?), instead of creating random ones, to produce interesting aggregate-level patterns, with no a priori knowledge of what interesting patterns this system could produce?

More precisely, every individual is characterized by the following set of rules or properties:

• Does the individual follow Rule #1 or Rule #2?

• Who is A?

• Who is B?

The size of the space of relationship graphs (which we will also call rule space) grows fast with the number of individuals N. While rule space is large, it is likely that most resulting collective patterns will either be random-looking or reducible to one of the two basic patterns described above (chaotic behavior or clustering). However, we do not know what such a system can or cannot do; that is, we do not know what kind of collective-level patterns to expect other than chaotic motion and clustering. In Section 3 we describe a method to search for the relationship graphs, or individual-level rules, that will produce the most interesting collective-level patterns with only a vague notion of what interesting means: neither random nor reducible to one of the two basic patterns.
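To make “grows fast” concrete: if each of the N individuals independently chooses a rule (2 options), a partner A (N−1 options) and a distinct partner B (N−2 options), rule space holds (2(N−1)(N−2))^N relationship graphs. This counting convention (A ≠ B, neither equal to self) is an assumption on our reading of the game, but it illustrates the growth:

```python
def rule_space_size(n: int) -> int:
    """Count relationship graphs: each agent picks a rule (2 ways) and
    distinct partners A != B, neither equal to itself."""
    return (2 * (n - 1) * (n - 2)) ** n

for n in (5, 10, 20):
    print(n, rule_space_size(n))  # 24**5, 144**10, 684**20: explosive growth
```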

3 Interactive evolution

3.1 Search mechanism

The IEC search method works as follows. A small initial population of relationship graphs is generated. The resulting collective-level dynamical patterns are generated using an agent-based simulation and presented to a human observer. The observer selects the collective-level patterns that are the most interesting, that is, the fittest individuals in the population according to whatever set of objective and subjective criteria the observer may be using. A new population (new generation) of relationship graphs is generated by applying mutation and crossover operators to the relationship graphs that correspond to the previous generation’s fittest collective-level patterns (Forrest, 1993). The new population is then simulated and the resulting collective-level patterns presented to the observer, and so forth. This procedure is iterated until interesting patterns emerge from the search.
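The loop just described can be sketched as follows. The genome layout, the mutation operator and the `pick` callback (which stands in for the human observer clicking a playground) are placeholders, not the authors’ implementation; the agent-based simulation step is elided entirely:

```python
import random

rng = random.Random(0)

def random_genome(n):
    """A relationship graph: a (rule, A, B) triple for each of n agents."""
    g = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        a = rng.choice(others)
        b = rng.choice([j for j in others if j != a])
        g.append((rng.choice([1, 2]), a, b))
    return g

def mutate(genome):
    """Mutate one field of a few agents (placeholder operator)."""
    g = [list(agent) for agent in genome]
    n = len(g)
    for i in rng.sample(range(n), max(1, n // 10)):
        field = rng.randrange(3)
        others = [j for j in range(n) if j != i]
        g[i][field] = rng.choice([1, 2]) if field == 0 else rng.choice(others)
    return [tuple(a) for a in g]

def iec_search(n_agents=30, pop_size=6, generations=3, pick=None):
    """One IEC run; `pick` stands in for the human observer's choice."""
    pick = pick or (lambda patterns: 0)  # default: always keep playground 0
    population = [random_genome(n_agents) for _ in range(pop_size)]
    for _ in range(generations):
        # In the real system each genome is simulated and shown to the user;
        # here we hand the genomes straight to the selection callback.
        best = population[pick(population)]
        population = [best] + [mutate(best) for _ in range(pop_size - 1)]
    return population

pop = iec_search()
print(len(pop), len(pop[0]))
```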

The user interface, a critical component of the method since it is by visualizing the solutions that the observer evaluates them, is shown in figure 2. Obviously this method can only work if the population size is kept small and if interesting patterns emerge after a reasonably small number of generations. Whether or not the method can work depends on the nature of the fitness landscape and on the dimensionality of the problem. For example, a highly rugged fitness landscape or a flat landscape is unlikely to lend itself to this type of search method (Theraulaz & Bonabeau, 1995; Bonabeau et al., 2000).

3.2 Details of the evolutionary algorithm

The evolutionary algorithm used at every generation replaces the current generation of games by either recombination or mutation of the selected game. We use subscript notation for generations: game i in generations n and n+1 is written i_n and i_{n+1} respectively.

Figure 2: User interface with six playgrounds. Clicking on the playground that displays the user’s preferred behavior results in the remaining five becoming mutants or recombinations of the selected one, as described in section 3.2.

Selection of the fittest game

One game, j_n, is selected to be preserved in generation n+1 from generation n. The remaining games, i_{n+1} (i ≠ j), will consist of agents derived from either a recombination or a mutation of some of the agents in the original game, j_n; recombination and mutation each occur with a probability of 0.5. (For example: if we are evolving 9 games, on average 1 will be identical to the game chosen in the previous generation, 4 will have some agents in their population altered by recombination, and 4 will have some agents in their population altered by mutation.)
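The 1/4/4 expectation in the example can be verified directly; the labeling function below is purely illustrative:

```python
import random

def next_generation_labels(n_games: int, rng: random.Random):
    """Label how each game in generation n+1 is produced: the selected game
    is kept, and each remaining slot is filled by recombination or mutation
    with probability 0.5 each."""
    labels = ["kept"]
    for _ in range(n_games - 1):
        labels.append("recombination" if rng.random() < 0.5 else "mutation")
    return labels

rng = random.Random(7)
trials = 20_000
recomb = sum(next_generation_labels(9, rng).count("recombination")
             for _ in range(trials)) / trials
print(round(recomb, 2))  # ≈ 4 of the 8 non-kept games, on average
```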

Recombination

If the rules for game i_{n+1} are to be created from a recombination of the rules describing agents in game j_n, then up to 10% of the total agents in game i_{n+1} are derived in the following manner. A typical agent selected to be changed is called the target agent p. A source agent, q, is chosen randomly from the population to provide the rules to replace some of the rules in the target agent p. The rules which can be substituted from q into p are any of:

1. Person A,

2. Person B,

3. Rule of the game to be played (# 1 or #2).

Each person or agent in the game has their own set of these three rules. In recombination, while the source agent’s rules remain unchanged, some of the target agent’s rules are replaced by those from the source agent. The number of rules from q which are substituted for those in p is chosen randomly.
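A sketch of this recombination operator, with an agent encoded as a (rule, A, B) triple; the field ordering and the choice of copying between one and three fields are our assumptions:

```python
import random

def recombine(target, source, rng):
    """Copy a randomly chosen subset of the three fields from the source agent
    into a copy of the target agent; the source itself is left unchanged."""
    fields = rng.sample(range(3), rng.randint(1, 3))  # how many, and which, is random
    child = list(target)
    for f in fields:
        child[f] = source[f]
    return tuple(child)

rng = random.Random(3)
p = (1, 4, 7)  # target agent p: rule #1, A = agent 4, B = agent 7
q = (2, 5, 8)  # source agent q: rule #2, A = agent 5, B = agent 8
child = recombine(p, q, rng)
print(child)  # every field comes from either p or q
```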


Mutation

If the rules for game i_{n+1} are to be created from a mutation of the agents’ rules in game j_n, then up to 10% of the total number of agents in game i_{n+1} are derived in the following manner. To mutate an agent’s rules, three possible mutations can occur, based on the three rules for each of the agents. The evolution randomly chooses, with equal probability, to mutate either

1. Person A,

2. Person B,

3. Rule of the game to be played (# 1 or # 2).

Each mutation has a probability of one third of occurring.

4 Results

A number of patterns discovered using the IEC technique described above are presented in this section. Figure 3 shows several such patterns.

Figure 3: A few examples of evolved behaviors. (a) Circle: agents chase each other around in a circle. (b) Juggle: two blobs fuse and re-emerge and sometimes toss a smaller blob at each other. (c) Corner-middle: two groups of agents go to opposite corners while one stays in the middle. (d) Pursuer-evaders: an agent follows a larger group that slows down, is reached by the pursuer, then escapes again. (e) Chinese streamer: a D shape that moves around. (f) Somersault: a thick line that makes a 360-degree turn, then stops, then turns back in the opposite direction.

Figure 4 shows snapshots of the spatio-temporal dynamics of one of the patterns, the Chinese streamer (CS). The CS pattern is particularly interesting in that it is unexpected (it was impossible to predict that the system could display this type of behavior under the right relationship graph), robust to mutations (a by-product of evolutionary search) and totally insensitive to initial conditions (when the appropriate relationship graph is in place, the Chinese streamer always forms regardless of the initial positions of the participants; initial conditions influence the rotation of the pattern, which can be clockwise or counter-clockwise depending on the details of the participants’ locations). Lastly, Figure 5 illustrates the ultimate real-world test of the approach: when the rules discovered by applying IEC to the dynamic output of the simulation are given to humans, the predicted patterns do emerge.

Figure 4: “Chinese streamer” pattern. From a random initial placement, a pattern quickly emerges (a–d) and starts turning, stabilizing in a shape with a handle and trailing ribbon which rotates smoothly. The direction of rotation can be clockwise or counterclockwise (as here), presumably depending on the initial positions.

Figure 5: Three rules designed for swarms of 10 agents, evolved using the IEC interface (a–c) and then given to a group of people (d–f). Rule “circle” (a,d) makes all agents run around in a circle; rule “align” (b,e) makes them form a straight line; and rule “Chinese streamer” results in a central cluster with a tail or “ribbon” circling behind.

5 Conclusions

We have illustrated with a simple game example how to tackle the swarm design issue. Using this approach requires a shift in mindset from the traditional top-down design approach: here, design is exploratory because the aggregate-level capabilities of the system are not known ahead of time. Some of the patterns discovered in the simple game system were extremely surprising and could not have been foreseen by any human engineer. Obviously, the patterns discovered may sometimes be difficult to understand and need to be reverse-engineered. For example, the CS pattern has been reverse-engineered and is now well understood. This approach to designing self-organizing systems has a wide range of applications, from collective robotics to distributed control. One example: radio-frequency tags, known as RFIDs. Although RFIDs have recently become very popular, most users intend to use them with a centralized mindset, without knowing what a swarm of RFIDs might be able to do collectively. Our exploratory design approach enables an open-minded search for the hidden capabilities of such a system.

References

Anderson, C. 2003. Linking micro- to macro-level behavior in the aggressor-defender-stalker game. Pages 9–16 in: Proceedings of the Second International Workshop on the Mathematics and Algorithms of Social Insects (C. Anderson & T. Balch, eds.), Georgia Institute of Technology.

Axelrod, R. 1997. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, Princeton, NJ.

Bonabeau, E. 2000. Business applications of agent-based simulation. Adv. Complex Syst. 3: 451–461.

Bonabeau, E. 2002. Agent-based modeling: methods and techniques for simulating human systems. Proc. Nat. Acad. Sci. USA 99: 7280–7287.

Bonabeau, E., Guerin, S., Snyers, D., Kuntz, P., Theraulaz, G. & Cogne, F. 2000. Complex three-dimensional architectures grown by simple agents: an exploration with a genetic algorithm. BioSystems 56: 13–32.

Bonabeau, E., Hunt, C.W., & Gaudiano, P. 2003. Agent-based modeling and designing novel decentralized command and control systems paradigms. Presented under the “Modeling and Simulation and Network-Centric Application Topics” for the 8th International Command and Control Research and Technology Symposium, June 17–19, 2003, National Defense University, Washington, DC.

Dawkins, R. 1987. The Blind Watchmaker. W. W. Norton, New York.

Epstein, J. M. & Axtell, R. L. 1996. Growing Artificial Societies: Social Science from the Bottom Up. MIT Press, Cambridge, MA.

Forrest, S. 1993. Genetic algorithms: principles of adaptation applied to computation. Science 261: 872–878.

Funes, P., Orme, B. & Bonabeau, E. 2003. Evolving emergent group behaviors for simple human agents. Pages 76–89 in: 7th European Conference on Artificial Life (ECAL 2003): Workshop and Tutorials (P. Dittrich & J.T. Kim, eds.), Dortmund, 14–17 September, 2003.


Helbing, D., Farkas, I. & Vicsek, T. 2000. Simulating dynamical features of escape panic. Nature 407: 487–490.

Palmer, R. G., Arthur, W. B., Holland, J. H., Le Baron, B. & Tayler, P. 1994. Artificial economic life: a simple model of a stock market. Physica D 75: 264–274.

Reynolds, C. 1987. Flocks, herds, and schools: a distributed behavioral model. Computer Graphics 21: 25–34.

Sims, K. 1991. Artificial evolution for computer graphics. Computer Graphics 25: 319–328.

Sims, K. 1992. Interactive evolution of dynamical systems. Pages 171–178 in: Towards a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life (F. J. Varela & P. Bourgine, eds.), MIT Press, Cambridge, MA.

Sims, K. 1993. Interactive evolution of equations for procedural models. Vis. Comput. 9: 446–476.

Still, K. G. 1993. New computer system can predict human behaviour response to building fires. Fire 84: 40–41.

Takagi, H. 2001. Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation. Proc. IEEE 89: 1275–1296.

Theraulaz, G. & Bonabeau, E. 1995. Coordination in distributed building. Science 269: 686–689.


Modeling Behavioral Rules and Self-organization in New World Army Ant Swarms

Tim Brown
Department of Biology, University of Utah, Salt Lake City, UT; E-mail: [email protected]

Abstract

Social insect colonies dynamically allocate individuals, resources and tasks to solve many computationally difficult problems without centralized control. System behavior emerges from local interactions between individuals with simple rule sets and no global knowledge. Social insect behavioral algorithms are typically robust to change, maintaining their effectiveness in highly varied environments or resource distributions over a wide range of tasks. In this paper, I will discuss the results of my on-going work examining the behavioral mechanisms underlying the organization of swarming behavior in the New World army ant, Eciton burchelli. Results from analysis of digital video of army ant swarms collected in the field in Costa Rica show a strong inverse correlation between turning angle and pheromone concentration. Contrary to expectation, ant velocity was shown to be higher for ants in the front of the swarm, when there is less pheromone. Average ant velocity and variance of velocity decrease as the number of ants and cumulative pheromone increase. These data are being used to create an individual-based, continuous model of army ant swarming. The model will be used to examine how individual behavioral rules generate the complex system-level organization exhibited by army ant swarms.

Keywords: army ants, Eciton burchelli, modeling, behavior, self-organization, complexity

1 Introduction

The unique community structure of social insect colonies provides a model system for investigating the role that self-organization plays in the design of living systems. Self-organization is generally described as the process by which the behavior of a group of interacting components or agents generates complex system-level behaviors whose structure is not directly predictable from knowledge of component behaviors (Camazine et al., 2001). In social insect colonies, the collective actions of clearly distinct individuals with simple rule sets and only limited knowledge of colony and environmental state can generate quite complex structural and behavioral patterns at the colony level (e.g. Bonabeau et al., 1998b; Deneubourg et al., 1989; Goss et al., 1990; Beckers et al., 1992).

Eciton burchelli army ants are an ideal social insect system for examining self-organization. Colonies of E. burchelli forage in massive swarms involving complex organizational processes and information sharing between hundreds of thousands of ants of multiple castes. Swarm behavior is not centrally organized but generated entirely through individual interactions involving ant-ant contact and pheromone trails. Although E. burchelli ecology has been well described (e.g. Rettenmeyer, 1963; Franks, 1980; Gotwald, 1982; Schneirla, 1971), the exact behavioral rules through which army ants organize their raiding behavior are still poorly understood. Additionally, few attempts have been made to model army ant swarm behavior, and existing models incorporate only limited biology (Deneubourg et al., 1989; Sole et al., 2000; these models will henceforth be referred to as DS).

This paper describes the results of on-going research to examine the behavioral rules through which Eciton burchelli army ants organize their swarms. Data on ant behavior are being collected from digital video of army ant swarms taken in the field in Corcovado National Park, Costa Rica. An individual-based, continuous-space model of army ant swarming is being developed, extending the DS models. The model outputs data on individual ant behaviors (e.g. turn angle, velocity, etc.), permitting direct comparison of model results with field data. Results from analysis of video data and preliminary model results will be presented, and their implications for understanding the organization of E. burchelli swarm behavior will be discussed.

The Study System: Each day, E. burchelli army ants efficiently distribute up to 200,000 nearly blind individuals, covering an area of over 1,500 square meters in a 12–14 hour day. The ants capture, process and retrieve more than 30,000 mobile prey items while sharing only local information (Franks, 1989; Franks, 1985; Franks and Fletcher, 1983). Prey can be hundreds of times larger than the ants themselves and must be broken into pieces small enough to be transported to the nest by single ants or ants working in teams. Army ant swarms are robust to drastic changes in size (from a few thousand to a few hundred thousand individuals) and to huge variations in substrate type: the substrate on which they forage can vary from completely clear ground to massive tree-fall areas hundreds of meters square and 5–10 meters deep.

Understanding how swarms are generated through local knowledge and feedback between individuals is a fascinating biological question. It also has direct application to many non-biological questions, particularly in fields related to robotics, data mining and network routing (e.g. Parpinelli et al., 2002; Dorigo and Gambardella, 1997; Wagner et al., 1999; Bonabeau et al., 1998a). However, non-biological applications are usually based on generalized metaphors of social insect behavior rather than on direct knowledge and application of the rules actually used by insects (Bonabeau et al., 2000; Bonabeau et al., 1999). Consequently, there is a great need for quantitative research combining field data with modeling work to describe clearly how living systems generate complex structures and behaviors through simple rule systems.

Existing models: The two existing individual-based lattice models of army ant swarms (DS) permitted a preliminary examination of factors that affect swarm patterns in army ants. Both models suggested that raiding patterns visually similar to the characteristic swarm patterns of existing army ant species could be generated through variations in resource distribution alone, without changes to the behavioral parameter values. This suggests that complex and varied behavioral rules may not be a prerequisite for the generation of army ant raids or for species-specific differences in raiding pattern. While an effort was made to test the model results in the field (Franks et al., 1991), individual army ant behavior has not yet been described in quantitative detail sufficient to permit rigorous testing of models. The existing models are limited by simple behavioral algorithms for individual movement decisions, a consequence of the lack of quantitative data on the behavior of individual ants in the swarm. Likewise, swarm dynamics have not yet been described in sufficient detail to permit comparison with model results. This lack of data has also inhibited a detailed examination of the relation between individual behaviors and swarm-level patterns.

Research Approach: A primary interest of this current work is to determine the minimal number of rules required to reproduce observed behaviors and swarm patterns in army ants. In the past it has been difficult, if not impossible, to describe quantitatively the relation between these factors without a manageable way to track and follow individuals throughout the swarm. In the last decade, however, advances in video and computer technology have made such data collection possible. An effective model must be able to output data permitting quantitative comparison with actual army ant behavior. This permits the model to be tested against real-world data while in development and, once completed, to make clear predictions which can be tested in the field.

I have developed computer software which permits manual tracking of individual ants in digital video of E. burchelli army ant swarms taken in the field in Corcovado, Costa Rica. Data from these videos provide a profile of ant turn angles, velocities, paths and ant densities within swarms. I have also developed an individual-based continuous army ant swarm model, extending the DS models. The model is designed to collect and output data on ant density, individual paths, turn angles, velocity, capture success, etc. that match the data collected from the swarm videos.

This paper presents the first quantitative descriptions of army ant behavior throughout the profile of a swarm. I will also discuss the relations between turn angle, density, pheromone concentration and ant velocity, and how these factors may relate to foraging behavior and swarm patterns.


2 Methods

2.1 Field Data Collection

Digital video of Eciton burchelli army ant swarms was taken in the field at Sirena Field Station in Corcovado National Park, Costa Rica. Swarms were videotaped by placing the video camera on a tripod in the path of an approaching swarm front prior to the arrival of the first ants. Video was recorded from the arrival of the first ants in the swarm until the densest part of the swarm had passed. The camera was placed approximately 46 cm above the ground with the lens facing straight down, parallel to the ground. The area of ground captured by the camera measured 41.6 cm wide and 27.6 cm high. Video clips ranged from about 2 to 4.5 minutes long (≈3600 to 7500 frames). While the swarm was filmed, data were recorded on the distance of the swarm front from the bivouac, time of day, temperature, humidity, and the approximate width and depth of the swarm.

2.2 Data Collection From Video

Ant paths and densities: Software was developed in Visual Basic permitting manual collection of coordinate data for individual ant paths, and of body angles and total numbers of ants in individual frames of video. Ideally, one would track the trajectory of every single ant in the swarm front as it passed below the camera, but this would require manually tracking the positions of more than 1,600 ants. As an alternative to tracking every individual, two methods of data collection were employed using the tracking software. First, paths of selected individuals were tracked from the moment they arrived at the top of the screen until they left the screen or were lost. Second, the body angle and head and tail coordinate positions of every ant on the screen were recorded in every hundredth frame throughout the entire video clip. This enabled measurement of ant density throughout the swarm front. Relative angles of individuals with respect to each other could also be collected from these data. All output from the tracking program was analyzed using Matlab.

Pheromone concentration: The cumulative number of ants passing a given location was used as a metric for pheromone concentration.

2.3 Army Ant Swarm Model

An individual-based continuous model of army ant swarming was developed using Visual Basic. The current implementation of the model reproduces the DS models with improved temporal and spatial scales and allows implementation of alternative movement algorithms for comparison with those used by DS. The model outputs behavioral data for individual ants in the same format as that collected from the video data.

3 Results

Swarm video: Seven swarm fronts were videotaped. Due to the time-consuming nature of processing the swarm videos, only one video sequence has been analyzed; all results presented below refer to data from this swarm sequence. The video clip analyzed was 4 minutes 25 seconds long (7652 frames). The direction of movement of the swarm was from the top of the screen towards the bottom. Relative and absolute turn angles and ant velocities were calculated by sub-sampling the data at a chosen time-step (TS) between 0.5 and 1 second (15–30 frames). For all analyses, an effort was made to choose a time-step that made the most biological sense while minimizing the effects of noise and maximizing resolution. Changing the time-step within reasonable limits did not have a qualitative impact on results in any of the analyses.

Ant Paths (Figure 1): The paths of individual ants were tracked by marking the coordinate position of the ant's head as it moved across the screen. 100 individual ant paths were followed. Ants chosen to be tracked were picked at random from those entering at the top of the screen. Ants were followed until they left the screen or could no longer be followed (e.g. disappeared under a leaf, etc.). At the beginning of the clip, every ant present was tracked until the number of ants present became prohibitively large, at approximately frame 1500. After frame 1500, 3–4 ants at a time were tracked throughout the remainder of the video clip. Lengths of ant paths ranged from 24 frames (≈1 second) to 961 frames (≈30 seconds).

Figure 1: Ant paths tracked (N = 100). Axis coordinates are in pixels.

Figure 2: Density profile of swarm as measured by number of ants per frame.

Ant positions, density and pheromone concentration: In every hundredth frame from frame 1 to frame 7301, the head and tail positions of all ants in the frame were marked (N = 3855 ants; number of ants/frame = 0 to 250; Figure 2). These data will be referred to as the "density data."

Turn angle and Pheromone: From the path data, relative turn angle was measured as the difference in angle between an ant's trajectory in the first time-step and its trajectory in the next time-step (TS = 15). Absolute angle was measured as the angle of the ant relative to zero degrees. For convenience, all angles were calculated such that an ant heading straight down (i.e. following the general direction of the swarm through the video) had a trajectory of zero degrees.
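The turn-angle computation described above can be sketched as follows. The (frame, x, y) path format, and Python rather than the Visual Basic/Matlab tools actually used, are illustrative assumptions:

```python
import math

def turn_angles(path, ts=15):
    """Relative turn angles from a tracked path, sub-sampled every `ts` frames.

    `path` is a list of (frame, x, y) head positions (hypothetical layout);
    screen y grows downward, so an ant following the swarm from the top of
    the screen to the bottom has a heading of zero, matching the paper's
    zero-degrees convention.
    """
    pts = path[::ts]
    headings = []
    for (f0, x0, y0), (f1, x1, y1) in zip(pts, pts[1:]):
        # Heading relative to "straight down": atan2(dx, dy) is 0 when the
        # ant moves purely in +y (down-screen).
        headings.append(math.atan2(x1 - x0, y1 - y0))
    # Relative turn angle: difference of successive headings, wrapped
    # into (-pi, pi].
    return [math.atan2(math.sin(b - a), math.cos(b - a))
            for a, b in zip(headings, headings[1:])]
```

A path running straight down the screen yields all-zero turn angles; a right-angle turn yields ±π/2.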

Relative turn angles (Figure 3) and variances of angles (not shown) were calculated for all ant paths. Absolute ant angles and variances of angles of ants relative to each other were calculated from the density data (Figures 4, 5). Results from the density data were analyzed as the mean of angles in a given bin unless otherwise noted. All measures of turn angle for both path and density data showed a decreasing trend (Figures 3–5).

Pheromone: To create a metric for pheromone concentration, the screen area was broken into 1 cm squares ("bins"), and pheromone concentration in each bin for each frame in the video was approximated by counting the cumulative number of ants that had passed through that bin. For ease of discussion, the term "pheromone concentration" will be used to refer to this measure of the cumulative number of ants having passed through a given area. It should be kept in mind that there is currently no other known method for measuring pheromone in the field, and the current method only provides an approximation of the pheromone quantity. This method ignores any variation there might be in the types or quantities of pheromone dropped by individuals; however, my results suggest that, at least for these analyses, it is a reasonably accurate measure of pheromone concentration.
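A minimal sketch of this cumulative-count proxy, assuming a hypothetical per-frame list of pixel positions (the actual tracking output format is not specified here). Counting per-frame sightings slightly over-weights stationary ants relative to a true "ants passed through" count, which would require ant identity across frames:

```python
from collections import Counter

def pheromone_map(ant_positions_per_frame, cm_per_px=1.0):
    """Coarse pheromone proxy: for each 1 cm bin, the cumulative number of
    ant sightings in that bin up to each frame.

    `ant_positions_per_frame`: iterable of lists of (x, y) positions
    (hypothetical layout); `cm_per_px` converts pixels to centimeters.
    Returns one {bin: count} snapshot per frame.
    """
    phero = Counter()
    history = []
    for ants in ant_positions_per_frame:
        for x, y in ants:
            phero[(int(x * cm_per_px), int(y * cm_per_px))] += 1
        history.append(dict(phero))  # concentration map at this frame
    return history
```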

Density data were also used to examine the effects of pheromone independently of time by calculating the variance of angles between ants in a given bin with respect to the current pheromone concentration in that bin (Figure 6). That is, individual absolute ant angles were plotted by the amount of pheromone present in the bin they occupied, independent of time.

Figure 3: Relative turn angle in radians from path data (N = 100 paths).

Figure 4: Absolute angles of ants in radians (N = 3855 ants).

Figure 5: Variance of absolute angle in radians plotted with ant numbers per frame.

Figure 6: Variance of ant angle with respect to pheromone.

Figure 7: Ant velocity grouped by frames. a) Mean ant velocity in cm/second, grouped by bin (bin size = 250 frames). Numbers next to dots are numbers of paths per sample. b) Variance of mean velocity.

Ant Velocity: Ant velocity was calculated from path data by measuring the distance traveled every 10 frames throughout an entire ant's path. Increasing or decreasing the frame sample size did not appear to have a significant effect on the results (frame number works as a good metric for position in the swarm because the swarm was passing under the camera). To examine changes in velocity with respect to position in the swarm, the video clip was broken into bins 250 frames long. Mean velocities for each path were collected in each bin. Increasing or decreasing bin size did not have a significant impact on the results; the bin size of 250 frames was chosen as a compromise between minimizing noise and maximizing acuity. The mean velocity and variance of each bin were then calculated (Figure 7). Both ant velocity and variance of velocity begin low (although note that N is only 3 ants), increase dramatically by the second bin (frames 251–500) and then mostly decrease throughout the remainder of the swarm's passing.
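The velocity-binning procedure can be sketched as follows. The path format, frame rate and pixel scale are illustrative assumptions, not values from the paper:

```python
import math
from collections import defaultdict

def binned_velocity(paths, sample=10, bin_frames=250, fps=30.0, cm_per_px=0.058):
    """Mean ant speed (cm/s) per frame bin, roughly as in Figure 7.

    `paths`: list of tracked paths, each a list of (frame, x, y) head
    positions in pixels (hypothetical layout). Speed is measured over every
    `sample` frames; each measurement is assigned to the `bin_frames`-wide
    bin containing its start frame, then averaged per bin. `fps` and
    `cm_per_px` are assumed values.
    """
    speeds = defaultdict(list)
    for path in paths:
        pts = path[::sample]
        for (f0, x0, y0), (f1, x1, y1) in zip(pts, pts[1:]):
            dist_cm = math.hypot(x1 - x0, y1 - y0) * cm_per_px
            # speed = distance / elapsed time, with elapsed = (f1 - f0) / fps
            speeds[f0 // bin_frames].append(dist_cm * fps / (f1 - f0))
    return {b: sum(v) / len(v) for b, v in sorted(speeds.items())}
```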

Model Results: Model runs using the DS movement algorithms yielded results similar to those of DS. Model runs were also performed with an alternative and simpler algorithm in which ant velocity was directly determined by pheromone concentration. This algorithm was parameterized from field data of ants running on trails (Couzin and Franks, 2003). It also yielded raid patterns very similar to those demonstrated by the DS models. However, all of the algorithms showed a high and somewhat arbitrary sensitivity to initial conditions.

4 Discussion

One of the most fascinating features of the natural world, and particularly of self-organized systems, is that a high level of behavioral complexity can often be generated from a relatively simple set of rules. In describing and modeling swarming behavior in army ants, a useful null hypothesis is that the majority of quantifiable ant and swarm behavior can be generated with a minimal number of behavioral rules. This work is, in part, an attempt to discover what minimal set of rules is required for a model to reproduce as many aspects of army ant swarm behavior as can be accurately measured.

Direct measurement of ant velocity and variance of velocity indicates that the pioneer ants at the front of the swarm actually move more quickly than those in the densest part of the swarm. This is interesting because it is conventionally assumed that army ants move more slowly when there is less pheromone present. For as long as E. burchelli behavior has been described in the literature, it has been assumed that ants at the front of the swarm move more slowly and that this differential velocity plays a large part in organizing the swarm (e.g. Schneirla, 1940; Schneirla, 1934; Franks, 1985; Deneubourg et al., 1989; and others). Changes in forward velocity related to pheromone concentration were also an essential feature of both of the previous models of army ant swarming (DS).

From examination of the data presented above, a more complete hypothesis of swarm organization begins to emerge. In particular, it would appear that swarm organization is largely driven by variance in the size of turn angles, which is in turn driven by pheromone concentration. The turn angle of pioneers is very high compared to that of ants in areas of higher pheromone. This is conventionally explained by the lack of pheromone present, which causes ants at the front of the swarm to turn away frequently. What was unexpected was that these ants are actually running faster than those behind them. When pheromone is removed from a section of trunk trail, the ants that encounter the gap quickly search through the blank area until they re-find the opposite side of the trail, and a new pheromone trail is laid in to bridge the gap. Continuing with the hypothesis of maximum simplicity, my working null hypothesis for the behavior of pioneers at the swarm front is that any army ant encountering an area without pheromone immediately increases in agitation and begins searching for the lost trail. In other words, ant behavior at the front of the swarm is no different than anywhere else ants might encounter a sudden lack of trail. An ant running forward suddenly crosses into an area without pheromone, and the loss of trail elicits an increased level of excitation leading to a higher variance in running velocity and an increased probability of turning. The ant runs around in the blank areas looking for pheromone while leaving a trail at the same time. As pheromone in the area increases, running speed rapidly becomes more constant and the probability of turning decreases, eventually reaching its minimum. This hypothesis could be tested by videotaping a well laid-in trail, removing a portion of the trail, and then tracking the paths of ants as they search for the lost trail.

Another interesting phenomenon can be seen in the variance of turn angle data (Figure 5). The variance reaches its lowest point when the number of ants peaks, and then rises again as the densest part of the swarm passes and the number of ants begins to decrease. If this result is borne out by data from other swarms, it may indicate that density effects begin to take precedence over turn angle in the densest part of the swarm (i.e. ants may have to turn less than they would have otherwise because there is nowhere for them to turn).

Future work will ground-truth the model by comparing model results with field data on swarm movement, trunk trail composition and foraging success. The model will also be extended to incorporate energy-use data from the literature (Bartholomew et al., 1988; Feener et al., 1988a; Feener et al., 1988b) to permit a more accurate measurement of colony fitness and foraging efficiency. With the inclusion of energetics it will be possible to compare the fitness effects of variations in model parameter values and foraging success with real-world data on swarm behavior.

Acknowledgements

Larry Gilbert for help with logistics and everything else in Corcovado; Fred Adler for support, input, advice and programming help; Sylvia/Gardner Brown Sr. for additional funding.

References

Bartholomew, G. A., Lighton, J. R. B. and Feener, D. H., Jr. (1988) Physiol. Zool., 61, 57–68.

Beckers, R., Deneubourg, J. L. and Goss, S. (1992) Insect. Soc., 39, 59–72.

Bonabeau, E., Dorigo, M. and Theraulaz, G. (1999) Swarm Intelligence: From natural to artificial systems, Oxford University Press, New York.

Bonabeau, E., Dorigo, M. and Theraulaz, G. (2000) Nature (London), 406, 39–42.

Bonabeau, E., Henaux, F., Guerin, S., Snyers, D., Kuntz, P. and Theraulaz, G. (1998a) Lecture Notes in Computer Science, 1437, 60–?

Bonabeau, E., Theraulaz, G., Deneubourg, J. L., Franks, N., Rafelsberger, O., Joly, J. L. and Blanco, S. (1998b) Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 353, 1561–1576.

Camazine, S., Deneubourg, J.-L., Franks, N. R., Sneyd, J., Theraulaz, G. and Bonabeau, E. (2001) Self-organization in biological systems, Princeton University Press, Princeton, NJ.


Couzin, I. and Franks, N. R. (2003) Proc. R. Soc. Lond. B, 270, 139–146.

Deneubourg, J. L., Goss, S., Franks, N. and Pasteels, J. M. (1989) J. Insect Behav., 2, 719–725.

Dorigo, M. and Gambardella, L. M. (1997) Biosystems, 43, 73–81.

Feener, D. H., Jr., Lighton, J. R. B. and Bartholomew, G. A. (1988a) Funct. Ecol., 2, 509–520.

Feener, D. H., Jr., Lighton, J. R. B. and Bartholomew, G. A. (1988b) Proceedings of the 18th International Congress of Entomology, p. 233.

Franks, N. R. (1980) Ph.D. dissert., The University of Leeds, Leeds, England.

Franks, N. R. (1985) In Experimental behavioral ecology and sociobiology: in memoriam Karl von Frisch, 1886–1982, Vol. 31 (Ed, Lindauer, M.) Sinauer Associates, Sunderland, Mass., pp. 91–107.

Franks, N. R. (1989) Am. Scient., 77, 139–145.

Franks, N. R. and Fletcher, C. R. (1983) Behav. Ecol. Sociobiol., 12, 261–270.

Franks, N. R., Gomez, N., Goss, S. and Deneubourg, J. L. (1991) J. Insect Behav., 4, 583–607.

Goss, S., Beckers, R., Deneubourg, J. L., Aron, S. and Pasteels, J. M. (1990) In Behavioural Mechanisms of Food Selection, Vol. 20 (Ed, Hughes, R. N.) Springer-Verlag, Berlin, Heidelberg, pp. 661–678.

Gotwald, W. H., Jr. (1982) In Social insects, Volume 4 (Ed, Hermann, H. R.) Academic Press, New York, 385 p., pp. 157–254.

Parpinelli, R. S., Lopes, H. S. and Freitas, A. A. (2002) IEEE Transactions on Evolutionary Computation, 6, 321–332.

Rettenmeyer, C. W. (1963) Univ. Kans. Sci. Bull., 44, 281–465.

Schneirla, T. C. (1934) Proc. Natl. Acad. Sci. U.S.A., 20, 316–321.

Schneirla, T. C. (1940) J. Comp. Psychol., 29, 401–460.

Schneirla, T. C. (1971) Army ants. A study in social organization. (Edited by H. R. Topoff.), W. H. Freeman & Co., San Francisco.

Sole, R. V., Bonabeau, E., Delgado, J., Fernandez, P. and Marin, J. (2000) Artificial Life, 6, 219–226.

Wagner, I. A., Lindenbaum, M. and Bruckstein, A. M. (1999) IEEE Transactions on Robotics and Automation, 15.


Simple Rules of Growth Can Account for the Complexity of Tunnelling Networks in the ant Messor sancta

Jerome Buhl1, Jacques Gautrais1, Jean-Louis Deneubourg2, Pascale Kuntz3, Guy Theraulaz1

1. Centre de Recherches sur la Cognition Animale, CNRS UMR 5169, Universite Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex 4, France. Corresponding author: [email protected]

2. Center for Nonlinear Phenomena and Complex Systems, Universite Libre de Bruxelles, C.P. 231, Campus Plaine, B-1050 Brussels, Belgium

3. Ecole Polytechnique de l'Universite de Nantes, 2 rue de la Houssiniere, BP 92208, 44322 Nantes Cedex 03, France

Abstract

The aim of this study was to link simple rules of tunnel growth to the emerging network topology. We studied networks that were produced by workers of the ant Messor sancta over 3 days in a standardized thin sand disk. The topology of each gallery network was characterized by spatial and graph analysis. We showed that these networks belong to a particular class in which the degree distribution is characterized by a power law. We then quantified the characteristics of tunnel growth in terms of initiation, propagation and termination of new digging sites and observed that tunnel growth can be well described by simple probabilistic laws. We showed that a model that simulates the growth of tunnels using these simple laws can account for the emergence of several topological properties that we observed in our experimental networks. We thus propose the use of such a methodology to extend this study to the growth of other networks.

Keywords: network topology, network growth, tunnelling networks, Messor sancta

1 Introduction

Many species of ants build their nest by excavation. A typical underground nest structure is composed of a large number of chambers interconnected by a network of galleries (Brian, 1983; Cerdan, 1989; Delye, 1971; Frisch, 1975; Rasse, 1999; Thome, 1972). Despite the importance of these structures, there are very few detailed descriptions of subterranean networks in the field, and their quantification is scarce (but see Cassill et al., 2002). Structure often affects function. One important question is whether the efficiency of the colony in performing certain tasks depends on the network's topological organization, and whether natural selection has favoured some nest organizations. For example, is there any relationship between the spatial organization of a gallery network and its efficiency with regard to the traffic of ants inside the nest, or its robustness against disruptions that may occur in the network? Moreover, networks are the result of growth processes. A second major question is to understand what underlying mechanisms could account for the emergence of network structures. In particular, can simple rules of tunnel growth lead to different forms of complex networks?

The aim of this study was to link the characteristics of tunnel growth to the emerging nest topology at the colony level. We studied the growth of gallery networks produced by workers of the ant Messor sancta in a standardized set-up. In these conditions, the structures are not subjected to strong environmental heterogeneities and it is possible to quantify the growth dynamics (Rasse, 1999; Rasse and Deneubourg, 2001). This set-up enabled us to study, under well-controlled laboratory conditions, (1) the topology of the gallery networks by spatial and graph analysis and (2) the growth characteristics of the tunnels, such as initiations of new tunnels and trajectory characteristics (orientation and speed of tunnel advance). As we had access to the complete growth history of these networks, our experimental set-up represents a unique opportunity to understand how a particular topology emerges during the growth process. We then developed a model in which rules of tunnel growth concerning tunnel initiation, propagation and termination were implemented according to the parameters quantified in the experiments. This model was then used to test which set of rules was sufficient to reproduce the topological properties of the networks observed at the collective level.

Figure 1: Examples of networks obtained after 3 days in (a) experiments, (b) PG simulation and (c) LG simulation.

2 Material and methods

2.1 Experimental set-up

The general experimental set-up consisted of a sand disk of 20 cm diameter and 5 mm height. We used yellow sand (brusselian) of a very fine and homogeneous granularity that was poured in a mould and then humidified with vaporised water (25 ml). The mould was then removed and the sand disk covered by a glass plate (25 cm × 25 cm). An arena (diameter = 50 cm), its wall coated with Fluon®, was put around the sand disk to prevent ants from escaping. Each experiment (N = 19) lasted 3 days, began with the random dispersal of a group of 200 ants around the disk, and was recorded from above with a high-resolution digital camera (SONY DCR-VX1000E) in time-lapse mode.

2.2 Graph topology and spatial analysis

The data were acquired on 1 frame every 20 minutes with software that allowed the identification of network components (by pointing and clicking). On each frame, we considered the network as a graph G(N,E) where N was a set of nodes characterized by their (x, y) position, label and diameter, and E a set of edges linking pairs of nodes, characterized by their width and length. In this graph, edges corresponded to galleries and nodes to intersections between galleries or between a gallery and the edge of the sand disk. A label was assigned according to the way the node was constructed. "Peripheral germs" corresponded to the initiation of a new tunnel on the periphery of the sand disk, and "lateral germs" to all events of initiation of new tunnels inside the sand disk. If a lateral germ did not emerge at the position of an existing node, then a "lateral node" was created. A connected component is a subset of the graph G in which there always exists at least one path between each pair of nodes of the subset. To determine whether a network was composed of a main connected component or fragmented into several ones, we computed the ratio between the number of nodes in the largest connected component Nl and the total number of nodes N. We will refer to this ratio as the relative size of the largest connected component (Nl/N).
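The ratio Nl/N can be computed from the node and edge lists with a standard union-find pass; this is a generic sketch (the authors do not specify their algorithm, and the 0-based node indexing is an illustrative assumption):

```python
from collections import Counter

def largest_component_ratio(n_nodes, edges):
    """Relative size Nl/N of the largest connected component of a gallery
    graph with nodes 0..n_nodes-1 and `edges` a list of (u, v) pairs.
    """
    parent = list(range(n_nodes))

    def find(u):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        parent[find(u)] = find(v)   # union the two components

    sizes = Counter(find(u) for u in range(n_nodes))
    return max(sizes.values()) / n_nodes
```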


In the spatial analysis, we counted the number of nodes observed inside a circle whose radius was varied from 0 to the radius of the sand disk. Peripheral germs were excluded. If the nodes are dispersed homogeneously from the center to the periphery, then the number of nodes counted inside the circles will relate linearly to the surface covered by these circles. In a log-log transformation of this relation, the slope obtained by a linear regression, representing the exponent of the power law thus tested, would be equal to 1. An exponent value lower than 1 indicates a tendency for nodes to be more frequent near the center, while a value higher than 1 indicates that nodes are more frequently distributed at the periphery of the sand disk.
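This homogeneity test can be sketched as a plain least-squares fit of log(node count) against log(circle area); the exact regression procedure used (in SPSS) is not specified, so this is an illustrative reimplementation:

```python
import math

def radial_exponent(nodes, radii):
    """Slope of log(count within radius) vs log(circle area); a slope of 1
    indicates nodes dispersed homogeneously from center to periphery.

    `nodes`: (x, y) positions relative to the disk center (peripheral germs
    excluded upstream); `radii`: increasing radii at which to count.
    """
    xs, ys = [], []
    for r in radii:
        count = sum(1 for x, y in nodes if math.hypot(x, y) <= r)
        if count > 0:                         # log undefined for empty circles
            xs.append(math.log(math.pi * r * r))   # log surface area
            ys.append(math.log(count))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))
```

With node counts growing exactly as r² the slope is exactly 1, as expected for a homogeneous dispersion.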

The degree of a node corresponds to the number of edges connected to it. When we determined the mean degree and the distribution of degrees for a network, we excluded peripheral germs. All statistical tests were performed with SPSS 10.0 for Windows. The T3 Dunnett test was used for post-hoc comparisons with α = 0.05.

2.3 Quantification of tunnel growth

In all experiments, we determined the evolution over time of the total length of the network and of the number of peripheral and lateral germs. In the first 10 experiments, we determined the angle formed between the direction of the first 2 centimeters of the tunnels growing from peripheral germs and the perpendicular to the tangent to the sand disk at the position of the peripheral germ considered (N = 117 observations). To determine the mean speed of growth of the tunnels, the positions of growing tunnels were recorded during the first 15 hours of the experiments (N = 60).

3 Network topology: experimental results

The networks obtained after 3 days are small graphs in terms of nodes (N = 51 ± 19.9 SD) and edges (E = 60 ± 30 SD; see table 2). They are well connected, with a main connected component that included 86.3% (±11 SD) of the total number of nodes. As regards spatial distribution, nodes were distributed quite homogeneously from the center to the periphery of the sand disk. Indeed, the exponent value δ obtained with the linear regression on the log-log transformation was close to 1 (r2 = 0.97; δ = 1.2 ± 0.12 SE).

The mean degree was 〈k〉 = 3.52 ± 0.22 SD. The maximal degree value observed was kmax = 8. Degree distributions were characterized by a power law (fig. 2), with an exponent value γ = −5.195 ± 0.5 SE (r2 = 0.96).
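The degree statistics underlying this fit can be reproduced from the graph data with a short sketch (the 0-based node indexing and edge-list format are illustrative assumptions; peripheral germs would be excluded upstream):

```python
from collections import Counter

def degree_distribution(n_nodes, edges):
    """Degree of each node and the empirical degree distribution P(k) of a
    gallery graph with nodes 0..n_nodes-1 and `edges` as (u, v) pairs.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # P(k): fraction of nodes with degree k (isolated nodes count as k = 0).
    dist = Counter(deg.get(i, 0) for i in range(n_nodes))
    return deg, {k: c / n_nodes for k, c in sorted(dist.items())}
```

The power-law exponent γ would then be obtained by regressing log P(k) on log k, as in the spatial analysis.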

4 A model of network emergence by simulating tunnel growth

4.1 Description of the model

In our model, a simulated network was described as a graph in the same way as in the experiments. The time unit was 1 minute and a simulation ended at 72 h. We considered two types of tunnels, peripheral and lateral germs, which differed only in their initiation.

1. Initiation of new tunnels: a peripheral germ can start to grow at each minute according to a probability Pg. It is initiated at a random position on the periphery of the sand disk, but must be distant at least Dn centimeters from any other. A lateral germ can be initiated randomly along the edges with a probability Pl per unit of length and time. Its position is randomly chosen along the edges, and if it is within range (distance < Dn) of one of the two nodes connected by the edge from which it emerges, it is considered to emerge from that node; otherwise a new "lateral node" is created at its position with a diameter Dn. Both types of tunnels start with an angle β generated from a Gaussian distribution characterized by a mean B and a standard deviation σ. B is the direction of the perpendicular to the tangent to the periphery of the sand disk for a peripheral germ, and the direction of the edge where initiation took place for a lateral germ.

2. Tunnel growth: at each cycle, each active tunnel progresses at a constant speed S and keeps its orientation constant.

3. Termination of tunnel growth occurs when the tunnel intersects an edge, a node or the periphery of the sand disk.

Figure 2: Distribution of degrees in experiments (�), PG simulation (♦) and LG simulation (•) in a linear representation (a), and log-log plot (b). Regression lines are shown for the experiments (short dashed line), PG simulation (plain line) and LG simulation (long dashed line).
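The three rules above can be sketched as a minimal simulation, restricted to peripheral germs and with tunnel-tunnel intersections omitted for brevity; parameter defaults are taken from Section 4.2 where available (Pg, σ, S, Gmax, disk radius), and the rest of the structure is an illustrative assumption:

```python
import math
import random

def simulate_tunnels(minutes=72 * 60, radius=10.0, g_max=18, p_g=0.0011,
                     speed=0.0085, sigma_deg=8.44, seed=1):
    """Minimal sketch of the growth rules: up to g_max potential peripheral
    germs, each starting with probability p_g per minute (Rule 1), advancing
    at constant speed along a fixed heading drawn around the inward normal
    (Rule 2), and stopping at the disk periphery (Rule 3, simplified).
    Speed is in cm/min (5.1 mm/h ~ 0.0085 cm/min).
    """
    rng = random.Random(seed)
    tunnels = []
    started = 0
    for _ in range(minutes):
        # Rule 1: one potential germ may start this minute.
        if started < g_max and rng.random() < p_g:
            theta = rng.uniform(0.0, 2.0 * math.pi)   # position on periphery
            beta = theta + math.pi + math.radians(rng.gauss(0.0, sigma_deg))
            tunnels.append({"x": radius * math.cos(theta),
                            "y": radius * math.sin(theta),
                            "angle": beta, "active": True})
            started += 1
        # Rule 2: each active tunnel advances with a fixed heading.
        for t in tunnels:
            if t["active"]:
                t["x"] += speed * math.cos(t["angle"])
                t["y"] += speed * math.sin(t["angle"])
                # Rule 3 (simplified): stop on reaching the disk periphery.
                if math.hypot(t["x"], t["y"]) >= radius:
                    t["active"] = False
    return tunnels
```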

4.2 Quantification of the parameters

4.2.1 Initiation of peripheral germs

The number of peripheral germs increased at maximal speed at the beginning of the experiments and reached a plateau after 3 days, which can be described by:

dGp/dt = bGmax(Gmax − Gp). (1)

where Gp represents the number of peripheral germs, Gmax the maximum number of peripheral germs that can be initiated, and b a constant. It follows that

Gp(t)/Gmax = 1 − e^(−bGmax·t). (2)

Gmax was estimated by the mean number of peripheral germs observed at the end of the experiments (Gmax = 17.79 ± 6.6 SD). We validated this relation on all experiments by performing a linear regression (r2 = 0.97; b = 6.85 × 10−5 min−1) on the linearised form of relation (2). In the model, we therefore consider that there exists a set of Gmax potential peripheral germs, each one being able to start to grow at each minute with the probability Pg = bGmax = 0.0011 min−1.
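The per-germ reading of Pg = bGmax can be checked directly: if each of the Gmax potential germs starts independently with probability Pg per minute, the expected germ count follows the saturating form of equation (2). A small sketch comparing the exact discrete expectation with the continuous approximation:

```python
import math

def expected_peripheral_germs(t_min, g_max=17.79, p_g=0.0011):
    """Expected number of peripheral germs after t_min minutes when each of
    g_max potential germs starts independently with probability p_g per
    minute. Returns the exact discrete form and the continuous form
    Gmax(1 - e^(-Pg*t)) implied by equation (2) with Pg = b*Gmax.
    """
    exact = g_max * (1 - (1 - p_g) ** t_min)
    continuous = g_max * (1 - math.exp(-p_g * t_min))
    return exact, continuous
```

At t = 4320 min (3 days) both forms give roughly 17.6 germs, close to the observed plateau of 17.79.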


4.2.2 Initiation of lateral germs

We tested a model in which our hypothesis was that the rate of initiation of lateral germs Gl is proportional to the total length L, with a constant probability Pl (equation 3):

dGl/dt = PlL. (3)

To simplify, we will focus on the first 48 h of the experiments, during which the length was growing linearly (equation 4):

dL/dt = a. (4)

The constant a = 0.056 cm·min−1 was estimated by a linear regression between the mean total length of tunnels and time during the first 48 h of experiments (r2 = 0.95). Coupling equations (3) and (4), it follows that:

Gl(t) = (Pl/2a)L^2(t). (5)

This relationship was validated by a non-linear regression test for a power law on all pairs (N = 4123) of number of lateral germs and total length (r2 = 0.9; exponent λ = 2.041 ± 1.53 × 10−3 SE; Pl/2a = 4.693 × 10−4 ± 3.8 × 10−5 SE; see fig. 3). In the model, we will therefore consider that a lateral germ can emerge on each centimeter of each edge at each minute with the probability Pl = 5.256 × 10−5 cm−1·min−1.
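The quadratic relation (5) ties the fitted prefactor back to the model parameter Pl; a quick numerical cross-check in plain Python, using the values quoted in the text:

```python
A = 0.056               # cm/min: linear growth rate of total length, equation (4)
PL_OVER_2A = 4.693e-4   # fitted prefactor of the power law (fig. 3)

# Recovering Pl from the fit: Pl = 2 * a * (Pl / 2a)
pl = 2.0 * A * PL_OVER_2A
print(f"{pl:.3e}")      # → 5.256e-05, the value used in the model

def lateral_germs(t_min):
    """Expected number of lateral germs after t minutes, equation (5),
    with L(t) = a * t during the linear-growth phase (first 48 h)."""
    length = A * t_min
    return PL_OVER_2A * length ** 2

print(round(lateral_germs(48 * 60), 1))   # → 12.2 expected lateral germs at 48 h
```

The recovered Pl matches the value used in the simulations to four significant figures.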

4.2.3 Diameter of the nodes, orientation and speed of growth of the tunnels

The distribution of orientations of peripheral germs was Gaussian with a mean value α = −1.3° (very close to the perpendicular to the tangent to the sand disk) and a standard deviation σ = 8.44° (KS test, N=117, Z=0.697, p=0.716). The distribution of the mean speed of growing tunnels was Gaussian with a mean value S = 5.1 mm/h ± 2.98 SD (KS test, N=60, Z=1.119, p=0.16). The distribution of node diameters was unimodal and appeared to be non-Gaussian. In the model, we used the mean value Dn = 14.8 mm (N=558).

4.3 Networks topology: model results

We tested 2 conditions in our simulation: the first with Pl = 0 (only peripheral germs were initiated; PG simulation) and the second with Pl = 5.256 × 10−5 min−1·cm−1 (both peripheral and lateral germs were initiated; LG simulation). For each condition, 100 realizations were performed.

4.3.1 Basic characteristics

The three groups differed significantly in the mean number of nodes and edges (table 2). Both PG and LG simulations led to smaller networks than the experiments. In the LG simulation, the number of lateral nodes was significantly lower than in experiments (t=4.22, p < 0.001). However, the mean number of lateral germs was not significantly different. This indicates that, in experiments, tunnels were more often initiated along the tunnel walls and far enough from existing nodes, thus leading to the emergence of new "lateral nodes", while in the model the tunnels were initiated more often from existing nodes.

The mean relative size of the largest connected component Nl/N was significantly lower than in the experiments only in the PG simulation. In the LG simulation, this mean value was similar to the one observed in experiments, indicating that lateral germs help the formation of one main cluster of nodes.


Figure 3: Experimental and predicted relationship (see equation 5) between the number of lateral germs and the total length of the network.

Table 1: Model parameters and values.

Name  Value          Unit          Description
Pg    0.0011         min−1         Probability of initiation of a peripheral germ
Pl    5.256 × 10−5   min−1·cm−1    Probability of initiation of a lateral germ
σ     8.44           °             Standard deviation of the angle of initiation
S     5.1            mm·h−1        Mean speed of tunnel growth
Dn    14.8           mm            Mean node diameter

4.3.2 Spatial distribution of nodes

We applied the same log-log analysis as for the experimental data; the linear regression gave an exponent δ lower than 1 in both the PG simulation (r2 = 0.99; δ = 0.728 ± 0.001 SE) and the LG simulation (r2 = 0.99; δ = 0.825 ± 0.001 SE). The low value of the exponent δ obtained in the PG simulations indicates that the distribution of nodes tended to be biased toward the center. In the LG simulation, though the same tendency was observed, the nodes were more homogeneously distributed.

4.3.3 Node degree distribution

Mean degree was lower in both types of simulations than in experiments (table 2). Maximal degree was lower in the PG simulation (kmax = 6), while it was similar in experiments and in the LG simulation (kmax = 8). Even in the PG simulation, the degree distribution showed a high similarity with the experiments (fig. 2): the r2 values obtained by linear regression on the log-log representation of the degree distribution were high for both the PG (r2 = 0.91) and LG (r2 = 0.98) simulations. Thus, the distribution is a power law in both cases. When only peripheral germs were simulated, the exponent (γ = −8.02 ± 1.8 SE) was clearly larger in magnitude than in the experiments. The observed effect of lateral germs was to reduce the magnitude of the power-law exponent (γ = −6.32 ± 0.48 SE), which became close to the one observed in experiments.
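The exponent estimates quoted here come from linear regression on the log-log representation of the degree distribution. The sketch below illustrates the procedure on synthetic counts drawn from an exact power law with the experimental exponent (hypothetical data, pure-Python least squares):

```python
import math

def loglog_slope(degrees, counts):
    """Least-squares slope of log(count) versus log(degree): an estimate
    of the power-law exponent gamma of the degree distribution."""
    xs = [math.log(k) for k in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic counts following an exact power law with gamma = -5.19,
# the experimental value; the regression recovers the exponent.
ks = [2, 3, 4, 5, 6, 7, 8]
counts = [1000.0 * k ** -5.19 for k in ks]
print(round(loglog_slope(ks, counts), 2))   # → -5.19
```

On real, noisy counts the slope is only an estimate, which is why the exponents above carry standard errors.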


Table 2: Basic characteristics of the graphs obtained in experimental and simulated networks (N: number of nodes; E: number of edges; Nl/N: relative size of the largest connected component; 〈k〉: mean degree; γ: power-law exponent of the degree distribution; * F statistic for all tests except for the lateral germs, where the statistic was F1,117; ± represents SD for all mean values, SE for γ).

Group           N         E         Lateral    Lateral     Nl/N         〈k〉          γ
                                    nodes      germs
Experiments     51 ± 19   60 ± 30   13.5 ± 7   19.6 ± 12   0.86 ± 0.1   3.52 ± 0.2   −5.19 ± 0.5
PG simulation   28.2 ± 1  25.5 ± 2  –          –           0.71 ± 0.2   3.31 ± 0.2   −8.02 ± 2
LG simulation   37.4 ± 3  43.2 ± 6  6.2 ± 2    19.1 ± 5    0.89 ± 0.1   3.32 ± 0.1   −6.32 ± 0.5
ANOVA: F2,216*  134.4     195.8     –          0.2         31.7         13.9         –
p               < 0.001   < 0.001   –          0.647       < 0.001      < 0.001      –

5 Discussion

In this study, we have characterised the topology of tunnelling networks produced by the ant Messor sancta in a laboratory set-up. In these networks, most of the nodes were clustered into one main connected component, and the spatial distribution appeared to be homogeneous. Degree distributions were characterized by power-law tails. Several characteristics of tunnel growth were also analysed. In particular, we have shown that the initiation of new tunnels could be described by two constant probabilities, depending on whether the tunnels were initiated on the periphery or inside the sand disk. These remarkable characteristics allowed us to design a model that used a few simple rules for tunnel growth, and that model was able to reproduce several of the main topological characteristics of the networks that we observed in our experiments. In particular, we have shown that the power laws observed in the degree distribution can emerge from a simple process of tunnel growth in a finite space, even when the networks result exclusively from new tunnelling sites initiated on the periphery of the sand disk. Lateral germs help to establish the emergence of a single cluster of connected nodes and tend to decrease the magnitude of the exponent of the power law.

It has been shown that many large networks found in nature and in engineered systems share striking similarities in several of their topological properties. In particular, it has been observed that the degree distribution follows a power law in many networks (Albert and Barabasi, 2002; Barabasi and Albert, 1999; Jeong et al., 2000), and such networks have been called "scale-free" networks. However, very little is known about the underlying mechanisms that could account for these properties. In several models, the probability for a new node to connect to an existing one increases with that node's degree. This class of models, called "preferential linking" models, was able to reproduce power laws in the degree distribution (Albert and Barabasi, 2002; Barabasi and Albert, 1999). However, no previous model took into account an important phenomenon observed in many natural networks: the growth of elements that correspond to "edges" can contribute to the emergence of nodes. In our model, which was based on this phenomenon, we observed that it is not necessary to implement any rule of preferential linking to observe the emergence of a power law in the degree distribution.

We propose the use of this type of methodology to extend this study to other phenomena of network growth. Our model should be extended to geometries other than the disk. In particular, how do the topological properties change when sites are initiated in a square, or when tunnels start to grow from the center and extend toward the periphery? Can we find invariant properties that are robust to these changes of geometry? What is the sensitivity of the topological properties to the parameter values? Is it possible to generate very different forms of networks with the same rules?

Acknowledgements

This research was supported by the Programme Cognitique. J. Buhl was supported by a research grant of the Ministère de l'Éducation Nationale, de la Recherche et de la Technologie. J. L. Deneubourg is a research associate of the Belgian National Foundation for Scientific Research. We thank Vincent Fourcassié and Philippe Rasse for many helpful discussions and comments.

References

Albert, R., and Barabasi, A. L. (2002). Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1): 47–97.

Barabasi, A. L., and Albert, R. (1999). Emergence of scaling in random networks. Science 286: 509–512.

Brian, M. (1983). Social insects : Ecology and Behavioural Biology. Chapman and Hall, London.

Cassill, D., Tschinkel, W. R., and Vinson, S. B. (2002). Nest complexity, group size and brood rearing in the fire ant, Solenopsis invicta. Insect. Soc. 49: 158–163.

Cerdan, P. (1989). Etude de la biologie, de l'ecologie et du comportement des fourmis moissonneuses du genre Messor (Hymenoptera, Formicidae) en Crau. PhD thesis, Univ. de Provence, Aix-Marseille I.

Delye, G. (1971). Observations sur le nid et le comportement constructeur de Messor arenarius. Insect. Soc. 18: 15–20.

Frisch, K. von (1975). Animal architecture. Hutchinson, London.

Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabasi, A. L. (2000). The large-scale organization of metabolic networks. Nature 407: 651–654.

Rasse, P. (1999). Etude sur la regulation de la taille et sur la structuration du nid souterrain de la fourmi Lasius niger. PhD thesis, Univ. Libre de Bruxelles, Bruxelles.

Rasse, P., and Deneubourg, J. L. (2001). Dynamics of nest excavation and nest size regulation of Lasius niger (Hymenoptera: Formicidae). J. Insect Behav. 14: 433–449.

Thome, G. (1972). Le nid et le comportement de construction de la fourmi Messor ebenius, Forel (Hymenoptera, Formicoidea). Insect. Soc. 19: 95–103.


How do Ants Decide Between Food Sources of Different Values? An Evaluation of the Current Explanation and Associated Mathematical Models

Ivan D. Chase1, Abhijit V. Deshmukh2, and Naga Krothapalli2

1. Department of Sociology, State University of New York at Stony Brook, Stony Brook, NY 11794-4356, USA. Corresponding author: [email protected]

2. Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst, MA 01003, USA.

Abstract

When given two food sources simultaneously and at equal distances from the nest, one higher in sugar content than the other, most of the foragers from colonies of mass recruiting ants will feed from the source higher in sugar. The current explanation holds that the ants visiting the source higher in sugar mark the trail to it with greater amounts of pheromone than those visiting the source lower in sugar. This creates a divided pheromone trail, with ants preferring the branch leading, unbeknownst to them, to the better source. Researchers have developed a series of mathematical models based upon this explanation. In this paper we evaluate those models, pointing out a number of fundamental problems, including that in some cases they make ants act in opposition to the current explanation and choose trails marked with less pheromone. We develop a series of discrete event simulations that do fit the current explanation closely, but find that in these simulations ants are as likely to prefer better sources as they are poorer ones. In light of these results we suggest that the current explanation is incomplete, that other mechanisms must be involved, and that we need additional empirical investigations to discover these mechanisms.

Keywords: ants, food source decisions, discrete event simulation, distributed decision making, self-organization,agent models

1 Introduction

If an experimenter offers a colony of mass recruiting ants (one of the species using pheromones to mark food trails) two food sources simultaneously and at equal distances from the nest, but one higher in sugar content than the other, most of the foragers will usually go to the source higher in sugar. Some of the foragers will feed from the source lower in sugar, but on average their numbers will be much lower than those going to the better source. This is of course a good decision for the survival and reproduction of the colony: the ants concentrate on the food source that provides the most calories with the least amount of effort. But how do they do this? How do they "decide" which source is better, and how do they coordinate their efforts so as to exploit it preferentially?

Biologists have been seeking answers to these sorts of questions for ants and other social insects for more than a half-century, and recently researchers from such fields as operations research, industrial engineering, computer science, and applied mathematics have joined them. While they may not have the same substantive interests in social insects as biologists, these other researchers see the activities of ants and other social insects as examples of what can be called distributed decision making or self-organizing processes. Here we will refer to them as distributed decision making processes. In these processes, groups of individual entities, say, ants, machines, or people, find themselves in situations in which each individual entity has relatively limited information about the overall situation that the group faces and relatively limited ability to communicate what knowledge it does have to other group members. The challenge in distributed decision making processes is to determine what actions the entities should take so as to make group-level decisions that are as good as possible under these less than ideal conditions. Because evolution has worked on colonies of ants and other social insects for millions of years, we assume that their ways of making distributed decisions are fairly efficient. Consequently, these researchers from other fields hope that understanding how these decisions are made in social insects will provide some fundamental insights into how similar kinds of distributed decisions might be made by teams of humans or things made by humans.

The explanation currently proposed for how ants decide between food sources suggests that ants mark trails to better sources with greater amounts of pheromone and differentially choose these trails over those to lesser sources marked with smaller amounts of pheromone (see below for more detail). Researchers have proposed a number of mathematical models, most using differential equations, that attempt to test the feasibility of this explanation. In this paper we first examine the fit between these models and the current explanation. We show that although the models do reproduce common experimental results, they do so by making a basic departure from the current explanation, and that they would not reproduce the results without making this departure. We next develop a series of discrete event simulations that do follow the current explanation closely. However, when we do this, we find that there is a fundamental problem with the current explanation. It cannot reproduce the experimental results if food sources are found randomly, as they are for real ants, but instead must implicitly assume that ants always find better sources first, or both sources at approximately the same point in time. We attempt to save the current explanation by adding a number of additional behavioral mechanisms to our simulations, but to no avail. In response to the apparent failure of the current explanation we suggest further experimental work to discover what additional mechanisms ants use to make these decisions. We conclude by considering one of the broader conceptual issues raised by the problem of distributed decision making in ants: given a specific group-level pattern of organization, how do we design agents, and what properties do we give them, so that they can produce that pattern of organization?

1.1 The Current Explanation

In the explanation currently proposed for food source decisions, an ant that finds and drinks from a source lays a pheromone trail from the source back to the nest. If the source is of high value, for example rich in sugar, the ant is more likely to lay a pheromone trail, and if she does so, to mark it with more pheromone droplets per centimeter, than an ant finding a source lower in sugar. Observational work on several species supports these assumptions (e.g., Hangartner, 1969; Beckers et al., 1992). Thus, the explanation continues, if an experimenter gives a colony one food source high and another low in sugar, simultaneously and at equal distances from the nest, a scout ant finding the better source will lay a better marked trail from this source to the nest, while a scout finding the lesser source will lay a less well marked trail back to the nest. Then, as an ant leaves the nest in search of food, she will find a pheromone trail that splits into two branches: one better marked and leading, unbeknownst to the ant, to the better source, and the other branch less well marked and leading to the poorer source. Ants "prefer" trails that are even slightly better marked, and so the ant looking for food takes the better marked trail and is more likely to follow it successfully to the end than a less well marked trail. If she finds the food source, she in turn lays a trail back to the nest, reinforcing the efforts of the ant that first found the better source. The next ant out makes the same decision, and thus this trail becomes better and better marked as each additional ant repeats the decision of the earlier ants. Eventually recruitment to the source ends, either because the ants become sated or because the food is exhausted, either of which leads to the gradual evaporation of the pheromone trail. Ants only find the poorer source and lay a trail from it back to the nest if they randomly find it after losing the trail to the better source.


1.2 The Current Mathematical Models

Most of the mathematical models derived from the current explanation use differential equations to predict changes in the number of ants going to either source over time. While these models show minor variations, they all share a term in their differential equations indicating the number of ants recruited to a source at each time unit. It consists of a constant, experimentally derived or simply assigned, times the number of ants that took the trail to the source, or are feeding at the source, depending upon the particular model, in the previous time unit. For example, in equations (1) and (2) below from the pioneering work of Beckers et al. (1990), and in equations (3) and (4) from the very recent and thoughtful work of Sumpter and Beekman (2003), the recruitment terms are aAXA, aBXB and βAXA, βBXB, respectively. The recruitment terms are multiplied by the total number of ants available for foraging but not already occupied with source A or B to give the number of ants taking the trail to A or B at a particular time. Other terms in these equations give the number of ants per time unit losing a trail, finding a source randomly after losing a trail, already feeding at a source, etc., but these need not concern us further here.

dXA/dt = aAXAfA(N −XA −XB − E)− bXA + cE (1)

dXB/dt = aBXBfB(N − XA − XB − E) − bXB + cE (2)

dXA/dt = (α + βAXA)(N −XA −XB)− sXA/(K + XA) (3)

dXB/dt = (α + βBXB)(N −XA −XB)− sXB/(K + XB) (4)
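A forward-Euler integration makes the behavior of equations (3) and (4) concrete. The parameter values below are illustrative assumptions, not values from Sumpter and Beekman (2003); note that the α term makes both XA and XB become nonzero in the very first step, so fractions of ants discover both sources simultaneously:

```python
def simulate(alpha=0.001, beta_a=0.01, beta_b=0.005,
             N=100.0, s=1.0, K=10.0, dt=0.01, steps=200_000):
    """Forward-Euler integration of equations (3) and (4).
    beta_a > beta_b stands in for the better-marked trail to source A."""
    xa = xb = 0.0
    for _ in range(steps):
        free = N - xa - xb                      # ants not yet committed
        dxa = (alpha + beta_a * xa) * free - s * xa / (K + xa)
        dxb = (alpha + beta_b * xb) * free - s * xb / (K + xb)
        xa += dt * dxa
        xb += dt * dxb
    return xa, xb

xa, xb = simulate()
print(xa > xb)   # → True: the source with the larger recruitment rate wins
```

Because both variables start growing together, the source with the larger recruitment coefficient always ends up with more ants, regardless of which source a real scout would have found first.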

At first glance the recruitment terms appear to be good representations of the density of pheromone marking on a trail. That is, the more ants that chose a trail in the previous time period, the larger the recruitment term, and the more ants that choose the trail in the present time period. However, on closer inspection, it can be seen that the recruitment terms can sometimes make the model ants act in ways that are diametrically opposed to the ways that real ants are described as acting in the current explanation. Specifically, the recruitment terms can, under some conditions, make ants go to a source even if the trail to it is less well marked than the trail to the other source. To see this, consider a scenario in which the better and lesser food sources are given to a colony. Scout ants randomly search for food, and by chance a scout finds the poorer source first and lays a weakly marked trail from it back to the nest. The next ant out finds only one trail, the one leading to the poorer source, takes it, and reinforces it after feeding, and this process continues for a while with other ants. At some point a scout, or an ant losing the trail to the poorer source, finds the better source and lays a trail back to the nest from it. Now there is a forked trail, but according to the current explanation, if the branch leading to the poorer source has been marked by enough ants to be better marked than the nascent trail leading to the better source, the next ant out should take the better marked trail to the poorer source.

However, the formulations of the models guarantee that this scenario can never occur. In the models, one food source can never be discovered before another. If we attempted to follow this scenario in the models, assigning several ants to the poorer source, the models would automatically indicate that a fraction of an ant (something less than .5 of one, for example, in the model of Sumpter and Beekman, 2003) would randomly discover the better source in the same time period. Then this fraction of an ant would be multiplied by the rate constant in the recruitment term to determine how many ants would go to the better source in the next time period. These ants or fractions of ants would go to the better source even if enough ants had gone to the poorer source to make its trail better marked than the trail to the better source. Thus the ants in the models would not be following the rule in the current explanation that states that ants should take the better marked trail at a fork. This combination of allowing fractional ants, making sure that at least a fraction of an ant finds the better source in the initial time period, and a recruitment term that forces some ants to take the trail to the better source even if it is not as well marked as the one to the poorer source, allows the less well marked trail to the better source to get its foot in the door, so to speak. From then on the trail to the better source steadily gains ground, ultimately attracting more ants than the initially better-marked trail to the poorer source. Only in extreme circumstances, such as when a large contingent of ants randomly and simultaneously finds the poorer source at the instant foraging begins and no ants find the better source, will the poorer source be able to preserve its initial advantage over the better source. For example, in the Sumpter and Beekman (2003) model, nine or more ants have to find the poorer source as soon as foraging begins, with none finding the better source.

In sum, while these models do roughly approximate experimental results, in that they usually predict that nearly all the foragers will eventually go to the better source under the conditions in which they are normally tested, they do so by using formulations that sometimes must violate the current explanation and by insisting that both sources are found at the same time, even if only by fractional ants. So the question remains: can the current explanation account for the foraging decisions of real ants? In the next section we describe a series of discrete event simulations that do replicate the current explanation of how ants make food source decisions, and we use the results of the simulations to answer this question.

2 Discrete Event Simulations

We developed the discrete event simulations using Swarm, a software package specifically designed for multi-agent simulations of complex situations. In the simulations a large, two-dimensional foraging arena was created, and on this arena we placed a nest for the ants and two food sources, one higher in value than the other, at equal distances from the nest. When a specific run of a simulation started, those ants designated as scouts left the nest and began random walks in search of food. When a scout found a food source, she "ate" some of the food and returned to the nest laying a pheromone trail. The trail consisted of discrete batches of pheromone "droplets" laid on the grid of small squares composing the arena. The trail from a good source was laid with two and a half times more droplets per batch than the trail from a poorer source. When the ant returned home, she was free to once again leave the nest in search of food, but with no memory of having previously found a food source.

When an ant left the nest, she searched for food and pheromone, in that order. If there was only one pheromone trail in place, she followed it away from the nest toward the food, with a small probability of losing the trail at each step along the way. An ant losing a trail walked randomly in search of food. If two pheromone trails led away from the nest, an ant always took the trail with more droplets per batch. If an ant successfully followed a trail to the food source, she ate some food (each source was inexhaustible over the course of a run) and reinforced the trail by adding the appropriate number of droplets for the level of food source to those already existing at each grid location on her way back to the nest. When she returned to the nest, she was again ready to go out in search of food, but would do so by looking for the best marked pheromone trail, if two existed, and without memory of where she had been previously. If an ant doing a random walk in search of food came across a pheromone trail, she followed it in a direction away from the nest. The program counted the total number of ants feeding at each source over the course of a specific run of the simulation.

The first line of Table 1 gives the results of 100 individual runs of this simulation. When we first saw these results we found them very surprising. Rather than showing that most of the ants always, or even usually, visited the better food source, the results were that in about half of the runs nearly all the ants visited the better source, but in the other half of the runs nearly all the ants visited the poorer source. In other words, the ants as a whole decided on one source or the other, but which source they decided on appeared to be random, and about the same total number of ants usually visited whichever source was preferred. Unlike the continuous formulations, in which ants could split themselves into fractions and violate the principles of the current explanation, when the ants found sources randomly, as real ants do, and picked pheromone trails according to the current explanation, our simulation showed that the current explanation was not sufficient for the ants to favor better food sources over poorer ones. The problem that occurred for the current explanation was that whichever source was first found by a scout gathered an advantage, since, for some period, its trail was the only pheromone trail, before one to the other source was laid. Even though each ant finding the better source marked the trail to it with two and a half times as many pheromone droplets as to a poorer source, the trail to the better source was not able to overcome the initial advantage of the trail to the poorer source. Something like this may occur in the small number of cases in which experiments show that colonies prefer poorer sources, even though they were given a better source at the same time and distance from the nest as the poorer source. While it might be possible to reverse our results through some combination of arena size, number of scout ants, walking speed, amounts of pheromone droplets deposited for trails, etc., our results strongly suggest that real ants may be using additional mechanisms in deciding between food sources that alleviate the potential problem of scouts randomly finding a poorer source before a better one.

Table 1: Results of simulations in which ants chose the better-marked pheromone trail

Mechanisms in        Percent of runs in     Ratio of ants visiting     Average number of ants
simulation           which better source    preferred source to        visiting preferred source
                     was preferred          other source when the      when the source is...
                                            source is...
                                            Better      Poorer        Better        Poorer

No rejection,
no recruitment,      47.0                   792.5       784.9         67302.4       67198.0
no expectation

Rejection,
no recruitment,      56.0                   819.5       243.3         65480.5       39719.1
no expectation

No rejection,
recruitment,         59.0                   1839.2      820.2         79553.8       65713.6
expectation

No rejection,
recruitment,         54.0                   1919.9      1884.0        81792.7       81106.3
no expectation

Rejection,
recruitment,         60.0                   1821.6      335.4         79301.8       455589.1
no expectation
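The all-or-nothing outcome reported in Table 1 does not depend on the details of the arena: a toy model containing only the core rule (every ant takes the trail with more pheromone) already reproduces it. This is a hypothetical sketch, not the Swarm code; it ignores trail loss and random walks, and the marking amounts follow the 2.5:1 ratio used in the simulations:

```python
import random

def run(better_mark=2.5, poorer_mark=1.0, trips=500):
    """A scout randomly finds one source first; every later ant takes
    whichever trail carries more pheromone and reinforces it."""
    pher = {"better": 0.0, "poorer": 0.0}
    visits = {"better": 0, "poorer": 0}
    first = random.choice(["better", "poorer"])
    pher[first] += better_mark if first == "better" else poorer_mark
    visits[first] += 1
    for _ in range(trips):
        choice = "better" if pher["better"] > pher["poorer"] else "poorer"
        visits[choice] += 1
        pher[choice] += better_mark if choice == "better" else poorer_mark
    return visits

random.seed(4)
counts = [run() for _ in range(100)]
better_won = sum(1 for v in counts if v["better"] > v["poorer"])
print(better_won, "of 100 runs preferred the better source")
```

Despite the 2.5-fold stronger marking, whichever source is found first captures every subsequent ant in this stripped-down version, so the better source wins in only about half of the runs.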

In response to these results, we developed a number of additional simulations to see whether we could salvage the current explanation. These simulations incorporated several other behavioral mechanisms that have been observed in ants, along with the rule of deciding on the better-marked pheromone trail. The mechanisms were recruitment (an ant finding a better source activates more ants to leave the nest in search of food than one finding a poorer source), rejection (if an ant finds a better source, she is more likely to feed and leave a trail from it than if she finds a poorer source), and expectation (if activated by an ant finding a better source, an ant finding the better source is very likely to feed and lay a trail from it, but very unlikely to feed and lay a trail from a poorer source; if activated by an ant from a poorer source, an ant finding a poorer or a better source is very likely to feed and lay a trail from it). Unfortunately, these additional mechanisms, as shown in Table 1, did not significantly change our previous results. In all the new simulations, the ants still overwhelmingly concentrated on one source or the other in each run of a simulation. However, for the simulations involving either rejection or expectation, in those runs in which the ants did concentrate on the poorer source, fewer ants overall fed from the source: although many ants may have found the poorer source in those runs, they did not feed from it. In contrast, for the runs in which the better source was preferred, most of the ants finding the source would feed from it.

3 Discussion

In this paper we have demonstrated that, although the present differential equation models of foraging decisions in ants roughly fit the experimental data, they do so by allowing a number of conditions that either contradict the current explanation or are contrary to the conditions faced by real ants. These conditions included some of the ants choosing poorly marked pheromone trails in preference to better marked ones in some situations, fractions of ants that could select trails and find food, and ants that, rather than finding food sources randomly, found better sources at the same time as poorer ones. Further, when we developed a series of discrete event simulations that more closely followed the current explanation, we found that most of the ants in a colony did not usually feed from the better food source but randomly concentrated their efforts on one source or the other in different runs of the simulation.

Although we have pointed out some of the problems with the present differential equation models and offered some alternative models based upon discrete event simulation, our chief goal here is neither to condemn the present models nor to tout the superiority of our own. To the contrary, these differential equation models have been of great importance in focusing attention on the problem of food source decisions in ants, contributing to the many applications using ant-based algorithms, and more generally advancing the study of self-organization in animal systems. And we are sure that our simulations have a number of shortcomings. Instead our aim has been to suggest that distributed decision making processes in ants are a much richer and more complex phenomenon than has been realized up to now and that we have yet to discover some of the key behavioral (or other) mechanisms that allow ants to make these decisions in a robust way under varying circumstances. In particular, it seems we do not know how mass recruiting ants are still able to concentrate on better sources in those cases when poorer sources are found first (but we do have some idea of how this might be done in group recruiting ants).

More generally, our apparent lack of understanding of how ants choose between food sources of different values is a specific illustration of what has been called the “inverse problem”. In this problem we observe some, often simple, pattern of organization in a group of entities, and then try to figure out the attributes and behaviors of the entities that have allowed them to create the organizational structure. As our work here and other work on animal social systems suggest, this can be a very difficult puzzle to solve (Chase et al., in press). However, the many varieties of organization patterns and the ability to do both experiments and close observation in animals suggest that their social systems are a promising venue for developing the methodological and theoretical approaches that can aid us in this kind of reverse engineering.

References

Beckers, R., Deneubourg, J. L., Goss, S., and Pasteels, J. M. 1990. Collective decision making through food recruitment. Insectes Sociaux 37: 258–267.

Beckers, R., Deneubourg, J. L., and Goss, S. 1992. Trail laying behaviour during food recruitment in the ant Lasius niger (L.). Insectes Sociaux 39: 59–72.

Chase, I. D., Tovey, C. and Murch, P. In press. Two’s company, three’s a crowd: differences in dominance relationships in isolated versus socially embedded pairs of fish. Behaviour.

Hangartner, W. 1969. Structure and variability of the individual odor trail in Solenopsis geminata Fabr. (Hymenoptera, Formicidae). Zeitschrift für vergleichende Physiologie 62: 111–120.

Sumpter, D. J. T. and Beekman, M. 2003. From nonlinearity to optimality: pheromone trail foraging by ants. Animal Behaviour 66: 273–280.


Rules of decision making: trade-offs in collective house hunting

Anna Dornhaus, Nigel R. Franks

School of Biological Sciences, University of Bristol, Bristol, BS8 1UG, England. Corresponding author: [email protected]

Abstract

Social insect colonies often display collective behaviors that transcend those of the individual group members. House hunting in the ant species Leptothorax albipennis is an example of such a behavior. Individuals have very limited information (usually only on one alternative), but the information collected by many individuals is collated at the level of the group, such that the colony usually chooses the best of many potential nest sites available. In this collective decision-making process, the ants have to make a compromise between speed and accuracy. They flexibly adjust this compromise depending on the current need for a quick decision by tuning a single parameter (the “quorum threshold”). With a low quorum threshold, less information is collected and a quicker decision made. A quorum threshold of 1 equals individualistic decision-making; at the other extreme, the information collected by a large number of scouts is integrated to make a very careful and hence slow decision. This occurs, for example, when colonies living in an intact nest discover a superior one. We have also investigated the influence of group size on the speed and accuracy of this decision-making process, and the results indicate that larger colonies are able to make faster decisions without sacrificing accuracy.

Keywords: social insects, Leptothorax albipennis, self-organization, speed-accuracy trade-off, colony size,area measurement

1 Introduction

Making a correct decision can involve the collection and processing of a considerable amount of information. Information collection takes time. Hence there is a speed vs. accuracy trade-off in decision-making, and indeed such a trade-off is well known in psychology and the study of individual decision-making (Busemeyer and Townsend, 1993; Osman et al., 2000; Nikolic and Gronlund, 2002; Roitman and Shadlen, 2002; Chittka et al., 2003). We have examined the relationship of speed and accuracy in a collective decision-making process: the house hunting of Leptothorax albipennis ants. These ants live in fragile rock crevices, which may be frequently destroyed by weathering processes. When this happens, the colony has to look for new nest sites, choose one of them, and coordinate the move of all members to the chosen site. We show that the relative importance of speed and accuracy in this decision-making system can change, and that the behavior of individuals is adapted to the existing trade-off. This shows one of the strengths of social insect colonies, namely their ability to integrate the local information gathered by individuals such that at a collective level, a well-informed decision can be made (Franks et al., 2002). This collective effect relies on many individuals collecting and collating information.

Even single scout ants are able to examine potential new nest sites and judge their suitability by measuring several characteristics of these sites and integrating this information (Franks et al., 2003). For example, they are able to assess the floor area of a nest site (Mallon and Franks, 2000). Individual ants are thus able to compare different nest sites and make sensible choices between them. Experience and learning probably play a significant role in the performance of individuals (Langridge et al., in prep). Nevertheless, the distributed information collection employed by the colony allows much quicker surveys of a larger area, since
many scouts can search for nest sites in parallel. Indeed we find that the amount of information used to make a decision varies with colony size: larger colonies make faster decisions. A sufficient group size may therefore be crucial to make the collective decision-making algorithm efficient. With smaller colony sizes, the ants may have to rely more on the decisions of individual scouts.

2 Collective house-hunting

In house-hunting, as in any decision-making process, information is usually collected on potential alternatives, this information is evaluated, and a decision is then made (Franks et al., 2002). This is true as much for humans as it is for social insects. In the case of a colony of bees or ants, however, the problem has to be solved on the level of the collective, a group of individuals, where each member may have a slightly different perception of the relative costs and benefits of the available options, but indecisiveness or lack of consensus can carry high costs. If the decision is not made rapidly, the entire colony may be put at risk, particularly if the old nest has already become uninhabitable. Likewise, fragmentation of the colony caused by lack of consensus can prevent the colony from resuming its normal life. The colony thus has to make a swift decision, which nevertheless makes use of as much information as possible, and needs to achieve a consensus among colony members.

We have studied an example of a decision-making strategy that fulfills these goals. Ants of the species Leptothorax albipennis live in colonies of up to a few hundred individuals. These colonies inhabit rock crevices, shallow spaces between flakes of rock, which may break off and thus expose the colony to predators and harsh climatic conditions. If this happens, colonies are able to find new suitable nest sites, choose one of these, and move all ants, brood and the queen there within a few hours. These colony emigrations consist of three distinct phases. (1) First scout ants leave the nest to search for new nest sites. (2) As soon as a scout ant finds a nest site, the ant evaluates the site and then returns to the colony. After a time interval, which is shorter for better nest sites, the scout will start recruiting other ants to its chosen site using tandem runs. In a tandem run, one ant is led by the scout ant, with the follower keeping antennal contact. This is a slow and laborious process, since the two ants have to find each other again every time the antennal contact is interrupted. Nevertheless, ants are recruited to the new nest sites, and then return to the old nest to start recruiting themselves. This causes a positive feedback, with better sites attracting more ants more quickly than low-quality sites. At some point, the number of ants present in one of the new nest sites exceeds the quorum threshold. If more ants than this quorum are present in a nest site, the ants change their recruitment technique: they switch from tandem running to social carrying. In social carrying, an individual is carried by another individual, with the disadvantage that the carried ant can probably not learn the route to the new nest, but the advantage of proceeding at about 3 times the speed of tandem running. The start of social carrying indicates the choice of one nest site by the colony. The quorum at which carrying starts is usually reached first in the best site available, since that will have attracted most ants and the earliest tandem runs. After rapid carrying has started, the positive feedback for the chosen site accelerates to a degree that other sites are usually soon abandoned. In the ensuing phase (3) of the emigration process, all remaining adult ants and brood items are quickly transported to the chosen new nest by means of social carrying (Mallon et al., 2001; Franks et al., 2002; Pratt et al., 2002).

In their house-hunting decisions, L. albipennis thus make use of scouts who individually look for and assess the quality of available options. The information collected by these scouts, however, is then evaluated on a collective level, by employing two different recruitment strategies. The use of a quorum threshold usually ensures that a collective decision is made and a consensus reached.


Figure 1: When we experimentally put ants under time pressure by creating a harsh environment (dry air flow), they lowered their quorum thresholds and made faster but more error-prone decisions.

3 Nest-quality assessment

The scout ants, when they find and explore a new potential nest site, have to evaluate the quality of that site. There are a number of properties that a cavity must have in order to be suitable as a nest site for the ants. We know that light level, cavity floor area and height, and the width of the entrance(s) influence the ants’ judgment of a site’s quality (Franks et al., 2003). The optimal cavity is dark, has sufficient floor area and height, and only a very narrow (easy to defend) entrance. It is likely that more factors play a role. Individual scouts must be able to measure all of these things to judge the overall quality of a cavity as a nest site. One of the more difficult measurements is estimating the floor area. Imagine standing in an unknown, dark cavity of unknown shape and unknown number of entrances, and having to estimate its floor area! Adding to this is the fact that the cavity, in human terms, would be of a size to house a few hundred people. L. albipennis ants solve this problem by using individual-specific pheromone markings and an algorithm called “Buffon’s needle.” When the scout first explores the cavity, it lays a pheromone trail while wandering around in it. It then leaves the cavity and comes back a little later to explore it again. On this second visit, it measures the number of times it crosses its own pheromone trail laid earlier. From the number of crossings, an estimate of the area of the cavity can be computed (Mallon and Franks, 2000; Mugford et al., 2001).
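
The crossing-count estimate can be illustrated with a small Monte Carlo sketch. It rests on the integral-geometry relation behind Buffon's needle: two random paths of lengths L1 and L2 confined to an area A cross about 2·L1·L2/(π·A) times on average, so counting N crossings gives the estimate A ≈ 2·L1·L2/(π·N). The random-walk model and cavity dimensions below are illustrative assumptions, not measurements of real scouts.

```python
import math
import random

def trail(n_steps, step, w, h, rng):
    """A meandering walk with reflecting walls; returns its line segments."""
    x, y = rng.uniform(0, w), rng.uniform(0, h)
    theta = rng.uniform(0, 2 * math.pi)
    segs = []
    for _ in range(n_steps):
        theta += rng.uniform(-1.0, 1.0)          # random turning
        nx = x + step * math.cos(theta)
        ny = y + step * math.sin(theta)
        if not 0 <= nx <= w:                     # bounce off side walls
            theta = math.pi - theta
            nx = x + step * math.cos(theta)
        if not 0 <= ny <= h:                     # bounce off top/bottom walls
            theta = -theta
            ny = y + step * math.sin(theta)
        segs.append(((x, y), (nx, ny)))
        x, y = nx, ny
    return segs

def intersects(p, q, r, s):
    """True if segments pq and rs properly cross."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)

rng = random.Random(42)
w = h = 30.0                                     # true area: 900 square units
first = trail(400, 1.0, w, h, rng)               # first visit: lay the trail
second = trail(400, 1.0, w, h, rng)              # second visit: count crossings
n = sum(intersects(*a, *b) for a in first for b in second)
L1 = L2 = 400 * 1.0                              # path lengths
area_est = 2 * L1 * L2 / (math.pi * max(n, 1))
print("crossings:", n, "estimated area:", round(area_est, 1))
```

A correlated, wall-bounded walk only approximates the idealized random curves of the theorem, so individual estimates scatter around the true area; the ants face the same sampling noise, which is why the accuracy of the rule of thumb was itself worth studying (Mugford et al., 2001).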

This simple method allows individual scouts to arrive at estimates of cavity size without constructing a map or employing complicated distance-measuring capabilities. One might think that the second visit to the cavity is a waste of time, because theoretically the method would also work if the ant just explored the site once, and laid a pheromone trail at the same time as measuring how often it crossed the trail (one-pass strategy). But the ants use two visits to perform their area measurement, completely separating the trail-laying from the measuring phase (two-pass strategy). Using a simulation of this process, we could show that in fact ants would not save any time if they did not separate these phases and explored the cavity only once (Marshall et al., 2003). Considering that there is thus no obvious advantage to using a one-pass strategy, ants might use a two-pass strategy to improve their ability to perceive the pheromone trails, which might be difficult if they were producing new, presumably stronger trails at the same time as measuring how often they intersected with old ones. In addition, the two-pass strategy actually saves pheromone, because the trail is only laid in half of the time the ant spends exploring. The two-pass strategy actually used by the ants thus seems to be an elegant solution to the problem of measuring a large area with only a short perception radius.


4 Decisions with speed and accuracy

Any decision is a choice between alternatives. The consequences of choosing one or another of these alternatives may be good or bad, or, more precisely in a biological context, they might be associated with different reproductive success (which is what biological organisms are selected to maximize). In the case of ant house-hunting, the choice of a nest site may influence the colony’s fitness in a number of ways. The quality of the new home determines not only future exposure to predators and costs in terms of building and thermoregulation, but the nest’s size may also limit colony growth, and a fragile nest may make an early subsequent move necessary (Dornhaus et al., in press). Therefore, when choosing a new nest, the colony’s future reproductive success is critically influenced by its ability to select the best of alternative potential nest sites. On the other hand, the colony is also under pressure to make a quick decision. If the colony’s old nest has deteriorated, any time taken to make a decision extends the time that the queen and brood are exposed or inappropriately protected in the old nest. If queen or brood are lost during the emigration process, the colony risks losing all reproductive potential. The house-hunting decision thus has to be made with both speed and accuracy. However, these two factors represent a trade-off, as the information collection required to make an optimal decision takes time. A compromise thus has to be made between deciding quickly and surveying all available information.

Figure 2: Individualistic decisions, where scouts start carrying brood to a new nest without waiting for other ants to discover and accumulate in it, only occur in harsh conditions, when speed gains a higher importance relative to accuracy.

Considering that we expect ant colonies to maximize the speed of their emigrations, the house-hunting of L. albipennis seems to involve unnecessary built-in time lags. First, when they have discovered a potential nest site, scouts wait for a certain time period before they start recruiting with tandem runs. This time period corresponds inversely to the quality of the discovered site, with shorter waits for better sites. Second, the recruitment method of social carrying is more than 3 times faster than tandem running, so why is tandem running used at all? Both of these kinds of apparently deliberate time lags are part of the collective decision-making process. The first, the delay of the start of recruitment, serves to make the decision of the colony more dependent on the quality of sites than on their order of discovery. A good nest site, even when discovered later than other sites, attracts tandem runs and thus increasing numbers of ants earlier than sites of lower quality (Franks et al., 2002). And delaying the switch from tandem running to social carrying allows scouts more time to search for additional alternative sites. By adjusting the quorum threshold, ants can fine-tune how much time they will invest into searching for more potential nest sites (Franks et al., in press). Under
imminent threat, they will set a lower threshold, which leads to a quicker switch from tandem running to carrying and thus a quicker decision; however, this comes at a cost to accuracy (Fig. 1).

In experimentally induced harsh environmental conditions, some scouts set their quorum threshold to one, thus effectively making individualistic decisions rather than integrating the information collected by other scouts (Fig. 2). If, on the other hand, the colony is housed in an intact nest, and thus has ample time, the ants set a very high quorum threshold (Dornhaus et al., in press). The colony will still move if a better site becomes available, but a lot of time is taken before such a decision is made.
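
A toy simulation conveys how the quorum threshold trades speed against accuracy. Two candidate sites compete, and the better one supports faster recruitment (its tandem runs succeed more often). All rates and colony sizes below are invented for illustration and are not fitted to L. albipennis data.

```python
import random

def emigrate(quorums, n_scouts=20, p_find=0.02, p_recruit=(0.3, 0.1), rng=None):
    """One emigration with two sites; site 0 is the better one.

    Returns {quorum: (decision_time, chosen_site)} for each threshold,
    evaluated on the same run so that only the threshold differs.
    """
    rng = rng or random.Random()
    committed = [0, 0]
    free = n_scouts
    out = {}
    t = 0
    while len(out) < len(quorums) and t < 5000:
        t += 1
        gains = [0, 0]
        for site in (0, 1):
            # independent discoveries by uncommitted scouts
            gains[site] += sum(rng.random() < p_find for _ in range(free))
            # each committed ant may lead one tandem run per step
            gains[site] += sum(rng.random() < p_recruit[site]
                               for _ in range(committed[site]))
        while gains[0] + gains[1] > free:        # cannot exceed the free pool
            gains[0 if gains[0] >= gains[1] else 1] -= 1
        for site in (0, 1):
            committed[site] += gains[site]
        free -= gains[0] + gains[1]
        for q in quorums:
            if q not in out and max(committed) >= q:
                out[q] = (t, committed.index(max(committed)))
    for q in quorums:                            # fallback if time runs out
        out.setdefault(q, (t, committed.index(max(committed))))
    return out

rng = random.Random(7)
trials = 300
times = {2: [], 10: []}
correct = {2: 0, 10: 0}
for _ in range(trials):
    for q, (t, site) in emigrate((2, 10), rng=rng).items():
        times[q].append(t)
        correct[q] += (site == 0)
for q in (2, 10):
    print(f"quorum {q}: mean time {sum(times[q]) / trials:.1f}, "
          f"accuracy {correct[q] / trials:.2f}")
```

A threshold of 2 is decided by whichever site happens to attract two scouts first, so it is fast but error-prone; a threshold of 10 waits for the recruitment feedback to amplify the quality difference, echoing the pattern in Fig. 1.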

5 Group size

In addition to the quorum threshold, the number of workers in a colony is an important factor influencing the speed with which a collective decision is made. In colonies of naturally varying size, we found that larger colonies make quicker decisions. They do this by setting the number of scouts necessary to start the emigration to a new site lower in relation to colony size than smaller colonies do (Fig. 3). Small colonies are probably unable to do this, because reducing the already low (absolute) number of scouts further would reduce the accuracy of the decision made. Larger colonies, however, have such a high absolute number of scouts that they can afford to have a lower proportion, i.e. a lower relative quorum threshold, than smaller colonies. This enables them to make quick decisions without sacrificing accuracy in the decision-making process.

Figure 3: When we tested colonies with different numbers of adult workers, the larger colonies had higher quorum thresholds (open symbols), but lower relative quorum thresholds (the quorum divided by the colony size; filled, gray symbols). Colonies with more workers also made quicker decisions (measured as the time from discovery to the start of brood transport to a new nest site).

6 Conclusions

Decision-making is fundamental in animal behavior and in psychology. Moreover, beyond biology, economics can be defined as the science of decision-making. Similarly, in engineering and software development, artificial systems often have to make autonomous and flexible choices. In the ants we have studied we find subtle blends of individual and collective decision-making and associated flexibility in the so-called speed-accuracy trade-off. Ant colonies provide unrivalled opportunities to study individual and collective decision-making and how the two interact. Much of their decision-making systems involve visible interactions and hence
they are uniquely susceptible to experimental manipulations and the rigorous testing of hypotheses and their qualitative and quantitative predictions. One key issue for now and the future will be colony size. What is the best size of organization for a given decision-making system and, vice versa, what is the best decision-making system for a given size of organization? For social vertebrates, such as ourselves, colonies of insects at first appear to be alien societies, but as ever there are, and will be, generic lessons to be gleaned from their similarities, differences and diversity.

Acknowledgements

A.D. wishes to thank the DFG for funding (DO 774/-1).

References

Busemeyer, J. R., Townsend, J. T. 1993. Decision field theory: A dynamic cognitive approach to decision-making in an uncertain environment. Psychol. Rev. 100: 432–459.

Chittka, L., Dyer, A. G., Bock, F., Dornhaus, A. 2003. Bees trade off foraging speed for accuracy. Nature 424: 388.

Dornhaus, A., Franks, N., Hawkins, R. M., Shere, H. N. S. 2003. Ants move to improve: colonies of Leptothorax albipennis emigrate whenever they find a superior nest site. Anim. Behav., in press.

Franks, N., Dornhaus, A., Fitzsimmons, J., Stevens, M. 2003. Speed and accuracy in ant decision making. Proc. R. Soc. B, in press.

Franks, N., Pratt, S., Mallon, E., Britton, N., Sumpter, D. 2002. Information flow, opinion-polling and collective intelligence in house-hunting social insects. Phil. Trans. R. Soc. B 357: 1567–1583.

Franks, N., Mallon, E., Bray, H. E., Hamilton, M. J., Mischler, T. C. 2003. Strategies for choosing between alternatives with different attributes: Exemplified by house-hunting ants. Anim. Behav. 65: 215–223.

Mallon, E., Franks, N. 2000. Ants estimate area using Buffon’s needle. Proc. R. Soc. B 267: 765–770.

Mallon, E., Pratt, S., Franks, N. 2001. Individual and collective decision-making during nest site selection by the ant Leptothorax albipennis. Behav. Ecol. Sociobiol. 50: 352–359.

Marshall, J., Kovacs, T., Dornhaus, A., Franks, N. R. 2003. Simulating the evolution of ant behaviour in evaluating new nest sites. ECAL conference paper.

Mugford, S. T., Mallon, E., Franks, N. R. 2001. The accuracy of Buffon’s needle: A rule of thumb used by ants to estimate area. Behav. Ecol. 12: 655–658.

Nikolic, D., Gronlund, S. D. 2002. A tandem random walk model of the SAT paradigm: Response times and accumulation of evidence. Br. J. Math. Stat. Psychol. 55: 263–288.

Osman, A., Lou, L. G., Muller-Gethmann, H., Rinkenauer, G., Mattes, S., Ulrich, R. 2000. Mechanisms of speed-accuracy trade-off: Evidence from covert motor processes. Biol. Psychol. 51: 173–199.

Pratt, S., Mallon, E., Sumpter, D., Franks, N. 2002. Quorum sensing, recruitment, and collective decision-making during colony emigration by the ant Leptothorax albipennis. Behav. Ecol. Sociobiol. 52: 117–127.

Roitman, J. D., Shadlen, M. N. 2002. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 22: 9475–9489.


Automatic Identification of Bee Movement

Adam Feldman and Tucker Balch

Georgia Institute of Technology, Atlanta, Georgia 30332, USA

Abstract

Identifying and recording animal movements is a critical, but time-consuming, step in behavior research. The task is especially onerous in studies involving social insects because of the number of animals that must be observed simultaneously. To address this, we present a system that can automatically analyze animal movement, and label it, on the basis of examples provided by a human expert. For our experiments, activity in an arena is recorded on video, that video is converted into location information for each animal by a vision-based tracker, and then numerical features such as velocity and heading change are extracted. The features are used in turn to label the sequence of movements for each observed animal. Our approach uses a combination of k-nearest neighbor classification and hidden Markov model techniques. The system was evaluated on several hundred honey bee trajectories extracted from a 15 minute video of activity in an observation hive. Movements were labeled by hand, and also labeled by our system. Our system was able to label movements with 81.5% accuracy in a fraction of the time it would take a human.

Keywords: Behavior Recognition, Bee Movements, Hidden Markov Models, Biological Inspiration.

1 Introduction

Bees perform many behaviors while in the hive. Currently, when a biologist (or other researcher) wants to study these behaviors, video footage must be studied and hand-labeled. This requires watching the video multiple times, and is an arduous and time-consuming process. If these behaviors could be recognized and identified by a software system, research in this area could be greatly accelerated. By using such a system, a human would need only to label a small percentage of the data as a “training set,” while the system would identify the labels of the remaining data. This would save time, which could be better used analyzing the automatically labeled data. Our goal in this work is to create such a system.

The behaviors of interest are sequential activities that consist of several physical motions. For example, bees commonly perform waggle dances. These waggle dances consist of a sequence of motions: arcing to the right, waggling (consisting of walking in a generally straight line while oscillating left and right), arcing to the left, waggling, and so on (Seeley, 1995). In this sense, “dancing” is a behavior, as are “following” and “active hive work.” A follower is a bee who follows a dancer, but does not perform the waggle segments, while a bee accomplishing active hive work is neither a dancer nor a follower, yet moves around with apparent purpose. On the other hand, arcing, waggling, moving straight, and loitering are examples of motions, which are sequenced in various ways to produce behaviors. Before distinct behaviors can be labeled, a software system must first recognize motions. The system described in this paper is designed to label a bee’s motions (and such labels will be used to identify behaviors in later work).

There are several components to this system. First, bees in an observation hive are recorded. Then, the tracker extracts the x- and y-coordinate information of each bee. Some of this data is hand-labeled by a person, to be used as the training set, before being passed on to the k-nearest neighbor classifier. The output is then passed, as input, to the hidden Markov model, which finally outputs the predicted labels of the data set.

The first step in learning to identify motions or behaviors is to enable the system to “see” the bees. This is done by first videotaping bees in an observation hive. Tracking software (discussed below) is used to
track each individual bee, and record its positional information at each time step. From this raw location, features (such as velocity and heading change) are extracted. One aspect of this work was to determine which features were the most useful in deciding the correct motion label for a given time step.
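
As an illustration, the two features named above can be computed directly from the tracker's 30 Hz position stream. This is a minimal sketch of the feature-extraction step, not the authors' actual code; the units and the angle wrap-around handling are our assumptions.

```python
import math

FPS = 30.0  # tracker frame rate: one (x, y) fix every 1/30 s

def features(track):
    """Per-step speed and heading change from a list of (x, y) positions.

    Returns one (speed, heading_change) pair per segment of the track;
    the heading change of the first segment is defined as zero.
    """
    out = []
    prev_heading = None
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) * FPS        # distance units per second
        heading = math.atan2(dy, dx)
        if prev_heading is None:
            dh = 0.0
        else:
            # wrap the difference into [-pi, pi) so left/right turns compare fairly
            dh = (heading - prev_heading + math.pi) % (2 * math.pi) - math.pi
        out.append((speed, dh))
        prev_heading = heading
    return out

# A bee walking straight: constant speed, zero heading change.
print(features([(0, 0), (1, 0), (2, 0)]))
```

Sign and magnitude of the heading change then separate arcing left (positive turns), arcing right (negative turns), and straight or waggling motion.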

After the features are generated, they are used as input for a k-nearest neighbor classifier, which attempts to correctly label each data point. The labels used are:

• ARCING LEFT (AL) The bee is moving in a counter-clockwise direction

• ARCING RIGHT (AR) The bee is moving in a clockwise direction

• STRAIGHT (S) The bee is moving steadily in a fairly straight line

• WAGGLE (W) The bee is moving straight while oscillating left and right

• LOITERING (L) The bee is moving very slowly in a non-specific direction

• DEAD TRACK (D) The bee is not moving at all

The primary issues that were examined in this area were choosing the best features with which to represent the data and finding appropriate parameters that result in the highest overall accuracy.

Finally, a Hidden Markov Model (HMM) is used to increase accuracy by smoothing the labels across the data set. This is accomplished by using the output from the classifier as input into the Viterbi algorithm over a fully connected HMM. In this way, incorrect classifications that are statistically unlikely can be adjusted. For example, if there is a series of ARCING RIGHT data points with a single ARCING LEFT in the middle, it is likely that the single ARCING LEFT is supposed to be an ARCING RIGHT, even though the features quantitatively designate an ARCING LEFT. The HMM technique will correct mistakes of this nature.

Our hypothesis is that this system will provide a means of labeling unknown data with reasonable accuracy. Note that since this recognizer is part of a larger system that has the goal of identifying behaviors automatically, it is not necessary to be able to correctly label every single frame. If a majority of individual motions can be labeled properly, then it will be possible to infer the correct behavior (dancer, follower, etc.).

2 Background and Related Work

The k-nearest neighbor (kNN) classifier is a classification technique that attempts to classify each data point based on its location in n-dimensional feature space (where n is the number of features). As with all supervised learning algorithms, the data set to be classified is broken into two parts – a training set and a test set. The training set is manually labeled, and is used to train the system to be able to classify the rest of the data (the test set). The training points are then all plotted in the feature space. Values are first normalized so that every dimension of the feature space is uniform (such as from 0 to 1).

Classification works by plotting each test set point in the populated feature space and finding the k nearest neighbor points (geometrically). Each of these points increments the score associated with its own label by a value proportional to its inverse squared distance from the test point. Whichever label has the highest score after considering all k points is the label that is given to the test point. In this way, test set points are classified based on the labels of the points that they are near in the feature space (Mitchell, 1997).
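
The weighted vote just described is straightforward to implement. In this sketch each of the k neighbors is weighted by the inverse of its squared distance, as in the text; the feature values and labels are made up for illustration.

```python
from collections import defaultdict

def normalize(points, lo, hi):
    """Min-max scale each feature dimension to the unit interval."""
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(p, lo, hi)) for p in points]

def knn_label(train, query, k=3):
    """train: list of (feature_tuple, label) pairs, already normalized.

    Votes of the k nearest neighbors are weighted by inverse squared distance.
    """
    by_dist = sorted((sum((a - b) ** 2 for a, b in zip(p, query)), label)
                     for p, label in train)
    score = defaultdict(float)
    for d2, label in by_dist[:k]:
        score[label] += 1.0 / (d2 + 1e-12)   # epsilon guards exact matches
    return max(score, key=score.get)

# Two tight clusters of labeled motion features (made-up numbers).
train = [((0.0, 0.0), 'LOITERING'), ((0.1, 0.0), 'LOITERING'),
         ((1.0, 1.0), 'WAGGLE'), ((0.9, 1.0), 'WAGGLE')]
print(knn_label(train, (0.05, 0.0)))   # query near the first cluster
```

The inverse-squared-distance weighting means a single very close neighbor can outvote two distant ones, which suits tightly clustered motion features.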

This technique works well, but has a limitation when applied to this purpose. k-nearest neighbor considers each data point individually, without considering the sequence of data points as a whole. Therefore, for our application, we also employ Hidden Markov Models to take advantage of this time series information.

If we assume an observed agent acts according to a Markov model, we can employ HMM-based approaches to identify its behavior. Hidden Markov Models (HMMs) can be used as models of sequenced behavior. They consist of a series of states, observations and transitions. The states represent the topography of the model, with the model being in one state at any given time. The observations correspond to the output of the system being modeled. For each state, there is a probability of each observation occurring. Additionally, HMMs require a probability table for the initial state. This table gives the probability of beginning any sequence in each state. An HMM can be thought of as a graph where each state is a node and each transition with


Figure 1: Overview of our system.

non-zero probability is a link. Frequently, an HMM is given a topology that reflects the behavior it is trying to model (the top diagram in Figure 3 models a waggle dance).

Once the parameters of the HMM are created, it can be used, by applying the Viterbi algorithm, to answer this question: given an observation sequence, what is the most likely state sequence which created the given observation sequence? The Viterbi algorithm takes as input a specified HMM (states, observations, and probability distributions) and an observation sequence. It returns the most likely state sequence through the HMM which created that observation sequence (Rabiner, 1989).
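
A compact version of this computation over a fully connected two-state HMM shows the smoothing effect described in the introduction. The transition and emission probabilities below (0.9 "sticky" self-transitions, an 80%-reliable classifier) are invented for the example and are not the parameters used in our system.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for obs (log-space Viterbi)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[-1][r] + math.log(trans_p[r][s]))
            col[s] = V[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):   # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return path[::-1]

states = ('AR', 'AL')
start = {'AR': 0.5, 'AL': 0.5}
trans = {'AR': {'AR': 0.9, 'AL': 0.1}, 'AL': {'AR': 0.1, 'AL': 0.9}}  # sticky
emit = {'AR': {'AR': 0.8, 'AL': 0.2}, 'AL': {'AR': 0.2, 'AL': 0.8}}   # 80% reliable
noisy = ['AR', 'AR', 'AR', 'AL', 'AR', 'AR']   # classifier output with one glitch
print(viterbi(noisy, states, start, trans, emit))
```

Run on the example sequence, the single ARCING LEFT glitch is relabeled ARCING RIGHT: staying in AR and paying the 0.2 emission penalty (0.9 · 0.2 · 0.9) beats briefly switching states (0.1 · 0.8 · 0.1).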

Traditionally, hidden Markov models have been used in speech recognition tasks. However, they can also be used for any form of gesture recognition. Unfortunately, most available HMM toolkits are geared for speech recognition and require adaptation for general gesture recognition. In light of this, the Georgia Tech Gesture Toolkit GT2k (Westeyn et al., 2003) was created. It was designed as an all-purpose gesture recognition toolkit, and supports such projects as American Sign Language recognition (Brashear et al., 2003).

Another type of behavior recognition was studied by Kwun Han and Manuela Veloso (Han and Veloso, 1999). They examined identifying the behavior of autonomous robots, as applied to robotic soccer. Their framework uses hidden Markov models to recognize the behaviors of the robotic agents.

3 Approach

Our system consists of several components. Figure 1 provides an overview, illustrating the flow of data from one component to the next. First, a video camera records bees in the observation hive. This video is passed to a tracker, which extracts coordinate information to be used by the human labeler (creating the training set) and then by the kNN classifier. The output of the kNN classifier is used as an observation sequence by the Viterbi algorithm (with an HMM) to generate the most likely state sequence. This final sequence constitutes the labels determined by the system.

Tracking software (Khan et al., 2003) is necessary to convert the bee videos into data that can be used by other software. In our experiments, some bees were removed from the hive and individually painted, by applying a drop of brightly colored paint (such as red or green) to each bee's back. A video camera was then trained on a section of the hive, and a recording was created. The tracker is then applied to the recording. For each frame of the video, the tracker is able to identify the location of each painted bee that is visible. Since the speed of the video is 30 frames per second, we have the coordinate information of each (visible) painted bee every 0.033 seconds. This is enough information to get a clear picture of the bee's movements.

The TeamView software (Figure 2) is used to visualize and hand-label the data sets. The files that contain the x- and y-coordinate information (from the tracker) are loaded into TeamView. When the files are played, the main viewing window displays the position of each bee currently in the field. The lines behind each bee are a trail, showing where the bee has been over the last x frames (where x is definable by the user). The labeling options allow a user to mark a segment of the video and apply any label to a specific bee. In this way, it is possible to label the motions of each bee across the entire data set. Further, once data is labeled, the labels will be displayed next to the bee they are associated with. The advantage of using this software is the speed with which a human can label the data, as compared to the more traditional pen-and-paper method of using a stopwatch and the original video.

3.1 Data Generation, Feature Extraction and Classification

The data used in this system begins as videos of bees in the hive, prepared for analysis by the tracker, as discussed above. Once the coordinate information for each tracked bee is obtained from the tracker, basic features that will be used to determine what motion the bee is performing are extracted. All features are calculated for each tracked bee during every frame in which it is visible. Since all values will be normalized, the units of measurement can be disregarded. Seven features were extracted and examined for their usefulness (where t is the current frame in time):

• Instantaneous Speed (v0) from time t-1 to t

• Speed over a Window (v1) from t-3 to t+3

• Raw Heading (h0) from t to t+1

• Heading Change over a Small Window (h1) from t-1 to t+1

• Heading Change over a Large Window (h2) from t-20 to t+20

• Speed times Heading (sh0): h1 multiplied by v0

• Average Speed times Heading (sh1) average of sh0 values from t-5 to t+5
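The seven features can be computed directly from the per-frame coordinates. The paper does not fully specify how heading change is measured, so the sketch below takes one plausible reading (heading of the last step in the window minus heading of the first step, wrapped to [-pi, pi]); all function names are our own:

```python
import numpy as np

def _heading(x, y, i, j):
    """Direction of travel between frames i and j."""
    return np.arctan2(y[j] - y[i], x[j] - x[i])

def _speed(x, y, i, j):
    """Mean displacement per frame between frames i and j."""
    return np.hypot(x[j] - x[i], y[j] - y[i]) / (j - i)

def _turn(x, y, a, b):
    """Heading change over the window [a, b], wrapped to [-pi, pi]."""
    d = _heading(x, y, b - 1, b) - _heading(x, y, a, a + 1)
    return (d + np.pi) % (2 * np.pi) - np.pi

def features(x, y, t):
    """The seven per-frame features for one tracked bee at frame t.
    x, y are coordinate arrays; t must leave room for the largest window."""
    v0 = _speed(x, y, t - 1, t)              # instantaneous speed
    v1 = _speed(x, y, t - 3, t + 3)          # speed over a window
    h0 = _heading(x, y, t, t + 1)            # raw heading
    h1 = _turn(x, y, t - 1, t + 1)           # heading change, small window
    h2 = _turn(x, y, t - 20, t + 20)         # heading change, large window
    sh0 = h1 * v0                            # speed times heading
    sh1 = float(np.mean([_turn(x, y, u - 1, u + 1) * _speed(x, y, u - 1, u)
                         for u in range(t - 5, t + 6)]))
    return {"v0": v0, "v1": v1, "h0": h0, "h1": h1, "h2": h2,
            "sh0": sh0, "sh1": sh1}
```

For a bee moving in a straight line at constant speed, all heading-change features come out zero, as expected.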

Before kNN classification can be used, the appropriate features must be determined. From the information generated by the tracker, seven features are available. It is possible to use all seven of these features; however, it is beneficial to reduce this number if not all features are useful in classification. Reducing the number of features (and therefore the dimensionality of the feature space) results in simpler and quicker computation, greatly reducing the running time of the system. Also, in some cases, certain dimensions are worse than not helpful: they can actually be harmful to classification. This is because two points close to each other in a dimension that does not affect labeling will seem closer together in feature space than if that dimension were not included. For example, bee color has nothing to do with what motion a bee is performing, so it would not be a useful feature. Yet by including it, two bees of similar color performing different motions may appear (in feature space) to be more similar than two bees that are performing the same motion (and therefore warrant the same label) but are very different colors.

To determine which features are helpful and which are useless (or harmful) in determining the label of a data point, a brute-force method was applied. Every combination of the seven available features, from each one individually to all seven together, was tested by applying the kNN algorithm to a large training set. The combination of features that resulted in the highest accuracy (defined as the percent of the test points labeled correctly) was considered the most useful, and those are the only features used in the rest of the experiments.
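The brute-force search is simple to express. In the sketch below, `accuracy_of` stands in for a full run of the kNN classifier restricted to the given feature subset; that callback interface is our assumption, not the authors' code:

```python
from itertools import combinations

def best_feature_subset(all_features, accuracy_of):
    """Exhaustively score every non-empty subset of features.
    accuracy_of(subset) is assumed to run the classifier using only those
    feature dimensions and return the fraction of points labeled correctly."""
    best_subset, best_acc = None, float("-inf")
    for r in range(1, len(all_features) + 1):
        for subset in combinations(all_features, r):
            acc = accuracy_of(subset)
            if acc > best_acc:                  # keep the best subset seen so far
                best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

With seven features this evaluates 2^7 - 1 = 127 subsets, matching the count reported in the results.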

In our experiments, the training set is made up of 200 points of each type of motion. This ensures fair representation, despite frequency disparities among the labels (unlike some other methods of selecting the training set). The importance of this can be seen in the infrequency of our most useful label, WAGGLE. This label is very telling due to its appearance only during a dance. However, WAGGLE points make up only 0.1% of the data.

3.2 HMM Smoothing

The kNN algorithm is very good at classifying data points based on features that are similar in value to those in the training set. However, there are several reasons why the correct label does not directly reflect the features. For example, often while a bee is arcing right, it will jitter, causing the features to look like there are some frames of loitering or arcing left in the middle. In this case, the classifier will label these frames differently. What we want is to smooth these places where the data isn't representative of what is really going on. Since the kNN classifier only considers each point individually, this time series information is lost. Thus, we turn to Hidden Markov Models.

Observation Probabilities
     AL    AR    D     L     W     S
AL   .529  .040  .003  .140  .001  .000
AR   .127  .765  .011  .060  .003  .000
D    .000  .000  .982  .017  .000  .001
L    .000  .000  .002  .997  .000  .001
W    .015  .014  .002  .000  .968  .001
S    .002  .003  .007  .016  .001  .971

Figure 3: The HMM our system learned, after removing transitions with a probability less than 0.005 (left), and the observation probability table determined in the experiment discussed below (right). Cell (x, y) of the observation table shows the probability of observing x while being in state y.

Although many HMMs use a hand-specified topology, we used a fully connected HMM, as we would like our system to learn the topology automatically; we want to use the HMM to statistically smooth the labels we have already determined with the kNN classifier. Therefore, we connect all of the states, and use the training data to determine the probability of each transition (see Figure 3). It should be noted that this technique may cause certain transition probabilities to drop to zero, so the HMM may no longer be fully connected.

Once the HMM is specified, it is used by the Viterbi algorithm to determine the most likely state sequence for a given observation sequence. It does this by using time series information to correct glitches which are statistically unlikely. For example, if there is a single ARCING LEFT label in the midst of a series of ARCING RIGHT labels, the Viterbi algorithm will decide that the ARCING LEFT is an observation witnessed from the ARCING RIGHT state, since the low transition probabilities between ARCING LEFT and ARCING RIGHT make it very unlikely that the state changed twice here. The observation sequence given to the algorithm is actually the output from the kNN classifier.
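The smoothing step can be illustrated with a standard log-space Viterbi decoder. This is a generic sketch, not the authors' implementation; states and observations are assumed to be integer-coded:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for `obs` under an HMM.
    pi[i]: initial probability of state i; A[i, j]: transition i -> j;
    B[i, o]: probability of emitting observation o from state i."""
    n_states, T = A.shape[0], len(obs)
    with np.errstate(divide="ignore"):           # log(0) -> -inf is fine here
        lpi, lA, lB = np.log(pi), np.log(A), np.log(B)
    logp = np.empty((T, n_states))               # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)    # backpointers
    logp[0] = lpi + lB[:, obs[0]]
    for t in range(1, T):
        cand = logp[t - 1][:, None] + lA         # cand[i, j]: come from i, enter j
        back[t] = np.argmax(cand, axis=0)
        logp[t] = cand[back[t], np.arange(n_states)] + lB[:, obs[t]]
    path = [int(np.argmax(logp[-1]))]            # best final state, then trace back
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With sticky transition probabilities, a lone discordant observation in the middle of a run is absorbed into the surrounding state, which is exactly the glitch-correction behavior described above.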

4 Methods

To test this classification system, we began with a data set consisting of fifteen minutes of video. The tracker was used to extract the features, while TeamView was used for hand labeling. There were three human labelers, each labeling five minutes (1/3) of the data. The data was then broken into a training set, consisting of the last one third of the data, and a test set, consisting of the first two thirds. The test set was put aside for accuracy validation after training the system.

First, the training set was prepared for use by the kNN classifier by having 200 points of each label randomly extracted and placed in feature space. The remainder of the training set was then labeled using the technique described above. These labels, along with the manually determined "correct" labels, were then examined to determine the observation and transition tables and the initial state probabilities.
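One way to read off these tables is by counting over the training sequence. The sketch below assumes our reading of the setup: the hidden states are the hand-assigned labels, the observations are the kNN outputs for the same frames, and the smoothing constant is our own addition to avoid zero rows:

```python
import numpy as np

def estimate_hmm(true_labels, knn_labels, n_labels, smooth=1e-6):
    """Count-based HMM parameter estimates from one labeled sequence.
    true_labels: hand labels (states); knn_labels: kNN outputs (observations)."""
    pi = np.zeros(n_labels)                  # initial-state counts
    A = np.zeros((n_labels, n_labels))       # state-to-state transition counts
    B = np.zeros((n_labels, n_labels))       # state-to-observation counts
    pi[true_labels[0]] += 1
    for s, o in zip(true_labels, knn_labels):
        B[s, o] += 1
    for s, s_next in zip(true_labels, true_labels[1:]):
        A[s, s_next] += 1
    def rownorm(M):
        M = M + smooth                       # smoothing so every row sums to 1
        return M / M.sum(axis=-1, keepdims=True)
    return rownorm(pi), rownorm(A), rownorm(B)
```
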

To establish the accuracy of the system, these 1200 points in feature space and the HMM parameters were used to automatically label the test set. In this phase of the experiment, the correct labels are not known by the system; instead, they are used only to evaluate its accuracy.


Table 1: Fractional breakdown of accuracy, first with the kNN classifier, then with the addition of an HMM. The final column shows the number of occurrences of each label in the test set.

Label          Accuracy without HMM   Accuracy with HMM   Occurrences in test set
ARCING LEFT    0.697                  0.818               2059
ARCING RIGHT   0.483                  0.484               2407
DEAD TRACK     0.882                  0.888               5920
LOITERING      0.719                  0.848               113285
WAGGLE         0.661                  0.657               1550
STRAIGHT       0.245                  0.229               5343
Total          0.702                  0.815               130564

Table 2: Predicted versus actual label accuracies.

        Predicted label
Actual  AL     AR     D      L      W      S
AL      0.818  0.020  0.005  0.097  0.038  0.022
AR      0.088  0.484  0.003  0.363  0.030  0.033
D       0.001  0.002  0.888  0.107  2E-4   0.003
L       0.010  0.004  0.132  0.848  0.002  0.004
W       0.140  0.126  0.000  0.048  0.657  0.030
S       0.084  0.076  0.010  0.579  0.021  0.229

5 Results

Every combination of the seven available features was tested by applying the k-nearest neighbor algorithm to a large training set. This resulted in 127 possibilities (zero features was not an option). The combination of features that resulted in the highest accuracy (defined as the percent of the test points labeled correctly) is h2, v1, and sh1. Therefore we consider only these features in the rest of the experiments.

It is interesting to note that accuracies using these three features in combination with other features ranged from 58.9% to 73.0%, while the accuracy using only these three features was 73.1%. This demonstrates that having extra features can reduce accuracy.

Table 1 shows the fractional accuracy of the system for each label type. As indicated, the system achieved an overall accuracy of about 81.5%. Further, the addition of the HMM increased the overall accuracy by over 10%.

Table 2 breaks down the system's performance on the test set. Each cell (x, y) shows the fraction of test set points with correct label x that were labeled by the system as y. The information in this table will be useful in helping determine in what ways the system fails, in the hope that it can be improved upon in the future.

6 Conclusions

As we hypothesized, the use of an HMM in conjunction with a kNN classifier provides higher accuracy than a kNN classifier alone. The HMM improved overall accuracy by 11.3%, above the 70.2% accuracy of the kNN alone. The two labels that correspond to the vast majority of the data (LOITERING and DEAD TRACK) are very similar to one another, both in features and in appearance. Due to this fact, and some ambiguity among the human labelers, misclassifications between them are less important than other misclassifications. If these two labels were combined into one, the accuracy of the system would be approximately 93%.

Another label that caused many problems for the system was STRAIGHT. This label was included because we wanted to make the system as general as possible. However, none of the common bee behaviors (dancing, following, active hive work) seem to rely on this label. Therefore, it would be possible to eliminate this label. Removing all points labeled STRAIGHT from consideration would increase the accuracy by about 2.5%, to 84% (or about 96% after combining LOITERING and DEAD TRACK).

Finally, another circumstance that could have lowered accuracy is the method we used to hand-label the data. The division of data for human labeling and for the two data sets (training and test) resulted in the majority of the training set consisting of points labeled by one human, while the test set was made up of points labeled by the other two humans. Since labeling is a subjective task, it is possible that differences in style between the human labelers made the training set less representative of the test set than would be ideal. Unfortunately, due to the importance of time series information to the HMM, the data sets could not have been divided to include points from all human labelers. In the future, we will generate more human-labeled data so that we can further study the inconsistencies between different labelers, possibly resulting in improved accuracy.

These results show that this system can achieve a high enough accuracy to allow a subsequent system to accurately determine the behavior of a bee from its sequence of motions. Despite imperfect accuracy, most of the data points are correctly labeled, so a behavior recognizer would be given enough valid information to identify the correct behavior. Therefore, this system successfully demonstrates its ability to reproduce the labels generated by a human, fulfilling the goal of removing the need for a human labeler.

Acknowledgements

We would like to thank Zia Khan and Frank Dellaert for the software used to track the bees; and Kevin Gorham, Stephen Ingram, and Edgard Nascimento for TeamView and for hand-labeling our data. This project was funded by NSF Award IIS-0219850.

References

Brashear, H., T. Starner, P. Lukowicz, and H. Junker. Using multiple sensors for mobile sign language recognition. In Proceedings of the IEEE International Symposium on Wearable Computers, in press, October 2003.

Han, K., and M. Veloso. Automated Robot Behavior Recognition Applied to Robotic Soccer. In Proceedings of the IJCAI-99 Workshop on Team Behaviors and Plan Recognition, 1999.

Khan, Z., T. Balch, and F. Dellaert. Efficient Particle Filter-Based Tracking of Multiple Interacting Targets Using an MRF-based Motion Model. To appear in Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-03), 2003.

Mitchell, T. Machine Learning. Boston, Massachusetts: McGraw-Hill, 1997.

Rabiner, L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE, 77(2): 257-286, 1989.

Seeley, T. The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Cambridge, Massachusetts: Harvard University Press, 1995.

Westeyn, T., H. Brashear, A. Atrash, and T. Starner. Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition. ICMI-03, November 5-7, 2003, Vancouver, British Columbia, Canada.


Towards a Multi-Robot Coordination Formalism

Chris Jones and Maja J Mataric

Computer Science Dept., University of Southern California, Los Angeles, CA 90089-0781, USA
Email: [email protected], [email protected]

Abstract

Coordination is an essential characteristic of any task-achieving multi-robot system (MRS), whether it is accomplished through an explicit or implicit coordination mechanism. There is currently little formal work addressing how various MRS coordination mechanisms are related, how appropriate they are for a given task, what capabilities they require of the robots, and what level of performance they can be expected to provide. Given an MRS composed of homogeneous robots, we present a method for automated controller construction such that the resulting controller makes use of internal state and no explicit inter-robot communication, yet is still capable of correctly executing a given task. Understanding the capabilities and limitations of an MRS composed of robots not capable of inter-robot communication contributes to the understanding of when and why inter-robot communication becomes necessary and when internal state alone is sufficient to achieve the desired coordination. We validate our method in a multi-robot construction domain.

Keywords: multi-robot, coordination, formalism, construction

1 Introduction and Motivation

Coordination is an essential characteristic of any task-achieving multi-robot system (MRS). The nature of the coordination may take many forms, seemingly limited only by the creativity of the designer. Explicit coordination mechanisms make heavy use of internal state maintained by individual robots and explicit inter-robot communication, often involving centralized or hierarchical control. Alternative approaches, involving implicit coordination, make use of fortuitous structure in the environment and its synergistic relationship to the task definition and the robots' sensing, control, and mobility characteristics to produce a different class of coordination techniques, often categorized by terms such as emergent, self-organized, or stigmergic.

From the perspective of the designer, the coordination mechanism employed in a given task domain is often heavily influenced by personal preference and less so by a formal understanding of why one is more appropriate than another or how the various coordination mechanisms are related. There is little formal work addressing the question of rationally choosing the most appropriate MRS coordination mechanism for a given task domain and performance requirements. Furthermore, there is little work on the more fundamental question of how various classes of coordination are related.

To provide insight into these questions, we are developing a coordination formalism which provides a framework for precisely defining and reasoning about the intertwined entities intrinsically involved in any task-achieving multi-robot system: the task environment, the task definition, and the capabilities of the robots themselves. Our approach is novel in that it expresses a principled effort to understand the relationship between explicit coordination mechanisms, such as those primarily relying on the use of internal state and direct communication, and more emergent implicit coordination mechanisms, which tend to make use of environment and task structure and more indirect forms of communication.

Our initial investigations have centered on multi-robot systems composed of robots equipped with internal state but lacking the capability for explicit, direct inter-robot communication. A formal understanding of the capabilities and limitations of such a system contributes to the understanding of when and why internal state alone is sufficient to achieve the desired coordination and when inter-robot communication becomes necessary. Furthermore, we hope this work will help inform the ongoing discussions regarding the meaning behind labels such as team robotics versus swarm robotics.

Toward this end, we present an automated multi-robot controller generation algorithm. The generated controller, when run by all robots in a homogeneous multi-robot system, will correctly execute a given task. We demonstrate our formalism in a multi-robot construction domain, where we are able to provide specific task instances accompanied by formal explanations of the suitability of the use of internal state.

2 Related Work

This section summarizes some of the most relevant related work involving characterization and analysis of coordination in multi-robot systems. The work of Parker (1993) discusses the trade-offs of local versus global information for coordination in multi-robot systems. Beckers et al. (1994) present a coordination mechanism in a multi-robot object clustering domain. Mataric (1995) presents work on group coordination in multi-robot systems using a collection of simple basis behaviors. The information invariants work of Donald (1995) addresses the problem of determining the information requirements to perform robot tasks and the means by which this information may be acquired. Dudek et al. (1996) present a taxonomy which classifies multi-robot systems based on communication and computational capabilities. In a clustering domain, Martinoli et al. (1999) demonstrate how the collective behavior of a group of mobile robots can be accurately studied using a simple probabilistic model. They show how the results of the model are descriptive of the results obtained through experiments with real robots and in sensor-based simulations. Balch (2002) presents hierarchical social entropy, an information-theoretic method of analysis used to determine the extent of diversity among robots in multi-robot systems. Goldberg et al. (2002) precisely define the foraging task for multi-robot systems and provide a collection of general distributed behavior-based coordination algorithms and their empirical evaluation. Gerkey et al. (2003) present a formalism for the analysis of task allocation in multi-robot systems with an emphasis on explicit coordination mechanisms. Lerman et al. (2003) present a mathematical model of the dynamics of collective behavior in a multi-robot adaptive task allocation domain.

The domain in which we validate our approach is multi-robot construction. Related work in this area includes the work of Bonabeau et al. (1994), which uses a rule-based model in the construction of biologically-plausible nest structures similar to those of some wasp species. Bonabeau et al. (1999) investigate the use of genetic algorithms to generate such rules used in the construction of biologically-plausible structures and explore the relationship between the space of rules and resulting structures. In the area of construction by physical robots, Melhuish et al. (1999) demonstrate how a group of minimalist robots can construct defensive walls using biologically-inspired templates. Wawerla et al. (2002) present work on the comparison of different coordination strategies in the construction of simple 2D structures using a group of mobile robots. Jones et al. (2003) present a method by which to automatically generate controllers for rule-based agents using local sensing and control in an intelligent self-assembly domain.

3 Definitions and Notation

In this section we formally define the intertwined entities intrinsically involved in any task-achieving multi-robot system: the task environment, the task definition, and the capabilities of the robots themselves, including control, sensing, and maintenance of internal state.

3.1 Task Environment and Definition

The task environment is the world in which the multi-robot system is expected to perform a defined task. The environment state, s, at any given time is an element of the finite set S of all possible states. An action, a, performed in the environment by a robot is drawn from the finite set A of all possible actions. An environment is defined by a state transition function sj = F(si, a), which states that when action a ∈ A is executed in state si ∈ S, the next state will be sj ∈ S. In this work, we assume the state can transition only as the result of an action performed by a robot.

We define a task, T, assumed to be Markovian, as a set of n ordered environment states, Ts = {s1, ..., sn}, which must be progressed through in sequence. From here on, the use of the word state refers to task state. An action a is called a task action for state si, denoted by At(si) ∈ A, if si+1 = F(si, a). A task T is said to be executed correctly if and only if for each task state si ∈ Ts every executed action a falls into one of the two following categories: a = At(si) or si = F(si, a). This means that all performed actions are either task actions or actions which do not result in a task state transition.
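This correctness condition can be checked mechanically over a trace of executed actions. The sketch below is our own illustration: `F` and `task_action` are assumed callables for the transition function and At, and the handling of actions after the final state is reached is our choice, not prescribed by the formalism:

```python
def executed_correctly(task_states, actions, F, task_action):
    """Check the correctness condition: every executed action is either the
    task action for the current task state (advancing the sequence) or an
    action that leaves the state unchanged; the final state must be reached."""
    idx = 0                                   # index into the ordered task states
    s = task_states[0]
    for a in actions:
        s_next = F(s, a)
        if idx + 1 < len(task_states) and a == task_action(s):
            idx += 1                          # task action: advance the sequence
            if s_next != task_states[idx]:
                return False                  # left the prescribed state sequence
            s = s_next
        elif s_next != s:
            return False                      # non-task action changed the state
    return idx == len(task_states) - 1        # all states progressed through
```
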

3.2 Observations

An observation made by an individual robot consists of accessible information external to the robot and formally represents a subset of the task state. The content and properties of an observation depend on the specific sensing properties of the robot. The finite set of all possible observations is denoted as X. Since a given observation may occur in multiple states, for notational convenience we use xs to mean the observation x as made in state s. An observation x with no subscript, unless otherwise noted, refers to the observation x as made in any state.

In state s, the function G(s) ⊆ X returns the set of all observations which can be made in s. An observation x is called unique if and only if there exists only one state s for which x ∈ G(s). The observation at the physical location where the task action of state s is to be executed is denoted by Y(s) ∈ G(s).

3.3 Robot Characterization

A robot's internal state is denoted by m. The finite set of all possible internal state values is denoted by M. A robot's observation, x, at any given time is an element of X. Two functions, known collectively as the robot's controller, define a robot's action in the environment. The deterministic action function a = B(x, m) specifies the robot's action, a ∈ A, given that its current observation is x and its internal state is m. The internal state transition function m′ = L(x, m, a) is a deterministic function specifying the robot's next internal state value given its current observation x, its current internal state m, and the action a it is executing.
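Since B and L are deterministic functions over finite sets, a controller in this formalism can be represented directly as two lookup tables. In this sketch the class name, the no-op default, and keeping m unchanged when no transition rule applies are our own illustrative assumptions:

```python
class Controller:
    """A controller as two lookup tables:
    act maps (observation, m) -> action            (the function B);
    trans maps (observation, m, action) -> next m  (the function L)."""

    def __init__(self, act, trans, m0=0):
        self.act, self.trans, self.m = act, trans, m0

    def step(self, x):
        """Apply a = B(x, m), then m' = L(x, m, a); return the action."""
        a = self.act.get((x, self.m), "no-op")       # no rule: do nothing
        self.m = self.trans.get((x, self.m, a), self.m)
        return a
```

A single rule might, for example, map an observed corner to a brick placement while advancing the internal state, so the same observation triggers no action once the task has moved past that stage.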

4 Building a Satisficing Controller

We now describe a method by which a controller using internal state can be automatically constructed such that a given task is correctly executed. We call such a controller satisficing. Properly executing a task in a multi-robot system poses additional challenges beyond doing so in a single-robot system. It can never be assumed that a particular robot will or will not make a certain observation, as it could be the case that a robot is completely unaware of the progress of the task resulting from the actions of other robots. Formally, from the perspective of an individual robot, the task environment is highly non-stationary.

A satisficing controller must satisfy two conditions. First, for all si ∈ Ts, the action function must specify a rule of the form At(si) = B(Y(si), m), where m ∈ M. Second, if there exists an observation x ∈ G(si) such that x = Y(sj) and i < j, the internal state value must be transitioned as the result of some observation which is guaranteed to be made after all observations of xsi and prior to the final observation of Y(sj). The procedure in Figure 1 presents an algorithm which constructs a satisficing controller based on the satisfaction of these two conditions.

If there exists a state sk ∈ Ts, k > j, for which x ∈ G(sk), we note that since the environment is non-stationary from the perspective of the individual robot, internal state alone is not sufficient to distinguish xsj from xsk and is therefore not sufficient to guarantee correct task execution in cases where xsj = Y(sj). Assuming internal state is sufficient, the worst case in terms of necessary internal state values is ‖Ts‖ − 1.


procedure Build Controller()
    m = 0; B ← {}; L ← {}; LastObsState = 0
    for i = 1 to ‖Ts‖ do
        if ∃ sj s.t. LastObsState < j < i and Y(si) ∈ G(sj) then
            m′ = m + 1
            if ∃ x ∈ (G(si) − Y(si)) s.t. ∄ s ≥ LastObsState : x ∈ G(s) then
                LastObsState = i
                L ← L ∪ {m′ = L(x, m, ·)}
            else
                LastObsState = i − 1
                L ← L ∪ {m′ = L(Y(si−1), m, At(si−1))}
                B ← B − {At(si−1) = B(Y(si−1), ·)}
                B ← B ∪ {At(si−1) = B(Y(si−1), m)}
            endif
            m = m′
        endif
        B ← B ∪ {At(si) = B(Y(si), m)}
    endfor
end procedure

Figure 1: Procedure for building a satisficing controller.

The best case is that no internal state is necessary, which occurs if for all si ∈ Ts the observation Y(si) is unique.

5 Validation: Coordination in Multi-Robot Construction

We experimentally demonstrate and validate our approach to the design of a satisficing controller in a multi-robot construction task. The construction task requires the placement of a series of square colored bricks, 0.5 meters on a side, into a desired 2D planar structure in a specified sequence. For all examples used in this section, a brick's color is denoted by the letters R, G, B, and Y, which stand for Red, Green, Blue, and Yellow, respectively. The construction task starts with a seed structure, which is a small number of initially placed bricks forming a core structure.

Experimental demonstration was performed using Player (Gerkey et al. 2001) and the Stage (Vaughan 2000) simulation environment. Our construction task is conducted in a circular arena of approximately 315 square meters using 10 robots. The robots are realistic models of the ActivMedia Pioneer 2DX mobile robot. Each robot, approximately 30 cm in diameter, is equipped with a differential drive, a forward-facing 180-degree scanning laser rangefinder, and a forward-looking color camera with a 60-degree field-of-view and a color blob detection system. The bricks are taller than the robot's sensors, so the robots can only sense the bricks on the periphery of the structure.

5.1 Definition of the Construction Task

We define the environment state for the construction task as a specific spatial configuration of bricks; therefore, a construction task is defined as a desired sequence of brick configurations – a specific construction sequence. The actions we are interested in are the placements of individual bricks onto the growing structure; we do not consider construction tasks in which robots may remove bricks from the structure, nor those in which sub-structures consisting of multiple bricks may be connected together. Other actions performed by the robots, such as moving through the environment, do not affect the task state. Table 1 shows the set of environment states defining the example construction task that we will use throughout this section.

Table 1: A sequence of environment states that define a construction task. The structure images are taken from a birds-eye view. Each brick is labeled with its color: R=Red, B=Blue, G=Green, Y=Yellow.

(Diagrams of the brick configurations for states S0 through S7 are not reproducible in text form.)

Table 2: All observations in G(s2) from Table 1. The observation Y(s2) is marked by a "*".

Observations in G(s2):
<FLUSH G B>*
<FLUSH B G>
<CORNER G B>
<FLUSH R G>
<FLUSH B R>

5.2 Observations in the Construction Task

Since the content of an observation depends on a robot's sensing capabilities, an observation in the construction domain is the spatial configuration and color of bricks in the field of view of the robot's laser rangefinder and color camera.

There are two general categories of observations that can be made. The first is two adjacent, aligned bricks. Such an observation would be made, for example, if in state s1 in Table 1 a robot were positioned above and oriented toward the surface of the structure made up by the Red and Blue bricks. Such an observation is denoted as <FLUSH R B>. The second is two bricks forming a corner. Such an observation would be made if in state s0 in Table 1 a robot were positioned in the upper right-hand corner and oriented toward the corner formed by the Red and Green bricks. Such an observation is denoted as <CORNER R G>. The observations <FLUSH R B> and <FLUSH B R> constitute two different observations in which the spatial relationship between the Red and Blue bricks is switched. A similar point holds for the observations <CORNER R B> and <CORNER B R>. Given the task state s2 from Table 1, Table 2 lists all observations in the set G(s2), with the observation Y(s2) highlighted.
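The ordered-pair nature of these observations can be made concrete in a short sketch. The class name and encoding below are our own illustration, not part of the paper; only the observation strings and the set G(s2) come from Tables 1 and 2.

```python
from dataclasses import dataclass

# Hypothetical encoding of the two observation types described above:
# a FLUSH of two adjacent, aligned bricks, or a CORNER formed by two
# bricks. Order matters: <FLUSH R B> and <FLUSH B R> are distinct.

@dataclass(frozen=True)
class Observation:
    kind: str    # "FLUSH" or "CORNER"
    first: str   # color of the first brick (R, B, G, Y)
    second: str  # color of the second brick

    def __str__(self):
        return f"<{self.kind} {self.first} {self.second}>"

# The observation set G(s2) listed in Table 2; the first entry is Y(s2).
G_s2 = {
    Observation("FLUSH", "G", "B"),
    Observation("FLUSH", "B", "G"),
    Observation("CORNER", "G", "B"),
    Observation("FLUSH", "R", "G"),
    Observation("FLUSH", "B", "R"),
}

# Swapping the bricks yields a different observation, as the text notes.
assert Observation("FLUSH", "R", "B") != Observation("FLUSH", "B", "R")
```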

5.3 Brick Placement Actions

The only actions in this construction domain that can transition the task state are brick placement actions. There are three such actions, with the first being the placement of a brick on the right side (from the

— Page 64 —

Table 3: Rules constituting a satisficing controller for the construction task in Table 1.

Observation      m → m′    Action
<CORNER R G>     00 → 00   <B CORNER R G>
<FLUSH R B>      00 → 00   <G RIGHT FLUSH R B>
<CORNER G B>     00 → 01   <Y CORNER G B>
<FLUSH B G>      01 → 01   <Y LEFT FLUSH B G>
<CORNER B Y>     01 → 01   <R CORNER B Y>
<FLUSH G Y>      01 → 10   No Action
<FLUSH R Y>      10 → 10   <B LEFT FLUSH R Y>
<CORNER R B>     10 → 11   No Action
<FLUSH G Y>      11 → 11   <R RIGHT FLUSH G Y>

perspective of the acting robot) of a pair of adjacent, aligned bricks. An example action of this type can be seen in the placement of the Green brick, which transitions the state in the task in Table 1 from s1 to s2. Such an action is denoted as <G RIGHT FLUSH R B>. The second action type is similar to the first except that the brick is placed on the left side of a pair of adjacent, aligned bricks. An example of this action type can be found in Table 1 in the placement of the Yellow brick, which transitions the state from s2 to s3. Such an action is denoted as <Y LEFT FLUSH B G>. The third action type is the placement of a brick in the corner formed by two other bricks. An example of this action type can be found in Table 1 in the placement of the Blue brick, which transitions the state from s0 to s1. Such an action is denoted as <B CORNER R G>.

5.4 Satisficing Robot Controller

We now describe the satisficing controller for the construction task shown in Table 1. The robot makes an observation, and if the current internal state value, observation, and action match one of the rules from the internal state transition function shown in Table 3, the internal state value is transitioned to the value designated by the matched rule. Next, if the current internal state value and observation match a rule from the action function shown in Table 3, the robot visually servos toward the location where the brick placement action is to be performed, as dictated by the matched rule. Once the robot is within range to perform the action, the brick of the appropriate color is placed on the structure. If no rule in the function is matched, the robot performs a random walk, makes another observation, and the process repeats.

As can be seen, this satisficing robot controller for the construction task in Table 1 requires 4 unique internal state values. Our method is not guaranteed to generate a satisficing controller using a minimal number of internal state values; however, for this particular construction task, 4 values is the minimal number required to correctly execute the task.
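The rule-matching step of this controller can be sketched as a simple table lookup. The rules below are transcribed from Table 3; the function name, the string encoding of observations, and the use of `None` for "No Action" are our own illustrative choices, and the sketch omits the visual servoing and random walk, which depend on the robot hardware.

```python
# Rules from Table 3, keyed on (internal state, observation).
# Each entry gives the next internal state and the brick-placement
# action (None where Table 3 lists "No Action").
RULES = {
    ("00", "<CORNER R G>"): ("00", "<B CORNER R G>"),
    ("00", "<FLUSH R B>"):  ("00", "<G RIGHT FLUSH R B>"),
    ("00", "<CORNER G B>"): ("01", "<Y CORNER G B>"),
    ("01", "<FLUSH B G>"):  ("01", "<Y LEFT FLUSH B G>"),
    ("01", "<CORNER B Y>"): ("01", "<R CORNER B Y>"),
    ("01", "<FLUSH G Y>"):  ("10", None),
    ("10", "<FLUSH R Y>"):  ("10", "<B LEFT FLUSH R Y>"),
    ("10", "<CORNER R B>"): ("11", None),
    ("11", "<FLUSH G Y>"):  ("11", "<R RIGHT FLUSH G Y>"),
}

def step(state, observation):
    """One controller step: match the observation against the rules.
    If no rule matches, the internal state is unchanged and no action
    is taken (the robot would random-walk and observe again)."""
    if (state, observation) in RULES:
        return RULES[(state, observation)]
    return state, None
```

For example, in internal state 00 the observation <CORNER R G> triggers the placement of the Blue brick, while an unmatched observation leaves the state untouched.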

6 Conclusions and Future Work

We have presented a method for automated multi-robot controller generation for correct task execution. The individual robots in the system execute controllers using internal state but do not have the capability for direct, explicit inter-robot communication. Given these individual robot capabilities, we have shown characteristics the task must exhibit such that these capabilities are sufficient for correct task execution. Understanding the capabilities and limitations of a multi-robot system composed of robots equipped with internal state but lacking the capability for explicit, direct inter-robot communication contributes insight into the larger question of understanding the necessary characteristics of a coordination mechanism in a

— Page 65 —

multi-robot system required to correctly execute a given task. Furthermore, such understanding can aid the designer in making modifications to the task definition, task environment, or the robot capabilities in order to transform a situation in which internal state alone is not sufficient into one in which it is sufficient to achieve correct task execution.

Our future work includes the development of an algorithm for constructing satisficing controllers using a minimal number of unique internal state values. Applying the formal framework presented in this paper, we are also investigating the use of explicit inter-robot communication in multi-robot coordination. Specifically, we are studying in what circumstances such communication may replace or beneficially augment the use of internal state and when it may be required in order to correctly execute an MRS task.

Acknowledgments

This work is supported in part by Defense Advanced Research Projects Agency (DARPA) Grant F30602-00-2-0573 and in part by National Science Foundation Grant EIA-0121141.

References

Balch, T. 2002. Measuring robot group diversity. Pages 93–135 in: Robot Teams: From Diversity to Polymorphism (T. Balch and L. E. Parker, eds.), AK Peters.

Beckers, R., Holland, O., and Deneubourg, J. 1994. From local actions to global tasks: Stigmergy and collective robotics. Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. Pages 181–189.

Bonabeau, E., Dorigo, M., and Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.

Bonabeau, E., Theraulaz, G., Arpin, E., and Sardet, E. 1994. The building behavior of lattice swarms. Artificial Life IV (R. Brooks and P. Maes, eds.). Pages 307–397.

Donald, B. 1995. Information invariants in robotics. Artificial Intelligence 72(1–2):217–304.

Dudek, G., Jenkin, M., Milios, E., and Wilkes, D. 1996. A taxonomy for multi-agent robotics. Autonomous Robots 3:375–397.

Gerkey, B. and Mataric, M. 2003. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. IEEE International Conference on Robotics and Automation. Taipei, Taiwan. Pages 3862–3867.

Gerkey, B., Vaughan, R., Støy, K., Howard, A., Sukhatme, G., and Mataric, M. 2001. Most valuable player: A robot device server for distributed control. IEEE/RSJ International Conference on Intelligent Robots and Systems. Pages 1226–1231.

Goldberg, D. and Mataric, M. 2002. Design and evaluation of robust behavior-based controllers. Pages 315–344 in: Robot Teams: From Diversity to Polymorphism (T. Balch and L. E. Parker, eds.), AK Peters.

Jones, C. and Mataric, M. 2003. From local to global behavior in intelligent self-assembly. IEEE International Conference on Robotics and Automation. Taipei, Taiwan. Pages 721–726.

Lerman, K. and Galstyan, A. 2003. Macroscopic analysis of adaptive task allocation in robots. To appear in IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, Nevada.

Martinoli, A., Ijspeert, A., and Mondada, F. 1999. Understanding collective aggregation mechanisms: From probabilistic modeling to experiments with real robots. Robotics and Autonomous Systems 29:51–63.

Mataric, M. 1995. Designing and understanding adaptive group behavior. Adaptive Behavior 4:51–80.

— Page 66 —

Melhuish, C., Welsby, J., and Edwards, C. 1999. Using templates for defensive wall building with autonomous mobile ant-like robots. Towards Intelligent Mobile Robots. Bristol, UK.

Parker, L. E. 1993. Designing control laws for cooperative agent teams. IEEE International Conference on Robotics and Automation. Pages 582–587.

Vaughan, R. 2000. Stage: A multiple robot simulator. Institute for Robotics and Intelligent Systems Technical Report IRIS-00-393. University of Southern California.

Wawerla, J., Sukhatme, G. S., and Mataric, M. 2002. Collective construction with multiple robots. IEEE/RSJ International Conference on Intelligent Robots and Systems. Pages 2696–2701.

— Page 67 —

Of Rats and Robots: A New Biorobotics Study of Norway Rat Pups

Sanjay S. Joshi1, Jeffrey C. Schank2

1. Department of Mechanical and Aeronautical Engineering, University of California, Davis, CA 95616, USA. Corresponding author: [email protected]

2. Department of Psychology, University of California, Davis, CA 95616, USA.

Abstract

We describe a new biorobotics research programme conducted at the University of California, Davis using Norway rat pups. The study is meant to inform both animal behavior and autonomous robotics. The animals we study are Norway rat pups aged 7 to 10 days. Mammals appear to be harder to study than insects, but they may not be that much harder when studied at a young age and viewed from a developmental perspective. For animal behavior, we will use robotics to overcome deficiencies in analytic and computational models used in the past. For robotics, studying infant mammals will enable the study of the development of sensorimotor rules from the simple to the complex. We aim to use both quantitative probabilistic models and observation techniques for constructing robotic pup control rules that emulate real pup behavior. One aim of our research is to use robotic systems to help validate sensorimotor rules, both individually and in a group context. Other basic aims of the study are to use robotics to determine if these rules generalize to different environments, and to test the validity and robustness of the assumptions made in initial computational models. Methodological goals of this project are to develop infrastructure and common scientific methods to conduct a parallel biology/engineering study of small mammals. We focus on our progress toward the methodological goals and report our initial results.

Keywords: Biorobotics, Norway Rat, Control Rules

1 Introduction

The influence of biological systems on robots has been far-reaching (Webb and Consi, 2001). In particular, recent robotics research has focused on simulating specific aspects of animals such as their biomechanics, sensory systems, or computational abilities. Two schools of thought have motivated this research. One motivation is that biological systems have been optimized over generations by evolution. Thus, by simulating biological systems, better robotic designs can be found than could be found using conventional engineering approaches. A problem is that natural selection does not necessarily lead to optimal design, since which organism is fittest at any given time is relative to the current population (Gould and Lewontin, 1979). Also, recent developments in evolutionary theory have revealed that a variety of constraints, particularly developmental ones, limit optimality. Nevertheless, although natural selection may not produce optimally designed complex adaptive systems, it may be the best process for producing complex adaptive systems for complex environments and problems. Thus, studying organisms from both evolutionary and developmental perspectives may be an essential source of design information for robotics. The other motivation is that by simulating biological systems in electro-mechano-computational machines with roughly the same sensorimotor-computational capabilities, a better understanding of biological systems themselves will be achieved. The crucial point is that simulating animal behavior by instantiating actual sensorimotor rules makes the specifics of how this behavior is achieved a new neurobehavioral/neuroethological theory of how the biological system itself achieves its behavior (see e.g. Webb, 1995). Robotic models allow us to investigate situated behavior

— Page 68 —

Figure 1: Huddle of Norway rat pups (left) and first robopup (right).

and cognition, which will likely fundamentally change how cognitive science views organisms. In particular, cognition is not all in the "head": it is socially computed, situation dependent, and often the result of heuristic computational processes that interact with natural environments.

Insects are a natural starting point for biorobotics because the rules characterizing their sensorimotor systems appear to be easier to discover and implement than those of vertebrates such as mammals. However, studying mammals with the aid of robotics would afford the opportunity to study the gradual development and change of sensorimotor rules over time. Mammals appear to be harder to study than insects, but they may not be that much harder when studied at a young age and viewed from a developmental perspective. Typically, mammalian development is characterized as either altricial (i.e., born less developed, with poorly coordinated sensorimotor systems) or precocial (i.e., born with well developed and coordinated sensorimotor systems). Altricial mammals are relatively "simple" from a sensorimotor perspective since they are typically born blind and deaf (e.g., rats, mice, dogs, cats) with limited motor and sensorimotor integration. At a descriptive level, the rules characterizing their sensorimotor behavior may be as simple as, or simpler than, those required to characterize insects.

This creates the opportunity to study the group and individual behavior of an organism from a developmental perspective. We can start by investigating the behavior of relatively simple mammals (i.e., infant rats) with relatively simple sensorimotor rules characterizing their behavior. As these young mammals develop, these rules change (Alberts and Brunjes, 1978; Schank and Alberts, 2000), becoming more complex and eventually using new sensorimotor modalities such as hearing and vision. Our long-term goal is to take advantage of the development of sensorimotor behavior in altricial mammals to investigate the development of sensorimotor behavior from the simple to the complex. This will be informative not only to fields investigating animal behavior but also to the development of autonomous robots that can function in complex environments and solve complex problems. In this paper, we will outline the various experimental components of our research program and how they tie together to achieve this goal.

2 Rat Pups and Robopups

Norway rats (Rattus norvegicus) aggregate throughout life, starting at birth (Alberts, 1978a, 1978b; Calhoun, 1962; Barnett, 1963). As infants, they aggregate in huddles (as depicted in Fig. 1) in which they can conserve energy and behaviorally thermoregulate (Alberts, 1978a, 1978b). Aggregation is therefore an important group-behavioral function in Norway rats and many other species that produce multiple offspring. They are

— Page 69 —

able to aggregate despite being blind and deaf at birth and for the first two weeks of postnatal life (Alberts, 1978a,b,c; Alberts and Brunjes, 1978). However, because they are extremely limited in their sensorimotor capabilities during the first two weeks postnatally, we can begin with relatively simple computational and robotic models of their behavior.

The first robopup we designed and implemented is shown in Fig. 1. The mechanical design of our robot was influenced by our need to emulate the relevant physical characteristics of the rat pup. These characteristics included (scaled) rat size, (scaled) rat shape, rat sensor locations, and rat locomotion. Observations of pups showed that they primarily use their two back legs for locomotion. As a result, we constructed a robot with rear-driven wheels with differential drive on a single chassis. The front of the robot was supported by an omni-directional ball-bearing mount.

The shape of the robot was important because it dictates how the robot interacts with the corners and walls of the arena. Rat pups are long, with somewhat pointed heads culminating at the nose (see Fig. 1). The rat pups use their noses to burrow into the crevices and corners of the arena. In order to emulate these shape features, we created a long metal body skirt with a tapered nose.

Sensors were mounted along the edge of the skirt. Our controller and mechanical design can accommodate several different types of sensors simultaneously. Our initial studies were meant to explore a single sensory modality: touch. As a result, micro-switch bump sensors were mounted at specific locations around the skirt. Because the majority of rat pup tactile interactions with the arena occur at the head (Schank and Alberts, 1997, 2000a,b), we mounted a number of sensors near the front of the robot. A few other sensors were mounted around the skirt to account for the rat's body touching the walls of the arena. We used a Motorola 68HC11-based Handyboard microcontroller board (Martin, 2000). Programming of the microcontroller board is done through a special variant of C called Interactive C. Using the embedded controller, we could investigate several heuristic rules and several sensory modes (see Sec. 4). We are currently designing a second version of our robopup that will use a Java-based STAMP microcontroller.

3 Observing Rat Pups and Robopups

A fundamental problem encountered in applying precise quantitative modeling tools to animal behavior studies concerns how to collect and process sufficient amounts of precise data about animal motion. The equipment and methods must be non-invasive. Research with rat pups aggregating in an arena (Schank and Alberts, 1997, 2000a,b) has shown that they are very sensitive to specific global cues. For example, they are very sensitive to even slight elevations of the floor of an arena (1° or less), exhibiting a strong positive geotaxis and quickly aggregating at the bottom of very shallow slopes (Schank and Alberts, 2000b). They are also very sensitive to weak thermogradients, both on a surface and in ambient temperature (Schank and Alberts, 2000b; unpublished observations).

Therefore, we have built a controlled environment that is far more precise in its ability to control factors affecting behavior than any previously used in behavioral research with small infant mammals. This chamber is shown in Fig. 2a. Two types of surfaces can be installed, both made of 1/4-inch copper bars. The first is a water bath (under the work bench) allowing the surface temperature to be held constant and uniform across the surface. The second is a copper bar on which various kinds of thermogradients can be precisely produced. Because both the arena and the chamber itself are constructed of 1/4-inch Plexiglas, illumination gradients can be created across the surface. Rat pups are observed with a video camera mounted above the chamber. We also created an arena for the robopup that is a scaled-up version of the animal arena (see Fig. 2b). Above the arena, we mounted a digital video recorder that records the robot experiments, just as in the rat pup studies. For a typical rat pup or robot experiment, pups and robots are observed through the video cameras for 15 minutes.

To extract precise data about the motion of our rat pups from video recordings, we developed an image analysis and data collection program using NIH Image (Schank and Alberts, 2000c). This program automatically stores a stack of static images from a video stream at a specified interval (e.g., every 5 seconds).

— Page 70 —

The stack of images is then processed by researchers, who identify the head and tail location of each rat in the image (Fig. 3a). Our goal was to collect the same data for our robot as we did for the rat pups. We did this by mounting different-color LEDs at specific locations on the robot (head, tail, etc.) and on the arena. Then, we developed special-purpose image-processing code that analyzes a robot video stream and stores critical robot location data to a file. To help the image-processing code, we run our robot experiments in the dark in order to highlight the LEDs. Even in the dark, there are a number of challenges in creating image-processing code for such an application, including blurred spots, background noise, and arena-robot calibration. Fig. 3b shows an overhead camera image of the type analyzed by our software. In actual experiments, the room is completely dark.

Figure 2: Comparison of laboratory setups. The temperature-controlled arena used to run rat-pup experiments (a). The robot arena, appropriately scaled (b). The robopup is shown within the arena on the lab floor. A digital video camera is mounted above the arena. Signals from the camera are fed into the analysis computers shown at the foreground and back of the picture.

Figure 3: An illustration of our tracking macro developed for the animal-behavior software package "NIH Image" (a). With this macro, the nose and base of the tail are recorded for each animal (a). When a robot experiment is run in the dark, the overhead video camera records a black background with two spots that are the head and tail (front and back) of the robopup (b). The colors of the spots allow the computer program to determine positions on the robot. This photo was taken in semi-darkness to illustrate the robot and LEDs.

— Page 71 —

4 Initial Heuristic Rules on Pup Behavior

The behavior of animals is typically characterized in terms of how they orient and react to patterns and configurations of stimuli in their environment (Fraenkel and Gunn, 1940). This behavior has been described using different types of orienting responses (i.e., a taxis) and specific sensorimotor reactions (i.e., a kinesis). A fundamental problem in the study of behavior concerns the explanation of taxes and kineses. Our hypothesis is that pup behavior can be characterized by simple sensorimotor rules that dictate how the animal and environment interact. These rules change as sensorimotor capabilities change with development.

In order to study the emergent behavior of rat pups using simple rules, we program our robopup with rules and observe the overall robot behavior. Because we do not want to impose overall behavior on the robot, we implement a reactive-style control architecture (Arkin, 1998) through the use of parallel low-level rules. Rules are a key subject of our biorobotics study. Ultimately, we aim to base the rules on animal computational models (e.g., Schank, 2001). However, rules based on observation allow us to gain a baseline understanding of how different low-level rules lead to different overall behavior. The following illustrate typical rules based on animal observation:

Rule 1:

IF (front sensor) makes contact with a wall,

THEN back up and turn in random direction (left or right).

Rule 2:

IF (right-front sensor) hits a wall,

THEN turn left.

Rule 3:

IF (left-front sensor) hits a wall,

THEN turn right.

Rule 4:

IF (right sensor) or (right-rear sensor) hits a wall,

THEN arc to the left.

Rule 5:

IF (left sensor) or (left-rear sensor) hits a wall,

THEN arc to the right.

Rule 6:

IF (no sensors) have hit anything in (lost) seconds,

THEN start spinning in place and then move forward in random direction.

Within the control code, only a few free parameters can be varied. For example, lost defines the time in seconds since the last wall contact before entering a spinning mode. In addition, other variables define the angular speed of turns (through wheel motor powers).
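Rules 1 through 6 can be sketched as one prioritized dispatch function. This is our illustration, not the authors' Interactive C code: the sensor names, return strings, and the rule priority ordering are assumptions (the paper describes the rules as running in parallel), and the parameter values are placeholders for the free parameters just described.

```python
import random

# Illustrative free parameters (values are not from the paper).
LOST = 10.0        # seconds without contact before spinning (Rule 6)
TURN_SPEED = 0.5   # stand-in for the wheel-power-derived turn speed

def react(sensors, time_since_contact):
    """Map active bump sensors to a motor command following Rules 1-6.
    `sensors` is a set of active sensor names; checks are ordered
    front-first as one plausible conflict-resolution scheme."""
    if "front" in sensors:                      # Rule 1
        return "back_up_then_turn_" + random.choice(["left", "right"])
    if "right_front" in sensors:                # Rule 2
        return "turn_left"
    if "left_front" in sensors:                 # Rule 3
        return "turn_right"
    if sensors & {"right", "right_rear"}:       # Rule 4
        return "arc_left"
    if sensors & {"left", "left_rear"}:         # Rule 5
        return "arc_right"
    if time_since_contact > LOST:               # Rule 6
        return "spin_then_random_forward"
    return "forward"                            # default: keep moving
```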

5 Comparison of Rat Pups and Robopup

The end result of both the animal and robot experiments is a data file that contains locations of the head and tail over time. At this point, we may manipulate the data in a variety of ways to compare the data or create experiment-based computational models. At this stage in the analysis, the raw data can be handled in exactly the same way, regardless of whether it was obtained from a robot or a rat.

To analyze the behavior in each experiment, we have written Matlab scripts. Currently, these scripts automatically categorize gross behaviors of the robots or rats. For example, one behavior is "Wall-following" and another is "Corner-snooping." Behaviors are specified by the corner or wall where the behavior took place, the amount of time since the last behavior, and the duration of the behavior.
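A per-sample version of this kind of gross-behavior labelling might look like the sketch below. The actual Matlab criteria are not given in the text, so the wall-distance threshold, label names for anything beyond the two behaviors quoted above, and the rectangular-arena assumption are all our own.

```python
# Fraction of the arena dimension counted as "near a wall" (assumed).
WALL_MARGIN = 0.05

def label_sample(x, y, width, height):
    """Label one head-position sample by proximity to the arena walls:
    near two walls -> corner, near one wall -> wall, otherwise open."""
    near_x = x < WALL_MARGIN * width or x > (1 - WALL_MARGIN) * width
    near_y = y < WALL_MARGIN * height or y > (1 - WALL_MARGIN) * height
    if near_x and near_y:
        return "Corner-snooping"
    if near_x or near_y:
        return "Wall-following"
    return "Open-field"
```

Runs of consecutive identical labels would then give the onset, location, and duration fields the text describes.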

— Page 72 —

Figure 4: A plot of the nose trajectory for a robopup experiment.

The data collected can be analyzed in a variety of ways and used to create several different plots that compare separate experiments. For example, statistical information from the different "Wall-following" and "Corner-snooping" behaviors can be plotted. Fig. 4 shows a typical plot of a robot head trajectory during a single experiment. The various plots can be used to visually compare different experiments. In addition, they can be used to "back out" possible low-level rule changes or parameter variations that could alter aspects of the overall robot behavior. The raw data can also be used to build experiment-based computational models of the rat pups and robots (Webb, 2001). The best metric for comparing precise behavioral data is still an open issue, and is a subject of our study.

Empirical observations using the rules described in Sec. 4 show robot and rat pup behavior that is surprisingly similar. For example, both often follow walls and sometimes go around and around the arena. Both often get stuck in corners. When they do get stuck in corners, both try to maneuver their way out over and over again. Sometimes they are stuck in the same corner for the entire experiment. Without a wall to follow, they sometimes enter a mode in which they veer off in random directions across the arena and then "bounce" off walls into another traverse across the arena.

6 Conclusions and Future Work

We have described a new biorobotics research programme being conducted at the University of California, Davis using Norway rat pups. The first goal was to develop infrastructure and common scientific methods to conduct a parallel biology/engineering study of small mammals. Thus far, we have developed experimental equipment and common methods for observing and analyzing both rat pup and robot behavior. Our hypothesis is that pup behavior can be characterized by simple sensorimotor rules that dictate how the animal and environment interact. We have instantiated a set of rules on our first robopup based on animal observation. The animals and robots behave in surprisingly similar ways. Our next goals are to formalize metrics of comparison between animal and robot data and to formulate analytically based rules.

— Page 73 —

Acknowledgments

We thank the National Science Foundation for support of this project under Grant Number ITR-0218927. We would also like to thank all the students who have contributed to this effort (in alphabetical order): Randall Bish, Sobranie Frank, Nick Giannini, and Lisa Hargreaves.

References

Alberts, J. R. (1978a) Huddling by rat pups: Group behavioral mechanisms of temperature regulation and energy conservation. Journal of Comparative and Physiological Psychology 92:231–240.

Alberts, J. R. (1978b) Huddling by rat pups: Multisensory control of contact behavior. Journal of Comparative and Physiological Psychology 92:220–230.

Alberts, J. R. (1978c) Sensory perceptual development in the Norway rat: A view toward comparative studies. In: Comparative Perspectives on Memory Development, R. Kail & N. Spear (eds.). New York: Plenum. Pages 65–101.

Alberts, J. R. & Brunjes, P. C. (1978) Ontogeny of thermal and olfactory determinants of huddling in the rat. Journal of Comparative and Physiological Psychology 92:897–906.

Arkin, R. (1998) Behavior-Based Robotics. The MIT Press: Cambridge, MA, USA.

Barnett, S. A. (1963) A Study of Behavior. London: Methuen.

Calhoun, J. B. (1962) The Ecology and Sociology of the Norway Rat. Bethesda, MD: U.S. Department of Health, Education and Welfare.

Fraenkel, G. S. & Gunn, D. L. (1940) The Orientation of Animals: Kineses, Taxes and Compass Reactions. Clarendon Press: Oxford, UK.

Gould, S. J. and Lewontin, R. C. (1979) The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. B 205:581–598.

Martin, F. G. (2000) The Handy Board Technical Reference. MIT, USA.

Schank, J. C. & Alberts, J. R. (1997) Self-organized huddles of rat pups modeled by simple rules of individual behavior. Journal of Theoretical Biology 189:11–25.

Schank, J. C. & Alberts, J. R. (2000a) A general approach for calculating the likelihood of dyadic interactions: Applications to sex preferences in rat pups and agonistic interactions in adults. Animal Learning & Behavior 28:354–359.

Schank, J. C. & Alberts, J. R. (2000b) The developmental emergence of coupled activity as cooperative aggregation in rat pups. Proceedings of the Royal Society of London B 267:2307–2315.

Schank, J. C. & Alberts, J. R. (2000c) A general approach for calculating the likelihood of dyadic interactions: applications to sex preferences in rat pups and agonistic interactions in adults. Animal Learning & Behavior 28:354–359.

Schank, J. C. (2001) Beyond reductionism: refocusing on the individual with individual-based modeling. Complexity 6:33–40.

Webb, B. (1995) Using robots to model animals: a cricket test. Robotics and Autonomous Systems 16.

Webb, B. and Consi, T. R. (Eds.) (2001) Biorobotics: Methods and Applications. AAAI Press/MIT Press.

— Page 74 —

Studying Task Allocation Mechanisms of Social Insects for Engineering Multi-Agent Systems

Franziska Klügl1, Cornelia Triebig1, Anna Dornhaus2

1. Lehrstuhl für Künstliche Intelligenz und Angewandte Informatik, Institut für Informatik, Universität Würzburg, Am Hubland, 97074 Würzburg. Corresponding author: [email protected]

2. School of Biological Sciences, University of Bristol, Woodland Road, Bristol BS8 1UG.

Abstract

In this contribution we want to transfer biological mechanisms for the organization of work to software agents. Based on abstraction and generalization of task allocation models of social insects, the functionality and efficiency of different adapted implicit coordination mechanisms are compared in a generic task allocation model.

Keywords: Task allocation, Threshold models, Individual-based simulation, Multi-Agent Systems

1 Introduction

The main question addressed by Distributed Artificial Intelligence (DAI) concerns the construction of multi-agent systems which, as a group, are able to solve given problems. The resulting system should satisfy requirements concerning efficiency, solution quality, robustness, or flexibility. However, such requirements are hard to fulfil with conventional coordination mechanisms. This makes the use of biological paradigms very attractive. Therefore it also seems promising to transfer organizational concepts found in social insects to multi-agent systems. Although the organization of work in social insects is a field of intensive research, a final understanding of why and under what circumstances collective features emerge is still lacking. However, such an understanding is necessary as a profound basis for reverse engineering and for transferring these mechanisms to multi-agent system applications. Therefore the central point of our contribution is propelling the transfer of biological self-organization "solutions" by widening the basic understanding from a technological point of view. Nevertheless, our results concerning the efficiency of different methods of task allocation have important implications for biology as well. This is the first study to compare the performance of different methods of task allocation in a spatially explicit simulation. The results allow biologists to draw conclusions about why different such methods exist, which can be expected to evolve under specific conditions, and what constraints might influence the evolution of division of labor in social insects.

We proceed as follows. After justifying our statement that existing mechanisms miss some of the central requirements given above, we explain some relevant biological concepts found in the organization of labor in social insects. The transfer step is described in section 3. After that, a generic model of task allocation is introduced in section 4. It is used for experiments evaluating the different forms of task allocation with respect to the engineering criteria described in section 5. The paper ends with a final discussion and some remarks about future work.

2 State of the Art

2.1 Task Allocation in Multi Agent Systems

Existing techniques for task allocation in the multi-agent system area can be separated into more problem- or organization-oriented (top-down) and agent-oriented (bottom-up) methods. The focus of the first is directed towards the decomposition of the problem for setting up a team of agents. Conflicts are resolved mostly by organizational solutions. Role concepts and assignments also belong to this form of organizational task (or resource) allocation. Interaction is designed by structuring the necessary communication during task allocation and execution.

In so-called open multi-agent systems, agents have to be coordinated that selfishly pursue their individual goals. Here, a coordination mechanism has to be as efficient as possible, e.g. in terms of communication costs, should guarantee a good solution by fleecing none of the agents, and should result in the desired coherent global behavior. Designing such systems is demanding, as there is nothing like an organization-oriented development method from a purely agent-oriented perspective.

An important bottom-up approach for task allocation is the Contract Net Protocol (Smith, 1980). Following the example of human business processes, a designated manager agent advertises a task and other agents, the contractors, bid for this task. Alternative approaches are based on task auctions or market systems (Wellman, 1995). However, the transfer of a multi-dimensional evaluation function to a one-dimensional price is not always trivial. Additionally, the centralized components of auctioneers or the market platform itself may form bottlenecks.

2.2 Insect Task Allocation and Swarm Intelligence

On the other hand, biologically inspired mechanisms are used more and more in multi-agent systems. It is therefore worth looking at insectoid task allocation in more detail. There are several hypotheses and models about how work is organized in social insects (Beshers & Fewell, 2001; Gordon, 1996). All have in common that there is no central plan or decision-making entity.

Division of labor is achieved when each member of the colony specializes to a certain degree on a set of activities, like foraging or defending. Individuals performing a certain activity may be characterized by belonging to a specific morphological type, being in a certain age group, or having a particular physiological status (Hölldobler & Wilson, 1990). There are basically two extreme categories of specialization (ignoring intermediate forms): temporal and morphological polyethism. In the latter, the behavioral repertoire of an individual worker is restricted. Short-term regulation of task allocation is then only possible through activation and de-activation. When a larger share of the specialized worker force is lost, long-term regulation may compensate for this breakdown by producing new caste members (Robinson, 1992). However, such colonies may be ergonomically more efficient, as no task switching is necessary and the single animals may be "simpler." The possibility of greater specialization of individuals that only perform one activity leads to a tradeoff between individual and social complexity (Anderson & McShea, 2001). A model for task allocation with temporal polyethism is the threshold model: intrinsic factors, for example genetic differences between individuals, result in different reaction times to the same stimulus. Other models try to explain division of labor and different levels of activity without any intrinsic factors, based just on external stimuli or interactions. One such model is the "foraging for work" model (Tofts & Franks, 1994), which assumes that only stimuli from the environment determine what task the animal executes. These stimuli highly depend on the location of the individual; thus, spatial structure is essential in creating division of labor.

3 Transfer to Multi agent Systems

At the end of the 1980s, self-organization was acknowledged as an appropriate basis for coordination in multi-agent systems (Steels, 1990). Primary inspiration was seen in social insects and mass recruitment models, like pheromone trails (Pasteels et al., 1987). Starting from the area of "collective robotics," social insects form the basis for self-organizing solutions in different application domains like graph optimization (Dorigo et al., 1996), e.g. in routing in telecommunication networks. A survey of several application areas can be found in (Bonabeau et al., 1999). Also, Anderson and Bartholdi (2000) and Parunak et al. (1998) give several example scenarios for biologically inspired control and promote the use of insect-like mechanisms.


However, those mechanisms are far from being accepted as an alternative to established task allocation mechanisms. In our opinion this is because their functionality is not yet sufficiently understood for developing well-defined engineering mechanisms.

Based on task allocation models developed to explain the division of labor in insect societies, we identified three types that are attractive for the organization of software agents:

• stimulus-based allocation is inspired by "foraging for work"-like models. An agent perceives its environment and fulfills those tasks that it encounters and that are required to be fulfilled. That means all agents possess the same capabilities; task selection is based on the agent's locality.

• preference-based allocation subsumes all forms of task allocation from agents that have different biases for different tasks up to extreme caste systems. Regulation is based on activation and on the existence of the correct share of agents that are able to execute the required tasks.

• interaction-based allocation transfers recruitment systems to task allocation. An agent motivates other agents to perform a task; information about task location, etc. is also transferred.

In this contribution we focus on different instances of the first two allocation mechanisms, whose efficiency was studied using a multi-agent simulation.

4 Study of Insectoid Organization of Work

We started with a generic agent-based simulation model that represents an abstract task allocation problem solvable by several more or less different agents. The model is intentionally designed in such an abstract way in order to facilitate the conceptual transfer of task allocation methods to multi-agent application domains.

4.1 Generic Model

Onto a torus world a number of task objects is randomly distributed. Every task object has a discrete type from a given set of types. Its status is described by a numerical value between 0 and 1 that denotes how urgently the fulfillment of the task at this position is needed. This urgency increases continuously according to a linear function with an individual gradient; however, a requirement will not exceed 1. When a task object is executed, the agent decreases this urgency, also linearly, until the requirement equals 0. In the basic model the "ability" of all agents is equal and fixed throughout the simulation at a value of 0.2 per step. In future tests we will allow this ability to change after task execution using a simple reinforcement mechanism. Execution is terminated when the requirement reaches zero.

To determine its next task, every agent decides independently of others according to its individual threshold vector: a locally perceived task is selected for execution if its urgency exceeds the individual threshold. The elements of this threshold vector may also take values between 0 and 1. When an agent is not working on a task, it performs a random walk searching for the next task to execute. We used quite dense task areas (500 tasks on a 1000×1000-position area) and a relatively large perception radius (300 positions), so agents rarely perform the random walk; they usually move straight from task to task. Movement happens at 100 positions per update step. The behavior model of the agents is depicted in figure 1.
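The task and agent dynamics just described can be sketched in a few lines of Python. The parameter values (torus size, perception radius, ability of 0.2 per step) are taken from the text, but the class names, the urgency growth value, and the rule of choosing the most urgent perceived task are our own assumptions; the paper does not specify a tie-breaking rule.

```python
import random

# Parameter values from the text; structure and names are our own.
WORLD, PERCEPTION, WORK_RATE = 1000, 300, 0.2

def torus_dist(ax, ay, bx, by, size=WORLD):
    """Euclidean distance on a torus of the given side length."""
    dx = min(abs(ax - bx), size - abs(ax - bx))
    dy = min(abs(ay - by), size - abs(ay - by))
    return (dx * dx + dy * dy) ** 0.5

class Task:
    def __init__(self, task_type, growth=0.01):   # growth value is illustrative
        self.type = task_type
        self.x, self.y = random.uniform(0, WORLD), random.uniform(0, WORLD)
        self.urgency = random.random()
        self.growth = growth                      # individual linear gradient

    def step(self):
        # Urgency grows linearly but never exceeds 1.
        self.urgency = min(1.0, self.urgency + self.growth)

class Agent:
    def __init__(self, thresholds):
        self.thresholds = thresholds              # one value in [0, 1] per task type
        self.x, self.y = random.uniform(0, WORLD), random.uniform(0, WORLD)

    def choose(self, tasks):
        """Return a perceived task whose urgency exceeds the threshold for
        its type, or None (the agent then performs a random walk)."""
        candidates = [t for t in tasks
                      if torus_dist(self.x, self.y, t.x, t.y) <= PERCEPTION
                      and t.urgency > self.thresholds[t.type]]
        return max(candidates, key=lambda t: t.urgency, default=None)

    def work_on(self, task):
        # Execution reduces urgency linearly until it reaches 0.
        task.urgency = max(0.0, task.urgency - WORK_RATE)
```

Each update cycle would then let every task `step()` and every agent either `work_on` its chosen task or move.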

4.2 Task Allocation Mechanisms

Threshold-based selection is generic enough to form the basis for the implementation of the following task allocation strategies:

A Purely stimulus-based allocation is realized by setting all threshold values to zero. All agents are the same and do not possess any preference for any task type.


[Figure 1 diagram: agent behavior cycle with states Random Walk, Move to Task, and Execute, and decision nodes "Perceived tasks?" (yes/no) and "Requirement > Threshold".]

Figure 1: Schematic illustration of agent behavior in the generic model

B Every agent has a slightly different bias for some of the tasks. This is implemented by setting all thresholds randomly between 0 and 0.5 (uniformly distributed).

C The differences in preferences for tasks are increased: the thresholds of every agent are randomly set between 0 and 1. This results in higher specialization.

D Extreme caste-like specialization is tested. For every agent one task type is selected randomly. The threshold value for this type is set to zero; all other values are set to 0.9 (Da) or, in the more rigorous case, to 1 (Db). A threshold equal to 1 means that the agent will never select any task of this type.
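The threshold configurations can be written down compactly. In this sketch the function name and representation are ours, while the numeric values follow the definitions of A, B, C, Da, and Db above.

```python
import random

def thresholds(mechanism, n_types, rng=random):
    """Generate one agent's threshold vector for mechanisms A-Db (Sec. 4.2)."""
    if mechanism == "A":                           # purely stimulus-based
        return [0.0] * n_types
    if mechanism == "B":                           # slightly different biases
        return [rng.uniform(0.0, 0.5) for _ in range(n_types)]
    if mechanism == "C":                           # stronger specialization
        return [rng.uniform(0.0, 1.0) for _ in range(n_types)]
    if mechanism in ("Da", "Db"):                  # caste-like specialization
        other = 0.9 if mechanism == "Da" else 1.0  # threshold 1 = never selected
        vec = [other] * n_types
        vec[rng.randrange(n_types)] = 0.0          # one randomly preferred type
        return vec
    raise ValueError(f"unknown mechanism: {mechanism}")
```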

It is important to also compare the efficiency of these biological task allocation methods to the optimal allocation. In this generic case this is a central allocation in which agents are assigned to the most urgent tasks. The generic model together with these task allocation strategies was implemented using the multi-agent simulation environment SeSAm (www.simsesam.de).

5 Experiments and Results

All results presented in the following are based on simulation runs of 2000 update cycles each. The data used for comparison is taken after 1000 time steps to avoid any influence of the start conditions. We also executed longer runs without obtaining different results. As measures of efficiency, the mean and standard deviation of overall task fulfillment over this observation interval were used, i.e., the current mean of the requirements of all tasks, as well as its current standard deviation. The number of tasks with urgency equal to 1 also gives a good measure of the success in maintainability. We plan to analyze these and other measures in the future as well.
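Computed over the vector of current task requirements, these measures amount to a few lines; a minimal sketch (the function name is ours):

```python
def efficiency_measures(requirements):
    """Mean and (population) standard deviation of the current task
    requirements, plus the number of saturated tasks (requirement == 1)."""
    n = len(requirements)
    mean = sum(requirements) / n
    std = (sum((r - mean) ** 2 for r in requirements) / n) ** 0.5
    saturated = sum(1 for r in requirements if r >= 1.0)
    return mean, std, saturated
```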

5.1 Comparison of Allocation Mechanism

We tested the four allocation scenarios and compared them for different numbers of agents. The values for different numbers of individuals are depicted in figure 2. The figure omits marks for standard deviation values because both the standard deviation within one simulation run and that between different simulation runs were below 0.02. The deviations between the requirements of individual tasks are highly correlated with the means (correlation coefficients > 0.92), except for mechanism Db, which seems not to be correlated for larger numbers of task types.


[Figure 2 plot: "Performance (5 Task Types)"; Mean Task Requirement (y-axis, 0–0.8) versus Number of Agents (x-axis, 100–500) for the Central, A, B, C, Da, and Db allocation mechanisms.]

Figure 2: Mean requirement of tasks when agents apply the different allocation mechanisms. The higher the values, the lower the success, as the y-value refers to the amount of labor left to do. For the labelling of the allocation methods see section 4.2.

As expected, performance improves with the number of agents (highly significant; correlation coefficients between agent number and mean requirement: −0.77 for Central, −0.89 for A, −0.85 for B and C, −0.91 for Da, and −0.95 for Db). It is surprising that the central approach does not totally outperform all local approaches, even when there is only a small number of agents. This might be due to the exhaustive task execution. Slightly different thresholds allow a somewhat flexible allocation, as not all agents are working but some are searching for more urgent tasks.

It is also interesting that the increase in performance for extreme specialization seems to be almost linear in the number of agents. As could be expected, the more extreme forms of the caste system show the worst performance. However, one might speculate that for very large groups this kind of allocation is the most efficient one. In the biological example, systems with morphological castes occur in ant colonies with several million ants.

5.2 Relevance of Number of Task Types

Secondly, we tested how the number of task types influences the performance of the task allocation methods. Currently, the different task types do not possess additional semantics; consequently, they are of no interest in purely stimulus-based allocation. However, in the preference-based allocation scenario, a larger number of task types might be thought to result in a worse allocation. The number of task types determines the number of thresholds; increasing this number thus also raises the number of tasks that an agent will never execute or is less likely to execute. Therefore it is quite surprising that the performance of the randomly distributed thresholds model is almost equal for the different task-type numbers tested. It is even more surprising that the values for runs with only two task types are slightly worse for large numbers of agents. Figure 3 (left) shows these results for allocation mechanism B. The allocation mechanism with thresholds equally distributed between 0 and 1 (C) exhibits analogous results.

Another quite surprising result is the performance of the allocation mechanism with extreme specialization (D). The performance decreases with an increased number of task types, i.e., with the length of the threshold vector (see figure 3 on the right). The results from the still more extreme configuration (Db) are analogous.

Figure 3: Mean requirement of tasks for different numbers of task types and agent numbers for preference-based allocation with slight differences in thresholds (B) on the left and extreme specialization (Db) on the right.

Comparing the task performance of mechanism Db to the other mechanisms for small numbers of task types and large group sizes shows that only A and the central approach are better than the specialized setting. This is quite interesting, as it points towards an advantage of specialization, without ability improvements, at higher agent numbers. However, first tests with larger settings with ten thousand agents and an analogous number of tasks could not confirm these hypotheses.

We also already have preliminary data from additional experiments. Simple tests on the relevance of spatial structure were made with maps on which tasks are clustered according to their types. Another interesting, yet not surprising, result was observed: the performance of extreme specialization improves when tasks are clustered and spatially separated. It even seems to outperform the other allocation forms in configurations with higher numbers of task types. We are currently also testing basic assumptions, for example the effect of task commitment. Alternative forms are possible, e.g. a constant amount of work for all task executions independent of the actual requirement of the task object.

6 Discussion and Further Work

Based on an abstract model of task allocation we presented some interesting results for different task allocation mechanisms, such as the effects of the number of different task types or the comparably good performance of local mechanisms in scenarios with low agent numbers. These results are interesting not only for biologists, but also for people who want to construct systems of interacting software agents.

However, there are several important aspects that should be tackled next, in addition to a careful statistical examination of the already available data (not presented here) from the spatial distribution and task commitment tests:

• Evolution of thresholds: In the current form of the generic model the thresholds of an individual agent are fixed. We will explore the possibility of making the preference to select a certain task change with the experience in executing that task. Such a feedback loop might form the basis for a self-organization model resulting in flexible and robust task allocation with lower numbers of agents.


• More complex ability models: In the current status of the generic model all agents possess the same abilities for the execution of all types of tasks. There is neither a static difference nor an experience-based adaptation. A restriction of the behavioral repertoire might also have a positive effect on the related abilities.

• More complex task models: The assumption of tasks whose requirements are independent of other tasks is heavily unrealistic. The results of our study would be more useful if the fulfillment of one task increased the requirement of another, related one. The importance of execution according to task type and other aspects of task networks may play an important role, especially when we want to predict the performance of biologically inspired multi-agent systems in realistic settings.

This study is the first step in a larger research effort. At the end of our study of insectoid task allocation we hope to understand enough of the underlying concepts to develop multi-agent applications by reverse engineering. In addition, our goal is to make a well-defined method available which allows the construction of self-organizing multi-agent systems with the advantages associated with social insect superorganisms.

Acknowledgements

The work described in this paper was supported by the DFG under SFB 554 (D3/4) "Emergent Behavior in Superorganisms" and DO 774/-1 (Emmy Noether Fellowship) to A.D.

References

Anderson, C. and Bartholdi, J. J. 2000. Centralized versus decentralized control in manufacturing: lessons from social insects. In McCarthy, I. P. and Rakotobe-Joel, T., editors, Complexity and Complex Systems in Industry, Proceedings, University of Warwick, 19th–20th September 2000, pages 92–108.

Anderson, C. and McShea, D. W. 2001. Individual versus social complexity, with particular reference to ant colonies. Biological Reviews (Cambridge), 76: 161–209.

Beshers, S. N. and Fewell, J. H. 2001. Models of division of labor in social insects. Annual Review of Entomology, 46: 413–440.

Bonabeau, E., Dorigo, M., and Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems.Oxford University Press, Oxford.

Dorigo, M., Maniezzo, V., and Colorni, A. 1996. The ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26: 29–41.

Gordon, D. M. 1996. The organization of work in social insect colonies. Nature, 380: 121–124.

Hölldobler, B. and Wilson, E. O. 1990. The Ants. Harvard University Press.

Parunak, H. V. D., Sauter, J., and Clark, S. 1998. Toward the specification and design of industrial synthetic ecosystems. In Singh, M., Rao, A., and Wooldridge, M., editors, Intelligent Agents IV (Proceedings of ATAL '97), volume 1365 of Lecture Notes in Artificial Intelligence, pages 45–59. Springer.

Pasteels, J. M., Deneubourg, J.-L., and Goss, S. 1987. Self-organization mechanisms in ant societies (1): trail recruitment to newly discovered food sources. In Pasteels, J. M. and Deneubourg, J.-L., editors, From Individual to Collective Behavior in Social Insects, pages 155–176.

Robinson, G. 1992. Regulation of division of labour in insect societies. Annual Review of Entomology, 37:637–665.

Smith, R. 1980. The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Transactions on Computers, C-29(12): 1104–1113.


Steels, L. 1990. Cooperation between distributed agents through self-organisation. In Demazeau, Y. and Müller, J.-P., editors, Decentralized Artificial Intelligence, pages 175–196.

Tofts, C. and Franks, N. R. 1994. Foraging for work: how tasks allocate workers. Animal Behaviour, 48: 470–472.

Wellman, M. P. 1995. A computational market model for distributed configuration design. Artificial Intelligence for Engineering Design, Analysis and Manufacturing (AI EDAM), 9: 125–133.


A Model of Adaptation in Collaborative Multi-Agent Systems

Kristina Lerman
USC Information Sciences Institute, Marina del Rey, CA 90292, USA. E-mail: [email protected]

Abstract

We describe a general mechanism for adaptation in multi-agent systems in which agents modify their behavior in response to changes in the environment or actions of other agents. The agents use memory to estimate the global state of the system from individual observations and adjust their actions accordingly. We claim that the agents in such systems may be modeled as generalized stochastic Markov processes. We present a mathematical model of the dynamics of collective behavior and apply it to study adaptive collaboration in a group of mobile robots.

1 Introduction

Adaptation is an essential requirement for engineered systems functioning in dynamic environments that cannot be fully known or characterized in advance. Adaptation allows agents, whether they are robots, modules in an embedded system, or software components, to change their behavior in response to environmental changes and actions of other agents in order to improve overall system performance. Additionally, adaptation allows swarms, artificial systems composed of large numbers of agents, to remain robust in the face of failure by sizeable fractions of agents. If each agent had instantaneous global knowledge of the environment and the state of other agents, the system could dynamically adapt to any changes. In most situations, such global knowledge is impractical. However, for sufficiently slow dynamics, agents can correctly estimate the state of the environment through repeated local observations by storing them in memory (Jones, 2003). In this paper we describe a general mechanism for adaptation in multi-agent systems in which agents use memory to estimate the global state of the system from individual observations and adjust their actions accordingly.

We present a mathematical model of the collective behavior of adaptive systems composed of many relatively simple agents that can use memory of past observations to make decisions about future actions, but do not rely on abstract representation, planning, or higher-order reasoning functions. We claim such agents can be represented as stochastic Markov processes. A differential equation, known as the stochastic Master Equation, governs the evolution of stochastic processes. The Master Equation is often too difficult to formulate and solve for real systems; therefore, we will work with the Rate Equation, which represents the first moment of the Master Equation. The Rate Equation describes how the average number of agents in a given state changes in time.

We illustrate the approach by applying it to study collaboration in groups of mobile robots. The illustration is based on the stick-pulling experiments in groups of robots (Ijspeert, 2001). In these experiments, the robots' task is to pull sticks out of their holes, and it can be successfully achieved only through collaboration between two robots. There is no explicit communication or coordination between the robots. Rather, when a robot finds a stick, it lifts it partially out of the ground and holds it for some period of time. If another robot finds the first one during this time period, it grabs the stick and lifts it out of the hole completely (successful collaboration); otherwise, the first robot releases the stick (unsuccessful collaboration) and resumes searching. We show that a simplified model, in which, rather than waiting a specified period of time, a robot has some probability of releasing the stick before the second robot has found it, produces qualitatively similar group behavior to the more complex model that explicitly includes the gripping time. In particular, we show that in some parameter range there is an optimal stick release rate that maximizes group performance, i.e., the rate at which sticks are extracted. The relevant parameter here is the ratio of robots to sticks. These findings qualitatively agree with experimental and simulation results.


These results indicate that if the number of robots and sticks is known in advance, the robots' stick release rate may be adjusted to maximize group performance. Alternatively, in the adaptive version of collaborative stick pulling, a robot can modify its own stick release rate based on its estimate of the number of sticks and other robots in the environment. As it searches the arena, the robot records observations of sticks and other robots, uses these observations to derive the density of each, and computes a stick release rate based on these values. If the number of robots changes due to failure of robots or arrival of new ones, robots modify their individual behaviors to optimize group performance. We write down the model of adaptive stick pulling and study the dynamics of the system in different parameter regimes.
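A minimal sketch of this observe-estimate-adjust loop, under our own assumptions: the class name, memory length, and in particular the mapping from the estimated robot-to-stick ratio to a release rate are placeholders, not the paper's formula. The sketch only encodes the qualitative direction suggested by the text, namely that a robot should hold a stick longer (release more slowly) when collaborators are scarce.

```python
from collections import deque

class AdaptiveRobot:
    """The robot keeps a sliding window of recent observations, estimates
    stick and robot counts from it, and recomputes its stick release rate.
    The density-to-rate mapping below is a placeholder rule."""

    def __init__(self, memory_len=50):
        # Each entry: (sticks seen, other robots seen) during one search leg.
        self.memory = deque(maxlen=memory_len)

    def observe(self, sticks_seen, robots_seen):
        self.memory.append((sticks_seen, robots_seen))

    def release_rate(self):
        sticks = sum(s for s, _ in self.memory)
        robots = sum(r for _, r in self.memory)
        beta = robots / max(sticks, 1)       # estimated robots-to-sticks ratio
        # Placeholder rule: release slowly when robots are scarce relative
        # to sticks (low beta), faster when collaborators are plentiful.
        return beta / (1.0 + beta)
```

Because the memory is a bounded window, observations of failed robots eventually age out and the estimate tracks the current population.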

2 Collective Dynamics of Stochastic Processes

Even in a controlled laboratory setting, the actions of an individual agent, e.g., a robot, are stochastic and unpredictable: the robot is subject to forces that cannot be known in advance, noise and fluctuations from the environment, interactions with other robots with equally complex and unpredictable trajectories, and errors in its sensors and actuators, in addition to randomness that is often deliberately inserted into the robot controller by its designer (e.g., in collision avoidance maneuvers the robot often turns a random angle before proceeding). Although the behavior of an individual agent is complex and unpredictable, collective behavior often has a simple, probabilistic form. We claim that some agents, specifically robots, can be represented as stochastic Markov processes. Of course, this does not apply to deliberative agents that use planning, reasoning, and abstract representations; however, it is true of many robots, including reactive and adaptive robots. In particular, an adaptive robot that makes decisions about future actions based on its memory of m past observations can be represented as a generalized Markov process of order m.

In past works we have shown that the time evolution of adaptive agent systems is governed by the stochastic Master Equation (Lerman, 2003a; Lerman, 2003b). Below we recapitulate the main elements of the theory. Let p(n, t) be the probability that an agent is in state n at time t.³ For a homogeneous system of independent and indistinguishable agents, p(n, t) also describes the macroscopic state of the system, i.e., the fraction of agents in state n. Let us assume that agents use a finite memory of length m of the past of the system in order to estimate the present state of the environment and make decisions about future actions. Then the agent (and therefore the multi-agent system) can be represented as a generalized Markov process of order m. This means that the state of an agent at time t + ∆t depends not only on its state at time t (as in simple Markov systems), but also on its states at times t − ∆t, t − 2∆t, . . . , t − (m − 1)∆t, which we refer to as the history h. Each history is drawn from a distribution of histories over all agents, characterized by p(h).

Evolution of the agent’s state is governed by:

$$\Delta p(n, t) = p(n, t + \Delta t) - p(n, t) = \sum_{h} \big[ p(n, t + \Delta t \mid h) - p(n, t \mid h) \big]\, p(h).$$

We expand $\Delta p$ employing the identities

$$p(n, t + \Delta t \mid h) = \sum_{n'} p(n, t + \Delta t \mid n', t; h)\, p(n', t \mid h)$$

and

$$1 = \sum_{n} p(n, t + \Delta t \mid n', t; h)$$

(Lerman, 2003a).

In the continuum limit (∆t → 0), ∆p/∆t becomes the Master Equation for adaptive systems, similar in form to the stochastic Master Equation widely studied in statistical physics and chemistry (Van Kampen, 1992). The stochastic Master Equation describes the evolution of the probability density for an agent to be in state n at time t, or alternatively, the macroscopic probability density function for the agents in state

³ State represents the behavior or action the robot is executing in the process of accomplishing its task.


n. In its most general form this equation is difficult to formulate and solve. Instead, we work with the Rate Equation, which represents the first moment of the Master Equation and describes how $N_n$, the average number of agents in state n, changes in time:

$$\frac{dN_n}{dt} = \sum_{n'} \big[ \langle W(n \mid n') \rangle N_{n'} - \langle W(n' \mid n) \rangle N_n \big], \qquad (1)$$

with history-averaged transition rates

$$\langle W(n \mid n') \rangle = \lim_{\Delta t \to 0} \frac{\sum_h p(n, t + \Delta t \mid n', t; h)\, p(h)}{\Delta t}. \qquad (2)$$

Equation 1 also holds for systems composed of reactive robots, which can be modeled as ordinary Markov processes, although the history term no longer appears in it. It is important to remember that the Rate Equation does not describe the results of a specific experiment but rather the behavior of quantities averaged over many experiments. We use the Rate Equation to study the collective behavior of adaptive robot systems.

3 Collaborative Stick-Pulling

The stick-pulling experiments were carried out by Ijspeert et al. (Ijspeert, 2001) to investigate the dynamics of collaboration among locally interacting reactive robots. Figure 1 is a snapshot of the physical set-up of the experiments. The robots' task was to locate sticks scattered around the arena and pull them out of their holes. A single robot cannot complete the task (pull a stick out) on its own; a collaboration between two robots is necessary for the task to be successfully completed. Each robot is governed by the same controller: it spends its time looking for sticks and avoiding obstacles. When a robot finds a stick, it lifts it partially out of its hole and waits for a period of time τ for a second robot to find it. If a second robot finds the first one, it will grip the stick and pull it out of the ground, successfully completing the task; otherwise, the first robot times out, releases the stick, and returns to the searching state.

Figure 1: Physical set-up of the stick-pulling experiment (courtesy of A. Martinoli).

In (Lerman, 2001) we constructed a mathematical model of the collective dynamics of this system and compared the model's predictions to experimental results. Here we examine a simplified scenario, considered also in (Lerman, 2003b), where instead of waiting a specified period of time, each robot releases the stick with some probability per unit time. As we show in Sec. 3.1, the behavior of such a simplified system is similar to that of the original system. Moreover, the adaptive version of the simplified system, discussed in Sec. 3.2, is readily amenable to analysis.


3.1 Collective Behavior of Reactive Systems

On a macroscopic level, during a sufficiently short time interval, each robot will be in one of two states: searching or gripping. We assume that actions such as pulling the stick out or releasing it take place on a short enough time scale that they can be incorporated into the searching state. Of course, in a model there can be a discrete state corresponding to every robot behavior or action in the controller, but this would complicate the mathematical analysis without adding much to the descriptive power of the model. We have shown that such a coarse-grained description produces a model that helps explain the main experimental conclusions (Lerman, 2001).

Each state in the macroscopic description requires a dynamic variable: thus, the variables of our model are Ns(t) and Ng(t), the (average) numbers of robots in the searching and gripping states respectively, as well as M(t), the number of uncollected sticks at time t. The mathematical model of the stick-pulling system consists of a series of coupled rate equations describing how the dynamic variables evolve in time:

dNs/dt = −α Ns(t)[M(t) − Ng(t)] + ᾱ Ns(t)Ng(t) + γ Ng(t) ,   (3)

dM/dt = −ᾱ Ns(t)Ng(t) + µ(t) ,   (4)

where α and ᾱ are the rates at which a searching robot encounters a free stick and a gripping robot, respectively, γ is the rate at which robots release sticks, and µ(t) is the rate at which new sticks are added by the experimenters. These parameters connect the model to the experiment: α and ᾱ are related to the size of the object, the robot's detection radius, or footprint, and the speed at which it explores the arena.

The first term in Eq. 3 accounts for the decrease in the number of searching robots as robots find and grip sticks; the second term describes successful collaborations between two robots (sticks are pulled out), and the third term accounts for failed collaborations (when a robot releases a stick without a second robot present); both of the latter lead to an increase in the number of searching robots. We do not need a separate equation for Ng, since this quantity may be calculated from the conservation-of-robots condition N0 = Ns + Ng. The last equation, Eq. 4, states that the number of sticks, M(t), decreases in time at the rate of successful collaborations. The equations are subject to the initial conditions that at t = 0 the number of searching robots is N0 and the number of sticks is M0.

We introduce the following transformations of variables in order to rewrite the equations in dimensionless form: n(t) = Ns(t)/N0 and m(t) = M(t)/M0 are the fractions of searching robots and uncollected sticks at time t; β = N0/M0 is the ratio of robots to sticks; RG = ᾱ/α and β̄ = RGβ. The fraction of gripping robots is simply 1 − n(t). The dimensionless versions of Eqs. 3–4 are:

dn/dt = −n(t)[m(t) + βn(t) − β] + β̄ n(t)[1 − n(t)] + γ[1 − n(t)] ,   (5)

dm/dt = −β β̄ n(t)[1 − n(t)] + µ′ .   (6)

Note that only two parameters, β and γ, appear in the equations and thus determine the behavior of solutions. The third parameter, β̄ = RGβ, is fixed experimentally and is not independent. Note that we do not need to specify α and ᾱ; they enter the model only through RG (throughout this paper we will use RG = 0.35).⁴

⁴The parameter α can be easily calculated from the experimental values quoted in (Ijspeert, 2001). As a robot travels through the arena, it sweeps out some area during time dt and will detect objects that fall in that area. This detection area is VRWRdt, where VR = 8.0 cm/s is the robot's speed and WR = 14.0 cm is the robot's detection width. If the arena radius is R = 40.0 cm, a robot will detect sticks at the rate α = VRWR/πR² ≈ 0.02 s⁻¹. According to (Ijspeert, 2001), a robot's probability of grabbing a stick already being held by another robot is 35% of the probability of grabbing a free stick; therefore, RG = ᾱ/α = 0.35. RG is an experimental value obtained from systematic experiments with two real robots, one holding the stick and the other approaching it from different angles.
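As a concrete illustration of how the dimensionless model can be used, the sketch below (our own, not part of the original study) integrates Eq. 5 by forward Euler with the stick density held at m(t) = 1 and evaluates the steady-state collaboration rate of Eq. 8; the function and parameter names are ours.

```python
# Sketch: numerical integration of the dimensionless rate equation (Eq. 5)
# with constant stick density m(t) = 1, plus the steady-state collaboration
# rate of Eq. 8. R_G = 0.35 as in the paper; beta and gamma are inputs.

R_G = 0.35

def steady_state_n(beta, gamma, dt=0.01, steps=100_000):
    """Relax the fraction of searching robots n to its steady state."""
    beta_bar = R_G * beta
    n = 1.0  # initially all robots are searching
    for _ in range(steps):
        dn = (-n * (1.0 + beta * n - beta)   # searching robots grip sticks
              + beta_bar * n * (1.0 - n)     # successful collaborations
              + gamma * (1.0 - n))           # unsuccessful grips released
        n += dt * dn
    return n

def collaboration_rate(beta, gamma):
    """Steady-state collaboration rate R of Eq. 8."""
    n = steady_state_n(beta, gamma)
    return beta * (R_G * beta) * n * (1.0 - n)
```

For example, with β = 1 and γ = 0.325 the searching fraction relaxes to n = 0.5, giving R = 0.35 · 0.25 = 0.0875 in dimensionless units.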



Figure 2: (a) Collaboration rate per robot vs. inverse stick release rate 1/γ for β = 0.5, β = 1.0, and β = 1.5. These values of β correspond, respectively, to two, four, and six robots in the experiments with four sticks. (b) Collaboration rate vs. the gripping time parameter for groups of two to six robots and four sticks (from Ijspeert et al., 2001). Heavy symbols represent experimental results, while lines represent results of two different types of simulations.

We assume that the number of sticks does not change with time (m(t) = m(0) = 1) because new sticks are added (e.g., by the experimenter) at the rate the robots pull them out. A steady-state solution, if it exists, describes the long-term, time-independent behavior of the system. To find it, we set the left-hand side of Eq. 5 to zero:

−n[1 + βn − β] + β̄n[1 − n] + γ[1 − n] = 0 .   (7)

This quadratic equation can be solved to obtain the steady-state values of n(β, γ). The collaboration rate is the rate at which robots pull sticks out of their holes. The steady-state collaboration rate is

R(γ, β) = β β̄ n(γ, β)[1 − n(γ, β)] ,   (8)

where n(γ, β) is the steady-state fraction of searching robots for particular values of γ and β. Figure 2(a) depicts the collaboration rate as a function of 1/γ. Note that there exists a critical value of β such that for β > βc the collaboration rate remains finite as 1/γ → ∞, while for β < βc it vanishes. The intuitive reason for this was presented in (Ijspeert, 2001): when there are fewer robots than sticks and each robot holds the stick indefinitely (vanishing release probability), after a while every robot is holding a stick, and no robots are available to help pull sticks out. Also, for β < βc there is an optimal value of γ which maximizes the collaboration rate and can be computed from the condition dR(γ, β)/dγ = β β̄ d(n − n²)/dγ = 0, with n given by the roots of Eq. 7. Another way to compute the optimal release rate is to note that, for a given value of β below some critical value, the collaboration rate is greatest when half of the robots are gripping and the other half are searching. Substituting n = 1/2 into Eq. 7 leads to

γopt = 1 − (β + β̄)/2   for β < βc = 2/(1 + RG) .   (9)

No optimal release rate exists when β exceeds its critical value βc. Figure 2(b) shows the results of experiments and simulations conducted by Ijspeert et al. (Ijspeert, 2001) for groups of two to six robots. The three curves in Fig. 2(a) are qualitatively similar to those in Fig. 2(b) for 2 robots (β = 0.5), 4 robots (β = 1.0) and 6 robots (β = 1.5). Even the grossly simplified model reproduces the main conclusions of the experimental work: the existence of βc, the critical value of the ratio of robots to sticks, and the optimal release rate (or, conversely, gripping time) that maximizes the collaboration rate for β < βc. In addition, the analysis gives analytic forms for important parameters, such as βc and γopt, values we will exploit in constructing the adaptive version of collaborative stick pulling.
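Since Eq. 7 is quadratic in n, its positive root has a closed form, which gives a direct numerical check that the release rate of Eq. 9 indeed puts half of the robots in the searching state (a sketch with our own naming; m = 1 throughout):

```python
import math

R_G = 0.35  # experimental grip ratio (Ijspeert, 2001)

def n_steady(beta, gamma):
    """Positive root of Eq. 7, rewritten as a*n^2 + b*n - gamma = 0
    with a = beta + beta_bar and b = 1 + gamma - beta - beta_bar."""
    a = beta * (1.0 + R_G)  # beta + beta_bar, since beta_bar = R_G * beta
    b = 1.0 + gamma - a
    return (-b + math.sqrt(b * b + 4.0 * a * gamma)) / (2.0 * a)

def gamma_opt(beta):
    """Optimal release rate of Eq. 9, valid for beta < beta_c = 2/(1 + R_G)."""
    return 1.0 - beta * (1.0 + R_G) / 2.0

# At the optimal release rate, exactly half of the robots are searching:
for beta in (0.5, 1.0, 1.4):
    assert abs(n_steady(beta, gamma_opt(beta)) - 0.5) < 1e-9
```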

3.2 Collective Behavior of Adaptive Systems

Figure 2 indicates that if the number of sticks and robots is known in advance, the robots' release rate can be set to a value that maximizes the group collaboration rate. If the number of sticks or the number of robots is not known, or changes in time (due to robot failure, for example), the robots can still tune their individual parameters to maximize group performance. They accomplish this through a memory-based adaptation mechanism. As they search the arena, robots record local observations of sticks and other robots in memory, estimate the density of each from these observations, and use these values to compute the stick release rate. The robots set the release rate according to the following rules:

γ = 1 − βobs(1 + RG)/2   for βobs < 2/(1 + RG) ,   (10)

γ = 0   for βobs ≥ 2/(1 + RG) ,

where βobs = Nobs/Mobs is the ratio of the observed numbers of robots and sticks. Suppose each robot has a memory window of size h. As it makes observations, the robot adds them to memory, replacing older observations with more recent ones. For a particular robot, the values in the most recent memory slot are N^0_obs and M^0_obs, the observed numbers of robots and sticks at time t; in the next most recent slot, the values are N^1_obs and M^1_obs, the observed numbers at time t − ∆, and so on. The robot computes γopt from the values stored in memory: Nobs = ∑_{j=0}^{h−1} N^j_obs and Mobs = ∑_{j=0}^{h−1} M^j_obs.
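The rules in Eq. 10, together with the memory window, can be sketched as a per-robot controller fragment (class and method names are ours; the real robots compute the same quantities from on-board observations):

```python
from collections import deque

R_G = 0.35

class AdaptiveController:
    """Sketch of the memory-based release-rate rule of Eq. 10."""

    def __init__(self, h):
        # Memory window of h slots; appending beyond h discards
        # the oldest observation automatically.
        self.robot_obs = deque(maxlen=h)
        self.stick_obs = deque(maxlen=h)

    def observe(self, n_robots, n_sticks):
        """Record one local observation of robots and sticks."""
        self.robot_obs.append(n_robots)
        self.stick_obs.append(n_sticks)

    def release_rate(self):
        """Stick release rate gamma according to Eq. 10."""
        M_obs = sum(self.stick_obs)
        if M_obs == 0:
            return 0.0  # our assumption: hold on until sticks are observed
        beta_obs = sum(self.robot_obs) / M_obs
        if beta_obs >= 2.0 / (1.0 + R_G):
            return 0.0  # above the critical ratio: never release
        return 1.0 - beta_obs * (1.0 + R_G) / 2.0
```

With equal numbers of observed robots and sticks (βobs = 1), release_rate() returns 1 − 1.35/2 = 0.325, the optimal rate for β = 1.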

The dynamics of the adaptive system are specified by Eqs. 5–6, where γ is now the history-averaged stick release rate, the aggregate of individual decisions made according to the rules in Eq. 10. It is computed in the following way. When the observations of all robots are taken into account, the mean observed number of robots in the first memory slot is (1/N) ∑_{i=1}^{N} N^0_{i,obs} ≈ N(t), where N(t) is the average number of robots at time t. Likewise, the mean observed value in memory slot j is (1/N) ∑_{i=1}^{N} N^j_{i,obs} ≈ N(t − j∆), the average number of robots at time t − j∆. In general, the actual value will fluctuate because of measurement errors; however, on average, it will be the average number of robots (or sticks) in the system at that time. This system is trivial: the average number of robots and sticks does not change in time. In other systems, however, parameters may depend on variables that change in time, for example, the number of searching robots (Lerman, 2003). The rate equations for such systems will be time-delay equations, since parameters will depend on the delayed values of the dynamic variables.

Figure 3(a) shows how the solution, the fraction of searching robots, relaxes in time in both adaptive and reactive systems. In all cases, solutions reach a steady state. Note that in reactive systems the steady-state value of ns depends on β, while in adaptive systems, by design, ns = 0.5.

Figure 3(b) shows the difference between the collaboration rates of adaptive and reactive systems for different values of γ (the collaboration rate depends on γ only in reactive systems). The difference is always positive, meaning that adaptation always improves the collaboration rate, by as much as 15% in this range of β. The two sets of curves are for two values of RG, an experimental parameter that measures how easy it is for the second robot to grip the stick. In the experiments RG was measured to be 0.35 (Ijspeert, 2001), and this is the value we used in this paper. Essentially, RG reflects the range of angles from which the second robot can approach the first one and still be able to grip the stick. As we can see from the figure, this experimental parameter influences the collaboration rate. If robots are redesigned so that one robot can approach a gripping robot from a wider angle (bigger value of RG), the benefit of adaptation in such a system will be even greater.


[Figure 3 plots: (a) nS vs. time for the adaptive system and for reactive systems with β = 0.5, 1.0, 1.5; (b) ∆ collaboration rate vs. β for RG = 0.35 and RG = 0.75, each with 1/γ = 1 and 1/γ = 0.01.]

Figure 3: (a) Time evolution of the fraction of searching robots for adaptive and reactive systems. (b) Difference between the collaboration rates of adaptive and reactive systems for different values of the experimental parameters RG and γ.

4 Conclusions

We have developed a general mathematical model to describe adaptation in multi-agent systems. In these systems the agents can modify their behavior in response to environmental dynamics or the actions of other agents. The agents estimate the global state of the system from individual observations stored in memory and adjust their behaviors accordingly. In earlier works we derived a model that describes the dynamics of collective behavior in such adaptive systems. Here we have applied it to study adaptive collaboration in robots, where robots compute internal parameters based on the observations stored in memory. We explicitly took finite memory size into account, although in the aggregate approach considered here the size of the memory window does not impact the behavior of the system. We showed that the collaboration rate in adaptive systems is greater than that in reactive systems for all values of the system parameters. We have not considered the effect of noisy observations on collective behavior.

Acknowledgements

The research reported here was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract number F30602-00-2-0573. I would like to thank Aram Galstyan and Tad Hogg for insightful discussions, and an anonymous referee for suggesting the simplified problem.

References

Ijspeert, A. J., Martinoli, A., Billard, A. and Gambardella, L. M. 2001. Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment. Autonomous Robots 11(2):149–171.

Jones, C. and Mataric, M. 2003. Adaptive task allocation in large-scale multi-robot systems. In Proc. of IROS-03.

Lerman, K., Galstyan, A., Martinoli, A. and Ijspeert, A.-J. 2001. A Macroscopic Analytical Model of Collaboration in Distributed Robotic Systems. Artificial Life 7:375–393.

Lerman, K. and Galstyan, A. 2003. Macroscopic Analysis of Adaptive Task Allocation in Robots. In Proc. of IROS-03.

Lerman, K., Galstyan, A. and Hogg, T. 2003. Mathematical Analysis of Multi-Agent Systems. Submitted to J. of Autonomous Agents and Multi-Agent Systems.

Van Kampen, N. G. 1992. Stochastic Processes in Physics and Chemistry. Elsevier Science, Amsterdam.


Diversity and Specialization in Collaborative Swarm Systems

Ling Li¹, Alcherio Martinoli², Yaser S. Abu-Mostafa¹

1. California Institute of Technology, Pasadena, CA 91125, USA. Corresponding author: [email protected]

2. Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland.

Abstract

This paper addresses qualitative and quantitative diversity and specialization issues in the framework of self-organizing, distributed, artificial systems. Both diversity and specialization are obtained via distributed learning from initially homogeneous swarms. While measuring diversity essentially quantifies differences among the individuals, assessing the degree of specialization implies correlating the swarm's heterogeneity with its overall performance. Starting from a stick-pulling experiment in collective robotics, a task that requires the collaboration of two robots, we abstract and generalize in simulation the task constraints to k robots collaborating sequentially or in parallel. We investigate quantitatively the influence of task constraints and of the type of reinforcement signal on diversity and specialization in these collaborative experiments. Results show that, though diversity is not explicitly rewarded by our learning algorithm and there is no explicit communication among agents, the swarm becomes specialized after learning. The degree of specialization is affected strongly by environmental conditions and task constraints, and reveals characteristics related to performance and learning in a more consistent and clearer way than diversity does.

Keywords: collaborative swarm systems, distributed learning, specialization, diversity.

1 Introduction

Artificial swarm systems based on swarm intelligence (SI) consist of relatively simple autonomous agents. They are truly distributed, self-organized, and inherently scalable, since there is no global control or communication mechanism, and they exploit an adequate balance between explorative and exploitative behavior to robustly face changes in environmental or task conditions (Bonabeau et al., 1999).

Swarm systems can be homogeneous or heterogeneous. A homogeneous system consists of physically identical entities with the same hardware and software capabilities. A heterogeneous system may differentiate at different levels: at the hardware level, at the (controller) software level, or simply because each entity has a unique identifier. In this paper, we use software agents emulating real robots that differentiate exclusively at the controller level, in particular by being endowed with different control parameters.

Homogeneous systems represent a special case of heterogeneous ones. Depending on environmental and task constraints, a homogeneous solution may not be the one achieving the best results. Learning, as an automatic way to adjust control parameters or select rules without assuming a priori the degree of swarm heterogeneity, represents an effective tool to explore not only homogeneous solutions (Hayes et al., 2003) but also heterogeneous ones (Murciano et al., 1997; Li et al., 2002). In this paper, we are interested in distributed learning, i.e., adaptation through learning occurs exclusively in single robots' controllers (and not, for instance, in an external supervisor unit).

However, depending on agents' capabilities in perception and communication, it may be extremely difficult for a distributed learning algorithm to discover (near-)optimal solutions at the swarm level. In addition to the inherently large search space characterizing a heterogeneous swarm, the credit assignment problem due to the partial perception of agents drastically increases the difficulty of distributed learning. Solutions proposed in the literature on multi-agent learning can be roughly classified according to the type of reinforcement signal adopted, either local or global. A local reinforcement signal (Mataric, 1998; Parker & Touzet, 2000) rewards a single agent based on a local assessment of its contribution to the swarm performance. Although this type of reinforcement signal is immediate and exploits the inherent parallelism of the swarm, it only represents a noisy estimate of the swarm performance. The more limited and local the communication and perception capabilities (e.g., in extreme cases, no communication at all and very short-range sensors), the higher the amount of noise in the local assessment due to partial perception. On the contrary, a global reinforcement signal (Murciano et al., 1997; Versino & Gambardella, 1997; Hayes et al., 2003), which is often equivalent to the swarm performance, is more stable and more meaningful. However, this usually implies a reliable way to measure the swarm performance (e.g., a supervisor or a fully connected, fast communication network among agents) and a more difficult interpretation of the reinforcement signal at the agent level, especially in heterogeneous systems.

In this paper, we let the distributed learning algorithm explore heterogeneous solutions, aiming to improve the swarm performance. We consider different task constraints and types of reinforcement signals, and quantitatively measure the diversity and specialization of a team of non-communicating agents. We support the discussion first with a concrete collaboration experiment concerned with pulling sticks, and then with its generalized versions where the collaboration is extended to k sequential or parallel operations, the analog of pulling longer or heavier sticks. We show that specialization can arise in all versions of the experiments as a function of task constraints and environmental conditions, no matter which type of reinforcement signal is used. As long as diversity among agents brings an advantage to the swarm performance, learning can drive the system to become specialized.

2 Diversity and Specialization

Traditionally, swarm systems have been classified on a bipolar scale as either heterogeneous or homogeneous, depending on whether any of the agents differ from the others. This view is limiting because it does not permit a quantitative comparison between heterogeneous systems. Quantitative metrics of swarm diversity and specialization enable the investigation of issues such as the impact of diversity on swarm performance and the impact of task constraints on specialization.

The essential idea behind the diversity measure is to cluster similar agents according to a problem-specific difference measure and look at the pattern they form in the feature space. After some preliminary tests in which we used a heuristic criterion to select the "optimal" clustering and took the number of clusters as the diversity measure (Li, 2002), we adopted Balch's social entropy (Balch, 1998) as the diversity measure for the stick-pulling experiments. Based on Shannon's information entropy, Balch's social entropy provides a meaningful and stable measure by incorporating details of the feature space such as the spatial distribution of the clusters.
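As a simplified illustration of the entropy idea (Balch's hierarchic social entropy additionally integrates over clustering scales; the sketch below assumes a single fixed clustering and is ours, not the paper's code):

```python
import math

def simple_social_entropy(cluster_sizes):
    """Shannon entropy (in bits) of the distribution of agents over clusters."""
    total = sum(cluster_sizes)
    probs = [c / total for c in cluster_sizes if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A homogeneous swarm forms one cluster and scores 0 bits; a swarm split evenly into two behavioral clusters scores 1 bit.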

Specialization means more than just being diverse. While diversity means difference among individuals, regardless of whether the difference is good or bad with respect to the swarm performance, specialization, with the definition "structural adaptation of a part to a particular function," also implies adaptation in order to fit. When diversity is obtained via an iterative process such as learning or evolution, other causes (e.g., noise in the replication mechanism) can also make the system diverse. However, a system becomes specialized when, given specific constraints of viability or survival at the agent level, its diversity arises in the service of better performance. Accordingly, a specialization metric should measure the part of diversity that enhances the performance.

When looking at a swarm system statically, it is impossible to identify the part of diversity that corresponds to the performance improvement. We have to put the system into a dynamic process where its performance and diversity can change and interact. If the performance generally increases with higher diversity, the system benefits from being more diverse than in its initial status, and the degree of specialization should increase accordingly; otherwise, if the greater diversity does not help the performance, the degree of specialization should decrease. That is, specialization can be measured along a dynamic process as a result of the correlation between diversity and performance. If we assume the system starts from a homogeneous setting with no diversity or specialization, and the diversity d and the swarm performance r change with time as correlated random variables, the correlation coefficient between d and r acts naturally as the percentage of specialization in diversity. To put this into a formula, the degree of specialization can be defined as

s = corrcoef(d; r) × d .   (1)

Note that our specialization measure s is negative when d and r are negatively correlated.
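The measure of Eq. 1 can be computed directly from the recorded time series of diversity and performance; in this sketch (our naming) the scaling factor d is passed in explicitly, since the paper measures d and r over a time window of the run:

```python
import math

def corrcoef(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def specialization(d_series, r_series, d):
    """Degree of specialization s = corrcoef(d; r) x d (Eq. 1)."""
    return corrcoef(d_series, r_series) * d
```

When diversity and performance rise together, the full diversity counts as specialization; when they are anti-correlated, s is negative.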

3 Stick-Pulling Experiments

Ijspeert et al. (2001) investigated collaboration in teams of non-communicating robots engaged in a stick-pulling experiment (Figure 1, left). We call their experiment the original one, since we will abstract and generalize it later, in Subsection 3.2.

3.1 Original Stick-Pulling Experiment

In the original experiment, robots equipped with gripper turrets and proximity sensors search a circular arena and pull sticks out of the ground. The stick length has been chosen so that a single robot is incapable of pulling a stick out completely on its own, but collaboration between two robots is sufficient for this task. Each robot is characterized by a gripping time parameter (GTP), the maximal length of time that a robot waits for the help of another robot while holding a stick.

The behavior of a robot is determined by a simple program (Figure 1, right). The default behavior is searching for sticks, i.e., wandering in the arena until an object is detected. If a stick is detected, the robot pulls it up and determines whether another robot is already holding it by measuring the elevation speed of the gripper arm. If the elevation is fast, no other robot is holding the stick, and we call such a grip grip1. Otherwise, the robot assumes that another robot is already holding the stick and therefore "braking" the elevation. Such a grip is named grip2.

After a robot makes a grip1, two cases can occur: either a second robot helps the first one before the GTP expires (we call this a successful collaboration) or the first robot times out and resumes the search for sticks. The specific values of the GTPs play a crucial role in the overall stick-pulling rate (defined as the number

[FSM states: Search, Obstacle Avoidance, Interference, Grip1, Grip2, Grip & Wait, Success Dance, Center; arena elements: Wall, Stick, Robot.]

Figure 1: Left: Physical set-up for the stick-pulling experiment. Right: FSM representing the robot's controller. Transitions between states are triggered by sensory measurements.


of sticks pulled out per unit time), which is the metric adopted for the swarm performance in all previous papers⁵ (Ijspeert et al., 2001; Lerman et al., 2001; Li et al., 2002) and in this paper. To ensure the stick-pulling rate is reliably measured, experiments usually run for a long time, and a stick is inserted back by the experimenter after it is completely pulled out.

We use the microscopic model developed in (Ijspeert et al., 2001) as the simulation platform; it represents agents as separate probabilistic finite-state machines (PFSMs). The flowchart of a PFSM is based on the blueprint of the corresponding real robot controller, and its transition probabilities are computed using simple geometric considerations and systematic experiments with one or two real robots. Unlike macroscopic models (see for instance (Lerman et al., 2001; Martinoli & Easton, 2002) for the same experiment), which intrinsically assume agents can be clustered into certain castes, microscopic models allow us to study issues related to distributed learning and specialization, since each agent is a separate PFSM. Furthermore, in contrast to other agent-based models, the way this model is constructed allows for quantitatively accurate predictions while being four or five orders of magnitude faster than other popular simulation tools such as sensor-based embodied simulations (Ijspeert et al., 2001). Therefore, although we have not tested our results using real robots or realistic simulations, we believe that their validity is not limited to abstract agents.

3.2 Generalized Stick-Pulling Experiments

The strict collaboration property of the stick-pulling task has a major influence on swarm diversity and specialization. In order to emphasize this effect, we abstract and generalize the original experiment so that a successful collaboration now requires k (> 2) robots instead of just two.

Sequential Collaboration: Pulling Longer Sticks One way to extend the original experiment is to assume longer sticks, so that one robot can only pull a stick up by 1/k of its length. k consecutive grips, which may be called grip1, grip2, . . . , and gripk, respectively, are thus needed to pull out a stick entirely. If the robot currently holding the stick times out, it drops the stick, so that further robots have to start over from grip1. We call this type of collaboration, required for pulling longer sticks, sequential collaboration. Note that we do not actually need more than two robots to complete the task: theoretically, two robots with very large GTPs are able to pull out sticks of any length, albeit inefficiently, if they help each other alternately.

Parallel Collaboration: Pulling Heavier Sticks Another way to extend the original experiment is to suppose the sticks are shorter but heavier, so that one robot is too weak to lift a stick. Exactly k robots are needed simultaneously to lift a stick and pull it out. When a robot finds a stick, it grips the stick until timing out or until there are enough robots to lift the stick, whichever comes first. Robots do not reset their timers when a new robot joins the pulling. In contrast to the sequential case, the pulling process need not be restarted from scratch unless all the robots currently holding the same stick time out. We call this type of collaboration parallel collaboration.
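The difference between the two schemes can be made concrete with a toy sketch of the stick's state alone (our abstraction; the actual simulations use the full PFSM agents):

```python
class SequentialStick:
    """Longer stick: k consecutive grips are needed; a timeout drops the
    stick and later robots must start over from grip1."""
    def __init__(self, k):
        self.k, self.progress = k, 0

    def grip(self):
        self.progress += 1
        return self.progress >= self.k  # True once the stick is fully out

    def timeout(self):
        self.progress = 0


class ParallelStick:
    """Heavier stick: k robots must hold it simultaneously; a timeout removes
    one holder, so progress is lost only when all holders give up."""
    def __init__(self, k):
        self.k, self.holders = k, 0

    def grip(self):
        self.holders += 1
        return self.holders >= self.k

    def timeout(self):
        self.holders -= 1
```

With k = 3, the sequence grip, grip, timeout, grip leaves a sequential stick back at one grip of progress, while a parallel stick still has two holders.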

3.3 Learning Algorithm

We proposed and tested in (Li et al., 2002) an adaptive line-search algorithm and found that it could achieve near-optimal performance in the original stick-pulling experiment under different conditions. In contrast to a gradient descent method, this algorithm neither requires derivatives to be calculated nor assumes continuity in the search space. In this paper, we use the same algorithm for both the original and the generalized stick-pulling experiments.

⁵The collaboration rate (the number of successful collaborations per unit time) was in fact used in the previous papers. It is equivalent to the stick-pulling rate when exactly one successful collaboration is required for a stick pull-out.

— Page 94 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

[Figure 2 plots: left, stick-pulling rate (1/min) vs. initial gripping time parameter (sec) for 2–6 robots; middle, diversity vs. number of robots; right, specialization vs. number of robots, the latter two under local and global reinforcement signals.]

Figure 2: Results of the original stick-pulling experiment. Left: The dashed curves represent the performance of homogeneous teams with a fixed GTP; the solid curves show that of heterogeneous teams after learning under the local reinforcement signal. Middle: The diversity under different reinforcement signals. Right: The specialization under different reinforcement signals.

We use both types of reinforcement signals with the learning algorithm. The local reinforcement signal rewards an agent when it makes a successful collaboration, i.e., when it completely pulls out a stick or passes the stick to another robot. The global reinforcement signal is the swarm performance. The two types of reinforcement signals "align" well in the original experiment as well as in its parallel extension, since a successful collaboration means exactly a stick pull-out and vice versa. However, in sequential cases, a successful collaboration only contributes to, but may not finally result in, a stick pull-out, and without a supervisor or explicit communication, a robot will never know its true contribution unless it performs the final grip. Thus the local reinforcement signal in sequential cases is not aligned with the global one.

4 Results

All the experiments we conducted started from a homogeneous system, i.e., the same initial GTP for all agents. During the experiments, agents could iteratively adapt their GTPs using either the local or the global reinforcement signal. The experiments lasted long enough for the learning to stabilize. Swarm performance and diversity were recorded along the experiments using a time window, so that specialization could be measured via formula (1). We simulated 50 runs for each initial GTP and plotted the mean diversity and specialization over the runs. The error bars in all diversity and specialization figures represent one standard deviation from the mean values.

In (Li, 2002), we suggested using a difference measure of logarithmic form, since both the performance and the logarithm are less sensitive to GTP changes when the GTP is large. That is, for two agents with GTPs g1 and g2, respectively, the difference between them is |log g1 − log g2|. This difference measure is used in all of our experiments.
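For concreteness, the difference measure is a one-liner (the function name is ours):

```python
import math

def gtp_difference(g1, g2):
    """Logarithmic difference between two gripping time parameters."""
    return abs(math.log(g1) - math.log(g2))
```

Since only the ratio g1/g2 matters, a fixed absolute change matters less at large GTPs: |log 410 − log 400| ≈ 0.025, while |log 50 − log 40| ≈ 0.223.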

4.1 Results of the Original Stick-Pulling Experiment

We started with the original stick-pulling experiment using the same settings as in (Ijspeert et al., 2001; Li et al., 2002), i.e., 2 to 6 robots and 4 sticks in an arena of 40 cm in radius. The performance learned under the local reinforcement signal, contrasted with the performance of a homogeneous team without learning, is shown in the plot on the left of Figure 2.

The homogeneous team with a fixed GTP exhibited quite different behaviors depending on the ro-bot/stick ratio. When there were more robots than sticks, the stick-pulling rate increased monotonicallywith the GTP until reaching a plateau corresponding to the optimal rate for homogeneous teams. In other


2nd International Workshop on the Mathematics and Algorithms of Social Insects

words, since there were always robots "free" to help, waiting very long was a good strategy for robots holding sticks. On the other hand, when the number of robots was equal to or smaller than the number of sticks, waiting in vain for a very long time could generate deadlock situations where every robot holds a different stick and waits for help. Previous research showed that specialization was desired particularly in this situation (Ijspeert et al., 2001; Li et al., 2002).

The stick-pulling rate of the learned system instead consistently achieved the same level independent of the initial GTP and almost always outperformed the rate obtained by the homogeneous team without learning. We also tested learning with the global reinforcement signal. Probably due to the high alignment between the local and the global signals under the current task constraints, we did not observe a significant difference in the learned performance under these two types of signals.

The plot on the right of Figure 2 shows that specialization became much smaller for 5 and 6 robots than for 2–4 robots. This validates the deadlock phenomenon we just discussed, i.e., diversity is good for performance when the number of robots is equal to or smaller than the number of sticks, and becomes less relevant to performance when there are more robots than sticks. The diversity measure (Figure 2, middle) gave flatter curves and by itself cannot show this phenomenon clearly.

Since the local reinforcement signal is noisier than the global one, we expect that under the global reinforcement signal truly specialized robots generate a larger portion of the diversity. This is validated in Figure 2, since the diversity under the global reinforcement signal dropped faster than that under the local reinforcement signal when the specialization was less relevant.

4.2 Results of the Generalized Stick-Pulling Experiments

In order to accommodate the larger number of robots required by the generalized experiments, we used a large arena of 80 cm in radius, 16 sticks, and 6 to 24 robots. We simulated the generalized experiments with k from 3 to 5. Probably due to the same "alignment" reason as in the original experiment, no significant difference in performance was spotted between the local and the global reinforcement signals in parallel cases. However, in sequential cases, the local reinforcement signal gained a small performance advantage over the global one for almost all experimental settings.

Before looking at the specialization results (Figure 3), we had expected that the specialization in parallel cases would be higher than that in sequential cases.6 However, that happened only with a large number of robots (say, 18) and the global reinforcement signal. An investigation of the learned GTPs shows that when the number of robots is small in parallel cases, all robots have similar GTPs (∼ 300 s) and the diversity is low. This gives us hints about the seemingly weird phenomenon.

We define the deadlock threshold as the maximal number of robots that could still incur deadlock. When there are t sticks in the arena, the threshold is t in sequential cases and (k − 1)t in parallel cases. Our experience with the original experiment made us believe that specialization is high any time the number of robots is less than the deadlock threshold, which is not always true. Just as in a company having many more jobs than employees, when the deadlock threshold is much higher than the number of robots, each robot tends to have multiple roles, as every employee has to take multiple jobs. Since a robot has only one GTP value, trying to specialize in too many directions just makes all GTPs similar and results in a low diversity, especially when k is large in parallel cases.
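The definition above can be sketched directly (the function name is ours):

```python
def deadlock_threshold(num_sticks, k, parallel):
    """Maximal number of robots that could still all end up waiting.

    With t sticks, in the sequential case up to t robots can each hold a
    different stick; in the parallel case up to k - 1 robots can wait at
    each stick, since the k-th grip would complete the pull-out.
    """
    t = num_sticks
    return (k - 1) * t if parallel else t

# Settings of the generalized experiments (16 sticks, k = 4):
assert deadlock_threshold(16, 4, parallel=False) == 16
assert deadlock_threshold(16, 4, parallel=True) == 48
```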

With the global reinforcement signal, when the number of robots is larger than the deadlock threshold, the decrease in specialization was again observed.7 What was initially unexpected is that specialization achieved its maximum when the number of robots was measurably lower than the threshold. However, seeing

6Our arguments were: (a) In sequential cases, the requirement for robots doing grips before gripk is similar—their GTPs are large enough for the next robot to come and take over. In parallel cases, k different GTP values may instead be required—robots doing grip1 need the largest GTP and robots doing gripk need the smallest GTP. (b) With the same number of sticks, the parallel collaboration essentially requires more robots working simultaneously. We know from the original experiment that specialists may arise if there are insufficient robots compared with sticks.

7For parallel cases, since the threshold is much higher, we verified this with 2 sticks, 4 to 9 robots, and k = 4.


[Figure: two panels plotting diversity (left) and specialization (right) against the number of robots (6–24), each with four curves: Sequential/Local, Sequential/Global, Parallel/Local, Parallel/Global.]

Figure 3: Diversity and specialization in the generalized experiments with k = 4.

that the deadlock threshold is a pessimistic estimate since the agents cannot have infinitely large GTPs, the "real" threshold should be smaller.

5 Conclusions

This paper presented our initial effort to measure specialization in collaborative swarm systems. Specialization is a mixed concept of both diversity and adaptation. We define specialization as the part of diversity that is induced by the need for performance improvement. Our experiments with the original and generalized stick-pulling experiments showed that specialization was more consistent and meaningful than diversity when properties related to performance and learning were under study. Our results validated some of our intuitions about specialization in these collaborative experiments but also revealed some properties that we at first did not see.

Our specialization measure depends heavily on the underlying dynamic process. Different learning algorithms might result in different specialization values even when the final learned systems are the same. Future work will make the specialization measure more independent of the choice of the learning algorithm, or more generally speaking, of the dynamic process in which diversity and swarm performance interact.

Acknowledgements

This work has been mainly supported by the Caltech Center for Neuromorphic Systems Engineering under the US NSF Cooperative Agreement EEC-9402726 and the Northrop Grumman Corporation Foundation. Alcherio Martinoli is currently sponsored by a Swiss NSF professorship.

References

Balch, T. 1998. Behavioral Diversity in Learning Robot Teams. Ph.D. thesis, Georgia Institute of Technology, Atlanta, GA.

Bonabeau, E., Dorigo, M. & Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York.

Hayes, A. T., Martinoli, A. & Goodman, R. M. 2003. Swarm robotic odor localization: Off-line optimization and validation with real robots. Robotica 21(4): 427–441.


Ijspeert, A. J., Martinoli, A., Billard, A. & Gambardella, L. M. 2001. Collaboration through the exploitation of local interactions in autonomous collective robotics: The stick pulling experiment. Autonomous Robots 11(2): 149–171.

Lerman, K., Galstyan, A., Martinoli, A. & Ijspeert, A. J. 2001. A macroscopic analytical model of collaboration in distributed robotic systems. Artificial Life 7(4): 375–393.

Li, L. 2002. Distributed learning in swarm systems: A case study. Master's thesis, California Institute of Technology, Pasadena, CA.

Li, L., Martinoli, A. & Abu-Mostafa, Y. S. 2002. Emergent specialization in swarm systems. Pages 261–266 in: Intelligent Data Engineering and Automated Learning — IDEAL 2002 (H. Yin et al., eds), vol. 2412 of Lecture Notes in Computer Science, Springer-Verlag.

Martinoli, A. & Easton, K. 2002. Modeling swarm robotic systems. Pages 297–306 in: Experimental Robotics VIII (B. Siciliano & P. Dario, eds), vol. 5 of Springer Tracts in Advanced Robotics, Springer-Verlag.

Mataric, M. J. 1998. Using communication to reduce locality in distributed multi-agent learning. Journal of Experimental and Theoretical Artificial Intelligence 10(3): 357–369.

Murciano, A., Millan, J. del R. & Zamora, J. 1997. Specialization in multi-agent systems through learning. Biological Cybernetics 76(5): 375–382.

Parker, L. E. & Touzet, C. 2000. Multi-robot learning in a cooperative observation task. Pages 391–401 in: Distributed Autonomous Robotic Systems 4 (L. E. Parker et al., eds), Springer-Verlag.

Versino, C. & Gambardella, L. M. 1997. Learning real team solutions. Pages 40–61 in: Distributed Artificial Intelligence Meets Machine Learning: Learning in Multi-Agent Environments (G. Weiß, ed.), vol. 1221 of Lecture Notes in Artificial Intelligence, Springer-Verlag.


Dynamic Polyethism in Social Insect Societies – A Simulation Study

Daniel Merkle and Martin Middendorf

Parallel Computing and Complex Systems Group, Institute of Computer Science, University of Leipzig, Augustusplatz 10-11, D-04109 Leipzig, Germany

E-mail: {merkle,middendorf}@informatik.uni-leipzig.de

Abstract

In this paper we study the dynamics of labor division in social insect societies through simulation studies with threshold reinforcement models. The effects of variable demands for work, age dependent thresholds, and finite life span of the individuals are discussed. Moreover, it is shown that adding a threshold dependent selection process between the individuals during task selection to the model can lead to the occurrence of specialists and differentiation between individuals as an emergent phenomenon that depends on the colony size.

Keywords: task division, threshold models, polyethism, simulation.

1 Introduction

Threshold models have been used successfully to study behavioral phenomena of social insect societies. Division of labor and spatial organization of work are prominent examples. Division of labor can be explained with stimulus-response threshold models where an individual has an internal threshold relative to the level of demand for a certain task. If the individual encounters a task with a stimulus that is higher than the threshold of the individual, it starts working on the task with high probability. Recently, threshold reinforcement models have been proposed which allow modeling learning and forgetting effects (see the interesting paper (Gautrais et al., 2002) for an overview). Learning/forgetting is modelled by decreasing/increasing threshold values.

In this paper we study the dynamics of labor division in an insect society through simulation studies. We first reexamine the model that was introduced in (Gautrais et al., 2002) and give a different explanation of why specialization occurs in this model. Then we extend the model and study the effects of differing demands for work and individuals that have a finite life span. Moreover, the influence of age dependent thresholds is considered. Finally we show that the introduction of a threshold dependent selection process between the individuals during task selection can lead to the occurrence of specialists (and differentiation for finite time or finite age models) between the individuals as an emergent phenomenon that depends on the colony size.

2 The Threshold Reinforcement Model

In this section we describe the threshold reinforcement model that has been introduced in (Gautrais et al., 2002). It is assumed that there are N individuals and m tasks T1, . . . , Tm. Each task Tj has an associated stimulus value Sj ≥ 0. Each individual i has an associated threshold value Θi,j for each task Tj such that 0 ≤ Θi,j ≤ Θmax_j for a given maximal value Θmax_j.

In each time step an individual is idle or actively engaged in exactly one task, for which it will do α units of work during the time step. An active individual becomes idle with probability p. When it becomes idle it is idle for at least one time step. An individual i that was idle starts to work on task Tj with probability (1/m) · Sj² / (Sj² + Θi,j²). Observe that the maximal amount of work that the colony can perform in one time step on average per task is Wmax = (N/m) · α/(1 + p).


In each time step the threshold values are changed for each task Tj and each individual i according to the following rules: i) if i is engaged in Tj then Θi,j = max{Θi,j − ξ, 0}; ii) otherwise, Θi,j = min{Θi,j + φ, Θmax_j}. ξ is the learning parameter and φ is the forgetting parameter.

In each time step the stimulus values are changed for each task Tj according to Sj = Sj + σj − Ej · α, where σj = D · Wmax, 0 < D ≤ 1 is the demand parameter, and Ej denotes the number of individuals that are currently engaged in task Tj. Note that parameter D allows modeling situations with different demands for work; e.g., for D = 1 the colony must work at full capacity in order to hold the stimulus values at the same level.
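The rules above can be collected into one simulation step. The following Python sketch is our own reading of the model; the variable names, the per-task tie-breaking order for idle individuals, and the small epsilon guarding the 0/0 case when both stimulus and threshold are zero are our assumptions, not from (Gautrais et al., 2002):

```python
import random

def step(thresholds, stimuli, engaged, params, rng=random):
    """One time step of the threshold reinforcement model (our sketch).

    thresholds[i][j] -- threshold of individual i for task j
    stimuli[j]       -- stimulus S_j of task j
    engaged[i]       -- task index individual i works on, or None if idle
    """
    N, m = len(thresholds), len(stimuli)
    p, alpha = params["p"], params["alpha"]
    xi, phi, D = params["xi"], params["phi"], params["D"]
    theta_max = params["theta_max"]
    w_max = (N / m) * alpha / (1 + p)  # maximal average work per task

    for i in range(N):
        if engaged[i] is not None:
            if rng.random() < p:       # an active individual becomes idle
                engaged[i] = None      # and stays idle at least this step
        else:
            for j in range(m):         # an idle individual may start task j
                s, th = stimuli[j], thresholds[i][j]
                if rng.random() < (1 / m) * s * s / (s * s + th * th + 1e-12):
                    engaged[i] = j
                    break

    for i in range(N):                 # learning (xi) and forgetting (phi)
        for j in range(m):
            if engaged[i] == j:
                thresholds[i][j] = max(thresholds[i][j] - xi, 0.0)
            else:
                thresholds[i][j] = min(thresholds[i][j] + phi, theta_max)

    for j in range(m):                 # stimulus dynamics: S_j += sigma_j - E_j * alpha
        workers = sum(1 for t in engaged if t == j)
        stimuli[j] += D * w_max - workers * alpha
```

Repeatedly calling `step` on mutable lists of thresholds, stimuli, and engagement states then simulates the colony.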

In order to study the specialization of an individual the following measure was introduced for a system with 2 tasks. A period of work of an individual is defined as the time from the start of working at some task until the next idle time. If the task an individual has worked on during a period of work differs from the task of the next period of work, this is called a transition. For some time period let Ci be the number of transitions divided by the total number of periods of work minus 1 for individual i. The degree of specialization Fi of individual i is measured by Fi = 1 − 2Ci. Observe that −1 ≤ Fi ≤ 1; Fi = 1 when an individual has not switched between tasks (high specialization), Fi = 0 when an individual has switched randomly between two tasks (no specialization), and Fi = −1 when an individual has switched alternately between the tasks. The activity of an individual over a time period is measured as the proportion of time steps it was working at a task.
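The measure can be sketched directly from this definition (the helper name is ours; the value returned for an individual with fewer than two periods of work is our convention, as the text does not specify it):

```python
def specialization(period_tasks):
    """Degree of specialization F_i = 1 - 2 * C_i, where C_i is the number
    of transitions divided by (number of periods of work - 1).

    period_tasks: the task worked on in each successive period of work.
    """
    if len(period_tasks) < 2:
        return 1.0  # assumption: no opportunity to switch counts as specialized
    transitions = sum(1 for a, b in zip(period_tasks, period_tasks[1:]) if a != b)
    c = transitions / (len(period_tasks) - 1)
    return 1.0 - 2.0 * c

assert specialization([1, 1, 1, 1]) == 1.0    # never switched: F_i = 1
assert specialization([1, 2, 1, 2]) == -1.0   # alternated every period: F_i = -1
```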

3 Reexamination of Colony Size Dependent Specialization

In (Gautrais et al., 2002) the degree of specialization (measured over all simulation steps) of individuals for a two task model was investigated for different colony sizes and different demands. It was shown that for medium (D = 0.5) or large (D = 0.8) demands specialists occur, but only for colonies that are not too small (N ≥ 20). It was argued that the magnitude and time scale of fluctuations of the difference between the stimulus values (S1 − S2) is the crucial factor, because in small colonies large absolute differences are sustained for longer periods than in large colonies. The interpretation was that a high S1 − S2 will break down any specialization that individuals may have for task 2 because they are more likely to tackle task 1 (and so learn it) while forgetting task 2 (and vice versa for high S2 − S1). In large colonies where the fluctuations were less sustained, the individuals have sufficient time working on the same task for the positive feedback of learning to take effect. The conclusion was that differentiation in activity levels and specialization only occurs when colony size exceeds some critical value.

In our simulation we obtained the same results but cannot agree with the given interpretation of the results. In the following we give a different explanation and show that the reason why there are no specialists in the small colonies has only to do with the specific colony size dependent situation during the first steps after the chosen initialization of the simulation model. Recall that for initialization the thresholds and the stimuli are set to zero (or nearly zero) and the maximal threshold values are presumably equal (Θmax := Θmax_1 = Θmax_2).

Given this initialization, during the first time steps in small colonies the stimuli will grow much slower than in large colonies because the value of σj, j ∈ [1 : 2], is proportional to colony size. The threshold values grow independently of the colony size. Hence, the chance to start working for an individual, and therefore the activity level, is much smaller when colony size is small. The effect is that for each individual in a small colony both threshold values will grow until they reach the maximum values Θmax_j. But such individuals are not specialists and all have a similar activity. The stimuli will continue to grow until the chances to start working for an idle individual become so high (even with threshold values that are maximal) that the demand for work can be satisfied by the colony. In later phases of the simulation an individual which has nearly maximal threshold values has nearly no chance to become a specialist (especially when the maximal thresholds are high).

In contrast, in a large colony the stimulus values grow fast during the first steps after initialization. The


[Figure: two panels plotting the distribution of specialization levels (−0.1 to 1.0) against colony size (5 to 2000).]

Figure 1: Specialization for different colony sizes N ∈ [5 : 2000] for D = 0.8, Tmax = 2000, with (right) and without (left) demand variation phase; dark colors indicate that a large fraction of individuals has the corresponding specialization level.

effect is that the activity of the individuals strongly increases during the first time steps. But when an individual has started to work early at, say, task T1, then the chances that it will work again at task T1 in the following period are high, because its threshold values for tasks T1 and T2 are quite different (the threshold for the task the individual is working on remains nearly zero and the other threshold increases each time step by φ).

Hence, we would not call the observed difference in specialization between small and large colonies an emergent phenomenon, because it is the consequence of the colony size dependent growth of stimulus values during the initial phase of the simulation model (for the chosen initial parameter values). We made several experiments in order to back our explanation but cannot describe all of them due to space limitations. One experiment was to explicitly reset the threshold values of some individuals in a small colony that had reached a stable state to Θ1 = Θmax and Θ2 = 0. The result was that these individuals became specialists (with respect to Fi) and remained specialists (over the observed simulation steps). Thus, specialization does not break down over the simulation time in the small colony.

In order to somehow remove the effect of the initialization phase we studied a system where the demand changed between longer periods of very high demand and very small demand after initialization. After this demand variation phase the system is no longer dependent on the initial conditions (more exactly, the values of the initial stimulus and threshold values). The results show that there are nearly no differences between small and large colonies in such systems with respect to activity levels or specialization.

Figure 1 shows the different behaviors of the system with and without demand variation phase for D = 0.8. When thresholds and stimuli are initialized (close) to zero, specialization occurs only for larger colony sizes. But after a demand variation phase nearly all individuals have a specialization very close to 0.0 no matter what the colony size is. Similar test runs showed that the specialization is essentially independent of the colony size. If not stated otherwise, we will apply an initialization phase to the test runs and the colony size will be N = 100.

Another aspect of the model we should mention and will discuss in the extended paper is that differentiation between individuals with respect to activity level or specialization is only a phenomenon that exists when measured over finite time periods. Thus, all observed differentiation between individuals depends fundamentally on the simulation time, because in the limit over infinite time all individuals in a colony have exactly the same degree of specialization and activity. This is different in the model where individuals have


[Figure: four panels plotting the distribution of specialization levels (−0.1 to 1.0) against demand D (0 to 1).]

Figure 2: Infinite life span: Specialization level for different demands; Tmax ∈ {20, 500, 1000, 2000} (from left to right); N = 100.

[Figure: four panels plotting the distribution of specialization levels (−0.1 to 1.0) against demand D (0 to 1).]

Figure 3: Finite life span amax = 10000: Specialization level for different demands; Tmax ∈ {20, 100, 500, 1000} (from left to right); N = 100.

a finite life span (see next section) and where differentiation can occur independently from simulation time (see Section 6).

4 Finite Life Span and Maximal Thresholds

In this section we introduce individuals with a finite life span to the model, where the colony size is fixed and a constant inflow of new individuals occurs. Since new individuals might be different from older individuals this can change the behavior of the model significantly.

For the simulation we assumed that each individual has a maximal age amax. When an individual has left the system because its life time ended, it is replaced by a new individual so that the colony size remains constant (initialization was done with individuals of random age). In the following we study a system that is in a stable state after a demand variation phase. The specialization for a system with individuals of finite life span is measured as the average specialization over all individuals that left the system.

We also study the influence of the size of the maximal threshold values. As has been argued, large maximal threshold values make it unlikely that an individual with threshold values that are nearly maximal will ever become a specialist. This might be different in a system with small maximal threshold values. It is not mentioned explicitly in (Gautrais et al., 2002) which maximal threshold values have been used for the simulations, but the results indicate values of about 1000. Here we also study systems with smaller such values, because for some natural systems small values are more realistic. A large threshold of size 1000 in


[Figure: left panel plots the two age dependent maximal thresholds ("max. threshold for task 1", "max. threshold for task 2") against age (0 to 1000); right panel plots, for each individual and both tasks, the difference to the expected number of worksteps (−250 to 250) against age (0 to 1000).]

Figure 4: Age dependent maximal thresholds T1max and T2max (left); difference between expected number of worksteps and performed number of worksteps for each individual and both tasks for D = 0.8, N = 500 (right); Tbase = 20, amax = 1000.

combination with learning parameter ξ = 4 (forgetting parameter φ = 3.5) means that an individual has to improve its skills by 250 learning steps (respectively ≈ 285 forgetting steps) until threshold 0 is reached when starting with threshold 1000 (respectively vice versa). Hence large maximal threshold values are realistic in situations where learning/forgetting is a slow process that leads over many different levels of skill. But for insects many learning processes are fast and improve only over a few steps until the individual reaches its final level of skill (analogously for forgetting processes). As an example, consider experiments that have been done to study learning of tactile patterns (Scheiner et al., 1999) and odor (Ben-Shahar et al., 2000) with honey bees. It was shown that the maximal response value (measured by the proboscis extension response, PER) was reached after only about 5-6 learning steps, and a pattern or odor had been forgotten after about 5 contacts with a different pattern or odor. This means that for ξ = 4 and φ = 3.5 the maximal threshold value should be about Θmax = 20 to model such cases.
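The step counts above follow directly from the constant learning/forgetting increments; a quick check (the helper name is ours):

```python
import math

def steps_to_traverse(theta_max, rate):
    """Number of constant-size threshold updates needed to cross [0, theta_max]."""
    return math.ceil(theta_max / rate)

assert steps_to_traverse(1000, 4) == 250     # learning steps for xi = 4
assert steps_to_traverse(1000, 3.5) == 286   # forgetting steps for phi = 3.5 (~285)
assert steps_to_traverse(20, 4) == 5         # theta_max = 20 matches ~5 PER trials
```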

We compared a system with infinite life span to a system with individuals of finite life span (see Figures 2 and 3 for results). We show only the results for colony size 100 (for other tested colony sizes from 5 to 2000 the results are nearly the same). For infinite life span and extreme demand values (D ≈ 0.0 or D ≈ 1.0) no specialization occurs. If D is very small (D ≈ 0.0), both threshold values of an individual usually grow to the maximal value. If D is very large (D ≈ 1.0), the work load is so high that both threshold values of the individuals stay close to zero. Specialization does not occur for very high demand because the high threshold value of a specialist for one task decreases its overall probability to work. For larger Tmax values and medium demand (e.g. Tmax = 1000, D = 0.8, N = 1000) some individuals specialize to some extent (because they have one threshold value near zero and the other is larger). But there are also individuals that have both threshold values high. This is different for smaller Tmax values. The reason is that here the individuals usually specialize to one task for some time, but as Tmax is small, the probability of switching and specializing in the other task for some time is high. This results in a medium level of specialization.

For a system with individuals that have a finite life span the results show that for small values of Tmax the specialization level is very similar to the equivalent system with individuals of infinite life span. But for large values of Tmax highly specialized individuals occur, as they decide in the first steps in which task they specialize. As Tmax is very large, they will not switch back again.


[Figure: four panels plotting the distribution of specialization levels (−0.1 to 1.0) against demand D (0 to 1).]

Figure 5: Age dependent thresholds: Specialization level for different demands; Tbase ∈ {20, 100, 500, 1000} (from left to right), N = 100, amax = 10000.

5 Age Dependent Maximal Thresholds

Age dependent thresholds are used to model natural systems which have an age dependent task division. Examples are honey bees, where young workers work mostly within the hive while older workers become foragers and work outside of the hive. Age dependent behavior is a complex phenomenon that is influenced by social but also by genetic factors (Ben-Shahar et al., 2002). We study here a system where the maximum threshold values are age dependent. It should be mentioned that some forms of emergent task succession can also be explained with non age dependent threshold models (e.g. Bonabeau et al., 1999). The age dependent maximal thresholds T1max and T2max for task T1, respectively T2, for an individual of age a are defined by (see also Figure 4)

T1max(a) = Tbase^(2a/amax) + Tbase,   T2max(a) = Tbase^(2 − 2a/amax) + Tbase,

where Tbase is the minimal maximal threshold value. We study a system with maximal age amax = 1000.

Figure 5 shows the distribution of specialization levels for different values of Tbase. A high degree of specialization occurs even for small demands. Young individuals have a high (resp. small) maximal threshold for task T1 (resp. task T2). Note that, in contrast to individuals without age dependent thresholds, the threshold for task T1 was initialized to the maximal value when the individual is born. Therefore it is very unlikely that an individual will work on task T1 when it is young. This can clearly be seen in the right part of Figure 4. For every individual in the colony, the difference between the number of time steps the individual has worked for a task and its expected number, assuming that there is no age dependent influence, is shown. The expected number E(a) of worksteps that an individual should have worked at a specific task at age a is defined as E(a) = (D · a)/(m · (1 + p)). Only for high demands can the individuals not specialize (to the different tasks at different ages), because the high demand cannot be satisfied by the colony (see Figure 4). This shows that the colony can react flexibly to high demands in that individuals of all ages work on both tasks.
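Our reading of the age dependent maximal thresholds and of the expected workstep count can be sketched as follows (function names are ours; the exponent form is taken literally from the formulas above):

```python
def age_dependent_max_thresholds(a, a_max, t_base):
    """Age dependent maximal thresholds for the two tasks at age a.

    The exponents interpolate linearly between 0 and 2 over a lifetime,
    so the two thresholds mirror each other around mid-life.
    """
    t1 = t_base ** (2 * a / a_max) + t_base
    t2 = t_base ** (2 - 2 * a / a_max) + t_base
    return t1, t2

def expected_worksteps(a, demand, m, p):
    """Expected worksteps E(a) = (D * a) / (m * (1 + p)) per task by age a,
    assuming no age dependent influence."""
    return demand * a / (m * (1 + p))

# Mirror symmetry: the two thresholds swap between birth and maximal age.
assert age_dependent_max_thresholds(0, 1000, 20) == tuple(
    reversed(age_dependent_max_thresholds(1000, 1000, 20)))
```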

6 Selection Pressure and Colony Size Dependent Polyethism

In this section we introduce a model that shows emergent colony size dependent polyethism and uses selection between individuals as an additional factor for task selection. In natural systems not all individuals that have decided to work on a task will actually be able to do so, because competition for the task will hinder some of the individuals. We assume that success in such a competition between individuals depends on the individual's threshold for the corresponding task.

In the model we assume that, of all individuals that have decided to work on a task in a time step, only a fraction of 1 − ρ individuals with the lowest thresholds for the corresponding task are successful, where parameter ρ ≥ 0 defines the selection pressure. The non successful individuals become idle for that time


[Figure: two panels plotting the distribution of specialization levels against colony size (10 to 1000).]

Figure 6: Influence of selection: Specialization level for different colony sizes; left: infinite age Tbase = 20, ρ = 0.9, D = 0.5; right: finite age amax = 500, Tbase = 100, ρ = 0.5, D = 0.8; N = 100.

step. Since on average fewer individuals will work than in the model without selection, we multiplied σ and φ by (1 − ρ).
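The selection step can be sketched as follows (our own reading; how the fraction 1 − ρ is rounded is not specified in the text, so the rounding rule below is an assumption):

```python
from collections import defaultdict

def apply_selection(decisions, thresholds, rho):
    """Filter task decisions under selection pressure rho.

    decisions:  list of (individual, task) pairs for one time step.
    thresholds: thresholds[i][j], threshold of individual i for task j.
    Of all individuals that decided to work on a task, only the fraction
    1 - rho with the lowest thresholds succeed; the rest become idle.
    """
    by_task = defaultdict(list)
    for i, j in decisions:
        by_task[j].append(i)
    winners = []
    for j, inds in by_task.items():
        inds.sort(key=lambda i: thresholds[i][j])
        n_win = int((1 - rho) * len(inds))  # assumption: round down
        winners.extend((i, j) for i in inds[:n_win])
    return winners

# Four candidates for task 0, rho = 0.5: the two lowest thresholds win.
th = [[5.0], [2.0], [9.0], [1.0]]
decisions = [(0, 0), (1, 0), (2, 0), (3, 0)]
assert sorted(i for i, _ in apply_selection(decisions, th, 0.5)) == [1, 3]
```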

A reason why no strong specialists occur in small colonies (for suitable parameter values) is the larger variance in the decisions of the individuals. It happens with high probability that individuals which start to specialize slightly for one task are selected to work for the other task (because individuals with relatively low thresholds accidentally might have decided not to try to work for this other task). Hence the emerging specialization breaks down. For large colonies it is more unlikely that an individual which has a relatively low threshold for one task due to random effects is selected to work for this task. But this means that small specialization that happens due to random effects is reinforced by the selection. For finite age systems differentiation between the individuals can be observed. See Figure 6 for results (more details in the extended version of the paper).

7 Discussion

In this paper we have simulated task division in the threshold reinforcement model of (Gautrais et al., 2002). We reexamined the emergence of colony size dependent specialization and have shown that the colony size dependent specialization in this model is due to a colony size dependent stimulus increase during the initialization phase, and that differentiation is a (simulation) time dependent phenomenon. We have proposed an extension of the model with a threshold dependent selection between individuals during task selection. It was observed for this model that the occurrence of specialists is an emergent phenomenon that depends on colony size. It will be interesting to study whether colony size dependent phenomena of task division in nature can be explained with the help of threshold dependent selection processes. To make our results more independent from unwanted initialization effects we introduced an initialization phase for the simulations with varying demands.

Moreover, we have studied the influence of individuals with a finite life span and age-dependent maximal threshold values. We argued that maximum threshold values that are small (compared to the learning and forgetting rates) could be interesting for modelling natural systems, because many learning/forgetting processes are fast and show only a few levels of skill. Such small maximal threshold values change the behavior of the system, because individuals can then switch between specializations for different tasks relatively easily.

— Page 105 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

References

Ben-Shahar, Y., Robichon, A., Sokolowski, M.B., and Robinson, G.E. 2002. Influences of Gene Action Across Different Time Scales on Behavior. Science 296: 741–744.

Ben-Shahar, Y., Thompson, C.K., Hartz, S.M., Smith, B.H., and Robinson, G.E. 2000. Differences in performance on a reversal learning test and division of labor in honey bee colonies. Animal Cognition 3: 119–125.

Bonabeau, E., Dorigo, M., and Theraulaz, G. 1999. Swarm Intelligence. Oxford University Press, NY.

Gautrais, J., Theraulaz, G., Deneubourg, J.-L., and Anderson, C. 2002. Emergent polyethism as a consequence of increased colony size in insect societies. Journal of Theoretical Biology 215: 362–373.

Scheiner, R., Erber, J., and Page Jr, R.E. 1999. Tactile learning and the individual evaluation of the reward in honey bees (Apis mellifera L.). Journal of Comparative Physiology A 185: 1–10.

Theraulaz, G., Bonabeau, E., and Deneubourg, J.-L. 1998. Response threshold reinforcement and division of labour in insect colonies. Proceedings of the Royal Society London B 265: 327–332.


Topology and Complexity of Formations

Abubakr Muhammad, Magnus Egerstedt

Georgia Institute of Technology, Atlanta, GA 30332, USA.
Corresponding author: [email protected]

Abstract

Biological multi-agent systems such as animal herds, insect colonies and fish schools provide a lot of insight into the study and design of artificial multi-agent systems such as teams of autonomous mobile robots. Similarly, a lot can be learned about biological systems by borrowing design and analysis tools from multi-agent robotics. This paper summarizes some recent work by the authors in the area of multi-agent robotics, which addresses basic issues in the modelling of formations with limited sensory and communication capabilities. The basic idea is to model spatial relationships between agents as connectivity graphs. An information-theoretic complexity measure of multi-agent formations, based on the complexity of connectivity graphs, is suggested. The complexity measure helps identify the graphs and formations of the highest and the lowest complexities.

Keywords: Cooperative Control, Multi-Agent Robotic Systems.

1 Introduction

The interest in the control and coordination of multi-agent robot teams has increased dramatically during the last few years (Egerstedt & Hu, 2001; Gazi & Passino, 2003; Jadbabaie, Lin & Morse, 2003; Klavins, 2002; Muhammad & Egerstedt, 2003b; Ogren, Fiorelli & Leonard, 2002; Saber & Murray, 2002). Some of the techniques developed for single agents interacting with both structured and unstructured environments, such as trajectory tracking, nonlinear control, mapping and localization, are readily applicable in the multi-agent case as well. However, a number of challenges, stemming from the distributed and hence local nature of the information available to the individual agents in a formation, have presented themselves. In looking for inspiration when trying to model such systems, roboticists have increasingly turned to naturally occurring systems, where distributed multi-agent systems are abundant. These systems range from human societies, where each agent is an extremely complex system in itself and the social behavior transcends simple mechanical tasks, to lifeless physical systems made up of agents like particles, atoms or molecules. The latter carry no intelligence themselves but interact through simple physical laws and, as a group, give rise to complex adaptive systems. Robotics can be characterized as finding its place somewhere between these two extremes. Needless to say, comprehending the complicated behavior of human societies may be the ultimate goal for a multi-agent system designer, but it is far too difficult and exists only in science fiction. The state of the art in multi-agent robotics instead tackles significantly more humble objectives, like terrain exploration, coordinated building and manipulation, planning of team formations, etc. (Axelsson, Muhammad & Egerstedt, 2003; Balch & Arkin, 1998; Beard, Lawton & Hadaegh, 2001; Fierro et al., 2001; Lawton, Beard & Young, 2000; Mataric, Nilsson & Simsarian, 1995; Reif & Wang, 1999; Tanner, Pappas & Kumar, 2002). On the other hand, inspiration from lifeless physical systems is inadequate, as the robotic platforms available today carry so much computing power, and so many sensors and communication capabilities, that they are capable of much more than the imitation of simple “inverse-square” laws.

Group behavior is manifested in various biological systems as an impressive result of organic evolution. Examples can be found among social insects, animal herds, bacterial colonies, schools of fish, formations of flying birds, and so on. These group behaviors are very similar to the ones exhibited in multi-agent robotics, for the following reasons:


1. Local Interactions: Individuals in animal groups interact only locally with their immediate neighbors, and in many cases there is no leader hierarchy. This model is similar to fully decentralized, artificial multi-agent systems made up of identical members with limited individual sensor ranges. The emergence of rich and complex global behavior from local interactions is of prime interest in multi-agent robotics.

2. Simple Individual Behavior: In robotics, individual agents exhibit a relatively small number of simple interactions, which give rise to complex group behaviors (Reynolds, 1987). This model is relevant when studying animal groups in which the individuals interact in a small number of simple ways.

3. Communications: Communication is an essential part of coordination in animal groups, and the physical methods of exchanging information have a rich variety. Insects, for example, communicate for alarm and assembly, recruitment, recognition, signalling the presence of food, grooming and a host of other activities (Wilson, 1971; Bonabeau et al., 1997). Individual robots can also be equipped with communication channels for coordination. The issues concerning suitable information exchange for coordination are still open and active research problems.

4. Group Size, Complexity, and Randomness: In insect societies, it has been observed that individual behavior is influenced by colony size (Anderson & Ratnieks, 1999; Wilson, 1971). In larger societies, individual workers tend to make their decisions by collecting advice from their neighbors and performing some kind of averaging or voting. In smaller colonies, a worker relies more on its own judgement than on information exchanged with other workers. This explains why an insect’s actions may look erratic and seemingly random at the individual level yet give rise to order at the global level, and why bigger colonies seem more ordered. These issues are related to design criteria in multi-agent systems: how small should the tracking errors of individual control laws be, what should the resolution of the sensors be, and how are these individual design factors related to the team size?

It can therefore be concluded that there is a remarkable similarity between the group behaviors found in biological systems and the ones roboticists want their artificial systems to exhibit. Similarly, a lot can be learnt from abstract artificial robotic systems when modelling behaviors of animal groups. In this paper we present graph-theoretic models that are helpful for studying the complexity and topology of formations of robots that interact locally with their neighbors. From the discussion above, it can be expected that these models might prove useful for the study of animal groups and social insects, as well as for understanding the coordination of multiple mobile robots.

In many multi-agent systems, such as insect societies, animal herds, and robot teams, the individual agents can collect information about their environment and other agents either by peer-to-peer communication or by relying on sensory information. Since any physical sensor is limited by its range and resolution, or by calibration errors, the information available to each agent by direct observation (or state estimation) is always limited and uncertain. Sensory limitations may also arise from the directivity patterns of sensors, e.g. the conic field of view of an eye or a camera, or the radiation patterns of the wireless antennas, sonars and lasers on robots.

Similarly, if we let the agents share information using peer-to-peer communication strategies, the possibility of conveying and using global information is limited by bandwidth constraints, weaker reception at large spatial distances, or the absence of feasible communication channels. This problem worsens as the formation grows, both in cardinality and in spatial dimension. Hence no individual agent can be assumed to have complete knowledge of the states of every other agent. This limitation leads directly to the question of how the local interactions should be represented. An obvious choice is to let the existence of such interactions be represented by edges in graph-based models (Muhammad & Egerstedt, 2003a; Jadbabaie, Lin & Morse, 2003; Saber & Murray, 2003; Tanner, Pappas & Kumar, 2002).

A natural way to model the limitations of interaction among agents is to define their regions of influence. A region is defined according to the sensory range of an agent, or the maximum distance over which it can communicate with other agents. This makes perfect sense for robotics applications, but it also holds for biological multi-agent systems such as social insects and fish schools, in which agents interact only with their neighboring agents.

Figure 1: Agents and their region of influence

It is therefore interesting to study the class of graphs that arise from limited regions of influence. In recent work by Muhammad & Egerstedt (2003b), the case was investigated in which all agents live in a two-dimensional Euclidean space and have identical circular regions of influence of radius δ centered at their positions. The situation is illustrated in Figure 1, where ant 1 cannot interact with ants 5 and 6, because they are outside its region of influence, but it can interact with ants 2, 3, and 4. A graph can now be constructed in which nodes correspond to agents and an edge exists between two agents if one lies in the other’s region of influence. These graphs have been named connectivity graphs by the authors (Muhammad & Egerstedt, 2003b). The space of all connectivity graphs on N agents is denoted GN,δ ⊆ GN, where GN is the space of all possible graphs on N vertices. It can immediately be seen that:

• The connectivity graphs are simple by construction, i.e. there are no loops or parallel edges.

• They are undirected because all agents have the same radius for their regions of influence.

• The motion of individual agents in the formation may result in the removal or addition of edges in the connectivity graph; the graph is therefore a dynamic structure.

• Not every graph is a connectivity graph.

An arbitrary graph exists as a connectivity graph if it has a valid realization in the configuration space of the agents; many realizations can correspond to the same graph. Although this can be stated more rigorously (Muhammad & Egerstedt, 2003b), the basic idea is straightforward, and there are many interesting examples of realizable and non-realizable graphs. If a graph is completely disconnected, any two agents in the formation must be separated by more than the distance δ. This can easily be achieved by placing each agent so that it lies outside the regions of influence of all other agents; therefore all completely disconnected graphs are realizable as connectivity graphs. If a graph has several disjoint connected components, each connected component can be placed “far away” from all the others, so that no agent in one component lies in the region of influence of an agent in another component. By this construction, a realization of the graph can be obtained if and only if all components are realizable individually. Similarly, complete graphs, in which an edge exists between every pair of nodes, can easily be produced by placing the agents very close to each other. The study of realizability of graphs can therefore be confined to connected graphs. One can now ask: when do graphs not exist as connectivity graphs? Muhammad & Egerstedt (2003b) proved the following theorem.
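The construction of a connectivity graph from agent positions is simple enough to sketch directly. The following Python fragment is an illustration only: the positions, the value of δ, and the 0-based agent indices are assumptions, not taken from the paper.

```python
import itertools
import math

def connectivity_graph(positions, delta):
    """Edge set of the connectivity graph: an undirected edge joins agents
    i and j whenever their Euclidean distance is at most delta, i.e. each
    lies inside the other's circular region of influence."""
    edges = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= delta:
            edges.add((i, j))
    return edges

# Hypothetical layout: agents 0-3 cluster together, while agents 4 and 5
# sit outside everyone's region of influence (delta = 1).
pos = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(sorted(connectivity_graph(pos, delta=1.0)))
# -> [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Because edges are stored as unordered index pairs, the resulting graph is automatically simple and undirected, matching the first two properties in the list above.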


Theorem 1.1 The space of connectivity graphs over N agents, GN,δ, is a proper subset of the space of all possible graphs over N vertices, GN, if and only if N ≥ 5.

The proof involves giving examples of graphs that cannot be realized for N ≥ 5. Examples of non-realizable graphs on 5 and 6 vertices are shown in Figure 2; that these graphs are not realizable can be seen from geometrical arguments. In fact, “star” graphs of the type shown for N = 6 are non-realizable for every N ≥ 6, which completes the proof. This theorem shows that not all graphs are valid models for multi-agent formations.

Figure 2: Graphs that are not connectivity graphs.

The importance of this characterization of the space of connectivity graphs can also be understood as follows. From the discussion above, totally disconnected graphs and totally connected (complete) graphs are trivially realizable. However, these two extremes are not very interesting from a behavioral point of view. The disconnected case corresponds to the situation where there is no interaction between agents. In the completely connected case, the agents are packed so tightly that the system becomes fully coupled and global information is available to all agents; the central theme of multi-agent coordination, i.e. global behavior from local rules, therefore becomes irrelevant. Hence the situation of perhaps the greatest interest lies between the two extremes, when the graph is not necessarily complete, or even connected, and when no strictly proper subset of the graph’s vertices is isolated from the rest. It is precisely this class of graphs, and their respective realizations, that gives rise to the rich variety of global behaviors from simple local rules.

Several results about connectivity graphs have been proven by the authors. In other work (Muhammad & Egerstedt, 2003a), it was shown how to obtain subgraphs of connectivity graphs that resemble simplicial complexes, which are used in algebraic topology to distinguish between different “shapes.” The most interesting results from the point of view of biological multi-agent systems, however, describe the complexity of multi-agent systems in terms of the complexity of their connectivity graphs. The results of this study have motivated algorithms that produce low-complexity multi-robot formations called δ-chains, which need only a small number of interactions to maintain formation. The work on complexity of multi-agent systems is summarized below, followed by some observations on its relevance to biological multi-agent systems.

2 Complexity of Multi-agent Systems

It has recently been shown by Muhammad & Egerstedt (2003c) that the graphs called δ-chains have the lowest complexity among all multi-agent formations. A δ-chain is a connected graph that is also a Hamiltonian path on all nodes; see Figure 3.

Figure 3: δ-chain and complete graph for 7 vertices.

If Xj represents the state associated with agent j, 1 ≤ j ≤ N, the intrinsic structural complexity of the multi-agent formation is defined as:

C(F) = ∑j ∑i≠j Fi,j(Xj),    (1)

where each Fi,j is the information flow at agent j due to agent i according to some given communication protocol. The information flow at an agent is the time rate of information exchange taking place at that agent due to sensory perception, communication, or both. It was also shown that the two modes of information exchange are equivalent from an information-theoretic point of view. Moreover, the presence of protocols implies that not every interaction is active during a given time period. The intrinsic complexity is therefore bounded above by a quantity that assumes all interactions are active at all times; this bound is in fact the complexity associated with a broadcast protocol.

If ∆t is the minimum permissible time for information exchange in the system (due to bandwidth, sensor update interval, or algorithm execution cycle), then protocols of synchronous information exchange, which are more complicated than the broadcast protocol, result in a decrease of the total information flow. Let us denote a formation by F = (X1, X2, . . . , XN) and the complexity of a formation under the broadcast protocol by CB(F); then

CB(F) ≥ CP (F),

where CP(F) is the complexity for some arbitrary protocol. CB(F) is therefore the worst-case complexity associated with a particular formation. The information flow of a remote state Xj at agent i, according to this protocol, is

Fi,j(Xj) = I(Xj; Zj,i) / (kij ∆t),    (2)

where Zj,i is the measurement of the sensor on board agent j (or of the equivalent virtual sensor for the communication channel between agents i and j), I(Xj; Zj,i) is the information obtained about Xj from Zj,i (Cover & Thomas, 1991), and kij is an integral multiple of ∆t such that kij∆t is the sensor update time.

We also showed that the complexity CB(F) is bounded above as

CB(F) ≤ ∑j ∑i≠j deg(vj) I(Xj; Zj,i) / (kij ∆t),

where deg(vi) is the number of agents being sensed (or communicated with directly) by agent i. Furthermore, if the states exchanged by all agents are of the same type and encoded in the same way, so that I(Xj; Zj,i) = γ, then we can write

CB(F) ≤ (γ/∆t) ∑i [ deg(vi) + ∑vj∉star(vi) deg(vj)/kij ].

Figure 4: δ-chains and V-formations in bird flight. (Copied with permission from A. Filippone, UMIST, UK).

Compare this to the complexity defined on a graph G = (V, E) in the context of molecular chemistry (Randic & Plavsic, 2002):

C(G) = ∑vi∈V [ deg(vi) + ∑vj∈V, vj≠vi deg(vj)/d(vi, vj) ],    (3)

where d : V × V → R+ is some distance function defined between vertices. Therefore it is easy to see that

CB(F) ≤ (γ/∆t) C(G),

where G is the connectivity graph of the formation. This relationship leads to the following observation: the complexity of the connectivity graph of a formation is a (tight) upper bound for the worst-case complexity associated with an arbitrary communication protocol in a multi-agent formation. The study of the structural complexity of multi-agent formations is therefore closely related to the complexity of their connectivity graphs. With these considerations in mind, the following theorem was proved by Muhammad & Egerstedt (2003c).

Theorem 2.1 If G is a connected connectivity graph on N vertices, then the complexity of the graph G is bounded above and below as

C(δN) ≤ C(G) ≤ C(KN),    (4)

where δN is the δ-chain and KN the complete graph on N vertices.
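The ordering in Theorem 2.1 is easy to check numerically for a concrete case. The sketch below is an illustration, not the authors' code; it evaluates the chemical-graph-style complexity of Equation 3 for the δ-chain and the complete graph on 7 vertices, assuming shortest-path hop count as the distance function d.

```python
from collections import deque

def hop_distances(n, edges, source):
    """Breadth-first search: hop-count distance from `source` to every vertex."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def complexity(n, edges):
    """Complexity in the style of Eq. (3): for each vertex v_i, add deg(v_i)
    plus deg(v_j)/d(v_i, v_j) over all other vertices v_j (connected graphs)."""
    deg = {v: 0 for v in range(n)}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    total = 0.0
    for i in range(n):
        dist = hop_distances(n, edges, i)
        total += deg[i] + sum(deg[j] / dist[j] for j in range(n) if j != i)
    return total

N = 7
chain = [(i, i + 1) for i in range(N - 1)]                      # delta-chain
complete = [(i, j) for i in range(N) for j in range(i + 1, N)]  # K_N
assert complexity(N, chain) <= complexity(N, complete)  # Theorem 2.1 ordering
```

For the complete graph every vertex has degree N − 1 and all distances are 1, so C(K7) = 7 · (6 + 36) = 294, while the chain's value is far smaller, consistent with δ-chains being the low-complexity extreme.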

This theorem justifies studying δ-chains as low-complexity formations. The δ-chains are interesting objects in the context of biological multi-agent systems: they are examples of formations that can be maintained with minimum coordination. One of the most interesting manifestations of these chains can perhaps be seen in the formation flight of birds, especially the V-formations; see Figure 4. Although recent studies relate this type of formation flight to energy conservation (Weimerskirch et al., 2001), the aspect of minimum coordination is hard to overlook in this case. In other naturally occurring multi-agent systems, δ-chains can be observed in queues, lines, caravans and flanks, which can be maintained with minimum interaction between agents. This gives additional leverage to the complexity definition of Equation 1, which has also been shown to have a remarkable similarity to complexity measures for chemical graphs (Muhammad & Egerstedt, 2003c). These complexity measures may therefore be useful in comparing formations of various natural and artificial multi-agent systems.

3 Conclusions

The introduction of connectivity graphs for characterizing the local interactions in multi-agent formations serves two purposes. First, since these interactions imply constraints on the movements of the individual agents, it is vitally important that the set of feasible formations can be characterized in a precise manner; this set has been described as the space of all connectivity graphs for a fixed number of agents. Secondly, and perhaps more importantly, these graphs provide guidance as to how information should flow between different agents in order for the team to construct plans for achieving global objectives in a decentralized manner. These graph-theoretic models therefore help us study important aspects of the topology, complexity and coordination of multi-agent systems. This abstraction makes it possible to compare and relate behaviors in natural and artificial multi-agent systems, and may prove useful in strengthening this connection, in addition to its original objective of advancing techniques for the design and implementation of autonomous robot teams.

Acknowledgments

This work was sponsored by the NSF through the programs EHS NSF-01-161 (grant # 0207411) and the ECS NSF-CAREER award (grant # 0237971).

References

Anderson C. and F. Ratnieks. 1999. Task partitioning in insect societies (I): Effect of colony size on queueing delay and colony ergonomic efficiency. American Naturalist 154(5): 521–535.

Axelsson H., A. Muhammad and M. Egerstedt. 2003. Autonomous Formation Switching for Multiple, Mobile Robots. IFAC Conference on Analysis and Design of Hybrid Systems, Saint-Malo, Brittany, France.

Balch T. and R.C. Arkin. 1998. Behavior-based formation control for multirobot teams. IEEE Transactions on Robotics and Automation 14: 926–939.

Bonabeau E., G. Theraulaz, J.-L. Deneubourg, S. Aron and S. Camazine. 1997. Self-organization in social insects. Trends in Ecology and Evolution 12: 188–193.

Beard R.W., J. Lawton and F.Y. Hadaegh. 2001. A coordination architecture for spacecraft formation control. IEEE Transactions on Control Systems Technology 9: 777–790.

Cover T.M. and J.A. Thomas. 1991. Elements of Information Theory. Wiley Series in Telecommunications.

Detrain C., J.-L. Deneubourg and J.M. Pasteels. 1999. Information Processing in Social Insects. Birkhauser.

Egerstedt M. and X. Hu. 2001. Formation constrained multi-agent control. IEEE Transactions on Robotics and Automation 17: 947–951.

Egerstedt M., A. Muhammad and X. Hu. 2002. Formation control under limited sensory range constraints. 10th Mediterranean Conference on Control and Automation, Lisbon, Portugal.

Fierro R., A. Das, V. Kumar and J. Ostrowski. 2001. Hybrid control of formations of robots. ICRA 1: 157–162.


Gazi V. and K. Passino. 2003. Stability Analysis of Social Foraging Swarms. IEEE Transactions on Systems, Man, and Cybernetics: in press.

Jadbabaie A., J. Lin and A. Morse. 2003. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control 48(6): 988–1001.

Klavins E. 2002. Communication Complexity of Multi-Robot Systems. Fifth International Workshop on the Algorithmic Foundations of Robotics, Nice, France.

Lawton J., R. Beard and B. Young. 2000. A decentralized approach to elementary formation maneuvers. Proceedings of the IEEE International Conference on Robotics and Automation 3: 2728–2733.

Mataric M., M. Nilsson and K. Simsarian. 1995. Cooperative multi-robot box-pushing. Proceedings of IROS, Pittsburgh, PA, pp. 556–561.

Muhammad A. and M. Egerstedt. 2003a. On the Structure of Connectivity Graphs of Robot Formations. Technical Report, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA.

Muhammad A. and M. Egerstedt. 2003b. Decentralized Coordination with Local Interactions: Some New Directions. Workshop on Cooperative Control, Block Island, RI. (Submitted)

Muhammad A. and M. Egerstedt. 2003c. On the Structural Complexity of Multi-Agent Robot Formations. 2004 American Control Conference, Boston, Massachusetts, USA. (Submitted)

Ogren P., E. Fiorelli and N. Leonard. 2002. Formations with a mission: stable coordination of vehicle group maneuvers. Proc. 15th International Symposium on Mathematical Theory of Networks and Systems.

Randic M. and D. Plavsic. 2002. On the Concept of Molecular Complexity. Croatica Chemica Acta 75: 107–116.

Reynolds C. 1987. Flocks, Herds, and Schools: A Distributed Behavioral Model. Computer Graphics 21(4): 25–34.

Reif J. and H. Wang. 1999. Social Potential Fields: A Distributed Behavioral Control for Autonomous Robots. Robotics and Autonomous Systems 27(3).

Saber R. and R. Murray. 2002. Distributed cooperative control of multiple vehicle formations using structural potential functions. IFAC World Congress, Barcelona, Spain.

Saber R. and R. Murray. 2003. Agreement Problems in Networks with Directed Graphs and Switching Topology. IEEE Conference on Decision and Control 2003.

Tanner H., G. Pappas and V. Kumar. 2002. Input-to-state stability on formation graphs. Proceedings of the 41st IEEE Conference on Decision and Control 3: 2439–2444.

Weimerskirch H., J. Martin, Y. Clerquin, P. Alexandre and S. Jiraskova. 2001. Energy saving in flight formation. Nature 413: 697–698.

Wilson E. 1971. The Insect Societies. Harvard University Press, Cambridge, Massachusetts.


On Honey Bees and Dynamic Allocation in an Internet Server Colony

Sunil Nakrani1 and Craig Tovey2

1. Computing Laboratory, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK.
Corresponding author: [email protected]

2. School of Industrial and Systems Engineering, Georgia Tech, Atlanta, GA 30332, USA.

Abstract

Internet server colonies co-host web services on servers that customers lease on a pay-per-use Service Level Agreement (SLA). Given multiple co-hosted web services, the problem a server colony encounters is to allocate the available servers so as to maximise revenue. The limited number of servers, the cost of migrating a server from one service to another, and the unpredictability of future HTTP request loads pose a significant challenge for optimising server allocation.

Based on the many similarities between server allocation and nectar collection in honey bee colonies, we propose a new honey bee algorithm founded on the self-organisation by which honey bee colonies allocate foragers among food sources. This decentralised algorithm dynamically allocates servers to satisfy unpredictable HTTP request loads. We compare it against an omniscient algorithm that computes an optimal allocation policy, a greedy algorithm that uses past history as a guide to the future, and an optimal-static algorithm that computes omnisciently the best among all possible static allocation policies. We use HTTP trace data from a commercial service provider and synthetically generated HTTP requests to evaluate the performance of our algorithm.

The experimental results show that our algorithm performs better than the static or greedy algorithms, indeed very close to the optimal omniscient level, for highly variable request loads. In contrast, it is outperformed by greedy for some low-variability access patterns. This suggests that real honey bee colony forager allocation, which is suboptimal for static food sources, possesses a counterbalancing responsiveness to food source variability.

Keywords: web-servers, honey bee, foraging, self-organisation, Internet infrastructure, SLA, server farm,service performance management.

1 Introduction

The web landscape offers myriad Internet services such as online stock trading, banking, ticket reservations, auctions and so on. Consequently, a rapidly growing fraction of human civilisation relies on the Internet computing infrastructure for day-to-day operations. The difficulty in provisioning server resources for such services stems from the fact that user behaviour is difficult to predict and unconstrained demand can create sudden surges in load (Chase et al., 2001).

An emerging market trend in Internet computing is the proliferation of large data centres, based on common hardware platforms, that co-host third-party web content on servers for a fee (IBM, 2003; Verio, 2003). This managed hosting model benefits from economies of scale and presents an opportunity for dynamic capacity provisioning, whilst shielding web-content owners from capital overhead and unpredictable demand. Given that the hosting fee may be negotiated on the basis of requests served, the hosting centre’s objective is to dynamically allocate its finite servers to the co-hosted services, taking switching costs into account, such that revenue is maximised.
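To make this optimisation target concrete, the toy sketch below evaluates the revenue of a candidate allocation policy over discrete time slots. Every name and number in it (the per-request fee, per-server service rate, and per-migration switching cost) is a hypothetical placeholder, not a value or model taken from the paper.

```python
def revenue(alloc, demand, fee=1.0, per_server_rate=100, switch_cost=50.0):
    """Toy revenue of an allocation policy: in each time slot a service earns
    `fee` per request served (a server handles at most `per_server_rate`
    requests per slot), and every server migrated between services across
    consecutive slots costs `switch_cost`. All parameters are illustrative.

    alloc[t][s]  -- servers given to service s in slot t
    demand[t][s] -- HTTP requests arriving for service s in slot t
    """
    total = 0.0
    for t, (a, d) in enumerate(zip(alloc, demand)):
        total += sum(fee * min(n * per_server_rate, q) for n, q in zip(a, d))
        if t > 0:  # migration cost: servers removed from a service vs. previous slot
            moved = sum(max(p - n, 0) for p, n in zip(alloc[t - 1], a))
            total -= switch_cost * moved
    return total

# Two services, three slots, five servers in every slot; demand shifts to
# service 1 in the last slot, so two servers are migrated there.
alloc = [[3, 2], [3, 2], [1, 4]]
demand = [[250, 200], [300, 150], [80, 420]]
print(revenue(alloc, demand))
# -> 1280.0
```

Counting only servers removed from a service avoids double-counting a single migration; any allocation algorithm (static, greedy, or adaptive) can be compared by the `alloc` sequence it would feed to such an evaluator.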


In this paper, we model the dynamic server allocation problem and propose a biologically inspired approach to this optimisation problem in a managed Internet server colony. Specifically, our work has been inspired by the study of honey bee colonies and the behaviour of forager bees, characterised by decentralised and elementary interactions that effect a complex collective behaviour solving the problem of adequate food collection to ensure the colony’s survival. The new honey bee algorithm models servers and HTTP request queues in an Internet server colony as, respectively, foraging bees and flower patches. We perform experimental work on a simulated web-content hosting centre using trace data from a commercial service provider and trace data derived from a simulated web server. Our results indicate that the honey bee algorithm adapts well to highly variable request loads.

The remainder of the paper is organised as follows. In section 2, we describe how honey bee colonies deploy forager bees to collect nectar amongst diverse flower patches. The server colony model and the mapping of server allocation to honey bee forager deployment are given in section 3. The server allocation algorithm based on honey bee foraging behaviour is described in section 4, along with the greedy, omniscient and optimal-static algorithms to which it is compared. In section 5, we describe the simulation model used in the experiments and compare the performance of the honey bee algorithm against the other algorithms. Finally, in section 6, we speculate on how our results may point to a survival advantage conferred by the forager allocation mechanism in real honey bee colonies.

2 Swarm Intelligence – Honey Bee Colony

Colonies of social insects (bees, ants, wasps, termites) possess what has been classed as swarm intelligence. A broad definition of the term implies a sophisticated collective behaviour, borne out of primitive interactions amongst members of the group, that solves problems beyond the capability of individual members. Such colonies are characterised by (i) self-organisation: decentralised and unsupervised coordination of activities; (ii) adaptiveness: response to a dynamically varying environment; and (iii) robustness: accomplishing the group's objective even if some members of the group are unsuccessful. These group-level adaptive properties, it has been suggested, lend themselves well to distributed optimisation problems in telecommunications, manufacturing, transportation and so on (Bonabeau et al., 1999; Bonabeau and Meyer, 2001; Cicirello and Smith, 2001).

A model of the self-organisation that takes place within a colony of honey bees is presented in (Seeley, 1995). Specifically, it describes interactions between members of the colony and the environment that lead to a dynamic distribution of foragers, efficiently collecting nectar from an array of flower patches (food sources) that are capricious in terms of profitability to the colony. In brief, foraging bees visiting flower patches return to the hive not only with nectar but also with a profitability rating of their respective patches. At the hive, forager bees interact with receiver bees to offload collected nectar, which also provides feedback on the current status of nectar flow into the hive. This feedback mechanism sets a response threshold for an enlisting signal. An amalgamation of the response threshold and the profitability rating (a function of nectar quality, nectar bounty and distance from the hive) establishes the length of the enlisting signal, known as the waggle dance.

The waggle dance is performed on the dance floor, where inactive foragers can observe and follow. Effectively, each active forager bee provides feedback on her local flower patch, whilst observing bees have access to the set of attractive food sources being capitalised on by the colony. However, individual foragers do not acquire the full set of global knowledge; rather, each randomly selects a dance to observe, from which she can learn the location of the flower patch and leave the hive to forage.

The resulting self-organised proportionate allocation pattern, derived from multiple and proportionate feedback on the goodness of food sources, is described in (Seeley et al., 1991) and validated by experimental study on real honey bee colonies. The model given by (Bartholdi et al., 1993) predicts a steady-state pattern of forager allocation in which the rate of value accumulation equalises among the patches being exploited. Intriguingly, this pattern is not optimal (unlike the famed hexagonal comb structure). Note that this sub-optimality is with respect to a static problem.


3 An Internet Server Colony

There is a trend toward a web computational model in which a single hosting provider manages content delivery of myriad Internet services to HTTP requests from a global audience (IBM, 2003; Verio, 2003). Under this paradigm, content owners purchase resources on a pay-per-use Service Level Agreement (SLA) and the hosting provider shares resources among its clients. The model benefits from economies of scale due to resource sharing, and insulates its clients from maintenance and over-provisioning costs. Pay-per-request-served is a common way in which a pay-per-use SLA is negotiated between hosting provider and content owner. Such an SLA implicitly acknowledges that limited resources, migration costs and uncertainty may lead to overloading and lost requests (Jayram et al., 2001). Consequently, the hosting provider has an incentive to maximise the total value of served requests.

The prevailing architecture of choice for a server colony is an ensemble of commodity servers partitioned into clusters (virtual servers) and configured to host web services (Brewer, 2000). Incoming HTTP requests for a given service are held on a service queue and spread by a load-balancing switch to any server that belongs to the cluster hosting that service. Any server in the cluster can respond to requests for the service, and a server from one cluster can be reallocated to another cluster by reconfiguration, thereby providing scalability and fault tolerance (Fox et al., 1997; Chase et al., 2001). During migration from one cluster to another, a server becomes unavailable owing to the time involved in scrubbing the existing web application and installing the new one (Appleby et al., 2001). As far as the user audience of any service is concerned, the server colony appears as a single virtual server.

In our model, we map the Internet server allocation problem to honey bee forager allocation as follows. We consider the respective service queues as available forage sites, i.e. flower patches. The dynamic and unpredictable nature of incoming HTTP request arrival patterns, which is a function of user behaviour, is equivalent to flower patch volatility, which is a function of daily climatic and other changes. A unit of allocation in the Internet server colony, i.e. a single server, equates to a single forager bee. Therefore, a given virtual server engaged in serving a specific HTTP request queue can be thought of as a group of forager bees engaged in collecting nectar from a specific foraging site. Finally, we think of the ensemble of servers as a colony of forager bees, and the service queues relating to co-hosted web services as sources of nectar to be exploited profitably. Different services have different time costs and values per request served, just as different flower patches have different forager round-trip times and nectar qualities. We believe that the principal characteristics of the Internet server problem match very closely those of forager allocation in a colony of honey bees. For example, in addition to the correspondences mentioned, the migration cost mimics the changeover cost when a bee switches from one patch to another (Seeley et al., 1991). Moreover, the self-organising property is well suited to managing resources in a large server colony, where traditional administrative methods can be cumbersome and prohibitive.

4 Server Allocation Algorithms

This section details algorithms to allocate servers to competing co-hosted services. The basic challenge is to determine the server demand of each service based on its current HTTP request load. We assume there are M groups comprised of n servers, called virtual servers VS_0, ..., VS_{M-1}, and service queues Q_0, ..., Q_{M-1}. A server s_i ∈ VS_j serving queue Q_j is paid c_j cents per request served.

4.1 Honey Bee

We present an allocation algorithm for an Internet server colony inspired by forager allocation in the honey bee colony. Let any server in the server colony be either a forager or a scout server. Also, let the dance floor be represented by an advertboard and the waggle dance by an advert.


Table 1: Lookup table for adjusting the probability of reading the advertboard.

    Profit rate                                P[Reading] r_i
    P_i ≤ 0.5 P_colony                         0.60
    0.5 P_colony < P_i ≤ 0.85 P_colony         0.20
    0.85 P_colony < P_i ≤ 1.15 P_colony        0.02
    1.15 P_colony < P_i                        0.00

A server s_i ∈ VS_j, on completion of each request from Q_j, will attempt with probability p to post an advert on the advertboard with duration D = c_j A, where A denotes an advert scaling factor. Also, it will attempt with probability r_i to read a randomly selected advert from the advertboard if it is a forager, or to randomly select a VS_j, j ∈ {0, ..., M-1}, if it is a scout. The probability r_i is dynamic and changes as a function of the forager/scout server's own profit rate and the server colony's overall profit rate. The profit rate for a server s_i is given by

    P_i = c_j R_i / T_i,

where R_i is the total number of requests served by the server in the time interval T_i. The server colony's overall profit rate is given by

    P_colony = (1 / T_colony) \sum_{j=0}^{M-1} c_j R_j,

where R_j denotes the total number of requests served by VS_j in the time interval T_colony. A server s_i ∈ VS_j serving queue Q_j determines its profitability by comparing its profit rate P_i with P_colony, and adjusts r_i according to lookup table 1. In essence, a server is more likely to read an advert if it is serving a queue that is low in profitability. We point out that this is effectively the same as in real honey bee colonies, given that dancing foragers never get recruited to another patch and foragers use global information (the hive's nectar intake rate, the time to find a receiver bee) to decide whether or not to dance. Thus, bees at a profitable patch decrease their chance of following another dance (advert).
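The per-server decision step described above can be sketched as follows. Only the Table 1 thresholds, the profit-rate formula and the advert duration D = c_j A come from the paper; the dictionary-based server representation and function names are our own illustrative scaffolding, not the authors' implementation.

```python
import random

def read_probability(p_i, p_colony):
    """Map a server's profit rate P_i, relative to the colony rate P_colony,
    to its probability r_i of reading a randomly chosen advert (Table 1)."""
    if p_i <= 0.5 * p_colony:
        return 0.60
    if p_i <= 0.85 * p_colony:
        return 0.20
    if p_i <= 1.15 * p_colony:
        return 0.02
    return 0.00

def profit_rate(c_j, served, interval):
    """P_i = c_j * R_i / T_i: revenue rate of one server over interval T_i."""
    return c_j * served / interval

def after_request(server, advertboard, p, A, p_colony, num_vs):
    """One decision step, taken each time a server completes a request."""
    # With probability p, post an advert for the server's own virtual
    # server, with duration D = c_j * A (A is the advert scaling factor).
    if random.random() < p:
        advertboard.append((server["vs"], server["c_j"] * A))
    # With probability r_i, foragers follow a random advert; scouts instead
    # pick any virtual server in the colony at random.
    r_i = read_probability(server["profit_rate"], p_colony)
    if random.random() < r_i:
        if server["role"] == "forager" and advertboard:
            server["vs"] = random.choice(advertboard)[0]
        elif server["role"] == "scout":
            server["vs"] = random.randrange(num_vs)
```

A server at a highly profitable queue gets r_i = 0, so it posts adverts but never migrates, mirroring a forager committed to a good patch.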

4.2 Omniscient

The omniscient algorithm provides an upper bound on possible profitability. (It would be both informationally impossible and computationally prohibitive in practice.) It computes, by dynamic programming, the optimal server allocation given complete knowledge of future HTTP request loads. For an Internet server colony, the recursive profit function is given by

    f_t(π*_t, A_t) = P(π*_t, A_t) + f_{t+1}(π*_{t+1}, A_{t+1})   for 0 ≤ t ≤ N-1,

and f_t(π*_t, A_t) = 0 for t ≥ N, where t denotes discrete time steps, π*_t denotes the optimal server allocation policy, A_t denotes the total HTTP request arrivals in the current time step t plus residual requests from time step t-1, and P(π*_t, A_t) denotes the maximum profit made with optimal policy π*_t and arrivals A_t. For a time horizon split into N time steps, f_0(π*_0, A_0) represents the maximum profit possible over time steps 0, ..., N-1 with allocation policies π*_0, ..., π*_{N-1}. It would be remiss of us not to point out that this algorithm is both time and space intensive as a function of problem size. For example, allocating 50 servers across 3 virtual servers with 11 interpolation buckets requires at least 1.3 GB for interim result tables and the exploration of 167 million states per time step.
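A compact memoised form of this recursion might look as follows. This is illustrative only: the `profit` callback and the finite `policies` set stand in for the paper's interpolation-bucket state space, and passing the previous allocation lets migration costs be charged.

```python
from functools import lru_cache

def omniscient_profit(N, policies, profit):
    """Evaluate the recursion f_t = max_pi [P(pi, A_t) + f_{t+1}],
    with f_t = 0 for t >= N, and return f_0.

    `profit(pi, t, prev)` plays the role of P(pi, A_t): the profit of
    allocation policy pi at time step t, given the previous allocation
    `prev` (so migration costs can be accounted for)."""
    @lru_cache(maxsize=None)
    def f(t, prev):
        if t >= N:
            return 0.0
        # Best achievable profit from step t onward given allocation `prev`.
        return max(profit(pi, t, prev) + f(t + 1, pi) for pi in policies)
    return f(0, None)
```

The memoisation caps the work at (number of policies) × N subproblems, but with tens of servers per virtual server the policy space itself is what makes the paper's 167 million states per time step.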

4.3 Greedy

The greedy allocation algorithm represents a standard, conventional heuristic approach to the problem. It allocates servers based on what would have been optimal during the preceding time period; the idea is that the immediate past is the best available guide to the uncertain future. In particular, when a specified time interval T_A expires, the algorithm reallocates servers across the server colony based on the optimal profit that could have been made given the HTTP request arrivals from the expired time interval and the current allocation (thus taking migration costs into account). The optimal profit is given by

    P_opt-Greedy = max_{0 ≤ i ≤ k-1} P(π_i, A_{T_A}),

over k possible allocation policies, with A_{T_A} denoting the total number of requests that arrived during the time interval T_A. Thus, for time interval T_A + 1, the algorithm chooses the allocation policy π_i that maximises P_opt-Greedy for time interval T_A.

4.4 Optimal-Static

The optimal-static algorithm omnisciently chooses the best from among all static (fixed) allocations. This reflects an upper bound on the current level of profitability of many SLA providers, who do not change their allocations more often than once a month; their profitability will be lower than the optimal-static value because they cannot know the coming month's HTTP request load in advance. The algorithm determines the best static allocation policy for a time horizon split into N discrete time steps:

    P_opt-stat = max_{0 ≤ i ≤ k-1} \sum_{t=0}^{N-1} P(π_i, A_t),

over k possible server allocation policies, where P(π_i, A_t) represents the profit made with server allocation policy π_i and HTTP request load A_t at time step t. The best optimal-static allocation policy is the one that maximises P_opt-stat over the whole horizon.

5 Experimental Results

In this section, we describe the simulation model of the dynamic server allocation problem and present experimental results comparing the performance of the Honey Bee algorithm with the Omniscient, Greedy and Optimal-Static algorithms, using synthetic HTTP request data as well as trace data from a commercial service provider. These experiments consider the case of an Internet server colony with 2 and 3 virtual servers (websites) composed from a total of 50 servers. Our performance metric is the total revenue earned by the server colony from serving HTTP requests.

5.1 Simulation Model

We have developed discrete event simulation models for the dynamic server allocation algorithms: Honey Bee, Omniscient, Greedy and Optimal-Static. All models and algorithms are implemented in C++SIM (Little, 1994) on a Sun Blade 100 running SunOS 5.9. The following assumptions are common to all models. All servers are homogeneous in terms of processing capacity and employ a first-come-first-served scheduling policy. The time to serve an HTTP request is exponentially distributed, with a mean service time depending on request type. Each server is paid a fixed revenue per request served, the amount again depending on request type. A reallocated server becomes unavailable for a fixed migration time (Appleby et al., 2001); this time accounts for the fact that a server must be purged of its current application and data, and reloaded with the application and data of the website to which it is allocated. A stream of HTTP requests arriving for a particular virtual server (website) is held in a queue. Each request has a waiting threshold for receiving service; on crossing this threshold, a request randomly chooses to keep waiting or to balk. Each virtual server has an independent HTTP request stream. The trace data are from an international commercial service provider (a confidentiality agreement prevents us from disclosing their name) and from simulated data generated by inhomogeneous Poisson processes (Brown et al., 2002).
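The balking behaviour described above can be sketched as follows. The parameter values (10 s waiting threshold, 1.01 s balk rate, 0.04 balk probability) are the paper's; reading the balk rate as the spacing between repeated keep-waiting-or-balk decisions is our interpretation, and the function itself is illustrative rather than the authors' C++SIM code.

```python
import random

def balked(wait, threshold=10.0, balk_interval=1.01, balk_prob=0.04):
    """Decide whether a queued request has balked after waiting `wait`
    seconds.  Within the threshold it always keeps waiting; past it, the
    request reconsiders every `balk_interval` seconds and balks with
    probability `balk_prob` at each reconsideration."""
    if wait <= threshold:
        return False
    for _ in range(int((wait - threshold) / balk_interval)):
        if random.random() < balk_prob:
            return True
    return False
```

Under these parameters a request that has waited 15 s has had about five chances to balk, so roughly 18% of such requests (1 - 0.96^5) would have left the queue.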

In the Honey Bee model, servers can be reallocated at any time. One server per virtual server is designated as a scout and the rest as foragers, reflecting the low proportion of scouts in real honey bee colonies (Seeley, 1995). A scout server randomly reallocates itself to any virtual server in the colony at any time, whilst forager servers randomly reallocate themselves in response to an advert read for a particular virtual server. Each server (scout or forager) advertises its own virtual server by placing an advert on the advertboard with a given time duration; the advertboard is kept up to date by purging adverts whose time duration has expired. In the Omniscient and Greedy models, we assume that servers are reallocated at the beginning of each allocation interval and do not change during the remainder of that interval. For the Optimal-Static model, we assume that servers allocated at the beginning remain unchanged for the complete duration of the simulation and, therefore, do not incur any migration costs. All simulation models except Honey Bee and Greedy require an allocation policy, which is computed offline using the request arrival data; for example, the Omniscient policy is computed using dynamic programming. The common parameters for all models are as follows: total servers: 50; revenue per request: 0.5 cents; mean execution time: 15 ms; request waiting threshold: 10 seconds; balk rate: 1.01 seconds; balk probability: 0.04. The specific parameters for each model are as follows. Omniscient: server migration time: 300 seconds; allocation interval: 900 seconds. Honey Bee: advert posting probability: 0.1; scouts per virtual server: 1; server migration time: 300 seconds. Greedy: server migration time: 300 seconds; server allocation interval: 1800 seconds.

5.2 Experiments

Table 2 shows the revenue earned by the server colony over a 24-hour period for the respective algorithms configured as 2 and 3 virtual servers, with the synthetic HTTP request pattern shown in fig. 1. Under this environment, our algorithm adapts well to dynamic changes in the arrival pattern. It outperforms Greedy and Optimal-Static whilst being within 11.7% and 11.6% of Omniscient for the 2 and 3 virtual server configurations, respectively. We point out that for low-variability request patterns (peak:trough of 10:1 and 5:1), the Greedy algorithm outperforms Honey Bee by 1.27% and 1.21% on the 2 and 3 virtual server configurations, respectively.

Table 2: Revenue ($) for synthetically generated HTTP requests.

                             2-VS                          3-VS
    Algorithm         Actual       Projected      Actual         Projected
    Omniscient        970,636.00   991,298.00     1,119,400.00   1,159,039.55
    Honey Bee         868,949.00   —              1,003,050.00   —
    Greedy            836,077.00   —              968,087.00     —
    Optimal-Static    810,510.00   827,455.00     860,363.00     875,210.90

Table 3: Revenue ($) from real Internet service traces.

                             2-VS                          3-VS
    Algorithm         Actual         Projected      Actual         Projected
    Omniscient        1,066,440.00   1,071,741.83   1,336,960.00   1,352,872.18
    Honey Bee         1,050,110.00   —              1,238,470.00   —
    Greedy            1,043,400.00   —              818,040.00     —
    Optimal-Static    844,822.00     845,076.00     1,108,360.00   1,108,554.58

Table 3 shows the revenue earned by the server colony over a 24-hour trace-driven simulation with HTTP request patterns derived from real Internet services, as shown in fig. 1. For the 2 virtual server configuration, HTTP traces from Services B and C are used, whilst traces A, B and C are used for the 3-VS configuration. Under these request arrival patterns, the Honey Bee algorithm outperforms Greedy and Optimal-Static for the 2 virtual server configuration and performs within 1.55% of Omniscient. For the 3 virtual server configuration, the Honey Bee algorithm outperforms Greedy and Optimal-Static whilst performing within 7.95% of Omniscient.

To test the adaptability of our algorithm, we used HTTP request patterns that are synthetically generated and range across varying degrees of inhomogeneous Poisson arrivals. We use a numerical scale to denote the degree of inhomogeneity (peak:trough) as follows: 1:1 (0), 10:1 (1), 15:1 (2) and 25:1 (3). The server colony is configured for 3 virtual servers.

[Figure 1: Average HTTP request access patterns. Left: HTTP request traces from Internet Services A, B and C; right: synthetically generated requests Syn-A, Syn-B and Syn-C. Both panels plot mean requests/s against time in hours over a 24-hour period.]

We normalise the performance of all algorithms to Omniscient and depict the results in fig. 2. Interestingly, for zero variability, Honey Bee performs as well as all the other algorithms despite the constant migration overhead. This is attributed to the fact that servers at all three queues earn revenue at a rate equal to or better than the combined rate of the hosting centre, which decreases the probability of reading an advert and hence reduces migration. As the variability is increased, Honey Bee performance degrades and converges to within 10–15% of Omniscient, which suggests its adaptability to dynamic load. We emphasise that all these results are for an untuned Honey Bee algorithm; that is, we chose the parameter values for the Honey Bee algorithm based on common-sense scaling reasoning, and froze those values before we ran any test cases.

[Figure 2: Adaptability to inhomogeneous request patterns. Normalised performance of Omniscient, Honey Bee, Greedy and Optimal-Static against HTTP request inhomogeneity (0–3).]

6 Conclusions

In this paper, we proposed a new honey bee allocation algorithm based on the self-organised behaviour of foragers in honey bee colonies, and on the many similarities between the nectar collection problem faced by a honey bee colony and the revenue collection problem faced by an Internet server colony. Results to date support the effectiveness of the algorithm, particularly in the highly dynamic and unpredictable (Arlitt and Jin, 1999) Internet environment. The sub-optimality of the pattern of forager allocation in honey bee colonies with respect to unchanging flower patches was mimicked by the sub-optimality of the honey bee algorithm compared with the static algorithm for test cases with low variability. By the same token, we speculate that a real honey bee colony is quite adaptive to changes in flower patch availability and quality, and that its allocation pattern may be considerably less suboptimal when evaluated in a realistic dynamic environment rather than a static one. One possible explanation for the performance difference is that static optimisation requires the equalisation of derivatives (marginal rates), implicitly requiring a marginal-rate bee for each patch and thus limiting migration to one bee at a time. The honey bee algorithm trades static optimality for adaptiveness: it has no marginal-rate bee, but it can migrate several bees at once, increasing its responsiveness to changed circumstances.

Acknowledgements

We would like to express our gratitude to Tom Seeley for helpful discussions and insights on the inner workings of honey bee colonies.

References

Appleby K., Fakhouri S., Fong L., Goldszmidt G., Kalantar M., Krishnakumar S., Pazel D. P., Pershing J., and Rochwerger B., 2001. Oceano – SLA Based Management of a Computing Utility. 7th IFIP/IEEE International Symposium on Integrated Network Management.

Arlitt M., and Jin T., 1999. Workload Characterisation of the 1998 World Cup Web Site. Technical Report HPL-1999-35R1. HP Laboratories.

Bartholdi III J. J., Seeley T. D., Tovey C. A., and Vande Vate J. H., 1993. The Pattern and Effectiveness of Forager Allocation Among Flower Patches by Honey Bee Colonies. J. theor. Biol. 160:23–40.

Bonabeau E., Dorigo M., and Theraulaz G., 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.

Bonabeau E., and Meyer C., 2001. Swarm Intelligence: A Whole New Way to Think About Business. Harvard Business Review. 107–114.

Brewer E. A., 2000. Lessons from Giant-Scale Services. http://db.cs.berkeley.edu/~jm/cs262.

Brown L., Gans N., Mandelbaum A., Sakov A., Shen H., Zeltyn S., and Zhao L., 2002. Statistical Analysis of a Telephone Call Centre: A Queueing-Science Perspective. http://stat.wharton.upenn.edu/~haipeng/.

Chase J. S., Anderson D. C., Thakar P. N., and Vahdat A. M., 2001. Managing Energy and Server Resources in Hosting Centres. 18th ACM Symposium on Operating Systems Principles (SOSP).

Cicirello V. A., and Smith S. F., 2001. Insect Societies and Manufacturing. IJCAI-01 Workshop on Artificial Intelligence and Manufacturing: New AI Paradigm for Manufacturing.

Crovella M. E., 1998. Generating Representative Web Workloads for Network and Server Performance Evaluation. Proceedings of Performance '98/ACM SIGMETRICS '98.

Fox A., Gribble S. D., Chawathe Y., Brewer E. A., and Gauthier P., 1997. Cluster-Based Scalable Network Services. Proc. 16th ACM Symp. on Oper. Syst. Principles (SOSP-16). St. Malo, France.

IBM, 2003. http://www-3.ibm.com/services/e-business/hosting/managed_hosting.html.

Jayram T. S., Kimbrel T., Krauthgamer R., Schieber B., and Sviridenko M., 2001. Online Server Allocation in a Server Farm via Benefit Task Systems. 33rd ACM Symposium on Theory of Computing. July 6–8, Hersonissos, Crete, Greece.

Little, 1994. C++SIM. University of Newcastle Upon Tyne. http://cxxsim.ncl.ac.uk/.

Seeley T. D., Camazine S., and Sneyd J., 1991. Collective decision making in honey bees: How colonies choose among nectar sources. Behavioral Ecology and Sociobiology 28:277–290.

Seeley T. D., 1995. The Wisdom of the Hive. Harvard University Press.

Verio, 2003. http://hosting.verio.com/.


Development of Collective Control Architectures for Small Quadruped Robots Based on Human Swarming Behavior

Daniel W. Palmer¹, Marc Kirschenbaum¹, Jon Murton¹, Ravi Vaidyanathan², Roger D. Quinn²

1. Dept. of Mathematics and Computer Science, John Carroll University, University Heights, OH, U.S.A.

2. Dept. of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH, U.S.A.

Corresponding author: [email protected]

Abstract

We introduce a method of designing behaviors for swarms of micro-robots based on observation of human beings executing various tasks collectively. As a case study, we have focused on the development of decentralized control strategies specifically applicable to swarms of the Mini-Whegs quadruped robot. The design process consists of carefully mapping the hardware requirements of the robotic platforms in question, then tasking large groups (swarms) of human beings to perform mission-specific tasks within the constraints of the robotic vehicle. A basic software engine has been developed and implemented to support on-line human swarm experiments in a virtual environment, with subsequent off-line algorithm extraction for eventual transfer onto robotic platforms. In our ongoing work, a variety of virtual robotic swarm experiments have been performed, and various methods of algorithm extraction explored. Beyond swarm controller development, one of the most useful and practical aspects of this technology is that it enables those involved in micro-robotic research to understand, from a first-hand perspective, the issues involved in performing global tasks with limited sensor information. We believe that the mining of virtual human swarm behaviors will lead to the successful development of control architectures capable of directing micro-robot swarms in the field, as well as provide insights into the social behavior of all manner of multi-agent systems.

Keywords: Microrobots, Swarm-Based Control, Collective Control, Biologically-Inspired, Human Swarming

1 Introduction

Highly mobile small vehicles, sometimes called micro-robots, are better suited for certain missions than larger vehicles. For example, they can aid in search and rescue because their diminutive size enables them to fit into tight spaces, such as those found in rubble and in caves. As another example, a group of small robots provides robustness through redundancy for remote missions such as extraterrestrial exploration. Small mobile robots are also appropriate for insect-inspired research because their scale is similar to that of the insect models.

Achieving effective use of micro-robots requires more than just controlling a single vehicle; the larger problem, with a potentially greater payoff, resides in establishing coordinated behavior of groups (or swarms) of such vehicles. At present, however, there remains a noticeable gap between the development and simulation of swarm control strategies and their implementation on robotic hardware platforms for field use.

In this work, the development of decentralized control strategies specifically applicable to swarms of the Mini-Whegs (Morrey et al., 2003) quadruped robot is investigated. As a method of controller inspiration, we are examining the swarm behavior of large groups of human beings working under conditions analogous to those of the autonomous robots. By conducting multiple observations of human swarms performing constrained experiments, patterns of successful behavior emerge. Using these patterns, swarm algorithms can be reconstructed and applied to control micro-robotic platforms. We believe that the mining of human-based swarm behaviors conducted under realistic hardware constraints will lead to the successful development of control architectures capable of directing micro-robot swarms in the field.

Figure 1: Photograph showing relative sizes of the Mini-Whegs IV robot and a Blaberus giganteus cockroach.

2 Micro-robot Control Parameters

For a control strategy to achieve field viability, it is critical that it be designed to function within constraints analogous to those imposed by hardware platforms. This is especially true in the micro-robotic arena, where several limitations (power and mobility in particular) are very strict. An architecture relying on any level of global communication, for example, is unlikely to be effective under these restrictions.

The collective control strategies developed in this work are designed with the highly mobile small robots called "Mini-Whegs" in mind (Morrey et al., 2003). They are derived from the larger Whegs series of robots, which benefit from abstracted cockroach locomotion principles (Quinn et al., 2002). Key to their success are the three-spoke appendages, called "whegs", which combine the speed and simplicity of wheels with the climbing mobility of legs. To be more compact than the larger Whegs vehicles, Mini-Whegs uses four whegs phased in an alternating diagonal gait. These 9 cm long robots can run at sustained speeds of over 10 body lengths per second and run over obstacles that are taller than their leg length. They can run forward and backward, either right-side up or upside down. Their robust construction allows them to tumble down a flight of stairs with no damage and to carry a payload equal to twice their weight. A jumping mechanism has also been developed that enables Mini-Whegs to surmount much larger obstacles, such as stair steps. Figure 1 shows a photograph of the Mini-Whegs IV robot.

In the human swarm controller development environment, each "agent" in the human swarm is considered to be a single Mini-Whegs platform. The agents' mobility and power are constrained to the capabilities of the robot itself. Furthermore, their communication and sensing capabilities are restricted to those currently possible on a small robot such as Mini-Whegs. Given these limitations, the agents are then tasked to interact with one another in mission scenarios to achieve emergence.

3 Human Swarm Controller Inspiration

A circular dependency exists at the core of designing and building autonomous robots that can work together in swarms. Before developing swarm control algorithms in simulation, it is necessary to know the robot's capabilities; yet before committing to a particular robot configuration, designers must be certain of its sufficiency to support a collective control strategy. Of critical importance in this process is understanding the relationship between the low-level, local-information-based behavior of individual agents and the high-level global actions of the swarm.

One escape route from this dilemma is an iterative, brute-force approach that estimates the robot capabilities and approximates a starting point for a control strategy. When this does not work, successive refinement of both sets of parameters is necessary. While it is easy to modify or upgrade the virtual specifications of a proposed robot, developing corresponding autonomous collective control strategies has proven to be very difficult, time-consuming and costly. If the controller does not accomplish the intended goal, extensive testing can give clues as to what changes to make, but the process is iterative and lengthy. Ultimately, the approach may not be viable without changing the robot's capabilities, which can dictate re-developing the controller. Even if the control strategy succeeds, this is just one data point: a task-specific control strategy that works with a particular robot configuration. Because the goal is to develop a general-purpose robot, the entire process must be repeated for many different scenarios to determine a viable working set of robot sensors and communication features.

We introduce a novel alternative approach in this work. Instead of developing many autonomous, experiment-specific, robot-specific controllers, we use human beings as the computational foundation of simulated robot swarms. The primary benefit of this architecture is the flexibility and adaptability of the humans, allowing us to consider the widest range of possibilities with the least amount of development effort.

In our past work, we demonstrated the general usefulness of leveraging human computation: up to 100 people were enlisted to work together, directly interacting in a large, enclosed space, to accomplish a wide variety of collective tasks (Palmer et al., 2003a,b). After recording the actions taken by these human swarms, we reverse-engineered the observed algorithms and successfully applied them to simulated robotic swarms.

Human flexibility and adaptability is a strength we want to leverage, but it is also an obstacle. Humans have sophisticated sensors and communication skills that can impede the reverse-engineering process. For example, hand waving, body language, facial expressions and other sometimes subtle cues all convey meaning and propagate information that cannot be easily captured by our observations or easily reproduced in simulation. To tailor the human reverse-engineering process for robots with restrictions in power, sensors and communication abilities, the data-collecting bonanza of the human visual processing system must be limited. We accomplish this by physically separating the humans and creating a virtual human swarm. Connected remotely over a network, humans take in only the data provided by simulated sensors, interact with each other only over constrained communication channels, and perform only simulated actions that mirror the abilities of a proposed robot design.
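As a concrete illustration of such a constrained channel, a server-side filter could decide what each human driver is allowed to perceive. This is a sketch only: the field-of-view angles, ranges and object names below are hypothetical stand-ins, not values from the actual VHS implementation.

```python
import math

def visible_to(agent_pos, agent_heading, world_objects,
               forward_range=10.0, backward_range=4.0, half_angle=60.0):
    """Return only the objects a constrained agent is allowed to perceive.

    The server would send each human driver this filtered list instead of
    the full world state, so strategies cannot rely on information a real
    robot would not have. Ranges and angles here are illustrative.
    """
    ax, ay = agent_pos
    seen = []
    for name, (ox, oy) in world_objects.items():
        dx, dy = ox - ax, oy - ay
        dist = math.hypot(dx, dy)
        if dist == 0:
            seen.append(name)
            continue
        # Angular offset between the agent's heading and the object bearing.
        bearing = math.degrees(math.atan2(dy, dx))
        off = abs((bearing - agent_heading + 180) % 360 - 180)
        in_front = off <= half_angle and dist <= forward_range
        behind = off >= 180 - half_angle and dist <= backward_range
        if in_front or behind:
            seen.append(name)
    return seen

objects = {"beacon": (6.0, 0.0), "wall": (0.0, 20.0), "agent2": (-2.0, 0.0)}
print(visible_to((0.0, 0.0), 0.0, objects))  # ['beacon', 'agent2']
```

The asymmetric forward/backward ranges mimic the "unbalanced hourglass" sensor region described later for the mini-wheg robots.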

4 Swarm Algorithm Development

This mix of unlimited human flexibility and adaptability, constrained by realistic robotic limitations, accomplishes several objectives in comparison to alternative controller development strategies: (1) it reduces the time, effort and cost of simulating and evaluating different scenarios; (2) it provides for the extraction of control algorithms by reverse-engineering the strategies employed by the humans; (3) it can be used to determine the minimum onboard sensor and communication requirements for many different applications; and (4) it provides a means of evaluating the autonomous implementations of the extracted algorithms. We examine each of these objectives in turn.

4.1 Reduction of time and cost in development

Leveraging the capacity of the human brain for logical reasoning, common sense and flexibility dramatically reduces the time and cost of producing control strategies by eliminating the need for physical robot collectives or sophisticated software simulations. Humans, interacting over networked computers, can drive robots within a swarm based on each robot's local information. The network enforces the limited sensor and communication abilities of the robots, allowing the humans to interactively produce control strategies that do not exceed those capabilities. This approach provides agents with the cognitive ability to understand natural language, so they can be quickly "programmed" to operate within the constraints of a particular scenario.

One of the powerful features of the virtual swarm is that it provides quick feedback, since many different scenarios can be tried and evaluated in a small amount of time. Humans are capable of taking a description of an articulated goal and producing appropriate low-level actions. Thus, many configurations of sensors, battery power, bandwidth, and communication reliability can be easily investigated. To do this without a human "driver" requires either building autonomous intelligence into a simulation system, which is costly and very time-consuming, or building an actual swarm of physical robots, which is far more expensive and time-consuming still, and which also severs the feedback loop between robot design and control strategy development.

4.2 Algorithm extraction

The controller strategy is developed in a four-step process. First, a virtual swarm is used with a simulated robot. Second, once the virtual swarm is able to accomplish the intended goal of the experiment, an algorithm is mined from the recorded data to understand how the humans solved the problem. Third, the human controller is replaced with a software controller, and finally, the simulated robots are replaced with their physical counterparts. In this section we focus on step two: algorithm mining.

A prototype system for implementing virtual human swarms is currently under development. The distributed system controls what a human at a remote keyboard sees and hears, as well as restricting and monitoring their communication (see Figure 2B). For example, the software can enforce reduced sensor capabilities, prevent global communication, and limit possible agent actions. The virtual swarm software can record all actions and communications during an experiment, making it easier to reverse-engineer control strategies for the robot collective. By repeatedly replaying the recorded data, successful patterns of control become apparent. Once the developers have a good understanding of what the controller must do, it is then appropriate to expend the considerable effort and energy to encapsulate these behaviors in simple rule sets. This task is made simpler by the fact that the developers now have a better understanding of how the swarm control strategy should behave. The human-centric system provides information in a more useful order than if the algorithms had to be developed first and then tested.
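One simple first pass over such recorded logs is to tally which action the human drivers most often took in each observed sensor state, producing a crude candidate rule set for developers to inspect. The log format and state/action names below are invented for illustration; they are not the actual VHS recording format.

```python
from collections import Counter, defaultdict

def mine_policy(log):
    """From recorded (sensor_state, action) pairs, extract the most
    common action per state: a rough, first-pass rule set that a
    developer can then inspect and refine by hand."""
    by_state = defaultdict(Counter)
    for state, action in log:
        by_state[state][action] += 1
    return {state: counts.most_common(1)[0][0]
            for state, counts in by_state.items()}

# Hypothetical replay of a "find the beacon" run.
log = [
    ("beacon_ahead", "forward"), ("beacon_ahead", "forward"),
    ("beacon_ahead", "forward-left"),
    ("no_contact", "ping"), ("no_contact", "ping"),
]
print(mine_policy(log))  # {'beacon_ahead': 'forward', 'no_contact': 'ping'}
```

Frequency counting is only a starting point; the text's emphasis on replaying the data reflects that the interesting patterns are often sequential and contextual rather than simple state-to-action mappings.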

4.3 Sensor and communication design

A primary goal is to produce robots as cheaply as possible. To this end, the algorithm mining should try to produce a solution that uses a minimal set of sensors and communication facilities.

Using humans as the computational intelligence of simulated robots provides an "upper bound" on the capability of a swarm of those robots. Clearly, for a given configuration of sensors, power, and communication, if human-level reasoning cannot produce a working solution, then an autonomous controller is unattainable.

There is a complex relationship between a robot's sensor capabilities, the robustness of its communication (both in bandwidth and reliability) and the complexity of the tasks it can complete. A long-range goal of this research is to explore and describe this inter-dependency. The fast feedback of the human-centric simulation platform is the only realistic way to collect enough data to see successful patterns in the solution space.

4.4 Testing and evaluation

The development of the human-centric system produces, as a natural consequence, a customized testbed for the controller software. The commands from the humans are simply replaced by ones generated by a controller. The rest of the system remains intact, and the global performance of the autonomous controllers can be directly compared with that of their human counterparts. This again provides quick and useful feedback to help fine-tune and verify the controller software. The architecture also supports humans working alongside autonomously controlled agents. Such a hybrid swarm supports further testing of the extracted control algorithms and eases the transition to a purely autonomous system. Earlier in the algorithm development process, a hybrid swarm also allows experimentation with larger swarms than there are human drivers. In this scenario, the autonomous controllers are given a default set of behaviors, which can be overridden by directives from human-controlled agents.

Figure 2: (A) human swarm experiment snapshot; (B) virtual human swarm experiment snapshot; (C) screenshot of the virtual swarm server; (D) screenshot of virtual swarm agent 1; (E) virtual swarm agent 2; (F) virtual swarm agent 3.
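The hybrid default-plus-override scheme for mixed swarms could be sketched as follows; the function, agent names and action strings are invented for illustration, not taken from the actual system.

```python
def hybrid_step(agent_id, directives, default_behavior, sensor_state):
    """Autonomous agents follow their default behavior unless a directive
    from a human-controlled agent is pending for them. All names here are
    illustrative stand-ins for the real system's interfaces."""
    if agent_id in directives:
        return directives.pop(agent_id)     # a human directive wins
    return default_behavior(sensor_state)   # otherwise fall back to autonomy

directives = {"agent3": "forward-right"}
default = lambda s: "ping" if s == "no_contact" else "forward"
print(hybrid_step("agent3", directives, default, "no_contact"))  # forward-right
print(hybrid_step("agent7", directives, default, "no_contact"))  # ping
```

Popping the directive once consumed means a human override applies to a single step; a real system would need a policy for how long overrides persist.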

5 Virtual Human Swarm (VHS) System

Figure 2A shows a typical scene from a physical human swarm experiment. In these experiments, the collection of 100 humans is given a description of a physical goal and, as a group, works toward reaching it. (In this example, agents were instructed to organize by color and form lines.) A major problem encountered in physical human swarm experiments is the difficulty of controlling most aspects of the experiment's environment. When the humans are in each other's physical presence, they can shout across the room, copy behaviors and read visual cues.

In Figure 2B, five humans control agents in a small virtual swarm. In this example, all participants were in the same room; however, this is not a requirement. In actual practice, the participants will be isolated from each other. Figure 2C shows a snapshot of the Virtual Human Swarm (VHS) server displaying a global view of the environment. The solid black lines indicate impassable walls, the gray lines in the upper right corner indicate obstacles, which can be traversed but at a slower rate, and the white area is the sensor region for the beacon located at the center. The human agents see only a restricted portion of this overall environment.

Figure 3: (A) screenshots of the Dispersion scenario (top); (B) screenshots of the Congregation scenario (bottom).

Examples of the agent views are presented in Figures 2D-2F. The unbalanced hourglass shape shows the forward and backward sensing region corresponding to that of a mini-wheg. The greater forward sensing capability is represented by the larger triangular area. The humans interact with the environment through buttons below the viewing area. Six buttons correspond to the six different movements that the mini-whegs can perform (Forward and Backward, plus, since the mini-whegs cannot make lateral movements, Forward-left, Forward-right, Back-left, and Back-right). The remaining button, Ping, allows the agent to locate other agents in its vicinity but outside its immediate sensor range through alternate means such as sonar or radio. As evidenced in Figure 2D, there is also the possibility of communicating with other agents in either broadcast or point-to-point mode. The current implementation allows full chatting capability, but this will be restricted once we start collecting data. Figure 2D shows the agent right next to the beacon. Figures 2E and 2F show two other agents' views. Each has another agent within its sensor field, indicated by the colored square within the triangular sensor region.

Although it is premature to present results from the research, we give a flavor of the kinds of data we expect to collect. Initially, we have several simple scenarios designed to determine the ability of humans to complete tasks in the VHS. Scenario one starts with all agents centrally located; the task is to disperse to a uniform density. Scenario two is the reverse: from a uniformly distributed starting point, all agents must congregate. No gathering point for the swarm is specified. Figures 3A and 3B show the general nature of the data we expect to collect. In the top row, from left to right, the agents can be seen starting out clustered in the left portion of the screen and spreading out to fill the region. The bottom row, starting at the left fully spread out, shows the agents converging together. Note that they regroup at a different location than where they started.
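Progress in the dispersion and congregation scenarios could be scored with a simple spread statistic over agent positions, for example the mean distance to the swarm centroid. This metric is our illustrative choice; the paper does not specify how the actual experiments will be scored.

```python
import math

def spread(positions):
    """Mean distance of agents from the swarm centroid: small when the
    swarm has congregated, large when it has dispersed."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / n

clustered = [(0, 0), (1, 0), (0, 1), (1, 1)]
dispersed = [(0, 0), (9, 0), (0, 9), (9, 9)]
print(spread(clustered) < spread(dispersed))  # True
```

Note that a centroid-based measure scores congregation well but only roughly captures "uniform density" for dispersion; a grid-occupancy statistic would be a natural complement.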

Other scenarios include a "Find-the-beacon" exercise, a formation-formation task, and establishing a perimeter about a given location. Figures 2C through 2F show the overview and three agent views of a "Find-the-beacon" exercise. In Figure 2C, the beacon can be seen inside the square in the lower right side of the screen. The circular region surrounding the beacon is a visual representation of its sensing region. Notice that it is a line-of-sight device, so it cannot be sensed through walls and is detectable only directionally through the doorway.

As of the writing of this paper, the software is still under development. The screenshots presented here were collected during system testing. We expect to complete the system and begin running experiments shortly, and to collect and analyze data for presentation at the workshop.

6 Future Work

One of the most appealing aspects of this research is the ease of "programming/reprogramming" a virtual human swarm: high-level commands are given, and away they go. Eventually, if we make the interface as good as what can be seen in video games, there is the possibility of having an endless number of swarm agents to perform our experiments. Our job will be to create virtual worlds that capture the environments our mini-whegs will face and to be creative enough to keep our human agents from losing interest. This could be a cost-effective method for the initial design of a robotic controller. There are already plans to build a 3D version of the software that will run continuously over the internet. Given a suitable structure of points and scoring opportunities, we may find that thousands of undergraduates are willing to contribute to this research. We also envision the possibility of mounting micro-cameras on the mini-whegs and having humans drive the actual hardware.

Acknowledgements

The authors gratefully acknowledge Ms. Kelly Zajac for her software development efforts in the virtual swarm implementation.

References

Morrey, J.M., Lambrecht, B., Horchler, A.D., Ritzmann, R.E., and Quinn, R.D. (2003) Highly Mobile and Robust Small Quadruped Robots. To appear in IEEE Int. Conf. on Intelligent Robots and Systems (IROS'03), Las Vegas, Nevada, 2003.

Quinn, R.D., Kingsley, D.A., Offi, J.T. and Ritzmann, R.E. (2002) Improved Mobility Through Abstracted Biological Principles. IEEE Int. Conf. on Intelligent Robots and Systems (IROS'02), Lausanne, Switzerland, 2002.

Palmer, D.W., Kirschenbaum, M., Kovacina, M.A., Murton, J.P., Zajac, K.M. (2003a) Self-referential Biological Inspiration: Humans Observing Human Swarms to Identify Swarm Programming Techniques. 7th World Multi-Conference on Systemics, Cybernetics & Informatics (SCI), Orlando, FL.

Palmer, D., Kirschenbaum, M., Murton, J.P., Kovacina, M.A., Steinberg, D.H., Calabrese, S.N., Zajac, K.M., Hantak, C.M., Schatz, J.E. (2003b) Using a Collection of Humans as an Execution Testbed for Swarm Algorithms. Pages 58–64 in Proceedings of the IEEE Swarm Intelligence Symposium, April 24–26.

Evolving Foraging Behaviors

Liviu Alexandru Panait, Sean Luke

Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax VA 22030, USA. Corresponding author: [email protected]

Abstract

Insects are particularly good at cooperatively solving multiple complex tasks. For example, foraging for food far away from the nest can be solved through relatively simple behaviors in combination with communication through pheromones. As task complexity increases, however, it may become difficult to determine the proper simple rules which yield the desired emergent cooperative behavior, or to know if any such rules exist at all. For such tasks, machine learning techniques like evolutionary computation (EC) may prove a valuable approach to searching the space of possible rule combinations. This paper shows that by allowing simultaneous learning of both the pheromone-depositing and pheromone-using policies, well-performing foraging behaviors can be obtained. Additionally, the learned foraging behaviors use only pheromone information to find the path to the nest and to the food source, requiring no compasses or other complex mechanisms to return to the nest.

Keywords: foraging, learning, evolutionary computation

1 Introduction

Artificial Intelligence has drawn many ideas from biology: evolutionary computation, neural networks, robotics, vision, and cooperative problem solving all steal liberally from mother nature. One area of particular recent interest in AI has been algorithms inspired by social insects such as ants, termites and bees. The interest stems from the capacity of such simple organisms to work together collaboratively to solve problems no individual could. A good source of information on AI algorithms inspired by social insects is (Bonabeau et al., 1999). Much of the social-insect-inspired AI literature has focused on foraging and related tasks through the use of pheromones. The social memory mechanism of pheromones is an inviting paradigm for designing multiagent systems with blackboards, joint utility tables, and other global memory mechanisms. However, hand-coding agent behaviors in this paradigm can prove problematic, given the unexpected emergent group behaviors that arise. An interesting question is whether an automated machine learning process can discover well-performing pheromone-based foraging strategies.

Previous learning methods have only partially addressed this possibility, because they have tended to hard-code the pheromone-depositing mechanism and learn only how to use the pheromone information. This paper shows that it is possible to have the entire foraging behavior discovered by the learning system: both the pheromone-depositing mechanisms and the foraging behaviors are simultaneously and successfully learned in a series of increasingly difficult environments.

The paper proceeds with a description of previous work in learning foraging behaviors, followed by a description of an evolutionary computation approach to the learning task. A set of three experiments in increasingly difficult environments shows that good foraging behaviors can be discovered. A later experiment shows that behaviors learned for complex domains remain robust in simpler environments. The paper ends with a set of conclusions and directions for future work.

2 Previous Work

This paper addresses the problem of central place food foraging, which consists of two main phases: an initial exploration for food, followed by carrying it back to the nest (Sudd and Franks, 1987).

Deneubourg et al. present a model for the path selection process of the Argentine ant Linepithema humile (Deneubourg et al., 1990). Additional studies of models for choosing the shortest path (Bonabeau, 1996; Bonabeau and Cogne, 1996) suggest that the emergent optimality is due to the fact that larger amounts of pheromones can accumulate more quickly on shorter paths than on longer ones. A technique for modeling army ant foraging patterns is presented in (Deneubourg, 1989), where Monte Carlo simulations show foraging patterns similar to ones observed in nature. Resnick presents a foraging model where ants adapt to food sources located at varying distances, and which deplete over time (Resnick, 1994). Further investigation of the model is reported in (Nakamura and Kurumatani, 1997).

Several learning algorithms have been used in conjunction with pheromones. Some algorithms related to reinforcement learning adopt a fixed pheromone-laying procedure, but use the sensed information about pheromones while exploring the space or while updating state-action utility estimates (Leerink et al., 1995; Monekosso et al., 2002). Evolutionary computation techniques have also been applied to learn exploration/exploitation strategies using pheromones deposited by hardcoded mechanisms. For example, Sauter et al. show how EC can be used to tune the action-selection behavior in an application involving multiple "digital" pheromones (Sauter et al., 2002). A similar idea applied to network routing is presented in (White et al., 1998).

AntFarm is another system that combines communication via pheromones and evolutionary computation (Collins and Jefferson, 1991, 1992). To our knowledge, AntFarm is the closest work to the algorithm presented in this paper. AntFarm uses multiple colonies of homogeneous ants, each colony in a separate 16x16 grid environment. The ants use a single pheromone to mark trails toward food sources, but use a compass to point themselves along the shortest path back to the nest. The system uses a neural network representation and evolves only the foraging strategies, while those for returning home are hardcoded. In the future-work section of their paper, Collins and Jefferson admit not having observed the evolution of cooperative behaviors (Collins and Jefferson, 1992). The authors suggest the problem comes from the system's inability to discover how to create useful uphill gradients of pheromones for ants to use during the foraging process.

A common trend in all previously described algorithms is that the ants know how to return to the nest. This assumption is mainly based on Holldobler and Wilson's report that ants use sophisticated navigational techniques for this task, including orientation based on landmark memorization or on the position of the sun (Holldobler and Wilson, 1990). However, we argue that most current robotics applications are still far from that level of sophistication. Moreover, the discovery of pure-pheromone behaviors is appealing in that its analysis seems more likely to lead to useful applications of pheromone-like global memories to problems for which such "hand-coded" hacks are of less utility. Last, by using only pheromone functions, we hope to move towards a formal description of the system as a variation of dynamic programming.

The work in this paper is concerned with learning foraging behaviors that can find food and nest locations in relatively simple, obstacle-free environments. In an accompanying poster paper (Panait and Luke, 2003), we present a hard-coded algorithm for ants to successfully forage in environments with obstacles.

3 Evolving Foraging Strategies

To evolve ant behaviors, we used a form of EC known as "strongly-typed" genetic programming (GP) (Koza, 1992; Montana, 1995). In the common form of genetic programming, which we adopted, evolutionary individuals (candidate solutions) use a parse-tree representation. Leaf nodes in the parse tree return various external state values for the ant. Internal nodes take the values passed to them by their children and return the result of some function applied to those values. Crossover swaps subtrees among individuals. In strongly-typed GP, type constraints specify which nodes may be children of various other nodes; we used strong typing to enable a large set of available functions operating on vector, scalar, and directional information. Even so, the representational complexity available to the GP learner was significantly less than that afforded by the hand-coded design presented in our accompanying poster paper.

Pheromone Depositing Tree Function | Description
scalar ← CurFoodPhLevel() | Food pheromone at my location
scalar ← CurHomePhLevel() | Home pheromone at my location
scalar ← LastDeposited() | How much pheromone I deposited last time
scalar ← DistanceFromSite() | Number of time steps elapsed since I last visited the nest (or food, depending on state)
scalar ← MaxDistanceFromSite() | Max possible distance from site (depends on the maximum lifetime of ants)
scalar ← MaxLocalFoodPheromone() | Max food pheromone at my eight neighboring locations
scalar ← MinLocalFoodPheromone() | Min food pheromone at my eight neighboring locations
scalar ← MaxLocalHomePheromone() | Max home pheromone at my eight neighboring locations
scalar ← MinLocalHomePheromone() | Min home pheromone at my eight neighboring locations
scalar ← MaxPheromone() | Max amount of pheromone possible
scalar ← MaxPheromoneDividedByDistanceFromSite() | MaxPheromone() / DistanceFromSite() [8]
scalar ← MaxPheromoneDividedByMaxDistanceFromSite() | MaxPheromone() / MaxDistanceFromSite() [8]
scalar ← Add(scalar, scalar) | Add two scalars
scalar ← Sub(scalar, scalar) | Subtract two scalars
scalar ← Max(scalar, scalar) | Maximum of two scalars
scalar ← Min(scalar, scalar) | Minimum of two scalars

Behavior Selection Tree Function | Description
vector ← FoodPheromones() | Amounts of food pheromones at the eight neighboring locations
vector ← HomePheromones() | Amounts of home pheromones at the eight neighboring locations
vector ← AddV(vector, vector) | Add two vectors
vector ← SubV(vector, vector) | Subtract two vectors
vector ← Mul2V(vector) | Multiply each component of a vector by 2
vector ← Div2V(vector) | Divide each component of a vector by 2
vector ← SqrV(vector) | Square each component of a vector
vector ← Sqrt(vector) | Take the square root of each component of a vector
direction ← MinO(vector) | Return the index of the smallest component of a vector
direction ← MaxO(vector) | Return the index of the largest component of a vector
direction ← ProbO(vector) | Return a random index, chosen using the normalized component sizes as probabilities (+ .001)

Table 1: Function set for an ant's GP pheromone-depositing and behavior-selection trees, respectively. Functions take the form returnType ← functionName(argumentTypes). Leaf nodes have no arguments.

The GP system uses three data types. The first data type is scalar, representing any real-valued information. An example of scalar data is the level of food pheromones at the current location. A second data type is vector, representing a collection of related scalar values. The set of readings for food pheromone levels in neighboring locations is an example of vector data. The third data type is orientation, used by ants to decide to which neighboring location to move next. A datum of type orientation can point to one of the eight neighbors of the current location.
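The type discipline on these trees can be illustrated with a small type checker over a node table; the node names below follow Table 1, but the table representation and the checker itself are our sketch, not the authors' ECJ configuration.

```python
# Hypothetical strongly-typed GP node table: each entry maps a function
# name to (return_type, argument_types). Type-checking a candidate tree
# then just means matching each child's return type to its parent's slot.
NODES = {
    "CurFoodPhLevel":  ("scalar", ()),
    "Add":             ("scalar", ("scalar", "scalar")),
    "FoodPheromones":  ("vector", ()),
    "SubV":            ("vector", ("vector", "vector")),
    "MaxO":            ("direction", ("vector",)),
}

def tree_type(tree):
    """Return the tree's type, or raise TypeError if a child is ill-typed.
    A tree is a tuple: (function_name, child_tree, child_tree, ...)."""
    name, children = tree[0], tree[1:]
    ret, args = NODES[name]
    if len(children) != len(args):
        raise TypeError(f"{name} expects {len(args)} children")
    for child, want in zip(children, args):
        got = tree_type(child)
        if got != want:
            raise TypeError(f"{name} wanted {want}, got {got}")
    return ret

# A behavior-selection tree must return a direction:
ok = ("MaxO", ("SubV", ("FoodPheromones",), ("FoodPheromones",)))
print(tree_type(ok))  # direction
```

In strongly-typed GP, crossover and mutation consult exactly this kind of constraint so that only well-typed subtrees are ever swapped in.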

In our approach, a GP individual consists of two trees: the pheromone-depositing tree and the behavior-selection tree. An ant is in one of two states: either it is laden with food, or it is not. The pheromone-depositing tree tells the ant how much pheromone to deposit, but the ant's state tells it which pheromone to deposit. Additionally, the trees contain nodes labeled with a given pheromone name, for example MaxLocalHomePheromone (for the pheromone to the "home", or nest). These labels are correct when the ant is not laden with food, but when the ant has food, the pheromones actually dealt with by these nodes are swapped. Thus, for example, when the ant is laden with food, MaxLocalHomePheromone actually returns the maximum value of the local pheromone to the food, not the nest. We did this because it had become obvious early on that this was a symmetric problem, and we could thus heuristically reduce the search space: instead of having two pheromone-depositing and two behavior-selection trees (one for each state), by swapping the pheromones we could use the same tree for both states.

[8] We had thought these would be helpful to the learning algorithm.

Figure 1: Evolution of performance in the 10x10 gridworld.

Figure 2: The emergent foraging pattern for LearnedBehavior10x10.

The root node of the pheromone-depositing tree was expected to return a scalar value (the amount of pheromone to deposit); if the value was negative, its absolute value was used. The root node of the behavior-selection tree was expected to return a direction (one of eight directions to move). Accordingly, the two trees were permitted to be constructed out of two different sets of nodes, shown in Table 1. The functions shown are admittedly simple, but for a first attempt we felt this was reasonable.

The algorithm the learning ants followed is:

Learning-Foraging-Behavior (executed at each time step):
    Call the first tree to select the desired level of pheromones to deposit
    Call the second tree to select where to move next
    Deposit the pheromones and move to the desired location
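For illustration, the per-step loop might be rendered in Python as follows (the authors' actual implementation is Java-based, on MASON and ECJ; the dict-based world and callable trees here are invented stand-ins for the evolved GP trees):

```python
def ant_step(ant, grids, deposit_tree, move_tree, size=10):
    """One step of a learning ant on a toroidal grid. `grids` maps a
    pheromone kind ("food"/"home") to a {(x, y): level} dict; the two
    trees are callables standing in for the evolved GP trees."""
    amount = abs(deposit_tree(ant, grids))          # negative values: use |x|
    kind = "food" if ant["carrying"] else "home"    # state picks the pheromone
    grids[kind][ant["pos"]] = grids[kind].get(ant["pos"], 0.0) + amount
    dx, dy = move_tree(ant, grids)                  # one of 8 neighbor offsets
    x, y = ant["pos"]
    ant["pos"] = ((x + dx) % size, (y + dy) % size) # toroidal wrap

ant = {"pos": (0, 0), "carrying": False}
grids = {"food": {}, "home": {}}
ant_step(ant, grids,
         deposit_tree=lambda a, g: 5.0,
         move_tree=lambda a, g: (1, 0))
print(ant["pos"], grids["home"][(0, 0)])  # (1, 0) 5.0
```

The state-dependent choice of pheromone mirrors the label-swapping trick described above: the same trees serve both the laden and unladen states.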

The experiments were implemented using the MASON (Luke et al., 2003) multi-agent simulator and the ECJ (Luke, 2002) evolutionary computation framework. The parameters for the evolutionary computation system were: elitism of size 2, 100 individuals per population, minimum/maximum depth for Ramped Half-and-Half tree generation of 2/4, minimum/maximum depth for Grow tree generation of 3/3, and re-attempting unsuccessful crossover operations 100 times before giving up and returning the original parents. All other parameters have the default values specified in (Koza, 1992). The fitness of an individual is computed as the average performance over three trials. The performance in a single trial is calculated as FoodPickedUp + 10 * FoodCarriedToNest. The parameters for the multiagent foraging simulation are: minimum/maximum amount of a given type of pheromone per location of 0/100, evaporation rate of 0.1%, and diffusion rate of 0.1%. Because the simulations are demanding, the evolution process was extremely slow. This limited the current experiment to a single run consisting of a few generations. We are aware that additional runs are required to draw statistically significant conclusions, so this work should be considered proof-of-concept only.
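The 0.1% evaporation and diffusion rates amount to a per-step grid update along the following lines. This is a sketch under our own assumptions about update order and neighbor sharing; the paper does not spell out the exact MASON update rule.

```python
def pheromone_update(grid, evap=0.001, diff=0.001, lo=0.0, hi=100.0):
    """One update of a toroidal pheromone grid (list of lists): each cell
    loses the `evap` fraction of its level, then sheds the `diff` fraction
    of the remainder equally to its eight neighbors. Values are clamped
    to the paper's 0..100 range."""
    n, m = len(grid), len(grid[0])
    new = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            level = grid[i][j] * (1.0 - evap)   # evaporation
            share = level * diff / 8.0          # diffusion to each neighbor
            new[i][j] += level * (1.0 - diff)
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di or dj:
                        new[(i + di) % n][(j + dj) % m] += share
    return [[min(hi, max(lo, v)) for v in row] for row in new]

g = [[0.0] * 3 for _ in range(3)]
g[1][1] = 100.0
g = pheromone_update(g)
print(g[1][1], g[0][0])
```

With these tiny rates, pheromone trails decay slowly and smear only slightly per step, which is what lets gradients persist long enough for ants to follow them.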

Likewise, this proof-of-concept experimentation relies on three assumptions that we plan to eliminate in future work. The first assumption is that the agents can move to any of the eight neighboring locations (this eliminates obstacles and requires the world to be toroidal). Second, ants die and new ants are created at the nest. Third, ants can not only add, but also remove, pheromones from the environment (the concept of anti-pheromones was previously used in (Montgomery and Randall, 2002) to improve exploration and help the system escape from local optima).

Figure 3: Evolution of performance in the 33x33 gridworld.

Figure 4: The emergent foraging pattern for LearnedBehavior33x33.

We begin with three experiments that evolve ant-foraging behaviors in increasingly large worlds.

3.1 First Experiment

The first experiment concerned learning foraging behaviors in a small 10x10 toroidal grid world. Other parameters for the ant foraging simulation were as follows: 501 simulation steps, food source located at (5,3), nest located at (7,7), ant lifespan of 50 simulation steps, one initial ant, one new ant per time step, and a maximum of 50 ants in the simulation at each time step.

The performance of the best-so-far individuals and the average performance per generation are plotted in Figure 1. The graph shows that a good solution is discovered relatively easily, within two generations; even very simple algorithms performed well in this simplified world. The behavior of a well-performing forager is shown below as LearnedBehavior10x10, and an emergent foraging trail from one application of this learned foraging behavior is presented in Figure 2.

LearnedBehavior10x10

If carrying a food item:
    Adjust the amount of food pheromones to MaxDistanceFromSite − DistanceFromSite
    Move to the neighboring location with the most home pheromones
Else:
    Adjust the amount of home pheromones to MaxDistanceFromSite − DistanceFromSite
    Move to the neighboring location with the most food pheromones
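Read as code, LearnedBehavior10x10 might look like the following (our rendering; the `sense` interface exposing Table 1's leaf functions is hypothetical, and directions are indexed 0-7 over the eight neighbors):

```python
def learned_behavior_10x10(ant, sense):
    """LearnedBehavior10x10 rendered as Python. Returns the pheromone
    kind to deposit, the amount, and the chosen neighbor index."""
    deposit = sense.max_distance_from_site() - sense.distance_from_site()
    if ant.carrying_food:
        # Laden: lay food pheromone, climb the home-pheromone gradient.
        return ("food", deposit, max(range(8), key=sense.home_pheromone_at))
    # Unladen: lay home pheromone, climb the food-pheromone gradient.
    return ("home", deposit, max(range(8), key=sense.food_pheromone_at))

class Sense:  # illustrative stub sensor readings
    def max_distance_from_site(self): return 50
    def distance_from_site(self): return 12
    def home_pheromone_at(self, d): return [0, 3, 9, 1, 0, 0, 2, 0][d]
    def food_pheromone_at(self, d): return [5, 0, 0, 0, 0, 0, 0, 0][d]

class Ant:
    carrying_food = True

print(learned_behavior_10x10(Ant(), Sense()))  # ('food', 38, 2)
```

Note the deposit amount MaxDistanceFromSite − DistanceFromSite naturally decreases with distance from the last-visited site, so trails are strongest near their source.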

3.2 Second Experiment

The second experiment concerned learning foraging behaviors in a larger 33x33 grid world. Other parameters for the ant foraging simulation were as follows: 1001 simulation steps, food source located at (17,10), nest located at (23,23), ant lifespan of 50 simulation steps, one initial ant, one new ant per time step, and a maximum of 50 ants in the simulation at each time step.

The performance of the best-so-far individuals and the average performance per generation are plotted inFigure 3. The graph shows that a good solution is still discovered relatively easily, within three generations.This time, a simple evolutionary computation run discovered a more complex individual. A behavior ofa well performing ant forager is shown as LearnedBehavior33x33, and an emergent foraging trail in one

— Page 135 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

Figure 5: Evolution of performance in the 100x100 gridworld.

Figure 6: The emergent foraging pattern for LearnedBehavior100x100

                         10x10 environment   33x33 environment   100x100 environment
LearnedBehavior10x10     2801.00 (27.91)     113.80 (281.78)     2.20 (3.52)
LearnedBehavior33x33     2800.50 (38.65)     929.80 (17.67)      3958.50 (2808.00)
LearnedBehavior100x100   2802.90 (26.77)     931.90 (18.10)      7098.90 (1636.22)

Table 2: The performance of evolved foraging behaviors across the three grid worlds. Numbers (mean performance, followed by standard deviation in parentheses) summarize food items collected and returned to the nest in 10 runs. Bold numbers represent the statistically significantly better performances for a given grid size (down a column).

application of the specific learned foraging behavior is presented in Figure 4.

LearnedBehavior33x33

If carrying a food item
    Adjust the amount of food pheromones to MaxPheromoneDividedByDistanceFromSite
    Move to the neighboring location with minimum value for FoodPheromones − 3∗HomePheromones
Else
    Adjust the amount of home pheromones to MaxPheromoneDividedByDistanceFromSite
    Move to the neighboring location with minimum value for HomePheromones − 3∗FoodPheromones

The new individual contains a simple but useful formula for exploration. Simply put, it guides the search for food by advancing to locations with more food pheromones (closer to the food source) but, at the same time, with fewer home pheromones (indicating locations farther from the nest). This improves the initial search process by guiding the ants away from the nest. The discovered strategy is a much simpler and often more effective foraging behavior than the one reported in (Panait and Luke, 2003).
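The exploration rule of a searching ant in LearnedBehavior33x33 reduces to a one-line scoring function. A sketch (the pheromone maps and the default of zero for unvisited cells are our assumptions):

```python
# Hypothetical sketch of the exploration rule for a searching ant in
# LearnedBehavior33x33: minimize HomePheromones - 3*FoodPheromones, so
# food pheromone attracts while home pheromone (high near the nest) repels.

def choose_next(neighbors, food_pher, home_pher):
    # Lower score wins: a neighbor rich in food pheromone gets a large
    # negative term, a neighbor near the nest gets a positive penalty.
    return min(neighbors,
               key=lambda p: home_pher.get(p, 0.0) - 3.0 * food_pher.get(p, 0.0))
```
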

3.3 Third Experiment

The third experiment concerned learning foraging behaviors in a 100x100 grid world. Other parameters for the ant foraging simulation were as follows: 2501 simulation steps, food source located at (50,30), nest located at (70,70), ant lifespan of 500 simulation steps, one initial ant, one new ant per time step, and a maximum of 500 ants in the simulation at each time step.

The performance of the best-so-far individuals and the average performance per generation are plotted in

Figure 5. The graph shows continual improvement over the first nine generations, suggesting that incrementally more complex foraging strategies are discovered. A relatively similar, yet slightly more complex, individual was discovered. The behavior of a well-performing ant forager is shown as LearnedBehavior100x100, and an emergent foraging trail in one application of the specific learned foraging behavior is presented in Figure 6.

LearnedBehavior100x100

If carrying a food item
    Adjust the amount of food pheromones to Max(MinLocalFoodPheromone, MaxPheromoneDividedByDistanceFromSite)
    Move to the neighboring location with minimum value for FoodPheromones − 2∗HomePheromones
Else
    Adjust the amount of home pheromones to Min(MinLocalHomePheromone, MaxPheromoneDividedByDistanceFromSite)
    Move to the neighboring location with minimum value for HomePheromones − 2∗FoodPheromones

3.4 Fourth Experiment

In the fourth experiment, we took the best evolved individuals from each of the three previous experiments and tested them in all three environments. For each individual and each grid size, we performed 10 runs. The results are reported in Table 2.

As can be seen, the more difficult the training problem domain, the more general the solutions (they perform at least as well as solutions evolved specifically for the simpler domains). This suggests that for simpler problems there is no gradient toward more and more sophisticated foraging behaviors. Rather, sufficiently simple strategies perform as well as more advanced ones, and the learning system is not capable of distinguishing among them. However, as the problem domain becomes more and more challenging, increasingly sophisticated and general foraging strategies are discovered. Additional experiments are required to support this hypothesis.

4 Conclusions and Future Work

This paper presented an evolutionary approach to learning foraging behaviors. Ants learn both what quantities of pheromones to deposit and how to use this information for navigation. Even though the representation allowed both incrementing pheromones and setting them to specific values, all learned strategies used the current levels of pheromones and set them to exact values, instead of incrementing them. This is in line with the results in (Panait and Luke, 2003), which suggest that setting works better than incrementing. However, further experiments are required to validate this hypothesis.

The results in the paper suggest that better results might be obtained when allowing the system to adjust both the pheromone laying and the foraging strategies. This suggests possible improvements over previous work, where the pheromone depositing strategy was hardcoded.

Our future work addresses possible speed-ups to the multiagent simulation, analysis of the representational bias, elimination of the simplifying assumptions we made, formalization of the system as a dynamical system, and further testing in more complex domains, possibly involving obstacles, multiple possibly-decaying food sources, and predator agents, all of which may require the system to develop specialized behaviors for different ants.


References

Bonabeau, E. 1996. Marginally stable swarms are flexible and efficient. J. Phys. I France: 309–320

Bonabeau, E., and Cogne, F. 1996. Oscillation-enhanced adaptability in the vicinity of a bifurcation: the example of foraging in ants. Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior: From Animals to Animats 4. MIT Press

Bonabeau, E., Dorigo, M., and Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press

Collins, R. J., and Jefferson, D. R. 1991. An artificial neural network representation for artificial organisms. Parallel Problem Solving From Nature: First Workshop (PPSN I). Springer-Verlag

Collins, R. J., and Jefferson, D. R. 1992. AntFarm: towards simulated evolution. Artificial Life II. Addison-Wesley

Deneubourg, J. L., Goss, S., Franks, N. R., and Pasteels, J. M. 1989. The blind leading the blind: modelling chemically mediated army ant raid patterns. Journal of Insect Behavior 2: 719–725

Deneubourg, J. L., Aron, S., Goss, S., and Pasteels, J. M. 1990. The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behavior 3: 159–168

Hölldobler, B., and Wilson, E. O. 1990. The Ants. Harvard University Press

Koza, J. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press

Leerink, L. R., Schultz, S. R., and Jabri, M. A. 1995. A reinforcement learning exploration strategy based on ant foraging mechanisms. Proceedings of the Sixth Australian Conference on Neural Networks

Luke, S. 2002. ECJ 9: A Java EC research system. Available at http://www.cs.umd.edu/projects/plus/ec/ecj

Luke, S., Balan, G. C., and Panait, L. A. 2003. MASON: A Java Multi-Agent Simulation Library. Proceedings of the Second International Workshop on the Mathematics and Algorithms of Social Insects, Atlanta, Georgia

Monekosso, N., Remagnino, P., and Szarowicz, A. 2002. An improved Q-learning algorithm using synthetic pheromones. From Theory to Practice in Multi-Agent Systems, Second International Workshop of Central and Eastern Europe on Multi-Agent Systems, CEEMAS 2001, Revised Papers. Springer-Verlag. Also Lecture Notes in Artificial Intelligence LNAI 2296

Montana, D. J. 1995. Strongly typed genetic programming. Evolutionary Computation 3: 199–230

Montgomery, J., and Randall, M. 2002. Anti-pheromone as a tool for better exploration of search spaces. Proceedings of Ant Algorithms: Third International Workshop, ANTS 2002. Springer-Verlag. Also Lecture Notes in Computer Science LNCS 2463

Nakamura, M., and Kurumatani, K. 1997. Formation mechanism of pheromone pattern and control of foraging behavior in an ant colony model. Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press

Panait, L. A., and Luke, S. 2003. Ant foraging revisited. Proceedings of the Second International Workshop on the Mathematics and Algorithms of Social Insects, Atlanta, Georgia

Resnick, M. 1994. Turtles, Termites and Traffic Jams. MIT Press

Sauter, J., Matthews, R. S., Parunak, H. V. D., and Brueckner, S. 2002. Evolving adaptive pheromone path planning mechanisms. Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-02). Pages 434–440

Sudd, J. H., and Franks, N. R. 1987. The Behavioral Ecology of Ants. Chapman & Hall, New York

White, T., Pagurek, B., and Oppacher, F. 1998. ASGA: improving the ant system by integration with genetic algorithms. Genetic Programming 1998: Proceedings of the Third Annual Conference. Morgan Kaufmann Publishers


In the Shadow of the Binary Tree: Of Ants and Bits

Zhanna Reznikova1, Boris Ryabko2

1. Institute for Animal Systematics and Ecology, Siberian Branch RAS, Frunze 11, Novosibirsk, 630091, Russia. E-mail: [email protected]; URL: http://fen.nsu.ru/~rezzhan/

2. Siberian State University of Telecommunication and Computer Science, Novosibirsk, Russia. E-mail: [email protected]; URL: http://www.ict.nsc.ru/~ryabko/

Abstract

The experimental approach based on the ideas of Shannon entropy and Kolmogorov complexity is discussed as an effective tool to reveal how individual cognitive and communicative skills influence collective decision making in group-retrieving ants.

Keywords: Ants, Information Theory, Animal communication, Kolmogorov complexity.

1 Introduction

In many recent works on self-organization in social insects, the roles of positive/negative feedback and redundancy at the collective level have been emphasized. It has been thoroughly demonstrated that sophisticated displays of the collective intelligence of ant colonies may be generated by the interaction of local rules at the elementary level of individual workers (Deneubourg et al., 1989; Franks, 1989, 2002; Gordon, 1995). Comparatively narrow cognitive skills are sufficient for making simple individual decisions such as, for example, choosing the shorter of two foraging paths. These algorithms allow a whole colony to develop a foraging strategy that prioritizes path length or food quality (Stickland et al., 1993). It was shown in the army ant Eciton burchelli that the local movement rules of individuals following chemical trails can lead to a collective choice of direction and the formation of a huge three-lane traffic system that minimizes congestion (Couzin & Franks, 2003). In many cases simple algorithms may serve as general models for understanding task allocation.

At the same time, some tasks in insect societies require communicative means for the precise transfer of information at the individual level, and these can hardly be imagined as being based on local elementary choices. For example, in group-retrieving red wood ants, small foraging groups search for a certain leaf with an aphid colony within a tree crown (Reznikova & Novgorodova, 1998). Accomplishing tasks of this sort is probably based on individual inter-relations within “cliques” (Anderson & Franks, 2001; Hart & Ratnieks, 2001).

In order to estimate the potential of ants’ communicative means, neither direct deciphering (as with honeybee dances) nor language training (as with apes and parrots) is easily available. We have suggested another approach based on the ideas of information theory. It is natural to use an information-theoretic approach in the investigation of communication systems, because this theory presents general principles and methods for developing effective and reliable communication systems. Surprisingly, applications of information theory have been incorporated in only a few studies of social insects. This method has already allowed the demonstration of sophisticated communication in group-retrieving ants (Reznikova & Ryabko, 1994; Ryabko & Reznikova, 1996). Here the key principles of the analysis are presented, along with recently obtained data on different species.

2 Approaches, Methods and Materials

The main point of our approach is that the experiments provide a situation in which ants have to transmit information quantitatively known to the experimentalist in order to obtain food. This information concerns the sequence of turns toward a trough of syrup. We used a laboratory set-up called the “binary tree”, where each “leaf” of the tree ends with an empty trough, with the exception of one filled with syrup. The simplest design was the tree with two leaves. In this situation a discovering animal should transmit one bit of information to other specimens: to go to the right (R) or to the left (L). In other experiments the number of forks in one branch increased to six. Hence, according to Shannon’s approach to measuring the quantity of information, the number of bits necessary to choose the correct way is equal to the number of bifurcations.
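The bit count follows directly from the number of equally likely leaves: a tree with i forks has 2^i leaves, and choosing one of them requires log2(2^i) = i bits, one bit per L/R turn. As an illustration (not part of the original experiments):

```python
import math

def bits_needed(forks):
    """Shannon information needed to name one of the 2**forks equally
    likely leaves of a binary tree; equals the number of forks."""
    leaves = 2 ** forks
    return math.log2(leaves)

# Equivalently, each route is a string of L/R turns of length `forks`,
# carrying one bit per turn.
```
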

The experiments were conducted in 1982–2003 with laboratory colonies of different species: 3 of Formica rufa, 2 of F. sanguinea, 1 of F. pratensis, 2 of F. cunicularia and 1 of Myrmica rubra. Ants were housed in transparent plastic nests that made it possible to observe their contacts. All the ants were labeled with an individual color mark. The composition of the ants’ groups was revealed during preliminary stages of the experiments. In Formica s.str., small cliques were discovered within the colonies, composed of a “scout” and 5–8 “recruits” (foragers). In sum, 335 scouts of three species were used in the main trials. We placed a scout on a trough containing food, and it then returned to the nest by itself. The scout had to make up to four trips before it was able to mobilize the group of foragers. In all cases of mobilization we measured (in sec.) the duration of the contact between the scout and the foragers. When the group began moving to the maze we isolated the scout, and the foragers had to search for the food by themselves. To prevent access to the food in a straight line, the set-up was placed in a water bath, and the ants reached the initial point of the binary tree by going over a bridge.

The experiments were devised so as to eliminate all possible cues helpful for finding the food, except informational contact with the scout. To avoid the use of an odor track, the experimental set-up was replaced by an identical one while the scout stayed within the nest. The fresh maze had all troughs filled with water in order to avoid any possible impact of the smell of syrup. If the group correctly reached the “sound” leaf of the binary tree, they were immediately presented with the food.

3 Results

3.1 Evidence of Information Transmission in Ants

The long-term experiments revealed information transmission based on distant homing within small constant cliques consisting of a scout and foragers in Formica s.str. Not all of the scouts managed to memorize the way towards the maze. The number of such scouts dropped with the complication of the task. In the case of two forks, all active scouts and their groups (up to 15 per colony) were working, while in the case of six forks, only one or two coped with the task. It turned out that “capable” scouts were able to transmit information on absolutely different routes towards the goal during one daily experiment.

Evidence of information transfer from the scouts to the foragers came from two sets of data: first, from the statistical analysis of the number of faultless findings of the goal by a group, and second, from special series of control experiments with “uninformed” (“naive”) and “informed” foragers.

The statistical analysis of the number of faultless findings of the goal was carried out as follows. We compared Hypothesis H0 (ants find the “right” leaf randomly) with Hypothesis H1 (they find the goal thanks to obtained information), proceeding from the fact that the probability of a chance finding of the correct way with i forks is (1/2)^i. For example, the probability of a chance finding of the correct way towards the trough in the maze with three forks is (1/2)^3. One of our series of trials with F. sanguinea that mastered the binary tree with three forks included 20 trials, of which in three cases the group of foragers failed to find the food; in 5 cases 1–3 ants were left behind the group; and in 12 cases all foragers correctly reached those leaves where their scout had found syrup. In these experiments a correct search can be considered a “success”, while an unsuccessful search, when the group failed to come or came in small numbers, was called a “failure.”

Thus, we have the results of 20 independent Bernoulli tests, where the probability of “success” (P) in the case of the realization of H0 is (1/2)^3 = 1/8, against H1, where P > 1/8.


In this series there were 12 “successes” and 8 “failures.” To verify H0 against H1 we used the binomial criterion (the table in: Hollander & Wolfe, 1973). In our case H0 was rejected in favor of H1, P < 0.001. We then analyzed different series of experiments (338 trials in sum), separately for 2, 3, 4, 5 and 6 forks. In all cases H0 was likewise rejected in favor of H1, P < 0.001.
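This rejection can be checked with an exact binomial tail sum; a sketch using only the Python standard library, replacing the printed binomial table with a direct computation:

```python
from math import comb

def binomial_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p): the probability of seeing
    at least k successes in n trials purely by chance."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 20 trials, 12 successes, chance probability of success (1/2)**3 = 1/8
# under H0 (three forks): the tail probability is far below 0.001.
p_value = binomial_tail(20, 12, 1 / 8)
```
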

In addition, special control experiments were performed in which naive ants were tested in the maze. In F. sanguinea and F. rufa we compared the search results of ants that had and had not had a previous possibility to contact the scout (the “informed” and “naive” foragers, respectively). The naive and “informed” ants were allowed to search for the food for 30 min. In the more agile F. pratensis, the time spent searching for the trough by “informed” and “uninformed” specimens was compared; see below (these series were conducted in 2003 by T. Novgorodova as a part of our investigations of family structure in this species; see: Novgorodova, 2004).

In the control experiments, 26 uninformed F. sanguinea failed to find the food in the mazes with 4–6 forks, and only one ant found the goal by chance on the binary tree with 4 forks. In F. rufa there were 2 successes versus 26 failures. It can easily be shown that the difference between the finding frequencies of naive and informed ants is statistically significant.

In F. pratensis almost all “naive” foragers were able to find food on their own, but they spent 10–15 times more time than those ants which entered the maze after contact with the successful scout. Average values as well as sample sizes are given in Table 1. For every trial we used Wilcoxon’s non-parametric test (see: Hollander & Wolfe, 1973) to examine Hypothesis H0 (data from both samples follow the same distribution) against H1 (they follow different distributions) at significance level 0.01. It turns out that the duration of the search is essentially longer in naive ants than in those that had previously contacted the successful scout.

In sum, the obtained data confirm information transmission in three species belonging to the subgenus Formica s.str. and exclude any orientation mechanism except the use of information transmitted by the scouts.

No information transfer based on distant homing was observed in the remaining two species. In the singly foraging F. cunicularia, the foragers learned the way towards the maze, making up to 30 trips per day, but they did not try to recruit other members of their colony. Not more than 5 ants were active in the maze per day. M. rubra workers used olfactory cues, but when we changed the maze they had to do without odour trails. In these cases they resorted to solitary foraging only, just as F. cunicularia did.

Table 1: Comparison of the duration of searching for the trough by “uninformed” (U) F. pratensis ants and by individuals that had previously contacted the successful scout (“informed”, I); July–August 2003.

Sequence of turns   Ants   Mean (sec.)   Sample size   P
RRRR                U      345.7         9             < 0.01
                    I      36.3          9
LLLL                U      508.0         9             < 0.01
                    I      37.3          9
LRRL                U      118.7         7             < 0.01
                    I      16.6          7
RLLR                U      565.9         7             < 0.01
                    I      16.3          7


3.2 Using Shannon Entropy and Kolmogorov Complexity to evaluate ant communicative and cognitive skills

The quantity of information (in bits) necessary for choosing the correct way in the maze equals i, the number of forks. We assumed that the duration of the scout-forager contact (t) was t = ai + b, where i is the number of forks, a is a coefficient of proportionality equal to the rate of information transmission (bits/min), and b is a constant introduced because ants can transmit information not related directly to the task, for example, the simple signal “food.” From the obtained data we evaluated the parameters of the linear regression and the sample correlation coefficient (r). All values of r differed significantly from 0 at P = 0.01. The rate of information transmission (a), derived from the equation t = ai + b, was 0.738 for F. sanguinea and 1.094 for F. rufa. We do not consider these values to be species constants; they probably vary. Note that the rate of information transmission in ants is relatively small.
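The fit of t = ai + b can be reproduced with ordinary least squares. A sketch; the (i, t) pairs below are invented for illustration and are not the measured contact durations:

```python
# Ordinary least-squares fit of t = a*i + b, where i is the number of forks
# and t the contact duration. The data points are hypothetical.

def fit_line(xs, ys):
    """Return (a, b) minimizing the squared residuals of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx  # slope a = transmission rate, intercept b

forks = [2, 3, 4, 5, 6]                 # i: number of forks (bits to transmit)
durations = [2.5, 3.6, 4.4, 5.7, 6.4]   # t: hypothetical contact times (min)
a, b = fit_line(forks, durations)
```
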

Let us now count the total number of different possible ways to the trough. In the simplest binary tree, with one fork, there are two leaves and therefore two different ways. In a tree with two forks there are 2^2 ways, with three forks 2^3 ways, and with six forks 2^6 ways; hence, the total number of different ways is equal to 2 + 2^2 + 2^3 + ... + 2^6 = 126. This is the minimal number of messages the ants must possess in order to pass information about food placed on any leaf of a binary tree with 6 forks.
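The count of 126 messages is easy to verify (illustrative one-liner):

```python
# Total number of distinct routes over binary trees with 1 to 6 forks:
# 2 + 2**2 + ... + 2**6 = 126 distinct messages the ants must be able to send.
total_ways = sum(2 ** i for i in range(1, 7))
```
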

Another series of experiments was based on the basic concept of Kolmogorov complexity. When planning these series, we assumed that in highly social ant species, specimens possess an important property of intelligent communicators: the ability to quickly grasp regularities and use them for coding and “compression” of information. Thus the length of the message should be proportional to the complexity of the information. This idea is the basic concept of Kolmogorov complexity, which applies to words (or texts) composed of the letters of an alphabet, for example an alphabet consisting of two letters, L and R. Informally, according to Kolmogorov, the complexity of a word (and its uncertainty) equals the length of its most concise description. For example, the word “LLLLLLLL” can be represented as “8L” and the word “LRLRLRLR” as “4LR”, while the “random” word of shorter length “LRRLRL” probably cannot be expressed more concisely; it is thus the most complex of the three and has the greatest uncertainty.
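The notation used in these examples can be mimicked by a tiny period-finding routine. This is an illustration only: Kolmogorov complexity itself is uncomputable, and this sketch captures just one kind of regularity (whole-word repetition of a shorter block):

```python
# If a turn sequence is a whole number of repetitions of a shorter block,
# rewrite it as <count><block>; otherwise leave it unchanged.
# e.g. "LLLLLLLL" -> "8L", "LRLRLRLR" -> "4LR", "LRRLRL" -> "LRRLRL".

def compress(seq):
    n = len(seq)
    for p in range(1, n):          # try the shortest candidate period first
        if n % p == 0 and seq == seq[:p] * (n // p):
            return f"{n // p}{seq[:p]}"
    return seq                     # no shorter description of this kind exists
```
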

We tried to analyse the question of whether ants can apply simple “text” regularities for compression (here the “text” is the sequence of turns toward the maze). As proven by Kolmogorov (1965), there is no algorithmically computable quantitative measure of text complexity. So, strictly speaking, we can only verify whether ants have a “notion” of simple and complex sequences.

In a special series of experiments, F. sanguinea were presented with different sequences of turns. Comparison of the main Hypothesis H0 (the time of information transmission does not depend on the text complexity) with Hypothesis H1 (this time actually depends on it) showed that the more time ants spent on the information transmission, the more information, in Kolmogorov’s sense, was contained in the message. It is interesting that the ants began using regularities to compress only quite large “texts.” Thus, they spent from 120 to 220 sec. to transmit information about random turn patterns on the maze with 5 and 6 forks, and from 78 to 135 sec. when the turn patterns were regular. On the other hand, there was no essential difference when the length of the sequences was less than 4.

4 Discussion and Conclusion

The majority of models consider cognitive skills and individual interactions in social insects redundant. However, another set of arguments comes from data concerning the excellent learning capabilities of social insects. The bulk of the obtained results concerns orientation and memory, mainly in bees and wasps. We suppose that the study of communicative means is one of the most effective tools for comprehending the limits of intelligence in social insects at the individual level. For this, appropriate species have to be chosen, as well as an adequate set of tasks for the colony to solve.

From early experiments of Schneirla (1946) and Thorpe (1950) it was known that some ants and solitary wasps perform almost as well as rats and dogs in maze-learning and detour tasks. Mazokhin-Porshnyakov (1969, 1984) experimentally demonstrated that honey bees and social wasps are capable of abstraction, extrapolation and solving rather sophisticated discrimination tasks at the level of dogs and monkeys. In particular, several individually trained bees were able to distinguish between chains consisting of paired and unpaired small figures, and were thus capable of concept formation concerning twoness. Giurfa and colleagues (1996) found bees capable of concept formation concerning symmetry versus asymmetry.

Bees, wasps and ants possess several specific mechanisms which should help them in efficient perception and learning (Collett et al., 1993; Menzel & Müller, 1996). For example, the memory of landmarks passed on flights between the hive and the feeding place is organized sequentially in honeybees, so that they have to “count landmarks” (Chittka & Geiger, 1995). Behavioral flexibility allows social insects to switch their learning patterns according to changing circumstances. For example, when bees are prevented from learning landmark cues on arrival at the hive, they make up for it by learning them during specific “turn-back-and-look” flight maneuvers (Lehrer & Bianco, 2000).

Ants are known to do many clever things, such as using cognitive maps (Wehner, 1990), learning by observation (Reznikova, 1982, 2001) and even counting (Reznikova & Ryabko, 2001). Rosengren and Fortelius (1987) characterized red wood ants as “replete ants” storing, not lipids in their fat-bodies, but habitat information in their brains. In our endeavors to estimate the degree of flexibility of insect behavior we would, as Wehner (1999) noted, be entering largely uncharted territory. But, as the same author concluded, we might not have to wait long for some answers to emerge.

We consider that some answers have emerged from the data obtained on distant homing in Formica s.str. It is natural to expect that ant species with different colony designs develop communication systems of different levels of complexity. Thus, Robson and Traniello (1998) found a major difference between mass-recruiting ant species, characterized by the collective action of simple individuals, and group-retrieving ones. In the latter case, colony design is based on complexity rather than simplicity. The recruitment process is so organized that the removal of the discovering ant leads to the dissolution of the retrieval group.

In elaborating a possible approach for the comparative investigation of communication, we found that the most likely situation in which to observe task distribution and behavioral flexibility at the individual level, and to evaluate the potential properties of communication in group-retrieving ants, is to force them to solve a complex search problem. In this situation, otherwise hidden processes of information transmission become observable.

Using the ideas of information theory to design experiments enabled us not only to reveal information transmission from scouts to foragers within cliques, but also to estimate the rate of this process and to suggest that group-retrieving ants are able to memorize and use simple regularities, thus compressing the available information. This, in turn, allows us to assume a flexibility of the communication system in highly social ant species, based on individual cognitive capabilities that help colonies to solve many daily, but non-standard, problems.

Acknowledgements

Supported by the Russian Fund for Basic Research (grants 02-04-48386 and 03-01-00495), by INTAS (grant no. 00-738), and by the fund of Dr. Frank Salter (Humanethologie und Humanwissenschaftliches Zentrum der Ludwig-Maximilians-Universität, München).

References

Anderson, C., Franks, N.R. 2001. Teams in animal societies. Behav. Ecol. 5: 534–540.

Chittka, L., Geiger, K. 1995. Can honeybees count landmarks? Anim. Behav. 49: 159–164.

Collett, T.S., Fry, S.N., Wehner, R. 1993. Sequence learning by honeybees. J. Comp. Physiol. (A) 172: 693–706.


Couzin, I.D., Franks, N.R. 2003. Self-organized lane formation and optimized traffic flow in army ants. Proceedings of the Royal Society of London, Series B 270: 139–146.

Deneubourg, J.L., Goss, S., Franks, N.R., Pasteels, J.M. 1989. The blind leading the blind: modeling chemically mediated army ant raid patterns. Journal of Insect Behavior 2: 719–725.

Franks, N.R. 1989. Army ants: a collective intelligence. American Scientist 77(2): 138–145.

Franks, N. 2002. Sociophysiology and decision making. Proc. of XIV IUSSI Congress, Sapporo, p. 32.

Gordon, D.M. 1995. The development of an ant colony's foraging range. Anim. Behav. 49: 649–659.

Hart, A.G., Ratnieks, F.L.W. 2001. Task partitioning, division of labor and nest compartmentalization collectively isolate hazardous waste in the leafcutting ant Atta cephalotes. Behav. Ecol. Sociobiol. 49: 387–392.

Hollander, M., Wolfe, D.A. 1973. Nonparametric Statistical Methods. John Wiley and Sons, New York.

Kolmogorov, A.N. 1965. Three approaches to the definition of the notion "quantity of information." Problems of Information Transmission 1: 3–11.

Lehrer, M., Bianco, G. 2000. The turn-back-and-look behaviour: bee versus robot. Biological Cybernetics 83: 211–229.

Mazokhin-Porshnyakov, G.A. 1969. Die Fähigkeit der Bienen, visuelle Reize zu generalisieren. Ztschr. vergl. Physiol. 65: 15–28.

Mazokhin-Porshnyakov, G.A., Semenova, S.A., Lubarsky, G.Yu. 1984. Analysis of group visual behaviour of honeybees during foraging. Zhurnal Obshei Biologii 45: 79–87 (in Russian with English summary).

Menzel, R., Müller, U. 1996. Learning and memory in honeybees: from behavior to neural substrates. Annu. Rev. Neurosci. 19: 379–404.

Novgorodova, T. 2004. Experimental investigation of group retrieving in Formica pratensis with the use of a binary tree. Russian Journal of Zoology, in press.

Reznikova, Zh.I. 1982. Interspecific communication among ants. Behaviour 80 (1–2): 84–95.

Reznikova, Zh. 2001. Interspecific and intraspecific social learning in ants. Advances in Ethology 36: 108.

Reznikova, Zh.I., Novgorodova, T.A. 1998. The role of individual and social experience in ants' relations with symbiotic aphids. Doklady Biological Sciences 359 (4): 572–574.

Reznikova, Zh., Ryabko, B. 1994. Experimental study of the ants' communication system, with the application of the Information Theory approach. Memorabilia Zoologica 48: 219–236.

Reznikova, Zh., Ryabko, B. 2001. A study of ants' numerical competence. Electronic Transactions on Artificial Intelligence, section B1: Selected Articles from Machine Intelligence 18 (5): 113–126.

Robson, S.K., Traniello, J.F.A. 1998. Resource assessment, recruitment behavior and the organization of cooperative prey retrieval in the ant Formica schaufussi. J. Ins. Behav. 11: 1–22.

Ryabko, B., Reznikova, Zh. 1996. Using Shannon entropy and Kolmogorov complexity to study the communicative system and cognitive capacities in ants. Complexity 2: 37–42.

Schneirla, T.C. 1946. Ant learning as a problem in comparative psychology. Pages 276–305 in: Twentieth Century Psychology (P.L. Harriman, ed.). Philosophical Library, New York.

Stickland, T.R., Tofts, C., Franks, N.R. 1993. Algorithms for ant foraging. Naturwissenschaften 80: 427–430.

Thorpe, W.H. 1950. A note on detour experiments with Ammophila pubescens Curt. (Hymenoptera: Sphecidae). Behaviour 2: 257–263.

Wehner, R. 1990. Do insects have cognitive maps? Annu. Rev. Neurosci. 13: 403–414.

Wehner, R. 1999. Spatial representation in small-brain insect navigators: ant algorithms. Pages 242–258 in: Learning: Rule Extraction and Representation (A.D. Friederici and R. Menzel, eds.). WdeG, Berlin-New York.

— Page 144 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

Costs of environmental fluctuations and benefits of dynamic decentralized foraging decisions in honey bees

Thomas Schmickl, Karl Crailsheim

Department for Zoology, Karl-Franzens-University Graz, Universitätsplatz 2, A-8010 Graz, Austria. Corresponding author: [email protected]; [email protected]

Abstract

Honey bees show the impressive ability to choose collectively (swarm intelligence) between nectar sources of different quality by selecting the energetically optimal one. We present results from a multi-agent simulation of a cohort of foraging bees. The simulation allows us to investigate these collective decisions in a variety of setups and to project the daily net honey gain of the simulated bees. This lets us explore the economic results of foraging decisions in a fluctuating environment: we investigated the dynamics and the efficiency of this decentralized decision system in terms of the potential costs that can arise from a fluctuating environment, and in terms of benefit, defined as the successful prevention of these costs via adaptations of the foraging strategy.

Keywords: honey bee, multi-agent simulation, modeling, collective decision, foraging, swarm intelligence

1 Introduction

The collection of nectar, its transfer to specialized food-storing bees, and its processing into honey is a very important working process in a honey bee colony. Collecting nectar is an energy-consuming and risky process, as flight activity consumes much energy and exposes the bees to potential predators. Honey bees have developed the ability to collectively choose between nectar sources by selecting the optimal one: this source provides the maximum ratio of gain to costs (see equation 1, cf. Seeley, 1994). Both gain and costs are measured in terms of energy, as energy gained and energy consumed, respectively:

source quality = (gain[J] − costs[J]) / costs[J]    (1)

The whole decentralized decision process is based on competition among dancing bees, which guide new (naive) bees to their foraging targets. These guided bees can also perform dances after their successful returns, thereby providing a positive feedback loop. Bees dance longer and more often for better sources, leading to race conditions among these feedback loops. Finally, nearly the whole group of foraging bees converges on the optimal nectar source (see Seeley et al., 1991).
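The race between feedback loops can be illustrated with a minimal deterministic sketch (the function name, rates and parameters below are our hypothetical illustration values, not taken from the simulation): each source recruits uncommitted bees at a rate proportional to its current number of committed dancers and to its quality, so the better source wins the recruitment race.

```python
def recruitment_race(quality=(0.9, 0.3), n_bees=100.0, rate=0.05, steps=200):
    """Toy model of competing dance-recruitment feedback loops.

    committed[i] grows in proportion to the bees already committed to
    source i (positive feedback) and to that source's quality; both
    loops compete for the same shrinking pool of uncommitted bees.
    All parameter values are hypothetical.
    """
    committed = [1.0, 1.0]  # one scout already knows each source
    for _ in range(steps):
        uncommitted = n_bees - sum(committed)
        for i, q in enumerate(quality):
            # recruitment proportional to dancers, quality and free bees
            committed[i] += rate * q * committed[i] * uncommitted / n_bees
    return committed
```

With these numbers the large majority of the cohort ends up on the better source, mirroring the convergence reported by Seeley et al. (1991).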

In addition to the selection of nectar sources, bees regulate their nectar intake rate according to the current nectar-processing workforce (decentralized workload balancing). Homecoming foragers consider the queuing delays they have experienced while waiting for available food-storing bees. If this delay is long, foraging bees will gradually cease their recruitment dances. If the delay is very long, they will even stop their own foraging flights, thus lowering the collective foraging activity (Seeley, 1989; Seeley and Tovey, 1994; Anderson and Ratnieks, 1999a,b; Ratnieks and Anderson, 1999; Hart and Ratnieks, 2001).

Both mechanisms (decentralized target selection and workload balancing) are very good examples of swarm intelligence, because the decisional component is not located inside the single individual bee. Instead, it is located in the overall system, arising as an emergent phenomenon of self-organization. The model described in this article is the first that uses multi-agent modeling and implements both processes described above.


Figure 1: Behavioral states of the agents (bees) in our simulation and possible transitions between those states.

2 The multi-agent simulation

We produced a multi-agent simulation able to simulate the dynamics of honey bee nectar foraging as described in the literature. It treats a cohort of up to 1000 foraging bees inside and outside the hive. Additionally, it simulates the nectar transfer to up to 1000 nectar-receiving bees. Each agent (representing a foraging bee) behaves depending on its "behavioral state." Figure 1 shows the possible behavioral states and the possible transitions between those states. Table 1 describes the behaviors associated with the behavioral states.

Table 1: Possible behavioral states of agents and their corresponding behaviors.

In-hive: Random walk inside the hive, set color to yellow; if nectar in crop has dropped to a critical level, load a portion of honey or nectar into the crop

Starting-scouting: Load a random amount of nectar, move directly towards the entrance, set color to green

Scouting: Random flight outside until nectar load is at minimum level

On-source: Sit on source, load crop with nectar until it is full, set color to source color

Returning: Direct flight to the hive's entrance, set color to source color

Unloading: Set color to brown, random walk in the hive until a storage bee is available, add remaining nectar from crop to the honey stores

Dancing: Random walk in the hive, set color to source color, offer information about the source (direction and distance)

Following: Choose one of the dancing bees within the hearing radius, receive the source information (error applied), set color to source color

Starting-foraging: Load needed amount of nectar, move directly towards the entrance

Foraging: Direct flight towards the target source

Foraging-search: Fly circles around the target place until search time is over

2.1 Transitions between behavioral states

During simulation runs the simulated bees switch from one behavioral state to another. These state transitions can be induced in several ways:


Figure 2: The probability to dance for or to abandon the source depends on the unloading delay encountered by the homecoming foraging bee. (a) Empirical results redrawn from Seeley (1992). (b) The thresholds used in our simulation: P(abandon) is modeled using equation 2 with θ = 40, n = 5. P(dance) is modeled using equation 3 with θ = 10, n = 2.5.

1. Fixed time delays: e.g. from the state “on-source” to the state “returning.”

2. Fixed events: e.g. in the state "returning," the arrival at the hive's entrance automatically changes the state to "unloading" (after a successful flight) or to "in-hive" (after a flight without success).

3. Thresholds: several internal or external (local) stimuli determine the bee's transition probability via the stimulus-response system described below.

The important behavioral transitions are modeled via a stimulus-response system. During each time step, each bee collects a set of local (or internal) stimuli, leading to a transition probability that is a function of the stimulus strength. In most cases we use a sigmoid function (see equations 2 and 3; cf. Bonabeau et al., 1999; see figure 2). In the case of the dancing probability we use a linear function of the source quality, as suggested by Seeley (1992). The parameter s represents the stimulus strength, θ is the response threshold, and the exponent n models the "steepness" of the curve.

P(s) = s^n / (s^n + θ^n)    (2)

P(s) = 1 − s^n / (s^n + θ^n)    (3)

Beyond these stimulus-response curves, several other threshold curves model the transitions between other behavioral states in our simulation: the dancing probability and the abandon probability are additionally influenced by the source quality and by (adjustable) individual differences among the bees. The probability to change from a naive bee to an interested dance-following bee, the probability to spontaneously start a foraging flight, and the probability to spontaneously start a scouting flight are modeled using similar thresholds in our simulation.
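As a sketch, the two threshold curves of equations 2 and 3, instantiated with the parameter values given in the caption of figure 2b (the function names are ours, not the simulation's):

```python
def p_threshold(s, theta, n):
    """Equation 2: sigmoid rising from 0 to 1; at s = theta, P = 0.5.
    The exponent n sets the steepness of the transition."""
    return s ** n / (s ** n + theta ** n)

def p_threshold_inv(s, theta, n):
    """Equation 3: the complementary curve, falling from 1 to 0."""
    return 1.0 - p_threshold(s, theta, n)

# Parameters from figure 2b; s is the unloading delay a forager experienced
def p_abandon(delay):
    return p_threshold(delay, theta=40, n=5)

def p_dance(delay):
    return p_threshold_inv(delay, theta=10, n=2.5)
```

Short delays thus give a high dancing probability and a negligible abandon probability; long delays reverse this, which is exactly the workload-balancing behavior described in the introduction.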

2.2 Error in communication leads to wrong guiding of bees

In our simulation, the communication of the positions of nectar sources is affected by an adjustable amount of error noise. Therefore recruited bees do not immediately find the correct place, so they often fly specific search patterns (circles). We assume that highly frequented food sources are easier for searching bees to locate than vacant food sources. Therefore, we introduced an environmental factor called "local attraction." This attraction can be seen in figure 3 as a surrounding of the upper food source. Each additional bee on a food source increases this local attraction, which has a guiding effect for searching bees as long as they are within a radius of 3 patches of the source. This corresponds to approx. 7 meters in reality. This local attraction can be interpreted as guidance by sound or by odors. As soon as there are no bees at the source, this attraction field diminishes quickly.

Figure 3: The user interface of the simulation: it can either run inside the NetLogo environment or in a common web browser after publication on a web server. The "world" of the simulation contains the dancing area of the hive (in a zoomed view) and the outside environment (approx. 800 meters outside of the hive). Each simulated bee is always visible on screen; colors code her state and the information she carries. The user can place nectar sources (max. 3 sources) anywhere in the environment (black area), and it is possible to change their concentrations even at runtime. Several intrinsic properties of the bees can be adjusted via the sliders in the lower left part; the time-plots and histograms on the right side provide information about the bees' states and behaviors. They are updated every second, giving a valuable "live insight" into the colony.

2.3 Keeping track of nectar gain and foraging costs

Our special interest lies in the energetic efficiency of the collective foraging process, so we model the energy expenses due to walking and dancing in the hive as well as due to flying with specific weights (bee weight + crop weight). We consider that the nectar offered by the sources, the nectar in the crops, and the honey stored in the colony have different sugar concentrations. Therefore we use the energetic equivalent of 1 mol glucose as a common currency and model all dilution and concentration processes. We also consider that in most cases sucrose solutions (not glucose!) were used in empirical experiments.
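The common-currency idea can be sketched as follows. The constants are our illustration values, not the exact ones used in the simulation: glucose releases roughly 2805 kJ/mol on combustion, and sucrose, a glucose-fructose disaccharide, counts as roughly two glucose equivalents.

```python
E_GLUCOSE_J = 2.805e6  # J per mol glucose (approx. combustion enthalpy)

# Approximate glucose equivalents per mol of sugar (assumed values)
GLUCOSE_EQUIV = {"glucose": 1.0, "sucrose": 2.0}

def load_energy(volume_l, molarity, sugar="sucrose"):
    """Energy content of a nectar load, in J, expressed via the common
    currency 'energetic equivalent of 1 mol glucose'."""
    mol_sugar = volume_l * molarity
    return mol_sugar * GLUCOSE_EQUIV[sugar] * E_GLUCOSE_J

# e.g. a 50-microliter crop load of 2.5M sucrose solution:
crop_joules = load_energy(50e-6, 2.5)  # ≈ 701 J
```

Expressing every load in glucose equivalents lets gains from sources of different sugar type and concentration be added and compared directly against flight and dancing costs.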

2.4 The simulation user interface

See figure 3.

3 Results

We performed successful simulations of several historic empirical experiments done by T. Seeley: selection of optimal foraging targets (Seeley et al., 1991), equilibrium concentrations (Seeley 1994, depicted in figure 3) and cross-inhibition of two groups of nectar foragers (Seeley 1997). Due to space limitations, we only discuss the first experiment in this article. We further explored the meaning of the term "optimal decision" in terms of the net honey intake of the simulated foraging cohort under different foraging conditions. All our simulation experiments were performed with the following basic settings: number-of-foragers = 400; number-of-nectar-receivers = 300; bee-individuality = 0.1; max-searching-time = 40; communication-noise = 0.015; colony-nectar-need = 0.75.

Figure 4: Collective decision of foragers between two different food sources. (a) Empirical data redrawn from Seeley et al. (1991). (b) Results of our multi-agent simulation.

3.1 Simulating foraging decisions between two different nectar sources

In the historic experiment, T. Seeley allowed a colony to choose for four hours (8 a.m. to 12 noon) between two different nectar sources at the same distance from the hive. Both sources contained sucrose solution, source A with a concentration of 2.5M and source B with a concentration of 1.0M. After 4 hours the colony had collectively chosen the better source (A: 2.5M). At noon, the concentrations in the two sources were changed: source A then offered only 0.75M sucrose solution, while source B offered 2.5M solution. After another 4 hours, the colony changed its former collective decision and massively foraged on source B. Figure 4 shows the results of the empirical study (figure 4a) and of our multi-agent model (figure 4b).

3.2 Calculating the benefits and costs of this foraging decision

To calculate the costs of the environmental fluctuation in the experiment above, we performed three different simulation runs and explored the net honey gain over 8 hours in all three cases (see figure 5):

• a) No fluctuation: how much will the net honey gain be if nobody switches the qualities of the two sources (source A and B) at noon?

• b) Decision: how much will the net honey gain be if the qualities of the two sources are switched according to the setup described in subsection 3.1?

• c) No decision: how much will the net honey gain be if the concentration in source A is lowered to 0.75M and source B is removed?

From these results, we interpret the costs of the environmental fluctuation as

cost = a − c    (4)

and the benefit gained from the ability to re-decide after the fluctuations as

benefit = b − c    (5)
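In code, with a, b and c denoting the net honey gains of the three runs above (the example numbers are made up for illustration, not simulation results):

```python
def fluctuation_cost(gain_a, gain_c):
    """Equation 4: what the fluctuation costs the colony if it cannot
    re-decide (a = no fluctuation, c = fluctuation, no re-decision)."""
    return gain_a - gain_c

def decision_benefit(gain_b, gain_c):
    """Equation 5: what the collective re-decision recovers
    (b = fluctuation with re-decision)."""
    return gain_b - gain_c

def compensated_fraction(gain_a, gain_b, gain_c):
    """Share of the potential cost avoided by re-deciding."""
    return decision_benefit(gain_b, gain_c) / fluctuation_cost(gain_a, gain_c)

# Hypothetical gains: a = 300, b = 250, c = 150 (arbitrary units)
share = compensated_fraction(300.0, 250.0, 150.0)  # = 2/3
```

The compensated fraction is the quantity reported in the next paragraph: the closer it is to 1, the more completely the decentralized decision system shields the colony from the fluctuation.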


Figure 5: Net honey gain of 400 simulated foraging bees and 300 simulated nectar-receiving bees. (a) Time pattern of honey gain in all three setups described above. (b) The "no-decision" setup in 3 runs assuming different levels of "colony-nectar-need," which affects the horizontal offset of the linear dancing threshold curves (Seeley, 1994).

We see that, due to the collective decision system, the bees managed to benefit significantly: about 2/3 of the potential costs of the environmental fluctuations could be compensated by the quick adaptation of the collective foraging strategy.

3.3 Is this benefit dependent on the quality difference between the two sources?

In the historic experiment, T. Seeley offered the colony a source with 0.75M and a source with 2.5M sucrose solution. Our next question was: will the dynamics of the decision process be affected by varying concentration differences between the two sources? We performed the following simulation runs: after the treatment at noon, source B contained 2.5M sucrose solution (see experiment 3.1) and source A (the alternative source) offered one of the following concentrations: 0.01M, 0.25M, 0.5M, 0.75M, 1M, 1.25M, 1.5M, 1.75M, 2M, 2.25M, 2.5M sucrose solution. We performed 8 simulation runs for each of these values of source A.

As figure 6 shows, there are two interesting features in the collective decision-making process: if there is a large concentration difference between the two sources (0.01M–0.5M at source A), the decision process is fast and the overall net honey gain is higher. If there is only a slight concentration difference between the two sources (1.75M–2.5M at source A), no decision happens and the colony exploits both (good) sources at a high rate. Only in the middle range (0.75M–1.5M at source A) does the decision take some time; nevertheless, the colony is able to prevent at least half of the potential costs.

3.4 Is this benefit dependent on the frequency of environmental fluctuations?

Not only can the concentrations of the sources vary; the time pattern of these environmental fluctuations can also affect the overall benefit gained from the decision process. To explore this dependency on fluctuation frequency, we exposed our simulated colony to several treatments: no fluctuations, fluctuations every 30 min, every 1 h, every 2 h, every 3 h, and every 4 h. These fluctuations were switches of the sources' sugar concentrations from 0.75M to 2.5M and vice versa.


Figure 6: (a) Time pattern of the net honey gain in simulation runs similar to the setup described in 3.1, using other sugar concentrations for the "worsened" nectar source. N=1 for each concentration. (b) Analysis of the dependence of the net honey gain on the concentration of the "worsened" nectar source. N=8 per concentration setup.

Figure 7: Net honey gain of our simulated colony in an environment with different frequencies of source-quality fluctuations. N=1 per frequency setup.

Figure 7 shows that even in a rapidly changing environment, almost 1/3 of the potential costs could be avoided by the simulated colony. Rapid fluctuations (every 30 min) seem to lead to a kind of parallel exploitation of both sources. This is due to the fact that within these rapid fluctuations there is too little time to massively abandon a food source, so both food sources show a lot of recruitment. But due to the fluctuations, only 1/2 of the foraging flights lead to a target with good source quality, thus lowering the net intake to the level seen in b(30min) in figure 7.

4 Discussion

Our multi-agent simulation of honey bee foraging successfully reproduced historic empirical experiments. Several models in the literature deal with honey bee foraging and workload balancing. The simulation presented in this paper is the first one that combines several aspects into one single multi-agent model: decisions on foraging targets, workload regulation, nectar economics (respectively energy economics), the importance of colony state, and the importance of individual differences. As a side effect, after publication on our web server it will serve the bee research community, so that it can be freely used for education worldwide.

The simulation experiments we describe in this article show impressive properties of the foraging decision process found in honey bees. Under all stress conditions our colony was exposed to, it could efficiently avoid a great fraction of the possible damage, measured as a decrease in honey gain. The collective decision process could correctly (re-)decide for a new and better foraging source and gradually adapted its time pattern to the amplitude of the fluctuation, measured as the difference between the sources (figure 6), as well as to the frequency of the fluctuations (figure 7). This example of swarm intelligence in honey bees provides a valuable source of inspiration, which we currently exploit in a project focusing on swarms of autonomous robots that have to decide on common goals. Our future investigations on this topic will include a more highly structured environment (more sources, more complex sources), a more detailed model of the queuing process, and "better" search patterns of foraging bees.

Acknowledgements

The writing of this article was supported by the "Fonds zur Förderung der Wissenschaftlichen Forschung (FWF)," project no. P15961-B06.

References

Anderson, C., Ratnieks, F.L.W. 1999a. Task Partitioning in Insect Societies. I. Effect of Colony Size on Queueing Delay and Colony Ergonomic Efficiency. The American Naturalist 154: 522–535.

Anderson, C., Ratnieks, F.L.W. 1999b. Worker allocation in insect societies: coordination of nectar foragers and nectar receivers in honey bee (Apis mellifera) colonies. Behavioral Ecology and Sociobiology 46: 73–81.

Bonabeau, E., Dorigo, M., Theraulaz, G. 1999. Division of labor and task allocation. Pages 109–147 in: Swarm intelligence. From natural to artificial systems. Oxford University Press, NY.

Hart, A.G., Ratnieks, F.L.W. 2001. Why do honey bee (Apis mellifera) foragers transfer nectar to several receivers? Information improvement through multiple sampling in a biological system. Behavioral Ecology and Sociobiology 49: 244–250.

Ratnieks, F.L.W., Anderson, C. 1999. Task Partitioning in Insect Societies. II. Use of Queueing Delay Information in Recruitment. The American Naturalist 154: 536–548.

Seeley, T.D. 1989. Social foraging in honey bees: how nectar foragers assess their colony's nutritional status. Behavioral Ecology and Sociobiology 24: 181–199.

Seeley, T.D., Camazine, S., Sneyd, J. 1991. Collective decision-making in honey bees: how colonies choose among nectar sources. Behavioral Ecology and Sociobiology 28: 277–290.

Seeley, T.D. 1992. The tremble dance of the honey bee: message and meanings. Behavioral Ecology and Sociobiology 31: 375–383.

Seeley, T.D. 1994. Honey bee foragers as sensory units of their colonies. Behavioral Ecology and Sociobiology 34: 51–62.

Seeley, T.D., Tovey, C.A. 1994. Why search time to find a food-storer bee accurately indicates the relative rates of nectar collecting and nectar processing in honey bee colonies. Animal Behavior 47: 311–316.

Seeley, T.D. 1997. Wechselseitige Hemmung zwischen Sammlerinnen. Pages 189–192 in: Honigbienen. Im Mikrokosmos des Bienenstockes. Birkhäuser, Basel, Boston, Berlin.


Evolution versus Engineering - The Collective Intelligence of Sorting; Size Matters

Sam Scholes1, Ana B. Sendova-Franks1,2, Chris Melhuish1, Matt Wilson1

1. Intelligent Autonomous Systems Lab, University of the West of England, Coldharbour Lane, Frenchay, Bristol, BS16 1QY, tel: (+44) [0] 117 344 2870

2. School of Mathematical Sciences, University of the West of England, Coldharbour Lane, Frenchay,Bristol, BS16 1QY, U.K. tel: (0117) 344 3161 fax: (0117) 344 2734

Abstract

Collaboration between biologists and roboticists can facilitate the creation of new behavioural algorithms by roboticists and help biologists by exposing the underlying mechanisms that allow the algorithms to function (for a review see Webb, 2000). This paper makes a direct comparison between robot annular sorting (Wilson et al., 2003) and brood sorting in the ant Leptothorax albipennis (Franks & Sendova-Franks, 1992). We compared the ants' and robots' patterns in terms of shape, compactness, completeness and separation. We conclude that the size of the area available for sorting is critical to the quality of the pattern that is created. The principles we have uncovered are fundamental to the construction of the pattern and apply equally to ants and robots.

Keywords: Leptothorax, sorting behaviour, ant robot comparison

1 Introduction

Engineers have often modelled biological organisms in order to solve complex problems. The modelling of entire systems of organisms such as ants has given us insights into how simple, local interactions can produce complicated and even intelligent behaviour. The Nouvelle AI approach is concerned with building systems that use local interactions with the environment to study self-organised behaviour in simulations and robots. In turn, the rules that simple autonomous robot systems follow give biologists clues as to what to look for in the interactions of the organisms they study. This two-way flow of ideas and concepts between engineering and biology has the potential to accelerate the advancement of biology and robotics and to allow us to ask fundamental questions about the foundations of intelligence. Although a huge number of studies have been inspired by biology, few have tried to reproduce a biological system with robots that are physically comparable and follow the same behavioural rules as the biology (Webb, 1998). This study attempts to compare a biological system and the robots that have been inspired by it. It underlines the limitations of robot architecture and asks whether modelling a single aspect of a system, such as behaviour, can really be achieved when the agent displaying the behaviour does not have comparable sensory or locomotive abilities. Lastly, it demonstrates the power of the environment to alter the product of an algorithm in both biological and biologically inspired systems.

Leptothorax albipennis ants have probably been sorting their brood into concentric annuli for millennia. Much more recently, humans have emulated their behaviour and created robots that can sort pucks (Frisbees) into a similar pattern. This begs the question: which is better at sorting, ants or robots employing minimalist mechanisms? Although the task is the same for both participants, there are some fundamental differences between ants and robots. Firstly, there is a difference in physical design. This may seem obvious, but in terms of peripheral systems and sensory apparatus, the ants are world leaders of the animal kingdom. It is currently not known exactly how the ants perceive their brood, but there is evidence to suggest that it is partly through touch and partly through olfaction. The robots cannot compete with receptor sensitivity on a par with the best analytical chemistry instruments used in science. Instead, each robot is equipped with two infrared sensors, fixed three inches apart, mounted on the underside of the robot. When the robot moves over a puck and picks it up, the sensors each send a beam of light at the puck. One sensor measures the amplitude of reflected light at the centre of the puck and the other on the periphery. The robots can tell the difference between black, white, and grey pucks. The position of the two sensors also allows the robot to tell if a puck is a different shade in the centre than on the periphery.

Each u-bot has four infrared sensors mounted around the sides of its casing, enabling it to see white objects in front, diagonally in front on both sides, and behind. After a collision, which depresses the front scoop and stops the robot, these sensors tell the robot if it has hit a wall. Mounted on the back of the robot, 4 cm above floor level, is an opto-sensor that detects if the robot is about to reverse into a puck.

The robots do, however, have two advantages over the ants. Firstly, they have been designed specifically to do the task of sorting. The ants have been selected for by a huge diversity of selection pressures, some relevant and some completely irrelevant to the sorting task. This means that at any one time the robots will be sorting, while the ants could be performing one of hundreds of different behaviours. The second advantage the robots have over the ants is that, because they only have one task, their behaviour can be optimised for that task. In fact, within the task of sorting, the robots can be further optimised to produce well-separated or highly compacted structures. The ants, on the other hand, cannot have truly optimised behaviour, because an optimised behaviour cannot respond to a change in selection pressure; animals that are over-specialised quickly become extinct as their environment changes. The robot experiments used 6 robots and 45 pucks, divided into 15 pucks in each of three shades. To emulate this, the ants were given three distinct types of brood: eggs, small larvae and large larvae. They naturally position these brood types in size order, with the smallest eggs in the centre, the small larvae in the middle and the largest larvae on the periphery. The robots attempt to sort the pucks into the same order, with the white pucks in the centre, surrounded by a band of black pucks, with a further band of polo pucks around the periphery of the structure.

2 Methods

We will compare the patterns of ants and robots over three conditions, namely three sizes of arena. In each case, the shape of the arena was octagonal, as used in Wilson (2002). The largest arena is equal to one standard robot arena, 68.76 m², or 1760 times the size of one robot. The ants' arena was at the same ratio, 5280 mm², or 1760 times the size of one ant. The second arena was half the area of the largest, and the third arena was half the area of the second. The actual arena sizes are given in Table 1. It was not possible to control for the difference between ant brood area and robot puck area: the robots can only manipulate pucks of a standard size and shape, and we wanted to base the arena sizes on the ratio of robot area to arena area. We did control for the difference in puck and brood sizes when analysing the data. By halving the arena area each time, we hoped to explore the effects of bulldozing in the robots in a controlled manner. A later study will optimise the area of the pucks, the number of robots and the sorting area.

Table 1: Arena areas for each condition

         Arena 1 area (m²)   Arena 2 area (m²)   Arena 3 area (m²)
Ant      0.001320            0.002640            0.005280
Robot    17.16               34.38               68.76
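The arena series can be reconstructed from the fixed arena-to-agent ratio of 1760:1 and the repeated halving described above (the function is our sketch; the ant footprint of 3 mm² is implied by 5280/1760):

```python
RATIO = 1760  # arena area : agent footprint, fixed by the largest condition

def arena_series(agent_area, n=3):
    """Largest arena = RATIO * agent footprint; each further arena
    halves the area. Returned largest first."""
    return [RATIO * agent_area / 2 ** k for k in range(n)]

ant_arenas_mm2 = arena_series(3.0)            # [5280.0, 2640.0, 1320.0]
robot_arenas_m2 = arena_series(68.76 / RATIO)
```

This reproduces the ant row of Table 1 (in mm² rather than m²) and shows how the scaling keeps the agent-to-arena ratio comparable across the two very different physical scales.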

Each of the ant experiments was replicated three times, using two colonies in each condition to decrease the chance of a colony being uncooperative. New brood and ants from the same colony were used in each of the conditions, and a mean metric result was generated from the three replicates. Each of the experiments used 6 ants, three taken from the centre of the colony and three from the periphery, as the jobs a worker is likely to do are location-specific (Sendova-Franks, 1995). Central workers are likely to spend most of their time sorting and caring for the brood and the queen, while peripheral workers are predominantly foragers. We decided that an even mix gave the most representative sample. Forty-five brood items, split into three types of 15 items, were presented to the ants in an equally spaced series: type 4, type 6, type 8, where type 4 are the smallest, central items and type 6 are larger and form a band around the central cluster. Type 8 are the largest items and are placed in a band on the periphery of the structure. The items were laid in a grid within the octagonal arena, with the corners of the square at the same distance to the arena walls as the surrounding items, until it was impossible to do so whilst still following the 4, 6, 8 series. This pattern mimicked the start pattern of the pucks in Wilson et al.'s (2003) paper, a uniform grid pattern.

The robot experiments used 6 robots and 45 pucks split into three categories of 15: black, white and polo (black centre with white surround, or white centre with black surround). The pucks were laid out in a grid with each puck type in turn placed at equal distances from each other. The grid was centred in the arena, with the 4 corner pucks equally distant from the next pucks and the arena wall. Each puck was placed in the series white, black, polo to copy the series in the ant experiments. Each robot condition consisted of five trials.

Figure 1: a) Original radial displacement of the mean of 50 structures created by robots in the large arena. The central pucks are type 1, the first band of items is type 2 and the peripheral band is type 3. b) Radial displacement of brood types in the large arena. Type 4 brood are central, surrounded by a band of type 6 and a more peripheral band of type 8 brood.

3 Results

Each result is considered separately, starting with the radial displacements from structures in each arena, before comparing the ants with the robots in each arena area using Wilson et al.'s (2002) metrics of separation, completeness, compactness and shape.

3.1 Radial displacement

Each box-plot shows the median (central line), inter-quartile range (box), and highest and lowest values of the distance of each brood type to the centroid. The overall shape of each figure allows comparison between the robot arena and the ant arena. In the robot experiments, type 1 items form the centre of the structure, type 2 should have a higher median radial displacement than type 1, and type 3 should be further from the centroid than type 2. In the ant experiments, type 4 should be at the centre of the structure, surrounded by a ring of type 6 and a larger ring of type 8 items. In each case, the median gives a good representation of the common radius for each puck type. The more separated the medians and the less spread the inter-quartile ranges, the better sorted the structure is.
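The box-plot quantities can be computed directly from item coordinates; a minimal sketch (the coordinate format and function names are ours, not from the original analysis):

```python
from statistics import quantiles

def radial_displacements(items):
    """items: list of (x, y, item_type) tuples. Returns, per type, the
    distances of each item to the centroid of the whole structure."""
    cx = sum(x for x, _, _ in items) / len(items)
    cy = sum(y for _, y, _ in items) / len(items)
    by_type = {}
    for x, y, t in items:
        by_type.setdefault(t, []).append(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5)
    return by_type

def box_stats(dists):
    """Median and inter-quartile range, as plotted in the box-plots."""
    q1, q2, q3 = quantiles(dists, n=4)
    return q2, q3 - q1
```

A structure counts as well sorted when the medians of successive types increase and their inter-quartile ranges barely overlap, which is exactly the reading applied to figures 1–3 below.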

— Page 155 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

Figure 2: a) Radial displacement of the pattern produced by robots in the medium arena. The items are well separated and the robots have produced a well-sorted structure. b) Radial displacement of brood types within the structure created by ants in the medium arena. Type 4 brood are at the centre, surrounded by type 6 and 8.

Figure 1a shows results from an experiment by Wilson et al. (2002), using an arena area of 68.76 m². The displacement shows a tight cluster of type 1 items surrounded very closely by type 2 items. The type 3 items are much more separated, which shows a good sort. The corresponding ant arena (fig. 1b) was 5280 mm² and exhibits a huge spread of type 4 brood. The type 6 and type 8 brood are also massively spread, although not as much as the type 4 brood. This pattern suggests that there are multiple clusters of type 4 brood that have formed several competing structures. Reference to the original data before pooling (not shown) shows this is the case in two of the three trials.

Figure 2 shows the radial displacement of structures in the medium arena. The medium arena was half the size of the large arena. The robot experiments were based on five trials and the ant experiments consisted of three trials. The robots (fig. 2a) have produced structures with a cluster of type 1 pucks (white) surrounded by a band of type 2 pucks (black). There is some overlap between these pucks and the band of type 3 (polo) pucks around the outside of the structure. However, the medians in each case do not overlap, which shows that the robots performed very well in this arena. The ants (fig. 2b) have also created a cluster of type 4 brood in the centre of the pattern. This is surrounded by a ring of type 6 brood, which do overlap somewhat with the cluster. Type 8 brood are placed outside the central cluster

Figure 3: a) Radial displacement of the pattern produced by robots in the small arena. The structure is not sorted because the median distances to the centroid have barely one puck diameter between them. b) Radial displacement of structures produced by ants in the small arena. The structure is well sorted and has a tight central cluster of type 4 brood.


but within the spread of the type 6 brood. There is a different story in the small arena. The robots (fig. 3a) have been unable to sort due to lack of space. Each puck is 30 cm across and, as the differences between the medians of each puck type are less than 30 cm, the pattern must be a random jumble of pucks. There were five robot trials and each created the same type of structure. The ants (fig. 3b) show a very well sorted structure. The eggs (type 4 brood) are clustered in the centre and are surrounded by a band of type 6 brood. These are surrounded by a band of type 8 brood. The medians of each type show that there is a good distinction between the bands.

Figure 4: Separation of the structures produced by ants and robots in the three arena sizes.

Figure 5: Completeness of the structures in each arena size. The medium arena has twice the area of the small arena and the large arena has twice the area of the medium arena.

3.2 Other metrics

3.2.1 Separation

In the medium and large arenas, the robots separated out the pucks better than the ants (see fig. 4). There was a trend towards better separation as the arena became larger, but the size probably had little effect once the arena was big enough not to constrain the robots' ability to pull back pucks. In the smallest arena, the robots were unable to complete the task because the pullback distances were greater than the radius of the arena. A later experiment, with pullback distances revised to keep the same ratio of pullback distance to arena radius as in the large arena, returned a similar result. The ants formed structures that were better separated in the small arena, which suggests that arena size became a limiting factor as the area increased. We believe that the medium and large arenas were treated as open spaces by the ants, and as such they only sorted properly in the small arena.

3.2.2 Completeness

The robots showed a trend towards better completeness as the arena size increased (fig. 5). The ants scored better than the robots in the small and medium arenas but worse in the large arena. The robots may have been better able to reach items all around the periphery in the large arena due to the distance they rebounded when contacting the wall. In a larger arena, they had a better chance of rebounding at an angle that moved them around the edge of the pattern. The ants showed a slight trend towards worse completeness as the arena area increased. This is because they tended to place their brood against a wall, and the longer the wall, the thinner the line of brood became.


Figure 6: Compactness of the structures produced by ants and robots in each of the arena areas. The medium arena has half the area of the large arena and the small arena has half the area of the medium arena.

Figure 7: The shape of the structures created by the ants compared with the shape of the structures created by the robots in each arena. The large arena is four times the size of the small arena. Shape measures how circular the structure is.

3.2.3 Compactness

The ants consistently obtained higher compactness scores than the robots (fig. 6), indicating that they were clustering their brood into a tight pattern. All of the brood items were found and clustered, even in the large arena. The robots achieved similar compactness scores in both the small and medium sized arenas. In the small arena, the pucks were clustered into one or sometimes two groups, but the placement of pucks within the group was essentially random.

3.2.4 Shape

The shape of the ants' pattern was essentially the same in the medium and large arenas (fig. 7), but was much better in the small arena. The small arena was the only one where the brood were not placed predominantly against the arena walls. The robots' score increased dramatically with the size of the arena. As the arena area increased, so did the space between the pucks, allowing the robots to move about more freely and create a more circular pattern.

4 Discussion

4.1 Radial displacement

These results show that the smallest area in which the robots can sort lies between the small and large arena sizes (figs. 1-3). The area of the pucks is just as important as the area of each robot and the arena.

The robot pullback distances were optimised by Wilson (2003). He found that by changing the pullback distances he could make the robots better at separating the puck types (with larger pullback distances) or at producing more compact structures (with shorter distances). In this study, we found that changing the pullback distances in the small arena did not improve the radial displacement of the pucks. This suggests that there is a predictable relationship between the area of the pucks and the area in which they can be sorted.


4.2 Separation

In the medium and large arenas, the area was too big for the ants to be comfortable and they reacted by piling their brood in one corner, which resulted in a poor separation score (fig. 4). The ants were responding to what they perceived as an open environment, even though the nest roof was within reach. In the small arena, the ants were able to separate out their brood to a much greater extent. The ants need to leave a space around each brood item to facilitate feeding and grooming (Sendova-Franks, 1992), which suggests that the workers were not caring for the brood properly in the large arena. Contrary to this, the robots' worst separation was in the small arena. In this environment, they simply couldn't sort the pucks because there was not enough room to pull them back if there were other pucks or robots in the way. The robots did cluster the pucks, although not through any intended method: the outermost pucks in the cluster were picked up but almost never dropped again. As the arena area increased, the robots were able to separate the puck types better and the task became easier as the thresholds of error increased. The opposite effects of increasing the arena size on ants and robots highlight how different the reasons and methods behind sorting really are in the two participants. The smallest arena area in which the ants will sort is being studied at the moment.

4.3 Completeness

The ants obtained reasonable completeness scores (fig. 5). A perfect pattern would have had each brood type on its own common radius, with the items radially positioned one item every 24° (with 15 items). In reality, because the brood were placed in a line against a wall and often all on one side of the centroid, each item was often separated from the next by 0°. Also, the items were of different shapes, so the positioning of each one could depend on the orientation of the items. There is no reason why evolution would favour a more complete structure for the ants. This metric was developed using the robots and is biased towards the design brief given to them; even so, the ants scored more highly in both the small and medium arenas. The ants have a different set of criteria, which we currently do not know, for building the structures they build. These criteria will almost certainly include behaviours that we do not expect. The robots managed a better completeness score in the large arena. This is most likely because they were better able to get all around the structure to move items. When a robot hits a wall it makes a random turn and rebounds a certain distance, as specified in its algorithm. A robot is more likely to turn through an angle that takes it around the edge of the pattern in a larger arena. A larger number of robots, all starting at different points around the structure, would increase the completeness of the structure, although each one needs a minimum amount of space to operate correctly. This shows the limitations of the robot morphology when trying to get around the arena. The robots also move in straight lines, which can be very time consuming. A random walk algorithm will be tested at a later date.

4.4 Compactness

We found that the ants produced much more compact structures than the robots (see fig. 6). The most likely explanation for this was that the ants were trying to cluster their brood in a single part of the space available to them. Leptothorax albipennis live in tiny cracks in sedimentary rocks where they can be packed in with almost no space to move. If the only space available to them is bigger than they would prefer, they will build a defensive wall around the perimeter of their brood. In this case, they could not produce a defensive wall and probably perceived even the smallest arena as unsafe, as it was far bigger than a nest that they would have chosen. We intend to continue the experiments using smaller nests to find the lower limit of area that ants can sort in. The robots showed a very similar result to the ants, but produced a less compact pattern. The pullback distances can be changed to produce a more compact structure or a more separated structure. The robots cannot determine their own pullback distances from the area of the arena, which is something that the ants may be able to do. Again, this is something we intend to test in a future experiment.

4.5 Shape

The shape of the ants' structure got worse as the area increased (fig. 7), probably because the larger arenas were too big. The robots were substantially better able to place object types on a common radius as the arena size increased. This is probably due to the locomotive abilities of the robots: as the arena size increased, so did the spaces between items. The larger this gap, the less likely a robot is to displace an item from the common radius of that item type. The more items that are on a common radius, the better the shape of the structure. It would be interesting to further increase the area of the robot arena and see if there is a further increase in the shape metric.

5 Conclusions

The opposite effects of arena area on the ants and the robots are very obvious when compared like this. It is likely that the ants are not sorting their brood into a circular shape in this situation because of predation and possibly other selection pressures. This highlights that the ants are not just trying to sort their brood for ease of feeding (O'Toole et al., 2003) but are also trying to achieve other goals. The robots, however, have a single aim and, in the unlikely event of predation, would not be able to respond to the new selection pressure. The number of situations that a robot would need to be able to respond to is a matter of debate.

This paper has studied the differences and similarities between a biological organism and the robotics that has been inspired by it. We have found that, overall, the robots are better at the task they have been set, but we have questioned the extraction of that task from a biological source in the first place. We have also shown that, for the task of sorting, the environment is a critical factor in producing a well-sorted structure for both ants and robots. In particular, a sorting area that is too small for the robots, or too large for the ants, produces badly sorted structures.

References

Franks, N. & Sendova-Franks, A.B. 1992. Brood sorting by ants: distributing the workload over the work-surface. Behav Ecol Sociobiol 30: 109-123.

Melhuish, M., Holland, O. & Hoddell, S. 1998. Collective sorting and segregation in robots with minimal sensing. 5th International Conference on Simulation of Adaptive Behaviour, Zurich.

O'Toole, D.V., Robinson, P.A. & Myerscough, M.R. 2003. Self-organised criticality in ant brood tending. J. Theor. Biol. 221: 1-14.

Sendova-Franks, A. & Franks, N. 1995. Division of labour in a crisis: task allocation during colony emigration in the ant Leptothorax unifasciatus (Latr.). Behav Ecol Sociobiol 36: 269-282.

Webb, B. 1998. Robots, crickets and ants: models of neural control of chemotaxis and phonotaxis. Neural Networks 11: 1479-1496.

Webb, B. 2000. What does robotics offer animal behaviour? Animal Behaviour 60: 545–558

Wilson, M., Melhuish, C. & Sendova-Franks, A. 2002. Creating Annular Structures Inspired by Ant Colony Behaviour using Minimalist Robots. Autonomous Robots (Special Issue on Swarm Robotics): in press.

Wilson, M. 2003. Unpublished thesis on collective sorting in minimalist robots.


Niche Selection for Foraging Tasks in Multi-Robot Teams Using Reinforcement Learning

Patrick Ulam, Tucker Balch

College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: {pulam,tucker}@cc.gatech.edu

Abstract

We present a means by which individual members of a multi-robot team may allocate themselves into specialist and generalist niches in a multi-foraging task where there may exist a cost for generalist strategies. Through the use of reinforcement learning, we show that the members can allocate themselves into effective distributions consistent with those distributions predicted by optimal foraging theory. These distributions are established without prior knowledge of the environment, without direct communication between team members, and with minimal state.

Keywords: Multi-Robot Systems, Reinforcement Learning, Foraging, Optimal Foraging

1 Introduction and Motivation

Foraging tasks are a standard testbed for multi-robot research, partly due to their strong biological analogs as well as their applicability to a large number of tasks ranging from sample collection to mine disposal. In multi-robot foraging tasks, the robots composing the team search an area for objects to collect (attractors) and return the attractors found to a goal location. Multi-foraging is a variant of the typical foraging task in that, instead of a single type of attractor, there exist multiple differing types. A significant amount of research has been conducted in the area of multi-robot foraging. This research includes, but is not limited to, the effects of communication on multi-robot foraging (Balch & Arkin, 1994), interference patterns in multi-robot foraging tasks (Goldberg & Mataric, 1997), and the dynamics of collective sorting in a foraging task (Deneubourg et al., 1990). Of particular interest is Balch's work concerning the diversity of multi-robot teams that learn to perform a multi-foraging task (Balch, 1998). In this work, Balch found that the team of robots did not learn to specialize in the foraging task even though multiple types of attractors existed. In fact, using the social entropy metric developed in his thesis, he found that team diversity and performance were negatively correlated. This paper focuses on this result and attempts to answer why the robots that learned the foraging task did not specialize, and what could cause a multi-foraging team to specialize. To address these questions we look to the optimal foraging theory literature to provide insight into models of natural organisms' foraging behavior and the parameters that may result in specialized as well as generalist foraging behavior.

1.1 Optimal Foraging Theory

Optimal foraging theory, which is used by behavioral ecologists to model the foraging behavior of organisms ranging from birds (Krebs et al., 1997) and mantids (Charnov, 1976) to bees (Real, 1991), has looked extensively at the problem of finding the most efficient means by which an organism may forage. Optimal foraging theory operates under the assumption that evolution has adapted the foraging behavior of organisms to maximize certain factors while minimizing others as a means of increasing their reproductive fitness. The usual interpretation of these optimization factors includes the maximization of caloric intake and the minimization of other factors such as predation risk or energy expenditure.


Research in this area has produced numerous models to describe this behavior. Most of these models utilize some combination of four factors: a fitness set for the foraging activities, activity selection, negative density dependence, and variable environments (Wilson & Yoshimura, 1994). The fitness set captures the intuitive notion that specialized foragers are usually more effective than generalized foragers. Activity selection allows the organism to change its foraging behavior between differing prey types or differing environments. Negative density dependence captures the notion that foraging decisions are made in the context of the number of other organisms already performing a particular foraging action. Lastly, temporally varying environments are used as a means of modeling seasonal variations or other changing factors in the environment that may cause different distributions of foragers.

MacArthur and Pianka, in their seminal paper on optimal foraging theory (MacArthur & Pianka, 1966), developed a model to determine the most efficient means by which an organism can forage in a patchy environment. Of particular interest to this work is the portion of the model that describes the conditions under which, of two competing foraging species, one a specialist and one a generalist, the specialist will be overrun by the generalist. They describe the net intake of food for a specialist forager as kDH, where k represents the foraging rate, D the density of food items, and H the time spent foraging. For the generalist forager the net intake is equal to k′DH′, where k′ < k represents the trade-off between generalist and specialist strategies and H < H′ represents the reduced search time incurred by the generalist. This defines the parameter range in which specialist foragers can be expected to intermingle with generalist foragers, namely while

H/H′ > k′/k.    (1)
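Equation (1) is easy to evaluate directly. The sketch below checks the coexistence condition; the parameter values in the usage example are illustrative, not taken from the paper.

```python
def specialists_persist(h_spec, h_gen, k_spec, k_gen):
    """MacArthur & Pianka's condition (eq. 1): specialist foragers can be
    expected to intermingle with generalists while H/H' > k'/k."""
    return h_spec / h_gen > k_gen / k_spec

# Generalists forage at 80% of the specialist rate (k' = 0.8k) but spend
# only 10% longer foraging (H' = 1.1H): 1/1.1 > 0.8, so specialists persist.
specialists_persist(1.0, 1.1, 1.0, 0.8)
```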

Another model of interest is that proposed by Wilson and Yoshimura concerning the coexistence of specialist and generalist foragers (Wilson & Yoshimura, 1994). In their model they define fitness levels for organisms across different environments through the use of a carrying capacity K. This carrying capacity is defined on a per-species and per-environment basis, such that Ki,j represents the carrying capacity of species i in habitat j. The carrying capacity is used to represent specialists and generalists through the use of constants a and b. Thus the carrying capacities of the different species in a particular environment can be expressed as

K1,1 = K1,   K2,1 = aK1,   K3,1 = bK1.    (2)

These relationships between the carrying capacities of the different species are used to determine the individual fitness of a species as

Wi,j = e^(r (1 − (N1,j + N2,j + N3,j)/Ki,j)),    (3)

where r is a constant rate of increase for all species and Ni,j is the number of species i in habitat j. By iterating this fitness value along with the current population of a species in that habitat using a standard discrete-time population model,

Ni,j,t+1 = Ni,j,t · Wi,j,t,    (4)

they are able to predict the number of each of the three species that will be present in each habitat upon stabilization of the system.
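Iterating equations (3) and (4) until the populations stabilise can be sketched as follows; the populations, carrying capacities, and growth rate below are hypothetical values chosen for illustration, not parameters from the paper.

```python
import math

def step(populations, capacities, r=0.1):
    """One update of eqs. (3)-(4) for a single habitat:
    W_i = exp(r * (1 - sum(N) / K_i)),  N_i(t+1) = N_i(t) * W_i(t)."""
    total = sum(populations)
    return [n * math.exp(r * (1.0 - total / k))
            for n, k in zip(populations, capacities)]

# Two species sharing one habitat; the species with the larger carrying
# capacity drives the other out as the system stabilises.
N, K = [1.0, 1.0], [100.0, 60.0]
for _ in range(5000):
    N = step(N, K)
# N now approaches [100.0, 0.0]: the habitat fills to the larger capacity.
```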

1.2 Reinforcement Learning

This insight of treating foraging as an optimization process leads us to utilize a common optimization technique in robotics, namely reinforcement learning. Reinforcement learning is a machine learning technique in which an agent learns, through trial and error, to maximize the rewards received


for taking particular actions in particular states over an extended period of time. More precisely, given a set of environmental states S and a set of agent actions A, the agent learns a policy, π, which maps the current state of the world s ∈ S to an action a ∈ A, such that the sum of the reinforcement signals r is maximized over a period of time.

There are a number of techniques for maximizing this reinforcement signal, including but not limited to Q-learning and the adaptive heuristic critic algorithm (Kaelbling et al., 1996). For our experiments, however, we chose a relatively simple method to calculate the value of taking a given action in a given state, namely calculating the average reward over state-action pairs. This average can be calculated using

Q(s, a) = (N(s, a) Q(s, a) + r + max_a′ Q*(s, a′)) / (N(s, a) + 2),    (5)

where r is the reward received for taking the action, max_a′ Q*(s, a′) is the reward that would be received by taking the optimal action afterwards, and N(s, a) is the number of times the robot has taken action a in state s. By choosing the action with the highest Q-value, while allowing the robot to choose a random action with a given probability, the robot can explore the state space and converge upon the action with the greatest average reward. For a more detailed discussion of reinforcement learning refer to (Sutton & Barto, 1998) and (Kaelbling et al., 1996).
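A compact sketch of this learning rule follows. The single-state task, action names, and reward values in the usage example are illustrative assumptions of ours, not the paper's experimental setup.

```python
import random
from collections import defaultdict

Q = defaultdict(float)   # estimated value of each (state, action) pair
N = defaultdict(int)     # visit counts N(s, a)

def update_q(s, a, r, actions):
    """Averaging update of eq. (5):
    Q(s,a) <- (N(s,a)*Q(s,a) + r + max_a' Q(s,a')) / (N(s,a) + 2)."""
    best = max(Q[(s, a2)] for a2 in actions)
    Q[(s, a)] = (N[(s, a)] * Q[(s, a)] + r + best) / (N[(s, a)] + 2)
    N[(s, a)] += 1

def choose_action(s, actions, epsilon=0.1):
    """Epsilon-greedy selection: usually the highest-valued action, but
    occasionally a random one so the action space keeps being explored."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

# One state, three foraging behaviors; 'general' happens to pay best here.
actions = ["red", "blue", "general"]
for _ in range(100):
    for a, r in [("red", 0.4), ("blue", 0.4), ("general", 1.0)]:
        update_q("s0", a, r, actions)
```

With exploration switched off (epsilon = 0), the learned policy picks the consistently best-rewarded action.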

2 Related Work

A large body of research has looked at using reinforcement learning as a means of guiding multi-robot teams in foraging tasks. Mataric has analyzed the performance of such foraging robots using reinforcement learning (Mataric, 1997). Balch has measured the behavioral diversity of teams that have learned foraging tasks (Balch, 1998). A significant amount of research has also addressed the division of labor in foraging tasks, both in the context of multi-robot teams and of social insects. Jones and Mataric have looked at means of using limited sensory history to estimate the proper division of labor in a foraging task (Jones & Mataric, 2003). Martinson and Arkin investigated the utility of reinforcement learning as a means of guiding role switching in a military scenario involving foraging robots, soldier robots, and mechanics (Martinson & Arkin, 2003). Additional division of labor models have been proposed in the context of social insects. Bonabeau et al. developed a division of labor model for social insects utilizing response thresholds (Bonabeau et al., 1996). Theraulaz et al. extended this model to allow for threshold adjustment via a reinforcement process (Theraulaz et al., 1998).

3 Method

Using the Teambots robot simulation environment, five different worlds were created with 5 percent random obstacle coverage and 40 randomly distributed attractors colored blue and red. Eleven variations of these worlds were generated by varying the proportion of red attractors present, such that the number of red attractors ranged from 0 to 20. Four simulated Nomad 150 robots were placed into each world. Each robot can execute one of three foraging strategies: a specialist red-attractor foraging behavior, a specialist blue-attractor foraging behavior, and a generalist foraging behavior in which the robot will collect either type of attractor. Each robot's controller is designed using the Clay architecture (Balch, 1998) of Teambots, which allows for the creation of motor-schema-based control systems. We use the reinforcement learning algorithm described previously to enable each robot to learn which of the three foraging strategies to use. Each robot has only one state and in that state can choose from three actions corresponding to the three foraging behaviors described above.


[Figure 1: three surface plots, (a)-(c), each showing the number of foragers (0-5) as a function of cg (0-1.2) and the number of red attractors (0-20).]

Figure 1: Distributions of four robots with no search cost, as defined by the cost of generalization, cg, and the number of red attractors out of 40.

In order to capture the notion of the fitness set in optimal foraging theory, we utilize a scalar cg in the reward function of the general forager. This scalar can vary from 1, indicating there is no cost to being a general forager, to 0, which indicates that general foraging is ineffectual. This reward model is further expanded to include the notion of search cost, as MacArthur and Pianka's model indicates that this may play an important role in the specialization of natural organisms' foraging behaviors. Thus, for each timestep spent searching for attractors, the robot receives a penalty of -1, indicating a significant search cost, or 0, indicating there is no significant cost to searching. The reward function can thus be depicted as:

R(t) =
    1,        if an attractor is returned using a specialist behavior at time t − 1;
    1 · cg,   if an attractor is returned using the generalist behavior at time t − 1;
    −1 or 0,  if the robot does not return an attractor at time t − 1.

The cost scalar was varied from 0 to 1 in increments of 0.2 and the search cost was varied between 0 and -1. Three hundred trials were run in each configuration.
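The reward rule above can be sketched directly; the function and argument names below are ours, introduced for illustration, not identifiers from the paper's implementation.

```python
def reward(returned_attractor, behavior, cg, search_cost=0):
    """Reward for the previous time step, following the rule above.

    returned_attractor: whether an attractor was delivered at time t-1
    behavior:           'specialist' or 'generalist'
    cg:                 cost-of-generalization scalar in [0, 1]
    search_cost:        -1 when searching is penalised, 0 when it is free
    """
    if not returned_attractor:
        return search_cost
    return 1.0 if behavior == "specialist" else 1.0 * cg

# The experiments swept cg over {0.0, 0.2, ..., 1.0} with search cost 0 or -1.
```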

4 Results

The resulting behavior selection of the individual robots was measured. Figure 1 depicts the results of the experiments when the foraging robot does not take the cost of the actual foraging into account. Figures 1a, 1b, and 1c show the number of robots that perform the red foraging, blue foraging, and general foraging strategies, respectively, for a given configuration of general-foraging reinforcement level and attractor distribution. Figure 2 shows the same trials run with the addition of a penalty for each time step spent searching for attractors. Figures 2a, 2b, and 2c depict the resulting behavioral distribution for each of the three strategies with this additional negative reinforcement in place.

5 Discussion

A baseline case occurs when there is no tradeoff between general and specialized foraging strategies. In the trials run, this parameterization occurs when the reward level for attractors returned using any strategy is 1. Both in the trials where there is a cost associated with searching and in those where there is not, a homogeneous team of general-purpose foragers results, as predicted by Balch's work on the diversity of multi-foraging teams. For the lower red attractor distributions, the general-purpose foragers intermingle with blue foraging specialists, as there are too few red attractors to mandate a fully homogeneous generalist team.


[Figure 2: three surface plots, (a)-(c), each showing the number of foragers (0-5) as a function of cg (0-1.2) and the number of red attractors (0-20).]

Figure 2: Distributions of four robots with significant search cost, as defined by the cost of generalization, cg, and the number of red attractors out of 40.

Further analysis is possible by looking at some of the model's predictions concerning the proper distribution of specialists and generalists. In particular, we can look at MacArthur and Pianka's predictions concerning the critical points at which generalist foragers will overrun specialist foragers. The parameters in our simulation can readily be mapped to the parameters in MacArthur and Pianka's model. The reinforcement levels for the generalist and specialist strategies times the reward r can be mapped to k′ and k respectively. The parameters for the time spent foraging, H and H′, can be mapped to the simulation by noting that the foraging time for each strategy will be proportional to the number of attractors available to be collected via that strategy. Hence, the time spent foraging for the specialist will be proportional to max(Ared, Ablue)/(Ared + Ablue) and the time spent foraging for the generalist will be proportional to (Ared + Ablue)/(Ared + Ablue), where Ared and Ablue represent the number of red and blue attractors respectively.

By placing these values into equation 1, the critical point at which the specialists are overrun by generalists can be calculated as

    (h · max(Ared, Ablue)/(Ared + Ablue)) / (h · (Ared + Ablue)/(Ared + Ablue)) = r·cg / r,

which simplifies to

    max(Ared, Ablue)/(Ared + Ablue) = cg,    (6)

where r is the reward for returning an attractor under the specialist strategies and h represents a constant handling time for attractor collection.
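Equation (6) yields a direct numeric prediction. A minimal check, using attractor counts from the sweep described in the Method section:

```python
def generalists_take_over(a_red, a_blue, cg):
    """Eq. (6): a homogeneous generalist team is predicted once
    max(A_red, A_blue) / (A_red + A_blue) <= cg."""
    return max(a_red, a_blue) / (a_red + a_blue) <= cg

# With 40 attractors and cg = 0.6, the predicted switch is at A_red = 16,
# where the majority type makes up exactly 60% of the attractors.
generalists_take_over(16, 24, 0.6)   # 24/40 = 0.60 <= cg
generalists_take_over(10, 30, 0.6)   # 30/40 = 0.75 >  cg: specialists persist
```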

Figure 3 shows a plot of max(Ared, Ablue)/(Ared + Ablue) together with three different reward multiples for the generalist foraging strategy, where the reward level was multiplied by 0.4, 0.6, and 0.8. The intersections between the reinforcement-factor lines and the maximum proportion of the attractors available to the specialist depict the critical points at which the team should converge upon a homogeneous generalist strategy. The data presented in figure 2c show that the team did in fact converge on the homogeneous strategy, but slightly later than predicted by the optimal foraging model. The trials in which cg = 0.6 converged to a homogeneous team at Ared = 18 as opposed to the predicted Ared = 16. At cg = 0.8 the convergence did not occur until Ared = 12. In both our simulation runs and MacArthur and Pianka's model, a homogeneous team of foragers does not emerge for cg ≤ 0.4.

We can do a similar comparison to Wilson and Yoshimura's model described previously. We assign the carrying capacities of each species as


[Figure 3 plot: max(Red Attractors, Blue Attractors)/Total Attractors (0-1.4) against max(Red Attractors, Blue Attractors) (20-40), with the 0.4, 0.6 and 0.8 reinforcement levels overlaid.]

Figure 3: Critical points for the transition to homogeneous generalist teams via MacArther and Pianka’s model.

Figure 4: Allocations for four robots using Wilson and Yoshimura's model. [Three surface plots of forager counts (0–5) against cg (0–1.2) and number of attractors (0–20): (a) red foragers, (b) blue foragers, (c) red foragers.]

Kred,red = Nred,   Kblue,red = aNred,   Kgen,red = bNred,
Kblue,blue = Nblue,   Kred,blue = aNblue,   Kgen,blue = bNblue,    (7)

with a = 0 and b = 1 − c, utilize equations 3 and 4 to determine the stable configuration, and then normalize the results for four agents. The resulting configuration space is shown in figure 4. As can be readily seen, the results are strikingly similar to the results of the simulation foraging runs in which search cost was not considered a significant portion of the reward function. The results from our reinforcement learning allocation produced slightly sharper curves at the data points with low values of cg and a low proportion of red attractors when compared with Wilson and Yoshimura's predictions. Also, the bifurcation that occurs when cg = 0.6, with the appearance of generalist foragers and the disappearance of red foragers, is not as pronounced in our simulation results. The generalist foragers do begin to emerge and specialist red foragers begin to disappear, but not as drastically as the optimal foraging model would predict.

6 Conclusions and Future Work

Reinforcement learning appears to be an effective means for individual robots to learn foraging strategies in environments where multiple types of attractors exist and the relative effectiveness of generalized and specialized foraging strategies may be variable. Using the method described in this paper, the robots were able to achieve effective distributions in unknown environments without the use of direct communication and with the use of minimal state. By modeling the trade-off between the effectiveness of general and


specialized foraging strategies via a reward function, the individual robots were able to learn strategies resulting in team compositions consistent with the foraging distributions predicted by Wilson and Yoshimura's model. When search cost became the defining factor in the foraging behavior, the distributions converged closely to homogeneous generalist teams at the points predicted by MacArthur and Pianka. Balch's work concerning diversity in multi-robot teams that learn the foraging task has been shown to be consistent with both the optimal foraging models and the simulation trials described in this paper in which there was no cost to performing generalist foraging. While we have looked at the extreme parameterizations of search cost in our simulations, it may prove fruitful to investigate the effect of more moderate search costs on niche selection for a foraging task. Additional investigation into the scalability of the method described in this paper over additional foraging behaviors as well as attractor types may also prove interesting.

References

Balch, T. 1998. Behavioral diversity in learning robot teams. Ph.D. thesis, College of Computing, Georgia Institute of Technology.

Balch, T. & Arkin, R. 1994. Communication in reactive multiagent robotic systems. Autonomous Robots 1(1).

Bonabeau, E., Theraulaz, G. & Deneubourg, J.L. 1996. Quantitative study of the fixed threshold model for the regulation of division of labour in insect societies. Proceedings Roy. Soc. London B 263: 1565–1569.

Charnov, E. 1976. Optimal foraging: Attack strategy of a mantid. The American Naturalist 110: 141–151.

Deneubourg, J.L., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C. & Chretien, L. 1990. The dynamics of collective sorting: robot-like ants and ant-like robots. in: Meyer, J.A. & Wilson, S.W. eds. From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, Cambridge, MA, MIT Press. pp. 356–365.

Goldberg, D. & Mataric, M. 1997. Interference as a tool for designing and evaluating multi-robot controllers. in: AAAI/IAAI, Providence, RI, pp. 637–642.

Jones, C. & Mataric, M. 2003. Adaptive division of labor in large-scale minimalist multi-robot systems. in: IEEE/RSJ International Conference on Robotics and Intelligent Systems (IROS), Las Vegas, Nevada.

Kaelbling, L.P., Littman, M.L. & Moore, A.W. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4: 237–285.

Krebs, J., Erichsen, J., Webber, M. & Charnov, E. 1977. Optimal prey selection in the Great Tit (Parus major). Animal Behaviour 25: 30–38.

MacArthur, R. & Pianka, E. 1966. On Optimal Use of a Patchy Environment. The American Naturalist 100: 603–609.

Martinson, E. & Arkin, R. 2003. Learning to role-switch in multi-robot systems. in: IEEE International Conference on Robotics and Automation (ICRA), Taipei, Taiwan.

Mataric, M. 1997. Reinforcement learning in the multi-robot domain. Autonomous Robots 4: 73–83.

Real, L. 1991. Animal choice behavior and the evolution of cognitive architecture. Science 243: 980–986.

Sutton, R.S. & Barto, A.G. 1998. Reinforcement learning: An introduction. MIT Press, Cambridge, MA.

Theraulaz, G., Bonabeau, E. & Deneubourg, J.L. 1998. Threshold reinforcement and the regulation of division of labour in insect societies. Proceedings Roy. Soc. London B 265: 327–335.

Wilson, D.S. & Yoshimura, J. 1994. On the coexistence of specialists and generalists. The American Naturalist 144: 692–707.


Cost/Benefit: Information Dissemination in Distributed Systems

Ashish Umre, Ian Wakeman

Network Research Lab, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton, BN1 9QH, U.K. Corresponding author: [email protected]

Abstract

The sharing and collective processing of information by individuals in any social system is an attempt to reduce the uncertainty associated with key features of their environments by collecting and storing information. By sampling each of its options regularly, an individual gains from being able to exploit them when they are productive and avoid them otherwise (Dall and Johnstone, 2002). In this way, collection of information can be thought of as a solution to the uncertainty problem that maximises potential opportunities (Stephens, 1989; Mangel, 1990). However, doing so may entail certain costs, since it consumes valuable resources, including time, energy and attention. In this paper, we explore the costs/benefits of cooperation within the domain of distributed systems, where biologically inspired agents communicate and cooperate with each other using the environment to disseminate information about resources/services. The simulation does not have a real situated environment; instead the agents encounter the information through process and resource generator programs that generate processes and resources stochastically. In the sections that follow, we briefly describe the theory of cooperation, social foraging theory, the simulation model and some interesting experiments that we carried out to understand/analyse the dynamics of social foraging in stochastic environments.

keywords: Information Dissemination, Social Foraging Theory, Collective Behaviour, Stigmergy, and Cooperation.

1 Biological Basis and Theory of Cooperation

To account for the manifest existence of cooperation and related group behaviour, such as altruism and restraint in competition, evolutionary theory has acquired two kinds of extension: genetic kinship theory and reciprocity theory. If the agents are sufficiently closely related, altruism can benefit reproduction of the set, despite losses to the individual altruist. The evolution of the suicidal barbed sting of the honeybee worker could be taken as a paradigm for this line of theory (Hamilton, 1972).

Many of the benefits sought by living things are disproportionately available to cooperating populations. The problem lies in the fact that while an individual can benefit from mutual cooperation, each can also do even better by exploiting the cooperative efforts of others. Over a period of time, the same individuals may interact again, allowing for more complex patterns of strategic interaction. Dugatkin (1998) argues that there are at least three ways that cooperation can evolve among unrelated individuals: reciprocity, group selection, and by-product mutualism; kin selection is a fourth candidate.

2 On the other hand, how beneficial is cooperation, really?

The acquisition and use of socially acquired information is commonly assumed to be profitable. But there could be scenarios where the use of such information either provides no benefit or actually incurs a cost. It is suggested (Giraldeau et al., 2002) that the level of incompatibility between the acquisition of personal and socially acquired information will directly affect the extent of profitability of the information, when these


two sources of information cannot be acquired simultaneously because of cognitive or physical constraints. Also, a solitary individual's behavioural decisions will be based on cues revealed by its own interactions with the environment.

However, in many cases, for social animals the only socially acquired information available is the behavioural actions of others, which expose their decisions rather than the cues on which those decisions were based. In such a situation it is thought that the use of socially acquired information can lead to informational cascades that sometimes result in sub-optimal behaviour.

In our experiments, we look for results that suggest the presence of information cascades in the context of information sharing/cooperation in distributed systems. We design agents that rely on both individual foraging and shared information, agents that rely only on shared information, and solitary foragers. Ongoing studies are focused on understanding whether this might happen in a highly dynamic environment, where there are constant changes in the flow of information about resources/services that demand frequent updates.

2.1 Cost for everything

In any social group, individuals possess various behaviours that define the assortment of interactions at all sorts of levels: individuals, groups, cliques, teams, etc. Social foraging theory suggests that the functional consequence of an individual's foraging behaviour depends on both the individual's own actions and the behaviour of other foragers. There may be conflicts of interest between signallers and receivers. Where such a conflict exists, the receiver's need to acquire information may favour sensitivity to the cues provided by the behaviour and appearance of the signaller. In turn, this sensitivity may give rise to opportunities for manipulation and exploitation by the signaller.

It is understood that exploitative strategies are unlikely to persist in the long run, because they generate selection for a change in receiver responses. However, it is argued that the evolution of exploitation may prove a recurrent, though transient, phenomenon.

There are costs associated with broadcasting information publicly, as exemplified by the production of "food vocalisations" in many social animals. The issues that arise in this context are the danger of predation, and the fact that mass recruitment to a less profitable resource may lead to starvation (Real and Caraco, 1986). This is equivalent to the "Slashdot" effect that the Internet sometimes experiences.

Other costs within the context of a social system are the cost of misinformation (lying), the cost of accessing/using resources and the cost of signalling/cooperation. We use foraging games to analyse the economics of kleptoparasitic⁹ behaviour (Hamilton, 2001), to predict the ecological circumstances under which the behaviour is maintained. We modelled the cost function for an individual as a function of the delay in information transfer/acquisition. Other costs are expressed as survival rate: if an agent keeps failing or delaying to locate resources for the requested processes/services it gets penalised, and if this penalty rises above a threshold, the agent dies and a new agent replaces it.
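Read concretely, the penalty scheme just described could look like the following sketch; the class, the unit cost, and the death threshold are our illustrative assumptions, not values from the paper.

```python
class Agent:
    """Minimal sketch of the delay-cost and survival-penalty rules."""

    def __init__(self, death_threshold=5.0):
        self.penalty = 0.0
        self.death_threshold = death_threshold

    def cost(self, delay, unit_cost=1.0):
        # Cost of serving one process, modelled as a function of the delay
        # in information transfer/acquisition.
        return unit_cost * delay

    def record_outcome(self, located, delay):
        # Failing (or being slow) to locate resources accrues penalty.
        if not located:
            self.penalty += self.cost(delay)
        # False means the agent "dies" and is replaced by a fresh agent.
        return self.penalty <= self.death_threshold

agent = Agent()
alive = agent.record_outcome(located=False, delay=2.0)  # penalty 2.0, survives
```

Repeated failures accumulate until the threshold is crossed, at which point the caller would replace the agent with a new one.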

3 Simulation Model

We implement a discrete-event simulation of cooperative agents, which share information (through the environment, i.e. stigmergy) about the location of resources/services for the completion of requests generated by users. A process generator generates processes/requests according to a Poisson distribution. Processes that enter the system join a queue, where they wait to be served by agents. As soon as an agent becomes free from its previous task, it takes up the next request in the queue. A resource generator generates a random number of required resources for the successful execution/completion of a process/request. A random number of agents are launched, initially set randomly to a number less than or equal to the number of requested resources for the first process that enters the system. When an agent encounters some information about

⁹ Kleptoparasitism refers to all forms of exploitation of others' food discoveries or captures. It constitutes the information-sharing models in the Social Foraging Theory paradigm.


Figure 1: Schematic representation of the information dissemination system

a resource/service, it probabilistically stores the information in its resource vector and/or publishes the information onto a “HotSpot.”

If the resource encountered is one of the requested resources for the process the agent is trying to serve, it locks the resource and marks the resource entry, under the specific process, in the target vector (which contains the list of processes waiting to be finished and the status of the resources they have requested). Once all the required resources/services have been located, the process is executed. The agent can only lock the resource for a fixed time, after which it must rejoin the queue. Once locked, the resource diminishes by a certain amount due to its consumption.
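The request flow described above (Poisson arrivals, a FIFO queue of pending processes, a random number of required resources per process) can be sketched as follows; the function name, arrival rate, and resource counts are illustrative assumptions rather than the authors' actual parameters.

```python
import random
from collections import deque

random.seed(42)  # reproducible illustration

def generate_process(pid, now, arrival_rate=1.0, n_resource_types=20):
    """Process generator: Poisson arrivals via exponential inter-arrival
    times; a resource generator draws the set of required resources."""
    arrival = now + random.expovariate(arrival_rate)
    required = random.sample(range(n_resource_types), k=random.randint(1, 5))
    return arrival, {"id": pid, "required": required}

queue = deque()  # processes wait here until a free agent takes them up
now = 0.0
for pid in range(5):
    now, proc = generate_process(pid, now)
    queue.append(proc)

next_request = queue.popleft()  # a freed agent serves the oldest request
```

Exponential inter-arrival times are the standard way to realise a Poisson arrival process in a discrete-event simulation.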

We have considered the resource handling time to be negligible and the process execution time to be a random factor. Other agents looking for the same resource can access the HotSpot and share information through it. The HotSpot contains information about resources and their locations. Each resource/service published at the HotSpot has a reinforcement value (similar to a pheromone deposit) associated with it, which signifies the demand for the resource. Every time an agent accesses resource information at the HotSpot, it reinforces the pheromone deposit so that the resource path continues to exist, whereas if the reinforcement value drops below a certain level, the entry gets overwritten by the first new resource/service that joins the system. Hence, the table is constantly updated with the latest information about resource/service paths.
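The HotSpot's pheromone-style bookkeeping might be sketched like this; the class name, deposit, evaporation rate, and floor are our assumptions, since the paper gives no concrete values.

```python
class HotSpot:
    """Shared table of resource -> [location, reinforcement value]."""

    def __init__(self, deposit=1.0, evaporation=0.1, floor=0.2):
        self.table = {}
        self.deposit = deposit          # pheromone added on publish/access
        self.evaporation = evaporation  # decay per time step
        self.floor = floor              # weaker entries may be overwritten

    def publish(self, resource, location):
        # A new entry overwrites the first entry whose reinforcement has
        # dropped below the floor: stale paths make room for fresh ones.
        stale = next((r for r, (_, v) in self.table.items() if v < self.floor), None)
        if stale is not None:
            del self.table[stale]
        self.table[resource] = [location, self.deposit]

    def access(self, resource):
        # Reading an entry reinforces its deposit, so in-demand paths persist.
        entry = self.table.get(resource)
        if entry is None:
            return None
        entry[1] += self.deposit
        return entry[0]

    def step(self):
        # Evaporation: every deposit decays each time step.
        for entry in self.table.values():
            entry[1] -= self.evaporation

hs = HotSpot()
hs.publish("r12", location=(3, 7))
hs.access("r12")  # reinforcement grows with demand
```

The reinforce-on-read plus decay-per-step combination is what keeps the table converged on currently demanded resource paths.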

The agents possess a simple learning mechanism that enables them to adapt to the stochastic environment. Each agent records information about the degree of cooperation and its search efficiency for the prior process it executed or failed. Its cooperative strategy (the probability of publishing/sharing information) is updated accordingly in the next step so as to apply a better information-sharing strategy for the next process. This is, more or less, the agent's equivalent of a Nash equilibrium¹⁰.
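The paper does not specify the exact update rule; one minimal form it could take (our illustration only) nudges the publishing probability up when sharing paid off on the prior process and down when it did not, clamped to [0, 1].

```python
def update_publish_probability(p, search_succeeded, step=0.05):
    """Illustrative cooperative-strategy update: adjust the probability of
    publishing/sharing information based on the prior process outcome."""
    p = p + step if search_succeeded else p - step
    return min(1.0, max(0.0, p))  # keep the probability in [0, 1]

p = 0.5
p = update_publish_probability(p, search_succeeded=True)   # raised
p = update_publish_probability(p, search_succeeded=False)  # lowered again
```

Any rule of this shape gives the agent a one-number memory of how well cooperation has been working, which matches the "minimal state" spirit of the design.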

The level of cooperation can be tuned with a probability parameter (the probability with which an agent publishes/shares information) defined in the agent. Agents do not share information directly with each other, although this scenario will be modelled into the simulation for future experiments based on trust and security.

3.1 Results and Analysis

We analyse some aspects of artificial and biological social systems, such as the optimal number of agents in the system (Pacala et al., 1996), the throughput of the system, and the degree of cooperation (which can depend on

¹⁰ A Nash equilibrium is a combination of strategies for the players of a game such that each player's strategy is a best response to the other players' strategies. A best response is a strategy which maximises a player's expected payoff against a fixed combination of strategies played by the others.


Figure 2: Optimal size of agent population.

an implicit factor of relatedness). We also demonstrate the use of Nash equilibrium to show the “tragedy of the commons” in certain situations, both in the simulations and in real life, e.g. the Slashdot effect: how a certain resource/service becomes over-exploited because it is over-publicised, which may lead to its exhaustion/starvation.

There is a similarity with Caraco's food-calling game (Giraldeau and Caraco, 2000): agents individually look for resources/services and, on finding one, decide whether or not to publish it. According to Caraco's model, if they decide against publishing the information, they are more susceptible to predation.

3.1.1 How many agents make a working whole?

In general, we observe a peaked fitness function (Clark and Mangel, 1986) when we analyse the system as a collection of agents trying to maximise throughput and minimise the delay in acquiring information. The peaked function in figure 2 illustrates the existence of a single optimal agent population size for which the throughput of the system is maximal, given that certain other parameters in the simulation, such as the number of resources, remain fixed.

This suggests that initially an increase in the agent population is beneficial for throughput, but the throughput peaks at a certain population size, implying that there are then enough agents to process requests for resources/services; any further increase will result in delays due to queuing for resources/services. We think that this can change if the number of resources/services is abundant or increasing, in which case the graph will be an increasing function.

Figure 3: Average time taken to establish the resource/service path.


Figure 4: Demonstration of the Slashdot effect at a resource r12 and the corresponding drop in throughput for processes requiring that resource/service.

3.1.2 Throughput of the System

The time taken to find all the resources for a request can vary depending on the number of resources required. Therefore, we calculated the average time taken to find the various resources over a series of runs and accumulated the data for all possible numbers of resources in the system. We were interested in the trend in time taken (or hops) to locate all those resources. As seen in figure 3, there is an increasing trend with respect to the number of hops: as the number of required resources increases, it takes more time to find them. However, the trend suggests that there could be a decrease later on, as the agents develop an optimal response for each request as the number of resources increases.

3.1.3 Slashdot effect/Kleptoparasitic behaviour

The Slashdot effect occurs when popular data becomes less accessible because of the load of requests on a central server. Figure 4 demonstrates the percentage increase in the number of agents in the queue for a resource, e.g. resource r12 in this figure. The figure also displays the corresponding decline in throughput for processes requiring the service r12.

This implies that a popular request for a service can lead to it being highly advertised or “vocalised,” resulting in the depletion and decreased performance of the service. Therefore, unless there is a way to adapt to this phenomenon, the services will continue to fail or perform sub-optimally. Future work is aimed at studying the possibility of introducing service replication in the locality of the current service. This would distribute the load of the service and help process more requests. It would also handle, to a certain extent, the dynamic nature of the system, wherein services can fail.

Kleptoparasitic behaviour is observed when an agent frequently refers to the environment for information regarding resources/services instead of foraging itself, and no change is observed in its cooperative strategy. This implies that the agent is satisfied getting most of its information from other agents that have published/shared it, and does not gather information itself.

4 Discussion/Conclusions

Our experiments explore various cooperative/competitive strategies that encompass most aspects of social behaviour, including mixed-strategy models showing the possibility of freeloading or lying. Ongoing implementations


include scenarios such as modelling trust in the system, altruism, and misinformation/malicious agents. The aim is to show how information-sharing game-theoretic models can make novel, quantitative, and testable predictions concerning social foraging theory within the application domain of distributed systems, e.g. P2P networks.

The experiments reveal some interesting dynamics of the system with respect to the information dissemination algorithm. Our main objective is to keep the agent imperceptible and its behaviour very simple. Ongoing work is aimed at modelling a trust factor within the system, and perhaps at reducing the number of malicious agents. We are in the process of developing formalisations for the current algorithmic approach, so as to carry out a detailed mathematical analysis of the underlying theory.

This work takes its inspiration from work done in the field of swarm intelligence. Multi-agent systems can have agents with complex sets of behaviours/rules of interaction modelled into them. There are situations where these work fine, but scaling up is always an issue. P2P networks depend on the flow of information within the network and on techniques for locating services efficiently. Our study hopefully gives insights into certain kinds of behaviour persistent in the system which bear some resemblance to biological social systems, in areas such as foraging, danger of predation, and sharing information regarding food/nest sites.

The simulation model discussed should eventually help us understand some of the contexts in which cooperation emerges, whether and to what extent it is beneficial, and how the various costs can be minimised/optimised by implementing dynamic strategies.

Acknowledgements

This research is supported by the Future Technologies Group and the Programmable Networks Lab at the British Telecom Research Labs, Adastral Park, UK.

References

Dall, S.R.X. & Johnstone, R.A. 2002. Managing uncertainty: Information and insurance under the risk of starvation. Phil. Trans. Royal Society London.

Dugatkin, L. 1998. Game Theory and Animal Behaviour. Oxford University Press.

Giraldeau, L-A. & Caraco, T. 2000. Social Foraging Theory. Princeton University Press.

Giraldeau, L-A., Valone, T. & Templeton, J.J. 2002. Potential disadvantages of using socially acquired information. Phil. Trans. of the Royal Society London.

Hamilton, W. 1972. Altruism and related phenomena, mainly in social insects. Annual Review of Ecology and Systematics 3: 193–232.

Hamilton, W. 2001. Kleptoparasitism and the distribution of unequal competitors. Behavioural Ecology 13(2): 260–267.

Mangel, M. 1990. Dynamic information in uncertain and changing worlds. Journal of Theoretical Biology 146: 317–332.

Pacala, S.W., Gordon, D.M. & Godfray, H.C.J. 1996. Effects of social group size on information transfer and task allocation. Evolutionary Ecology 10: 127–165.

Real, L. & Caraco, T. 1986. Risk and foraging in stochastic environments. Annual Review Ecol. Syst. 17: 371–390.

Stephens, D.W. 1989. Variance and the value of information. American Naturalist 134: 128–140.


POSTERS


Teamwork in Animals, Robots, and Humans

Carl Anderson1, Nigel R. Franks2

1. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA. Current address: Icosystem Corporation, 10 Fawcett St., Cambridge, MA 02138, USA. E-mail: [email protected]

2. School of Biological Sciences, University of Bristol, Bristol, BS8 1UG, UK. E-mail: Nigel.Franks@bristol.ac.uk

Abstract

Teamwork is common in our own social interactions, but it is not restricted to humans. Animals, from ants to whales, may also work in teams. Robots too, when part of certain multi-robot systems, may use teamwork. However, do we really have the same notion of a team in each of these cases? Do the same definitions, concepts, and issues apply across these three seemingly disparate types of agents: animals, robots and humans?

Building upon earlier work (Anderson & Franks, 2001; Anderson & McMillan, 2003), we use both conceptual and experimental studies, as well as important illustrative examples across several fields, to argue that a natural, conceptual unification of teamwork in social systems is indeed possible (Anderson & Franks, 2004). In other words, we demonstrate that a single, generic definition of teamwork (a task that necessarily requires multiple individuals to perform different subtasks simultaneously) applies in vastly different social systems, and we thus suggest that teamwork is a fundamental aspect of cooperative activity in highly social systems. We consider how one might rigorously and objectively distinguish between teamwork and other closely related phenomena, such as groupwork. Our pandisciplinary perspective also helps identify a number of common misconceptions about teamwork. That is, researchers in one field, based on the examples they usually encounter, make certain claims about teams which, when we compare teams across fields, are not true generally. Thus, by discrediting some of these claims with revealing examples from other fields, we attempt to draw out some of the truly generic features of teams. We propose that these misconceptions are that: 1) groupwork is synonymous with teamwork; 2) teamwork requires interindividual differences; 3) teamwork requires individual differences; 4) some tasks are inherently team tasks; 5) efficient teamwork requires direct communication; 6) teams require a leader; and 7) team members need to know the state and goals of other members. Interestingly, team size, the number of members that constitute a team, is consistently low, even among social organizations whose size (total number of members) differs by many orders of magnitude. We consider why this is so. In summary, we hope that our work may form the basis of a common framework which can be used to study teamwork objectively, independent of the particular form and nature of a team's members.

References

Anderson, C. & Franks, N. R. (2001). Teams in animal societies. Behavioral Ecology 12: 534–540.

Anderson, C. & Franks, N. R. (2004). Teamwork in Animals, Robots and Humans. Advances in the Study of Behavior 33: in press.

Anderson, C. & McMillan, E. (2003). Of Ants and Men: Self-Organized Teams in Human and Insect Societies. Emergence 5(2): 29–41.


Regulation of activity patterns in a social caterpillar

Emma Despland

Department of Biology, Concordia University, Montreal, H4B 1R6, Canada. E-mail: [email protected]

Abstract

Sociality among insects is not limited to the eusocial Hymenoptera; it also occurs in several other orders and is widespread among larval Lepidoptera. Social behaviour, defined as reciprocal, cooperative communication between individuals, occurs in 300 species of caterpillars spread across 27 families. The forest tent caterpillar (Malacosoma disstria) exhibits some of the most complex social behaviour found in the Lepidoptera, including trail following and synchronised activity. Forest tent caterpillars are nomadic foragers that build silk mats as temporary bivouacs between feeding sites. Colonies alternate between periods of quiescence and activity, and movement between feeding and resting sites occurs as a group. Locomotion is facilitated by a sophisticated system of trail-following and chemical communication: moving caterpillars spin a fine thread of silk from their mouthparts and mark it with a highly stable, non-volatile sterol pheromone. Social foraging of caterpillar colonies differs from that of the eusocial Hymenoptera on one key point: caterpillars forage to meet their individual nutritional requirements rather than to feed the entire colony. A caterpillar's pattern of foraging is therefore expected to depend not only on social cues from colony-mates but also on its own internal hunger state. We provide empirical evidence to suggest that the interaction between internal and external regulation of behaviour generates patterns of cohesive group movement whose frequency depends on environmental conditions. First, we assessed the role of social communication in regulating individual locomotion: we used Markov chain analysis to investigate the effects of pheromone trails and colony-mates on an individual's tendency to switch between behavioural states. Trails increase the probability that an individual begins motion and favour directional walking over searching behaviour. However, an active individual in contact with quiescent colony-mates returns to quiescence if the group does not join in activity. Social cues could thus generate group synchronisation. Second, we examined the foraging schedules of groups of forest tent caterpillars. The frequency of alternation between bouts of activity and quiescence increased at higher temperature, due to a decrease in the duration of resting bouts. Caterpillars are ectothermic; therefore, as external temperature rises, their metabolic rate increases and they process their food more rapidly. The slower foraging schedule observed at lower temperature would thus reflect the fact that these insects require a longer quiescent interval following a meal to complete digestion prior to feeding again. In conclusion, a forest tent caterpillar colony is made up of loosely coupled individuals, each with its own complex internal rhythms. Our data suggest that social communication between these individuals could synchronise the alternation of foraging and resting bouts among colony members, and that the schedule of group-level synchronised foraging depends on individual physiological processes. We demonstrate some of the rules underlying individual caterpillar behaviour; the way in which these rules produce adaptive group-level foraging patterns remains to be explored.


Foraging success, recruitment benefits and spatial resource distribution

Anna Dornhaus1, Franziska Klugl2, Christoph Oechslein2, Lars Chittka3, Frank Puppe2

1. School of Biological Sciences, University of Bristol, Bristol, BS8 1UG, England. Corresponding author: [email protected]

2. Department of Artificial Intelligence and Applied Computing, University of Wurzburg, 97074 Wurzburg, Germany

3. School of Biological Sciences, Queen Mary College, London, E1 4NS, England

Abstract

Some social insects employ sophisticated strategies to recruit nestmates to food sources, while other species seem to lack such communication systems. This could be due to different selection pressures acting on different species, with some species standing to gain more from recruitment. But which social or environmental factors determine how much a social insect colony benefits? Experiments have shown that the benefits of recruitment may strongly depend on the colony's environment. Using an individual-based model of honey bee foraging, we examined the influence of spatial resource distribution and colony size on colony-level foraging success and on the benefits conveyed by communication. Our aim is to show that the experimental results can be explained by the differences in resource distribution in the different habitats of bees. We have therefore attempted to arrive at quantitatively realistic results by using parameter values taken directly from experiments and performing extensive sensitivity analyses. Honey bees use the waggle dance to indicate the location of food to others, but besides such recruitment, scouting and selective abandoning also influence the pattern of food sources exploited. These processes are incorporated in our model. The model was implemented using the multi-agent simulation environment SeSAm, which enables a modeler to combine explicit behavior descriptions for simulated “agents” with stochastic simulation. “Agents” can make flexible and context-dependent decisions, which are specified using rules. The formalization of inhomogeneous environmental features and dynamics additionally leads to rich and detailed models. This can lead to difficulties in estimating parameter values and makes it necessary to test the sensitivity of results to small parameter changes. Here, SeSAm supports transparent model implementation.

We performed simulations with honey bee-type recruitment and compared the resulting colony-level foraging success with simulations where bees were not able to communicate food locations. We found that foraging success was influenced much more by the quality of the available resources than by their number. The main factor influencing the importance of recruitment was resource patchiness, with environments with few, high-quality resources conveying the highest benefits to a colony able to recruit. Colony size had only a linear influence on foraging success, i.e. the energy return per bee was the same for all tested colony sizes in our simulations. There was also no significant interaction effect of colony size and ability to recruit on the energy return per bee. It thus seems that resource availability and distribution are crucial for the evolution of recruitment behaviors, whereas colony size does not seem to be a determining factor.

— Page 177 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

Modeling Disease Resistance through Social Interactions in Termites

Nina Fefferman1, Rebeca Rosengaus2, Daniel Calleri3, Marcio Pie3, James Traniello3

1. Biology Dept., Tufts University, Medford, MA, 02155. Corresponding author: [email protected]

2. Biology Dept., Northeastern University, Boston, MA, 02115

3. Biology Dept., Boston University, Boston, MA, 02215

Abstract

We employ a cellular automata model of interaction within a simplified termite nest to study the impact of different defense mechanisms on pathogen transmission and the outbreak of infection. We allow a simple set of logical rules to govern insect movement and survival based on termite stage of development and individual health. Our model assumes a two-dimensional, circular nest structure, with reproductives and first and second instar larvae at the center; older instars are allowed to move at random. We examine the disease-resistance consequences of various spatial arrangements, colony demographics, instar-dependent susceptibilities and social behaviors to understand their influence on disease outbreak following initial exposure.

By incorporating a probabilistic transmission of disease through contact either with other diseased individuals or a direct primary source of exposure, we project levels of colony infection under a variety of different circumstances to understand colony-level disease resistance. We examine a number of scenarios of infection spread, including constant, low levels of pathogen exposure at sites within the nest and periodic higher levels of exposure followed by a prolonged absence of sources of primary infection. In this way we study the impact on infection control of 1) the demographic distribution of the colony; 2) the spatial distribution and density of termites within the nest; 3) pre-existing and disease-responsive levels of immunity within the colony; 4) the cannibalism of infected individuals; and 5) the modified mobility of exposed termites. We explore the interplay among these social and physiological mechanisms of resistance to determine the optimal set of pathogen defenses. We identify the critical thresholds for both immunity and disease density that are responsible for colony survival.
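The contact-transmission rule can be sketched as a cellular-automaton update. The state encoding, neighbourhood, and transmission probability below are illustrative assumptions, not the model's actual parameters:

```python
import random

# Sketch of one synchronous contact-transmission step on a 2D nest grid.
# States (assumed encoding): 0 = healthy, 1 = infected, 2 = immune.
P_TRANSMIT = 0.3  # per-contact transmission probability (hypothetical value)

def step(grid, rand=random.random):
    """Healthy cells may catch the disease from infected orthogonal
    neighbours; infected and immune cells are left unchanged."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue  # only healthy termites can become infected
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    if rand() < P_TRANSMIT:
                        new[r][c] = 1
                        break
    return new
```

Iterating `step` from an initial exposure pattern gives the kind of outbreak trajectories the abstract describes; mechanisms such as cannibalism or modified mobility would be added as further rules on the same grid.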

Additionally, we employ these methods to study the dynamics of socially transmitted immunity. By controlling the ability of immune individuals to induce immunity in naive individuals in the absence of direct contact with either an infected individual or an external source of pathogen, we compare the outcome structure of the models with empirical data. We explore the survival benefits of different ratios of immunized to naive individuals that may influence the social transfer of immunity.

Initial findings suggest a varied importance of different factors depending on initial colony conditions, but the system invariably stabilizes to particular, sustainable colony population scenarios. The benefits of the early cannibalism of infected individuals seem stronger than other mechanisms of resistance in isolation, while particular demographies appear to be beneficial at later stages in a ‘single onset followed by steady exposure’ scenario.

Although we used the dampwood termite Zootermopsis angusticollis and the fungus Metarhizium anisopliae as a model host/pathogen system to inform our simulations, we believe that our approach provides general insight into the mechanisms and efficacy of disease resistance in social insects. In this way, we attempt to understand how colony-level disease resistance is related to individual physiological processes and behavior.


Using Normalized Mutual Entropy to Quantify Division of Labor

Root Gorelick1, Susan M. Bertram2, Peter R. Killeen3, Jennifer H. Fewell2

1. School of Life Sciences, Arizona State University, Tempe, AZ, 85287, USA. Corresponding author: [email protected]

2. School of Life Sciences, Arizona State University, Tempe, AZ, 85287, USA

3. Department of Psychology, Arizona State University, Tempe, AZ, 85287, USA

Abstract

Division of labor is one of the primary adaptations of sociality and the focus of much theoretical work on self-organization. This work has been hampered by lack of a quantitative measure of division of labor that can be applied across systems. A division of labor statistic should have the following three properties. (i) It should permit comparison of division of labor statistics across data sets with different numbers of monitored individuals or tasks. (ii) It should reflect the extent to which each individual is a specialist. (iii) It should reflect the extent to which each task requires specialized performance (the distribution of individuals across tasks).

We characterize division of labor (DOL) by a set of individuals, labeled indiv, a set of tasks, labeled tasks, and a bivariate probability distribution over these two sets. Create a data matrix in which each entry represents the time an individual spends on a task. Normalize the data matrix by dividing each entry by the total time spent by all individuals on all tasks. Calculate a division of labor statistic that is a function of the normalized matrix. Let H(indiv) and H(tasks) be Shannon’s index, H(X) = −∑_{x∈X} p(x) log p(x), over the marginal distributions of individuals and tasks. Let I(indiv, tasks) be the mutual entropy between individuals and tasks, which is computed over all cells in the matrix. Then DOL = I(indiv, tasks) / H(indiv), which asymptotically has an F-distribution.
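Under the definitions above, the statistic is straightforward to compute from a time-allocation matrix. The following sketch is our own illustration (using natural logarithms) and shows the two limiting cases:

```python
import math

def dol(time_matrix):
    """DOL = I(indiv, tasks) / H(indiv), where time_matrix[i][j]
    is the time individual i spends on task j."""
    total = sum(sum(row) for row in time_matrix)
    p = [[t / total for t in row] for row in time_matrix]  # joint distribution
    p_ind = [sum(row) for row in p]                        # marginal: individuals
    p_task = [sum(col) for col in zip(*p)]                 # marginal: tasks

    h_ind = -sum(q * math.log(q) for q in p_ind if q > 0)  # Shannon index H(indiv)
    mutual = sum(                                          # mutual entropy I(indiv, tasks)
        p[i][j] * math.log(p[i][j] / (p_ind[i] * p_task[j]))
        for i in range(len(p)) for j in range(len(p[0]))
        if p[i][j] > 0
    )
    return mutual / h_ind

# Perfect specialists: each individual performs exactly one task, so DOL = 1.
specialists = [[5, 0], [0, 5]]
# Pure generalists: everyone splits time evenly, so I = 0 and DOL = 0.
generalists = [[5, 5], [5, 5]]
```

Because both I and H are invariant to the number of rows and columns after normalization, the statistic supports the cross-system comparisons described below.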

DOL allows for phylogenetic comparison of division of labor across diverse systems and contexts, even ranging across colonies of dozens to millions of individuals or across taxa whose last common ancestor lived tens or hundreds of millions of years ago.

DOL can be used to test hypothesized increases in division of labor over the ontogeny of a social group. A majority of social insect colonies begin as a single or a few reproductive individuals. Self-organizational theory predicts that division of labor should increase with group size. An ontogenetic time history of DOL also provides a measure of when the colony has reached steady state.


Evolving Swarm Intelligence Solutions for the Foraging Problem

Justin Hayes

Computer Science Department, George Mason University, Fairfax, VA 22204, USA. E-mail: [email protected]

Abstract

Swarm intelligence is a phenomenon in which many unsophisticated agents interact locally with their environment to produce global patterns of collective, emergent behavior. Stigmergy is a form of implicit communication via alteration of the shared environment. Designing mobile robotics systems that incorporate these two concepts results in colonies of simple, cheap robots that can be used to solve complex problems such as foraging and terrain coverage in a robust, distributed manner. Using this design strategy can be difficult, however, because it is often not intuitively clear how to program emergence or swarm intelligence. This presentation describes a system for evolving mobile robotics controllers that display swarm intelligence by using synthetic pheromones, a form of stigmergy, to solve a foraging problem. The controllers have simple, built-in behaviors (e.g. wandering, avoiding obstacles) and these are augmented by evolved rulesets that add stigmergic communication strategies. Examples of evolved rules are “if a pheromone of type p is sensed nearby, move in that direction” and “if I am carrying a resource of type q, deposit x units of type p pheromones.” Multiple rules in a ruleset must work together to yield effective foraging strategies. A genetic algorithm is used to evolve complete rulesets, resulting in a Pitt Approach classifier system. Each individual is represented by a fixed-length chromosome which is decoded into a ruleset and evaluated online in a foraging simulator. In the first experiment, the best evolved controllers behave similarly to, and perform as well as, the best hand-coded controller, which uses pheromone trails leading from the nest to known resource locations. This hand-coded strategy is similar to that used in several other synthetic pheromone foraging systems (e.g. Sauter et al., 2001; Nakamura & Kurumatani, 1997).

The system also identifies an effective new strategy which uses pheromones as indicators for when to perform certain actions, as opposed to viewing them as trails. In the second experiment, which uses a more difficult world consisting of resources with differing utilities, the system evolves a controller that performs better than the best hand-coded controller, which is again strategically similar to other pheromone foraging systems. To test their relative flexibility, the best evolved and best hand-coded solutions are evaluated on a new, more complex world containing obstacles as well as multi-typed resources. The evolved solution, which uses a refinement of the novel strategy found in the first experiment, again outperforms the best hand-coded solution.
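The two example rules quoted above can be written as condition/action pairs. The representation, the deposit amount, and the fall-back behavior below are our own illustration; the actual system decodes such rules from a fixed-length chromosome:

```python
# Illustrative ruleset encoding only: the abstract gives the rule *forms*,
# not the genome layout, so the structure here is an assumption.
def make_ruleset():
    return [
        # "if a pheromone of type p is sensed nearby, move in that direction"
        (lambda s: s.get("pheromone_p_dir") is not None,
         lambda s: ("move", s["pheromone_p_dir"])),
        # "if I am carrying a resource of type q, deposit x units of type p
        # pheromones" (x = 5 is a placeholder for the evolved amount)
        (lambda s: s.get("carrying") == "q",
         lambda s: ("deposit", "p", 5)),
    ]

def act(ruleset, state):
    """Fire the first matching rule; otherwise fall back to a
    built-in behavior such as wandering."""
    for cond, action in ruleset:
        if cond(state):
            return action(state)
    return ("wander",)
```

A genetic algorithm operating on such rulesets would vary the conditions, actions, and amounts, then score each ruleset by the food its colony retrieves in simulation.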

References

Nakamura, M. & Kurumatani, K. 1997. Formation Mechanism of Pheromone Pattern and Control of Foraging Behavior in an Ant Colony Model. Artificial Life Vol. 5 (C. Langton & K. Shimohara, eds.): 67–74.

Sauter, J., Parunak, H., Brueckner, S. & Matthews, R. 2001. Tuning Synthetic Pheromones with Evolutionary Computing. Evolutionary Computation and Multi-Agent Systems Workshop at the 2001 Genetic and Evolutionary Computation Conference. http://citeseer.nj.nec.com/sauter01tuning.html.


Adapting Negative Feedback in Honey Bee Forager Allocation to Parallel LTL Violation Discovery

Michael Jones

Department of Computer Science, Brigham Young University, Provo, UT 84602, USA. Corresponding author: [email protected]

Abstract

Honey bee colonies employ a rather effective strategy for allocating foragers to gather resources. The strategy is decentralized, fault-tolerant and adaptable. Each of these is a desirable characteristic of a parallel tool for discovering violations of linear temporal logic (LTL) properties—a problem that arises in semi-formal verification. In practice, the transition relation models the behavior of a protocol or circuit under development, and semi-formal methods are used to find errors that are difficult to find using other methods.

The adaptation of honey bee forager allocation to LTL violation discovery is based on the fact that LTL violations have two parts: an accept state (defined by the property under test) and a cycle containing the accept state. Accept states correspond to foraging sites, and cycles containing accept states correspond to the resource. The quality of an accept state corresponds to its depth in the transition graph, with shallower accept states having greater quality because they lead to simpler counterexamples. The foragers are individual nodes, and the search is conducted by random walk. Multicast, rather than a waggle dance, is used to advertise new accept states between search nodes. When an accept state is found, the discovering node assesses the quality of the state and sends a request to a fraction of the participating nodes. Upon receiving such a request, a node begins a random walk search for cycles containing the given accept state. The search finishes when a node finds a cycle containing an accept state.

In nature, negative feedback is a critical feature of the honey bee allocation scheme. In honey bee colonies, negative feedback appears to be regulated by individual observations of indirect indicators of supply and demand. In LTL violation search, feedback loops regulate the fraction of nodes to which a new accept state is sent. The rate at which messages are received at a host is used as an indicator of the need for more requests. This negative feedback prevents the search from flooding the network with requests while retaining the ability to widely advertise sites when needed.
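A feedback rule of this kind can be sketched as follows. The functional form and constants are our assumptions; the abstract specifies only that the advertised fraction falls as the local message rate rises:

```python
# Hedged sketch: shrink the advertised fraction as the host's observed
# message rate approaches its processing capacity (rule form and the
# base/floor constants are illustrative, not the paper's implementation).
def advertise_fraction(msg_rate, capacity, base=0.5, floor=0.05):
    """Fraction of search nodes to notify about a new accept state.

    msg_rate: messages received per unit time at this host.
    capacity: rate the host can comfortably process.
    """
    load = min(msg_rate / capacity, 1.0)  # saturating load indicator
    return max(base * (1.0 - load), floor)
```

A lightly loaded host advertises widely; a saturated host still notifies a small floor fraction, preserving the ability to spread high-quality accept states when needed.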

This search process, with and without feedback, has been implemented and deployed on a Beowulf cluster of 128 dual-processor nodes located in the Fulton Supercomputing Center at Brigham Young University. Preliminary results indicate that using negative feedback yields a faster, more efficient search for problems with many accept states, but yields a slower, less efficient search for models with few accept states.

Future work seeks to improve cooperative and individual behavior. We aim to improve cooperative behavior by determining if it is possible to use feedback in the honey bee scheme to optimally allocate labor in this problem, and if so, how. We intend to improve individual search by replacing random walk with a guided search using a Bayesian meta-heuristic that we have recently developed.


Organization of Nest Construction via a Natural Substance: Models and Field Studies

Istvan Karsai1, Gabor Balazsi2, John W. Wenzel3

1. Department of Biological Sciences, East Tennessee State University, Johnson City, TN 37614, USA. Corresponding author: [email protected]

2. Department of Pathology, Northwestern University School of Medicine, Chicago, IL 60611 USA.

3. Department of Entomology, The Ohio State University, Columbus, OH 43210 USA

Abstract

In social insects, colony-level complexity emerges from simple individual-level behaviors and interactions. Nest construction of social wasps is an excellent model system to study decentralized behavioral regulation (Karsai and Wenzel 1998; 2000). Our field observations and experiments revealed that construction behavior of social wasps is based on parallel processing and distributed decision making. We propose a model where the regulation of construction behavior and the division of labor is based upon a natural substance (water) which is itself also a building material. By experimenting with the model system we show that the model’s predictions agree with observational data and cover a wide range of evolutionary transitions. According to the internal and external parameters, the colony builds up a store of water. Through individual interactions, pulp and water foragers specialize from general laborer individuals, and their ratio will be balanced with a steady construction flow. Perturbations of the system alter the colony-level dynamics in a similar way as was observed in nature: pulp and water additions increase pulp arrivals and building rate; removal of pulp foragers decreases pulp input and construction rate, but not water influx; removal of water foragers causes overcompensation of water input after a delay (Karsai and Balazsi, 2002).

References

Karsai, I. and Wenzel, J. W. 1998. Productivity, individual-level and colony-level flexibility, and organization of work as consequences of colony size. Proc. Natl. Acad. Sci. USA 95: 8665–8669.

Karsai, I. and Wenzel, J. W. 2000. Organization and regulation of nest construction behaviour in Metapolybia wasps. J. Ins. Behav. 13: 111–140.

Karsai, I. and Balazsi, G. 2002. Organization of work via a natural substance: regulation of nest construction in social wasps. J. Theor. Biol. 218: 549–565.


Do ants paint trucks better than chickens? Markets versus response thresholds for distributed dynamic scheduling

Oran Kittithreerapronchai1, Carl Anderson1,2

1. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332.

2. Current address: Icosystem Corporation, 10 Fawcett St., Cambridge, MA 02138, USA. Corresponding author: [email protected]

Abstract

We examined dynamic allocation of trucks to paint booths (Kittithreerapronchai and Anderson, 2003), contrasting two previous schemes: market-based (e.g., Morley & Ekberg 1998) and response threshold-based (e.g., Campos et al. 2001; originally developed to explain adaptive, distributed division of labor in ants). Both schemes involve paint booth “agents” bidding against each other at auction for the trucks as they roll off the assembly line. The objective is to maximize throughput and minimize paint wastage by allocating trucks (with their customer-desired paint color) to booths currently painting that color, thereby avoiding both time and paint-changeover costs. With response thresholds, each booth has a threshold for each of 20 colors, and its bid for a truck (of color i) is a function of the global demand for trucks of color i, the booth’s current threshold for that color, and the time the truck would have to wait before it could be painted. With markets, paint booths follow four simple rules: 1) Try to take another truck the same color as the current color; 2) Take particularly important jobs; 3) Take any job to stay busy; 4) Do not bid if the paint booth is down or its queue is full. Morley jokes that they endow their paint booths with “sentient chicken brains” (meaning the simple rules), and hence our title: a case of ants versus chickens.
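A response-threshold bid of the kind described can be sketched as below. The abstract names only the three inputs, so the functional form (a standard threshold response curve discounted by waiting time) is our assumption:

```python
# Illustrative response-threshold bid; the sigmoid form and exponent are
# assumptions, not the paper's actual bidding function.
def booth_bid(demand_i, threshold_i, wait_time, n=2):
    """Bid of a booth for a truck of color i.

    demand_i:    global demand for trucks of color i
    threshold_i: this booth's current threshold for color i
    wait_time:   how long the truck would wait at this booth
    """
    # Classic threshold response: rises with demand, falls as the booth's
    # threshold for the color rises, discounted by the expected wait.
    response = demand_i**n / (demand_i**n + threshold_i**n)
    return response / (1.0 + wait_time)
```

Lowering a booth's threshold for a color (e.g. after painting it) raises its bids for that color, which is how threshold schemes produce specialists.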

We address several important issues: 1) How does one parameterize and optimize the system? What tradeoffs exist between cycle time (the average duration to process a truck) and color switching rate (the number of paint flushes per truck)? [we find surprisingly smooth response surfaces]; 2) How best does one choose among booths that have the same highest bid? [we find that such tie-breaking decisions can have significant effects]; and 3) How best does one create specialist booths, ones that preferentially paint a single color? [we find that global, rather than the more usual local, feedback is superior and thus may be of great interest to the ant colony optimization community].

References

Campos, M., Bonabeau, E., Theraulaz, G. & Deneubourg, J.L. (2001) Dynamic scheduling and division of labor in social insects. Adapt. Behav. 8: 83–92.

Kittithreerapronchai, O. and Anderson, C. (2003) Do ants paint trucks better than chickens? Markets versus response thresholds for distributed dynamic scheduling. In Proceedings, Swarm Intelligence and its Applications, a special session of the Congress on Evolutionary Computation (CEC2003), Canberra, Australia, 8–12th December 2003.

Morley, R. & Ekberg, G. (1998) Self-organizing military logistics. In Embracing Complexity: A Colloquium on the Application of Complex Adaptive Systems to Business, The Ernst and Young Center for Business Innovation, Cambridge, 97–102.


MASON: A Java Multi-Agent Simulation Library

Sean Luke, Gabriel Catalin Balan, and Liviu Panait

Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax VA 22030, USA. Corresponding author: [email protected]

Abstract

We present MASON, a new multiagent simulation library written for Java. MASON is a general-purpose, single-process, discrete-event simulation library intended to support diverse multiagent experiments ranging from 3D continuous robotics to social complexity networks to discretized ant foraging algorithms.

MASON is of special interest to the social insect algorithm community because its primary design goal is to support very large numbers of agents efficiently. As such, MASON is faster than scripted systems such as StarLogo or Breve, while still remaining portable and producing guaranteed replicable results. In accompanying works at this workshop, we have successfully used the system to develop by hand, and to apply evolutionary computation to search for, ant foraging behaviors involving thousands of ants and multiple pheromones.

Many multi-agent simulation environments are designed to meet the needs of a particular discipline; for example, TeamBots emphasizes robotics, while RePast, Swarm, and Ascape emphasize discrete environments with networks of interacting social agents. In contrast, MASON’s second design goal is to make it easy to build a wide variety of multi-agent simulation environments (in our case, to test machine learning and artificial intelligence algorithms). Rather than provide an all-encompassing, rigid framework to meet this generalist criterion, MASON is a small, portable core around which specialized tools may be built for different tasks.

MASON consists of two parts: the simulator model library proper, and tools for visualizing and manipulating the model via a graphical interface. The model and the visualization libraries are completely separated. This separation fulfills a third design goal of the simulator: to run efficiently while headless on back-end server machines, but permit the experimenter to view or modify checkpointed simulations during an experimental run. The model may be serialized to and recovered from storage at any time, and the visualization system may be added or removed from the model at any point. Runs may be repeated on any platform with identical results.

MASON’s model library contains a discrete-event schedule to represent time, plus various spatial representations called neighborhoods. MASON has no prescribed set of spatial models: at present the library comes with plain and toroidal models for 2D discrete, 2D hexagonal, 2D continuous, 3D discrete, 3D continuous, and graph spaces. Any object may be stored in these neighborhoods, and the models may be used in any combination and any number in a given simulation. MASON separates the notion of an “agent” from embodiedness: agents are simply objects which may be scheduled to be executed. When executed, agents typically manipulate objects stored in the neighborhoods. Like any other object, agents may be embodied in the neighborhoods if this is appropriate to the simulation proper.

To enable complete separation of model from visualization, MASON adopts the notion of portrayal objects which are tasked to display various neighborhoods or individual objects within those neighborhoods. Portrayals also permit a user to graphically manipulate the objects and neighborhoods. The library provides basic, easily extended portrayals for all of its model environments, including ones which draw 2D models in 3D.

MASON is open source and comes with several built-in example applications, including ant foraging, flocking behaviors in continuous 2D and 3D, continuous models simulating virus infection and cooperative target observation, and 2D discrete and hexagonal heat bugs. The first version of MASON, plus links to the other software packages mentioned above, can be found at http://cs.gmu.edu/~eclab/projects/mason/


The influence of group size and resource distribution on a group of central place foragers

Dhruba Naug, Graham Davis, John Wenzel

Ohio State University. Corresponding author: [email protected]

Abstract

In social insect biology, one of the most long-standing debates concerns the observation of an apparent decrease in per capita brood production with increasing group size (Michener 1964), a paradox given the supposed advantages of group living. One of the many explanations of this observation uses the tenets of the central limit theorem to hypothesize that, despite the decrease in mean per capita productivity, the reduced variance associated with increasing sample (group) size could sufficiently explain the paradox (Wenzel and Pickering, 1991). This reduction in variance with increasing group size was hypothesized to be a direct consequence of a more steady foraging performance when the group size is larger. However, this explanation assumes that foraging by individuals amounts to sampling with replacement and that the resource base follows a Gaussian distribution. In contrast, the real-life sampling space faced by most social insect groups is quite different: resources may be distributed uniformly or clumped over space, and they may be infinite or finite over time. We therefore explore the influence of such different types of resource distributions and sampling on the performance of central place foraging groups of different sizes using a cellular automaton approach. The pertinent variables we measure are: a) total resource collected in a given time, b) rate of resource collection, c) per capita food collection, d) ratio of successful to unsuccessful trips, e) search time per collected item, and f) variance in resource collection across time. Our results show that the ideas of the central limit theorem apply more pertinently to resources that are clumped and finite, such as food. For uniformly distributed infinite resources, such as pulp, there are no significant benefits with an increase in group size. The results also suggest a functional mechanism that could explain the low per capita productivity group size paradox.

In addition, we model the effect of learning to show that, without learning, there is no significant advantage in foraging associated with a larger group size when resources are distributed in a clumped fashion. We test the predictions of the model with foraging data from species of social wasps, and there is a significant match between the two.
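The central-limit-theorem argument can be illustrated with a toy simulation. This sketch is our own (per-forager success is modeled as a Bernoulli trial, not as the cellular automaton of the study): the mean per capita return does not depend on group size, but its variance shrinks roughly as 1/n:

```python
import random

def per_capita_variance(group_size, p=0.3, trials=2000, seed=1):
    """Variance of per capita foraging return over many independent runs,
    where each forager succeeds with probability p (illustrative model)."""
    rng = random.Random(seed)
    samples = [
        sum(rng.random() < p for _ in range(group_size)) / group_size
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials
```

For independent foragers the variance is p(1-p)/n, so larger groups deliver a steadier return even at identical mean productivity, which is the effect the abstract examines under clumped versus uniform resources.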

References

Michener, C. D. 1964. Reproductive efficiency in relation to colony size in hymenopterous societies. Insectes Soc. 11: 317–342.

Wenzel, J. W., Pickering, J. 1991. Cooperative foraging, productivity, and the central limit theorem. Proc. Natl. Acad. Sci. USA 88: 36–38.


Navigation Networks: Biological Inspiration for Large-Scale Multi-Robot Navigation

Keith J. O’Hara

College of Computing, Georgia Institute of Technology, Atlanta, GA. E-mail: [email protected]

Abstract

We present a type of large-scale multi-robot navigation called a navigation network. In a navigation network, each robot senses stimuli and repeats its closest stimulus for others to sense, acting as a pseudo-stimulus. The brightness of the stimulus, or its magnitude, depends on the distance to the stimulus. The stimulus gets dimmer as it is propagated through the network. A robot can then find the true stimulus by hill-climbing, treating the pseudo-stimuli along the way as waypoints. The technique is a physical manifestation of asynchronous or distributed dynamic programming, a multi-agent approach to dynamic programming. A navigation network uses a physical, situated, multi-agent system to approximate the entire state space of the path-finding problem. This approach to navigation does not rely on building maps, or even on localization, just on some way of sensing the other robots and being able to communicate with them. We are exploring how different communication strategies, team compositions, and team sizes impact the performance of large-scale multi-robot navigation.
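The dimming-and-repeating scheme can be sketched as distributed value propagation. The decay rule and sweep count below are illustrative assumptions; real robots would update asynchronously from sensed magnitudes:

```python
# Conceptual sketch of pseudo-stimulus propagation: each robot re-broadcasts
# its brightest perceived stimulus, dimmed by one hop, and a navigating robot
# hill-climbs on the resulting values.
def propagate(neighbors, source, magnitude=10.0, decay=1.0, sweeps=None):
    """neighbors: dict mapping each robot to the robots it can sense.
    Returns each robot's perceived stimulus magnitude."""
    value = {r: 0.0 for r in neighbors}
    value[source] = magnitude            # robot next to the true stimulus
    sweeps = sweeps or len(neighbors)    # enough sweeps to converge
    for _ in range(sweeps):              # distributed dynamic programming
        for r in neighbors:
            if r == source:
                continue
            best = max((value[n] for n in neighbors[r]), default=0.0)
            value[r] = max(value[r], best - decay)  # repeat, dimmed one hop
    return value

# A chain of four robots: following increasing values leads back to the source.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values = propagate(chain, "a")
```

Hill-climbing on `values` from any robot walks hop by hop toward the true stimulus, with no map or localization required.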


Ant Foraging Revisited

Liviu Alexandru Panait and Sean Luke

Department of Computer Science, George Mason University, 4400 University Drive MSN 4A5, Fairfax VA 22030, USA. Corresponding author: [email protected]

Abstract

Previous artificial (non-biological) ant foraging models have to date relied to some degree on a priori knowledge of the environment, in the form of explicit gradients generated by the nest, by hard-coding the nest location in an easily-discoverable place, or by imbuing the artificial ants with knowledge of the nest direction. In contrast, the work presented solves ant foraging problems using two pheromones, one applied when searching for food and the other when returning food items to the nest. This replaces the need for complicated devices to locate the nest with simpler mechanisms based on pheromone information, which in turn reduces the ant system complexity. The resulting algorithm is orthogonal and simple, yet ants are able to establish increasingly efficient trails from the nest to the food in the presence of obstacles.

Depending on whether they are carrying food or not, ants are sensitive to and deposit specific pheromones. When foraging, ants move stochastically in the direction of increasing food pheromone, and deposit some amount of nest pheromone. If there is already more nest pheromone than the desired level, the ant deposits nothing. Otherwise, the ant “tops off” the pheromone value to the desired level. As the ant wanders away from the nest, its desired level of nest pheromone drops. This decrease in deposited pheromone establishes an effective gradient. When the ant is carrying food, the movement and pheromone-laying behaviors use the opposite pheromones than when exploring for food.
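The “topping off” deposit rule can be sketched as follows; the linear desired-level function and its constants are assumed simple forms for illustration, not the paper's parameters:

```python
# Sketch of the "topping off" rule: an outbound ant raises the nest pheromone
# at its cell up to a desired level that decays with distance from the nest.
def deposit(current_here, steps_from_nest, max_level=100.0, drop_per_step=1.0):
    """Amount of nest pheromone an outbound ant deposits at its cell."""
    desired = max(max_level - drop_per_step * steps_from_nest, 0.0)
    if current_here >= desired:
        return 0.0                     # already at or above the desired level
    return desired - current_here      # top off to the desired level
```

Because the desired level falls with each step from the nest, repeated trips leave a field of nest pheromone that increases toward the nest, which is the effective gradient described above; food-carrying ants apply the mirror-image rule with the food pheromone.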

The basic algorithm is made stochastic in two ways. First, our model assumes that no more than ten ants may occupy a grid square; an ant will move to its best choice among non-full, non-obstacle locations. Second, we add some degree of randomness to the ant’s choice of location. Ants move in random order. Ants live for 500 time steps; a new ant is born at the nest each time step unless the total number of ants is at its limit. Pheromones both evaporate and diffuse in the environment.

Our experimental substrate was a 100x100 non-toroidal grid environment, with a nest located at (70,70) and food at (20,20), and with 1000 ants. The experiments were run on the MASON simulation library, presented in an accompanying paper at this workshop. In the experiments, different approaches were compared by the total amount of food brought back to the nest. For each approach, we drew a sample of 50 runs, and applied a Welch two-sample statistical test at 95% confidence. Separate experiments investigated foraging in the presence of obstacles; for this paper we did not use any such obstacles in the environments.

We first compared pheromone-depositing rules. The rule common in the literature is to simply add to the environment a desired — usually fixed and small — amount of pheromones. We compared this rule to our “topping off” rule described earlier. In this experiment, our “topping off” rule was statistically significantly superior to the common rule, returning well over twice the total amount of food.

We then tested system performance under different evaporation rates and different diffusion rates. Moderate diffusion rates significantly outperformed lower and higher amounts. Similarly, small evaporation rates performed best, but not statistically better than no evaporation at all. Large evaporation rates performed poorly.

We also compared different rates of exploration versus exploitation, by varying the degree of stochasticity in the movement rules. We imagined that a moderate degree of exploration would permit a good balance between finding shorter paths and retrieving many food items. However, experiments showed that the greedier the behavior, the significantly higher the total amount of food foraged. Further analysis revealed that, due to a constraint on the maximum number of ants per location, the algorithm was already performing a form of hill-climbing for shorter trails. Ants “bumped” to less desirable locations would eventually smooth out the path until it became optimal. In our final experiment, we laid down a clearly suboptimal trail of pheromones for ants to follow, and observed that the trail was successfully smoothed to the optimal one.

— Page 187 —

2nd International Workshop on the Mathematics and Algorithms of Social Insects

Implementing Collective Robotic Construction with Blind Bulldozing

Chris A. C. Parker, Hong Zhang

Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada. Corresponding author: [email protected]

Abstract

Since the beginning of collective robotics, social insects and their algorithms have been of great interest, and in many cases the inspiration for a particular collective robotic application has its roots in biology. In this research, we present the results of a series of experiments that applied the nest construction behaviour of the ant Leptothorax tuberointerruptus (Franks et al., 1992) to a team of robots. We conducted two series of experiments with robot teams. A pair of robotic bulldozers was built to carry out a form of the “blind bulldozing” construction algorithm (Franks et al., 1992). One or two bulldozers operated in a scaled-up version of Franks’ experimental environment (Parker and Zhang, 2002). The robots built their nests by expanding an initial clearing in a field of evenly spread gravel. Two robots were able to build useful nests more reliably than one, even though explicit cooperation was absent, by reducing the number of isolated piles of gravel in their nest, which we call inclusions. Following our first set of experiments, we revisited the insect literature. The robots were redesigned and a larger team was constructed: up to four robots were now deployed at a time, with controllers believed to produce behaviour more similar to that of the actual Leptothorax ants than our earlier design did. These robots eliminated the inclusion problem of our first experiments. With their simple force-controlled plowing, teams of up to four robots were able to build open, near-circular nests. The robots’ rate of progress and final products agreed with a mathematical model developed to describe the blind bulldozing process (Parker and Zhang, 2003), and the robots’ final products closely resembled an actual Leptothorax nest. A key difference between our robots and their insect inspirations was that the robots carried out construction only by pushing, whereas the ants could also pick up and place building material. This difference, we believe, explains the most significant discrepancy between the behaviours of the ants and of our robots: whereas the ants built nests of sizes tailored to their colonies’ sizes, the sizes of the nests built by our robots were dictated by their environment. In the long run, we believe that our research in collective construction will lead to multi-robot systems that build geometrically interesting structures without centralized control or explicit communication.

References

Franks, N. R. et al. 1992. Self-organizing nest construction in ants: sophisticated building by blind bulldozing. Animal Behaviour 44: 357–375.

Parker, C. and Zhang, H. 2002. Robot Collective Construction by Blind Bulldozing. In Proc. IEEE Conference on Systems, Man and Cybernetics 2002.

Parker, C. A. C. and Zhang, H. 2003. Blind Bulldozing: Multiple Robot Nest Construction. To appear in Proc. IROS 2003.

— Page 188 —


Hybrid Stigmergy for Information Extraction

H. Van Dyke Parunak, Peter Weinstein, Sven Brueckner, John Sauter

Altarum Institute, 3520 Green Court Suite 300, Ann Arbor, MI 48109, USA. Corresponding author: [email protected]

Abstract

Instances of stigmergy may be classified by the kinds of changes that agents make in their environments, and by the structure of those environments. We are combining stigmergic mechanisms that vary across both dimensions in an information extraction application. Our system identifies information in a mass of documents that matches a submitted concept map, through three levels that execute concurrently: concept clustering, relation recognition, and scenario self-organization. Concept Clustering: to increase the density of relevant documents for the other two levels, paragraphs within documents seek to cluster themselves on virtual processors with similar paragraphs, using the quantitative sematectonic algorithm used by natural ants in cemetery sorting. The similarity metric is defined over the concepts attested in the concept map. The structure in which the stigmergy takes place is an arbitrary computer network, and such networks tend to be small-world and scale-free. Relation Recognition: a relation consists of a verbal concept and one or more related nominal concepts. Ants representing individual relations, spawned from the concept map, explore the clustered paragraphs to determine the presence of their relations. These ants use multiple flavors of digital pheromones (marker-based qualitative stigmergy) to recruit their peers to the appropriate cluster of documents and to the most promising documents within a cluster. The environmental structure is the graph union of the processor topography used in concept clustering and a linear topology defined by the order of paragraphs within each document. Scenario Self-Organization: when a relation is substantiated in one or more documents, it is eligible to be assembled with other relations into an extended structure forming a scenario. Two stigmergic processes guide the unification of concepts in relations to form extended structures. First, the concepts in different relations seek to find one another in a semantic lattice using digital pheromones (marker-based stigmergy). Second, the strength of attraction between two concepts is increased if other members of the relations in which they are embedded have also linked together (qualitative sematectonic stigmergy). These processes operate within two structures: the semantic lattice and the growing concept map. Preliminary experiments show that these mechanisms can be combined, and that they can enable self-organization in structures other than the manifolds that are ubiquitous in natural systems.
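The cemetery-sorting algorithm invoked at the concept-clustering level is conventionally driven by similarity-dependent pick-up and drop probabilities; a sketch in the style of that literature (the constants k1 and k2 are illustrative, not taken from this system):

```python
def pick_probability(f, k1=0.1):
    """Probability that an unladen agent picks up an item whose local
    similarity density is f (f in [0, 1]): high when f is low,
    i.e. the item is out of place."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden agent drops its item: high when the
    neighbourhood is already dense with similar items."""
    return (f / (k2 + f)) ** 2
```

Iterating these two rules over agents moving at random is what makes similar items accumulate into clusters.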

This study was supported and monitored by the Advanced Research and Development Activity (ARDA) and the National Imagery and Mapping Agency (NIMA) under Contract Number NMA401-02-C-0020. The views, opinions, and findings contained in this report are those of the authors and should not be construed as an official Department of Defense position, policy, or decision, unless so designated by other official documentation.

— Page 189 —


Modeling, Analysis, and Biomimicry of Honey Bee Distributed Decision Making

Kevin M. Passino

Dept. Electrical Engineering, The Ohio State University, 2015 Neil Ave., Columbus, OH 43210. E-mail: [email protected]

Abstract

The poster overviews the author’s initial efforts at using biomimicry of honey bee distributed decision-making processes to solve technological problems. First, a model of the social foraging of honey bees is described. Second, our progress on the development of a new model of the honey bee nest site selection process is summarized (this model is being developed in collaboration with T.D. Seeley at Cornell University). The main intent, however, is to present ideas for how to use biomimicry of honey bee distributed decision making to solve optimization and distributed control problems. For this, connections to parallel nongradient optimization will be identified. To study the use of social bee algorithms for distributed control, we have designed two experiments: one for planar temperature control (to achieve maximum uniform temperature on a plane), and another to emulate juggling of balls (where resource allocation is needed to keep the balls up). The poster will describe the status of our efforts to implement (i) social foraging algorithms that seek to allocate foragers (heat) to “eat” error from the temperature grid, where error is the difference between the current temperature in a zone and the desired temperature at each point in the plane; and (ii) cooperative agreement algorithms to determine the minimum ball height (best nest site) with limited communications, in order to decide which ball to lift next. Our choice of non-robotic applications will be discussed in order to motivate the potential value of biomimicry for a range of other engineering problems in optimization, control, and automation.

Keywords: honey bee social foraging, honey bee nest site selection, biomimicry, temperature control, resource allocation

References

Passino, K.M. 2004. Biomimicry for Optimization, Control, and Automation. In press, Springer-Verlag, London.

Quijano, N., Gil, A. E., and Passino, K. M. Experiments for Distributed and Networked Dynamic Resource Allocation, Scheduling, and Control. Submitted to IEEE Control Systems Magazine, Sept. 2003.

Quijano, N. and Passino, K. M. A Multizone Temperature Control Experiment for Development andAnalysis of Dynamic Resource Allocation Strategies. Submitted for journal publication, Aug. 2003.

— Page 190 —


Retinue Behavior in Honeybee Colonies: Self-Organization or Pheromonal Control?

Holger Scharpenberg and Robin F.A. Moritz

Inst. of Zoology, Martin-Luther-University Halle-Wittenberg, 06099 Halle, Germany. Corresponding author: [email protected]

Abstract

The queen substance (9ODA), the main compound of the honeybee queen’s mandibular gland pheromones, is important for regulating many colony processes. These pheromones operate at the global level (e.g. during swarming) but also elicit highly specific local reactions: bees are attracted to the 9ODA signal and form the retinue around the queen. Participation in the retinue is not static: after a certain time workers leave the retinue to switch to other tasks, while other bees join the queen in turn. Moreover, not all workers are attracted; some are actually repelled by the queen’s signal and actively avoid her proximity. As a result, these workers start to produce queenlike pheromones themselves and in turn elicit retinue behavior in other workers. To analyze the organizing principles and underlying mechanisms of retinue behavior, we created a model that explains these opposing behavioral patterns by means of variable response thresholds of the workers. Each worker in the colony network is modeled as a boolean switching element: it is either ‘suppressed’ (0), if the 9ODA concentration rises above its threshold level, or ‘unsuppressed’ (1), if the external pheromone level is below the threshold. Suppression terminates a worker’s own 9ODA synthesis and results in a reduction of the threshold level, so that in future encounters lower 9ODA levels are sufficient to suppress the bee. As long as the external 9ODA level exceeds the threshold, bees are attracted towards the queen, and as a result the local concentration of external 9ODA rises further. A worker with less queen contact is unsuppressed, starts to synthesize its own 9ODA, and raises its suppression threshold, so that higher external 9ODA levels are needed to suppress this bee. Both repellence and threshold reduction lower the likelihood of being suppressed by the queen’s signal in the future. We generated a multi-agent simulation in which each worker’s behavior on a virtual comb depends on both its variable response threshold and the intensity of the pheromone stimulus. To validate the pattern formation processes of retinue behavior obtained in silico, we use empirical in vivo data from small experimental groups. The coordinates of the individual workers on the comb were recorded over a period of 7 days to calculate the mean and variance of their distance to the queen. We found a large variance of worker-queen distances, indicating a high degree of fluctuation in the retinue. Our simulation of a self-organizing response threshold model is well suited to explain the pattern formation process of retinue behavior in reality: bees habituate to the 9ODA stimulus via changing threshold levels, but are also able to sensitize again after leaving the queen’s proximity by means of self-organizing feedback. The frequency of participation in the retinue differs significantly among workers, showing that the variable attractiveness of the 9ODA signal depends on the internal state of each bee. The production of queenlike pheromone may be closely connected to the increased frequency of queen avoidance.
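The boolean switching element described above can be sketched as a single encounter update (the step size `delta` and the tuple interface are our own simplifications, not the authors' model parameters):

```python
def encounter(threshold, external_9oda, delta=0.1):
    """One queen-signal encounter for a worker modeled as a boolean
    switch. Returns (suppressed, new_threshold, producing_9oda).
    Suppression lowers the threshold (habituation: less signal is
    needed next time); escaping suppression raises the threshold
    and switches the worker's own 9ODA synthesis on."""
    if external_9oda > threshold:
        return True, max(0.0, threshold - delta), False   # suppressed (0)
    return False, threshold + delta, True                 # unsuppressed (1)
```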

— Page 191 —


A Measure of Two-Dimensional Sortedness Based on Brood Sorting in Ants

Ana B. Sendova-Franks

School of Mathematical Sciences & Intelligent Autonomous Systems Laboratory, Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Frenchay Campus,

Coldharbour Lane, Bristol BS16 1QY, U.K.

Abstract

The sorting of brood in Leptothorax ant colonies (Franks & Sendova-Franks 1992) is a prime example of pattern formation in social insects that has captured the imagination of computer scientists and roboticists (Bonabeau et al., 1999; Wilson et al., 2003). Leptothorax ant colonies make their nests in almost flat crevices in rocks. They sort their brood in concentric annuli so that items of the smallest of five brood types are in the centre of the pattern and those of the largest brood type are on its periphery.

A measure of how well the pattern is sorted (a measure of sortedness) would facilitate the understanding of the mechanisms underlying brood sorting in ants. Furthermore, it would help comparisons between (a) the patterns produced by ants and (b) the patterns produced by robots following algorithms gleaned from the behaviour of ants. Wilson et al. (2003) have successfully used a performance metric for annular sortedness to compare three mechanisms for creating annular structures using minimalist robots. Their metric consists of four components, one for each of four pattern characteristics: separation of items of different type, compactness, shape, and completeness.

Here our approach is motivated by the long-term aim of understanding the network properties of ant brood patterns mapped as point patterns in two dimensions. Our measure of sortedness is based on a connectivity measure for the Gabriel graph (GG; Gabriel & Sokal 1969, Matula & Sokal 1980, Okabe et al. 1992), a type of planar graph. Our choice of connectivity measure is β (Haggett & Chorley 1969, James et al. 1970), which is equal to the ratio of edges to vertices in a graph. The value of β is known for the GG of patterns with regularly spaced vertices in the plane, such as a hexagon on a triangular grid and a square on a Cartesian grid. This provides us with a theoretical approximation model of the maximum connectivity we could expect for the central circle and the concentric annuli of the brood point pattern in the ants.
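For small point sets, the Gabriel graph and its β value can be computed directly from the defining disc condition; a brute-force O(n³) sketch (our own illustration, not the authors' implementation):

```python
from itertools import combinations

def beta_connectivity(points):
    """Build the Gabriel graph of a 2-D point set and return
    beta = edges / vertices. Points i and j are joined iff no other
    point lies strictly inside the disc whose diameter is segment ij
    (equivalently, no k with d(i,k)^2 + d(j,k)^2 < d(i,j)^2)."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    edges = 0
    for i, j in combinations(range(len(points)), 2):
        dij = d2(points[i], points[j])
        if all(d2(points[i], points[k]) + d2(points[j], points[k]) >= dij
               for k in range(len(points)) if k not in (i, j)):
            edges += 1
    return edges / len(points)
```

For three collinear, evenly spaced points only the two short segments survive, giving β = 2/3.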

We construct the GG for each of 24 brood point patterns belonging to 24 different ant colonies. We calculate β for each of the five brood types (a) within the whole brood pattern and (b) separately, to control for the dependence of connectivity on the number of vertices. A general linear model of the proportional connectivity (connectivity within the whole brood pattern divided by connectivity separately) demonstrates that it depends on brood type and also on the number of vertices, but not on the interaction between the two. This result provides us with a ranking of the five brood types. We use these ranks as weights for the relative connectivity of different brood types in the calculation of our sortedness measure.

We apply the sortedness measure we have developed to two groups of brood pattern: (a) patterns that develop over time (six months) in each of three colonies and (b) patterns re-established after an emigration to a new nest site in each of another three colonies. Finally, we discuss the advantages and shortcomings of our sortedness measure in a wider context. In particular, we consider the distinction between 2-D sortedness in concentric annuli versus parallel bands or clusters, as well as approaches for future work that could build on the present study.

— Page 192 —


On the Use of the Term “Stigmergy”

Dylan A. Shell, Maja J. Mataric

Interaction Laboratory, Department of Computer Science, University of Southern California, Los Angeles, CA 90089-0781, USA. Corresponding author: [email protected]

Abstract

The concept of stigmergy has been applied to a number of fields since it was first coined in the 1950s (Grasse, 1959). We have compiled a bibliography [1] of over a hundred works that make use of the term, with a focus on artificial applications (robotics, optimization, routing, etc.) but also including important works from entomology, and a few that treat the general concept of stigmergy. Based on the varied perspectives of these documents, it is clear that the term has come to mean different things to different people. One possible source of confusion is its relationship with self-organization. Bonabeau et al. (1999) mention that, when stigmergy and self-organization work together, they can account for a wide range of observable phenomena. This view, in which the two phenomena are seen as cooperating but distinct, is not shared by all (cf. Michalareas & Sacks, 2001). For instance, some authors consider positive feedback an important part of stigmergy itself, while others consider the two phenomena to overlap (Anderson, 2002). Most frequently, authors use the term only in passing, often with a one-line definition claiming that it is a form of indirect communication. Some have taken stigmergy to mean something similar to generative communication, while others invoke shared external memory metaphors (e.g., Peshkin et al., 1999). Still others take a less general view; a number of papers from the ‘pheromone computation’ literature consider only a very limited notion of what stigmergy is, one that includes features of self-organization. Although P.-P. Grasse’s (1959) original definition is well cited (in over half of the papers in our collection), even it has been interpreted in multiple ways. The differences are typically subtle enough, and based on peripheral examples, to further perpetuate confusion and misuse of the term.

References

Anderson, C. 2002. Self-Organization in Relation to Several Similar Concepts: Are the Boundaries to Self-Organization Indistinct? Biological Bulletin 202: 247-255

Bonabeau, E., Dorigo, M. & Theraulaz, G. 1999. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York, USA.

Grasse, P.-P. 1959. La Reconstruction du nid et les Coordinations Inter-Individuelles chez Bellicositermes Natalensis et Cubitermes sp. La theorie de la Stigmergie: Essai d'interpretation du Comportement des Termites Constructeurs. Insectes Sociaux 6: 41-81

Michalareas, T. and Sacks, L. 2001. Stigmergic Techniques for Solving Multi-constraint Routing for Packet Networks. Proc. 1st International Conference on Networking, Colmar, France: 687-697

Peshkin, L.M., Meuleau, N. & Kaelbling, L.P. 1999. Learning Policies with External Memory. Proc. 16th International Conference on Machine Learning (ICML-99), San Francisco, CA, USA: 307-314

[1] Available for download at http://robotics.usc.edu/~dshell/stigmergy.html

— Page 193 —


AUTHOR INDEX

Abu-Mostafa, Y.S. ... 91
Anderson, C. ... 9, 175, 183
Arkin, R.C. ... 5
Balazsi, G. ... 182
Balch, T. ... 53, 161
Bertram, S.M. ... 179
Bonabeau, E. ... 6, 17
Brown, T. ... 25
Brueckner, S. ... 189
Buhl, J. ... 33
Calleri, D. ... 178
Catalin Balan, G. ... 184
Chase, I.D. ... 41
Chittka, L. ... 177
Crailsheim, K. ... 145
Davis, G. ... 185
Deneubourg, J.L. ... 33
Deshmukh, A.V. ... 41
Despland, E. ... 176
Dornhaus, A. ... 47, 75, 177
Egerstedt, M. ... 107
Fefferman, N. ... 178
Feldman, A. ... 53
Fewell, J.H. ... 179
Franks, N.R. ... 47, 175
Funes, P. ... 17
Gautrais, J. ... 33
Gorelick, R. ... 179
Hayes, J. ... 180
Jones, C. ... 60
Jones, M. ... 181
Joshi, S.S. ... 68
Karsai, I. ... 182
Killeen, P.R. ... 179
Kirschenbaum, M. ... 123
Kittithreerapronchai, O. ... 183
Klugl, F. ... 75, 177
Krothapalli, N. ... 41
Kuntz, P. ... 33
Lerman, K. ... 83
Li, L. ... 91
Luke, S. ... 184, 187
Martinoli, A. ... 91
Mataric, M.J. ... 60, 193
Melhuish, C. ... 153
Merkle, D. ... 99
Middendorf, M. ... 99
Moritz, R.F.A. ... 191
Muhammad, A. ... 107
Murton, J. ... 123
Nakrani, S. ... 115
Naug, D. ... 185
O'Hara, K.J. ... 186
Oeschlein, C. ... 177
Orme, B. ... 17
Palmer, D.W. ... 123
Panait, L. ... 184, 187
Parker, C.A.C. ... 188
Parunak, H. Van Dyke ... 189
Passino, K.M. ... 190
Pie, M. ... 178
Puppe, F. ... 177
Quinn, R.D. ... 123
Reznikova, Z. ... 139
Rosengaus, R. ... 178
Ryabko, B. ... 139
Sauter, J. ... 189
Schank, J.C. ... 68
Scharpenberg, H. ... 191
Schmickl, T. ... 145
Scholes, S. ... 153
Sendova-Franks, A.B. ... 153, 192
Shell, D.A. ... 193
Theraulaz, G. ... 7, 33
Tovey, C. ... 115
Traniello, J. ... 178
Triebig, C. ... 75
Ulam, P. ... 161
Umre, A. ... 168
Vaidyanathan, R. ... 123
Wakeman, I. ... 168
Weinstein, P. ... 189
Wenzel, J.W. ... 182, 185
Wilson, M. ... 153
Zhang, H. ... 188

— Page 197 —