BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting...

12
1 Provably Competitive Adaptive Routing Baruch Awerbuch (JHU) David Holmer (JHU) Robert Kleinberg (MIT) Herbert Rubens (JHU) Abstract— An ad hoc wireless network is an au- tonomous self-organizing system of mobile nodes con- nected by wireless links where nodes not in direct range communicate via intermediary nodes. Routing in ad hoc networks is a challenging problem as a result of highly dynamic topology as well as bandwidth and energy constraints. In addition, security is critical in these networks due to the accessibility of the shared wireless medium and the cooperative nature of ad hoc networks. However, none of the existing routing algorithms can withstand a dynamic proactive adversarial attack. The routing protocol presented in this work attempts to provide throughput competitive route selection against an adaptive adversary. A proof of the convergence time of our algorithm is presented as well as preliminary simulation results. I. BACKGROUND The basic service offered by every node in an ad- hoc network is that of routing packets from their source to their ultimate destination. In general, rout- ing protocols are susceptible to a wide variety of attacks. For example, a malicious node may perform a denial of service attack by selectively jamming some areas of the network. A great deal of work has been done in terms of guaranteeing practical security considerations in existing network protocols (see description of existing work provided in Section II). In practice, adversarial attacks observed and documented in ad hoc networks might not be overly sophisticated. The ease of access to the medium has allowed extremely basic attacks to cause a great deal of damage. Consequently, such attacks can be thwarted by simple yet effective meth- ods. Existing work in the literature considered a number of strong adversary models. For example, [1] consid- ers a random fault pattern; [2] deals with a static fault pattern and [3] deals with an oblivious (non-adaptive) pattern. Our goal is to design routing protocols for net- works that are provably tolerant of arbitrary adaptive DOS attacks. The adversary that we will consider selectively attacks packets on a given node or link. This adversary benefits from knowledge of the traffic pattern (including packet contents); this includes all current traffic and all past traffic history. As a result, the algorithms and analysis techniques used in the previous work will not apply. Existing methods that do not ignore sophisticated adaptive attacks either use brute force (flooding) or assume the existence of some trusted servers or routers (see section II). We do not wish to make such restricting assumptions. As a result, the task of designing a throughput competitive routing algorithm is much harder. It may appear that our adversarial routing model may lead to impractical algorithms in benign (non- adversarial) settings. However, routing algorithms similar to the one studied here were developed and tested in real network environments by British Telecomm and NTT for both wired and wireless networks with superior results [6], [20], [9], [7], [17], [18], [5], [10]. AntNet, a particular such algorithm, was tested in routing for data communication net- works [6]. The algorithm performed better than OSPF, distributed Bellman-Ford with various dynamic met- rics, and various modifications of shortest path with a dynamic cost metric [4], [17]. Our contribution: We propose a new algorithm for adaptively selecting routing paths in a network with dynamic adversarial edge failures, and we give a rigorous mathematical analysis of this algorithm, proving that its packet loss will match the minimal cumulative loss of any path, up to an additive error which is sublinear in the number of trials. The general framework we propose is appropriate for analyzing routing protocols for networks operating under the extremely strong adversarial model specified above. Such strong models have not been considered in the literature to the best of our knowledge. In fact, adaptive dynamic denial of service attacks are suf- ficient to break most existing algorithmic work. (In Section III, we briefly describe why DoS attacks are so devastating.)

Transcript of BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting...

Page 1: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

1

Provably Competitive Adaptive RoutingBaruch Awerbuch (JHU) David Holmer (JHU) Robert Kleinberg (MIT) Herbert Rubens (JHU)

Abstract— An ad hoc wireless network is an au-tonomous self-organizing system of mobile nodes con-nected by wireless links where nodes not in direct rangecommunicate via intermediary nodes. Routing in ad hocnetworks is a challenging problem as a result of highlydynamic topology as well as bandwidth and energyconstraints. In addition, security is critical in thesenetworks due to the accessibility of the shared wirelessmedium and the cooperative nature of ad hoc networks.However, none of the existing routing algorithms canwithstand a dynamic proactive adversarial attack. Therouting protocol presented in this work attempts toprovide throughput competitive route selection againstan adaptiveadversary. A proof of the convergence timeof our algorithm is presented as well as preliminarysimulation results.

I. BACKGROUND

The basic service offered by every node in an ad-hoc network is that ofrouting packetsfrom theirsource to their ultimate destination. In general, rout-ing protocols are susceptible to a wide variety ofattacks. For example, a malicious node may perform adenial of service attack by selectively jamming someareas of the network.

A great deal of work has been done in termsof guaranteeing practical security considerations inexisting network protocols (see description of existingwork provided in Section II). In practice, adversarialattacks observed and documented in ad hoc networksmight not be overly sophisticated. The ease of accessto the medium has allowed extremely basic attacksto cause a great deal of damage. Consequently, suchattacks can be thwarted by simple yet effective meth-ods.

Existing work in the literature considered a numberof strong adversary models. For example, [1] consid-ers a random fault pattern; [2] deals with a static faultpattern and [3] deals with anoblivious(non-adaptive)pattern.

Our goal is to design routing protocols for net-works that are provably tolerant ofarbitrary adaptiveDOS attacks. The adversary that we will consider

selectively attacks packets on a given node or link.This adversary benefits from knowledge of the trafficpattern (including packet contents); this includes allcurrent traffic and all past traffic history.

As a result, the algorithms and analysis techniquesused in the previous work will not apply. Existingmethods that do not ignore sophisticated adaptiveattacks either use brute force (flooding) or assumethe existence ofsometrusted servers or routers (seesection II). We do not wish to make such restrictingassumptions. As a result, the task of designing athroughput competitive routing algorithm is muchharder.

It may appear that our adversarial routing modelmay lead to impractical algorithms in benign (non-adversarial) settings. However, routing algorithmssimilar to the one studied here were developedand tested in real network environments by BritishTelecomm and NTT for both wired and wirelessnetworks with superior results [6], [20], [9], [7], [17],[18], [5], [10]. AntNet, a particular such algorithm,was tested in routing for data communication net-works [6]. The algorithm performed better than OSPF,distributed Bellman-Ford with various dynamic met-rics, and various modifications of shortest path witha dynamic cost metric [4], [17].

Our contribution: We propose a new algorithmfor adaptively selecting routing paths in a networkwith dynamic adversarial edge failures, and we givea rigorous mathematical analysis of this algorithm,proving that its packet loss will match the minimalcumulative loss of any path, up to an additive errorwhich is sublinear in the number of trials. The generalframework we propose is appropriate for analyzingrouting protocols for networks operating under theextremely strong adversarial model specified above.Such strong models have not been considered inthe literature to the best of our knowledge. In fact,adaptive dynamic denial of service attacks are suf-ficient to break most existing algorithmic work. (InSection III, we briefly describe why DoS attacks areso devastating.)

Page 2: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

2

What distinguishes the present work is our in-sistence on proving that undercompletely arbitraryadversarial behavior, with essentially no assumptionsabout the network, the packet loss of our protocol willessentially match the minimal cumulative loss (i.e.,sum of losses of individual links) of any path. Whileit may seem counter-intuitive that such a goal can beachieved, the key is that although we assume an ar-bitrary dynamicadversary, we compare the algorithmagainst the beststatic path; if the adversary workshard to damage the algorithm’s throughput, it mustnecessarily inflict a large cumulative loss oneverypath in the network.

At this point, one could debate whether such so-phisticated adversaries are ever going to surface inreality, or whether they are only monsters in ourimagination. Our counter-argument is that if, as wetheorize, ubiquitous wireless networks become theunderlying fabric that binds our society together, wecannot afford not to plan against an adversary witharbitrary powers.

The rest of this paper is organized as follows.In Sections II and III, we review some of the ex-isting work in this area and survey some of thechallenges which illustrate why our problem is notsolved by simpler approaches. In Sections IV weoutline the main ideas underlying our algorithm,which is specified precisely in Section V, analyzedmathematically in Section VI, and experimentallytested in Section VII. The algorithm contains sometunable parameters, e.g. a “sampling rate”δ and a“learning rate”β. Adjustingβ allows one to smoothlyinterpolate between the greedy algorithm (β near0)and algorithms which are less responsive but morerobust against an adaptive adversary (β near 1). Themathematical analysis in Section VI indicates thatsettingβ very close to1 guarantees good performanceagainst an arbitrary adaptive adversarial attack; how-ever in many circumstances (e.g. random edge fail-ures) greedy approaches achieve faster convergence,which leads one to expect superior performance froma smaller value ofβ in such circumstances. Thesetheoretical considerations are substantiated by theexperiments in Section VII, where the algorithm istested in a variety of network topologies with randomedge failures, and smaller values ofβ indeed achievefaster convergence.

II. EXISTING WORK

a) Algorithmic Work: The only algorithmic re-sults that possibly work under this strong adversarialmodel are based on a computational learning frame-work [12], [8]. Near optimal learning algorithms,withreliable global informationfor finding a shortest pathin a graph, where at each time a differentknowncostis assigned to each edge, were studied in [12], [8];these solutions have an exponential computationaloverhead. Schemes with polynomial computationaloverhead were recently suggested by [21], [11]. Thesolutions in [12], [8] and [21], [11] correspond to“link-state” routing, and the case where the adversarycan only exercise dynamic DOS attacks, but cannotcheat. Byzantine behavior causes such algorithms tocollapse. Most recently, “on-demand” routing againstan oblivious adversary was suggested in [3]. In thismodel, an adversary cannot cheat and the pattern ofcheating and blocking is assumed to be oblivious, i.e.,this work does not handledynamicDOS attacks.

b) Reinforcement Learning:The “Swarm In-telligence” paradigm is an approach to routing indistributed networks of cooperative agents, inspiredby studying the process by which swarms of antsconverge to the optimal route to a food source byprogressively reinforcing the successful paths usingpheromone secretions. Interest in applications of ant-based routing in mobile ad-hoc networks (MANETs)has risen, and many recent papers have addressedthe subject [14], [4]. Gunes et al [15] considers anant-based approach to routing in MANETs, with acompletely reactive algorithm. Marwaha et al. [16]studies a hybrid approach using both AODV andreactive ant-based exploration. Baras et al [4] de-scribes a new algorithm that utilizes the inherentbroadcast nature of wireless networks to multicastcontrol and signaling packets (ants). ARAMA [14]uses an analogous approach. Work on the SwarmIntelligence paradigm is described in [13], [6], [20],[9], [7], [17], [18], [5], [10], [14], [4].

III. R ESEARCHCHALLENGES

c) Past performance is no guarantee of futuresuccess:The problem is that we are making decisionsin an online environment, where the online algorithmneeds to forward packets by selecting routes whilehaving only information regarding past packets, andno information regarding the future conditions. Wemake no assumptions about the adversary’s behavior

Page 3: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

3

or the sequence of fault patterns generated. Moreover,it is also assumed that there may be a powerful ad-versary generating the worst possible input sequencefor the online algorithm. A fundamental questionregarding online algorithms is how to evaluate theirperformance. It is rare, and in some cases outrightimpossible that one deterministic algorithmalwaysoutperforms another deterministic algorithm. Oneclassical method has been to assume a model wherethe future resembles the past (i.e. astochasticmodel)regarding packet losses, and compare the performanceof different algorithms on the same model. However,in an adversarial setting, there is no reason why theadversary should follow the rules of any particularstochastic model. If anything, a malicious adversarywill do exactly thethe oppositeof what our modelwould predict. (There is no reason to hope that theadversary will not know our model.)

One may be tempted to think that converging tothe best fixed route may be easily accomplished byutilizing a “greedy” heuristic: keep track of past errorrates on different links, and simply select the pathwith the smallest overall failure rate, namely the sumof the failure rates of its links. For example, existingwork on routing in overlay networks, such as RON[1] runs a greedy strategy for windows of a specificsize. The intuition is very strong: extrapolate the pastfailure pattern into the future. This may work if thefailure pattern is static or if there is a statisticalmodel of failures. However this method will fail quitespectacularly in the case of dynamic adversaries. Forexample, consider 100 options for choosing a relaypoint between the sender and receiver, which define100 different paths. At timei, pathi (modulo 100) isunder the control of the adversary, and is failing allpackets; at all other times the path is perfect. Thegreedy algorithm will always pick the worst path,even though at any moment in time only 1% of thepaths are faulty.

To see the principal flaw in this greedy strategy,consider the performance of an investor who tries tofollow the best performing stock on the stock market,with the naive assumption that “past performance is aguarantee of future results.” In fact, in an adversarialsetting (e.g. the stock market) it may very well be thecase that past performance correlatesnegativelywithfuture results, and algorithms must not be fooled intothis easy trap.

d) Our path selection must withstand competi-tive analysis: In general, there may not be an ideal

fault-free path, but yet we can define the best pathin terms of accumulated loss on that path. Our goalis to make path selections, such that our overallperformance is comparable to that of the “best path.”

Our goal is thus to find a robust randomized algo-rithm that works well on all inputs, in the sense thatthe expected behavior of our algorithm is comparableto the optimum fixed path on each input. The goal ofthis paper is to introduce novel routing algorithms forroute selection in an adversarial environment that areprovablyoptimal in the sense that the total number ofmessages lost in our algorithm exhibit a very smalladditive gap with respect to theoptimum prescientroute assignment. The optimum assignment is madewith complete knowledge of the adversary’s actionsin the past, present, and future, and has infinite com-putational power. The only restriction on the optimumroute is that it must change somewhat less frequently.The loss of performance of the algorithm that we areseeking should be based on competitive analysis [19].We compare the performance of our online algorithmsto the best static offline selection. Namely the offlinealgorithm selects a fixed path and uses it during alltime steps. Our results bound the difference (referredto in the literature asregret) between the cost of thebest static offline selection and our online algorithms.

For example, consider a wireless network with 200users, 1000 potential wireless links, and at leastonefixed path of 7 hops between sender and receiverthat has an average link fault rate of 0.01 %, i.e.99.99% of the time the whole path is reliable. Theonly problem is to select one out of around2007

possible paths, while only having information aboutpast experiments over these paths. In this case, onecan construct a counter-example in which the greedyalgorithm maynever succeed in delivering asinglemessage, i.e. it has a 100% fault rate, in spite ofalways selecting the best of the2007 paths so far!The explanation for this counter-intuitive fact is thatany deterministic algorithm can be easily fooled byan adversary, forcing it to pick a path that alwaysfails.

In contrast, the “competitive” randomized algo-rithm that we are seeking should be able to “zeroin” on the reliable path, or at least get comparableperformance. The algorithm we are seeking shouldguarantee, for any adversarial behavior (subject to theadversary’s “pledge” to keep some path of length 7hops being 99.99% reliable on each link), a fault rateof just below7·0.01% = 0.07% over long sequences.

Page 4: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

4

e) Randomness is not a guarantee of success:Another “quick fix” is to try to select a randompath. There is a misconception in the communitythat selecting one of many node-disjoint paths, orselecting a random path, will somehow guaranteesuccess. (The stock-market equivalent of this beliefis asserting that a stock index such as the NASDAQ100 cannot go down by much.) There is somethingappealing about random strategies in the sense thatthey allow the spreading of risk; what is wrong withthe above quick fix is that there is no attempt toadapt the probabilities after receiving feedback aboutdifferent paths. In fact, it is easy to generate counter-examples in which either one of these policies fails.For example, consider a faulty edge leading into adense subnetwork. If one generates paths at randomby assigning equal probability to traversing each edgein the network, one is pretty much guaranteed toselect this faulty edge.

f) Multiple paths do not guarantee success:Ithas been generally recognized that sending packetsalong multiple paths is more fault-tolerant than send-ing packets over a single path. However, it is easyto come up with a example involving a collection ofedge-disjoint paths (e.g., disjoint paths in an expandergraph where the error probability is constant) suchthat each path has at least one fault, and yet arandom path is likely to not have any fault with highprobability.

In conclusion, while different heuristics that canalleviate the problems caused by the adversaries doexist, the challenge is to utilize and combine thesetechniques in such a way that one can guaranteenear-optimal performance even against very powerfuladaptive adversaries

IV. PROPOSEDAPPROACH

A. Solution Outline

In our routing approach, we act as follows: theprocess of route detection and fault avoidance iscarried out by a distributed process of “learning” faultfree paths, in spite of deceptive techniques pursued byadversaries. The routing process creates and adjustsa probability distribution at each node for the node’sneighbors. The probability associated with a neighboris a local estimate of the the relative likelihood ofthat neighbor forwarding and eventually deliveringthe packet to the destination. The algorithm is similarto the one in [3]; however its analysis is different.

B. More details of our approach

The choice of the routing path used in the algo-rithm is best explained, moving backwards, from thedestination towards the source. It is also easiest toimagine a “fictional” distributed algorithm running atthe individual nodes, the purpose of which is to sendmessages to the source; in reality the whole route isdetermined by the source, and messages travel fromthe source toward the destination.

Imagine that each node is attempting to pick aparent edge towards the source. The set of all parentedges forms a tree rooted at the source. The packetsare sent on the unique path in this tree from the sender(the root) to the receiver, and are acknowledged bythe receiver. Each node on the path between thesender and the receiver sets a timer after forwardingthe packet. If an acknowledgement is not receivedbefore the timer expires, then that node assumes thepacket was lost and instead sends back a negativeacknowledgement to the source. Proper calibration ofthe timers leads to a single aggregated ack messagetraversing each link, reporting on the status of thedownstream portion of the path, i.e., we do nothave hop-by-hop acks resulting in overhead growinglinearly in the length of the path. Note that these ac-knowledgements are separate from acknowledgmentsused by upper-layer protocols (e.g. TCP) and are usedsolely for the purposes or our routing algorithm.

Using these acknowledgements, a lost packet de-composes the path into two parts: the part from thesource to the last node that succeeded in acknowl-edging the packet, and the rest of the path that failedto return an acknowledgement. We now adjust thechoice of parent edges as follows. The part of thepath that succeeds in acknowledging reinforces itsconfidence in the parent, while the other part of thepath reduces its confidence. The parent will be chosenprobabilistically based on the confidences acquired.The intuition is that confidence in the parent reflectsnot only the reliability of the link between child andparent, but also the fact that the parent is “intelligent”enough to pick the right (grand)-parent.

We wish to stress that these probabilistic confi-dence measures are being maintained and updated atthe source, not at the nodes in the interior of the path.Upon receiving a positive ack from the destinationor a negative ack from an intermediate node, thesource has the full picture of both portions of thepath and emulates the adjustments of probabilities

Page 5: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

5

on behalf of the nodes on the path. Thus, counter-intuitively, it will often be the case that nodes beyondthe failing edge have never seen some of the packetsthat were destined for them, and are not aware thattheir “confidence” in their parent is being degraded.

By using a probabilistic distribution over the parentedges we generate a probability distribution over allsource-rooted trees, such that the probability of eachtree grows exponentially as a function of its perfor-mance. This discriminates against edges which arefrequently controlled by the adversary, and reinforcesedges where the adversary is very often absent.

Now, we proceed step by step in explaining thedifferent phases taken by our algorithm to assignprobabilities to each edge of the network graph.

g) Transforming a graph into a layered directedgraph: The first step is to transform the originalundirected network graphG(V,E) (e.g., see Fig. 1,2)into a directed layered graph, with the destinationbeing in layer 0 of this graph while the source isat the other “end” of the graph. Note that the sameprocess is repeated for each destination.

s

a

r

b

c

Fig. 1. The original network w.r.t. receiverr (in layer 0)

s3 s

2

c2

a3 a

2

b2

s4

s1

c1

r0

Fig. 2. The Layered graph w.r.t. receiverr (in layer 0)

Informally, all we are doing here is inserting intoeach packet a TTL (time-to-live) counter, and usinga separate routing table for each destination and each

TTL counter value. We will say that the packet arrivedto a node in layeri if it has TTL value equal toihops. Thus, all the nodes that canpotentially reachthe destination ini hops (or less), for0 ≤ i ≤ H arerepresentedin layer i of the graph (e.g., that’s whynode b is not on layer1 in Fig. 1). HereH is theupper bound on hop count of a routing path (i.e. theoriginal TTL value). The directed edges connect therepresentative nodes in layeri to representative nodesin layer i−1. Suppose that a packet starts at sources,and while traversing the network carries a hop count.When the packet arrives to nodev after traversingi hops, we consider the packet as if it arrives at“virtual” node vH−i.

The reader can easily see that a directed leveledgraphG′ of depthH simulates any communicationwhere packets traverse a bounded number of hopsH ≤ |V |. Thus, for the rest of this description,without loss of generality, we consider our networkgraph to be leveled, directed, acyclic, and we considerevery node to be reachable from the source.

h) Establishing a feedback mechanism for thepackets :As we already said, we request that eachpacket be acknowledged using a secure acknowledge-ment scheme by the destination. The purpose of theseackowledgements is to identify nodes that drop pack-ets. The adversary can definitely interfere with eitherthe packet or the ack propagation process. However,any adversarial action can be summarized as follows:the source has received a negative acknowledgementfrom some intermediate node, indicating the down-stream edge on the path (we call it theseparator)that appears to have failed. All the edges on the pathfrom sender to receiver prior to the separator arecalledlucky. All the edges on the path from sender toreceiver beyond the separator edge are calledunlucky.

i) Quantitatively ranking the incoming edges:Once the source receives a NACK from a node, itviews this as a positive ack from all edges prior tonode sending NACK, and as a NACK from all theedges beyond this node. As in [3], the source ranksthe edges based on the percentage of “luck” theybring. Luck is the ratio of positive acks from the edge,and the total number of acks (positive and negative)associated with this edge. Each node will be selectingthe best edge in terms of “luck” on its path towardsthe sender. Let us call such edge a “parent edge”.The set of parent edges forms a routing tree rootedat the source. In reality, routing from source to thedestination is performed over the tree from its root

Page 6: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

6

(source) towards the destination, from parents to theirchildren on the path.

The next question is how to set edge weights as afunction of each edge’s “luck”. The greedy strategy(which does not work) would be to set edge weightsin decreasing order of luckiness. Instead, it turns outthat setting edge weights based on theexponent ofluckiness— i.e. assigning weightβx, whereβ is aparameter between0 and 1, and x is the numberof times that an edge has been unlucky — willessentially yield optimal performance. The value ofβ is crucial for the performance of our algorithm.Low values of β result in our decision sequenceresembling that of the greedy algorithm, and thuswill be vulnerable to adversarial attacks. On the otherhand, values ofβ which are close to 1 will cause thealgorithm to respond slowly to changes. The optimalvalue ofβ can be determined using the “best expert”framework from machine learning (see [12], [8]).The mathematical analysis in Section VI indicatesthat the optimal value ofβ for withstanding arbitraryadversarial attacks is essentially1− 3

√m/HT , where

m is the number of edges in the network,H is anupper bound on the path length, andT is the lengthof time that the protocol has been running.

j) Choosing a random path from the source:When the source wishes to select a path to the desti-nation along which to route packets, it emulates thefollowing process in the layered graph: the destinationat layer 0 selects a parent node in layer 1 randomlyaccording to its probability distribution on incomingedges. This node, in turn, uses its probability distri-bution on incoming edges to sample a parent nodein layer 2, and so on up the layers of the graphuntil the source is reached. The path from source todestination is chosen to be the reverse of this path. Anequivalent (but less computationally efficient) way ofsampling this random path would be to haveeverynode in the layered graph sample a random parentusing its probability distribution, resulting in a treeof edges directed outward from the source; the sourcethen designates a path to the destination which is theunique such path contained in this directed tree.

k) Sampling unexplored territory:A subtle butimportant component of the algorithm is its ability tosample edges in the network, since this is the onlyway it is able to detect changes in the adversarialfault pattern, and to sample relatively “stale” portionsof the network. This means that the algorithm isoccasionally (and probabilistically) deviating from the

optimal routes in order to make sure that feedback isobtained from each edge in the network.

This deviation is accomplished by forcing thealgorithm to pass, with a small probability whichis completely under our control, over a completelyrandom edgee = (u, v) in the layered graph, usinga different path-sampling process. The portion of thepath from the source toe is computed as above, i.e.we start atu and work our way backward to thesource, level by level, using the designated probabilitydistribution at each node along the way to samplethe next edge. The portion of the path frome to thedestination may be taken to be an arbitrary path, e.g.a fixed route determined at the outset.

Let us give some intuition on why this is desirable.By allowing β to be small the algorithm will movequickly to the least fault path, but will only utilizeeven slightly less reliable paths with low probability.In order to ensure that the algorithm is making correctdecisions, it continuously needs to gain up-to-datefeedback on the other paths in the network. Thisis accomplished by random sampling. If every datapacket were used as a sampling packet, meaningthe sampling rate was 100%, the algorithm wouldhave extremely high loss rates from exploring faultypaths, but would have extremely accurate estimatesof the current reliability of paths in the network. Onthe other hand, if the algorithm only samples pathswith 1% of its traffic, then it will be slightly slowerin detecting changes, but will be sending a higherpercentage of its packets over links that do not fail.

V. SPECIFICATION OF THE ALGORITHM

Our algorithm is a variant of the algorithm in [3]for adapting to a reliable network path. We will usethe following notations. We are given a directed graphwith specified senders and receiverr. Time steps runfrom 1 to T and are denoted byt. Cost functionsCt :E → {0, 1} are specified by an adaptive adversary.The interpretation ofCt is that1 represents an edgefailure, 0 represents an edge which does not fail. Ineach time step the algorithm chooses a pathπt from sto r and is charged a cost of1 if any edge of this pathfailed, 0 otherwise. The algorithm receives feedbackregarding the location of the first edge failure (i.e. theone nearest tos) if there were any edge failures.

As explained above, we first transform the networkgraph into a leveled directed graphG = (V, E) withsenders and receiverr, and withH levels altogether,

Page 7: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

7

as described in section IV-B. The algorithm runs inphases of lengthτ ; later we will specify τ so asto optimize the regret. (The optimal value ofτ willbe roughly 3

√(m/H)2T .) A typical phase will be

denoted byφ.

For each vertexv, the algorithm maintains a blackboxBEX(v), which runs an online learning algorithm,namely the Weighted Majority Algorithm of [12].(For pseudocode, see Figure 5.) This algorithm de-pends on a parameterβ = 1 − ε, whereε will bespecified later after the analysis of the algorithm.(The optimum ε will be roughly 3

√m/HT .) The

black boxBEX(v) outputs a probability distributionpφ on the incoming edges ofv during each phaseφ.This probability distribution does not change duringthe phase. (Roughly speaking,pφ(e) is proportionalto βc(e) where c(e) is the cost of edgee in thephases precedingφ.) At the end of a phase,BEX(v)receives as input a vector containing a “simulatedcost” Cφ(e) ∈ [0, 1] for each incoming edgee ofv. The black box satisfies the following performanceguarantee:∑

φ

∑e

pφ(e)Cφ(e) ≤∑

φ

Cφ(e0)+O

(εT

τ+

log(∆)ε

).

(1)(Here ∆ is the in-degree ofv, and e0 denotes anyfixed incoming edge ofv.) Roughly speaking, thismeans that the random edges output byBEX(v) per-form nearly as well as any single incoming edgee0, ifperformance is measured according to thesimulatedcost Cφ.

Every time stept within a phaseφ is classifiedas either asampling stepor an exploitation step,according to the outcome of an independent ran-dom coin toss with probabilityδ of determining asampling step. In an exploitation step each vertexv independently chooses an incoming edge usingthe distribution specified byBEX(v); this edge setdefines an arborescence directed away froms, and wechoose the path joinings to r in this arborescence.In a sampling step, we choose an edgee = (v, w)in G uniformly at random. We choose a randomarborescence directed away froms using the samerule as in an exploitation step. We take the path inthis arborescence froms to v, join it with the edgee,and the join it with “our favorite” path fromw to r.(Our favorite path may be, for example, the path fromw to r in some fixed arborescence directed towardr.The only important thing is that this path does notdepend one, only onw.) The sampling step is judged

to be “successful” if there were no edge failures onthe sub-path froms to w, otherwise unsuccessful.

The simulated cost of an edgee in a phaseφ is thenumber of unsuccessful sampling steps for that edge,divided by the expected total number of samplingsteps. To express this symbolically in terms of aformula, let At be the random arborescence chosenat timet using the black-box at each vertex, letπt(v)denote the path froms to v in this arborescence, andlet Ct(v) denote the maximum ofCt(e) over all edgesin πt(v). Finally let χt(e) denote the event thatt isa sampling step fore and letσt(e) denote the eventthat t is an unsuccessful sampling step fore. Thenfor an edgee = (v, w),

Cφ(e) =∑

t∈φ σt(e)δτ/m

.

The pseudo-code for all steps in the algorithm isspecified in Figures 3, 4, 5.

for φ = 0, 1, . . . , bT/τcfor t = τφ + 1, . . . , τ(φ + 1)

/* Choose routing paths for phaseφ. */Call ROUTE procedure to obtain pathπt

and distinguished edgeet.π+

t ← ACK’ed sub-path ofπt.if et 6= null andet 6∈ π+

t

σt(et) ← 1σt(e) ← 0 for all other edgese.

endfor e ∈ E, /* Compute simulated edge costs. */

Cφ(e) ← (∑

t∈φ σt(e)/(δτ/m).for (u, v) ∈ E /* Update edge probabilities */

νφ+1(u, v) ← BEXφ(v)[u, v].end /* End main loop */

Fig. 3. Routing algorithm.

VI. A NALYSIS

A. Overview

Our goal is to prove, foreveryvertex v, that therandom pathπt(v) is nearly the optimum path froms to v, up to a regret term which depends on thedistance froms to v. This claim will be establishedby induction on the distance froms to v, the basecases = v being trivial.

In order for this idea to work, we need a provablerelationship between our simulated costs and the truecosts. Proving this in the context of an adaptiveadversary is trickier than in the oblivious case, which

Page 8: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

8

Function ROUTE

TR ← a fixed tree inG directed towardr.for v ∈ V

Sample a random incoming edgein(v) usingprobabilitiesνφ(·, v).

TS ← {in(v) : v ∈ V }.Flip coin with δ probability of heads.if heads /* Exploration */

Chooseet = (v, w) uniformly at random fromE;Π(v) ← the path froms to v in TS ;Π(w) ← the path fromw to r in TR;πt ← Π(v) · (v → w) · Π(w)

else/* Exploitation */et ← null ;πj ← the path froms to r in TS ;

return (πt, et)

Fig. 4. Path generation procedureROUTE with parameterδ.

Function BEXφ(v)[u, v]/* Sum weights of incoming edges tov. */W ← ∑

(w,v)∈E βCφ(w,v)

/* Return probability of edge(u, v). */return βCφ(u,v)/W

Fig. 5. BEXφ(v): Black box BEXφ(v) with parameterβ usingscoresCφ(v, w); picks edge(v, u) with prob. νφ(v, u).

explains why we have a subtler definition ofCφ thanin [3].

The bound on the algorithm’s performance is ex-pressed in terms of two notions of cost for a pathπfrom s to r. For such a path, we defineBCOSTt(π)to be 1 if any edge ofπ failed at time t, and 0otherwise. We defineACOSTt(π) to be the totalnumber of edge failures onπ at time t. Finally, wedefine

BCOST (π) =1T

T∑

t=1

BCOSTt(π)

ACOST (π) =1T

T∑

t=1

ACOSTt(π).

Thus,BCOST (π) measures the average number oftimes the pathπ failed, andACOST (π) measuresthe average cumulative number of edge failures onπ.

Theorem 6.1:For appropriate choices of the pa-rametersτ, ε, δ, the algorithm’s performance satisfies

the bound

E(BCOST (algorithm)) ≤ E(ACOST (π))

+ O

(H2m log(mT ) log(∆)

T

)1/3 .

Note that this implies that forT sufficientlylarge, i.e. for T = ω(H2m log(mT ) log(∆)), theactual cost of the algorithmE(BCOST (algorithm)will only be marginally larger than the benchmarkcostE(ACOST (π)). The following subsections willprove the theorem, and Section VII will examine whatthis asymptotic bound means in practice.

B. Characterization ofCφ

For a random variableY , let Et(Y ) denote theconditional expectation ofY , conditioned on all ofthe algorithm’s random decisions before timet.

Lemma 6.2:For each edgee = (v, w) and eachtime stept, Et(σt(e)) ≤ δ

m [Et(Ct(e)) + Et(Ct(v))] .

Proof: From the definition ofσt(e) we have

σt(e) = χt(e) max{Ct(e), Ct(v)}.Hence

Et(σt(e))

= Et(χt(e)) ·Et(max{Ct(e), Ct(v)} ‖χt(e) = 1)

≤ (δ/m)Et(Ct(e) + Ct(v) ‖χt(e) = 1).

The lemma now follows, because the random vari-ablesCt(e) andCt(v) are independent ofχt(e). In thecase ofCt(e), this is simply because the algorithm’sdecision of whether or nott will be a sampling stepfor e is independent of the adaptive adversary’s choiceof edge costs at timet. In the case ofCt(v), it isalso because the random pathπv is sampled from adistribution which does not depend on whethert is asampling step for edgee.

Corollary 6.3:

τE(Cφ(e)) ≤ [E(Ct(e)) + E(Ct(v))] .

Lemma 6.4:For a vertexv, let E−(v) be the setof incoming edges ofv. For a time stept in phaseφ,

Et(Ct(πt(v))) =m

δ

e∈E−(v)

pφ(e)Et(σt(e))

.

Proof: Both sides are equal to the probability ofan edge failure on a random path sampled accordingto the following rule: choose a random incoming edge

Page 9: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

9

at v according to the probability distributionpφ, let ube the other endpoint of this edge, and join the edgeto the random pathπt(u).

The other thing we need to know aboutCφ beforecontinuing with the analysis is that it is boundedabove by a constant, with high probability.

Lemma 6.5:If τ ≥ m log(mT )/δ then with prob-ability at least1− 1/mT , Cφ(e) < 5 for all edgeseand all phasesφ.

Proof: Under the hypothesis onτ the expectednumber of sampling steps for any edge in any phaseis at leastlog(mT ). The true number of samplingsteps for edgee in phaseφ is a sum of independentBernoulli random variables, so Chefnoff’s bound tellsus that

Pr

t∈φ

χt(e) > 5 log(mT )

<[e4/55

]log(mT )< e−2 log(mT ) = (1/mT )2.

Using the fact thatσt(e) ≤ χt(e), this immediatelygives an upper bound of(1/mT )2 on the probabilitythat Cφ(e) > 5. Summing over all pairs(e, φ), weobtain the result stated in the lemma.

In the following analysis, we will assume through-out thatCφ(e) is bounded above by 5. The probabilitythat this assumption fails is< 1/mT , and in the eventthat it fails the total cost of all paths is at mostT , sothis event contributes at most1/m to the algorithm’sexpected cost and may therefore be ignored.

C. Local performance guarantee

The induction step of the analysis is a local perfor-mance guarantee relating the average cost ofπt(w)to the average cost ofπt(v), wherev is an upstreamneighbor ofw in G.

Lemma 6.6:If e = (v, w) is any edge ofG, then

1T

T∑

t=1

E(Ct(πt(w))) ≤ 1T

T∑

t=1

E(Ct(πt(v)))

+ 1T

∑Tt=1 E(Ct(e)) + O

(ε + τ log(∆)

εT

).

Proof: The performance guarantee forBEX(w)ensures that

τ

T

φ

e∈E−(w)

pφ(e)Cφ(e)

≤ τ

T

φ

Cφ(e0) + O

(ε +

τ log(∆)εT

).

Take the expectation of both sides of this inequality.Using Lemma 6.4, the expectation of the left side is

τ

T

T∑

t=1

e∈E−(w)

pφ(e)E(σt(e))

m

τδ

=1T

(T∑

t=1

E(Ct(πt(w)))

).

Using Corollary 6.3, the expectation of the right sideis bounded above by

1T

T∑

t=1

E(Ct(πt(v)))

+1T

T∑

t=1

E(Ct(e)) + O

(ε +

τ log(∆)εT

).

D. Global performance guarantee

Let π be any static path froms to r. Denote thenodes of π by s = v0, v1, . . . , vH = r, and letπj denote the subpath ofπ joining s to vj . Usinginduction on j, Lemma 6.6 implies the followingregret bound for eachj:

1T

T∑

t=1

E(Ct(πt(vj))) (2)

≤ 1T

T∑

t=1

∑e∈πj

E(Ct(e)) + O

(εj +

τ log(∆)jεT

).

Applying (2) for j = H, and adding in the expectedcost of the sampling steps, which is bounded aboveby δT , we obtain

E(BCOST (algorithm)) ≤ E(ACOST (π))

+ O

(εH +

τ log(∆)HεT

+ δ

).

Recall that τ is constrained to be at leastm log(mT )/δ. We optimize the right side by setting

ε =(

m log(mT ) log(∆)HT

)1/3

δ = εH

τ = m log(mT )/δ

which leads to the bound asserted in Theorem 6.1.

Page 10: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

10

VII. I MPLEMENTATION RESULTS

In order to substantiate the claims made in thiswork, simulations were conducted to investigate theconvergence time of the algorithm. The simulationswere conducted by developing a simple programwhich would simulate the decision making processof the algorithm and examine its performance againstadversarial inputs. The simulation consisted of asource selecting a path to the destination at eachunit of time. An adversarial model would then selectwhich nodes at the current unit of time were faulty.The packet would traverse the graph and receivepositive feedback from the destination if there wereno faulty nodes on the path, or from the last non-faulty node before the packet was dropped. Using thisfeedback the algorithm would adjust its probabilitiesand compute a new path for the next packet.

Source Destination

0.05

0.10 0.04

0.03

1 3

5

42

0

Fig. 6. Simple Simulation Topology

Beta = 0.5 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Pe

rce

nt

on

Lin

k 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Fig. 7. Simple Configuration Results

A. Simple Configuration

The first set of simulations consisted of 10,000packets which were sent from the source to thedestination. The nodes were arranged as indicatedin figure 6. The intermediary nodes are labelled 0

Beta = 0.1 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Pe

rce

nt

on

Lin

k 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Fig. 8. Simple Configuration Results

Beta = 0.05 Sample Rate = 0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Perc

en

t o

n L

ink 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Fig. 9. Simple Configuration Results

through 5 and their fault rates are indicated. At everyunit of time each node was probabilistically selectedto fail based on the nodes fault rate. On this particularsimple configuration the optimal path would be fromthe source to node 1, from node 1 to node 3, andthen from node 3 to node 5. The results of thesimulation are the sources estimated link preferencemetrics at every unit of time. The experiment wasrun for different values ofβ, which is the base of theexponent described in the algorithm specification. Asthe value ofβ decreases, the algorithm responds fasterto changes and is able to converge faster. However, asthe algorithm responds faster it begins to resemble thegreedy algorithm which has vulnerabilities. In orderto explore the effects ofβ on the convergence timeof the algorithm, multiple experiments were run withthe same adversarial input. The results are indicated inFigures 7,8, and 9. These figures show the probabilityof the source selecting a specific edge at every unitof time.

With β=0.5 the results show that the source is

Page 11: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

11

Beta = 0.05 Sample Rate = 0.01

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001 10001

Packets Sent

Perc

en

t o

n L

ink 0 to 1

0 to 2

1 to 3

1 to 4

2 to 3

2 to 4

3 to 5

4 to 5

Fig. 10. Simple Configuration Results

slowly realizing which path is correct and is sendinga majority of its traffic over the correct sequence ofnodes. Notice that once the algorithm converged itwas sending approximately 90% of its traffic acrossthe first hop to node 1 which was experiencing a 5%loss rate and only 10% of its traffic to node 2 whichhad a 10% loss rate. While this seems reasonable,notice the convergence whenβ=0.1 or better yet 0.05.With these values the source is able to completelydifferentiate between nodes and converge quickly onthe best path. Since the algorithm is continuouslyexploring sub-optimal paths with a small fraction ofits traffic, this low value ofβ allows it to react quicklyas the adversary changes its fault pattern. While theoptimal algorithm might know the fault pattern aheadof time, the algorithm we present is able to followit very closely. If the speed at which the adversarymoves is slightly slower then our decision makingprocess, then our algorithm would in fact follow theoptimal at every step.

In order to investigate the effects of the samplingpercentage on the convergence time an additionalexperiment was done using the same simple con-figuration described above, but with the samplingrate decreased from 10% to 1%. The results of thisexperiment are indicated in Figure 10. It appearsthat the lower sampling rate inhibits the algorithmsability to converge as well as the 10% sampling rateindicated in the Figure 9.

B. Large Configuration

The previous example provided a demonstration ofthe effects of various parameters on the performanceof our algorithm. While the previous example showedthe convergence of the algorithm, it is somewhat less

0

4

5

7

8

9

10

11 14

16

17

19

20

21

22

23

24

25

1

2

3 6 15

13

DestinationSource

12 18

Fig. 11. Large Simulation Topology

Beta=0.05 Sample Rate=0.01

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001

Packets Sent

Perc

en

t o

n L

ink

Fig. 12. Large Simulation Results

interesting since the example consisted of a small setof both nodes and links. In this second example wewill consider a network with 25 nodes forming a 10layered graph, with 3 nodes at each layer (exceptat the source and destination). The topology of thenetwork is indicated in Figure 11. In this examplethere is exists an optimal path from the source to thedestination which experiences no loss. This optimalpath is indicated in Figure 11 by the bold line. Allother links in the network exhibit 10% loss, meaningthat when we sample them they will successfullyforward the packet 90% of the time. As a result,this should make it more difficult for our algorithmto discover the optimal path since the non-optimalpaths in the network only experience marginal loss.This simulation consisted of a source attempting todeliver 10,000 packets to the destination. The graphsin Figures 12 and 13 show the results when we areperforming random sampling with 1% and 10% ofour packets respectively. In this experimentβ equalto 0.05 was used.

The results indicate that the algorithm is ableto successfully converge on the optimal path afterapproximately 1000 packets are sent. Once the al-gorithm learned the best path it was able to send ap-proximately 99% of its traffic successfully to the des-tination. The graph visually indicates this by showingthe source’s link preferences at every unit of time.

Page 12: BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT ... · 1 ProvablyCompetitiveAdaptiveRouting BaruchAwerbuch(JHU)DavidHolmer(JHU)RobertKleinberg(MIT)HerbertRubens(JHU) Abstract—An

12

Beta=0.05 Sample Rate=0.10

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1001 2001 3001 4001 5001 6001 7001 8001 9001

Packets Sent

Pe

rce

nt

on

Lin

k

Fig. 13. Large Simulation Results

When the simulation begins the source considers allof the links in the network to be equal and then learnstheir reliability by sending traffic across the linksand receiving feedback. As the number of packets(or trials) increases, the source’s knowledge of thenetwork continuously becomes more accurate. This isevident as the reliable paths become separated fromthe less reliable paths and are selected with near 100%probability. Since the source is continuously samplingthe less desirable edges it is able to respond quicklyto changes in the adversarial fault pattern.

CONCLUSION

Through mathematical analysis and simulation re-sults we have presented an online adaptive routingalgorithm and have shown the algorithm’s compet-itive performance under a strong adversarial modelconsisting of dynamic proactive adversarial attacks,where the network may be completely controlled byadaptive adversaries.

The results of this work affirm the validity of ourapproach and motivate the need for future work in thisdirection. We intend on implementing this protocol ina more realistic simulation environment and exploringthe effects of both mobility and more sophisticatedactive adversarial attacks on the algorithm.

REFERENCES

[1] David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek,and Robert Morris. Resilient overlay networks. InProceed-ings 18th ACM SOSP, Banff, Canada, 10 October 2001.

[2] Baruch Awerbuch, Dave Holmer, Cristina Nita-Rotaru, andHerb Rubens. An on-demand secure routing protocolresilent to byzantine failures. InWireless Security WorkshopProceedings, September 2002.

[3] Baruch Awerbuch and Yishay Mansour. Online learning ofreliable network paths. InPODC, 2003.

[4] John S. Baras and Harsh Mehta. Dynamic adaptive routingin manets. InProc. Annual ARL CTA Symposium, 2003.

[5] E. Bonabeau, F. Henaux, S. Guerin, D. Snyers, P. Kuntz,and G. Theraulaz. Routing in telecommunications networkswith “smart” ant-like agents. InIntelligent Agents forTelecommunications Applications ’98 (IATA’98), 1998.

[6] G. Di Caro and M. Dorigo. AntNet: a mobile agentsapproach to adaptive routing. Technical Report IRIDIA/97-12, Universite Libre de Bruxelles, Belgium, 1997.

[7] Gianni Di Caro and Marco Dorigo. Antnet: Distributedstigmergetic control for communications networks.Journalof Artificial Intelligence Research, 9:317–365, 1998.

[8] Nicolo Cesa-Bianchi, Yoav Freund, David P. Helmbold,David Haussler, Robert E. Schapire, and Manfred K. War-muth. How to use expert advice. InProc. 25th ACM Symp.on Theory of Computing, pages 382–391, 1993. To appear,Journal of the Association for Computing Machinery.

[9] P. Druschel D. Subramaniam and J. Chen. Ants andreinforcement learning : A case study in routing in dynamicnetworks. InProceedings of IEEE MILCOM, Atlantic City,NJ, 1997.

[10] Marco Dorigo, Vittorio Maniezzo, and Alberto Colorni.The Ant System: Optimization by a colony of cooperatingagents. IEEE Transactions on Systems, Man, and Cyber-netics Part B: Cybernetics, 26(1):29–41, 1996.

[11] Adam Kalai and Santosh Vempala. Efficient algorithms forthe online decision problem. InProc. of 16th Conf. onComputational Learning Theory, Wash. DC, 2003.

[12] Nick Littlestone and Manfred K. Warmuth. Theweighted majority algorithm.Information and Computation,108:212–261, 1994. A preliminary version appeared inFOCS 1989.

[13] M. Littman and J. Boyan. A distributed reinforcementlearning scheme for network routing. InProceedings of theInternational Workshop on Applications of Neural Networksto Telecommunications. Alspector, J., Goodman, R. andBrown, T. X. (Ed.), pages 45–51, 1993.

[14] U. Sorges M. Gunes and I. Bouazizi. The ant colony basedrouting algorithm for manets. InProc. of the 2002 ICPPWorkshop on Ad Hoc Networks (IWAHN 2002), pages 79–85. IEEE Computer Society Press, 2002.

[15] U. Sorges M. Gunes and I. Bouazizi. “ara -” the ant colonybased routing algorithm for manets. InProc. 2002 ICPPWorkshop on Ad Hoc Networks (IWAHN 2002), pages 79–85. IEEE Computer Society Press Stephan Olariu edt., 2002.

[16] C. K. Tham S. Marwaha and D. Srinivasan. Mobile agentsbased routing protocol for mobile ad hoc networks. IninProceedings of IEEE Globecom,, 2002.

[17] R. Schoonderwoerd, O. E. Holland, and J. L. Bruten. Ant-like agents for load balancing in telecommunications net-works. In Proc. 1st ACM Intl. Conference on AutonomousAgents, Marina del Rey, California, pages 209–216, 1997.

[18] Ruud Schoonderwoerd, Owen E. Holland, Janet L. Bruten,and Leon J. M. Rothkrantz. Ant-based load balanc-ing in telecommunications networks.Adaptive Behavior,5(2):169–207, 1996.

[19] Sleator and Tarjan. Amortized efficiency of list update andpaging rules.Comm. ACM, 28(2):202–208, 1985.

[20] Devika Subramanian, Peter Druschel, and Johnny Chen.Ants and reinforcement learning: A case study in routingin dynamic networks. InIJCAI (2), pages 832–839, 1997.

[21] Eiji Takimoto and Manfred K. Warmuth. Path kernels andmultiplicative updates. InCOLT Proceedings, 2002.