
The BLUE Active Queue Management Algorithms

Wu-chang Feng, Member, ACM, Kang G. Shin, Fellow, IEEE and ACM, Dilip D. Kandlur, Member, IEEE, Debanjan Saha, Member, IEEE

Abstract—In order to stem the increasing packet loss rates caused by an exponential increase in network traffic, the IETF has been considering the deployment of active queue management techniques such as RED [14]. While active queue management can potentially reduce packet loss rates in the Internet, we show that current techniques are ineffective in preventing high loss rates. The inherent problem with these queue management algorithms is that they use queue lengths as the indicator of the severity of congestion. In light of this observation, a fundamentally different active queue management algorithm, called BLUE, is proposed, implemented and evaluated. BLUE uses packet loss and link idle events to manage congestion. Using both simulation and controlled experiments, BLUE is shown to perform significantly better than RED both in terms of packet loss rates and buffer size requirements in the network. As an extension to BLUE, a novel technique based on Bloom filters [2] is described for enforcing fairness among a large number of flows. In particular, we propose and evaluate Stochastic Fair BLUE (SFB), a queue management algorithm which can identify and rate-limit non-responsive flows using a very small amount of state information.

I. INTRODUCTION

It is important to avoid high packet loss rates in the Internet. When a packet is dropped before it reaches its destination, all of the resources it has consumed in transit are wasted. In extreme cases, this situation can lead to congestion collapse [19]. Improving the congestion control and queue management algorithms in the Internet has been one of the most active areas of research in the past few years. While a number of proposed enhancements have made their way into actual implementations, connections still experience high packet loss rates. Loss rates are especially high during times of heavy congestion, when a large number of connections compete for scarce network bandwidth.

Wu-chang Feng is now with the Department of Computer Science and Engineering, Oregon Graduate Institute at OHSU, Beaverton, OR (email: [email protected]); Kang G. Shin is with the Real-Time Computing Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109-2122 (email: [email protected]); Dilip Kandlur is with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 (email: [email protected]); Debanjan Saha is with Tellium, Inc., 185 Monmouth Park Highway, Route 36, P.O. Box 158, West Long Branch, NJ 07764 (email: [email protected]). Wu-chang Feng and Kang G. Shin were supported in part by an IBM Graduate Fellowship and the ONR under Grant No. N00014-99-1-0465, respectively.

Recent measurements have shown that the growing demand for network bandwidth has driven loss rates up across various links in the Internet [28]. In order to stem the increasing packet loss rates caused by an exponential increase in network traffic, the IETF is considering the deployment of explicit congestion notification (ECN) [12, 30, 31] along with active queue management techniques such as RED (Random Early Detection) [3, 14]. While ECN is necessary for eliminating packet loss in the Internet [9], we show that RED, even when used in conjunction with ECN, is ineffective in preventing packet loss.

The basic idea behind RED queue management is to detect incipient congestion early and to convey congestion notification to the end-hosts, allowing them to reduce their transmission rates before queues in the network overflow and packets are dropped. To do this, RED maintains an exponentially-weighted moving average of the queue length which it uses to detect congestion. When the average queue length exceeds a minimum threshold (min_th), packets are randomly dropped or marked with an explicit congestion notification (ECN) bit. When the average queue length exceeds a maximum threshold (max_th), all packets are dropped or marked.
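For concreteness, the following is a minimal C sketch of this drop/mark decision. It is not the authors' code: the state layout is an assumption, and the linear ramp p_b = max_p × (avg − min_th)/(max_th − min_th) follows the standard RED description. (The full algorithm additionally spaces out marks by scaling p_b by 1/(1 − count × p_b), as discussed in Section III-C.)

    #include <stdlib.h>

    /* Assumed RED state; names mirror the paper's notation. */
    typedef struct {
        double avg;     /* EWMA of the instantaneous queue length */
        double w_q;     /* EWMA weight, e.g. 0.002 */
        double min_th;  /* minimum threshold */
        double max_th;  /* maximum threshold */
        double max_p;   /* maximum marking probability */
    } red_state;

    /* Returns 1 if the arriving packet should be dropped or ECN-marked. */
    int red_should_mark(red_state *s, int qlen)
    {
        /* Update the exponentially-weighted moving average. */
        s->avg = (1.0 - s->w_q) * s->avg + s->w_q * qlen;

        if (s->avg < s->min_th)
            return 0;  /* no congestion indicated */
        if (s->avg >= s->max_th)
            return 1;  /* deterministically mark/drop every packet */

        /* Between the thresholds, mark with a probability that ramps
           linearly from 0 at min_th up to max_p at max_th. */
        double p_b = s->max_p * (s->avg - s->min_th) / (s->max_th - s->min_th);
        return ((double)rand() / RAND_MAX) < p_b;
    }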

While RED is certainly an improvement over traditional drop-tail queues, it has several shortcomings. One of the fundamental problems with RED and other active queue management techniques is that they rely on queue length as an estimator of congestion.¹ While the presence of a persistent queue indicates congestion, its length gives very little information as to the severity of congestion, that is, the number of competing connections sharing the link. In a busy period, a single source transmitting at a rate greater than the bottleneck link capacity can cause a queue to build up just as easily as a large number of sources can. From well-known results in queueing theory, it is only when packet interarrivals have a Poisson distribution that queue lengths directly relate to the number of active sources and thus the true level of congestion.

¹We note that at the time of initial publication [10], BLUE was the only active queue management algorithm which did not use queue length. Subsequent algorithms [1, 17, 20] have also shown the benefits of decoupling queue length from congestion management.


Unfortunately, packet interarrival times across network links are decidedly non-Poisson. Packet interarrivals from individual sources are driven by TCP dynamics and source interarrivals themselves are heavy-tailed in nature [21, 29]. This makes placing queue length at the heart of an active queue management scheme dubious. Since the RED algorithm relies on queue lengths, it has an inherent problem in determining the severity of congestion. As a result, RED requires a wide range of parameters to operate correctly under different congestion scenarios. While RED can achieve an ideal operating point, it can only do so when it has a sufficient amount of buffer space and is correctly parameterized [6, 34].

In light of the above observation, we propose a fundamentally different active queue management algorithm, called BLUE, which uses packet loss and link utilization history to manage congestion. BLUE maintains a single probability, which it uses to mark (or drop) packets when they are queued. If the queue is continually dropping packets due to buffer overflow, BLUE increments the marking probability, thus increasing the rate at which it sends back congestion notification. Conversely, if the queue becomes empty or if the link is idle, BLUE decreases its marking probability. Using both simulation and experimentation, we demonstrate the superiority of BLUE to RED in reducing packet losses even when operating with a smaller buffer. Using mechanisms based on BLUE, a novel mechanism for effectively and scalably enforcing fairness among a large number of flows is also proposed and evaluated.

The rest of the paper is organized as follows. Section II gives a description of RED and shows why it is ineffective at managing congestion. Section III describes BLUE and provides a detailed analysis and evaluation of its performance based on simulation as well as controlled experiments. Section IV describes and evaluates Stochastic Fair BLUE (SFB), an algorithm based on BLUE which scalably enforces fairness amongst a large number of connections. Section V compares SFB to other approaches which have been proposed to enforce fairness amongst connections. Finally, Section VI concludes with a discussion of future work.

II. BACKGROUND

One of the biggest problems with TCP's congestion control algorithm over drop-tail queues is that sources reduce their transmission rates only after detecting packet loss due to queue overflow. Since a considerable amount of time may elapse between the packet drop at the router and its detection at the source, a large number of packets may be dropped as the senders continue transmission at a rate that the network cannot support. RED alleviates this problem by detecting incipient congestion early and delivering congestion notification to the end-hosts, allowing them to reduce their transmission rates before queue overflow occurs. In order to be effective, a RED queue must be configured with a sufficient amount of buffer space to accommodate an applied load greater than the link capacity from the instant in time that congestion is detected using the queue length trigger, to the instant in time that the applied load decreases at the bottleneck link in response to congestion notification. RED must also ensure that congestion notification is given at a rate which sufficiently suppresses the transmitting sources without underutilizing the link. Unfortunately, when a large number of TCP sources are active, the aggregate traffic generated is extremely bursty [8, 9]. Bursty traffic often defeats the active queue management techniques used by RED since queue lengths grow and shrink rapidly, well before RED can react. Figure 1(a) shows a simplified pictorial example of how RED functions under this congestion scenario.

The congestion scenario presented in Figure 1(a) occurs when a large number of TCP sources are active and when a small amount of buffer space is used at the bottleneck link. As the figure shows, at t = 1, a sufficient change in aggregate TCP load (due to TCP opening its congestion window) causes the transmission rates of the TCP sources to exceed the capacity of the bottleneck link. At t = 2, the mismatch between load and capacity causes a queue to build up at the bottleneck. At t = 3, the average queue length exceeds min_th and the congestion-control mechanisms are triggered. At this point, congestion notification is sent back to the end hosts at a rate dependent on the queue length and marking probability max_p. At t = 4, the TCP receivers either detect packet loss or observe packets with their ECN bits set. In response, duplicate acknowledgements and/or TCP-based ECN signals are sent back to the sources. At t = 5, the duplicate acknowledgements and/or ECN signals make their way back to the sources to signal congestion. At t = 6, the sources finally detect congestion and adjust their transmission rates. Finally, at t = 7, a decrease in offered load at the bottleneck link is observed. Note that it has taken from t = 1 until t = 7 before the offered load becomes less than the link's capacity. Depending upon the aggressiveness of the aggregate TCP sources [8, 9] and the amount of buffer space available in the bottleneck link, a large amount of packet loss and/or deterministic ECN marking may occur. Such behavior leads to eventual underutilization of the bottleneck link.

[Figure omitted: timeline of RED under congestion, traced over seven steps between routers A and B on an L Mbs link: sending rate increases above L Mbs; queue increases; EWMA increases to trigger RED; queue increases some more; sinks generate DupAcks or ECN; DupAcks/ECN travel back; queue overflows and max_th is triggered; sources detect loss/ECN; sending rate falls below L Mbs with sustained packet loss and ECN observed; queue clears, but a period of underutilization is imminent due to the sustained packet loss and ECN.]

Fig. 1. RED example

[Figure omitted: sources send at exactly L Mbs between routers A and B; the queue drops and/or ECN-marks exactly the correct amount of packets to keep the sending rate of the sources at L Mbs; sinks generate DupAcks or ECN.]

Fig. 2. Ideal scenario

One way to solve this problem is to use a large amount of buffer space at the RED gateways. For example, it has been suggested that in order for RED to work well, an intermediate router requires buffer space that amounts to twice the bandwidth-delay product [34]. This approach, in fact, has been taken by an increasingly large number of router vendors. Unfortunately, in networks with large bandwidth-delay products, the use of large amounts of buffer adds considerable end-to-end delay and delay jitter. This severely impairs the ability to run interactive applications. In addition, the abundance of deployed routers which have limited memory resources makes this solution undesirable.
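To put the sizing rule in perspective, a quick back-of-the-envelope calculation (with an assumed 100ms round-trip time, a value not given in the text) illustrates the cost. For a 45 Mbs link such as the bottleneck used in the simulations below,

    2 × bandwidth-delay product = 2 × 45 Mbs × 100ms = 9 Mb ≈ 1.1 MB

and draining a full buffer of that size at 45 Mbs adds up to 200ms of queueing delay, which is exactly the kind of delay and jitter penalty described above.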

Figure 1(b) shows how an ideal queue management algorithm works. In this figure, the congested gateway delivers congestion notification at a rate which keeps the aggregate transmission rates of the TCP sources at or just below the clearing rate. While RED can achieve this ideal operating point, it can do so only when it has a sufficiently large amount of buffer space and is correctly parameterized.

III. BLUE

In order to remedy the shortcomings of RED, we propose, implement, and evaluate a fundamentally different queue management algorithm called BLUE. Using both simulation and experimentation, we show that BLUE overcomes many of RED's shortcomings. RED has been designed with the objective to (1) minimize packet loss and queueing delay, (2) avoid global synchronization of sources, (3) maintain high link utilization, and (4) remove biases against bursty sources. This section shows how BLUE either improves or matches RED's performance in all of these aspects. The results also show that BLUE converges to the ideal operating point shown in Figure 1(b) in almost all scenarios, even when used with very small buffers.

A. The algorithm

The key idea behind BLUE is to perform queue management based directly on packet loss and link utilization rather than on the instantaneous or average queue lengths. This is in sharp contrast to all known active queue management schemes which use some form of queue occupancy in their congestion management. BLUE maintains a single probability, p_m, which it uses to mark (or drop) packets when they are enqueued. If the queue is continually dropping packets due to buffer overflow, BLUE increments p_m, thus increasing the rate at which it sends back congestion notification. Conversely, if the queue becomes empty or if the link is idle, BLUE decreases its marking probability. This effectively allows BLUE to "learn" the correct rate it needs to send back congestion notification. Figure 3 shows the BLUE algorithm.

    Upon packet loss (or Qlen > L) event:
        if ((now - last_update) > freeze_time)
            p_m := p_m + δ1
            last_update := now

    Upon link idle event:
        if ((now - last_update) > freeze_time)
            p_m := p_m - δ2
            last_update := now

Fig. 3. The BLUE algorithm

Note that the figure also shows a variation to the algorithm in which the marking probability is updated when the queue length exceeds a certain value. This modification allows room to be left in the queue for transient bursts and allows the queue to control queueing delay when the size of the queue being used is large. Besides the marking probability, BLUE uses two other parameters which control how quickly the marking probability changes over time. The first is freeze_time. This parameter determines the minimum time interval between two successive updates of p_m. This allows the changes in the marking probability to take effect before the value is updated again. While the experiments in this paper fix freeze_time as a constant, this value should be randomized in order to avoid global synchronization [13]. The other parameters, δ1 and δ2, determine the amount by which p_m is incremented when the queue overflows or decremented when the link is idle. For the experiments in this paper, δ1 is set significantly larger than δ2. This is because link underutilization can occur when congestion management is either too conservative or too aggressive, but packet loss occurs only when congestion management is too conservative. By weighting heavily against packet loss, BLUE can quickly react to a substantial increase in traffic load. Note that there are a myriad of ways in which p_m can be managed. While the experiments in this paper study a small range of parameter settings, experiments with additional parameter settings and algorithm variations have also been performed, with the only difference being how quickly the queue management algorithm adapts to the offered load. It is a relatively simple process to configure BLUE to meet the goals of controlling congestion. The first parameter, freeze_time, should be set based on the effective round-trip times of connections multiplexed across the link in order to allow any changes in the marking probability to reflect back onto the end sources before additional changes are made. For long-delay paths such as satellite links, freeze_time should be increased to match the longer round-trip times. The second set of parameters, δ1 and δ2, are set to give the link the ability to effectively adapt to macroscopic changes in load across the link at the connection level. For links where extremely large changes in load occur only on the order of minutes, δ1 and δ2 should be set in conjunction with freeze_time to allow p_m to range from 0 to 1 on the order of minutes. This is in contrast to current queue length approaches where the marking and dropping probabilities range from 0 to 1 on the order of milliseconds even under constant load. Over typical links, using freeze_time values between 10ms and 500ms and setting δ1 and δ2 so that they allow p_m to range from 0 to 1 on the order of 5 to 30 seconds will allow the BLUE control algorithm to operate effectively. Note that while the BLUE algorithm itself is extremely simple, it provides a significant performance improvement even when compared to a RED queue which has been reasonably configured.

[Figure omitted: dumbbell network. Routers A, B, and C are connected in a chain by 45 Mbs, 10ms links; leaf nodes n0 through n4 attach to A and n5 through n9 attach to C over 100 Mbs links with propagation delays between 1ms and 20ms.]

Fig. 4. Network topology
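Since freeze_time, δ1, and δ2 fully determine the control loop, the event handlers of Figure 3 translate almost directly into code. The following is a minimal C sketch under the parameter guidance above; it is not the ALTQ implementation described in Section III-E, and the struct layout and the clamping of p_m to [0, 1] are illustrative assumptions:

    /* Assumed BLUE state; delta1 is set much larger than delta2,
       per the discussion above. */
    typedef struct {
        double p_m;          /* marking probability, kept in [0, 1] */
        double delta1;       /* increment on loss/overflow, e.g. 0.02 */
        double delta2;       /* decrement on link idle, e.g. 0.002 */
        double freeze_time;  /* min seconds between updates, e.g. 0.1 */
        double last_update;  /* time of the last change to p_m */
    } blue_state;

    /* Called on buffer overflow (or, in the variant, when the queue
       length exceeds the threshold L). */
    void blue_on_loss(blue_state *b, double now)
    {
        if (now - b->last_update > b->freeze_time) {
            b->p_m += b->delta1;
            if (b->p_m > 1.0)
                b->p_m = 1.0;
            b->last_update = now;
        }
    }

    /* Called when the queue empties or the link goes idle. */
    void blue_on_idle(blue_state *b, double now)
    {
        if (now - b->last_update > b->freeze_time) {
            b->p_m -= b->delta2;
            if (b->p_m < 0.0)
                b->p_m = 0.0;
            b->last_update = now;
        }
    }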

B. Packet loss rates using RED and BLUE

In order to evaluate the performance of BLUE, a number of simulation experiments were run using ns [23] over the small network shown in Figure 4. Using this network, Pareto on/off sources with mean on-times of 2 seconds and mean off-times of 3 seconds were run from one of the leftmost nodes (n0, n1, n2, n3, n4) to one of the rightmost nodes (n5, n6, n7, n8, n9). In addition, all sources were enabled with ECN support, were randomly started within the first 1 second of simulation, and used 1KB packets. Packet loss statistics were then measured after 100 seconds of simulation for 100 seconds. Loss statistics were also measured for RED using the same network and under identical conditions. For the RED queue, min_th and max_th were set to 20% and 80% of the queue size, respectively. RED's congestion notification mechanism was made as aggressive as possible by setting max_p to 1. For these experiments, this is the ideal setting of max_p since it minimizes both the queueing delay and packet loss rates for RED [9]. Given these settings, a range of RED configurations is studied which vary the value of w_q, the weight in the average queue length calculation for RED. It is interesting to note that as w_q gets smaller, the impact of queue length on RED's congestion management algorithm gets smaller. For extremely small values of w_q, RED's algorithm becomes decoupled from the queue length and thus acts more like BLUE. Table I shows the configurations used for RED. For the BLUE experiments, δ1 and δ2 are set so that δ1 is an order of magnitude larger than δ2. Using these values, the freeze_time is then varied between 10ms and 100ms. Additional simulations using a wider range of values were also performed and showed similar results.

    Configuration   w_q
    R1              0.0002
    R2              0.002
    R3              0.02
    R4              0.2

TABLE I
RED CONFIGURATIONS

    Configuration   freeze_time   δ1       δ2
    B1              10ms          0.0025   0.00025
    B2              100ms         0.0025   0.00025
    B3              10ms          0.02     0.002
    B4              100ms         0.02     0.002

TABLE II
BLUE CONFIGURATIONS

Figure 5 shows the loss rates observed over different queue sizes using both BLUE and RED with 1000 and 4000 connections present. In these experiments, the queue at the bottleneck link between A and B is sized from 100KB to 1000KB. This corresponds to queueing delays which range from 17.8ms to 178ms as shown in the figure. In all experiments, the link remains over 99.9% utilized. As Figure 5(a) shows, with 1000 connections, BLUE maintains zero loss rates over all queue sizes, even those below the bandwidth-delay product of the network [34]. This is in contrast to RED, which suffers double-digit loss rates as the amount of buffer space decreases. An interesting point in the RED loss graph shown in Figure 5(a) is that it shows a significant dip in loss rates at a buffering delay of around 80ms. This occurs because of a special operating point of RED when the average queue length stays above max_th all the time. At several points during this particular experiment, the buffering delay and offered load match up perfectly to cause the average queue length to stay at or above max_th. In this operating region, the RED queue marks every packet, but the offered load is aggressive enough to keep the queue full. This essentially allows RED to behave at times like BLUE with a marking probability of 1 and a queueing delay equivalent to max_th. This unique state of operation is immediately disrupted by any changes in the load or round-trip times, however. When the buffering delay is increased, the corresponding round-trip times increase and cause the aggregate TCP behavior to be less aggressive. Deterministic marking on this less aggressive load causes fluctuations in queue length which can increase packet loss rates since RED undermarks packets at times. When the buffering delay is decreased, the corresponding round-trip times decrease and cause the aggregate TCP behavior to be more aggressive. As a result, packet loss is often accompanied with deterministic marking. When combined, this leads again to fluctuations in queue length. At a load which is perfectly selected, the average queue length of RED can remain at max_th and the queue can avoid packet loss and prevent queue fluctuations by marking every packet. As Figure 5(b) shows, when the number of connections is increased to 4000, BLUE still significantly outperforms RED. Even with an order of magnitude more buffer space, RED still cannot match BLUE's loss rates using 17.8ms of buffering at the bottleneck link. It is interesting to note that BLUE's marking probability remains at 1 throughout the duration of all of these experiments. Thus, even though every packet is being marked, the offered load can still cause a significant amount of packet loss. The reason why this is the case is that the TCP sources being used do not invoke a retransmission timeout upon receiving an ECN signal with a congestion window of 1. Section III-D shows how this can significantly influence the performance of both RED and BLUE.

[Figure omitted: percent packet loss vs. buffer size (in ms of delay) for configurations B1 through B4 and R1 through R4. (a) 1000 sources, (b) 4000 sources.]

Fig. 5. Packet loss rates of RED and BLUE

The most important consequence of using BLUE is that congestion control can be performed with a minimal amount of buffer space. This reduces the end-to-end delay over the network, which, in turn, improves the effectiveness of the congestion control algorithm. In addition, smaller buffering requirements allow more memory to be allocated to high priority packets [5, 16], and frees up memory for other router functions such as storing large routing tables. Finally, BLUE allows legacy routers to perform well even with limited memory resources.

C. Understanding BLUE

To fully understand the difference between the RED and BLUE algorithms, Figure 6 compares their queue length plots in an additional experiment using the B4 configuration of BLUE and the R2 configuration of RED. In this experiment, a workload of infinite sources is changed by increasing the number of connections by 200 every 20 seconds. As Figure 6(a) shows, RED sustains continual packet loss throughout the experiment. In addition, at lower loads, periods of packet loss are often followed by periods of underutilization as deterministic packet marking and dropping eventually causes too many sources to reduce their transmission rates. In contrast, as Figure 6(b) shows, since BLUE manages its marking rate more intelligently, the queue length plot is more stable. Congestion notification is given at a rate which neither causes periods of sustained packet loss nor periods of continual underutilization. Only when the offered load rises to 800 connections does BLUE sustain a significant amount of packet loss.

Figure 7 plots the average queue length (Q_ave) and the marking probability (p_b/(1 − count × p_b)) of RED throughout the experiment. The average queue length of RED contributes directly to its marking probability since p_b is a linear function of Q_ave (p_b = max_p × (Q_ave − min_th)/(max_th − min_th)). As shown in Figure 7(a), the average queue length of RED fluctuates considerably as it follows the fluctuations of the instantaneous queue length. Because of this, the marking probability of RED, as shown in Figure 7(b), fluctuates considerably as well. In contrast, Figure 8 shows the marking probability of BLUE. As the figure shows, the marking probability converges to a value that results in a rate of congestion notification which prevents packet loss and keeps link utilization high throughout the experiment. In fact, the only situation where BLUE cannot prevent sustained packet loss is when every packet is being marked, but the offered load still overwhelms the bottleneck link. As described earlier, this occurs at t = 60s when the number of sources is increased to 800. The reason why packet loss still occurs when every packet is ECN-marked is that for these sets of experiments, the TCP implementation used does not invoke an RTO when an ECN signal is received with a congestion window of 1. This adversely affects the performance of both RED and BLUE in this experiment. Note that the comparison of marking probabilities between RED and BLUE gives some insight as to how to make RED perform better. By placing a low-pass filter on the calculated marking probability of RED, it may be possible for RED's marking mechanism to behave in a manner similar to BLUE's.
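The paper does not develop this low-pass-filter idea further; a minimal sketch of what it might look like is below, where alpha is an assumed smoothing gain and p_red is RED's instantaneous probability p_b/(1 − count × p_b):

    /* Hypothetical smoothing of RED's instantaneous marking probability.
       With a small alpha, the effective probability evolves slowly over
       time, much as BLUE's p_m does in Figure 8. */
    double red_lowpass(double p_smooth, double p_red, double alpha)
    {
        return (1.0 - alpha) * p_smooth + alpha * p_red;
    }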

[Figure omitted: actual queue length (KB) vs. time (s). (a) RED, (b) BLUE.]

Fig. 6. Queue length plots of RED and BLUE

[Figure omitted: (a) average queue length Q_ave (KB) vs. time (s); (b) marking probability p_b/(1 − count × p_b) vs. time (s).]

Fig. 7. Marking behavior of RED

[Figure omitted: marking probability p_m vs. time (s).]

Fig. 8. Marking behavior of BLUE (p_m)

While low packet loss rates, low queueing delays, and high link utilization are extremely important, the queue length and marking probability plots allow us to explore the effectiveness of RED and BLUE in preventing global synchronization and in removing biases against bursty sources. RED attempts to avoid global synchronization by randomizing its marking decision and by spacing out its marking. Unfortunately, when aggregate TCP load changes dramatically, as it does when a large number of connections are present, it becomes impossible for RED to achieve this goal. As Figure 7(b) shows, the marking probability of RED changes considerably over very short periods of time. Thus, RED fails to mark packets evenly over time and hence cannot remove synchronization among sources. As Figure 8 shows, the marking probability of BLUE remains steady. As a result, BLUE marks packets randomly and evenly over time. Consequently, it does a better job in avoiding global synchronization.

Another goal of RED is to eliminate biases against bursty sources in the network. This is done by limiting the queue occupancy so that there is always room left in the queue to buffer transient bursts. In addition, the marking function of RED takes into account the last packet marking time in its calculations in order to reduce the probability that consecutive packets belonging to the same burst are marked. Using a single marking probability, BLUE achieves the same goal equally well. As the queue length plot of BLUE shows (Figure 6), the queue occupancy remains below the actual capacity, thus allowing room for a burst of packets. In addition, since the marking probability remains smooth over large time scales, the probability that two consecutive packets from a smoothly transmitting source are marked is the same as with two consecutive packets from a bursty source.

D. The effect of ECN timeouts

All of the previous experiments use TCP sources which support ECN, but do not perform a retransmission timeout upon receipt of an ECN signal with a congestion window of 1. This has a significant, negative impact on the packet loss rates observed for both RED and BLUE, especially at high loads. Figure 9 shows the queue length plots of RED and BLUE using the same experiments as in Section III-B, but with TCP sources enabled with ECN timeouts. Figure 9(a) shows that by deterministically marking packets at max_th, RED oscillates between periods of packet loss and periods of underutilization as described in Section II. Note that this is in contrast to Figure 6(a) where, without ECN timeouts, TCP is aggressive enough to keep the queue occupied when the load is sufficiently high. An interesting point to make is that RED can effectively prevent packet loss by setting its max_th value sufficiently far below the size of the queue. In this experiment, a small amount of loss occurs since deterministic ECN marking does not happen in time to prevent packet loss. While the use of ECN timeouts allows RED to avoid packet loss, the deterministic marking eventually causes underutilization at the bottleneck link. Figure 9(b) shows the queue length plot of BLUE over the same experiment. In contrast to RED, BLUE avoids deterministic marking and maintains a marking probability that allows it to achieve high link utilization while avoiding sustained packet loss over all workloads.

[Figure omitted: actual queue length (KB) vs. time (s). (a) RED, (b) BLUE.]

Fig. 9. Queue length plots of RED and BLUE with ECN timeouts

Figure 10 shows the corresponding marking behavior of both RED and BLUE in the experiment. As the figure shows, BLUE maintains a steady marking rate which changes as the workload is changed. On the other hand, RED's calculated marking probability fluctuates from 0 to 1 throughout the experiment. When the queue is fully occupied, RED overmarks and drops packets, causing a subsequent period of underutilization as described in Section II. Conversely, when the queue is empty, RED undermarks packets, causing a subsequent period of high packet loss as the offered load increases well beyond the link's capacity.

[Figure omitted: marking probability vs. time (s). (a) p_b/(1 − count × p_b) of RED, (b) p_m of BLUE.]

Fig. 10. Marking behavior with ECN timeouts

Figure 11 shows how ECN timeouts impact the performance of RED and BLUE. The figure shows the loss rates and link utilization using the 1000 and 4000 connection experiments in Section III-B. As the figure shows, BLUE maintains low packet loss rates and high link utilization across all experiments. The figure also shows that the use of ECN timeouts allows RED to reduce the amount of packet loss in comparison to Figure 5. However, because RED often deterministically marks packets, it suffers from poor link utilization unless correctly parameterized. The figure shows that only an extremely small value of w_q (Configuration R1) allows RED to approach the performance of BLUE. As described earlier, a small w_q value effectively decouples congestion management from the queue length calculation, making RED queue management behave more like BLUE.

[Figure omitted: percent packet loss and percent link utilization vs. buffer size (in ms of delay) for B1 through B4 and R1 through R4. (a) Loss rates (1000 sources), (b) link utilization (1000 sources), (c) loss rates (4000 sources), (d) link utilization (4000 sources).]

Fig. 11. Performance of RED and BLUE with ECN timeouts

E. Implementation

In order to evaluate BLUE in a more realistic setting, it has been implemented in FreeBSD 2.2.7 using ALTQ [4]. In this implementation, ECN uses two bits of the type-of-service (ToS) field in the IP header [31]. When BLUE decides that a packet must be dropped or marked, it examines one of the two bits to determine if the flow is ECN-capable. If it is not ECN-capable, the packet is simply dropped. If the flow is ECN-capable, the other bit is set and used as a signal to the TCP receiver that congestion has occurred. The TCP receiver, upon receiving this signal, modifies the TCP header of the return acknowledgment using a currently unused bit in the TCP flags field. Upon receipt of a TCP segment with this bit set, the TCP sender invokes congestion-control mechanisms as if it had detected a packet loss.
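As an illustration of the router-side check, the sketch below uses the ECT and CE bit positions defined for the two ECN bits of the ToS field in the experimental ECN specification [31]; the function itself is hypothetical and is not the ALTQ code:

    #include <stdint.h>

    /* The two ECN bits of the IP ToS byte as defined in the experimental
       ECN specification [31]: ECT (ECN-Capable Transport) and CE
       (Congestion Experienced). */
    #define IPTOS_ECT 0x02
    #define IPTOS_CE  0x01

    /* Hypothetical notification step: returns 1 if the packet must be
       dropped, 0 if it was ECN-marked instead. */
    int notify_congestion(uint8_t *tos)
    {
        if (!(*tos & IPTOS_ECT))
            return 1;          /* flow is not ECN-capable: drop */
        *tos |= IPTOS_CE;      /* mark; the receiver echoes this in TCP */
        return 0;
    }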

Using this implementation, several experiments were run on the testbed shown in Figure 12. Each network node and link is labeled with the CPU model and link bandwidth, respectively. Note that all links are shared Ethernet segments. Hence, the acknowledgments on the reverse path collide and interfere with data packets on the forward path. As the figure shows, FreeBSD-based routers using either RED or BLUE queue management on their outgoing interfaces are used to connect the Ethernet and Fast Ethernet segments. In order to generate load on the system, a variable number of netperf [26] sessions are run from the IBM PC 360 and the Winbook XL to the IBM PC 365 and the Thinkpad 770. The router queue on the congested Ethernet interface of the Intellistation ZPro is sized at 50KB, which corresponds to a queueing delay of about 40ms. For the experiments with RED, a configuration with a min_th of 10KB, a max_th of 40KB, a max_p of 1, and a w_q of 0.002 was used. For the experiments with BLUE, a δ1 of 0.01, a δ2 of 0.001 and a freeze_time of 50ms was used. To ensure that the queue management modifications did not create a bottleneck in the router, the testbed was reconfigured exclusively with Fast Ethernet segments and a number of experiments between network endpoints were run using the BLUE modifications on the intermediate routers. In all of the experiments, the sustained throughput was always above 80 Mbps.

[Figure omitted: experimental testbed. End hosts (a WinBookXL, 233 MHz/32 MB; an IBM PC 360, 150 MHz/64 MB; an IBM PC 365, 200 MHz/64 MB; and a Thinkpad 770, 266 MHz/64 MB) are connected over 100 Mbs segments through two Intellistation routers (a ZPro, 400 MHz/128 MB, and an MPro, 200 MHz/64 MB) joined by a shared 10 Mbs Ethernet segment.]

Fig. 12. Experimental testbed

Figures 13(a) and (b) show the throughput and packet loss rates at the bottleneck link across a range of workloads. The throughput measures the rate at which packets are forwarded through the congested interface, while the packet loss rate measures the ratio of the number of packets dropped at the queue to the total number of packets received at the queue. In each experiment, throughput and packet loss rates were measured over five 10-second intervals and then averaged. Note that the TCP sources used in the experiment do not implement ECN timeouts. As Figure 13(a) shows, both the BLUE queue and the optimally configured RED queue maintain relatively high levels of throughput across all loads. However, since RED periodically allows the link to become underutilized, its throughput remains slightly below that of BLUE. As Figure 13(b) shows, RED sustains increasingly high packet loss as the number of connections is increased. Since aggregate TCP traffic becomes more aggressive as the number of connections increases, it becomes difficult for RED to maintain low loss rates. Fluctuations in queue lengths occur so abruptly that the RED algorithm oscillates between periods of sustained marking and packet loss and periods of minimal marking and link underutilization. In contrast, BLUE maintains relatively small packet loss rates across all loads. At higher loads, when packet loss is observed, BLUE maintains a marking probability which is approximately 1, causing it to mark every packet it forwards.

[Figure omitted: (a) throughput (Mbs) and (b) percent packet loss vs. number of connections, for Blue and RED.]

Fig. 13. Queue management performance

IV. STOCHASTIC FAIR BLUE

Up until recently, the Internet has mainly relied on the cooperative nature of TCP congestion control in order to limit packet loss and fairly share network resources. Increasingly, however, new applications are being deployed which do not use TCP congestion control and are not responsive to the congestion signals given by the network. Such applications are potentially dangerous because they drive up the packet loss rates in the network and can eventually cause congestion collapse [19, 28]. In order to address the problem of non-responsive flows, a lot of work has been done to provide routers with mechanisms for protecting against them [7, 22, 27]. The idea behind these approaches is to detect non-responsive flows and to limit their rates so that they do not impact the performance of responsive flows. This section describes and evaluates Stochastic Fair BLUE (SFB), a novel technique based on Bloom filters [2] for protecting TCP flows against non-responsive flows. Based on the BLUE algorithm, SFB is highly scalable and enforces fairness using an extremely small amount of state and a small amount of buffer space.

A. The algorithm

Figure 14 shows the basic SFB algorithm. SFB is a FIFO queueing algorithm that identifies and rate-limits non-responsive flows based on accounting mechanisms similar to those used with BLUE. SFB maintains N × L accounting bins. The bins are organized in L levels with N bins in each level. In addition, SFB maintains L independent hash functions, each associated with one level of the accounting bins. Each hash function maps a flow, via its connection ID (source address, destination address, source port, destination port, protocol), into one of the N accounting bins in that level. The accounting bins are used to keep track of queue occupancy statistics of packets belonging to a particular bin. This is in contrast to Stochastic Fair Queueing [24] (SFQ) where the hash function maps flows into separate queues. Each bin in SFB keeps a marking/dropping probability p_m as in BLUE, which is updated based on bin occupancy. As a packet arrives at the queue, it is hashed into one of the N bins in each of the L levels. If the number of packets mapped to a bin goes above a certain threshold (i.e., the size of the bin), p_m for the bin is increased. If the number of packets drops to zero, p_m is decreased.

    init()
        B[l][n]: Allocate L × N array of bins (L levels, N bins per level)

    enque()
        Calculate hashes h_0, h_1, ..., h_{L-1};
        Update bins at each level
        for i = 0 to L - 1
            if (B[i][h_i].qlen > bin_size)
                B[i][h_i].p_m += Δ;
                Drop packet;
            else if (B[i][h_i].qlen == 0)
                B[i][h_i].p_m -= Δ;
        p_min = min(B[0][h_0].p_m, ..., B[L-1][h_{L-1}].p_m);
        if (p_min == 1)
            ratelimit()
        else
            Mark/drop with probability p_min;

Fig. 14. SFB algorithm

The observation which drives SFB is that a non-responsive flow quickly drives p_m to 1 in all of the L bins it is hashed into. Responsive flows may share one or two bins with non-responsive flows; however, unless the number of non-responsive flows is extremely large compared to the number of bins, a responsive flow is likely to be hashed into at least one bin that is not polluted with non-responsive flows and thus has a normal p_m value. The decision to mark a packet is based on p_min, the minimum p_m value of all bins to which the flow is mapped. If p_min is 1, the packet is identified as belonging to a non-responsive flow and is then rate-limited. Note that this approach is akin to applying a Bloom filter on the incoming flows. In this case, the dictionary of messages or words is learned on the fly and consists of the IP headers of the non-responsive flows which are multiplexed across the link [2]. When a non-responsive flow is identified using these techniques, a number of options are available to limit the transmission rate of the flow. In this paper, flows identified as being non-responsive are simply limited to a fixed amount of bandwidth. This policy is enforced by limiting the rate of packet enqueues for flows with p_min values of 1. Figure 15 shows an example of how SFB works. As the figure shows, a non-responsive flow drives up the marking probabilities of all of the bins it is mapped into. While the TCP flow shown in the figure may map into the same bin as the non-responsive flow at a particular level, it maps into normal bins at other levels. Because of this, the minimum marking probability of the TCP flow is below 1.0 and thus it is not identified as being non-responsive. On the other hand, since the minimum marking probability of the non-responsive flow is 1.0, it is identified as being non-responsive and rate-limited.

[Figure omitted: a non-responsive flow and a TCP flow hashed by h_0, h_1, ..., h_{L-2}, h_{L-1} into bins 0 through B-1 of each level. The non-responsive flow maps only into bins with P = 1.0, giving P_min = 1.0; the TCP flow shares one such bin but also maps into bins with P = 0.3 and P = 0.2, giving P_min = 0.2.]

Fig. 15. Example of SFB
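To make the per-packet path concrete, the following C sketch mirrors the enqueue-time logic of Figure 14. Everything not pinned down by the paper is an assumption here: the FNV-1a hash seeded per level (the paper only requires L independent hash functions over the connection ID), the Δ step size, and the zero-initialized global bin array:

    #include <stddef.h>
    #include <stdint.h>

    #define L_LEVELS 2      /* levels, as in Section IV-B */
    #define N_BINS   23     /* bins per level */
    #define BIN_SIZE 13     /* packets per bin */
    #define DELTA    0.005  /* assumed p_m step size */

    typedef struct { int qlen; double p_m; } sfb_bin;
    static sfb_bin B[L_LEVELS][N_BINS];  /* zero-initialized */

    /* Connection ID used for hashing; assumed zero-initialized so that
       struct padding hashes deterministically. */
    typedef struct {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    } conn_id;

    /* Illustrative per-level hash (FNV-1a seeded with the level number). */
    static unsigned sfb_hash(const conn_id *c, int level)
    {
        const uint8_t *p = (const uint8_t *)c;
        uint32_t h = 2166136261u ^ (uint32_t)level;
        for (size_t i = 0; i < sizeof(*c); i++)
            h = (h ^ p[i]) * 16777619u;
        return h % N_BINS;
    }

    /* Updates the bins for an arriving packet and returns p_min.  The
       caller rate-limits the flow when p_min == 1, and otherwise
       marks/drops the packet with probability p_min. */
    double sfb_classify(const conn_id *c)
    {
        double p_min = 1.0;
        for (int i = 0; i < L_LEVELS; i++) {
            sfb_bin *bin = &B[i][sfb_hash(c, i)];
            if (bin->qlen > BIN_SIZE) {
                bin->p_m += DELTA;
                if (bin->p_m > 1.0) bin->p_m = 1.0;
            } else if (bin->qlen == 0) {
                bin->p_m -= DELTA;
                if (bin->p_m < 0.0) bin->p_m = 0.0;
            }
            if (bin->p_m < p_min) p_min = bin->p_m;
        }
        return p_min;
    }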

Note that just as BLUE's marking probability can be used in SFB to provide protection against non-responsive flows, it is also possible to apply Adaptive RED's max_p parameter [9] to do the same. In this case, a per-bin max_p value is kept and updated according to the behavior of flows which map into the bin. As with RED, however, there are two problems which make this approach ineffective. The first is the fact that a large amount of buffer space is required in order to get RED to perform well. The second is that the performance of a RED-based scheme is limited since even a moderate amount of congestion requires a max_p setting of 1. Thus, RED, used in this manner, has an extremely difficult time distinguishing between a non-responsive flow and moderate levels of congestion. In order to compare approaches, Stochastic Fair RED (SFRED) was also implemented by applying the same techniques used for SFB to RED.

B. Evaluation

Using ns, the SFB algorithm was simulated in the same network as in Figure 4 with the transmission delay of all of the links set to 10ms. The SFB queue is configured with 200KB of buffer space and maintains two hash functions, each mapping to 23 bins. The size of each bin is set to 13, approximately 50% more than 1/23 of the available buffer space. Note that by allocating more than 1/23 of the buffer space to each bin, SFB effectively "overbooks" the buffer in an attempt to improve statistical multiplexing. Notice that even with overbooking, the size of each bin is quite small. Since BLUE performs extremely well under constrained memory resources, SFB can still effectively maximize network efficiency. The queue is also configured to rate-limit non-responsive flows to 0.16 Mbps.

In the experiments, 400 TCP sources and one non-responsive, constant-rate source are run for 100 seconds from randomly selected nodes in (n0, n1, n2, n3, n4) to randomly selected nodes in (n5, n6, n7, n8, n9). In one experiment, the non-responsive flow transmits at a rate of 2 Mbps while in the other, it transmits at a rate of 45 Mbps. Table III shows the packet loss observed in both experiments for SFB. As the table shows, for both experiments, SFB performs extremely well. The non-responsive flow sees almost all of the packet loss as it is rate-limited to a fixed amount of the link bandwidth. In addition, the table shows that in both cases, a very small amount of packets from TCP flows are lost. Table III also shows the performance of RED. In contrast to SFB, RED allows the non-responsive flow to maintain a throughput relatively close to its original sending rate. As a result, the remaining TCP sources see a considerable amount of packet loss which causes their performance to deteriorate. SFRED, on the other hand, does slightly better at limiting the rate of the non-responsive flow; however, it cannot fully protect the TCP sources from packet loss since it has a difficult time discerning non-responsive flows from moderate levels of congestion. Finally, the experiments were repeated using SFQ with an equivalent number of bins (i.e., 46 distinct queues) and a buffer more than twice the size (414KB), making each queue equally sized at 9KB. For each bin in the SFQ, the RED algorithm was applied with min_th and max_th values set at 2KB and 8KB, respectively. As the table shows, SFQ with RED does an adequate job of protecting TCP flows from the non-responsive flow. However, in this case, partitioning the buffers into such small sizes causes a significant amount of packet loss to occur due to RED's inability to operate properly with small buffers. Additional experiments show that as the amount of buffer space is decreased even further, the problem is exacerbated and the amount of packet loss increases considerably.

                          2Mbps non-responsive flow      45Mbps non-responsive flow
    Packet Loss (Mbps)    SFB    RED    SFRED  SFQ+RED   SFB     RED     SFRED   SFQ+RED
    Total                 1.86   1.79   3.10   3.60      44.85   13.39   42.80   46.47
    Non-responsive flow   1.85   0.03   0.63   1.03      44.84   10.32   40.24   43.94
    All TCP flows         0.01   1.76   2.57   2.47      0.01    3.07    2.56    2.53

TABLE III
SFB LOSS RATES IN MBPS (ONE NON-RESPONSIVE FLOW)

To qualitatively examine the impact that the non-responsive flow has on TCP performance, Figure 16(a) plots the throughput of all 400 TCP flows using SFB when the non-responsive flow sends at a 45 Mbps rate. As the figure shows, SFB allows each TCP flow to maintain close to a fair share of the bottleneck link's bandwidth while the non-responsive flow is rate-limited to well below its transmission rate. In contrast, Figure 16(b) shows the same experiment using normal RED queue management. The figure shows that the throughput of all TCP flows suffers considerably as the non-responsive flow is allowed to grab a large fraction of the bottleneck link bandwidth. Figure 16(c) shows that while SFRED does succeed in rate-limiting the non-responsive flow, it also manages to drop a significant amount of packets from TCP flows. This is due to the fact that the lack of buffer space and the ineffectiveness of max_p combine to cause SFRED to perform poorly as described in Section IV-A. Finally, Figure 16(d) shows that while SFQ with RED can effectively rate-limit the non-responsive flows, the partitioning of buffer space causes the fairness between flows to deteriorate as well. The large amount of packet loss induces a large number of retransmission timeouts across a subset of flows which causes significant amounts of unfairness [25]. Thus, through the course of the experiment, a few TCP flows are able to grab a disproportionate amount of the bandwidth while many of the flows receive significantly less than a fair share of the bandwidth across the link. In addition to this, SFQ with RED allows 1/46 of the 400 flows to be mapped into the same queue as the non-responsive flow. Flows that are unlucky enough to map into this bin receive an extremely small amount of the link bandwidth. SFB, in contrast, is able to protect all of the TCP flows in this experiment.

[Figure omitted: per-flow throughput (Mbs) vs. flow number (0 to 400), with the fair share marked. (a) SFB: non-responsive flow throughput = 0.16 Mbs; (b) RED: non-responsive flow throughput = 34.68 Mbs; (c) SFRED: non-responsive flow throughput = 4.76 Mbs; (d) SFQ+RED: non-responsive flow throughput = 0.10 Mbs.]

Fig. 16. Bandwidth of TCP flows (45Mbs non-responsive flow)


C. Limitations of SFB

While it is clear that the basic SFB algorithm can protect TCP-friendly flows from non-responsive flows without maintaining per-flow state, it is important to understand how it works and its limitations. SFB effectively uses L levels with N bins in each level to create N^L virtual buckets. This allows SFB to effectively identify a single non-responsive flow in an N^L flow aggregate using O(L × N) amount of state. For example, in the previous section, using two levels with 23 bins per level effectively creates 529 buckets. Since there are only 400 flows in the experiment, SFB is able to accurately identify and rate-limit a single non-responsive flow without impacting the performance of any of the individual TCP flows. As the number of non-responsive flows increases, the number of bins which become "polluted" or have p_m values of 1 increases. Consequently, the probability that a responsive flow gets hashed into bins which are all polluted, and thus becomes misclassified, increases. Clearly, misclassification limits the ability of SFB to protect well-behaved TCP flows.

Using simple probabilistic analysis, Equation (1) gives a closed-form expression of the probability that a well-behaved TCP flow gets misclassified as being non-responsive as a function of the number of levels (L), the number of bins per level (B), and the number of non-responsive/malicious flows (M), respectively:

    p = [1 - (1 - 1/B)^M]^L                    (1)

In this expression, when L is 1, SFB behaves much like SFQ. The key difference is that SFB using one level is still a FIFO queueing discipline with a shared buffer, while SFQ has separate per-bin queues and partitions the available buffer space amongst them.

Using the result from Equation (1), it is possible to optimize the performance of SFB given a priori information about its operating environment. Suppose the number of simultaneously active non-responsive flows can be estimated (M) and the amount of memory available for use in the SFB algorithm is fixed (C). Then, by minimizing the probability function in Equation (1) with the additional boundary condition that L × N = C, SFB can be tuned for optimal performance. To demonstrate this, the probability of misclassification across a variety of settings is evaluated. Figure 17 shows the probability of misclassifying a flow when the total number of bins is fixed at 900; the number of levels used in SFB and the number of non-responsive flows are varied. As the figure shows, when the number of non-responsive flows is small compared to the number of bins, the use of multiple levels keeps the probability of misclassification extremely low.
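Equation (1) can be evaluated directly. The following sketch (ours; the function name and the sampled values of M are illustrative choices, not taken from the evaluation) computes the misclassification probability for the 900-bin budget of Figure 17:

    def p_misclassify(L, B, M):
        # Eq. (1): a well-behaved flow is misclassified only if, at each
        # of the L levels, it lands in a bin shared with at least one of
        # the M non-responsive flows.
        return (1.0 - (1.0 - 1.0 / B) ** M) ** L

    # 900 bins total, split evenly across L levels, as in Fig. 17.
    for M in (50, 450, 800):
        for L in (1, 2, 3):
            print(f"M={M:3d} L={L} B={900 // L:3d} "
                  f"p={p_misclassify(L, 900 // L, M):.4f}")

For small M, the multi-level configurations yield a far lower misclassification probability; once M grows past roughly half the total number of bins, the single-level configuration gives the smallest probability, matching the crossover discussed below.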

[Figure 17 omitted: probability of misclassification (0 to 1) plotted against the number of non-responsive flows (0 to 900), with one curve each for L=1, L=2, and L=3.]

Fig. 17. Probability of misclassification using 900 bins

[Figure 18 omitted: per-flow throughput (Mbs) plotted against flow number with the fair share marked; non-responsive flows' throughput = 0.21 Mbs.]

Fig. 18. Bandwidth of TCP flows using SFB with 8 non-responsive flows

However, as the number of non-responsive flows increases past half the number of bins present, the single-level SFB queue affords the smallest probability of misclassification. This is due to the fact that when the bins are distributed across multiple levels, each non-responsive flow pollutes a larger number of bins. For example, using a single-level SFB queue with 90 bins, a single non-responsive flow pollutes only one bin. Using a two-level SFB queue with each level containing 45 bins, the number of effective bins is 45 × 45 (2025). However, a single non-responsive flow pollutes two bins (one per level). Thus, the advantage gained by the two-level SFB queue is lost when additional non-responsive flows are added, as a larger fraction of bins become polluted compared to the single-level situation.

In order to evaluate the performance degradation of SFB as the number of non-responsive flows increases, Figure 18 shows the bandwidth plot of the 400 TCP flows when 8 non-responsive flows are present.


In these experiments, each non-responsive flow transmits at a rate of 5 Mbs. As Equation (1) predicts, in an SFB configuration that contains two levels of 23 bins, 8.96% (36) of the TCP flows are misclassified when 8 non-responsive flows are present. When the number of non-responsive flows approaches N, the performance of SFB deteriorates quickly as an increasing number of bins at each level becomes polluted. In the case of 8 non-responsive flows, approximately 6 bins, or one-fourth of the bins in each level, are polluted. As the figure shows, the number of misclassified flows matches the model quite closely. Note that even though a larger number of flows are misclassified as the number of non-responsive flows increases, the probability of misclassification in a two-level SFB still remains below that of SFQ or a single-level SFB. Using the same number of bins (46), the equation predicts that SFQ and a single-level SFB misclassify 16.12% of the TCP flows (64) when 8 non-responsive flows are present.
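Both predictions are direct evaluations of Equation (1); using the p_misclassify sketch above:

    # Two levels of 23 bins versus one level of 46 bins, with M = 8:
    print(p_misclassify(2, 23, 8))   # ~0.0896, i.e., about 36 of 400 flows
    print(p_misclassify(1, 46, 8))   # ~0.1612, i.e., about 64 of 400 flows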

D. SFB with moving hash functions

In this section, two basic problems with the SFB algorithm are addressed. The first, as described above, is to mitigate the effects of misclassification. The second is to detect when non-responsive flows become responsive and to reclassify them when they do.

The idea behind SFB with moving hash functions is to periodically or randomly reset the bins and change the hash functions. A non-responsive flow will continually be identified and rate-limited regardless of the hash function used. However, by changing the hash function, responsive TCP flows that happen to map into polluted bins will potentially be remapped into at least one unpolluted bin. Note that this technique effectively creates virtual bins across time just as the multiple levels of bins in the original algorithm create virtual bins across space. In many ways, the effect of using moving hash functions is analogous to channel hopping in CDMA [18, 33] systems. It essentially reduces the likelihood of a responsive connection being continually penalized due to erroneous assignment into polluted bins.

To show the effectiveness of this approach, the idea of moving hash functions was applied to the experiment in Figure 18. In this experiment, 8 non-responsive flows along with 400 responsive flows share the bottleneck link. To protect against continual misclassification, the hash function is changed every two seconds. Figure 19(a) shows the bandwidth plot of the experiment. As the figure shows, SFB performs fairly well. While flows are sometimes misclassified, causing a degradation in performance, none of the TCP-friendly flows are shut out due to misclassification. This is in contrast to Figure 18, where a significant number of TCP flows receive very little bandwidth.

While the moving hash functions improve fairness across flows in the experiment, it is interesting to note that every time the hash function is changed and the bins are reset, non-responsive flows are temporarily placed on "parole". That is, non-responsive flows are given the benefit of the doubt and are no longer rate-limited. Only after these flows cause sustained packet loss are they identified and rate-limited again. Unfortunately, this can potentially allow such flows to grab much more than their fair share of bandwidth over time. For example, as Figure 19(a) shows, the non-responsive flows are allowed to consume 3.85 Mbs of the bottleneck link. One way to solve this problem is to use two sets of bins. While one set of bins is being used for queue management, a second set of bins using the next set of hash functions can be warmed up. In this case, any time a flow is classified as non-responsive, it is hashed using the second set of hash functions and the marking probabilities of the corresponding bins in the warmup set are updated. When the hash functions are switched, the bins which have been warmed up are then used. Consequently, non-responsive flows are rate-limited right from the beginning. Figure 19(b) shows the performance of this approach. As the figure shows, the double-buffered moving hash effectively controls the bandwidth of the non-responsive flows and affords the TCP flows a very high level of protection. Note that one of the advantages of the moving hash function is that it can quickly react to non-responsive flows which become TCP-friendly. In this case, changing the hash bins places the newly reformed flow out on parole for good behavior. Only after the flow resumes transmitting at a high rate is it again rate-limited. Additional experiments show that this algorithm allows for quick adaptation to flow behavior [11].
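The sketch below (ours; the class and method names, seed handling, and swap interval are assumptions layered on the earlier sketch) illustrates the double-buffered scheme: the active bin set manages the queue while a warmup set, keyed by the next hash seed, is pre-polluted by flows already classified as non-responsive, so that swapping hash functions does not place them back on parole:

    import hashlib
    import random

    class DoubleBufferedBins:
        """Sketch of the double-buffered moving hash: bin set 0 is
        active, bin set 1 is warmed up under the next hash function."""

        def __init__(self, levels=2, bins=23, swap_interval=2.0):
            self.levels, self.bins = levels, bins
            self.swap_interval = swap_interval   # seconds between changes
            self.seeds = [random.random(), random.random()]
            self.p_m = [self._fresh_bins(), self._fresh_bins()]
            self.last_swap = 0.0

        def _fresh_bins(self):
            return [[0.0] * self.bins for _ in range(self.levels)]

        def _bin_index(self, which, level, flow_id):
            key = f"{self.seeds[which]}:{level}:{flow_id}".encode()
            digest = hashlib.sha256(key).digest()
            return int.from_bytes(digest[:4], "big") % self.bins

        def note_nonresponsive(self, flow_id):
            # Pre-pollute the warmup set so the flow is rate-limited
            # immediately after the next swap instead of being paroled.
            for level in range(self.levels):
                self.p_m[1][level][self._bin_index(1, level, flow_id)] = 1.0

        def maybe_swap(self, now):
            # Promote the warmed-up set and start warming a fresh one.
            if now - self.last_swap >= self.swap_interval:
                self.seeds[0], self.p_m[0] = self.seeds[1], self.p_m[1]
                self.seeds[1], self.p_m[1] = random.random(), self._fresh_bins()
                self.last_swap = now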

V. COMPARISONS TO OTHER APPROACHES

A. RED with Penalty Box

The RED with penalty box approach takes advantage of the fact that high-bandwidth flows see proportionally larger amounts of packet loss. By keeping a finite log of recent packet loss events, this algorithm identifies flows which are non-responsive based on the log [7]. Flows which are identified as being non-responsive are then rate-limited using a mechanism such as class-based queueing [15]. While this approach may be viable under certain circumstances, it is unclear how the algorithm performs in the face of a large number of non-responsive flows. Unless the packet loss log is large, a single set of high-bandwidth flows can potentially dominate the loss log and allow other non-responsive flows to go through without rate limitation.


[Figure 19 omitted: two panels plotting per-flow throughput (Mbs) against flow number, each with the fair share marked. Non-responsive flow throughput: (a) moving hash, 3.85 Mbs; (b) double buffered moving hash, 0.19 Mbs.]

Fig. 19. Bandwidth of TCP flows using modified SFB algorithms

In addition, flows which are classified as non-responsive remain in the "penalty box" even if they subsequently become responsive to congestion. A periodic and explicit check is thus required to move flows out of the penalty box. Finally, the algorithm relies on a TCP-friendliness check in order to determine whether or not a flow is non-responsive. Without a priori knowledge of the round-trip time of every flow being multiplexed across the link, it is difficult to accurately determine whether or not a connection is TCP-friendly.

B. Stabilized RED

Stabilized RED is another approach to detecting non-responsive flows [27]. In this case, the algorithm keeps a finite log of recent flows it has seen. The idea behind this is that non-responsive flows will always appear in the log multiple times and can be singled out for punishment. Unfortunately, for a large number of flows, using the last M flows can fail to catch non-responsive flows. For instance, consider a single non-responsive flow sending at a constant rate of 0.5 Mbps in an aggregate consisting of 1000 flows over a bottleneck link of 100 Mbps, where a fair share of bandwidth is 0.1 Mbps. In order to ensure that the non-responsive flow even shows up in the last M flows seen, M needs to be at least 200, or 20% of the total number of flows. In general, if there are a total of N flows and a non-responsive flow is sending at X times the fair share, M needs to be at least N/X in order to catch the flow. The SFB algorithm, on the other hand, has the property that the amount of state scales with the number of non-responsive flows. To ensure detection of the non-responsive flow in the above situation, a static 10×3 SFB queue, which keeps state on 30 bins, or 3% of the total number of flows, is sufficient. With the addition of mutating hash functions, an even smaller SFB queue can be used.

C. FRED

Another proposal for using RED mechanisms to provide fairness is Flow-RED (FRED) [22]. The idea behind FRED is to keep state based on the instantaneous queue occupancy of a given flow. If a flow continually occupies a large amount of the queue's buffer space, it is detected and limited to a smaller amount of the buffer space. While this scheme provides rough fairness in many situations, since the algorithm only keeps state for flows which have packets queued at the bottleneck link, it requires a large buffer space to work well. Without sufficient buffer space, it becomes difficult for FRED to detect non-responsive flows, as they may not have enough packets continually queued to trigger the detection mechanism. In addition, non-responsive flows are immediately re-classified as being responsive as soon as they clear their packets from the congested queue. For small queue sizes, it is quite easy to construct a transmission pattern which exploits this property of FRED in order to circumvent its protection mechanisms. Note that SFB does not directly rely on queue occupancy statistics, but rather on long-term packet loss and link utilization behavior. Because of this, SFB is better suited for protecting TCP flows against non-responsive flows using a minimal amount of buffer space. Finally, as with the packet loss log approach, FRED also has a problem when dealing with a large number of non-responsive flows. In this situation, the ability to distinguish these flows from normal TCP flows deteriorates considerably, since the queue occupancy statistics used in the algorithm become polluted. By not using packet loss as a means for identifying non-responsive flows, FRED cannot make the distinction between N TCP flows multiplexed across a link and N non-responsive flows multiplexed across a link.



D. RED with per-flow Queueing

A RED-based, per-active-flow approach has been proposed for providing fairness between flows [32]. The idea behind this approach is to do per-flow accounting and queueing only for flows which are active. The approach argues that, since keeping a large number of states is feasible, per-flow queueing and accounting is possible even in the core of the network. The drawback of this approach is that it provides no savings in the amount of state required. If N flows are active, O(N) states must be kept to isolate the flows from each other. In addition, this approach does not address the large amount of legacy hardware which exists in the network. For such hardware, it may be infeasible to provide per-flow queueing and accounting. Because SFB provides considerable savings in the amount of state and buffers required, it is a viable alternative for providing fairness efficiently.

E. Stochastic Fair Queueing

Stochastic Fair Queueing (SFQ) is similar to an SFB queue with only one level of bins. The biggest difference is that instead of having separate queues, SFB uses the hash function only for accounting purposes. Thus, SFB has two fundamental advantages over SFQ. The first is that it can make better use of its buffers. SFB gets some statistical multiplexing of buffer space, as it is possible for the algorithm to overbook buffer space to individual bins in order to keep the buffer space fully utilized. As described in Section IV-B, partitioning the available buffer space adversely impacts the packet loss rates and the fairness amongst TCP flows. The other key advantage is that SFB is a FIFO queueing discipline. As a result, it is possible to change the hash function on the fly without having to worry about packet re-ordering caused by mapping flows into a different set of bins. Without additional tagging and book-keeping, applying the moving hash functions to SFQ can cause significant packet re-ordering.

VI. CONCLUSION AND FUTURE WORK

We have demonstrated the inherent weakness of current active queue management algorithms, which rely on queue occupancy to detect congestion. In order to address this problem, we have designed and evaluated a fundamentally different queue management algorithm called BLUE. BLUE uses the packet loss and link utilization history of the congested queue, instead of queue lengths, to manage congestion. In addition to BLUE, we have proposed and evaluated SFB, a novel algorithm for scalably and accurately enforcing fairness amongst flows in a large aggregate. Using SFB, non-responsive flows can be identified and rate-limited using a very small amount of state.

REFERENCES

[1] S. Athuraliya, S. Low, V. Li, and Q. Yin. REM Active Queue Management. IEEE Network Magazine, 15(3), May 2001.

[2] B. Bloom. Space/time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7), July 1970.

[3] R. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, and L. Zhang. Recommendations on Queue Management and Congestion Avoidance in the Internet. RFC 2309, April 1998.

[4] K. Cho. A Framework for Alternate Queueing: Towards Traffic Management by PC-UNIX Based Routers. USENIX 1998 Annual Technical Conference, June 1998.

[5] I. Cidon, R. Guerin, and A. Khamisy. Protective Buffer Management Policies. IEEE/ACM Transactions on Networking, 2(3), June 1994.

[6] S. Doran. RED Experience and Differentiated Queueing. In NANOG Meeting, June 1998.

[7] K. Fall and S. Floyd. Router Mechanisms to Support End-to-End Congestion Control. ftp://ftp.ee.lbl.gov/papers/collapse.ps, February 1997.

[8] W. Feng, D. Kandlur, D. Saha, and K. G. Shin. Techniques for Eliminating Packet Loss in Congested TCP/IP Networks. In UM CSE-TR-349-97, October 1997.

[9] W. Feng, D. Kandlur, D. Saha, and K. G. Shin. A Self-Configuring RED Gateway. In Proc. IEEE INFOCOM, pages 1320–1328, March 1999.

[10] W. Feng, D. Kandlur, D. Saha, and K. G. Shin. Blue: A New Class of Active Queue Management Algorithms. In UM CSE-TR-387-99, April 1999.

[11] W. Feng, D. Kandlur, D. Saha, and K. G. Shin. Stochastic Fair Blue: A Queue Management Algorithm for Enforcing Fairness. In Proc. IEEE INFOCOM, pages 1520–1529, April 2001.

[12] S. Floyd. TCP and Explicit Congestion Notification. Computer Communication Review, 24(5):10–23, October 1994.

[13] S. Floyd and V. Jacobson. On Traffic Phase Effects in Packet-Switched Gateways. Internetworking: Research and Experience, 3(3):115–156, September 1992.

[14] S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. ACM/IEEE Transactions on Networking, 1(4):397–413, August 1993.

[15] S. Floyd and V. Jacobson. Link-sharing and Resource Management Models for Packet Networks. IEEE/ACM Transactions on Networking, 3(4), August 1995.

[16] R. Guerin, S. Kamat, V. Peris, and R. Rajan. Scalable QoS Provision Through Buffer Management. In Proceedings of ACM SIGCOMM, September 1998.

[17] C. Hollot, V. Misra, D. Towsley, and W. Gong. On Designing Improved Controllers for AQM Routers Supporting TCP Flows. In Proceedings of IEEE INFOCOM, April 2001.

[18] IEEE 802.11 Working Group. IEEE 802.11 Standard, June 1997.

[19] V. Jacobson. Congestion Avoidance and Control. In Proceedings of ACM SIGCOMM, pages 314–329, August 1988.

[20] S. Kunniyur and R. Srikant. Analysis and Design of an Adaptive Virtual Queue (AVQ) Algorithm for Active Queue Management. In Proceedings of ACM SIGCOMM, August 2001.

[21] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE/ACM Transactions on Networking, 2(1), February 1994.

[22] D. Lin and R. Morris. Dynamics of Random Early Detection. In Proc. of ACM SIGCOMM, September 1997.

[23] S. McCanne and S. Floyd. ns-LBNL Network Simulator. http://www-nrg.ee.lbl.gov/ns/, 1996.

[24] P. McKenney. Stochastic Fairness Queueing. In Proc. IEEE INFOCOM, March 1990.

[25] R. Morris. TCP Behavior with Many Flows. In Proc. IEEE International Conference on Network Protocols, October 1997.

[26] Netperf. The Public Netperf Homepage. http://www.netperf.org/, 1998.

[27] T. Ott, T. Lakshman, and L. Gong. SRED: Stabilized RED. In Proceedings of IEEE INFOCOM, March 1999.

[28] V. Paxson. End-to-End Internet Packet Dynamics. In Proc. of ACM SIGCOMM, September 1997.

[29] V. Paxson and S. Floyd. Wide-Area Traffic: The Failure of Poisson Modeling. In Proc. of ACM SIGCOMM, pages 257–268, August 1994.

[30] K. K. Ramakrishnan and R. Jain. A Binary Feedback Scheme for Congestion Avoidance in Computer Networks. ACM Transactions on Computer Systems, 8(2):158–181, May 1990.

[31] K. Ramakrishnan and S. Floyd. A Proposal to Add Explicit Congestion Notification (ECN) to IP. RFC 2481, January 1999.

[32] B. Suter, T. V. Lakshman, D. Stiliadis, and A. Choudhury. Design Considerations for Supporting TCP with Per-flow Queueing. In Proc. IEEE INFOCOM, March 1998.

[33] V. K. Garg, K. Smolik, and J. E. Wilkes. Applications of CDMA in Wireless/Personal Communications. Prentice Hall Professional Technical Reference, October 1996.

[34] C. Villamizar and C. Song. High Performance TCP in ANSNET. Computer Communication Review, 24(5):45–60, October 1994.

Wu-chang Feng is an Assistant Professor at the Oregon Graduate Institute (OGI) at the Oregon Health and Science University (OHSU), where he is a member of the Systems Software Laboratory. Prior to joining OGI/OHSU, he served as a Senior Architect at Proxinet/Pumatech, Inc. He received his B.S. degree in Computer Engineering from Penn State University and his M.S.E. and Ph.D. degrees in Computer Science Engineering from the University of Michigan.

Kang G. Shin is the Kevin and Nancy O'Connor Chair Professor of Computer Science, and the Founding Director of the Real-Time Computing Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan. His current research focuses on QoS-sensitive networking and computing as well as on embedded real-time OS, middleware and applications, all with emphasis on timeliness and dependability.

He has supervised the completion of 42 PhD theses, and authored/coauthored over 600 technical papers and numerous book chapters in the areas of distributed real-time computing and control, computer networking, fault-tolerant computing, and intelligent manufacturing. He has co-authored (jointly with C. M. Krishna) a textbook, "Real-Time Systems," McGraw Hill, 1997. He received the Outstanding IEEE Transactions on Automatic Control Paper Award in 1987, Research Excellence Award in 1989, Outstanding Achievement Award in 1999, Service Excellence Award in 2000, and Distinguished Faculty Achievement Award in 2001 from The University of Michigan. He also coauthored papers with his students which received the Best Student Paper Awards from the 1996 IEEE Real-Time Technology and Application Symposium and the 2000 USENIX Technical Conference.

Dilip D. Kandlur heads the Networking Software and Services department at the IBM T. J. Watson Research Center. Since joining the IBM T. J. Watson Research Center, his research work has covered various aspects of providing quality of service in hosts and networks and their application to multimedia systems, network and server performance, web caching, etc. In particular, he has worked on protocols and architectures for providing quality of service in IP and ATM networks, their impact on transport protocols, and their realization in protocol stacks for large servers and switch/routers. He has been awarded an Outstanding Technical Achievement Award for his work in creating the QoS architecture for IBM server platforms.

Dr. Kandlur received the M.S.E. and Ph.D. degrees in Computer Science and Engineering from the University of Michigan, Ann Arbor. He is a member of the IEEE Computer Society and currently vice-chair of the IEEE Technical Committee on Computer Communications. Dr. Kandlur holds 10 U.S. patents and has been recognized as an IBM Master Inventor.

Debanjan Saha manages the advanced development group at Tellium, Inc. Prior to his tenure at Tellium, he spent several years at IBM Research and Lucent Bell Labs, where he designed and developed protocols for IP routers and Internet servers. He is actively involved with various standards bodies, most notably the IETF and OIF. He also serves as an editor of international journals and magazines and as a technical committee member of workshops and conferences. Dr. Saha has authored numerous technical articles on various topics of networking and is a frequent speaker at academic and industrial events. He holds a B.Tech degree from IIT, India, and M.S. and Ph.D. degrees from the University of Maryland at College Park, all in Computer Science.