Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... ·...

38
Flash Crowd Mitigation via Adaptive Admission Control Based on Application-Level Observations XUAN CHEN and JOHN HEIDEMANN University of Southern California We design an adaptive admission control mechanism, network early warning system (NEWS), to pro- tect servers and networks from flash crowds and maintain high performance for end-users. NEWS detects flash crowds from performance degradation in responses and mitigates flash crowds by ad- mitting incoming requests adaptively. We evaluate NEWS performance with both simulations and testbed experiments. We first investigate a network-limited scenarion in simulations. We find that NEWS detects flash crowds within 20 seconds. By discarding 32% of incoming requests, NEWS protects the target server and networks from overloading, reducing the response packet drop rate from 25% to 2%. For admitted requests, NEWS increases their response rate by two times. This performance is similar to the best static rate limiter deployed in the same scenario. We also inves- tigate the impact of detection intervals on NEWS performance, showing it affects both detection delay and false alarm rate. We further consider a server memory-limited scenario in testbed ex- periments, confirming that NEWS is also effective in this case. We also examine the runtime cost of NEWS traffic monitoring in practice and find that it consumes little CPU time and relatively small memory. Finally, we show NEWS effectively protects bystander traffic from flash crowds. Categories and Subject Descriptors: C.2.3 [Computer-Communication Networks]: Network Operations; C.4 [Computer Systems Organization]: Performance of Systems General Terms: System design, Performance and reliability Additional Key Words and Phrases: Flash crowds, admission control, simulations, experimentation with testbeds 1. INTRODUCTION Recent studies [Jung et al. 2002; Mahajan et al. 2001; Park and Lee 2001; Jamjoom and Shin 2003] show that the Internet is vulnerable to persistent overloading caused by flash crowds [Jung et al. 2002; Jamjoom and Shin 2003]. This material is based on work supported by DARPA via the Space and Naval Warfare Systems Center San Diego under Contract No. N66001-00-C-8066 (“SAMAN”), and by the CONSER project supported by the National Science Foundation. Author’s address: S. Chen, Information Sciences Institute, University of Southern California, Suite 1001, 4676 Admiralty Way, Marina del Rey, CA 90292; email: [email protected]. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected]. C 2005 ACM 1533-5399/05/0800-0532 $5.00 ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005, Pages 532–569.

Transcript of Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... ·...

Page 1: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via AdaptiveAdmission Control Based onApplication-Level Observations

XUAN CHEN and JOHN HEIDEMANNUniversity of Southern California

We design an adaptive admission control mechanism, network early warning system (NEWS), to pro-tect servers and networks from flash crowds and maintain high performance for end-users. NEWSdetects flash crowds from performance degradation in responses and mitigates flash crowds by ad-mitting incoming requests adaptively. We evaluate NEWS performance with both simulations andtestbed experiments. We first investigate a network-limited scenarion in simulations. We find thatNEWS detects flash crowds within 20 seconds. By discarding 32% of incoming requests, NEWSprotects the target server and networks from overloading, reducing the response packet drop ratefrom 25% to 2%. For admitted requests, NEWS increases their response rate by two times. Thisperformance is similar to the best static rate limiter deployed in the same scenario. We also inves-tigate the impact of detection intervals on NEWS performance, showing it affects both detectiondelay and false alarm rate. We further consider a server memory-limited scenario in testbed ex-periments, confirming that NEWS is also effective in this case. We also examine the runtime costof NEWS traffic monitoring in practice and find that it consumes little CPU time and relativelysmall memory. Finally, we show NEWS effectively protects bystander traffic from flash crowds.

Categories and Subject Descriptors: C.2.3 [Computer-Communication Networks]: NetworkOperations; C.4 [Computer Systems Organization]: Performance of Systems

General Terms: System design, Performance and reliability

Additional Key Words and Phrases: Flash crowds, admission control, simulations, experimentationwith testbeds

1. INTRODUCTION

Recent studies [Jung et al. 2002; Mahajan et al. 2001; Park and Lee 2001;Jamjoom and Shin 2003] show that the Internet is vulnerable to persistentoverloading caused by flash crowds [Jung et al. 2002; Jamjoom and Shin 2003].

This material is based on work supported by DARPA via the Space and Naval Warfare SystemsCenter San Diego under Contract No. N66001-00-C-8066 (“SAMAN”), and by the CONSER projectsupported by the National Science Foundation.Author’s address: S. Chen, Information Sciences Institute, University of Southern California, Suite1001, 4676 Admiralty Way, Marina del Rey, CA 90292; email: [email protected] to make digital or hard copies of part or all of this work for personal or classroom use isgranted without fee provided that copies are not made or distributed for profit or direct commercialadvantage and that copies show this notice on the first page or initial screen of a display alongwith the full citation. Copyrights for components of this work owned by others than ACM must behonored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,to redistribute to lists, or to use any component of this work in other works requires prior specificpermission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected]© 2005 ACM 1533-5399/05/0800-0532 $5.00

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005, Pages 532–569.

Page 2: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 533

Fig. 1. Comparison of TCP congestion control and NEWS.

Flash crowds happen when many end-users simultaneously send requests toone Web site usually because of special events attracting interest of the masspopulation. These events could be pre-scheduled such as a webcast of a pop-ular movie or unpredictable including natural disasters such as earthquakes,breaking news stories and links from popular Web sites (that is, the “slash-doteffect” [Adler 1999]).

Flash crowd traffic persistently overloads server or networks. When a flashcrowd happens, the volume of requests to the target Web server increases dra-matically. The magnitude may be tens or hundreds of times more than normalconditions. These requests may overload network connections [Mahajan et al.2001; Jung et al. 2002] and the target server. In the meanwhile, response traf-fic could also congest networks (often in the first few links where traffic con-centration is the largest). As a result, most users perceive unacceptably poorperformance. In addition, flash crowds unintentionally deny services to otherend-users who either share common network links with flash crowd traffic orretrieve unrelated information from the target server. We call the correspondingtraffic bystander traffic.

1.1 Aggregate-Level Control Between Requests and Responses

The Internet applies TCP congestion control to cope with resource con-straints [Floyd and Fall 1999; Jacobson 1988]. As shown in Figure 1(a), TCPadjusts its window size according to acknowledgments received and infers net-work congestion from packet loss. However, TCP is not able to relieve the persis-tent overloading during flash crowds because it only regulates per-connectionbehavior. Other congestion control algorithms that aggregate information per-host (for example, the congestion manager [Balakrishnan et al. 1999]) also failto solve this problem because connections arrive from many hosts during flashcrowds.

In this work, we propose network early warning system (NEWS), arouter-based system, to impose aggregate-level control between requests and

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 3: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

534 • X. Chen and J. Heidemann

Fig. 2. NEWS control diagrams.

responses. As shown in Figure 1(b), NEWS observes Web response performance(that is, flash crowd detection) and adjusts the incoming request rate to the tar-get server accordingly.

Figure 2(a) shows the logical relationship between NEWS and TCP. The In-ternet has many individual TCP congestion control loops to enforce the rightbehavior at the connection level. However, they only operate on their own anddo not cooperate with each other. It is the lack of cooperation that makes flashcrowds possible. On the other hand, NEWS enforces cooperation among indi-vidual TCPs. As a result, we have a global control of TCP connections duringflash crowds that will not allow excessive connections to overwhelm serversand networks. Figure 2(b) depicts the control logic of NEWS with a traditionalcontrol diagram.

1.2 Contributions

NEWS is a router-based adaptive admission control mechanism to imposeaggregate-level control between requests and responses. It has two novelaspects.

First, NEWS does not consider per-flow service requirements for incomingrequests like most admission control schemes [Breslau et al. 2000; Cetinkayaand Knightly 2000; Mundur et al. 1999; Rahin and Kara 1998]. Instead, itdetermines end-user perceived performance through measurement. Moreover,NEWS only measures performance for aggregates (details in Section 4.1),rather than for all existing flows like measurement-based admission control(MBAC) [Breslau et al. 2000; Cetinkaya and Knightly 2000].

NEWS is also different from schemes that monitor increases in requests di-rectly [Blazek et al. 2001; Jung et al. 2002]: it detects flash crowds from perfor-mance degradation in Web response. As discussed in Section 3.2, observation inresponse traffic captures overloading both at the target server and in networks.On the other hand, an increase in request rate alone does not necessarily imply

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 4: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 535

low performance for end-users. Therefore, we believe that we should detect flashcrowds by monitoring response performance.

Second, NEWS monitors changes in the performance of high-bandwidth re-sponses because of their sensitivity to overloading conditions. Based on thisobservation, NEWS adjusts the admitted request rate automatically and adap-tively. These two approaches make NEWS a self-tuning system, adaptive inboth server- and network-limited scenarios (as we will show in Section 7.2, 8.2,and 8.3).

In the case of multiple servers connecting through the NEWS router, wepropose the hot-spot identification algorithm (Section 4.4) to help NEWS reg-ulate incoming requests intelligently, that is, automatically identifying onehot server even if many other servers are operational behind the NEWSrouter.

1.3 NEWS Performance Evaluation

We first evaluate NEWS performance through simulations in Section 7. We findthat NEWS detects flash crowds within 20 seconds. By discarding about 32% ofincoming requests, NEWS protects servers and networks from overloading: re-ducing the response packet drop rate from 25% to 2%. NEWS also increases theresponse rate for admitted requests by two times. This performance is similarto the best possible static rate limiter deployed in the same scenario.

We further implement NEWS on a Linux-based router (Section 6). Withtestbed experiments (Section 8), we validate NEWS performance in a network-limited scenario quantitatively in a more realistic experimental model thansimulations.

We also configure a server-limited scenario where request processing ex-hausts the server’s memory. We show that NEWS is an adaptive system thateffectively prevents server and network overloading in both cases. More specifi-cally, NEWS detects flash in crowds in around one detection interval (88 secondswith 64 seconds’ detection interval). It automatically regulates incoming re-quests to a proper rate. As a result, NEWS protects target server and networksfrom overloading by discarding about 49% of excessive requests. NEWS showsa similar performance improvement for accepted end-users as in the simulationstudy.

We also find through both simulations and testbed experiments that NEWSeffectively protects traffic to nearby, bystander servers. For example, it reducesmedian end-to-end latency for bystander Web traffic by about 10 times (Sec-tion 8.6).

We investigate the overhead of NEWS detection algorithm in Section 8.5.1.Our analysis shows that NEWS has similar algorithmic performance as a state-ful fire-wall. We further examine NEWS’ runtime overhead with testbed exper-iments, measuring router CPU and memory usage during a flash crowd. Ourstatistics (Section 8.5) show that NEWS consumes less than 5% of CPU timeand 3–10MB memory. Overall, NEWS imposes small overhead on routers and isapplicable to real networks. We also investigate system performance of a targetserver under flash crowds in detail.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 5: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

536 • X. Chen and J. Heidemann

We study the effect of detection intervals on NEWS performance in bothsimulations and testbed experiments. Our results show that NEWS triggersfalse alarms with small intervals like 15 or 30 seconds. On the other hand, largerdetection intervals increase the detection delay of NEWS. We investigate thistrade-off (between detection delay and false alarm rate) in Section 7.4 and 8.4.

2. RELATED WORK

In this section, we briefly describe mechanisms to accommodate flash crowdsthrough resource provisioning. We also review other commonly used schemesto protect servers and networks from overloading such as admission control,congestion control, and overloading control.

2.1 Web Caching and Content Delivery Networks

Infrastructure vendors such as Akamai deploy Web caches and content deliverynetworks (CDN) to protect servers from overloading during flash crowds. Theseservices need infrastructure support and are usually expensive. Further, recentstudies [Arlitt and Jin 2000; Jung et al. 2002] show that the current Web cachingscheme is not efficient because dynamic content such as pages for breaking newsare not cached before flash crowds. Jung et al. [2002] proposed a new scheme—adaptive Web caching—for improvement. On the other hand, we propose a low-cost alternative by regulating incoming traffic adaptively.

Ideally, we should provide enough resources to prevent networks and serversfrom overloading. For example, we can replicate the target server to distributeits load. This approach needs infrastructure support. Further, in reality, thereare circumstances where it is either difficult or impossible to estimate andprovide enough required resources. So, we propose NEWS to regulate excessiverequests based on currently available resources.

2.2 Admission Control

Admission control plays an important role in supporting applications with ser-vice requirements such as real-time constraint (for example, delay and jitter).Looking at public telephone networks [Gibbens et al. 1995] and integrated ser-vice networks [Clark et al. 1992], a call (or a new connection) explicitly describesits service requirement such as two channels for telephone conversation or1Mbps bandwidth for Video-on-Demand service [Mundur et al. 1999]. Basedon this service profile and currently available resources (circuits or networkbandwidth), admission control makes a decision as to whether it should acceptthe incoming request or not.

Although appropriate for telephone and integrated service networks, it maybe difficult for most Internet applications (e.g., Web and FTP) to accurately esti-mate and describe their service requirements. So, we believe that we need to de-termine application requirements dynamically through measurement. NEWSis such an example.

To avoid underutilization by estimating resource consumption with a sta-tistical model, measurement-based admission control (MBAC) [Breslau et al.2000; Cetinkaya and Knightly 2000; Jamin et al. 1995] determines the currently

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 6: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 537

available resources through traffic measurement. However, MBAC still needsthe service profile of incoming requests. Further, MBAC aggregates the per-formance of all existing flows to evaluate current resource consumption, whileNEWS only measures the transmission rate of high-bandwidth responses. Also,MBAC is more conservative and only accepts incoming requests upon sufficientresource; NEWS sends all requests through unless a flash crowd is detected.

In order to protect networks and servers from overloading, we could also ap-ply a static rate limiter to regulate the incoming request rate below a certainthreshold. It rejects excessive requests to ensure that networks and servers al-ways work under capacity. Despite its simplicity, a static rate limiter lacks adap-tivity to different environments. In order to work properly, network operatorsneed to choose the rate limit carefully based on current configurations and theirexperience [Barford and Plonka 2001]. Usually, this choice is specific to a certainserver or network connection and needs manual adjustment when the server’scapacity or the connection’s bandwidth changes. Contrary to this, we proposeself-tuning traffic control algorithms, NEWS, which adapt to both network- andserver-limited scenarios (as shown in Section 8.2 and 8.3).

Another similar work in this area is the network weather service(NWS) [Wolski et al. 1999]. It monitors and forecasts system performance suchas link utilization and server load. Both NWS and NEWS apply change de-tection algorithms. Unlike NWS, NEWS does not rely on centralized data pro-cessing, and hence is more robust and adaptive in different scenarios. We alsopresent the novel flash crowd detection algorithm of NEWS in Section 4.

2.3 Congestion Control at Different Levels

TCP and its variations apply end-to-end congestion control, which is fundamen-tal to the stability of today’s Internet. However, since it operates at flow-level,TCP is not sufficient to regulate aggregate behavior during flash crowds.

Congestion manager (CM) [Balakrishnan et al. 1999] is a per-host-basedcontrol algorithm, multiplexing concurrent flows among different applicationsof one end-host to ensure that they react to congestion cooperatively. Differentfrom CM, NEWS does not regulate the behavior of individual hosts. As wewill show in Section 3, per-host-based information does not help to solve thepersistent overloading during flash crowds.

Researchers have also proposed to control Internet traffic at the ag-gregate level. For example, the aggregate-based congestion control (ACC)[Mahajan et al. 2001] regulates the rate of aggregates consuming most of thenetwork bandwidth and promotes fair bandwidth allocation among differentflows traversing the same link.

In terms of flash crowd mitigation, we can apply ACC to regulate responsetraffic in flash crowds which is likely to consume high bandwidth and causelarge packet drops. However, ACC is not sufficient to mitigate flash crowds fun-damentally because it does not have the complete control loop between requestsand responses that NEWS applies. As we will show in Section 3, the aggregate-level control that NEWS imposes is important to protect server and networksfrom flash crowds.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 7: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

538 • X. Chen and J. Heidemann

2.4 Adaptive Queue Management Algorithms

Routers apply different queuing algorithms to support QoS for end-users andachieve fair bandwidth allocation for different flows. These algorithms differ inthe flow states they keep. On the one hand, RED [Floyd and Jacobson 1993]do not keep per-flow states. On the other hand, Fair Queuing [Demers et al.1989] maintains states for each flow and achieves fair bandwidth allocation forindividual flows. CSFQ [Stoica et al. 1998] improves fair queuing by eliminatingper-flow states from core routers. Instead, it keeps flow rate in packet headers.

There are some proposals between these two extremes which keep partialflow states [Lin and Morris 1997; Mahajan and Floyd 2001; Pan et al. 2000].For example, FRED [Lin and Morris 1997] keeps states for flows that cur-rently have packets in buffer. FRED determines the dropping probability forone flow according to the number of its packets being queued. RED with pref-erential dropping (RED-PD) [Mahajan and Floyd 2001] only keeps states forhigh-bandwidth flows and drops their packets preferentially.

Unlike these adaptive queuing management algorithms, the focus of thiswork is on flash crowd detection and mitigation. More specifically, NEWS cap-tures abrupt changes in traffic observation (Section 4) and regulates aggregatebehavior (e.g., rate-limit flash crowd traffic).

Persistent dropping (PD) [Jamjoom and Shin 2003] is proposed as an aug-ment for adaptive queuing algorithms to mitigate flash crowd traffic. It applies afine-grain connection-level control to persistently discard retransmitted pack-ets. PD can also be integrated into our NEWS framework as an alternativereaction scheme. On the other hand, it does not address important issues suchas flash crowd detection and bystander traffic protection as NEWS does.

2.5 Server Overloading Control

Recent studies [Cetinkaya and Knightly 2000; Chen and Mohapatra 2002; Luet al. 2001; Welsh and Culler 2003] propose to apply overloading control andservice differentiation on Web servers to improve Web performance under heavyloading and flash crowds. These schemes are complimentary to router-basedalgorithm like NEWS. Further, we believe that NEWS is also applicable toWeb servers. NEWS is more flexible than server-based overloading control,effectively protecting bystander traffic from flash crowds (Section 7.5 and 8.6).

3. FLASH CROWDS AND EARLY WARNING

Flash crowd traffic shows different patterns than normal traffic [Jung et al.2002]. In this section, we show some of its characteristics by examining twoserver HTTP logs. One was collected when a flash crowd happened to a privateserver (slash-dot effect [Adler 1999]), the other is from 1998’s World Cup Website (www.france98.com). To better describe the characteristics of flash crowdtraffic, we first define some terminology.

As depicted in Figure 3, a Web connection from client C to server S containsone (with HTTP/1.1 [Gettys et al. 1999]) or a series (with HTTP/1.0 [Berners-Lee et al. 1996]) of request and response exchanges (that is, request and corre-sponding response flows). We define the lifetime of this Web connection (Tw) as

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 8: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 539

Fig. 3. Request and response transmission latency, Web connection lifetime, and request interval.

the time interval from a client sending the first request packet untill it receivesall response packets.

We quantify the performance of Web request and response flows with thefollowing two metrics. Transmission latency, T ( f ), records the time interval(e.g., in seconds) from the first packet sent until the last packet is received.Transmission rate, R( f ), measures the flow data rate (in bytes per second,e.g.). Analytically, we have R( f ) = L/T ( f ), where L is the amount of datatransferred by flow f (in bytes, for example). In this work, we measure R( f ) atthe access router of the target server (as discussed in Section 3.3). Flows showvarious transmission latencies and rates due to the server load and congestioncondition in the networks.

From the servers’ point of view, we define request interval (Tr(S, C)) as thetime between two adjacent requests that server S receives from client C. Wedenote Tr(S) as the interval of request from all clients to server S. Alternatively,we define Rp(S) as the request rate (in number of requests per second) observedby server S. Rp(S) records the number of requests sent to S within one timeunit.

3.1 Observations of Flash Crowd Traffic

We present two observations based on the statistics of two Web server log files.First, requests show very small interarrival time: the mean request interval isaround 1 second in the slash-dot trace and less than 10ms in the World Cuptrace. Sometimes, the target server even recorded the same arrival times forsome requests due to its time granularity. So, the request rate shows a spikewhen a flash crowd happens.

Second, we find that most connections are short in the two log files we stud-ied. That is, they retrieve small pages.1 In both logs, most (90% for the slash-dotlog and 50% for the World Cup log) requests are for pages less than 1K bytes.In the World Cup HTTP log, more than 90% of requests are for pages smallerthan 10Kbytes. And, they carry about 50% of the network load. So, there is a

1The actual response size varies in different scenarios according to the content hosted on targetservers.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 9: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

540 • X. Chen and J. Heidemann

Fig. 4. Server and network overloading during a flash crowd.

concentration on small pages in the log files we studied. Based on these obser-vations, we propose a simple two-layered flash crowd traffic model to test ourdesign in the early stages.

The characteristics of flash crowd traffic impose challenges for detection. Inthis study, we focus on detecting flash crowds caused by unpredictable events.This is the most challenging case because of uncertainties in three aspectsincluding happening time, user interest (i.e., the magnitude of the request rateincrease), and response size. In this study, we propose an adaptive algorithmto detect flash crowds without knowledge of this information.

3.2 Reduced Web Performance During Flash Crowds

End-users may experience increased Web latency during flash crowds. As shownin Figure 4, several factors contribute to this increase. First, even though eachrequest may contain just one small packet, too many of them can still causecongestion in networks. So, requests may be delayed or even dropped by net-works. Second, the target server only accepts part of incoming requests, due toits CPU or memory constraint, and simply discards others. The target server isoverloaded during flash crowds. It processes requests and generates responsesslowly. Finally, responses contain a number of larger packets and inject moreload than requests do. They are more likely to congest networks and increasetransmission latency. When a response packet is dropped due to congestion, thetarget server needs to retransmit the lost packet even it is already overloaded.

Traditional approaches detect flash crowds from increases in request rate.However, as we will show, this increase does not necessarily correlate with over-loading conditions during a flash crowd. Other factors such as system capacityand response size are also important.

In different circumstances, flash crowds may cause overloading at the targetserver or in networks. However, from an end-user point of view, they alwaysperceive an increased Web latency or a decreased Web response rate. Therefore,we design NEWS to detect flash crowds from degradation in the Web responserate. As we will show in Section 8, this approach helps NEWS to adapt to bothnetwork- and server-limited scenarios automatically.

3.3 Overall Design of the Network Early Warning System

As shown in Figure 5, NEWS has three main components: the flash crowddetector, the request regulator, and the control logic. We design NEWS in amodular style so that we have the flexibility to apply new techniques withoutmodifying its framework. For example, we could adopt Web caching techniquesin the module of request regulator. We present the detailed design of flash crowddetection and regulation in Section 4 and 5.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 10: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 541

Fig. 5. Overall NEWS architecture (requests and responses physically traverse the same networkconnection).

NEWS imposes a global control loop over many individual TCPs (as shownin Figure 2(a). Unlike TCP’s small time scale (i.e., RTT), this global controlloop operates at a larger time scale such as 1 minute (discussed in more de-tail in Section 7.4 and 8.4). As a result, NEWS can apply more sophisticatedtechniques while only imposing small runtime CPU and memory overhead torouters (shown in Section 8.5). For example, we can apply complicated changedetection algorithms or implement fine-grained request regulator.

We had three assumptions when designing NEWS. First, we should be ableto deploy NEWS reasonably close to the target server. For example, we installNEWS on the access router of the target server or the server farm it operates.

Second, we assumed that requests and responses traverse the same accessrouter. As depicted in Figure 5, when the access router is close to the targetserver, it is reasonable to treat incoming traffic as request and outgoing trafficas response. So, the access router does not need to decode application-level in-formation for NEWS, for example, examining packet contents for HTTP headerinformation. This assumption well holds for the case of the server farm. Butsometimes the network behind the NEWS router might also provide Internetservices for end-users. In this case, both incoming and outgoing traffic are mixedwith Web requests and responses. So, the assumption above is questionable.Therefore, it is important to deploy NEWS near the target server.

In this work, we focus on a scenario that an enterprise network or a serverfarm is connected to the Internet with a single network link. A potentail re-search direction is to investigate the deployment of NEWS to a multihomednetwork where requests and responses of the same Web connection may tra-verse different routers.

4. DETECTING FLASH CROWDS

As shown in Figure 5, the flash crowd detector sets an alarm signal after itdetects flash crowds. In this design, we consider the following three issues.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 11: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

542 • X. Chen and J. Heidemann

(1) What to monitor and how? As described previously, NEWS detects flashcrowds by discovering a decrease in response rate. Since overloading eitherat the target server or in networks could cause low response rate, NEWSadapts to server- or network-limited scenarios [Eggert and Heidemann1999] easily. We discuss our approach to monitor response performancein Section 4.1

(2) How to detect changes? The change detection algorithm [Basseville andNikiforov 1993] is well studied in many fields such as signal processing andpattern recognition. Basically, it is the scheme that determines whether achange has occurred in the characteristics of a considered object. In thiswork, we tend to use a simple change detection algorithm to avoid com-putation complexity. On the other hand, we augment our algorithm withobservations in network traffic to increase detection accuracy. We presentour change detection algorithm in Section 4.2

(3) When to set and reset the alarm signal? To address this question, we needto consider two trade-offs. When setting the alarm signal, we intend toachieve reliable detection (i.e., low false alarm rate) at a cost of a relativelylong detection delay. When resetting the alarm signal, we try to avoid fluctu-ations in output at the risk of penalizing more incoming requests. We havenoticed in our earlier study that frequent variation in request regulationaffects end-user performance and leads to resource underutilization.

4.1 High-Bandwidth Connections

Flash crowd traffic usually originates from hundreds or thousands of clients.Some clients may not have visited the target server before. We call them coldclients. Formally, a client C is cold with respect to server S if Tr(C, S) > Tr0(Tr0 is a constant, e.g., Tr0 = 24 hours). The existence of a cold client impliesthat we can not detect flash crowds by monitoring perceived performance forparticular clients because they may not even attempt to access the target serverbefore the flash crowd.

Also, as we have shown in Section 3.1, Web connections during flash crowdscould be short (a few packets, e.g.). Therefore, we cannot monitor the perfor-mance of particular connections because they may disappear from our trafficobservation.

Further, monitoring the mean rate of all responses observed is not helpful be-cause flows react to congestion differently with fast connections noticing conges-tion quickly, while low-speed flows (such as to users connected through modems)showing very little change. Thus, congestion results in very little change in av-erage response rate because of these inherently low-speed flows.

We propose a novel flash crowd detection algorithm by monitoring changes inthe response performance of fast connections, that is, high-bandwidth connec-tions (HBC). These connections are most sensitive to congestion. We call theresponse flows of HBCs high-bandwidth response flows (HBFs). These flowsform an aggregate φH BF . We denote the number of flows in an aggregate as|φ|. For example, |φ| = 10% × |�|, where � is a special aggregate containingall flows. We can formally define � as � = φ(∗, ∗) which implies that ∀φ ⊆ �.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 12: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 543

Fig. 6. CDF of flows’ response rate in flash crowd and background traffic.

To quantify the performance of an aggregate φ, we define its transmission rate(AR(φ)) as:

AR(φ) =∑

f ∈φ R( f )

|φ| .

We investigate the sensitivity of HBF to network congestion through sim-ulations (detailed simulation methodology is in Section 7.1). We measure theresponse rate (R( f )) of each individual flow before and during flash crowdsand compare their cumulative distribution functions (CDFs) in Figure 6. If wechoose 10% flows with the highest rate as HBFs, their transmission rate de-creases about 78% during flash crowds.

On the other hand, if we average the response rate over all responses, themean transmission rate only decreases by 52%. Even worse, the mean rate forthe 10% slowest responses only reduced by about 40% during the flash crowd.Therefore, we confirm that HBFs are most sensitive to overloading conditions.

4.2 Change Detection Algorithm

We use a simple comparison-based scheme to detect changes in the responserate. Mathematically, the detector detects flash crowds if Condition 1 (listed inthe following) holds. After it detects flash crowds, the NEWS detector watchesthe increase in the AR for HBFs which indicates performance recovery. Thedetector resets the alarm signal when Condition 3 holds.

AR < AR × (1 − δ) (1)Rp > Rp × (1 + δ) (2)

AR > AR × (1 + δ) (3)0 < δ < 1

From the traffic control perspective, both Condition 1 and 3 choose the oper-ation point for NEWS, that is, the long-term average of the transmission rate

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 13: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

544 • X. Chen and J. Heidemann

for HBFs. The goal of NEWS is to regulate the interaction of the underlyingsystem (including server, networks, and clients) so that its behavior (measuredin AR for HBFs) is maintained within a reasonable range (represented by δ) ofthis operation point. In the above conditions, δ reflects the system’s toleranceto changes. For example, with δ = 10%, the algorithm detects an increase whenthe current measurement is 110% larger than average.

We use Condition 2 to reduce potential false detections. Specifically, sinceclients may be cold and connections may be short (details in Section 3.2), thereis the chance that only low-bandwidth hosts are active and responses only showa low rate. In that case, AR computed among these low-bandwidth flows willcause NEWS to trigger a false alarm.

To solve this potential problem, the flash crowd detector also checks theaggregate request rate Rp when the AR of HBFs decreases. We define aggregaterequest rate (Rp) as the rate of all requests passing through a router toward thetarget server. With this additional check, the detector triggers an alarm signalonly when both AR decreases (Condition 1) and Rp increases (Condition 2). Inthis way, we are more confident that the decrease in AR is due to flash crowdsrather than low-bandwidth connections.

We calculate these long-term averages (AR and Rp) with a high-low filter(HLF). Very similar to the flip-flop filter [Kim and Noble 2001], HLF is essen-tially a combination of two exponentially weighted moving average (EWMA)filters with adjustable gains to fulfill different requirements. We present anEWMA filter mathematically as: V (t) = αV (t − 1) + (1 − α)O(t). O(t) is thecurrent measurement (AR or Rp), V (t) is the long-term average calculated attime t (AR or Rp), and α is the gain of the filter. A large α gives a stable output,while a small gain makes output sensitive to the current observation.

HLF uses a low gain (α = 0.125) under common situations without an alarmsignal for fast response to changes. When the alarm is set, we switch to a highgain (α = 0.875) to keep output stable and avoid oscillations.

4.3 NEWS Flash Crowd Detection Algorithm

The flash crowd detector measures the transmission rate of response flows withthe time-sliding window (TSW) algorithm [Clark and Fang 1998] to smooth theburntness of TCP traffic. We apply the configuration of the TSW rate estimatorrecommended in Fang et al. [2000]. The detector also measures the aggregaterequest rate Rp by counting the number of incoming requests within one timeunit.

Every T seconds, the detector computes AR for HBFs. Since the number offlows observed at different times could vary dramatically, we choose the top ppercent of responses with the highest rate as HBFs: |φHBF| = p×|�|, where p isa tunable parameter. We choose p = 10% based on the observation in Figure 6.In our current implementation, we keep the transmission rate for all flows.Potentially, we can apply other schemes [Estan and Varghese 2002] to reducethis overhead.

The detector calculates the long-term average of the transmission rate forHBFs (AR) and aggregate request rate (Rp). Finally, the detector compares

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 14: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 545

Fig. 7. Flow chart of the flash crowd detector.

AR with AR and Rp with Rp. It sets or resets the alarm signal according tothe conditions in Section 4.2. We show the complete procedure of flash crowddetection in Figure 7.

4.3.1 Simple Analysis of the NEWS Detection Interval. The detection in-terval T is a tunable parameter. As we will show in Section 7.4 and 8.4, NEWSwith a smaller T detects traffic changes promptly but may trigger false alarms.On the other hand, a large T generates stable detection output but causeslonger response time. In this section, we present a simple analysis to facilitatethe choice of a proper detection interval. More specifically, we focus on the effectof the detection interval on false alarms.

According to Condition 1, NEWS detects the flash crowd when Rc < (1 −δ) × R0, where Rc is the current measurement on the transission rate of HBFs.We also notice that the observed aggregate rate changes constantly due to un-derlying TCP window adjustment and network conditions. When this variation(more precisely, drop in response rate) is greater than the detector’s tolerance δ,NEWS triggers a false alarm. Since there’s no congestion before a flash crowdhappens, we mainly focus on the effect of the TCP window adjustment whichleads to dynamics in TCP flow rate, and hence the response rate observed. Weare particularly interested in the slow start phase because it has a big im-pact on flow rate (double window size every RTT), and most Web responses areshort [Guo and Matta 2001; Chen and Heidemann 2003].

Assume a response has a length of S Kbytes. If S is small, the lifetime ofthis response is L = log (S + 1) × RTT seconds. Since most responses are lessthan 15Kbytes (according to Chen and Heidemann [2003]), we can estimate Las 4 × RTT. So we need to measure response rate in an interval of at least Lseconds in order to correctly capture the flow rate of this response. An interval

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 15: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

546 • X. Chen and J. Heidemann

less than L seconds may lead to underestimation of the flow rate and couldcause NEWS to trigger false alarms. Therefore, we suggest choosing a NEWSdetection interval of at least 4 × RTT seconds.

4.4 Discovering Target Server

After NEWS detects a flash crowd, it starts to identify the hot-spot, that is, thetarget server of flash crowd traffic. In many cases, servers operate in a serverfarm and connect through one access router. If we deploy NEWS on that accessrouter, it must distinguish between flash crowd traffic going to a hot-spot andlow-rate, bystander traffic going to other servers. In this section, we propose asimple hash-based algorithm to discover the target server.

NEWS keeps a table of size N (hot-spot list). N is chosen based on the numberof hot-spots required to identify the target server. Each entry in the hot-spot listrecords one server address observed in the following procedure. It maintainsa counter to record the number of corresponding address being accessed (hit-number) within a certain time interval. Periodically, NEWS timeouts inactiveentries in the list.

After NEWS detects a flash crowd, it randomly samples incoming packetsand hashes their destination addresses into the hot-spot list with a functionh(x) = x mod N . If the table entry has the same address or is not being used,NEWS increases the corresponding hit-number by 1. When the entry has beenoccupied by a different address, NEWS takes the next available entry in thehit-list. If the table is full, NEWS replaces the entry having the smallest hit-number and the longest inactive time with the new address observed. NEWSpicks the address with the highest hit-number as the hot-spot.

Since we solely consider request destination addresses, HSI only classifiesincoming traffic at the connection level. It cannot differentiate traffic to differentports on the same server (i.e., session-level classification). A potential futuredirection is to also consider destination port numbers, that is, to locate thetarget service at the target server.

5. MITIGATING FLASH CROWDS

NEWS mitigates flash crowds by discarding excessive requests. In this section,we present a detailed design of request regulator and NEWS controller.

When NEWS detects a flash crowd, it differentiates requests to the targetserver from bystander traffic using the HSI algorithm. NEWS discards exces-sive requests during flash crowds with an adaptive rate limiter. As we will showin Section 7.2 and 8.2, dropped requests are retransmitted due to applicationor underlying TCP.

5.1 Regulating Requests

A request regulator should ensure that the admitted request rate converges toa reasonable value (denoted as Rpc). Ideally, when requests arrive at rate Rpc,they fully utilize the server and network resource. In the meanwhile, neitherthe target server nor the network is overloaded.

We propose a token-bucket based [Parekh and Gallagher 1993; Shenker andWroclawski 1997] adaptive rate limiter to regulate requests. A token bucket

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 16: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 547

Fig. 8. NEWS control logic: scoreboard-based rate limit adjustment.

has two parameters: bucket size provides accommodation to bursty traffic andtoken rate limits long-term arrivals. In the long run, connections admitted bya token bucket converge to the token rate. Different from other simple tokenbucket algorithms, we design NEWS control logic to adaptively adjust the to-ken rate according to the current observation on response performance andrequest rate. We present the detailed design of NEWS control logic in the nextsection.

In an our early design, we tried the approach of discarding excessive requestspreferentially [Mahajan and Floyd 2001]. However, the resulting system is notstable: we observe oscillations in the admitted request rate, network load, andend-user performance. This is because probabilistic dropping only guaranteesexpected behavior but may not lead to smooth output [Mahajan et al. 2001].Further, it is also difficult to determine dropping probability for requests basedon measurement on response. This relationship may not be linear and is alsoaffected by variance in response sizes. Therefore, we propose to approximatetheir relationship with NEWS control logic.

5.2 Controlling the Adaptive Rate Limiter

We depict the function of NEWS control logic in Figure 8. Given the alarmsignal, it adjusts the rate limit of the request regulator so that it adapts todifferent scenarios automatically. NEWS control logic maintains two states:current and previous alarm signals. It adjusts the rate limit based on the rulesdescribed in the following.

When an alarm signal is set (transition from 0 to 1), NEWS control logicresets the rate limit to the current admitted request rate Rp0 observed by thedetector. Intuitively, requests arriving with rates higher than Rp0 are likely tooverload the server or networks and, therefore, cause a decrease in responseperformance. When an alarm signal changes back to 0 (transitions from 0 to 0or from 1 to 0), NEWS control logic keeps the same rate limit.

If an alarm signal remains set (transition from 1 to 1), NEWS control logicadjusts the rate limit with a scoreboard (as shown in Figure 8). More specifically,it assigns scores to adjustments of increasing and decreasing the rate limit. Itchooses the direction with the higher score. For example, if current scores forincreasing and decreasing the rate limit are 5 and 3, respectively, NEWS control

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 17: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

548 • X. Chen and J. Heidemann

Fig. 9. NEWS implementation.

logic chooses to increase the rate limit. If an alarm resets (transition from 1 to0) in the next period, a corresponding adjustment (that is, increasing rate limit)gets credit (now it is 6); otherwise it gets a penalty (reduced to 4). With thisscheme, NEWS learns to make decisions automatically from history. It alsomakes NEWS adapt to different environments without human interference.

6. IMPLEMENTING NEWS

In order to investigate the applicability of NEWS to a real network environment,we prototype NEWS on a Linux-based router. We present NEWS implementa-tion details in the following. We report our findings in testbed experiments inSection 8.

We implement NEWS under the framework of iptables/netfilter[Netfilter/iptables 2003]. Iptables/netfilter is the firewall subsystem forLinux kernel 2.4 and above. It provides various facilities such as state-ful or stateless packet filtering, network address translation (NAT), andpacket mangling. Netfilter and most functions of iptables (such as packetmatching) are implemented in kernel. They process and manipulate pack-ets according to different rules that users configure through iptables userinterface.

As shown in Figure 9, we implement NEWS as three kernel modules: (1)connection state-keeping module, (2) flash-crowd detection and control logic,and (3) adaptive token bucket filter (ATBF). By dividing NEWS functions intodifferent modules, we separate fast per-packet processing such as connectionstate keeping and packet matching and filtering from slow connection-baseddetection and control functions. We discuss the implementation of each modulewith details in the following.

6.1 Connection State Keeping

We build the connection state-keeping module based on the connection-trackingfunction in netfilter which keeps one record for each connection. Netfilter usestwo specific terms to distinguish two directions of one connection, that is, origi-nal and reply. For example, a request in a Web connection is in original direction,and response is in reply direction. In our current implementation, NEWS keepstrack of all connections. We measure memory consumption on the NEWS routerand investigate options to reduce memory usage in Section 8.5.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 18: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 549

We add a new state for each connection: response rate, that is, the data trans-mission rate (measured in bits per second) on reply direction. For Web traffic,this state records the actual data rate end-users perceive. We compute the con-nection response rate using the time-sliding window (TSW) algorithm [Clarkand Fang 1998; Fang et al. 2000]. We intentionally avoid the first few controlpackets such as SYN and SYN-ACK to keep the measured response rate stable.

6.2 Flash Crowd Detection and Control Logic

Applying the algorithm described in Section 4, NEWS detects flash crowd peri-odically with an interval of T seconds. Based on an alarm signal, NEWS controllogic regulates incoming requests adaptively (Section5).

The detection interval T is a tunable parameter. Since NEWS keeps trafficmeasurement for the last T seconds, NEWS with a smaller detection interval ismore sensitive to traffic change and detects changes more promptly. However,the result is likely to oscillate, that is, have a high false alarm rate. Conversly,NEWS gives more stable output with a larger detection interval at the expenseof a longer detection delay.

A smaller detection interval also consumes more CPU time. But, larger Tneeds more memory to keep connection states. We investigate the effect of dif-ferent detection intervals in Sections 8.4 and 8.5. Based on our experience, wechoose T as 64 seconds for our experiments.

6.3 Adaptive Token Bucket Filter

We use an adaptive token bucket filter (ATBF) to regulate incoming requests ac-cording to Section 5. We implement ATBF as an extension to iptables matchingfunction. That is, any new connections that matche with ATBF are dropped.More specifically, each new connection adds some token into the bucket. Itpasses through ATBF when there are enough tokens. Otherwise, ATBF dis-cards connections toward the target server by matching with the hot-spot list(Section 4.4). ATBF counts the number of accepted connections and reports itto the flash crowd detector as the current measurement of the request rate.

One observation in our experiments is that connections have very small inter-arrival times, for example, a few milliseconds. To accurately keep track of tokennumber, we have to measure time at fine granularity, micro-second. Since wecan not afford the computational overhead of floating point number division, wescale the number of tokens both in bucket and for acceptance of one connectionby 1,000,000 times.

7. ALGORITHM EVALUATION THROUGH SIMULATIONS

We first evaluate the performance of NEWS algorithm through simulationsusing the network simulator (ns-2.26) [VINT 1997]. For fast algorithm designand evaluation, we prototype NEWS under the framework of the DiffServ modelcontributed by the Advanced IP Networks group at Nortel Networks [Piedaet al. 2000]. We further report our testbed experiments in the next section.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 19: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

550 • X. Chen and J. Heidemann

Fig. 10. The two-level dumbbell topology in simulations.

7.1 Methodology

Figure 10 shows the network topology in our simulations. Router R0 connectsa server pool with 5 Web servers (S1–S5). S1 is the target server. Router R2 con-nects 50 access routers (C1–C50). Router R1 connects R0 and R2, representingthe intermediate network connection between servers and clients.

Each access router (C1–C50) provides network connections for 20 clients. So,there are 1,000 clients in total. To reflect the variance of link properties in realnetworks, we determine link bandwidths and propagation delays for second-tier links (that is, between access router and end hosts) by RAMP [Lan andHeidemann 2002]. RAMP takes traffic measurement at USC/ISI and generatesdistributions of link properties such as bandwidth and delay. In this topology,the range of bandwidths and delays for second-tier links are 21K–10Mbps and0.5ms–1.8 seconds, respectively. Although we cannot claim that this topologyis representative, it does give us a scenario with intermediate queuing on thesecond-tier links and a mix of various link bandwidths and delays.

We study two types of traffic: Web traffic and bystander traffic (i.e., other traf-fic sharing common links with flash crowds, details in Section 7.5). We generateWeb traffic (including background Web traffic and flash crowd traffic) based onreal HTTP logs on the 1998 World Cup Web site.2 There were 4 servers deployedfor this Web site. Since they have shown similar patterns in request access [Ar-litt and Jin 2000], we only consider those requests toward the server at SantaClara, California. These HTTP logs recorded all requests sent between April30, 1998, and July 26, 1998 [Labs 1992]. Each log entry keeps the following in-formation: time when the server received a request, source and destination ofa request, and size of the Web page requested. Our Web traffic model generatesa Web request for one log entry.

Arlitt and Jin [2000] analyzed workload characteristics based on these HTTPlogs. They found that there was a large increase (5–10 times) in request ratesbefore each game. This increase caused a flash crowd. In our simulations, we

2We have also simulated flash crowd traffic with a simple two-level model in the early stage of thiswork.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 20: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 551

Fig. 11. Request rates to the World Cup’98 Web site (used in simulations).

consider the game between Brazil and Scotland on April 30. We show changesin the corresponding Web request rate in Figure 11. We observe normal trafficto the Web server before 1,000 seconds. Flash crowd happens at around 1,000seconds with more than a 6 times’ increase in request rates.

We deploy NEWS on router R0. NEWS monitors the Web response rate (fromR0 to R1) and regulates incoming requests (from R1 to R0) when it detects aflash crowd. We initialize NEWS by setting the bucket size and token rate ofadaptive rate limiter to large values. For example, we set the token rate as1,000 connections per second. It is about three times larger than the maximumrequest rate observed in simulations (about 350 connections per second). So,NEWS accepts all incoming requests under normal conditions. In most simu-lations that follow, we configure the detection interval of NEWS as 60 seconds.We present our study on different detection intervals in Section 7.4.

We simulate scenarios with and without NEWS deployed. Each simulationruns for 4,000 seconds. We record offered and admitted request rates to thetarget server and the transmission rate of HBFs. In the following sections, weevaluate NEWS performance from both the target server and end-user per-spectives. We also investigate the sensitivity of our simulation results by con-sidering effects such as impatient end-users and server processing delays inSection 7.6.

7.2 Protecting Networks from Overloading

One goal in deploying NEWS is to protect the target server and networks fromoverloading. In this section, we simulate a network-limited scenario where flashcrowd traffic overloads networks with a large amount of response traffic. SinceNEWS detects flash crowds from observations of response traffic, we believethe following findings are also valid in server-limited scenarios. We verify thisclaim in Section 7.6. In order to consider the effect of server CPU and memoryconstraint on request processing, we further configure a server-limited scenarioin our testbed experiments (Section 8.3).

We first measure the admitted request rate to the target server with andwithout NEWS deployed (shown in Figure 12(a)). We observe that NEWS

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 21: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

552 • X. Chen and J. Heidemann

Fig. 12. Request rates to the target Web server (observed in simulations).

Fig. 13. NEWS maintains high transmission rates for HBFs during flash crowds (observed insimulations).

detects the flash crowd at about 1,020 seconds. That is, the detection latency is20 seconds when the detection interval is set as 60 seconds.

After detecting the flash crowd, NEWS adjusts its rate limit automaticallyand regulates incoming requests to about 107 connections per second. As aresult, the admitted request rate drops by 32%. This greatly reduces congestionin response traffic by reducing the packet drop rate from about 25% to 2%.

Due to the bandwidth limitation in networks, NEWS discards excessive re-quests. These requests may be retransmitted due to TCP, application, and clientreaction behavior [Jamjoom and Shin 2003]. As a result, we observe an increasein the offered request rate as shown in Figure 12(b). Although NEWS can’t serveall requests promptly, it does protect servers and networks from overloading.In the meanwhile, it gradually serves incoming requests.

7.3 Maintaining High Response Rate for Admitted Requests

Another important aspect of this evaluation is to study the effect of NEWSon end-user Web performance. As shown in Figure 13, HBFs suffer a 50%

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 22: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 553

Table I. Performance of NEWS and Different Static Rate Limiters in Simulations

Admitted Request TransmissionRequest Rate Rejection Rate of Response

Scenarios (number/s) Percentage HBFs (Kbps) Loss RateOriginal traffic 209.7 0% 58.2 25%NEWS 107.2 32.4% 90.5 2%Static rate 60 70.2% 101.36 0limiters 80 56.5% 100.2 0

100 40.2% 93.95 1%120 32.8% 90.8 2%

Fig. 14. Transmission rate of HBFs with NEWS and static rate limiters in simulations.

performance degradation during flash crowds. Their transmission rate dropsfrom 96Kbps down to 58Kbps. By deploying NEWS, admitted end-users con-tinuously receive high performance (90.5Kbps) during flash crowds. Therefore,we conclude that NEWS protects admitted requests from flash crowds.

NEWS requires sophisticated techniques to achieve this performance im-provement. One could argue that a simple static rate limiter is also able to givecomparable performance with careful configuration. Ideally, we want NEWSto perform similarly to the best possible rate limiter in the same scenario. Weinvestigate this issue in our simulations.

We deploy a static rate limiter on router R0 to control incoming requests.As depicted in Table I and Figure 14(a), a static rate limiter shows differ-ent performance levels with different rate limits. In this particular scenario,we get the highest performance when we set the rate limit to 60 requestsper second. We also anticipate higher end-user performance with an evenstricter limit. However, these limits are so strict that more than two-thirdsof incoming requests are discarded which may underutilize the underlyingsystem.

We compare the performance of the static rate limiter with NEWS in Table I.We find that NEWS has similar performance to the best rate limiter: NEWSshows only about 11% less than the best static rate limiter in the transmissionrate of HBFs. This result is encouraging because it verifies that the adaptiverate limiter in NEWS approaches the best request rate but without manual

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 23: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

554 • X. Chen and J. Heidemann

Fig. 15. The performance of NEWS under different detection intervals (observed in simulations).

adjustment. We believe that the overall performance difference between thesetwo schemes reflects the cost for NEWS to adaptively discover the right ratelimit after a flash crowd is detected (from 1,000 to 1,500 seconds, as shownin Figure 14(b)). On the other hand, NEWS only discards half of its incomingrequests compared to the best rate limiter. We highlight the transmission rateof HBFs with NEWS and static rate limiters in Figure 14(a). Given similarperformance, we believe it is an advantage to deploy NEWS because it alleviatesthe workload for manual operations by detecting and mitigating flash crowdsadaptively.

7.4 Effect of Detection Interval

NEWS periodically checks changes in the transmission rate of HBFs with aninterval of T . As we explain in Section 4, this parameter reflects the trade-offbetween fast detection and a low false alarm rate. In this section, we investigateNEWS performance with different detection intervals (see Figure 15).

From simulation results, we find that NEWS shows the best performance interms of the admitted request rate and the transmission rate of HBFs whenwe set the detection interval as 60 seconds. We also notice that small timeintervals (like 10, 15, and 20 seconds) give inconsistent performance for HBFs.This result indicates that it is hard to configure high-level controls like NEWS

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 24: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 555

at small timescales. Recall from our analysis in Section 4.3.1 that is due to theunderestimation of TCP flow rate.

It is not unexpected that NEWS detects flash crowds quickly under verysmall intervals such as 15 seconds and 30 seconds. In fact, with these smalltime intervals, NEWS triggers a false alarm before flash crowds really happen.As a result, the adaptive rate limiter starts to regulate incoming requests. Onthe other hand, NEWS shows no false alarm with detection intervals larger than90 seconds. But, the detection delay is also large. Based on these observations,we find that the detection interval of 60 seconds is feasible in the scenario westudied.

One possible way to reduce the NEWS false alarm rate is to separate thedetection interval from the time period in which NEWS measures the perfor-mance for HBFs. We call this time period measurement window size. For ex-ample, NEWS measures the transmission rate for HBFs observed in the past90 seconds but detects performance degradation with small intervals such as30 seconds. We need to investigate its applicability to flash crowd detection inour future work.

7.5 Bystander Traffic Protection

In real networks, there always exist other traffic (that is, bystanders) which wehave not considered in our simulations so far, for example, Web or FTP trafficto other servers connected through NEWS router. We investigate the effect offlash crowds and NEWS on bystander traffic in this section. This study hastwo goals. First, we verify NEWS performance with the existence of bystandertraffic. More specifically, we investigate both flash crowd detection and responseperformance for admitted requests. Second, we are also interested in NEWSprotection of bystander traffic. Since NEWS is an adaptive system, we expectNEWS improves the performance for both admitted requests and bystanders.We now examine these two aspects.

We study NEWS protection for both long-lived (such as FTP) and interactive(for example, Web) bystander traffic. We report our findings in the first casethat follows. In Section 8.6, we further investigate both cases with testbedexperiments.

In our simulation, node S5 sends FTP traffic (long-lived) to C50 (refer toFigure 10). As shown in Figure 16(a), NEWS maintains high performance foradmitted requests after detecting the flash crowd. The transmission rate forHBFs is 90.2Kbps before the flash crowd and reaches 86.4Kbps with NEWSdeployed. Therefore, with the existence of bystander traffic, NEWS still showsconsistent performance as in our previous study.

We further show the goodput of FTP bystander traffic without and withNEWS deployed in Figure 16(b). We find that the mean goodput is about337Kbps before the flash crowd, then it drops to only about 10Kbps when flashcrowd happens. With NEWS protection, the goodput of FTP traffic jumps backto about 120Kbps. So, NEWS improves goodput of FTP bystander traffic bymore than 10 times. This improvement is because NEWS relieves congestion innetworks by discarding excessive requests. The traffic regulation benefits both

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 25: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

556 • X. Chen and J. Heidemann

Fig. 16. NEWS bystander traffic protection (in simulations).

Web traffic and bystanders. So, we conclude that NEWS is also able to protectbystander traffic from flash crowds.

7.6 Result Sensitivity to Other Simulation Parameters

The simulation results presented are from a much simpler scenario than reality.To investigate the sensitivity of our previous results, we relax two aspects of oursimulation scenario by considering impatient end-users and server processingdelay. While we can not claim to simulate the Internet [Floyd and Paxson 2001],this sensitivity study, together with our investigation on the effect of bystandertraffic in Section 7.5, helps us to better understand the dynamics of NEWS. Wealso gain the confidence to deploy NEWS in real networks (we will report ourfindings in testbed experiments in the next section).

7.6.1 Impatient End-Users. In real the world, end-users may terminatetheir Web requests after a long waiting time. We investigate this effect witha simple model. Our model cancels Web requests that are not served aftera certain time period. It also assumes that an end-user’s waiting time hasan exponential distribution, and end-users do not resume their requests aftercancellation.

In our simulation, we choose the average waiting time as 60 seconds. Aftertimeout, end-users may decide if they want to continue to wait or cancel theirrequest with equal probability. We repeat simulations with 60-second detectionintervals. Our results show that NEWS still archives similar performance withthe existence of impatient end-users. We believe that this result is reasonablebecause excessive requests (and their responses) are delayed and even droppedduring flash crowds. Impatient end-users are unlikely to have a large impacton this situation. On the other hand, Jamjoom and Shin [2003] have shownthat retransmission due to persistent end-users (that is, very patient users)have a large effect on flash crowd traffic. This study also demonstrates that theexistence of impatient users does not have much effect on the performance ofNEWS.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 26: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 557

7.6.2 Server Processing Delays. We focus on the network perspective inour simulation so far. In reality, Web servers impose processing delays to Webrequests. To investigate this effect, we add a simple Web server model to oursimulation. We assume that a Web server has a constant processing rate (forexample, 1000KB/s in our study) and a buffer to hold incoming requests theserver applies first come, first serve (FCFS) scheduling policy and always pro-cesses the first request in its buffer. The server drops incoming requests whenits buffer is full.

We first investigate a scenario where the target server has infinite bufferspace. Simulation results show that NEWS gives similar performance (admit-ted request rate and transmission rate of HBFs) as in our previous study. Thisresult is not unexpected because our Web model only adds delays between re-quests and corresponding responses. It does not relieve the overloading condi-tion in networks. The response traffic still congests network connections.

To get more realistic results, we configure the server with a limited bufferspace of 1,000 requests. We find that incoming requests quickly fill the serverbuffer, and more than 50% of requests are dropped. This effect is very similarto the simple static request rate limiter described in Section 7.3. Therefore,neither server CPU nor network links are overloaded.

Based on these results, we conclude that our Web server model in simulationdoes not have a fundamental effect on the performance of NEWS. This is becauseour model focuses on adding server queuing and processing delays to incomingrequests, limiting the server overloading condition where the CPU or memoryis overwhelmed by incoming requests. To complete our study on the serveroverloading condition, we configure a server-limited scenario in our testbedexperiments in Section 8.3.

8. SYSTEM EVALUATION THROUGH TESTBED EXPERIMENTS

In Section 7, we have evaluated the performance of NEWS in simulations andshown that it protects the target server and networks from overloading andmaintains high performance for admitted requests. In this section, we evaluateNEWS implementation with testbed experiments. We quantitatively validateour previous simulation studies in a network-limited scenario. More impor-tantly, testbed experiments allow us to investigate server performance androuter overhead (such as CPU and memory usage) which are not modeled innetwork simulations.

We further evaluate the performance of NEWS in a server-limited scenario,confirming that NEWS is an adaptive system, capable of efficiently preventingoverloading on the server or in networks. By examining its runtime overheadon the router, we also show that NEWS is a relatively light-weighted scheme.Finally, we evaluate the effectiveness of the hot-spot identification algorithmthrough experiments with bystander traffic.

8.1 Experiment Setup

Figure 17 shows our testbed environment. All machines have 100Mbps fastEthernet network interfaces. The client pool has 7 machines with various

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 27: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

558 • X. Chen and J. Heidemann

Fig. 17. Network topology in the testbed experiment.

configurations ranging from 677MHz Pentium III CPU with 128MB RAMto 2GHz Pentium 4 CPU with 512MB RAM. We use 6 machines (C1–C6) for flash crowd traffic generation and one (B1) for bystander trafficexperiment.

Our server pool has 3 machines. Fast target server (S1) and bystander server(S3) have 2GHz Pentium 4 CPU and 512MB RAM. Slow target server (S2) has180MHz Pentium Pro CPU and 128MB RAM. We connect client and serverpools together through two routers R0 and R1. We use Linux Redhat 9 (kernel2.4.20-8) on all machines, including two routers.

The NEWS router (R0) is a PC with 1.5GHz AMD Athlon 4 processor and1GB RAM. We deploy NEWS on R0 by enabling iptables service with NEWSextension. In most experiments, we configure the NEWS detection interval as64 seconds (see Section 7.4). We also investigate the effect of different detectionintervals in Sections 8.4 and 8.5 .

We use NIST Net [NIST 1998] to introduce network dynamics into ourexperiments. NIST Net allows a Linux-based router to emulate a wide va-riety of network conditions such as link bandwidth, propagation delay, andpacket loss. We configure link bandwidth between R0 and R1 as 2Mbps. Wealso set different propagation delays between R1 and client machines ac-cording to network measurement at USC/ISI [Lan and Heidemann 2002].While we cannot claim that our testbed simulates the Internet [Floydand Paxson 2001], this experimental study does help us to better under-stand the dynamics of NEWS in a relatively more real environment thansimulations.

Our Web servers use TUX [Lever et al. 2000] to expedite request progress-ing. Running partially from within Linux kernel, TUX has demonstrated out-standing performance for serving static contents. We configure Web serverswith large SYN buffer size and HTTP backlog to ensure that they are fed withenough workload.

We implement a traffic generator to playback the Web server log (discussedin Section 7.1) in our experiments. It sends Web requests to the server ac-cording to information in log entries such as time, server address, and requestsize. To simulate many concurrent connections, our traffic generation tool cre-ates threads to accomplish request-response exchanges independently. Due toa memory constraint on one single client machine, we distribute traffic gener-ation across six client machines (C1–C6).

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 28: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 559

Fig. 18. Request rate to the World Cup’98 Web site (used in experiments).

Fig. 19. Server and end-user performance during flash crowds (observed in experiments).

In our experiments, we use HTTP logs as described in Section 7.1 but choose adifferent game. More specifically, we playback 2,000 seconds of log from onesemifinal between Brazil and Holland on July 7. As shown in Figure 18, therequest rate increased by about 5 times in just several minutes after the gamestarted at 1,000 seconds.

We monitor the Web request rate both at the NEWS router (R0) and thetarget Web server. To quantify end-user perceived performance, we record thetransmission rate of HBFs on router R0. We also keep track of system andnetwork statistics on each machine such as CPU and memory usage, networkutilization, and packet drop rate. We are particularly interested in changes inthese statistics when flash crowds happen.

8.2 Relieving Network Congestion

We have two different experiment configurations, namely network- and server-limited scenarios. We have studied NEWS performance in a network-limitedscenario in simulations. In this section, we validate our simulation resultsquantitatively, confirming that NEWS effectively protects networks from

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 29: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

560 • X. Chen and J. Heidemann

Fig. 20. Request rate observed on NEWS router in experiments.

overloading and maintains high performance for accepted requests. We furtherinvestigate system performance of the target server in detail.

In this experiment, we choose fast server S1 as the target. We observe thatrequests do not overload S1 during the flash crowd; in fact, their processingonly consumes about 2–3% of CPU time and about 60% of memory on S1. How-ever, the response traffic generated by S1 congests the network link betweenrouter R0 and R1. Our measurement shows 100% link utilization and about60% packet drop rate.

As a result, S1 is kept busy with transmitting and retransmitting responsetraffic and is not able to catch up with all incoming requests. As shown inFigure 19(a), S1 can only process about 80 requests per second, 32% lessthan the average incoming request rate during flash crowds (117 requests persecond). Congestion in the response network link and slow server processing(hence, large server backlog) greatly reduce end-user perceived performance:the transmission rate of HBFs drops by half as shown in Figure 19(b). Also,about 22% of connections timeout and fail due to packet loss. Failed connec-tions resend SYN packets after timeout and thus increase the average of-fered request rate on the NEWS router from 110 to 318 requests per second(Figure 20(a)).

We enable NEWS on R0. With a 64-second detection interval, NEWS detectsflash crowd at about 1,088 seconds, that is, 88 seconds after flash crowd starts.NEWS protects the target server and networks from overloading by regulatingthe admitted request rate down to 56 requests per second (Figure 20(b)). Asless response traffic is generated, packet loss rate on link between R0 and R1falls to less than 3%.

NEWS also maintains high performance for admitted requests during flashcrowds, increasing the transmission rate of HBFs from 133Kbps to 261Kbps(Figure 19(b)). As shown in Figure 21, this performance is comparable to statictoken-bucket based request rate limiter with manually configured token rate.Therefore, our experiment confirms that NEWS protects the server and net-works from flash crowds effectively, and maintains high performance for ad-mitted requests.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 30: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 561

Fig. 21. Comparison of NEWS and static rate limiter in experiments.

As a cost for this performance, NEWS discards about 49% of excessive re-quests. As observed in Figure 20(a), the offered request rate to R0 increasesby about 44% (from 318 to 459 requests per second) due to retransmissions asdiscussed in Section 7.2.

8.3 Protecting the Server From Flash Crowds

We further evaluate NEWS performance in a server-limited scenario. This eval-uation was not possible in simulation because network-level simulators (like ns)do not include detailed models of server CPU, memory, and disk performance.

In this experiment, clients send Web requests to slow target server S2. SinceWeb requests in our experiments are only for static content, CPU usage on S2 isabout 15%. At the same time, link utilization between R0 and R1 is only 60%.We do not observe any packet loss. However, we find that S2 uses up about98% of its memory and starts swapping in order to serve the huge number ofincoming requests.

Swapping slows down S2 and builds up server backlog. Eventually, about20% of connections timeout due to long waiting time. As a result, end-userssuffer from a low Web response rate even though there is no packet drop in thenetworks.

Since NEWS detects flash crowds by monitoring response performance, itadapts to the server-limited scenario automatically, detecting flash crowds with-ing about 80 seconds. By regulating incoming requests to 54 per second (asshown in Figure 22(a)), NEWS reduces server memory consumption to about88%.

As the server stops swapping, it processes requests more promptly. There-fore, admitted requests receive much higher performance. As depicted inFigure 22(b), transmission rate of HBFs jumps from 158 Kbps to 260 Kbps.Based on these results, we conclude that NEWS is an adaptive scheme and candetect and prevent both network- and server-overloading during flash crowds.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 31: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

562 • X. Chen and J. Heidemann

Fig. 22. NEWS performance in a server-limited scenario in experiments.

Fig. 23. Different detection intervals affect NEWS performance (observed in experiments).

8.4 Effect of Different Detection Intervals

As we explained in Section 4, NEWS detects flash crowds by periodically (withan interval of T seconds) checking degradation in response performance. Thedetection interval T is an important parameter affecting the speed and accuracyof flash crowd detection and therefore the overall NEWS performance. Withtestbed experiments, we first validate our findings in the prior simulation study(Section 7.4). We further study the effect of different detection intervals onNEWS runtime overhead in the next section.

Figure 23 shows the transmission rate of HBFs and the admitted requestrate under different Ts. NEWS shows its best performance with the detectioninterval at 64 or 128 seconds.

We find that the performance of NEWS falls with either large or very smalldetection intervals. For example, with a detection interval at 256 seconds,NEWS can’t adjust the rate limit promptly as traffic changes. On the otherhand, with very small intervals like 16 seconds, NEWS is so sensitive to trafficchange that it triggers an alarm before flash crowd happens (at 760 seconds).Then, it sets the rate limit too low and about 65% of connections are discarded.These results confirm our findings in simulations.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 32: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 563

With detection intervals larger than 16 seconds, NEWS always detects flashcrowds slightly after one detection interval. Balancing detection delay and per-formance, we suggest configuring NEWS with a detection interval of 64 seconds.

In our experiments, NEWS does not trigger false alarms with detection in-tervals larger than 16 seconds. We believe this is because requests generatedfrom one client machine are highly correlated. We plan to deploy NEWS to realnetworks to further study this issue.

8.5 NEWS Overhead on Routers

In this section, we first present an analysis of the algorithmic performance of theNEWS flash crowd detection algorithm. Then, we conduct a testbed experimentto further evaluate NEWS CPU and memory consumption on a Linux-basedrouter.

8.5.1 Algorithmic Performance of NEWS Flash Crowd Detection. NEWSflash crowd detection consumes both CPU time and memory on routers. We an-alyze this overhead in the following. We further study NEWS runtime overheadwith a testbed experiment in Section 8.5.

We first examine the overhead of NEWS flow state keeping. NEWS maintainsstats for all active flows and updates flow response rates for every responsepacket. As shown in Section 6, NEWS uses the hash-based connection trackingfunction in Iptables/Netfilter [Netfilter/iptables 2003]. Therefore, the averageoverhead of updating flow rate is O(1) (O(N ) in the worst case, N is the tablesize). On the other hand, NEWS requires the allocation of memory for its hashtable. We investigate this consumption in our testbed experiment (Section 8.5).

Periodically (with an interval of T seconds), NEWS conducts a flash crowddetection procedure as described in Section 4.2. NEWS selects the 10% fastestflows and computes their transmission rate. This task can be accomplishedefficiently with a randomized selection algorithm [Cormen et al. 2001] whichhas the average performance of O(log N ) (O(N ) in the worst case, N is thenumber of flows kept). In our current implementation, NEWS first sorts flowsbased on their rates with quick-sort. Then, it picks the first 10% of entries inthe sorted list. This imposes computational overhead of O(N log N ).

Based on our analysis, NEWS imposes similar overhead to routers as a fire-wall keeping flow state. Therefore, we believe that NEWS is applicable to realnetworks to protect server and networks from flash crowds.

8.5.2 NEWS CPU and Memory Overhead. With testbed experiments, weare able to quantitatively investigate NEWS runtime overhead on routers,including CPU time and memory consumption. We report our findingshere.

Table II shows the CPU usage of NEWS under different detection intervalsand the offered request rate after flash crowds. The offered request rate beforethe flash crowd is about 40 requests per second. When flash crowd happens, theoffered request rate varies from 530 to 259 requests per second with differentNEWS detection intervals. In our experiments, we find that smaller detectionintervals consume relatively larger CPU time because of more frequent flash

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 33: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

564 • X. Chen and J. Heidemann

Table II. CPU Time Consumed by NEWS With Different Detection Intervals (Observedin Experiments)

Detection Before After Offered Request RateIntervals Flash Crowds Flash Crowds After Flash Crowds (request / s)16 s 2–3.5% 5–6% 53032 s 1.5–2% 3.5–5% 51164 s 1–2% 2–4% 459128 s and larger about 1% about 2% less than 400

crowd detections. Overall, NEWS consumes less than 5% CPU time with areasonable detection interval (for example, 64 seconds with an offered requestrate of about 459 per second).

We also measure CPU usage when running NEWS on a slow machine(266MHz Pentium II CPU with 128MB RAM). NEWS consumes about 14%of CPU time on average after flash crowds happen. Therefore, we conclude thatNEWS imposes relatively small computational overhead to routers. We furtherstudy NEWS memory consumption as follows.

NEWS needs memory to keep track of connection response rates. As shownin Section 6, we design this module based on connection-tracking support innetfilter which uses a hash table to keep states for each connection. So, NEWSmemory consumption is determined by the number of new connections (initi-ated by requests). Through our experiments, we observe that memory consump-tion remains about 3M bytes before the flash crowd happens. In this stage, theincoming request rate is about 40 per second.

When a flash crowd happens, NEWS needs more memory as the request rateincreases. Depending on the rate-limit that NEWS discovers, NEWS memoryconsumption varies from 10M to 12M bytes, corresponding to an incoming re-quest rate of 60 to 100 requests per second. In general, NEWS memory con-sumption is not sensitive to detection intervals.

It is possible to reduce memory consumption for connection tracking by ad-justing the value of connection timeout, that is, the measurement window size.The default configuration of netfilter is 24 hours. We are able to reduce memoryconsumption after flash crowds to about 6M bytes by timing out connections ev-ery 64 seconds. We can further reduce NEWS memory consumption by randomsampling of incoming connections [Estan and Varghese 2002].

In real networks, routers usually have less memory than the PCs used in ourexperiments. To have a reference on the overhead of flow state keeping on com-mercial routers, we substitute R0 with a Cisco 3620 router which is widely usedfor mid-size networks. The router we use in this experiment has 100MHz R4700CPU, 12MB processor, memory, and 8MB IO memory. We use NetFlow [NetFlow2003] to keep states of individual unidirectional IP flows. With a default flowcache size of 256K bytes and flow timeout value of 15 seconds, our Cisco routeris able to track about 4,000 active flows (or 2,000 bidirectional connections),each in a 70 byte record.

By repeating the previous experiments, we find that the average router CPUtime is about 2–3% before flash crowd, and 7–8% after. The maximum CPU timereaches 10% during flash crowds. We do not observe any memory allocation

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 34: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 565

Table III. End-to-End Latency of Bystander WebTraffic in Different Experimental Scenarios

Web Latency (seconds)

Scenarios Mean Medianbefore flash crowd 0.22 0.18during flash crowd 24.44 9.53with NEWS 7.03 3.15NEWS + HSI 0.42 0.32

failures for incoming flows. In another word, the default 256K byte flow cacheis enough to keep states for all flows during the flash crowd in our experiments.Therefore, with the traffic condition in our experiments, per-flow state keepingonly imposes small overhead even to real routers.

8.6 Bystander Traffic Protection

In our simulation study (Section 7.5), we have shown that NEWS protects by-stander FTP bulk traffic from flash crowds. We partially validate this resultwith experiments. More specifically, we configure server S3 to constantly send8K byte data chunks to client machine B1. Since NEWS measures aggregaterequest and response rates, it shows consistent performance in detecting andmitigating flash crowds.

We measure goodput received by B1 and find it drops dramatically from327Kbps to 80Kbps as flash crowd traffic builds up. With NEWS deployed,B1 gets a consistent goodput of 318Kbps. Therefore, NEWS can protect thebystander bulk data transfer from flash crowds.

We further evaluate the hot-spot identification (HSI) algorithm through anexperiment with bystander Web traffic. In this case, B1 sends Web requeststo server S3 based on HTTP log under-light load (average request rates-lessthan 20 requests per second). Since most Web connections are short, we areinterested in the end-to-end latency.

We summarize the experiment results in Table III. We observe that theflash crowd hurts bystander Web traffic greatly, increasing the median Weblatency by more than 50 times. By discarding about 22% of connections, NEWS(with original algorithms) reduces the median Web latency for admitted by-stander requests by more than 3 times. With the hot-spot identification tech-nique, NEWS does not drop any bystander connections. As a result, NEWSfurther reduces the median Web latency for bystander Web traffic by about10 times. Therefore, we conclude that the hot-spot identification algorithmis an effective technique to classify and protect bystander traffic from flashcrowds.

9. FUTURE WORK

We have shown the effectiveness of NEWS in both network- and server-memorylimited scenarios (Sections 7 and 8). As a potential direction for our future work,we can further evaluate the performance of NEWS in a scenario where server

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 35: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

566 • X. Chen and J. Heidemann

CPU is overloaded with dynamic workload, for example, clients send databasequeries via CGI interface on the target server. This experiment will further testthe adaptivity of the NEWS algorithm.

In Section 7.4 and 8.4, we demonstrated the impact of the detection in-terval on the performance of NEWS through simulations and testbed exper-iments. Modeling the overload protection of NEWS could be an interestingfuture direction for our research. With this analytic effort, we should be ableto derive bounds to guide our choice of appropriate detection intervals forNEWS.

Currently, NEWS maintains states for each individual connection. A poten-tial improvement of NEWS is to reduce this overhead and only keep track ofhigh-bandwidth connections. One possible approach is to adopt schemes pro-posed by Estan and Varghese [2002].

DDoS attack [Savage et al. 2000; Shoeten et al. 2001; Park and Lee 2001]is another class of problem threatening network robustness. We can poten-tially extend the flash crowd detection techniques to detect DDoS attackson a Web server. More specifically, we can design a system to monitor end-user perceived performance and infer ongoing DDoS attacks from performancedegradation.

10. CONCLUSION

In this work, we propose NEWS to protect servers and networks from per-sistent overloading caused by flash crowds. NEWS detects flash crowds fromperformance degradation of Web responses and imposes aggregate-level con-trol between requests and responses. We evaluate the performance of NEWSthrough both simulations and testbed experiments. Our results show thatNEWS quickly detects flash crowds. By discarding excessive requests, NEWSprotects both the target server and networks from overloading. NEWS alsomaintains high performance for end-users and protects bystander traffic fromflash crowds. We also studied the effect of different detection intervals on theperformance of NEWS.

Through this study, we demonstrate that an aggregate-level traffic controlalgorithm (like NEWS) can effectively protect infrastructure from persistentoverloading during flash crowds. We also illustrate the importance of build-ing an adaptive system. More specifically, NEWS infers persistent overloadingfrom end-user perceived performance. It detects flash crowds effectively by mea-suring changes in the aggregated response rate of HBFs. As we have shown,this choice of traffic observation metrics makes the NEWS detection algorithmadaptive to both network- and server-limited scenarios. NEWS also discoversthe right rate limit automatically.

We further studied the effect of detection intervals on the performance ofNEWS, showing the trade-off between fast detection and a low false alarmrate. Since flash crowds are largely caused by human behavior, we focused onthe accuracy of flash crowd detection at a cost of longer delays and used largedetection intervals (for example, 1 minute). Further, our guideline also suggestsusing relatively long intervals (4–6 times of RTT) for TCP rate measurement.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 36: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 567

ACKNOWLEDGMENTS

We would like to acknowledge Stephen Adler for providing the server trace ofthe slash-dot effect. We appreciate the fruitful discussion with members in theSAMAN and CONSER projects. We are graceful to Professor Edmond A. Jonck-heere for his suggestions on presenting NEWS control diagrams and ProfessorDeborah Estrin, Dr. Ted Faber, and Dr. Sally Floyd for their feedback on theearly work of NEWS.

REFERENCES

ADLER, S. 1999. The slashdot effect, an analysis of three Internet publications. Available athttp://ssadler.phy.bnl.gov/adler/SDE/SlashDotEffect.html.

ARLITT, M. AND JIN, T. 2000. A workload characterization study of the 1998 World Cup Web site.IEEE Network, Special Issue on Web Performance 14, 3 (May), 30–37.

BALAKRISHNAN, H., RAHUL, H., AND SESHAN, S. 1999. An integrated congestion management archi-tecture for internet hosts. In Proceedings of the ACM SIGCOMM. Cambridge, MA, 175–187.

BARFORD, P. AND PLONKA, D. 2001. Characteristics of network traffic flow anomalies. In Proceedingsof the ACM SIGCOMM Internet Measurement Workshop. San Francisco, CA, 2–13.

BASSEVILLE, M. AND NIKIFOROV, I. V. 1993. Detection of Abrupt Changes—Theory and Application,1st Ed. Prentice-Hall, Englewood Cliffs, NJ.

BERNERS-LEE, T., FIELDING, R., AND FRYSTYK, H. 1996. Hypertext transfer protocol—HTTP/1.0.RFC 1945. IETF. Available at ftp://ftp.isi.edu/in-notes/rfc1945.txt.

BLAZEK, R. B., KIM, H., ROZOVSKII, B., AND TARTAKOVSKY, A. 2001. A novel approach to detectionof denial-of-service attacks via adaptive sequential and batch-sequential change-point detectionmethods. In Proceedings of the IEEE Systems, Man, and Cybernetics Information AssuranceWorkshop. West Point, NY.

BRESLAU, L., KNIGHTLY, E. W., SHENKER, S., STOICA, I., AND ZHANG, H. 2000. Endpoint admissioncontrol: Architectural issues and performance. In Proceedings of the ACM SIGCOMM. Stockholm,Sweden, 57–69.

CETINKAYA, C. AND KNIGHTLY, E. 2000. Egress admission control. In Proceedings of the IEEE Info-com. Tel-Aviv, Israel, 1471–1480.

CHEN, H. AND MOHAPATRA, P. 2002. Session-based overload control in qos-aware Web servers. InProceedings of the IEEE Infocom. New York, NY.

CHEN, X. AND HEIDEMANN, J. 2003. Preferential treatment for short flows to reduce Web latency.Comput. Netw. 41, 6 (April), 779–794.

CLARK, D. AND FANG, W. 1998. Explicit allocation of best effort packet delivery service. ACM/IEEETrans. Netw. 6, 4 (Aug.), 362–373.

CLARK, D. D., SHENKER, S., AND ZHANG, L. 1992. Supporting real-time applications in an integratedservices packet network: Architecture and mechanism. In Proceedings of the ACM SIGCOMM.Baltimore, MD. 14–26. Available at citeseer.nj.nec.com/clark92supporting.html.

CORMEN, T. H., LEISERSON, C. E., RIVEST, R. L., AND STEIN, C. 2001. Introduction to Algorithms. 2ndEd. The MIT Press, Cambridge, MA.

DEMERS, A., KESHAV, S., AND SHENKER, S. 1989. Analysis and simulation of a fair-queueing algo-rithm. In Proceedings of the ACM SIGCOMM. Austin, TX. 1–12.

EGGERT, L. AND HEIDEMANN, J. 1999. Application-level differentiated services for Web servers.World-Wide Web J. 2, 3 (Aug.), 133–142.

ESTAN, C. AND VARGHESE, G. 2002. New directions in traffic measurement and accounting. InProceedings of the ACM SIGCOMM. Pittsburgh, PA. 323–336.

FANG, W., SEDDIGH, N., AND NANDY, B. 2000. A time-sliding window three colour marker(TSWTCM). RFC 2859. IETF.

FLOYD, S. AND FALL, K. 1999. Promoting the use of end-to-end congestion control in the Internet.ACM/IEEE Trans. Netw. 7, 4 (Aug.), 458–473.

FLOYD, S. AND JACOBSON, V. 1993. Random early detection gateways for congestion avoidance.ACM/IEEE Trans. Netw. 1, 4 (Aug.), 397–413.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 37: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

568 • X. Chen and J. Heidemann

FLOYD, S. AND PAXSON, V. 2001. Difficulties in simulating the Internet. ACM/IEEE Trans.Netw. 9, 4 (Aug.), 392–403.

GETTYS, J., MOGUL, J., FRYSTYK, H., MASINTER, L., LEACH, P., AND BERNERS-LEE, T. 1999. Hyper-text transfer protocol—HTTP/1.1. RFC 2616. IETF. Available at http://www.w3.org/Protocols/rfc2616/rfc2616.html.

GIBBENS, R., KELLY, F. P., AND KEY, P. 1995. A decision-theoretic approach to call admission controlin ATM networks. IEEE J. Select. Areas Comm. 13, 6 (Aug.), 30–37.

GUO, L. AND MATTA, I. 2001. The war between mice and elephants. In Proceedings of the IEEEInternational Conference on Network Protocols. Riverside, CA.

JACOBSON, V. 1988. Congestion avoidance and control. In Proceedings of the ACM SIGCOMM.Stanford, CA. 314–329.

JAMIN, S., DANZIG, P., SHENKER, S., AND ZHANG, L. 1995. Measurement-based admission controlalgorithms for controlled-load service. In Proceedings of the ACM SIGCOMM. Cambridge, MA.2–13.

JAMJOOM, H. AND SHIN, K. G. 2003. Persistent dropping: An efficient control of traffic aggregates.In Proceedings of the ACM SIGCOMM. Karlsruhe, Germany, 287–298.

JUNG, J., KRISHNAMURTHY, B., AND RABINOVICH, M. 2002. Flash crowds and denial of service attacks:Characterization and implications for CDNs and Web sites. In Proceedings of the InternationalWorld Wide Web Conference. Hawaii, 252–262.

KIM, M. AND NOBLE, B. 2001. Mobile network estimation. In Proceedings of the ACM/IEEE In-ternational Conference on Mobile Computing and Networking. Rome, Italy, 298–309.

LABS, L.-B. 1992. The Internet Traffic Archive. Available at http://ita.ee.lbl.gov/.LAN, K. AND HEIDEMANN, J. 2002. Rapid model parameterization from traffic measurement. ACM

Trans. Model. Comput. Simul. 12, 3 (July), 201–229.LEVER, C., ERIKSEN, M. A., AND MOLLOY, S. P. 2000. An analysis of the TUX Web server. Tech. rep.,

(Nov.) Center for Information Technology Integration, University of Michigan.LIN, D. AND MORRIS, R. 1997. Dynamics of random early detection. In Proceedings of the ACM

SIGCOMM. Cannes, France, 127–137.LU, C., ABDELZAHER, T. F., STANKOVIC, J. A., AND SON, S. H. 2001. A feedback control approach for

guaranteeing relative delays in Web servers. In Proceedings of the IEEE Real-Time Technologyand Applications Symposium. Taipei, Taiwan.

MAHAJAN, R., BELLOVIN, S., FLOYD, S., IOANNIDIS, J., PAXSON, V., AND SHENKER, S. 2001. Controllinghigh bandwidth aggregates in the network. Tech. rep. International Computer Science Institute(ICSI), Berkeley, CA. (July).

MAHAJAN, R. AND FLOYD, S. 2001. Controlling high bandwidth flows at the congested router. InProceedings of the IEEE International Conference on Network Protocols. Riverside, CA, 1–12.

MUNDUR, P., SIMON, R., AND SOOD, A. 1999. Integrated admission control in hierarchical video-on-demand systems. In Proceedings of the IEEE International Conference on Multimedia Computingand Systems. Florence, Italy, 220–225.

NETFILTER/IPTABLES. 2003. Netfilter/iptables project. Available at http://www.netfilter.org/.NETFLOW. 2003. NetFlow performance analysis. White paper. Cisco Systems, Inc. Available athttp://www.cisco.com/warp/public/cc/pd/iosw/prodlit/ntfo wp.htm.

NIST. 1998. NIST net network emulator. Available at http://is2.antd.nist.gov/itg/nistnet/.PAN, R., PRABHAKAR, B., AND PSOUNIS, K. 2000. CHOKE, a stateless active queue management

scheme for approximating fair bandwidth allocation. In Proceedings of the IEEE Infocom. Tel-Aviv, Israel, 942–951.

PAREKH, A. AND GALLAGHER, R. G. 1993. A generalized processor sharing approach to flow controlin integrated services networks: the single-node case. ACM/IEEE Trans. Netw. 1, 3 (June), 344–357.

PARK, K. AND LEE, H. 2001. On the effectiveness of route-based packet filtering for distributedDoS attack prevention in power-law Internets. In Proceedings of the ACM SIGCOMM. San DiegoCA, 15–26.

PIEDA, P., ETHRIDGE, J., BAINES, M., AND SHALLWANI, F. 2000. A network simulator, differentiatedservices implementation. Available at http://www7.nortel.com:8080/CTL/.

RAHIN, M. A. AND KARA, M. 1998. Call admission control algorithms in ATM networks: A perfor-mance comparison and research directions. Research rep. (Feb.) University of Leeds.

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.

Page 38: Flash Crowd Mitigation via Adaptive Admission Control ...zbo/teaching/CS522/Projects/Flash... · tively. These two approaches make NEWS a self-tuning system, adaptive in both server-

Flash Crowd Mitigation via Adaptive Admission Control • 569

SAVAGE, S., WETHERALL, D., KARLIN, A., AND ANDERSON, T. 2000. Practical network support for IPtraceback. In Proceedings of the ACM SIGCOMM. Stockholm, Sweden, 295–306.

SHENKER, S. AND WROCLAWSKI, J. 1997. General characterization parameters for integrated servicenetwork elements. RFC 2215. IETF.

SHOETEN, A. C., PARTRIDGE, C., SANCHEZ, L. A., JONES, C. E., TCHAKOUNTIO, F., KENT, S. T., AND STRAYER,W. T. 2001. Hash-based IP traceback. In Proceedings of the ACM SIGCOMM. San Diego, CA.3–14.

STOICA, I., SHENKER, S., AND ZHANG, H. 1998. Core-stateless fair queueing: Achieving approxi-mately fair bandwidth allocations in high speed networks. In Proceedings of the ACM SIGCOMM.Vancouver, British Columbia, Canada, 118–130.

VINT. 1997. UCB/LBNL/VINT network simulator—ns (version 2). Available at http://www.

isi.edu/nsnam/ns/.

WELSH, M. AND CULLER, D. 2003. Adaptive overload control for busy Internet servers. In Proceed-ings of the USENIX Symposium on Internet Technologies and Systems. Seattle, WA.

WOLSKI, R., SPRING, N. T., AND HAYES, J. 1999. The network weather service: A distributed resourceperformance forecasting service for metacomputing. J. Fut. Gener. Comput. Syst. 15, 6 (Oct.),757–768.

Received March 2003; revised June 2004; accepted July 2004

ACM Transactions on Internet Technology, Vol. 5, No. 3, August 2005.