
Dissecting Bufferbloat: Measurement and Per-Application Breakdown of Queueing Delay

Andrea Araldo
LRI-Université Paris-Sud
91405 Orsay, France
[email protected]

Dario Rossi
Telecom ParisTech
75013 Paris, France
[email protected]

ABSTRACT

We propose a passive methodology to estimate the queueing delay incurred by TCP traffic, and additionally leverage DPI classification to break down the delay across different applications. Ultimately, we correlate the queueing delay with the performance perceived by the users of those applications, depending on their delay-sensitivity. We implement our methodology in Tstat, and make it available¹ as open source software to the community. We validate and tune the tool, and run a preliminary measurement campaign based on a real ISP traffic trace, showing interesting yet partly counter-intuitive results.

Categories and Subject Descriptors

C.2.5 [Computer Communication Network]: Internet; C.4 [Performance of Systems]: Measurement Techniques; C.2.3 [Network Operations]: Network Monitoring

Keywords

Bufferbloat; Queueing delay; Passive Measurement; TCP; Queues; Traffic classification

1. INTRODUCTION

Despite the steady growth of link capacity, Internet performance may still be laggy, as confirmed by the recent resonance of the "bufferbloat" buzzword [13]. In short, excessive buffering delays (up to a few seconds) are possible in today's Internet due to the combination of loss-based TCP congestion control with excessive buffer sizes (e.g., in user APs and modem routers, end-host software stacks and network interfaces) in front of slow access links. This can cause long packet queues in those buffers and hamper user QoE.

Recent effort has focused on measuring the queueing delay experienced by end-users, mostly employing active techniques [16, 9, 15, 20, 4, 5, 1, 17], with few exceptions

¹ http://perso.telecom-paristech.fr/~araldo

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CoNEXT Student Workshop'13, December 9, 2013, Santa Barbara, CA, USA.
Copyright 2013 ACM 978-1-4503-2575-2/13/12 ...$15.00.
http://dx.doi.org/10.1145/2537148.2537785.

[Figure 1 depicts the monitor (an eye) observing data segments toward host A and the acks A returns toward remote hosts B, C and D, annotated with the estimator:
q_{i+1} = (t_tx,i+1 − t_rx,i) − min_{j<i} (t_tx,j+1 − t_rx,j)]

Figure 1: Synopsis of our passive methodology

employing passive measurements [3, 12, 7, 8], like we do. Yet most work so far has focused narrowly on measuring the maximum bufferbloat, while it is unclear how often bufferbloat occurs in practice and which applications are most affected. To the best of our knowledge, this work is the first to report a detailed per-application view of Internet queueing delay, which depends on the traffic mix and user behavior of each household.

2. QUEUEING DELAY ESTIMATION

Since severe queueing is likely to occur at bottlenecks, such as broadband links [10], we focus on inferring the queueing delay on the uplink of hosts (customers), as in Fig. 1. Our monitor (the eye in the figure) runs close to host A, whose queueing we want to measure, e.g., close to the DSLAM of its ISP network. Assume the local queue of A contains packets directed to remote hosts B, C and D. A data segment directed to A is captured by the monitor at t_rx,i (packets are timestamped at the measurement point via Endace DAG cards); as soon as the segment is received, the TCP receiver issues an acknowledgement that will be serviced after the already queued data segments. For simplicity, we neglect delayed-acknowledgement timers (which are in any case small compared to the bufferbloat magnitude reported in [16]). The monitor captures the ack at t_tx,i+1 and estimates the queueing delay q_i+1 incurred by the (i+1)-th ack at A as the difference between the current RTT sample t_tx,i+1 − t_rx,i and the minimum among the previously observed RTT samples
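The estimator above can be sketched in a few lines. This is a minimal illustration, not Tstat's actual implementation; the class and method names are hypothetical, and timestamps are assumed to be seconds as floats.

```python
from collections import defaultdict


class QueueingDelayEstimator:
    """Sketch of the per-flow estimator described in the text: the queueing
    delay of the (i+1)-th ack is the current RTT sample minus the minimum
    RTT sample previously seen on that flow, which approximates the
    propagation delay since the monitor sits close to the user."""

    def __init__(self):
        # Per-flow minimum RTT sample observed so far (propagation-delay proxy).
        self.min_rtt = defaultdict(lambda: float("inf"))

    def on_ack(self, flow_id, t_rx_data, t_tx_ack):
        """t_rx_data: monitor timestamp of the data segment toward the host;
        t_tx_ack: monitor timestamp of the corresponding ack from the host.
        Returns the queueing delay sample, or None if no baseline exists yet."""
        rtt = t_tx_ack - t_rx_data            # current RTT sample t_tx,i+1 - t_rx,i
        prev_min = self.min_rtt[flow_id]      # min over previous samples j < i
        self.min_rtt[flow_id] = min(prev_min, rtt)
        if prev_min == float("inf"):
            return None                       # first sample: q is undefined
        return max(0.0, rtt - prev_min)       # q_{i+1} = rtt - min_{j<i} rtt_j
```

Samples from reordered or retransmitted segments would be filtered out before reaching `on_ack`, as the text explains.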



of that flow (which represents the propagation delay and, as the monitor is close to the user, is in any case expected to be small). To gather reliable results, we filter out RTT samples of reordered or retransmitted segments. If we considered queueing delay samples for all packets of any given flow, we could introduce biases in the measurement process, as flows with higher data rates would be over-represented in the analysis. To avoid this risk, we instead consider the aggregated queueing delay, i.e., the average queueing delay in windows of 1 second. This metric is closer to the user perspective, for it describes what happens to a flow during the typical second of its lifetime, rather than to all of its packets.

Other related work, such as [3], does not stress the importance of validating and tuning the measurement tool. On the contrary, we validate our methodology in a local testbed under different link data rates, from 1 Mbps (typical of ADSL uplinks) to 100 Mbps. Specifically, Tstat keeps sequence numbers in a structure named quad (inherited from tcptrace). If the number of outstanding segments grows larger than the quad size, sequence numbers are overwritten, the related queueing delay samples cannot be calculated, and the estimation becomes coarse. This happens especially under high queueing delay, possibly resulting in an undesirable underestimation of bufferbloat. Clearly, the quad size needed to achieve an accurate estimation increases with the link rate (we precisely tune it for our experiments).

As a side note, our tool can be directly applied to offline traffic traces, but is also suitable for online processing. We empirically observe that the queueing delay analysis of offline traces imposes an 8% overhead in terms of Tstat running time (other than doubling the storage requirements, as we log the queueing delay of all active flows every second).
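The 1-second aggregation described above can be sketched as follows. This is an illustrative rendering under the assumption that per-packet samples arrive as (timestamp, delay) pairs; the function name is hypothetical, not part of Tstat.

```python
from collections import defaultdict


def aggregate_per_second(samples):
    """samples: iterable of (timestamp_s, queueing_delay) per-packet samples
    of one flow. Returns {second: mean delay}: the 'aggregated queueing
    delay', so that flows with higher data rates (more packets per second)
    do not dominate the statistics."""
    acc = defaultdict(lambda: [0.0, 0])     # second -> [sum, count]
    for ts, q in samples:
        bucket = int(ts)                    # 1-second window
        acc[bucket][0] += q
        acc[bucket][1] += 1
    return {sec: total / n for sec, (total, n) in acc.items()}
```

Each flow then contributes one sample per second of its lifetime, regardless of its packet rate.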

3. PER-APPLICATION VIEW

We report experimental results of the inference by analyzing an 8-hour busy-period trace gathered during 2009,² at a vantage point close to the DSLAM, of an ISP network participating in the FP7 NapaWine [18] project. Overall, we monitor over 93,000 hosts and gather about 10^7 individual per-flow aggregated queueing delay samples.

We consider each internal IP as a single³ host. To infer the applications running on the hosts, we leverage the Tstat DPI and behavioral classification capabilities [11], extensively validated by ourselves as well as by the scientific community at large [19]. We cluster similar applications into classes depending on their service: namely, Mail, Web, Multimedia, P2P, SSH, VoIP, Chat (we consider non-video chat) and other uncategorized (or unclassified) applications. We present our results in Fig. 2.

For most applications, 75% of 1-second windows experience less than 100 ms worth of queueing delay. The only exceptions are, rather unsurprisingly, P2P applications and, somewhat more surprisingly, Chat applications, with median delay exceeding 100 ms. We can map the queueing delay into a coarse indication of QoE for the user. Based on

² Our focus here is to build a solid methodology, rather than providing a full-blown measurement campaign, which is the aim of our ongoing work. Hence, results should be contrasted with those of more recent datasets for an up-to-date view of Internet bufferbloat.
³ With NAT devices the same IP is shared by multiple hosts, but this has no impact on our methodology, since these potentially multiple hosts share the same access bottleneck link.

[Figure 2 plots the aggregated queueing delay [ms] per application class (Oth, Mail, P2P, Web, Media, Chat, SSH, VoIP) on a logarithmic scale from 1 to 10000 ms, with low/mid/high regions marked.]

        Oth    Mail   P2P    Web    Media  Chat   SSH    VoIP
low     89.7   93.2   58.4   91.9   86.8   45.7   98.6   97.8
mid     10.2    6.2   38.7    8.0   13.2   54.1    1.4    2.2
high     0.1    0.6    2.9    0.1    0.0    0.1    0.0    0.0

Figure 2: Jittered density maps and (5,25,50,75,95)-th percentiles of the aggregated queueing delay. Table reports the percentage in each region.

[5, 20, 14, 6, 21, 7] we set two thresholds at 100 ms and 1 second (thus deriving three regions of low, middle and high queueing delay) such that: (i) the performance of interactive multimedia (e.g., VoIP, video-conferencing and live-streaming) or data-oriented (e.g., remote terminal or cloud editing of text documents) applications significantly degrades when the first threshold is crossed [14, 6]; (ii) the performance of mildly-interactive applications (e.g., Web, chat, etc.) significantly degrades when the second threshold is crossed [5, 20]. In the table of Fig. 2, boldface highlights possible QoE degradation: notice that even the performance of non-interactive applications degrades when the second threshold is crossed [13, 21, 7]. Overall, the impact of bufferbloat appears to be modest: Web and Chat are rarely impacted by high delay, and VoIP and SSH rarely experience middle delay. P2P stands out, raising the odds of inducing high delays, followed by Mail, though with likely minor impact for the users.
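The threshold-based mapping above can be sketched as a small classifier. This is an illustrative sketch of the region definition, with hypothetical function names; the 100 ms and 1 s thresholds are those stated in the text.

```python
from collections import Counter

LOW_MS, HIGH_MS = 100.0, 1000.0   # thresholds from the text: 100 ms and 1 s


def qoe_region(delay_ms):
    """Map an aggregated queueing delay sample (in ms) to a region."""
    if delay_ms < LOW_MS:
        return "low"
    if delay_ms < HIGH_MS:
        return "mid"
    return "high"


def region_shares(delays_ms):
    """Percentage of samples falling in each region, as in the table of Fig. 2."""
    counts = Counter(qoe_region(d) for d in delays_ms)
    n = len(delays_ms)
    return {r: 100.0 * counts[r] / n for r in ("low", "mid", "high")}
```

Applying `region_shares` to the per-application sample sets would yield per-class percentages of the kind reported in the Fig. 2 table.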

4. CONCLUSION

Though preliminary, this work already conveys several useful insights on queueing delay estimation, e.g., correct settings of monitoring tools, metrics more representative of the user perspective, and a per-application assessment of the likely QoE impact. We provide the community with a ready-to-use tool and we plan to deploy it on operational networks [2], in order to gather more statistically significant results.

Our per-application study of the queueing delay opens interesting new questions to investigate as future work. As we have seen, Chat users experience middle delay; at the same time, such middle delay is likely not self-inflicted by the Chat application itself, but rather tied to applications that run in parallel with Chat and generate cross-traffic. A root cause analysis, based on standard data mining techniques, would allow us to grasp these correlations and pinpoint the applications (or combinations thereof) that are more likely to induce high queueing delays.

Acknowledgements

This work was carried out at LINCS (http://www.lincs.fr) and funded by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane").



5. REFERENCES

[1] http://internetcensus2012.bitbucket.org/.
[2] http://www.ict-mplane.eu.
[3] M. Allman. Comments on bufferbloat. SIGCOMM Comput. Commun. Rev., 43(1), Jan 2012.
[4] Z. Bischof, J. Otto, M. Sanchez, J. Rula, D. Choffnes, and F. Bustamante. Crowdsourcing ISP characterization to the network edge. In ACM SIGCOMM Workshop on Measurements Up the STack (W-MUST'11), 2011.
[5] Z. S. Bischof, J. S. Otto, and F. E. Bustamante. Up, down and around the stack: ISP characterization from network intensive applications. In ACM SIGCOMM Workshop on Measurements Up the STack (W-MUST'12), 2012.
[6] K.-T. Chen, C.-Y. Huang, P. Huang, and C.-L. Lei. Quantifying Skype user satisfaction. 36(4):399–410, 2006.
[7] C. Chirichella and D. Rossi. To the moon and back: are Internet bufferbloat delays really that large? In IEEE INFOCOM Workshop on Traffic Measurement and Analysis (TMA), 2013.
[8] C. Chirichella, D. Rossi, C. Testa, T. Friedman, and A. Pescape. Remotely gauging upstream bufferbloat delays. In PAM, 2013.
[9] M. Dhawan, J. Samuel, R. Teixeira, C. Kreibich, M. Allman, N. Weaver, and V. Paxson. Fathom: a browser-based network measurement platform. In ACM IMC, 2012.
[10] M. Dischinger, A. Haeberlen, K. P. Gummadi, and S. Saroiu. Characterizing residential broadband networks. In Internet Measurement Conference, pages 43–56, 2007.
[11] A. Finamore, M. Mellia, M. Meo, M. Munafo, and D. Rossi. Experiences of Internet traffic monitoring with Tstat. IEEE Network Magazine, May 2011.
[12] S. Gangam, J. Chandrashekar, I. Cunha, and J. Kurose. Estimating TCP latency approximately with passive measurements. In PAM, 2013.
[13] J. Gettys and K. Nichols. Bufferbloat: Dark buffers in the Internet. Communications of the ACM, 55(1):57–65, 2012.
[14] O. Holfeld, E. Pujol, F. Ciucu, A. Feldmann, and P. Barford. BufferBloat: how relevant? A QoE perspective on buffer sizing. Technical report, 2012.
[15] H. Jiang, Y. Wang, K. Lee, and I. Rhee. Tackling bufferbloat in 3G/4G networks. In ACM IMC, 2012.
[16] C. Kreibich, N. Weaver, B. Nechaev, and V. Paxson. Netalyzr: Illuminating the edge network. In ACM IMC, 2010.
[17] D. Leonard and D. Loguinov. Demystifying service discovery: implementing an Internet-wide scanner. In ACM IMC, 2010.
[18] E. Leonardi, M. Mellia, A. Horvath, L. Muscariello, S. Niccolini, and D. Rossi. Building a cooperative P2P-TV application over a Wise Network: the approach of the European FP-7 STREP NAPA-WINE. IEEE Communication Magazine, 64(6), April 2008.
[19] M. Pietrzyk, J.-L. Costeux, G. Urvoy-Keller, and T. En-Najjary. Challenging statistical classification for operational usage: the ADSL case. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, pages 122–135. ACM, 2009.
[20] S. Sundaresan, W. de Donato, N. Feamster, R. Teixeira, S. Crawford, and A. Pescape. Broadband internet performance: a view from the gateway. In ACM SIGCOMM, 2011.
[21] C. Testa and D. Rossi. The impact of uTP on BitTorrent completion time. In IEEE P2P, 2011.
