On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau...

21
On the Characteristics On the Characteristics and Origins of Internet and Origins of Internet Flow Rates Flow Rates ACM SIGCOMM 2002 ACM SIGCOMM 2002 Yin Zhang Yin Zhang Lee Breslau Lee Breslau Vern Paxson Vern Paxson Scott Scott Shenker Shenker AT&T Labs – Research {yzhang,breslau}@research.a tt.com ICIR {vern,shenker}@icir .org

Transcript of On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau...

Page 1: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

On the Characteristics and On the Characteristics and Origins of Internet Flow RatesOrigins of Internet Flow Rates

ACM SIGCOMM 2002ACM SIGCOMM 2002

Yin ZhangYin Zhang

Lee BreslauLee Breslau

Vern PaxsonVern Paxson

Scott ShenkerScott ShenkerAT&T Labs – Research

{yzhang,breslau}@research.att.com

ICIR{vern,shenker}@icir.org

Page 2: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 2

MotivationMotivation Limited knowledge about flow rates

Flow rates are impacted by many factors Congestion, bandwidth, applications, host limits, …

Little is known about the resulting rates or their causes Why is it important to understand flow rates?

Understanding the network User experience

Improving the network Identify and eliminate bottlenecks

Designing scalable network control algorithms Scalability depends on the distribution of flow rates

Deriving better models of Internet traffic Useful for workload generation and various network problems

Page 3: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 3

Two QuestionsTwo Questions

What are the characteristics of flow rates? Rate distribution Correlations

What are the causes of flow rates? T-RAT: TCP Rate Analysis Tool

Design Validation Results

Page 4: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

Characteristics of Characteristics of Internet Flow RatesInternet Flow Rates

Page 5: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 5

Datasets and MethodologyDatasets and Methodology Datasets

Packet traces at ISP backbones and campus access links 8 datasets; each lasts 0.5 – 24 hours; over 110 million packets

Summary flow statistics collected at 19 backbone routers 76 datasets; each lasts 24 hours; over 20 billion packets

Flow definition Flow ID: <SrcIP, DstIP, SrcPort, DstPort, Protocol> Timeout: 60 seconds

Rate = Size / Duration Exclude flows with duration < 100 msec

Look at: Rate distribution Correlations among rate, size, and duration

Page 6: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 6

Flow Rate CharacteristicsFlow Rate Characteristics

Rate distribution Most flows are slow, but most bytes are in fast flows Distribution is skewed

Not as skewed as size distribution Consistent with log-normal distribution [BSSK97]

Correlations Rate and size are strongly correlated Not due to TCP slow-start

Removed initial 1 second of each connection; correlations increase

What users download is a function of their bandwidth

Page 7: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

Causes of Causes of Internet Flow RatesInternet Flow Rates

Page 8: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 8

T-RAT: TCP Rate Analysis ToolT-RAT: TCP Rate Analysis Tool Goal

Analyze TCP packet traces and determine rate-limiting factors for different connections

Requirements Work for traces recorded anywhere along a network

path Traces don’t have to be recorded near an endpoint

Work just seeing one direction of a connection Data only or ACK only there is no easy cause & effect

Work with partial connections Prevent bias against long-lived flows

Work in a streaming fashion Avoid having to read the entire trace into memory

Page 9: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 9

TCP Rate Limiting FactorsTCP Rate Limiting Factors

Factor Description

Application Application doesn’t generate data fast enough

Opportunity Flow never leaves slow-start

Receiver Flow is limited by receiver advertised window

Sender Flow is limited by sending buffer

BandwidthFlow fully utilizes and is limited by bottleneck link bandwidth

Congestion Flow responds to packet loss

TransportFlow has left slow-start & no packet loss &not limited by receiver/sender/bandwidth

Page 10: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 10

T-RAT ComponentsT-RAT Components

MSS Estimator Identify Maximum Segment Size (MSS)

RTT Estimator Estimate RTT Group packets into flights

Flight: packets sent during the same RTT

Rate Limit Analyzer Determine rate-limiting factors based on

MSS, RTT, and the evolution of flight size

Page 11: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 11

What Makes It Difficult?What Makes It Difficult? The network may introduce a lot of noise

E.g. significant delay variation, ACK compression, ... Time-varying RTT is difficult to track

E.g., handshake delay and median RTT may differ substantially Delayed ACK significantly complicates TCP dynamics

E.g. congestion avoidance: 12, 12, 13, 12, 12, 14, 14, 15, … There are a large number of TCP flavors & implementations

Different loss recovery algorithms, initial cwnd, bugs, weirdness Timers may introduce behavior difficult to analyze

E.g. delack timer may expire in the middle of an RTT Packets missing due to packet filter drop, route change

They are not lost! There may be multiple causes for a connection Some behaviors are intrinsically ambiguous And a lot more …

Page 12: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 12

MSS EstimatorMSS Estimator

Data stream MSS largest data packet payload

ACK stream MSS “most frequent common divisor”

Like GCD, apply heuristics to avoid looking for divisors of numbers that are not

multiples of MSS favor popular MSS (e.g. 536, 1460, 512)

Page 13: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 13

RTT EstimatorRTT Estimator Generate a set of candidate RTTs

Between 3 msec and 3 sec: 0.003 x 1.3K sec Assign a score to each candidate RTT

Group packets into flights Flight boundary: packet with large inter-arrival time

Track evolution of flight size over time and match it to identifiable TCP behavior

Slow start Congestion avoidance Loss recovery

Score # packets in flights consistent with identifiable TCP behavior

Pick the top scoring candidate RTT

Page 14: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 14

Rate Limit AnalyzerRate Limit AnalyzerCause Test

ApplicationA sub-MSS packet followed by a lull > RTT, followed by new data

Opportunity Total bytes < 13*MSS, or it never leaves slow-start

Receiver 3 consecutive flights with size > awndmax – 3*MSS

Sender Flight size stays constant, and not receiver limited

BandwidthA flow keeps achieving same amount of data in flights prior to loss; or it has nearly equally spaced packets

Congestion Packet loss and not bandwidth-limited

Transport Sender has left slow-start and there is no packet loss

HostFlight size stays constant, but we only see data packets and can’t tell between sender and receiver

Unknown None of the above

Page 15: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 15

RTT ValidationRTT ValidationValidation against tcpanaly [Pax97] over NPD N2 (17,248 conn)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 1.2 1.4 1.6 1.8 2 2.2 2.4

Cum

ulat

ive

Frac

tion

Accurate within a factor of X

Data-based sender-side estimationData-based receiver-side estimation

Ack-based sender-side estimationAck-based receiver-side estimation

SYN/ACK estimation

RTT estimator works reasonably well in most cases

Page 16: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 16

Rate Limit ValidationRate Limit Validation Methodology

ns2 simulations + dummynet experiments T-RAT correctly identifies the cause in vast

majority of cases Failure scenarios

Rcvr/Sndr Window = 2 packets; or link utilization > 90%

Transport High background load (esp. on ACK stream)

Congestion “transport limited” w/o loss NOT failure!

Bandwidth RTT wrong, but rate limit correct

Opportunity Connection size is very large

Application Less accurate with Nagle’s algorithm

Page 17: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 17

Rate Limiting Factors (Bytes)Rate Limiting Factors (Bytes)

0

10

20

30

40

50

60

Per

cent

age

of B

ytes

Congestion

Host/Sndr/Rcvr

Opportunity

Application

Transport

Unknow n

Dominant causes by bytes: Congestion, Receiver

Page 18: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 18

Rate Limiting Factors (Flows)Rate Limiting Factors (Flows)

0

10

20

30

40

50

60

70

80

90

Per

cen

tag

e o

f F

low

s

Congestion

Host/Sndr/Rcvr

Opportunity

Application

Transport

Unknow n

Dominant causes by flows: Opportunity, Application

Page 19: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 19

Flow Characteristics by CauseFlow Characteristics by Cause

Different causes are associated with different performance for users Rate distribution

Highest rates: Receiver, Transport Size distribution

Largest sizes: Receiver Duration distribution

Longest duration: Congestion

Page 20: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 20

ConclusionConclusion Characteristics of Internet flow rates

Fast flows carry most of the bytes It is important to understand their behavior.

Strong correlation between flow rate and size What users download is a function of their bandwidth.

Causes of Internet flow rates Dominant causes:

In terms of bytes: congestion, receiver In terms of flows: opportunity, application

Different causes are associated with different performance T-RAT has applicability beyond the results we have

so far E.g. correlating rate limiting factors with other user

characteristics like application type, access method, etc.

Page 21: On the Characteristics and Origins of Internet Flow Rates ACM SIGCOMM 2002 Yin Zhang Lee Breslau Vern Paxson Scott Shenker AT&T Labs – Research {yzhang,breslau}@research.att.com.

8/23/2002 SIGCOMM '2002 21

Thank you!Thank you!http://www.research.att.com/projects/T-RAT/