9. Congestion Control


Transcript of 9. Congestion Control

  • Flow Control

  • TCP Flow Control. Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment. Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow, so the sender won't overrun the receiver's buffers by transmitting too much, too fast. Receiver buffering: RcvBuffer = size of the TCP receive buffer;

    RcvWindow = amount of spare room in Buffer

  • TCP Flow Control: How it Works. Spare room in buffer = RcvWindow. (Figure: TCP segment format, showing source port #, dest port #, sequence number, acknowledgement number, head len, not used, flag bits U A P R S F, rcvr window size, checksum, ptr urgent data, options (variable length), and application data (variable length); the rcvr window size field carries RcvWindow.)
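
    A minimal sketch of the bookkeeping described above (Python; the variable names are illustrative, not taken from the slides):

        # Receiver side: advertise the spare room in the receive buffer.
        def advertised_rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
            # RcvWindow = RcvBuffer - (bytes received but not yet read by the application)
            return rcv_buffer - (last_byte_rcvd - last_byte_read)

        # Sender side: keep unACKed data below the most recently advertised RcvWindow.
        def can_send(last_byte_sent, last_byte_acked, rcv_window):
            return (last_byte_sent - last_byte_acked) < rcv_window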

  • TCP: setting timeouts

  • TCP Round Trip Time and Timeout. Q: how to set the TCP timeout value? Longer than the RTT (note: the RTT will vary). Too short: premature timeout and unnecessary retransmissions. Too long: slow reaction to segment loss. Q: how to estimate the RTT? SampleRTT: measured time from segment transmission until ACK receipt (ignore retransmissions and cumulatively ACKed segments). SampleRTT will vary; we want a smoother estimated RTT, so use several recent measurements, not just the current SampleRTT.

  • High-level Idea: set timeout = average + safe margin.

  • Estimating Round Trip Time: EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT. This is an exponential weighted moving average: the influence of a past sample decreases exponentially fast; typical value: α = 0.125. SampleRTT: measured time from segment transmission until ACK receipt. SampleRTT will vary; to get a smoother estimated RTT, use several recent measurements, not just the current SampleRTT.

  • Setting Timeout. Problem: using the average of SampleRTT will generate many timeouts due to network variations.

    Solution: EstimatedRTT plus a safety margin; a large variation in EstimatedRTT means a larger safety margin. First estimate how much SampleRTT deviates from EstimatedRTT: DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT| (typically, β = 0.25). Then set the timeout interval: TimeoutInterval = EstimatedRTT + 4*DevRTT.
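
    A minimal sketch of the estimator described above (Python; α = 0.125 and β = 0.25 as on the slides, and the exact order of the two updates is a simplification):

        ALPHA, BETA = 0.125, 0.25

        def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
            # Exponential weighted moving average of the RTT samples.
            estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
            # EWMA of how far samples deviate from the estimate (the safety margin).
            dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
            timeout_interval = estimated_rtt + 4 * dev_rtt
            return estimated_rtt, dev_rtt, timeout_interval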

  • An Example TCP Session

  • TCP Round Trip Time and Timeout. Setting the timeout: EstimatedRTT plus a safety margin; a large variation in EstimatedRTT means a larger safety margin.

    EstimatedRTT = (1 - x)*EstimatedRTT + x*SampleRTT. Exponential weighted moving average: the influence of a given sample decreases exponentially fast; typical value of x: 0.1. Timeout = EstimatedRTT + 4*Deviation, where Deviation = (1 - x)*Deviation + x*|SampleRTT - EstimatedRTT|.

  • Fast retransmit

  • Fast Retransmit. The timeout period is often relatively long: there is a long delay before resending a lost packet.

    Detect lost segments via duplicate ACKs: the sender often sends many segments back-to-back, so if a segment is lost there will likely be many duplicate ACKs. If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost and resends that segment before the timer expires.

  • Triple Duplicate ACK. (Figure: a sequence of packets and the acknowledgements they trigger, each ACK labelled with the sequence number the receiver is still waiting for; a lost packet produces repeated duplicate ACKs.)

  • Fast retransmit at the sender (a duplicate ACK is an ACK for an already ACKed segment):

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {  /* a duplicate ACK */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3) {
                /* fast retransmit */
                resend segment with sequence number y
            }
        }

  • Congestion Control

  • Principles of Congestion Control. Congestion, informally: too many sources sending too much data too fast for the network to handle. Manifestations: lost packets (buffer overflow at routers) and long delays (queueing in router buffers). A highly important problem!

  • Causes/costs of congestion: scenario 1. Two senders, two receivers; one router, infinite buffers; no retransmission.

  • Causes/costs of congestion: scenario 1. Throughput increases with load, up to a maximum total load C (each session C/2). Large delays when congested; the load is stochastic.

  • Causes/costs of congestion: scenario 2. One router, finite buffers; the sender retransmits lost packets.

  • Causes/costs of congestion: scenario 2. Always: λin = λout (goodput), and we would like to maximize goodput. Perfect retransmission (retransmit only on loss): λ'in > λout. Actual retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout.

  • Causes/costs of congestion: scenario 2. Costs of congestion: more work (retransmissions) for a given goodput; unneeded retransmissions mean the link carries (and delivers) multiple copies of a packet.

    (Figure: goodput λout versus offered load λ'in for each case.)

  • Causes/costs of congestion: scenario 3. Four senders; multihop paths; timeout/retransmit.

    Q: what happens as λin and λ'in increase?

  • Causes/costs of congestion: scenario 3. Another cost of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

  • Approaches towards congestion control. Two broad approaches. End-end congestion control: no explicit feedback from the network; congestion is inferred from end-system observed loss and delay; this is the approach taken by TCP. Network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) or an explicit rate the sender should send at.

  • Goals of congestion control. Throughput: maximize goodput, the total number of bits delivered end-end. Fairness: give different sessions an equal share. Max-min fairness: maximize the minimum-rate session. Single link example: capacity R, m sessions, each session gets R/m.

  • Max-min fairness. Model: a graph G(V,E) and sessions s1 ... sm; for each session si a rate ri is selected. The rates are a max-min fair allocation if the allocation is maximal (no ri can simply be increased) and increasing an allocation ri requires reducing the allocation rj of some session j with rj ≤ ri. This maximizes the minimum-rate session.

  • Max-min fairness: Algorithm. Model: a graph G(V,E) and sessions s1 ... sm. Algorithmic view: for each link e compute its fair share f(e) = capacity / # sessions crossing it; select the link with the minimal fair share; allocate f(e) to each session passing through it; subtract the allocated capacities and delete those sessions; continue recursively. (Fluid view.) A sketch of this procedure follows below.
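
    A minimal sketch of this algorithm (Python; the link capacities and session paths passed in are hypothetical inputs):

        def max_min_rates(capacity, sessions):
            # capacity: dict link -> capacity; sessions: dict session -> set of links it crosses.
            rates, capacity = {}, dict(capacity)
            active = {s: set(links) for s, links in sessions.items()}
            while active:
                # Fair share of each link among the sessions still crossing it.
                share = {e: c / sum(1 for ln in active.values() if e in ln)
                         for e, c in capacity.items()
                         if any(e in ln for ln in active.values())}
                bottleneck = min(share, key=share.get)
                f = share[bottleneck]
                # Freeze every session on the bottleneck link at rate f, then recurse.
                for s in [s for s, ln in active.items() if bottleneck in ln]:
                    rates[s] = f
                    for e in active[s]:
                        capacity[e] -= f
                    del active[s]
            return rates

    For example, three sessions sharing a link of capacity 6, where one of them also crosses a link of capacity 1, get rates 1, 2.5 and 2.5:

        print(max_min_rates({"a": 6.0, "b": 1.0}, {1: {"a"}, 2: {"a"}, 3: {"a", "b"}}))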

  • Max-min fairness: Example

    Throughput versus fairness.

  • Case study: ATM ABR congestion control. ABR (available bit rate) is an elastic service: if the sender's path is underloaded, the sender can use the available bandwidth; if the sender's path is congested, the sender lowers its rate to a minimum guaranteed rate. Aim: coordinate the increase/decrease of rate and avoid loss!

  • Case study: ATM ABR congestion control. RM (resource management) cells: sent by the sender in between data cells (one out of every 32 cells) and returned to the sender by the receiver; each router along the path can modify the RM cell. The info in the RM cell is set by the switches (network-assisted, 2-bit info): the NI bit means no increase in rate (mild congestion); the CI bit is a congestion indication (lower the rate).

  • Case study: ATM ABR congestion control. Two-byte ER (explicit rate) field in the RM cell: a congested switch may lower the ER value in the cell, so the sender's send rate is the minimum supportable rate on the path. EFCI bit in data cells: set to 1 by a congested switch; if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell.

  • Case study: ATM ABR congestion control. How does the router select its action? It selects a rate and sets congestion bits; this is vendor-dependent functionality. Advantages: fast and accurate response. Disadvantages: network-level design, increased router tasks (load), interoperability issues.

  • End to end control

  • End to end feedback. Abstraction: an alarm flag observable at the end stations.

  • Simple Abstraction

  • Simple Abstraction

  • Simple feedback model. Every RTT the sender receives feedback: high congestion, decrease the rate;

    low congestion, increase the rate.

    A variable rate controls the sending rate.

  • Multiplicative Update. Congestion: Rate = Rate/2; no congestion: Rate = Rate*2. Performance: fast response. Unfair: the ratios between rates remain unchanged.

  • Additive Update. Congestion: Rate = Rate - 1; no congestion: Rate = Rate + 1. Performance: slow response. Fairness: divides spare BW equally, but the difference between rates remains unchanged.

  • AIMD Scheme. Additive increase: fairness, since the ratios improve. Multiplicative decrease: fairness, the ratio is unchanged, and the response is fast. Performance: fast response to congestion, together with fairness.

  • AIMD: Two users, One link. (Figure: rate of user 1 versus rate of user 2, showing the BW limit line and the fairness line.)
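
    A minimal sketch of the AIMD rule from the last few slides (Python; the increase step, decrease factor, and link capacity are illustrative):

        def aimd_update(rate, congested, increase=1.0, decrease_factor=2.0):
            # Additive increase without congestion, multiplicative decrease with it.
            return rate / decrease_factor if congested else rate + increase

        # Two users sharing one link: their rates drift toward an equal share.
        r1, r2 = 10.0, 50.0
        for _ in range(50):
            congested = (r1 + r2) > 60.0       # illustrative shared-link limit
            r1, r2 = aimd_update(r1, congested), aimd_update(r2, congested)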

  • TCP Congestion Control. Closed-loop, end-to-end, window-based congestion control. Designed by Van Jacobson in the late 1980s, based on the AIMD algorithm of Dah-Ming Chiu and Raj Jain. It has worked well so far: the bandwidth of the Internet has increased by more than 200,000 times. Many versions: TCP/Tahoe, a less optimized version; TCP/Reno, the Reno-type congestion control many OSs implement today; TCP/Vegas, not currently used. For more details see TCP/IP Illustrated, or read http://lxr.linux.no/source/net/ipv4/tcp_input.c for the Linux implementation.

  • TCP/Reno Congestion Detection

    Philosophy: detect congestion in two cases and react differently:

    3 dup ACKs, or a timeout event.

    3 dup ACKs indicate the network is still capable of delivering some segments; a timeout is more alarming.

  • Basic Structure. Two phases: slow-start (MI) and congestion avoidance (AIMD).

    Important variables: cwnd, the congestion window size; ssthresh, the threshold between the slow-start phase and the congestion avoidance phase.

  • Visualization of the Two Phases

  • Slow Start: MI. What is the goal? Getting to equilibrium gradually but quickly.

    It implements the MI algorithm: double cwnd every RTT until the network is congested, to get a rough estimate of the optimal cwnd.

  • Slow-start. Initially: cwnd = 1; ssthresh = infinite (e.g., 64K). (Figure: cwnd growing round by round: 2, 4, 6, 8 segments.)

    For each newly ACKed segment:
        if (cwnd < ssthresh)   /* slow start */
            cwnd = cwnd + 1;

  • Startup Behavior with Slow-start. See [Jac89].

  • TCP/Reno Congestion Avoidance. Maintains equilibrium and reacts around the equilibrium.

    It implements the AIMD algorithm: increases the window by 1 per round-trip time (how?); cuts the window size to half when detecting congestion by 3 duplicate ACKs, or to 1 on a timeout; if there is another timeout, it doubles the timeout value.

  • TCP/Reno Congestion Avoidance

    Initially:
        cwnd = 1;
        ssthresh = infinite (e.g., 64K);

    For each newly ACKed segment:
        if (cwnd < ssthresh)
            /* slow start */
            cwnd = cwnd + 1;
        else
            /* congestion avoidance; cwnd increases (approx.) by 1 per RTT */
            cwnd += 1/cwnd;

    Triple-duplicate ACKs:
        /* multiplicative decrease */
        cwnd = ssthresh = cwnd/2;

    Timeout:
        ssthresh = cwnd/2;
        cwnd = 1;
        (if already timed out, double the timeout value; this is called exponential backoff)
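
    A minimal runnable sketch of the same state machine (Python; the event-handler names are illustrative, the 64K initial ssthresh follows the slide):

        class RenoWindow:
            def __init__(self):
                self.cwnd = 1.0            # congestion window, in segments
                self.ssthresh = 64000.0    # "infinite" initial threshold (e.g., 64K)

            def on_new_ack(self):
                if self.cwnd < self.ssthresh:
                    self.cwnd += 1.0               # slow start: +1 per ACK (doubles per RTT)
                else:
                    self.cwnd += 1.0 / self.cwnd   # congestion avoidance: ~+1 per RTT

            def on_triple_dup_ack(self):
                self.ssthresh = self.cwnd / 2      # multiplicative decrease
                self.cwnd = self.ssthresh

            def on_timeout(self):
                self.ssthresh = self.cwnd / 2
                self.cwnd = 1.0                    # restart from slow start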

  • TCP/Reno: Big Picture. (Figure: cwnd versus time, alternating slow-start and congestion-avoidance phases; TD = triple duplicate acknowledgement, TO = timeout; ssthresh is reset at each event.)

  • A Session. Question: when cwnd is cut to half, why is the sending rate not cut to half?

  • TCP/Reno Queueing Dynamics. Consider congestion avoidance only. There is a filling and draining of the buffer for each TCP flow. (Figure: cwnd versus time during congestion avoidance, with TD events, ssthresh, and the bottleneck bandwidth marked.)

  • TCP & AIMD: congestionDynamic window size [Van Jacobson]Initialization: Slow startSteady state: AIMDCongestion = timeoutTCP TahoeCongestion = timeout || 3 duplicate ACKTCP Reno & TCP new RenoCongestion = higher latencyTCP Vegas

  • TCP Congestion Control. End-end control (no network assistance); the transmission rate is limited by the congestion window size, Congwin, over segments: w segments, each with MSS bytes, are sent in one RTT, so the rate is Congwin*MSS / RTT bytes/sec.

  • TCP congestion control: probing for usable bandwidth. Ideally, transmit as fast as possible (Congwin as large as possible) without loss: increase Congwin until loss (congestion); on loss, decrease Congwin, then begin probing (increasing) again.

    Two phases: slow start and congestion avoidance. Important variables: Congwin, and threshold, which defines the boundary between the slow start phase and the congestion avoidance phase.

  • TCP Slowstart: exponential increase (per RTT) in window size (not so slow!). In case of timeout: Threshold = CongWin/2.

    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (congestion event OR CongWin > threshold)

    (Figure: Host A sends one segment, then two, then four to Host B, one window per RTT.)

  • TCP Tahoe Congestion Avoidance

    /* slowstart is over */
    /* Congwin > threshold */
    Until (timeout) { /* loss event */
        every ACK: Congwin += 1/Congwin
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart

  • TCP Reno. Fast retransmit: after receiving 3 duplicate ACKs, resend the first packet in the window; try to avoid waiting for a timeout. Fast recovery: after the retransmission, do not enter slowstart; set Threshold = Congwin/2 and Congwin = 3 + Congwin/2; for each duplicate ACK received, Congwin++; after a new ACK, set Congwin = Threshold and return to congestion avoidance. For a single packet drop: great!
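
    A standalone sketch of how fast recovery changes the window bookkeeping (Python; illustrative event handlers, not an exact Reno implementation):

        class RenoFastRecovery:
            def __init__(self):
                self.cwnd, self.ssthresh = 1.0, 64000.0
                self.in_fast_recovery = False

            def on_triple_dup_ack(self):
                # Fast retransmit the missing segment, then inflate instead of slow-starting.
                self.ssthresh = self.cwnd / 2
                self.cwnd = 3 + self.ssthresh
                self.in_fast_recovery = True

            def on_dup_ack(self):
                if self.in_fast_recovery:
                    self.cwnd += 1                 # each extra dup ACK: a packet left the network

            def on_new_ack(self):
                if self.in_fast_recovery:
                    self.cwnd = self.ssthresh      # deflate; back to congestion avoidance
                    self.in_fast_recovery = False
                elif self.cwnd < self.ssthresh:
                    self.cwnd += 1.0               # slow start
                else:
                    self.cwnd += 1.0 / self.cwnd   # congestion avoidance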

  • TCP Congestion Window Trace

  • TCP Vegas. Idea: track the RTT and try to avoid packet loss; if latency increases, lower the rate; if latency is very low, increase the rate. Implementation: sample_RTT = current RTT; Base_RTT = minimum over sample_RTT; Expected = Congwin / Base_RTT; Actual = number of packets sent / sample_RTT; Diff = Expected - Actual.

  • TCP Vegas. Diff = Expected - Actual. Congestion avoidance: two parameters α and β (α < β); if (Diff < α) Congwin = Congwin + 1; if (Diff > β) Congwin = Congwin - 1; otherwise no change. Note: once per RTT. Slowstart: parameter γ; if (Diff > γ) then move to congestion avoidance. Timeout: same as TCP Tahoe.
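
    A minimal sketch of the Vegas congestion-avoidance step (Python; the alpha/beta values are illustrative):

        def vegas_update(congwin, base_rtt, sample_rtt, pkts_sent, alpha=1.0, beta=3.0):
            expected = congwin / base_rtt      # rate if there were no queueing
            actual = pkts_sent / sample_rtt    # rate actually achieved this RTT
            diff = expected - actual
            if diff < alpha:
                return congwin + 1             # queues nearly empty: speed up
            if diff > beta:
                return congwin - 1             # queues building up: slow down
            return congwin                     # in the sweet spot: leave the window alone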

  • TCP Dynamics: Rate. TCP Reno with NO fast retransmit or recovery. Sending rate: Congwin*MSS / RTT. Assume a fixed RTT, with the window oscillating between W/2 and W. Actual sending rate: between (1/2) W*MSS / RTT and W*MSS / RTT, with an average of (3/4) W*MSS / RTT.

  • TCP Dynamics: Loss. Loss rate (TCP Reno, no fast retransmit or recovery). Consider one cycle between losses: the total number of packets sent is about (3/8) W^2 = O(W^2), with one packet loss per cycle, so the loss probability is p = O(1/W^2), or W = O(1/sqrt(p)).
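
    To check the arithmetic: over one cycle the window grows linearly from W/2 to W, which takes W/2 RTTs, so the packets sent per cycle are roughly (average window) x (number of RTTs) = (3W/4)(W/2) = (3/8) W^2. With one loss per cycle, p is about 8/(3 W^2), hence W = O(1/sqrt(p)).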

  • TCP latency modeling. Q: How long does it take to receive an object from a Web server after sending a request? It includes TCP connection establishment plus the data transfer delay.

    Notation, assumptions: one link between client and server of rate R; fixed congestion window of W segments; S = MSS (bits); O = object size (bits); no retransmissions (no loss, no corruption).

  • TCP latency modeling. Optimal setting: Time = O/R.

    Two cases to consider. WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent. WS/R < RTT + S/R: the sender must wait for an ACK after sending a window's worth of data.

  • TCP latency modeling. Case 1: latency = 2 RTT + O/R. Case 2: latency = 2 RTT + O/R + (K - 1)[S/R + RTT - WS/R], where K := O/(WS).
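
    A minimal sketch of this static-window latency model (Python; rounding K up to a whole number of windows is an assumption):

        import math

        def tcp_latency_fixed_window(O, S, R, rtt, W):
            # Case 1: the ACK returns before the window is exhausted, so the sender never stalls.
            if W * S / R > rtt + S / R:
                return 2 * rtt + O / R
            # Case 2: the sender stalls after each of the first K-1 windows until an ACK arrives.
            K = math.ceil(O / (W * S))          # number of windows covering the object
            return 2 * rtt + O / R + (K - 1) * (S / R + rtt - W * S / R)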

  • TCP Latency Modeling: Slow Start. Now suppose the window grows according to slow start. We will show that the latency of one object of size O is

    Latency = 2 RTT + O/R + P [RTT + S/R] - (2^P - 1) S/R

    where P = min{Q, K - 1} is the number of times TCP stalls at the server, Q is the number of times the server would stall if the object were of infinite size,

    and K is the number of windows that cover the object.

  • TCP Latency Modeling: Slow Start (cont.) Example:

    O/S = 15 segments

    K = 4 windows

    Q = 2

    P = min{K-1,Q} = 2

    Server stalls P=2 times.
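
    A small sketch that walks through the slow-start windows directly instead of using the closed form, so it also reproduces the K, Q, and P behaviour of an example like the one above (Python; illustrative only):

        import math

        def slow_start_latency(O, S, R, rtt):
            n_seg = math.ceil(O / S)        # segments covering the object
            latency = 2 * rtt + O / R       # connection setup plus raw transmission time
            sent, k = 0, 0
            while sent < n_seg:
                window = min(2 ** k, n_seg - sent)   # slow start: window doubles each round
                sent += window
                k += 1
                # The server stalls if the whole window goes out before the
                # ACK of its first segment comes back.
                if sent < n_seg and window * S / R < S / R + rtt:
                    latency += S / R + rtt - window * S / R
            return latency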

  • TCP Latency Modeling: Slow Start (cont.)
