Reliable Transport II: TCP and Congestion Control

Brad Karp, UCL Computer Science. CS 6007/GC15/GA07, 27th-28th February, 2008


Page 1: Reliable Transport II: TCP and Congestion Control

Reliable Transport II: TCP and Congestion Control

Brad Karp
UCL Computer Science

CS 6007/GC15/GA07
27th-28th February, 2008

Page 2: Reliable Transport II: TCP and Congestion Control

Outline

• Packet header format
• Connection establishment
• Data transmission
• Retransmit timeouts
• RTT estimator
• AIMD Congestion control
• Throughput, loss, and RTT equation
• Connection teardown
• Protocol state machine

Page 3: Reliable Transport II: TCP and Congestion Control

TCP Packet Header

• TCP packet: IP header + TCP header + data
• TCP header: 20 bytes long (without options)
• Checksum covers TCP header + data + “pseudo header”
  – IP header source and destination addresses, protocol
  – Length of TCP segment (TCP header + data)
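As a rough illustration of the checksum coverage described above, the following Python sketch computes the Internet checksum over an IPv4 pseudo header followed by the TCP segment. The function names are hypothetical, IPv4 is assumed, and the segment is expected to carry a zeroed checksum field when the checksum is first computed.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + TCP header + data (IPv4 assumed)."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),   # IP source address
        socket.inet_aton(dst_ip),   # IP destination address
        0,                          # zero padding
        TCP_PROTOCOL,               # protocol
        len(segment),               # length of TCP header + data
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF
```

A receiver can verify a segment by running the same function over the received bytes, checksum field included: a result of 0 means the checksum matches.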

Page 4: Reliable Transport II: TCP and Congestion Control

TCP Header Details

• Connections inherently bidirectional; all TCP headers carry both data and ACK sequence numbers
• 32-bit sequence numbers are in units of bytes
• Source and destination ports
  – multiplexing of TCP by applications
  – UNIX: local ports below 1024 reserved (only root may use them)
• Window: advertisement of number of bytes advertiser willing to accept
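To make the fields above concrete, here is a minimal sketch that unpacks the fixed 20-byte TCP header with Python's `struct` module. The `TcpHeader` type and `parse_tcp_header` function are names invented for this example; options (if any) are ignored.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seqno: int        # sequence number of first payload byte
    ackno: int        # cumulative ACK number
    data_offset: int  # header length in 32-bit words
    flags: int        # e.g. FIN = 0x01, SYN = 0x02, RST = 0x04, ACK = 0x10
    window: int       # bytes the advertiser is willing to accept
    checksum: int
    urgent: int


def parse_tcp_header(raw: bytes) -> TcpHeader:
    """Parse the first 20 bytes of a TCP segment (network byte order)."""
    src, dst, seq, ack, off_res, flags, win, csum, urg = struct.unpack(
        "!HHIIBBHHH", raw[:20]
    )
    return TcpHeader(src, dst, seq, ack, off_res >> 4, flags, win, csum, urg)
```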

Page 5: Reliable Transport II: TCP and Congestion Control

TCP Connection Establishment: Motivation

• Goals:
  – Start TCP connection between two hosts
  – Avoid mixing data from old connection in new connection
  – Avoid confusing previous connection attempts with current one
  – Prevent (most) third parties from impersonating (spoofing) one endpoint
• SYN packets (SYN flag in TCP header set) used to establish connections
• Use retransmission timer to recover from lost SYNs
• What protocol meets above goals?

Page 6: Reliable Transport II: TCP and Congestion Control

TCP Connection Establishment: Non-Solution (I)

• Use two-way handshake
• A sends SYN to B
• B accepts by returning SYN to A
• A retransmits SYN if not received
• A and B can ignore duplicate SYNs after connection established
• What about delayed data packets from old connection?

[Figure: time-sequence diagram between A and B showing an old connection (SYN, SYN, data with seqno = 1 and seqno = 512), a close, and a new connection (SYN, SYN, data with seqno = 1, 512, and 1024).]

Connections shouldn’t start with constant sequence number; risks mixing data between old and new connections

Page 7: Reliable Transport II: TCP and Congestion Control

TCP Connection Establishment: Non-Solution (II)

• Two-way handshake, as before
• But enclose random initial sequence numbers on SYNs
• What about delayed SYNs from old connection?
  – A wrongly believes connection successfully established
  – B will drop all of A’s data!

[Figure: time-sequence diagram between A and B with labels: closed; SYN, seqno = k; SYN, seqno = j; data, seqno = k+1; SYN, seqno = i; data ignored!]

Connection attempts should explicitly acknowledge which SYN they are accepting!

Page 8: Reliable Transport II: TCP and Congestion Control

TCP Connection Establishment: 3-Way Handshake

• Set SYN on connection request
• Each side chooses random initial sequence number
• Each side explicitly ACKs the sequence number of the SYN it’s responding to

[Figure: time-sequence diagram. A → B: SYN, seqno = i. B → A: SYN, seqno = j, ACK = i+1. A → B: seqno = i+1, ACK = j+1.]

Page 9: Reliable Transport II: TCP and Congestion Control

Robustness of 3-Way Handshake: Delayed SYN

• Suppose A’s SYN i delayed, arrives at B after connection closed
• B responds with SYN/ACK for i+1
• A doesn’t recognize i+1; responds with reset, RST flag set in TCP header
• A rejects connection

[Figure: time-sequence diagram after the connection is closed. Delayed SYN, seqno = i arrives at B. B → A: SYN, seqno = j, ACK = i+1. A → B: RST, ACK = j.]

Page 10: Reliable Transport II: TCP and Congestion Control

Robustness of 3-Way Handshake: Delayed SYN/ACK

• A attempts connection to B
• Suppose B’s SYN k/ACK p delayed, arrives at A during new connection attempt
• A rejects SYN k; sends RST to B
• Connection from A to B succeeds unimpeded

[Figure: time-sequence diagram after the old connection is closed. New attempt: A → B: SYN, seqno = i; B → A: SYN, seqno = j, ACK = i+1; A → B: seqno = i+1, ACK = j+1. Delayed packet: B → A: SYN, seqno = k, ACK = p; A → B: RST, ACK = k.]
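The handshake rules and the two delayed-packet cases above can be sketched as a toy model. The `Endpoint` class and its dictionary-based "packets" are invented for illustration only; real TCP tracks far more state.

```python
import random

MOD = 2**32  # sequence numbers are 32 bits wide


class Endpoint:
    """Toy endpoint that only models ISN selection and ACK checking."""

    def __init__(self, rng: random.Random):
        self.isn = rng.randrange(MOD)  # random initial sequence number

    def syn(self) -> dict:
        return {"SYN": True, "seqno": self.isn}

    def syn_ack(self, syn: dict) -> dict:
        # Explicitly acknowledge the SYN being accepted.
        return {"SYN": True, "seqno": self.isn, "ack": (syn["seqno"] + 1) % MOD}

    def on_syn_ack(self, pkt: dict) -> dict:
        # A SYN/ACK that does not acknowledge our own ISN + 1 is stale.
        if pkt["ack"] != (self.isn + 1) % MOD:
            return {"RST": True, "ack": pkt["seqno"]}
        return {"seqno": (self.isn + 1) % MOD, "ack": (pkt["seqno"] + 1) % MOD}
```

A SYN/ACK that answers the current SYN completes the handshake; one left over from an earlier attempt carries the wrong ACK number and draws an RST, as in the slides.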

Page 11: Reliable Transport II: TCP and Congestion Control

Robustness of 3-Way Handshake: Source Spoofing

• Suppose host B trusts host A, based on A’s IP address
  – e.g., allows any account creation request from host A
• Adversary M may not control host A, but may seek to impersonate, or spoof, host A
  – Adversary may not need to receive data from B; only send data (e.g., “create an account l33thax0r”)
• Can M establish a connection to B as A?

[Figure: hosts A, B, and M. M → B: IP = A, SYN, seqno = i. B → A: SYN, seqno = j, ACK = i+1. M → B: IP = A, seqno = i+1, ACK = ??]

Unless he is on the path between A and B, the adversary cannot spoof A to B or vice versa! Why: random ISNs on SYNs

Page 12: Reliable Transport II: TCP and Congestion Control

TCP: Data Transmission (I)

• Each byte numbered sequentially, mod $2^{32}$
• Sender buffers data in case retransmission required
• Receiver buffers data for in-order reassembly
• Sequence number (seqno) field in TCP header indicates first user payload byte in packet
• Receiver indicates receive window size explicitly to sender in window field in TCP header
  – corresponds to available buffer space at receiver
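Because bytes are numbered mod 2^32, sequence numbers wrap around and cannot be compared with a plain `<`. The helpers below are a sketch of one common way to handle this (circular comparison); the function names are made up here, and the comparison is only meaningful when the two numbers are less than 2^31 apart.

```python
MOD = 2**32


def seq_add(seqno: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2^32."""
    return (seqno + nbytes) % MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b on the 32-bit circle (assumes |a - b| < 2^31)."""
    return a != b and ((b - a) % MOD) < 2**31
```

For example, a sequence number just below 2^32 precedes a small number just after the wrap, even though it is numerically larger.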

Page 13: Reliable Transport II: TCP and Congestion Control

TCP: Data Transmission (II)

• Sender’s transmit window size: amount of buffer space at sender
• Sender uses window that is minimum of send and receive window sizes
• Receiver sends cumulative ACKs
  – ACK number in TCP header names highest contiguous byte number received thus far, +1
  – one ACK per received packet, OR
  – Delayed ACK also possible: receiver batches ACKs, sends one for every pair of data packets (200 ms max delay)
• Current window at sender:
  – low byte advances as packets sent
  – high byte advances as receive window updates arrive
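The cumulative-ACK rule above can be sketched with a toy receiver. This is a simplified model with assumed names: it sends one ACK per segment (no delayed ACKs), ignores sequence-number wraparound, and assumes segments never partially overlap.

```python
class Receiver:
    """Toy receiver that buffers out-of-order segments and ACKs cumulatively."""

    def __init__(self, initial_seqno: int = 1):
        self.next_expected = initial_seqno  # lowest byte not yet received
        self.buffered = {}                  # seqno -> length, out-of-order data

    def on_segment(self, seqno: int, length: int) -> int:
        """Accept a segment and return the cumulative ACK number to send."""
        if seqno >= self.next_expected:
            self.buffered[seqno] = length
        # Advance past every segment that is now contiguous.
        while self.next_expected in self.buffered:
            self.next_expected += self.buffered.pop(self.next_expected)
        return self.next_expected
```

With 512-byte segments, receiving seqno 1, then 1025 and 1537 (513 missing) yields ACKs 513, 513, 513; once 513 arrives the ACK jumps to 2049.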


Page 15: Reliable Transport II: TCP and Congestion Control

TCP: Retransmit Timeouts

• Sender sets timer for each sent packet
  – when ACK returns, timer canceled
  – if timer expires before ACK returns, packet resent
• Expected time for ACK to return: RTT
• TCP estimates round-trip time using EWMA
  – measurements $m_i$ from timed packet/ACK pairs
  – $RTT_i = (1-\alpha) \times RTT_{i-1} + \alpha \times m_i$
  – Retransmit timeout: $RTO_i = \beta \times RTT_i$
  – original TCP: $\beta = 2$
• Is this accurate enough?
  – Recall dangers of too-short and too-long RTT estimates from previous lecture
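A minimal sketch of the original estimator follows. The value `alpha = 0.125` and the choice to seed the estimate with the first measurement are assumptions for this example; the slides fix only β = 2.

```python
def original_rto(samples, alpha=0.125, beta=2.0):
    """Return a list of (RTT_i, RTO_i) pairs, one per measurement m_i."""
    rtt = samples[0]  # assumption: seed the estimate with the first sample
    results = []
    for m in samples:
        rtt = (1 - alpha) * rtt + alpha * m   # EWMA of measurements
        results.append((rtt, beta * rtt))     # RTO_i = beta * RTT_i
    return results
```

With steady 100 ms samples the timeout settles at 200 ms; a single 200 ms sample after a 100 ms one moves the estimate only to 112.5 ms, which shows how slowly the EWMA tracks sudden change.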

Page 16: Reliable Transport II: TCP and Congestion Control

Mean and Variance: Jacobson’s RTT Estimator

• Above link load of 30% at router, $\beta \times RTT_i$ will retransmit too early!
• Response to increasing load: waste bandwidth on duplicate packets
• Result: congestion collapse!
• [Jacobson 88]: estimate $v_i$, mean deviation (EWMA of $|m_i - RTT_i|$), stand-in for variance
  – $v_i = v_{i-1} \times (1-\gamma) + \gamma \times |m_i - RTT_i|$
• Use $RTO_i = RTT_i + 4 v_i$

Mean and Variance RTT estimator used by all modern TCPs
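The mean-plus-deviation estimator can be sketched directly from the formulas above. The gains `alpha = 0.125` and `gamma = 0.25`, and the initial values, are assumptions made for this example rather than values given in the slides.

```python
def jacobson_rto(samples, alpha=0.125, gamma=0.25):
    """Return RTO_i = RTT_i + 4 * v_i for each measurement m_i."""
    rtt = samples[0]  # assumption: seed the mean with the first sample
    dev = 0.0         # assumption: start with zero mean deviation
    results = []
    for m in samples:
        rtt = (1 - alpha) * rtt + alpha * m              # RTT_i
        dev = (1 - gamma) * dev + gamma * abs(m - rtt)   # v_i uses RTT_i, as in the slide
        results.append(rtt + 4 * dev)
    return results
```

When measurements are steady the timeout stays close to the RTT itself; when they vary, the deviation term widens the timeout instead of relying on a fixed multiplier.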

Page 17: Reliable Transport II: TCP and Congestion Control

Retransmit Behavior

• Original TCP, before [Jacobson 88]:
  – at start of connection, send full window of packets
  – retransmit each packet immediately after its timer expires
• Result: window-sized bursts of packets sent into network

Page 18: Reliable Transport II: TCP and Congestion Control

Pre-Jacobson TCP (Obsolete!)

• Time-sequence plot taken at sender
• Bursts of packets: vertical lines
• Spurious retransmits: repeats at same y value
• Dashed line: available 20 Kbps capacity

Page 19: Reliable Transport II: TCP and Congestion Control

Self-Clocking: Conservation of Packets

• Goal: self-clocking transmission
  – each time an ACK returns, one data packet is sent
  – spacing of returning ACKs matches spacing of packets in time at slowest link on path

Page 20: Reliable Transport II: TCP and Congestion Control


Reaching Equilibrium: Slow Start

• At connection start, sender sets congestion window size, cwnd, to pktSize (one packet’s worth of bytes), not whole window

• Sender sends up to minimum of receiver’s advertised window and cwnd

• Upon return of each ACK until receiver’s advertised window size reached, increase cwnd by pktSize bytes

• “Slow” means exponential window increase!

• Takes $\log_2 W$ RTTs to reach receiver’s advertised window size W
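The exponential growth described above can be checked with a short simulation. This sketch counts windows in whole packets and assumes every packet sent in one RTT is ACKed within that RTT; the function name is invented for the example.

```python
def slow_start_rtts(rwnd_pkts: int) -> int:
    """Count RTTs for cwnd to grow from 1 packet to rwnd_pkts packets."""
    cwnd = 1
    rtts = 0
    while cwnd < rwnd_pkts:
        # Each of the cwnd ACKs returning this RTT adds one packet to cwnd,
        # so the window doubles (capped at the advertised window).
        cwnd = min(2 * cwnd, rwnd_pkts)
        rtts += 1
    return rtts
```

For an advertised window of 64 packets this takes 6 RTTs, matching log2(64).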

Page 21: Reliable Transport II: TCP and Congestion Control

Post-Jacobson TCP: Slow Start and Mean+Variance RTT Estimator

• Time-sequence plot at sender
• “Slower” start
• No spurious retransmits


Page 23: Reliable Transport II: TCP and Congestion Control


Goals in Congestion Control

• Achieve high utilization on links; don’t waste capacity!

• Divide bottleneck link capacity fairly among users

• Be stable: converge to a steady allocation among users

• Avoid congestion collapse

Page 24: Reliable Transport II: TCP and Congestion Control

Congestion Collapse

• Cliff behavior observed in [Jacobson 88]

[Figure: throughput (bps) versus offered load (bps), with the knee and the congestion-collapse cliff marked.]

Page 25: Reliable Transport II: TCP and Congestion Control

Congestion Requires Slowing Senders

• Recall: bigger buffers cannot prevent congestion
• Senders must slow to alleviate congestion
• Absence of ACKs implicitly indicates congestion
• TCP sender’s window size determines sending rate
• Recall: correct window size is bottleneck bandwidth-delay product
• How can sender learn this value?
  – Search for it, by adapting window size
  – Feedback from network: ACKs return (window OK) or do not return (window too big)

Page 26: Reliable Transport II: TCP and Congestion Control

Avoiding Congestion: Multiplicative Decrease

• Recall that sender uses sending window of size min(cwnd, rwnd), where rwnd is receiver’s advertised window
• Upon timeout for sent packet, sender presumes packet lost to congestion, and:
  – sets ssthresh = cwnd / 2
  – sets cwnd = pktSize
  – uses slow start to grow cwnd up to ssthresh
• End result: cwnd = cwnd / 2, via slow start
• Sender sends one window per RTT; halving cwnd halves transmit rate
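The timeout response above can be sketched as two small functions. `PKT_SIZE = 512` bytes is an arbitrary value chosen for the example, and the function names are hypothetical; windows are in bytes.

```python
PKT_SIZE = 512  # bytes per packet; assumed value for illustration


def on_timeout(cwnd: int):
    """Return (new_cwnd, ssthresh) after a retransmit timeout."""
    ssthresh = cwnd // 2       # remember half the window that caused loss
    return PKT_SIZE, ssthresh  # restart from one packet


def on_ack_in_slow_start(cwnd: int, ssthresh: int) -> int:
    """Grow cwnd by one packet per ACK, but not beyond ssthresh."""
    return max(cwnd, min(cwnd + PKT_SIZE, ssthresh))
```

Starting from a 16-packet window, a timeout drops cwnd to one packet and slow start then carries it back up to 8 packets, half the original window.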

Page 27: Reliable Transport II: TCP and Congestion Control

Avoiding Congestion: Additive Increase

• Drops indicate TCP sending more than its fair share of bottleneck
• No feedback to indicate TCP using less than its fair share of bottleneck
• Solution: speculatively increase window size as ACKs return
• Additive increase: for each returning ACK, cwnd = cwnd + (pktSize × pktSize)/cwnd
  – Increases cwnd by ~pktSize bytes per RTT

Combined algorithm: Additive Increase, Multiplicative Decrease (AIMD)
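A quick numerical check of the per-ACK rule above shows why it adds roughly one packet per RTT. The sketch assumes one ACK per packet in the window and no losses; the helper names are invented here.

```python
def additive_increase(cwnd: float, pkt_size: int) -> float:
    """Per-ACK window update from the slide."""
    return cwnd + (pkt_size * pkt_size) / cwnd


def grow_for_one_rtt(cwnd: float, pkt_size: int) -> float:
    """Apply the per-ACK update once for every packet in the current window."""
    acks = int(cwnd // pkt_size)  # one ACK per packet sent this RTT
    for _ in range(acks):
        cwnd = additive_increase(cwnd, pkt_size)
    return cwnd
```

With a 10-packet window of 1000-byte packets, each of the 10 ACKs adds a little under 100 bytes, so the window grows by slightly less than 1000 bytes over the RTT.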

Page 28: Reliable Transport II: TCP and Congestion Control

Refinement: Fast Retransmit (I)

• Sender must wait well over RTT for timer to expire before loss detected
• TCP’s minimum retransmit timeout: 1 second
• Another loss indication: duplicate ACKs
  – Suppose sender sends 1, 2, 3, 4, but 2 lost
  – Receiver receives 1, 3, 4
  – Receiver sends cumulative ACKs 2, 2, 2
  – Loss causes duplicate ACKs!

Page 29: Reliable Transport II: TCP and Congestion Control

Fast Retransmit (II)

• Upon arrival of 3 duplicate ACKs, sender:
  – sets cwnd = cwnd/2
  – retransmits “missing” packet
  – no slow start
• Not only loss causes dup ACKs
  – Reordering, too

[Figure: time-sequence diagram between A and B. A sends data with seqno = 1, 513, 1025, 1537; B returns ACK = 513 repeatedly; A resends data, seqno = 513.]
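The sender-side rule can be sketched with a duplicate-ACK counter. The class below is a toy model with assumed names; it counts duplicates beyond the first ACK for a given number, and it omits timers and everything else a real sender tracks.

```python
class FastRetransmitSender:
    """Toy sender that halves cwnd and retransmits on the third duplicate ACK."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd: int):
        self.cwnd = cwnd
        self.last_ack = None
        self.dup_acks = 0

    def on_ack(self, ackno: int):
        """Return the seqno to retransmit, or None if nothing to resend."""
        if ackno == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.cwnd //= 2   # multiplicative decrease, no slow start
                return ackno      # the "missing" packet starts at ackno
        else:
            self.last_ack = ackno  # new data acknowledged
            self.dup_acks = 0
        return None
```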

Page 30: Reliable Transport II: TCP and Congestion Control


AIMD in Action

• Sender searches for correct window size

Page 31: Reliable Transport II: TCP and Congestion Control

Why AIMD?

• Other control rules possible
  – E.g., MIMD, AIAD, …
• Recall goals:
  – Links fully utilized (efficient)
  – Users share resources fairly
• TCP adapts all flows’ window sizes independently
• Must choose a control that will always converge to an efficient and fair allocation of windows

Page 32: Reliable Transport II: TCP and Congestion Control

Chiu-Jain Phase Plots

• Consider two users sharing a bottleneck link
• Plot bandwidths allocated to each
• Efficiency: sum of two users’ rates fixed
• Fairness: two users’ rates equal
• Equi-Fairness: ratio of two users’ rates fixed

[Figure: phase plot with axes User 1 (bps) and User 2 (bps), showing the efficiency line, the fairness line, the overload and underload regions, the optimum point, an equi-fairness line (MI), and an additive-increase direction (AI).]

Page 33: Reliable Transport II: TCP and Congestion Control

Chiu-Jain: AIMD

• AIMD converges to optimum efficiency and fairness

[Figure: phase plot showing the efficiency line and the fairness line.]

Page 34: Reliable Transport II: TCP and Congestion Control

Chiu-Jain: AIAD

• AIAD doesn’t converge to optimum point!
• Similar oscillations for MIMD

[Figure: phase plot showing the efficiency line and the fairness line.]
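The phase-plot claims above can be explored with a small two-user simulation. This is a sketch under simplifying assumptions: both users see the same synchronized feedback each round, rates are in arbitrary units, and the starting rates, capacity, and step sizes are made up for the example.

```python
def simulate(x1, x2, capacity, decrease, rounds=1000, step=1.0):
    """Two users share a link; both increase additively, both back off on overload."""
    for _ in range(rounds):
        if x1 + x2 > capacity:            # overload: both users decrease
            x1, x2 = decrease(x1), decrease(x2)
        else:                             # underload: both users add `step`
            x1, x2 = x1 + step, x2 + step
    return x1, x2


# Multiplicative decrease halves each rate; additive decrease subtracts 1.
aimd = simulate(10, 80, 100, lambda x: x / 2)
aiad = simulate(10, 80, 100, lambda x: max(x - 1.0, 0.0))
```

Under these assumptions the AIMD pair ends with nearly equal rates, because every halving also halves the gap between the users, while the AIAD pair keeps its initial gap of 70 forever.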


Page 36: Reliable Transport II: TCP and Congestion Control

Modeling Throughput, Loss, and RTT

• How do packet loss rate and RTT affect throughput TCP achieves?
• Assume:
  – only fast retransmits
  – no timeouts (so no slow starts in steady state)

Page 37: Reliable Transport II: TCP and Congestion Control

Evolution of Window Over Time

• Average window size: 3W/4
• One window sent per RTT
• Bandwidth:
  – 3W/4 packets per RTT
  – (3W/4 x packet size) / RTT bytes per second
  – W depends on loss rate…

[Figure: window size versus time, varying between W/2 and W.]

Page 38: Reliable Transport II: TCP and Congestion Control

Loss and Window Size

• Assume no delayed ACKs, fixed RTT
• cwnd grows by one packet per RTT
• So it takes W/2 RTTs to go from window size W/2 to window size W; this period is one cycle
• How many packets sent in total?
  – $((3W/4) / RTT) \times (W/2 \times RTT) = 3W^2/8$
• One loss per cycle (as window reaches W)
  – loss rate: $p = 8/(3W^2)$
  – $W = \sqrt{8/(3p)}$

Page 39: Reliable Transport II: TCP and Congestion Control

Throughput, Loss, and RTT Model

• $W = \sqrt{8/(3p)} = (4/3) \times \sqrt{3/(2p)}$
• Recall:
  – Bandwidth: $B = (3W/4 \times \text{packet size}) / RTT$
• $B = \text{packet size} / (RTT \times \sqrt{2p/3})$
• Consequences:
  – Increased loss quickly reduces throughput
  – At same bottleneck, flow with longer RTT achieves less throughput than flow with shorter RTT!
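The model above is easy to evaluate numerically. The sketch below uses hypothetical function names and the slide's assumptions (fast retransmits only, fixed RTT, one loss per cycle); `rtt` is in seconds and `pkt_size` in bytes.

```python
import math


def steady_state_window(p: float) -> float:
    """Peak window W in packets for loss rate p: W = sqrt(8 / (3p))."""
    return math.sqrt(8 / (3 * p))


def tcp_throughput(pkt_size: float, rtt: float, p: float) -> float:
    """Throughput in bytes per second: B = pkt_size / (RTT * sqrt(2p / 3))."""
    return pkt_size / (rtt * math.sqrt(2 * p / 3))
```

Under this model, quadrupling the loss rate halves throughput, and doubling the RTT halves it as well.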


Page 41: Reliable Transport II: TCP and Congestion Control

TCP: Connection Teardown

• Data may flow bidirectionally
• Each side independently decides when to close connection
• In each direction, FIN answered by ACK
• Must reliably terminate connection for both sides
  – During TIME_WAIT state at first side to send FIN, ACK valid FINs that arrive
• Must avoid mixing data from old connection with new one
  – During TIME_WAIT state, disallow all new connections for 2 x max segment lifetime

[Figure: time-sequence diagram. A → B: FIN, seqno = i. B → A: ACK = i+1. B → A: FIN, seqno = j. A → B: ACK = j+1. A enters TIME_WAIT state.]

Page 42: Reliable Transport II: TCP and Congestion Control


TCP: Protocol State Machine

Page 43: Reliable Transport II: TCP and Congestion Control

Summary: TCP and Congestion Control

• Connection establishment and teardown
  – Robustness against delayed packets crucial
• Round-trip time estimation
  – EWMAs estimate both RTT mean and deviation
• Congestion detection at sender
  – Timeout: retransmit timer expires, halve window, slow start from one packet
  – Fast Retransmit: three duplicate ACKs, halve window, no slow start
• Search for optimal sending window size
  – Additive increase, multiplicative decrease (AIMD)
  – AIMD converges to high utilization, fair sharing