CS 4284 Operating Systems


CS 4284

Systems Capstone

Godmar Back

Networking

TCP

CS 4284 Spring 2013

Study of TCP: Outline

• segment structure
• reliable data transfer
• flow control
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Overview (RFCs 793, 1122, 1323, 2018, 2581)

• full duplex data
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented
  - handshaking (exchange of control msgs) init's sender, receiver state before data exchange
• flow controlled
  - sender will not overwhelm receiver
• point-to-point
  - one sender, one receiver
• reliable, in-order byte stream
  - no "message boundaries"
• pipelined
  - TCP congestion and flow control set window size
• send & receive buffers

[Figure: the sending application writes data through its socket "door" into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads data through its own socket "door".]

TCP Segment Structure

[Figure: TCP segment layout, each row 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)

Annotations:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• sequence and acknowledgement numbers: counting by bytes of data (not segments)
• checksum: Internet checksum (as in UDP)
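As a rough illustration of the layout above, the following Python sketch unpacks the 20-byte fixed part of a TCP header with the standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are made up for this example; options and checksum verification are deliberately ignored.

```python
import struct

# Fixed 20-byte TCP header in network byte order: two 16-bit ports,
# 32-bit seq and ack numbers, 16 bits of header length + flags,
# then receive window, checksum and urgent data pointer (16 bits each).
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw):
    """Decode the fixed part of a TCP header (hypothetical helper)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The top 4 bits give the header length in 32-bit words.
        "header_len": (offset_flags >> 12) * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Example: a SYN segment from port 1234 to port 80 with initial seq # 92.
syn = TCP_HEADER.pack(1234, 80, 92, 0, (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
header = parse_tcp_header(syn)
```

Here `header["flags"]` contains only `"SYN"` and `header["header_len"]` is 20, since the example segment carries no options.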

TCP Reliable Data Transfer

• TCP creates reliable data transfer (rdt) service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender
  - ignore duplicate acks
  - ignore flow control, congestion control

TCP Seq #'s and ACKs

Seq #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; up to implementor

[Figure: simple telnet scenario between Host A and Host B over time. User types 'C' at Host A; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C'.]

TCP Sender Events

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ack rcvd:
• if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte

Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73: the rcvr acknowledges up to and including 72 and now wants 73+. y > SendBase, so new data is acked; set SendBase to 73.
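The simplified sender can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: the class name `SimplifiedTcpSender` is invented, the timer is a boolean rather than a real clock, "passing a segment to IP" just appends to a list, and sequence number wrap-around is ignored.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified sender: one timer, cumulative ACKs only."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq    # NextSeqNum
        self.send_base = initial_seq   # SendBase
        self.unacked = {}              # seq # -> data of segments in flight
        self.timer_running = False
        self.sent = []                 # log of (seq #, data) handed to "IP"

    def _transmit(self, seq):
        self.sent.append((seq, self.unacked[seq]))

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self._transmit(seq)
        self.next_seq += len(data)

    def on_timeout(self):
        # Retransmit the not-yet-acknowledged segment with the smallest seq #.
        if self.unacked:
            self._transmit(min(self.unacked))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment fully covered by the cumulative ACK.
            for seq in [s for s in self.unacked
                        if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            # Keep the timer running only while segments are outstanding.
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"A" * 8)    # segment with seq # 92
sender.on_app_data(b"B" * 20)   # segment with seq # 100
sender.on_timeout()             # retransmits seq # 92
sender.on_ack(120)              # cumulative ACK covers both segments
```

After the final ACK, `sender.send_base` is 120, nothing is outstanding, and the timer is stopped.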

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: an ACK is lost (X) and Host A's timeout fires; SendBase = 100. Premature timeout: the Seq=92 timeout fires before the ACK arrives; SendBase goes from 100 to 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario between Host A and Host B: an ACK is lost (X) within the timeout period; SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]

1. Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
   TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.
2. Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending.
   TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments.
3. Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected.
   TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte.
4. Event at receiver: arrival of segment that partially or completely fills gap.
   TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap.

Nagle's algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle
  - Transmit first byte
  - Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off
  - Use setsockopt() with the TCP_NODELAY option (set at the IPPROTO_TCP level, not SOL_SOCKET)
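For illustration, this is how Nagle's algorithm can be switched off from Python, whose `socket` module mirrors the C `setsockopt()` call. The helper name `make_nodelay_socket` is invented for this sketch; the socket is never connected, so no traffic is sent.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, hence IPPROTO_TCP as the level.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nagle_disabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Disabling Nagle mainly makes sense for latency-sensitive applications that already write data in sensible chunks.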

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because acks are cumulative, this can reduce the # of acks sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can fit into a single segment)

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

RTT Distributions

[Figure] (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and EstimatedRTT.]

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|

• then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4 * DevRTT

  where EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  (typically, α = 0.125, β = 0.25)
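The two moving averages and the timeout formula translate directly into code. The sketch below is a minimal illustration: the class name `RttEstimator` is invented, and seeding DevRTT with half of the first sample is an assumption borrowed from common practice, not something the slides specify.

```python
class RttEstimator:
    """EWMA-based RTT estimator (illustrative sketch)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        # Assumption: start the deviation at half of the first sample.
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates and return the new timeout."""
        # DevRTT uses the EstimatedRTT from before this sample is applied.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)   # milliseconds
timeout = estimator.update(120.0)
```

With a first sample of 100 ms and a second of 120 ms, EstimatedRTT becomes 102.5 ms, DevRTT 42.5 ms, and the timeout 272.5 ms.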

RTT Measurement & Retransmissions

• How should you handle acks for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  - OK, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ack matches which packet
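A minimal sketch of the backoff rule, assuming the regular timeout is computed elsewhere and passed in. The class `KarnTimer`, its method names, and the 60-second default cap are all hypothetical choices for this example.

```python
class KarnTimer:
    """Retransmission timeout with Karn-style sampling and backoff (sketch)."""

    def __init__(self, initial_timeout, max_timeout=60.0):
        self.timeout = initial_timeout
        self.max_timeout = max_timeout   # the "limit" on the backoff

    def on_timeout(self):
        """Timer expired: back off by a factor of 2, up to the limit."""
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.timeout

    def on_ack(self, was_retransmitted, timeout_from_estimator):
        """Adopt the estimator's timeout only if the ACK is unambiguous."""
        if not was_retransmitted:
            self.timeout = timeout_from_estimator
        return self.timeout


timer = KarnTimer(initial_timeout=1.0, max_timeout=4.0)
backoffs = [timer.on_timeout() for _ in range(3)]
after_ambiguous_ack = timer.on_ack(True, 0.5)   # sample ignored
after_clean_ack = timer.on_ack(False, 0.5)      # normal sampling resumes
```

The backed-off value stays in force until an ACK for a segment that was sent only once arrives.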

Fast Retransmit

• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
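The duplicate-ACK counting can be sketched as follows. `FastRetransmitSender` is a hypothetical name, the "resend" is recorded in a list instead of touching a network, and only ACKs equal to SendBase are counted as duplicates (a simplifying assumption of this sketch).

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers fast retransmit (sketch)."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # seq #s resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the counter.
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)


fr_sender = FastRetransmitSender(send_base=100)
for ack in (100, 100, 100, 100):
    fr_sender.on_ack(ack)
resent_after_dups = list(fr_sender.retransmitted)
fr_sender.on_ack(120)
```

The segment starting at byte 100 is resent exactly once, on the third duplicate; the fourth duplicate does not trigger another retransmission.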

TCP: Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
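The window arithmetic is small enough to show directly. Both function names below are invented for this sketch, and the byte counters are plain integers without wrap-around.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000 of them.
window = rcv_window(4096, 3000, 1000)
# Sender: 1000 bytes already in flight.
can_send_1000 = sender_may_send(5000, 4000, window, 1000)
can_send_1200 = sender_may_send(5000, 4000, window, 1200)
```

With 2000 buffered but unread bytes, the advertised window is 2096 bytes, so another 1000 bytes fit but 1200 do not.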

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer. Sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

[Figure, three panels]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
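The RFC 1948 formula can be sketched as below. This is an illustration only: the function name, the use of SHA-256 for F (RFC 1948 itself suggests MD5), and the per-process secret standing in for random() are all assumptions of this example.

```python
import hashlib
import os
import time

# Assumption: one secret per boot/process plays the role of random().
SECRET = os.urandom(16)


def initial_sequence_number(src, sport, dst, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret), mod 2^32."""
    if now_us is None:
        now_us = time.time_ns() // 1000      # current time in microseconds
    clock = now_us // 4                      # ticks once every 4 microseconds
    material = f"{src}:{sport}:{dst}:{dport}".encode() + secret
    digest = hashlib.sha256(material).digest()
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2**32


key = b"k" * 16
isn_a = initial_sequence_number("10.0.0.1", 1234, "10.0.0.2", 80,
                                now_us=1_000_000, secret=key)
isn_b = initial_sequence_number("10.0.0.1", 1234, "10.0.0.2", 80,
                                now_us=1_000_004, secret=key)
```

For a fixed 4-tuple the ISN advances with the clock (here by exactly one tick), while an attacker who does not know the secret cannot predict the offset used for another 4-tuple.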

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
    • Server computes its initial sequence number, sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number is one the server could have sent; allocate buffers if correct
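A heavily simplified sketch of the SYN cookie idea: the server derives its initial sequence number from a keyed hash instead of storing it, and recomputes it when the final ACK arrives. The function names, the HMAC-SHA-256 construction, and the coarse `counter` argument are assumptions of this example; real implementations also pack information such as the MSS into the cookie.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN computed from connection identifiers; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Check that the client's ACK acknowledges the cookie we would have sent."""
    expected = (make_cookie(secret, src, sport, dst, dport,
                            client_isn, counter) + 1) % 2**32
    return ack_number == expected


secret = b"server-secret"
cookie = make_cookie(secret, "10.0.0.1", 1234, "10.0.0.2", 80, 92, 7)
good_ack = ack_is_valid(secret, "10.0.0.1", 1234, "10.0.0.2", 80, 92, 7,
                        (cookie + 1) % 2**32)
bad_ack = ack_is_valid(secret, "10.0.0.1", 1234, "10.0.0.2", 80, 92, 7,
                       (cookie + 2) % 2**32)
```

Only after `ack_is_valid` succeeds would the server allocate buffers for the connection.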

Sequence Number Summary

• Goals, Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number

TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline of the close. Both sides pass through "closing"; the client goes through "timed wait"; both end in "closed".]

TCP Connection FSM

[Figure: TCP connection finite state machine.] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figure: TCP client lifecycle and TCP server lifecycle.]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around of sequence numbers
• …

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send original data at rate λin into a router with unlimited shared output link buffers; λout is the delivered rate.]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share a router with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered rate.]

Causes/costs of congestion: scenario 2

• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λout against λ'in, with axes running to R/2; R/3 and R/4 are marked on the λout axes.]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A and Host B with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered rate.]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse

[Figure: Host A, Host B, and λout.]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window of a long-lived TCP connection over time; y-axis marked at 8, 16, and 24 Kbytes.]

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked.]

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: state machine with three states, reconstructed here as a transition list]

Initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

1. Event: ACK receipt for previously unacked data
   State: Slow Start (SS)
   TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
   Commentary: resulting in a doubling of CongWin every RTT
2. Event: ACK receipt for previously unacked data
   State: Congestion Avoidance (CA)
   TCP sender action: CongWin = CongWin + MSS * (MSS/CongWin)
   Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT
3. Event: triple duplicate ACK
   State: SS or CA
   TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
   Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
4. Event: timeout
   State: SS or CA
   TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: enter slow start
5. Event: duplicate ACK
   State: SS or CA
   TCP sender action: increment duplicate ACK count for segment being acked
   Commentary: CongWin and Threshold not changed
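The event table maps naturally onto a small state machine. The following Python sketch follows the table row by row; the class name `RenoLikeSender` is invented, window sizes are measured in units of MSS for readability, and no segments are actually sent.

```python
class RenoLikeSender:
    """Congestion window bookkeeping following the event table (sketch)."""

    def __init__(self, mss=1, threshold=64):
        self.mss = mss
        self.cong_win = mss          # CongWin starts at 1 MSS
        self.threshold = threshold   # Threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is the triple duplicate ACK event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.threshold, self.mss)
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = RenoLikeSender(mss=1, threshold=4)
for _ in range(4):
    cc.on_new_ack()              # slow start: 2, 3, 4, then 5 > Threshold
window_after_slow_start = cc.cong_win
state_after_slow_start = cc.state
cc.on_new_ack()                  # congestion avoidance: 5 + 1/5
for _ in range(3):
    cc.on_duplicate_ack()        # triple duplicate ACK halves the window
window_after_triple_dup = cc.cong_win
cc.on_timeout()
```

In this run the window grows to 5 MSS in slow start, creeps to 5.2 MSS in congestion avoidance, is halved to 2.6 MSS by the triple duplicate ACK, and falls back to 1 MSS on timeout.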

TCP Throughput (Idealized)

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size oscillating between W/2 and W.]

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

    Throughput = 1.22 * MSS / (RTT * sqrt(L))

• L = 2 * 10^-10. Very low!
• Requires an almost perfect link
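The numbers in this example can be checked with a few lines of Python. The function names are invented for this sketch; the second function simply solves the throughput formula above for L.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# 1500-byte segments, 100 ms RTT, 10 Gbps target.
w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

This gives a window of about 83,333 segments and a tolerable loss rate of roughly 2.1 * 10^-10, matching the figures on the slide.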

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput of the two connections plotted against each other (x-axis: Connection 1 throughput), both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
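A toy simulation makes the convergence argument concrete. It assumes, as a simplification, that both flows see every loss at the same time and have the same RTT; the function name `aimd_two_flows` and the numbers are purely illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two synchronized AIMD flows sharing one bottleneck (idealized sketch)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows decrease their window by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows increase additively.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: flow 1 at rate 1, flow 2 at rate 80.
rate1, rate2 = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=2000)
```

Additive increase leaves the difference between the two rates unchanged, while each multiplicative decrease halves it, so the rates end up practically equal no matter where they start.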

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2!
• That's what "download accelerators" do

Page 2: CS 4284 Operating Systems

TCP

CS 4284 Spring 2013

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer

bull flow control

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

CS 4284 Spring 2013

TCP Overview RFCs 793 1122 1323

2018 2581

bull full duplex data

ndash bi-directional data flow in

same connection

ndash MSS maximum segment

size

bull connection-oriented

ndash handshaking (exchange

of control msgs) initrsquos

sender receiver state

before data exchange

bull flow controlled

ndash sender will not overwhelm

receiver

bull point-to-point

ndash one sender one receiver

bull reliable in-order byte stream

ndash no ldquomessage boundariesrdquo

bull pipelined

ndash TCP congestion and flow

control set window size

bull send amp receive buffers

socket

door

TCP

send buffer

TCP

receive buffer

socket

door

segment

application

writes dataapplication

reads data

CS 4284 Spring 2013

TCP Segment Structure

source port dest port

32 bits

application data

(variable length)

sequence number

acknowledgement number

receive window

urg data pnter checksum

F S R P A U head len

not used

options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now (generally not used)

RST SYN FIN connection

establishment (setup teardown

commands)

bytes rcvr willing to accept

counting by bytes of data (not segments)

Internet checksum

(as in UDP)

CS 4284 Spring 2013

TCP Reliable Data Transfer

bull TCP creates rdt

service on top of IPrsquos

unreliable service

bull Pipelined segments

bull Cumulative acks

bull TCP uses single

retransmission timer

bull Retransmissions are

triggered by

ndash timeout events

ndash duplicate acks

bull Initially consider

simplified TCP

sender

ndash ignore duplicate acks

ndash ignore flow control

congestion control

CS 4284 Spring 2013

TCP Seq rsquos and ACKs Seq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs

ndash seq of next byte expected from other side

ndash cumulative ACK

Q how receiver handles out-of-order segments

ndash A TCP spec doesnrsquot say - up to implementor

Host A Host B

User types

lsquoCrsquo

host ACKs receipt

of echoed lsquoCrsquo

host ACKs receipt of lsquoCrsquo echoes

back lsquoCrsquo

time

simple telnet scenario

CS 4284 Spring 2013

TCP Sender Events data rcvd from app

bull create segment with seq

bull seq is byte-stream number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeout

bull retransmit segment that

caused timeout

bull restart timer

ack rcvd

bull If acknowledges

previously unacked

segments

ndash update what is known to

be acked

ndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP

sender (simplified)

NextSeqNum = InitialSeqNum

SendBase = InitialSeqNum

loop (forever)

switch(event)

event data received from application above

create TCP segment with sequence number NextSeqNum

if (timer currently not running)

start timer

pass segment to IP

NextSeqNum = NextSeqNum + length(data)

event timer timeout

retransmit not-yet-acknowledged segment with

smallest sequence number

start timer

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

end of loop forever

Comment

bull SendBase-1 last

cumulatively

ackrsquoed byte

Example

bull SendBase-1 = 71 so

SendBase = 72 say Ack

received with y= 73 so the

rcvr acknowledges up to

including 72 now wants

73+

y is gt SendBase so

know that new data is

acked set SendBase to

73

CS 4284 Spring 2013

TCP retransmission scenarios Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

time

SendBase = 100

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t S

eq=9

2 t

imeou

t

SendBase = 120

SendBase = 120

Sendbase = 100

CS 4284 Spring 2013

TCP retransmission scenarios

(more) Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase = 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithm

bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Nagle ndash Transmit first byte

ndash Buffer outgoing bytes until ack has been received ndash then send all at once

bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

Delayed ACK vs Nagle
• These two mechanisms are independent of each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because acks are cumulative, this can reduce the # of acks sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)

TCP

Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT


RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
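A small sketch of the moving average, to make the "decreases exponentially fast" claim concrete. The sample values are made up, and the function names are illustrative only; the weight of a sample that is k updates old works out to α(1 - α)^k.

```python
ALPHA = 0.125  # typical smoothing weight from the slide


def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = ALPHA) -> float:
    """One EWMA step: blend the old estimate with the newest sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weight(age: int, alpha: float = ALPHA) -> float:
    """Weight that a sample `age` updates old (0 = newest) still carries."""
    return alpha * (1 - alpha) ** age


estimate = 100.0  # made-up starting estimate, in milliseconds
for sample in [100.0, 180.0, 100.0]:  # made-up SampleRTT values
    estimate = update_estimated_rtt(estimate, sample)
```

The single outlier of 180 ms moves the estimate by only 10 ms, and its influence keeps shrinking with every later sample.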

Example RTT estimation
[Figure: SampleRTT and Estimated RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
    EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, α = 0.125, β = 0.25)
• then set the timeout interval:
    TimeoutInterval = EstimatedRTT + 4 * DevRTT
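The three formulas can be combined into one estimator, sketched below. Two things are assumptions not stated on the slide: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT (both follow common practice). The class name `RtoEstimator` is illustrative.

```python
class RtoEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval (a sketch)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None  # no measurement yet
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumed initialization for the very first sample.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Assumed order: deviation first, using the old estimate.
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RtoEstimator()
first_timeout = estimator.update(100.0)   # made-up sample, in ms
second_timeout = estimator.update(200.0)  # a sudden jump widens the margin
```

A jump in SampleRTT raises the timeout mostly through the 4 * DevRTT term, which is the "safety margin" the slide refers to.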

RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
  - Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove the ambiguity of which ack matches which packet
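A rough sketch of the two rules above: ignore RTT samples from retransmitted segments, and double the timeout (up to a limit) on each expiry. The class name `KarnTimer`, the 60-second cap, and the stand-in `estimate` function (twice the sample) are all assumptions for illustration; a real sender would plug in the EstimatedRTT/DevRTT formula.

```python
from typing import Callable


class KarnTimer:
    """Retransmission timer with Karn-style sampling and backoff (a sketch)."""

    def __init__(self, rto: float = 1.0, max_rto: float = 60.0,
                 estimate: Callable[[float], float] = lambda s: 2 * s):
        self.rto = rto
        self.max_rto = max_rto      # assumed upper limit for the backoff
        self.estimate = estimate    # hypothetical stand-in for the RTO formula

    def on_timeout(self) -> None:
        """Timer expired: back off by a factor of 2, with limit."""
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> bool:
        """Return True if the sample was used to recompute the timeout."""
        if was_retransmitted:
            # Ambiguous: the ACK may match the original or the retransmission.
            return False
        self.rto = self.estimate(sample_rtt)
        return True


timer = KarnTimer(rto=1.0, max_rto=4.0)
for _ in range(3):
    timer.on_timeout()                      # 2.0, 4.0, then capped at 4.0
rto_after_backoff = timer.rto
ambiguous_used = timer.on_ack(0.3, was_retransmitted=True)
clean_used = timer.on_ack(0.3, was_retransmitted=False)
```

The backed-off value stays in force until an unambiguous sample arrives, which is what lets the sender escape a timeout that was set too small.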

Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
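The ACK handler above might be sketched in Python as follows. This is a simplification under stated assumptions: sequence-number wraparound and the retransmission timer are ignored, and "resending" is modeled by recording the sequence number in a list. The class name is illustrative.

```python
class FastRetransmitSender:
    """ACK-handling part of a simplified TCP sender (a sketch)."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers resent via fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the dup counter.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend the segment starting at y.
                self.retransmitted.append(y)


sender = FastRetransmitSender(initial_seq=92)
sender.on_ack(100)          # new data acked, SendBase becomes 100
for _ in range(4):
    sender.on_ack(100)      # duplicates; the third one triggers the resend
```

A fourth duplicate does not trigger another resend, since the check is for exactly three.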

TCP

Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
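A small worked example of the window arithmetic, using made-up byte counts. The function names are illustrative; the formulas are the ones on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised by the receiver."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, window: int) -> int:
    """How much more the sender may transmit without exceeding the window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000 of them.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender: 1000 bytes are currently unACKed.
budget = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000, window=window)
```

When the application stops reading, `window` shrinks toward zero and so does the sender's budget.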

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
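The connect/accept pairing can be tried on the loopback interface. This is a sketch in Python rather than C; the variable names mirror the slide (`s`, `sv`, `cl`, `caddr`), and binding to port 0 so that the OS picks a free port is a choice made here for illustration. The handshake itself is carried out by the kernel and is not visible in the code.

```python
import socket

# Server side: passive open on a loopback port chosen by the OS (port 0).
sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sv.bind(("127.0.0.1", 0))
sv.listen(1)

# Client side: connect() returns once the client's part of the handshake is done.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(sv.getsockname())

# accept() hands the server the established connection and the client's address.
cl, caddr = sv.accept()
client_addr_matches = (caddr == s.getsockname())

for sock in (cl, s, sv):
    sock.close()
```

Note that connect() can return before the server application calls accept(): the kernel completes the handshake and queues the connection on the listening socket.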

TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
[Figure: three handshake timelines]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
    • Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If client continues with an ACK, check whether the acknowledged number could have been sent, then allocate buffers if correct
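A heavily simplified sketch of the SYN cookie idea: the server's initial sequence number is a keyed hash of the connection parameters, so nothing needs to be stored until the final ACK arrives. Real implementations also fold a time counter and an encoded MSS into the cookie; those are left out here. The key, the function names, and the use of HMAC-SHA-256 are assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key; a real server would use random bytes


def make_cookie(src: str, sport: int, dst: str, dport: int,
                client_isn: int, secret: bytes = SECRET) -> int:
    """Server ISN computed (not stored) from the SYN's parameters."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack_num: int, src: str, sport: int, dst: str, dport: int,
                 client_isn: int, secret: bytes = SECRET) -> bool:
    """Recompute the cookie and check that the ACK acknowledges cookie + 1."""
    expected = (make_cookie(src, sport, dst, dport, client_isn, secret) + 1) % 2**32
    return ack_num == expected


# The client's ISN can be recovered from the ACK segment (its seq # minus 1).
cookie = make_cookie("10.0.0.1", 40000, "10.0.0.2", 80, client_isn=12345)
good = ack_is_valid((cookie + 1) % 2**32, "10.0.0.1", 40000, "10.0.0.2", 80, 12345)
bad = ack_is_valid((cookie + 2) % 2**32, "10.0.0.1", 40000, "10.0.0.2", 80, 12345)
```

Only after `ack_is_valid` succeeds would the server allocate buffers for the connection.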

Sequence Number Summary
• Goals Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals Set 2:
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, the initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; labels: close (client), close (server), timed wait, closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; labels: closing (client), closing (server), timed wait, closed (both sides)]

TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: this is the famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B, a router with unlimited shared output link buffers; λin: original data; λout]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B, a router with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss, λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against the input rate, with axis marks at R/2; levels R/3 and R/4 are also marked]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B, routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λout]

Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)
• CongWin and RTT influence throughput; roughly,
    rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window versus time for a long-lived TCP connection; y-axis marks at 8, 16, and 24 Kbytes]
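The two AIMD rules can be written as a single per-RTT step. This sketch counts the window in units of MSS and uses a made-up loss pattern; the 1 MSS floor and the function name are assumptions for illustration.

```python
def aimd_step(cong_win_mss: float, loss: bool) -> float:
    """Advance CongWin by one RTT: halve on loss, otherwise add 1 MSS."""
    if loss:
        return max(1.0, cong_win_mss / 2)   # multiplicative decrease
    return cong_win_mss + 1.0               # additive increase


window = 8.0
trace = []
for loss in [False, False, False, True, False]:  # made-up loss pattern
    window = aimd_step(window, loss)
    trace.append(window)
```

Plotting `trace` over many rounds with periodic losses gives the repeated ramp-and-drop shape of the figure.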

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with one RTT marked]
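A short check of the slow-start numbers, assuming the example values MSS = 500 bytes and RTT = 200 msec and an idealized sender that gets one ACK per segment with no loss. Function names are illustrative.

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Rate when CongWin = 1 MSS: one segment per round trip."""
    return mss_bytes * 8 / rtt_s


def slow_start_windows(rtts: int) -> list:
    """CongWin (in MSS) at the start of each RTT, with +1 MSS per ACK."""
    cong_win = 1
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks_this_rtt = cong_win      # one ACK per segment sent this round
        cong_win += acks_this_rtt     # each ACK adds 1 MSS, so the window doubles
    return windows


rate = initial_rate_bps(500, 0.2)
windows = slow_start_windows(5)
```

Adding 1 MSS per ACK is what produces the doubling: a window of n segments yields n ACKs per round trip.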

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[State diagram, flattened in extraction; transitions listed per state as event: actions]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  - new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  - duplicate ACK: dupACKcount++
  - cwnd > ssthresh: Λ (no action); go to congestion avoidance
  - dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  - timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
congestion avoidance:
  - new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  - duplicate ACK: dupACKcount++
  - dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  - timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery:
  - duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  - new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  - timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control
(columns: Event / State / TCP Sender Action / Commentary)
1. Event: ACK receipt for previously unacked data
   State: Slow Start (SS)
   Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
   Commentary: Resulting in a doubling of CongWin every RTT
2. Event: ACK receipt for previously unacked data
   State: Congestion Avoidance (CA)
   Action: CongWin = CongWin + MSS * (MSS/CongWin)
   Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT
3. Event: Triple duplicate ACK
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
   Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
4. Event: Timeout
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: Enter slow start
5. Event: Duplicate ACK
   State: SS or CA
   Action: Increment duplicate ACK count for segment being acked
   Commentary: CongWin and Threshold not changed
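The table rows translate fairly directly into an event-driven class. This is a sketch of the simplified model in the table only (two states, window in bytes); the class name, the default values, and the decision to fold the triple-duplicate check into the duplicate-ACK handler are choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    """Congestion state of a simplified TCP sender, following the table rows."""
    mss: float = 1000.0
    cong_win: float = 1000.0      # starts at 1 MSS
    threshold: float = 64_000.0
    state: str = "SS"             # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                 # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)  # +1 MSS per RTT

    def on_dup_ack(self) -> None:
        self.dup_acks += 1                            # window unchanged
        if self.dup_acks == 3:                        # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender(threshold=4000.0)
for _ in range(4):
    sender.on_new_ack()            # 2000, 3000, 4000, 5000 -> switches to CA
win_after_slow_start = sender.cong_win
sender.on_new_ack()                # CA: adds 1000 * 1000 / 5000 = 200
win_after_ca_ack = sender.cong_win
for _ in range(3):
    sender.on_dup_ack()            # third duplicate halves the window
win_after_triple_dup = sender.cong_win
sender.on_timeout()
```

After the final timeout the sender is back in slow start with CongWin = 1 MSS and Threshold at half of the window it had just before.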

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
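The 0.75 factor follows from a short calculation, sketched here under the idealized assumptions that the window grows linearly from W/2 to W between losses and that RTT stays constant. The symbol for the average window is introduced only for this derivation.

```latex
\[
\overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{average throughput} \;\approx\; \frac{\overline{W}}{RTT} \;=\; 0.75\,\frac{W}{RTT}.
\]
```

Because the growth is linear, the time average of the window is simply the midpoint of its two extremes.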

TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    throughput = 1.22 * MSS / (RTT * sqrt(L))
• L = 2·10^-10. Very low!
• Require almost perfect link
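The two numbers in this example can be reproduced with a few lines. The sketch assumes the throughput relation 1.22 * MSS / (RTT * sqrt(L)) with MSS expressed in bits, and solves it for L; function names are illustrative.

```python
def required_window_segments(throughput_bps: float, rtt_s: float,
                             mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float,
                       mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)   # 10 Gbps, 100 ms, 1500 bytes
loss = required_loss_rate(10e9, 0.1, 1500)
```

The loss rate comes out a little above 2e-10, that is, roughly one loss event for every five billion segments.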

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with Connection 1 throughput on one axis, both axes running to R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
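The convergence argument can be checked with a toy simulation. It assumes two flows with equal RTTs that both add 1 unit per round and both halve whenever their combined rate exceeds the capacity R; the starting rates and the synchronized-loss model are made up for illustration.

```python
def simulate_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Run synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        x1 += 1.0                      # additive increase: gap stays the same
        x2 += 1.0
        if x1 + x2 > capacity:         # bottleneck overloaded: both see a loss
            x1 /= 2                    # multiplicative decrease: gap is halved
            x2 /= 2
    return x1, x2


start_gap = abs(10.0 - 70.0)
x1, x2 = simulate_two_flows(10.0, 70.0, capacity=100.0, rounds=200)
final_gap = abs(x1 - x2)
```

Additive increase leaves the difference between the flows unchanged, while every loss halves it, so the rates drift toward the equal-share line.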

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2!
• That's what "download accelerators" do

bull SendBase-1 = 71 so

SendBase = 72 say Ack

received with y= 73 so the

rcvr acknowledges up to

including 72 now wants

73+

y is gt SendBase so

know that new data is

acked set SendBase to

73

CS 4284 Spring 2013

TCP retransmission scenarios Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

time

SendBase = 100

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t S

eq=9

2 t

imeou

t

SendBase = 120

SendBase = 120

Sendbase = 100

CS 4284 Spring 2013

TCP retransmission scenarios

(more) Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase = 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithm

bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Nagle ndash Transmit first byte

ndash Buffer outgoing bytes until ack has been received ndash then send all at once

bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Nagle

bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources

ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)

bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be

fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP

timeout value

bull longer than RTT

ndash but RTT varies

bull too short premature

timeout

ndash unnecessary

retransmissions

bull too long slow

reaction to segment

loss

Q how to estimate RTT

bull SampleRTT measured time

from segment transmission

until ACK receipt

bull SampleRTT will vary want

estimated RTT ldquosmootherrdquo

ndash average several recent

measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving average

bull influence of past sample decreases

exponentially fast

bull typical value = 0125

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin

bull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp

Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTT

ndash Note ACK could be delayed for original segment or early for retransmitted segment

bull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithm

bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network

delay

ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula

bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit

bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmit

bull Time-out period often

relatively long

ndash long delay before

resending lost packet

bull Detect lost segments via

duplicate ACKs

ndash Sender often sends many

segments back-to-back

ndash If segment is lost there will

likely be many duplicate

ACKs

bull If sender receives 3

ACKs for the same data

it supposes that segment

after ACKed data was

lost

ndash fast retransmit resend

segment before timer

expires

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, ...);
• server: contacted by client
    cl = accept(sv, &caddr, ...);

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
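From the application's point of view the handshake is hidden inside connect() and accept(). The loopback sketch below uses Python's standard `socket` module, whose calls mirror the C API named on the slide; the helper names `echo_once` and `handshake_demo` are made up for this example.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Server side: accept one established connection and echo what arrives."""
    conn, _caddr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def handshake_demo(payload: bytes) -> bytes:
    """Open a loopback TCP connection, send payload, return the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sv:
        sv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        sv.listen(1)
        t = threading.Thread(target=echo_once, args=(sv,))
        t.start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect(sv.getsockname())    # kernel performs SYN, SYNACK, ACK
            s.sendall(payload)
            reply = s.recv(1024)
        t.join()
    return reply


if __name__ == "__main__":
    print(handshake_demo(b"hello"))
```

Neither side ever sees a SYN or an ACK: the kernel exchanges them, and connect() returns only after the SYNACK has arrived.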

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: Tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
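A minimal sketch of an RFC 1948-style initial sequence number. The choice of SHA-256 for F, the per-boot `SECRET`, and the function name `isn` are assumptions made for illustration; real stacks differ in the details.

```python
import hashlib
import os
import time
from typing import Optional

SECRET = os.urandom(16)  # per-boot secret; an assumption of this sketch


def isn(src: str, sport: int, dst: str, dport: int,
        now_us: Optional[int] = None, secret: bytes = SECRET) -> int:
    """ISN = 4-microsecond clock value + F(4-tuple, secret), modulo 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000           # current time in microseconds
    clock = now_us // 4                           # ticks once every 4 microseconds
    key = f"{src}:{sport}-{dst}:{dport}".encode()
    digest = hashlib.sha256(secret + key).digest()
    f = int.from_bytes(digest[:4], "big")         # F(...) truncated to 32 bits
    return (clock + f) % 2**32
```

The clock term keeps successive incarnations of the same connection apart, while the secret-keyed offset makes the value hard to predict for anyone who is not on that connection.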

When Sequence Numbers Attack

• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged cookie could have been sent; allocate buffers only if correct
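A SYN cookie lets the server defer all allocation until the final ACK of the handshake arrives. The sketch below is deliberately simplified: real implementations also fold a coarse timestamp and an MSS encoding into the cookie. The key `SECRET` and the helper names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key; a real server would randomize it


def make_cookie(src: str, sport: int, dst: str, dport: int,
                client_isn: int, secret: bytes = SECRET) -> int:
    """Derive the server's ISN from the SYN itself, so nothing must be stored."""
    msg = f"{src}:{sport}-{dst}:{dport}-{client_isn}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(src: str, sport: int, dst: str, dport: int,
                 client_isn: int, ack_number: int,
                 secret: bytes = SECRET) -> bool:
    """The final ACK must acknowledge cookie + 1; recompute rather than look up.

    client_isn is recovered from the ACK segment (its sequence number minus 1).
    """
    expected = (make_cookie(src, sport, dst, dport, client_isn, secret) + 1) % 2**32
    return ack_number == expected
```

A flood of forged SYNs then costs the server one hash per packet and no memory, since only a client that actually received the SYNACK can return a matching ACK.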

Sequence Number Summary

• Goals Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s);

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline of the close sequence: close, close, timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline of the close sequence: closing, closing, timed wait, closed]

TCP Connection FSM

[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
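The "event/action" labelling maps naturally onto a transition table. The sketch below covers only the teardown portion of the state machine and assumes the state names from RFC 793, which do not appear in the slide text; `run` is a hypothetical helper.

```python
# (state, event) -> (action, next state)
TRANSITIONS = {
    # active close (the side that calls close() first)
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_1", "FIN"): ("send ACK", "CLOSING"),       # simultaneous FINs
    ("FIN_WAIT_2", "FIN"): ("send ACK", "TIME_WAIT"),
    ("CLOSING", "ACK"): (None, "TIME_WAIT"),
    ("TIME_WAIT", "timeout"): (None, "CLOSED"),
    # passive close (the side that receives the first FIN)
    ("ESTABLISHED", "FIN"): ("send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "ACK"): (None, "CLOSED"),
}


def run(state, events):
    """Feed events through the table; return the final state and actions taken."""
    actions = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions


print(run("ESTABLISHED", ["close", "ACK", "FIN", "timeout"]))
# ('CLOSED', ['send FIN', 'send ACK'])
```

The first four events trace the normal close for the side that calls close() first; `["FIN", "close", "ACK"]` traces the other side.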

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size Option
  - Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around for sequence numbers
• ...


TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B share one router with unlimited shared output link buffers; λ_in = original data, λ_out = output rate]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share one router with finite shared output link buffers; λ_in = original data, λ'_in = original data plus retransmitted data, λ_out = output rate]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out versus λ_in, axes running to R/2; the values R/3 and R/4 are marked on the λ_out axis]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers; λ_in = original data, λ'_in = original data plus retransmitted data, λ_out = output rate]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B; λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
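The sender-side limit and the rate estimate translate directly into code. A minimal sketch with made-up helper names; the variable names follow the slide.

```python
def effective_window(cong_win: int, rcv_window: int) -> int:
    """The sender obeys whichever limit is tighter: congestion or flow control."""
    return min(cong_win, rcv_window)


def can_send(last_byte_sent: int, last_byte_acked: int,
             cong_win: int, rcv_window: int, nbytes: int) -> bool:
    """True if nbytes more keep in-flight data within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= effective_window(cong_win, rcv_window)


def rate_bytes_per_sec(cong_win: int, rtt_seconds: float) -> float:
    """Roughly one window of data is delivered per round trip."""
    return cong_win / rtt_seconds
```

For example, a CongWin of 8000 bytes with an RTT of 0.5 s gives about 16000 bytes/sec, provided the receiver's window is not the tighter limit.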

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window versus time for a long-lived TCP connection; window axis marked at 8, 16, and 24 Kbytes]
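The sawtooth can be reproduced with a few lines of simulation. This is a toy model that counts the window in whole MSS units and takes the rounds in which a loss occurs as an input; `aimd` is a hypothetical helper.

```python
def aimd(initial_mss: int, loss_rounds: set, rounds: int) -> list:
    """CongWin (in MSS) at the start of each RTT round.

    Additive increase: +1 MSS per round without loss.
    Multiplicative decrease: halve the window after a loss event.
    """
    cwnd = initial_mss
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)
        else:
            cwnd += 1
    return trace


print(aimd(8, {4, 9}, 12))
# [8, 9, 10, 11, 12, 6, 7, 8, 9, 10, 5, 6]
```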

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked]
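Why does one MSS per ACK double the window every RTT? A small sketch, using the example values MSS = 500 bytes and RTT = 200 msec and assuming no loss and that every segment in the window is ACKed within the round; `slow_start_round` is a hypothetical helper.

```python
def slow_start_round(cwnd: int, mss: int) -> int:
    """One RTT of slow start: each segment in the window is ACKed,
    and each ACK grows CongWin by one MSS."""
    acks = cwnd // mss
    for _ in range(acks):
        cwnd += mss
    return cwnd


MSS = 500        # bytes
RTT_MS = 200     # milliseconds

cwnd = MSS
rates_kbps = []
for _ in range(4):
    rates_kbps.append(cwnd * 8 / RTT_MS)   # bits per millisecond equals kbps
    cwnd = slow_start_round(cwnd, MSS)

print(rates_kbps)  # [20.0, 40.0, 80.0, 160.0]
```

The first entry is the 20 kbps initial rate; a window of n segments produces n ACKs per round, so the window doubles.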

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery; transitions listed below as event: actions. Λ marks a transition without a triggering event.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
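The event table can be turned into a small state machine. This is a sketch of the simplified two-state model in the table only (no separate fast-recovery state, no actual retransmission); the class name `Sender`, its method names, and the MSS value of 1460 bytes are assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; an assumed value for illustration


@dataclass
class Sender:
    cong_win: float = MSS
    threshold: float = 64 * 1024
    state: str = "SS"        # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                       # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)   # about +1 MSS per RTT

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding three `on_dup_ack()` calls to a sender with a 10-MSS window leaves it in CA with a 5-MSS window, while `on_timeout()` drops it to 1 MSS and restarts slow start.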

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size sawtooth between W/2 and W]
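The 0.75 factor follows from averaging the sawtooth, assuming the window grows linearly from W/2 to W between two loss events:

```latex
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\overline{\text{throughput}} = \frac{0.75\,W}{RTT}
```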

TCP Throughput & Loss

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    throughput = 1.22 · MSS / (RTT · √L)
• L = 2·10^-10. Very low
• Require almost perfect link
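The two numbers in this example can be checked with a short calculation. The function names are made up for this sketch; the formula is throughput = 1.22 · MSS / (RTT · √L), solved for L.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))   # 83333
print(loss_rate_for(10e9, 0.1, 1500))            # roughly 2e-10
```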

Equation-Based Control

• Note: TCP congestion control forms control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes up to R, Connection 1 throughput on one axis, and the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
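A toy simulation shows the convergence toward the equal-share line. It assumes both sessions see every loss at the same time and have the same RTT, which real connections need not; `two_flow_aimd` is a hypothetical helper.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int):
    """Two AIMD flows sharing one link, with synchronized loss events.

    Additive increase moves both rates up by 1 (slope 1 in the plot);
    multiplicative decrease halves both, which also halves their difference.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:       # link overloaded: both flows see a loss
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = two_flow_aimd(10, 70, capacity=100, rounds=500)
print(abs(a - b) < 0.01)  # True
```

Additive increase leaves the gap between the two rates unchanged and every loss halves it, so the rates end up nearly equal no matter how unequal they started.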

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
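The arithmetic behind the parallel-connection example, assuming every connection on the link gets an equal share; `share` is a hypothetical helper.

```python
from fractions import Fraction


def share(existing: int, new: int) -> Fraction:
    """Fraction of the link rate R obtained by an app opening `new` connections
    on a link that already carries `existing` connections."""
    return Fraction(new, existing + new)


print(share(9, 1))   # 1/10
print(share(9, 9))   # 1/2
```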


TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

bull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where

server is flooded with bogus SYN packets with forged

IP source addresses

bull Solution

ndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not

allocate buffers

ndash If client continues with SYNACK check if ACK could

have been sent then allocate buffers if correct

Sequence Number Summary

bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot

reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker

can eavesdrop ndash use PRNG for initial seq number choice

ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

Causes/costs of congestion: scenario 2

Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for the same $\lambda_{out}$ (every packet transmitted twice)

"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda'_{in}$, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered rate]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse

[Figure: Host A, Host B and $\lambda_{out}$]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
    $\text{rate} = \text{CongWin} / \text{RTT}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
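The sender-side limit and the rate estimate on this slide can be sketched in a few lines. This is an illustrative sketch only; the function names `usable_window` and `approx_rate` are made up for the example and are not part of any TCP implementation.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without exceeding
    min(CongWin, RcvWindow). All arguments are byte counts."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt_seconds
```

For example, with 2000 bytes in flight, CongWin = 4000 and RcvWindow = 8000, the sender may send 2000 more bytes; the congestion window is the binding limit.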

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) against time for a long-lived TCP connection, showing the AIMD sawtooth]
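The sawtooth can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the window is counted in whole MSS units, time advances one RTT per step, and the rounds in which a loss occurs are given up front (`loss_rounds` is a hypothetical input, not something TCP knows in advance).

```python
def aimd_trace(rounds, loss_rounds, start=8):
    """Return the congestion window (in MSS) at the start of each RTT.

    Additive increase: +1 MSS per RTT without loss.
    Multiplicative decrease: halve the window (never below 1 MSS) on loss.
    """
    cwnd = start
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)
        else:
            cwnd += 1
    return trace
```

Calling `aimd_trace(6, {3})` gives a window that climbs by one per RTT, is halved after the loss in round 3, and then starts climbing again.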

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; the number of segments sent doubles each RTT]
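The slow-start arithmetic (MSS = 500 bytes and RTT = 200 msec give an initial rate of 20 kbps, doubling every RTT) can be checked with a short sketch. The function name is illustrative, and the model ignores loss and the threshold.

```python
def slow_start_rates(mss_bytes, rtt_seconds, rtts):
    """Sending rate in bits/sec for each of the first `rtts` round trips,
    assuming CongWin starts at 1 MSS and doubles every RTT."""
    cwnd = mss_bytes
    rates = []
    for _ in range(rtts):
        rates.append(cwnd * 8 / rtt_seconds)
        cwnd *= 2
    return rates
```

With the slide's numbers the first three round trips run at roughly 20, 40 and 80 kbps.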

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: finite state machine with three states; Λ marks a transition taken without an event]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
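The state machine summarized on this slide can be written down compactly. The following is a minimal sketch, not a real TCP stack: the class name `RenoSender`, the MSS value of 1460 bytes and the string state names are assumptions made for illustration, and actual transmission and retransmission are left out.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes (illustrative)


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            # Deflate the window and resume linear growth.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS  # one MSS per ACK doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about one MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            # The missing segment would be retransmitted here.
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        # The missing segment would be retransmitted here.
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls shows how cwnd and ssthresh evolve through the three states.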

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Columns: Event / State / TCP Sender Action / Commentary

1. Event: ACK receipt for previously unacked data
   State: Slow Start (SS)
   Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
   Commentary: results in a doubling of CongWin every RTT

2. Event: ACK receipt for previously unacked data
   State: Congestion Avoidance (CA)
   Action: CongWin = CongWin + MSS * (MSS/CongWin)
   Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

3. Event: triple duplicate ACK
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
   Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

4. Event: timeout
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: enter slow start

5. Event: duplicate ACK
   State: SS or CA
   Action: increment duplicate ACK count for segment being acked
   Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window between W/2 and W]

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT} \sqrt{L}}$
• L = $2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
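The two numbers in this example can be reproduced from the relation throughput = 1.22 * MSS / (RTT * sqrt(L)). The sketch below only does the arithmetic; the function names are illustrative.

```python
def window_segments(rate_bps, rtt_seconds, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / MSS."""
    return rate_bps * rtt_seconds / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_seconds, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS taken in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_seconds * rate_bps)) ** 2
```

With 1500-byte segments, a 100 ms RTT and a 10 Gbps target, `window_segments` gives about 83,333 segments and `required_loss_rate` gives about 2 * 10^-10.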

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput of one connection plotted against the throughput of the other, both axes from 0 to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" and moves toward the equal-share line]
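The convergence argument can be illustrated with a toy model of two flows. This is a sketch under strong assumptions that are not in the slides: both flows see the same RTT, both detect every loss at the same time, and a loss happens exactly when their combined rate exceeds the capacity.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows for `rounds` RTTs and return their final rates.

    Additive increase keeps the difference between the flows constant;
    each multiplicative decrease halves it, so the rates converge.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both flows halve
        else:
            x1, x2 = x1 + 1, x2 + 1   # no loss: both add one unit
    return x1, x2
```

Starting from a very unequal split such as 80 and 10 units on a link of capacity 100, the two rates end up nearly identical after a few hundred rounds.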

Fairness (more)

Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
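The R/10 and R/2 figures follow from assuming that every connection on the link gets an equal share. A one-line sketch (the function name is illustrative):

```python
def app_share(existing_connections, opened_by_app):
    """Fraction of the link rate R that a new app receives if all
    connections sharing the link get an equal share."""
    return opened_by_app / (existing_connections + opened_by_app)
```

With 9 existing connections, opening 1 connection yields a share of 0.1 (R/10), while opening 9 yields 0.5 (R/2).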

TCP Segment Structure

Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urg data pointer
  options (variable length)
  application data (variable length)

Notes:
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
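To make the layout concrete, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch: the function name and the dictionary keys are chosen for illustration, and options are not parsed.

```python
import struct

FLAG_BITS = [("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
             ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20)]


def parse_tcp_header(data):
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # Upper 4 bits give the header length in 32-bit words.
        "header_len": (off_flags >> 12) * 4,
        "flags": {name: bool(off_flags & bit) for name, bit in FLAG_BITS},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }
```

Bytes beyond `header_len` are application data; bytes between offset 20 and `header_len` are options.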

TCP Reliable Data Transfer

• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate ACKs
• Initially consider a simplified TCP sender:
  - ignore duplicate ACKs
  - ignore flow control, congestion control

TCP Seq. #'s and ACKs

Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; up to implementor

[Figure: simple telnet scenario between Host A and Host B over time. User types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C']

TCP Sender Events

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte

Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73: the receiver acknowledges everything up to and including byte 72 and now wants 73+.
• y > SendBase, so we know that new data is ACKed; set SendBase to 73.
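The simplified sender above can be turned into a small runnable model. This is a sketch with assumptions that are not in the slide: the timer is reduced to a boolean flag, "passing a segment to IP" is modeled by appending to a list, and the class and attribute names are invented for the example.

```python
class SimpleSender:
    """Toy model of the simplified TCP sender: no dup-ACK handling,
    no flow control, no congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq # -> payload bytes
        self.timer_running = False
        self.sent = []              # (seq #, payload) handed to "IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)   # oldest not-yet-acknowledged segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before y.
            for seq in [s for s in self.unacked
                        if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at sequence number 92 and 20 bytes at 100, then receiving an ACK with y = 120, moves SendBase to 120 and stops the timer.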

TCP retransmission scenarios

[Figure: lost ACK scenario between Host A and Host B. The ACK is lost (X), the timer for Seq=92 expires and Host A retransmits; SendBase = 100]

[Figure: premature timeout scenario between Host A and Host B. The Seq=92 timer expires before the ACKs arrive; SendBase moves from 100 to 120]

[Figure: cumulative ACK scenario between Host A and Host B. One ACK is lost (X), but a later cumulative ACK arrives before the timeout; SendBase = 120]

TCP ACK generation [RFC 1122, RFC 2581]

Columns: Event at Receiver / TCP Receiver action

1. Event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
   Action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

2. Event: arrival of in-order segment with expected seq #; one other segment has ACK pending
   Action: immediately send single cumulative ACK, ACKing both in-order segments

3. Event: arrival of out-of-order segment with higher-than-expected seq #; gap detected
   Action: immediately send duplicate ACK, indicating seq # of next expected byte

4. Event: arrival of segment that partially or completely fills gap
   Action: immediately send ACK, provided that segment starts at lower end of gap

Nagle's algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20 bytes of TCP header overhead, plus the overhead of lower-layer headers, you waste a huge amount of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off
  - Use setsockopt() with the TCP_NODELAY option (level IPPROTO_TCP)
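As a short illustration, this is how Nagle's algorithm is typically disabled from Python's `socket` module. The helper name `disable_nagle` is made up for the example; note that the option lives at the `IPPROTO_TCP` level.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY on a TCP socket and report whether it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The option can be set on a socket before or after it is connected; it affects only the sending side of that socket.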

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can fit into a single segment)

TCP

Timeout & Fast Retransmit

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Example RTT estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT in milliseconds (axis from 100 to 350) plotted against time in seconds (1 to 106)]

TCP Round Trip Time and Timeout

Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$

(typically, $\alpha = 0.125$, $\beta = 0.25$)

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
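The estimator can be sketched as a small class. Two choices here are assumptions rather than something the slide states: the deviation starts at half of the first sample, and on each update DevRTT is computed from the old EstimatedRTT before EstimatedRTT itself is updated.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, with alpha = 0.125, beta = 0.25."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed starting deviation

    def update(self, sample_rtt):
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms and feeding in a 120 ms sample gives EstimatedRTT = 102.5 ms, DevRTT = 42.5 ms and a timeout of 272.5 ms.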

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: the ACK could be a delayed one for the original segment or an early one for the retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet

Fast Retransmit

• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
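The ACK-handling branch above can be sketched as a closure that counts duplicates. The names `make_ack_handler` and `retransmit` are hypothetical; `retransmit` stands in for whatever routine resends a segment, and the timer handling is omitted.

```python
def make_ack_handler(send_base, retransmit):
    """Return an on_ack(y) function implementing the fast-retransmit rule.

    `retransmit(seq)` is called once when the third duplicate ACK
    for the same value arrives.
    """
    state = {"send_base": send_base, "dup_count": 0}

    def on_ack(y):
        if y > state["send_base"]:
            state["send_base"] = y      # new data acknowledged
            state["dup_count"] = 0
        else:
            state["dup_count"] += 1     # duplicate ACK
            if state["dup_count"] == 3:
                retransmit(y)           # fast retransmit
        return state["send_base"]

    return on_ack
```

After one new ACK for 120 followed by three duplicates of it, the handler asks for the segment starting at sequence number 120 to be resent.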

TCP

Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
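A tiny worked example of the spare-room formula, as a sketch (the function name is illustrative; all arguments are byte counts or byte positions in the stream):

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises:
    RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)
```

With a 4096-byte buffer, 3000 bytes received and 1000 bytes already read by the application, the receiver advertises 2096 bytes.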

Flow Control: Persistence Timer

• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. numbers?

3-way Handshake & Delayed Dups

(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
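The ISN formula can be sketched as follows. This is an illustration only, with loudly stated assumptions: SHA-256 over the connection 4-tuple plus a per-host secret stands in for F (it is not the hash RFC 1948 specifies), the secret plays the role of the random() input, and the function and parameter names are invented for the example.

```python
import hashlib
import os
import time

HOST_SECRET = os.urandom(16)   # assumed per-boot random secret


def initial_seq(src, dst, sport, dport, now=None, secret=HOST_SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret),
    reduced modulo 2**32."""
    now = time.time() if now is None else now
    clock = int(now * 1_000_000 / 4) & 0xFFFFFFFF   # ticks every 4 µs
    digest = hashlib.sha256(
        f"{src}|{dst}|{sport}|{dport}".encode() + secret).digest()
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) & 0xFFFFFFFF
```

For a fixed 4-tuple the ISN advances with the clock, so old incarnations of the same connection see different numbers, while an outsider who does not know the secret cannot compute the offset for another 4-tuple.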

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether the number it acknowledges could have been sent by the server, then allocate buffers if correct
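The stateless check behind SYN cookies can be sketched with a keyed hash. This is a simplified illustration, not the encoding real stacks use (those also pack an MSS index and a coarse timestamp into the cookie); the function names, the `counter` parameter and the use of HMAC-SHA-256 are assumptions made for the example.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived only from values the server can recompute later."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter,
                 ack_number):
    """The final ACK must acknowledge cookie + 1; only then does the
    server allocate connection state."""
    cookie = make_cookie(secret, src, sport, dst, dport, client_isn, counter)
    return ack_number == (cookie + 1) & 0xFFFFFFFF
```

Because the cookie is recomputed from the incoming ACK, the server keeps no per-connection state between sending the SYNACK and receiving a valid ACK.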

Sequence Number Summary

• Goals, set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection succeeds
• Goals, set 2
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial seq. number choice
  - Don't allow SYN attacks: compute but don't store the initial sequence number

TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline for closing a connection; labels: close (client), close (server), timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Figure: client/server timeline; labels: closing (client), closing (server), timed wait, closed (client), closed (server)]

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0

retransmit missing segment

L

cwnd gt ssthresh

congestion

avoidance

cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fast

recovery

cwnd = cwnd + MSS transmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2 cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeout

ssthresh = cwnd2 cwnd = 1 dupACKcount = 0

retransmit missing segment


TCP Reliable Data Transfer

• TCP creates a reliable data transfer (rdt) service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender
  – ignore duplicate acks
  – ignore flow control, congestion control

TCP Seq. #’s and ACKs

Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does the receiver handle out-of-order segments?
  – A: TCP spec doesn’t say; up to implementor

[Figure: simple telnet scenario between Host A and Host B over time. User types ‘C’; host ACKs receipt of ‘C’, echoes back ‘C’; host ACKs receipt of echoed ‘C’.]

TCP Sender Events

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ack rcvd:
• if it acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments

TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)

        event: timer timeout
            retransmit not-yet-acknowledged segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }
    } /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte

Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+. y > SendBase, so we know that new data is acked; set SendBase to 73.
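The three events of the simplified sender can be sketched in Python as follows. This is an illustration only, not the course's reference code: the class name, the `wire` list standing in for "pass segment to IP", and the boolean timer are assumptions made for the sketch, and sequence-number wrap-around is ignored. The segment sizes in the usage lines (8 and 20 bytes) are also assumed.

```python
class SimplifiedSender:
    """Sketch of the simplified sender: no duplicate-ACK handling, no flow or congestion control."""

    def __init__(self, initial_seq_num: int = 0) -> None:
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.timer_running = False   # hypothetical stand-in for a real retransmission timer
        self.unacked = {}            # seq number -> payload of segments not yet acknowledged
        self.wire = []               # (seq number, payload) pairs handed to "IP", in order

    def on_app_data(self, data: bytes) -> None:
        """Event: data received from the application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        """Event: timer timeout. Retransmit the oldest unacknowledged segment."""
        if self.unacked:
            seq = min(self.unacked)
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y: int) -> None:
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            # Every segment that ends at or before y is now acknowledged.
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            # Keep the timer running only if segments are still outstanding.
            self.timer_running = bool(self.unacked)


sender = SimplifiedSender(initial_seq_num=92)
sender.on_app_data(b"A" * 8)    # segment with Seq=92, 8 bytes
sender.on_app_data(b"B" * 20)   # segment with Seq=100, 20 bytes
sender.on_ack(120)              # a single cumulative ACK covers both segments
```

Because ACKs are cumulative, the single ACK with value 120 advances SendBase past both segments even if an earlier ACK with value 100 was lost.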

TCP retransmission scenarios

[Figure: Host A / Host B timelines for two scenarios. Lost ACK scenario: a loss (X) followed by a timeout; SendBase = 100. Premature timeout: Seq=92 timeout; SendBase goes from 100 to 120.]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline for the cumulative ACK scenario: a loss (X) within the timeout interval; SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver → TCP receiver action

• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  → Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  → Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #. Gap detected
  → Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  → Immediately send ACK, provided that segment starts at lower end of gap

Nagle’s algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20 bytes of TCP header overhead, plus the overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off via
  – setsockopt() with option TCP_NODELAY at level IPPROTO_TCP (TCP_NODELAY is a TCP-level option, not a SOL_SOCKET option)
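As a small illustration of turning Nagle off, the sketch below uses Python's standard `socket` module, which wraps the same setsockopt() call. The helper name `make_nodelay_socket` is made up for this example; the socket is never connected, so nothing is sent.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the TCP protocol level, so the level is IPPROTO_TCP.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_nodelay_socket()
# Reading the option back returns a nonzero value when Nagle is disabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Disabling Nagle tends to help latency-sensitive applications that write small messages, at the cost of more small segments on the wire.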

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means “disable Nagle”, not “disable delayed ACKs”
• Delayed ACK: implemented by TCP receiver
  – Delays sending an acknowledgement (because acks are cumulative, this can reduce the # of acks sent)
• Nagle’s algorithm: implemented by TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT plotted as RTT (milliseconds, axis from 100 to 350) versus time (seconds, axis from 1 to 106).]

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4 · DevRTT

DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|

(typically, α = 0.125, β = 0.25)

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
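The three formulas can be combined into a small estimator. The sketch below is a minimal illustration under two assumptions that the slides do not state: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT (both choices follow the convention in RFC 6298). The class and method names are invented for the example.

```python
class RttEstimator:
    """EWMA-based RTT estimator; units are whatever the samples use (e.g. milliseconds)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        """Fold one SampleRTT into the estimate and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumed initialization for the very first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, so it is updated first.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator()
first_timeout = estimator.update(100.0)    # first sample: 100 ms
second_timeout = estimator.update(200.0)   # a much slower sample widens the safety margin
```

With these two samples the estimate moves only from 100 to 112.5, while the deviation term grows, which is exactly the "larger variation, larger safety margin" behavior described above.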

RTT Measurement & Retransmissions

• How should you handle acks for retransmitted segments when measuring RTT?
  – Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT

Karn’s Algorithm

• Idea: don’t consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, the sender would keep timing out but couldn’t adjust the timeout according to the formula
• Solution: if a measurement can’t be made, use exponential backoff until a new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove the ambiguity about which ack matches which packet
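The two rules of Karn's algorithm (skip ambiguous samples, back off the timeout on each expiry) can be sketched as below. The class name, the initial timeout of 1 second, and the 64-second cap are assumptions chosen for illustration; the slides only say "by factor of 2, with limit".

```python
class KarnBackoff:
    """Sketch of Karn's rules; recomputing the timeout from samples is left to an RTT estimator."""

    def __init__(self, timeout: float = 1.0, max_timeout: float = 64.0) -> None:
        self.timeout = timeout
        self.max_timeout = max_timeout   # assumed upper limit for the backoff
        self.samples = []                # RTT samples that were safe to use

    def on_timeout(self) -> None:
        """Timer expired: double the timeout, up to the limit."""
        self.timeout = min(2 * self.timeout, self.max_timeout)

    def on_ack(self, sample_rtt: float, retransmitted: bool) -> bool:
        """Return True if the sample was accepted, False if it was discarded as ambiguous."""
        if retransmitted:
            return False
        self.samples.append(sample_rtt)
        return True


karn = KarnBackoff()
for _ in range(3):
    karn.on_timeout()                                  # 1 s -> 2 s -> 4 s -> 8 s
ambiguous = karn.on_ack(0.3, retransmitted=True)       # discarded
accepted = karn.on_ack(0.3, retransmitted=False)       # kept; normal sampling can resume
```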

Fast Retransmit

• Time-out period often relatively long
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {    /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y    /* fast retransmit */
        }
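The ACK-handling branch above can be exercised with a short Python sketch. It processes a list of ACK values and reports which sequence numbers would be fast-retransmitted. The function name is made up, and one simplification is assumed: only ACKs equal to the current SendBase count as duplicates, while older (stale) ACKs are ignored.

```python
def fast_retransmit_events(acks, send_base):
    """Return (final SendBase, list of sequence numbers resent by fast retransmit)."""
    dup_count = 0
    resent = []
    for y in acks:
        if y > send_base:
            # New data acknowledged: advance SendBase and reset the duplicate counter.
            send_base = y
            dup_count = 0
        elif y == send_base:
            # Duplicate ACK for data that was already acknowledged.
            dup_count += 1
            if dup_count == 3:
                resent.append(y)   # resend the segment starting at y before the timer expires
    return send_base, resent


# One new ACK for byte 100, then three duplicates: the segment at 100 is resent once.
final_base, resent_segments = fast_retransmit_events([100, 100, 100, 100], send_base=92)
```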

TCP: Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate
• app process may be slow at reading from buffer

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

TCP Flow Control (cont’d)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
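A worked example of the window arithmetic, as a minimal sketch: the function names and the byte counts in the usage lines are assumptions picked for illustration.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(nbytes: int, last_byte_sent: int, last_byte_acked: int, window: int) -> bool:
    """Sender-side check: unACKed data, including the new bytes, must fit in RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window


# Receiver: 4096-byte buffer, 3000 bytes received so far, application has read 1000.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)

# Sender: 1000 bytes already in flight, so 1000 more fit but 1200 more do not.
fits = may_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window)
too_much = may_send(1200, last_byte_sent=5000, last_byte_acked=4000, window=window)
```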

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn’t picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer. Sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark’s solution: don’t advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

[Figure panels:]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
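The RFC 1948 idea can be sketched as a clock term plus a keyed hash of the connection identifiers. In this illustration, SHA-256 stands in for F (RFC 1948 itself suggests MD5), the per-host secret plays the role of random(), and the function name and the `now_us` parameter (current time in microseconds) are assumptions made for the example.

```python
import hashlib


def initial_sequence_number(src: str, sport: int, dst: str, dport: int,
                            secret: bytes, now_us: int) -> int:
    """ISN = (4-microsecond clock ticks + F(connection ids, secret)) mod 2^32."""
    clock_ticks = now_us // 4   # clock component: one tick every 4 microseconds
    material = f"{src}|{sport}|{dst}|{dport}|".encode() + secret
    # F: first 32 bits of a hash over the connection identifiers and the secret.
    f_value = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock_ticks + f_value) % 2**32


secret = b"per-host secret chosen at boot"   # assumed; must be unpredictable in practice
isn_a = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, secret, now_us=4_000_000)
isn_b = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, secret, now_us=4_000_004)
```

For a fixed connection the ISN still advances with the clock (here by one tick), while an attacker who does not know the secret cannot derive the offset used for a different connection.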

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C’s IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  – SYN cookies
    • Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the sequence number it acknowledges could have been sent, then allocate buffers if correct

Sequence Number Summary

• Goals Set 1:
  – Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
  – Even after a crash and restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  – Don’t allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don’t allow SYN attacks: compute but don’t store initial sequence number

TCP Connection Management (cont.)

Closing a connection: client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK
  – Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timelines for connection close, with labels close, closing, timed wait, closed]

TCP Connection FSM

[Figure: TCP connection state machine] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont’d)

[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle’s algorithm
  – Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs (“elephants”): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …


TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A sends original data at rate λ_in; Host B; output rate λ_out; unlimited shared output link buffers]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A sends original data at rate λ_in, and original data plus retransmitted data at rate λ'_in; Host B; output rate λ_out; finite shared output link buffers]

Causes/costs of congestion: scenario 2

• Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots, (a), (b), (c), of λ_out versus λ_in, with axis marks at R/2, R/3, and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A sends original data at rate λ_in, and original data plus retransmitted data at rate λ'_in; Host B; output rate λ_out; finite shared output link buffers]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: plot with labels Host A, Host B, and λ_out]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  Bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window of a long-lived TCP connection versus time, with axis marks at 8, 16, and 24 Kbytes]

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline over time, with one RTT marked]

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[Figure: state machine with states slow start, congestion avoidance, and fast recovery. Transitions, written as event: actions; Λ means no triggering event.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP sender action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP sender action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
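The event table translates almost directly into code. The sketch below follows the simplified table (two states, no separate fast-recovery state); the class name, the default MSS of 1000 bytes, and the 64 KB initial Threshold are assumptions for illustration.

```python
class CongestionControl:
    """Sketch of the SS/CA sender from the event table; CongWin and Threshold are in bytes."""

    def __init__(self, mss: int = 1000, threshold: float = 64 * 1024) -> None:
        self.mss = mss
        self.cong_win = float(mss)       # start with 1 MSS
        self.threshold = float(threshold)
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss    # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self) -> None:
        """Count duplicates; the third one triggers the multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, float(self.mss))  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.dup_acks = 0
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()              # 2000, 3000, 4000, 5000: crosses Threshold, state becomes CA
cc.on_new_ack()                  # CA: 5000 + 1000 * (1000 / 5000) = 5200
for _ in range(3):
    cc.on_duplicate_ack()        # triple duplicate ACK: Threshold = 2600, CongWin = 2600
window_after_dup_acks = cc.cong_win
cc.on_timeout()                  # Threshold = 1300, CongWin = 1 MSS, back to slow start
```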

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window size between W/2 and W]
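The 0.75 factor follows from the sawtooth shape. Assuming the window grows linearly from W/2 to W between losses (the idealization used on this slide), the time average of the throughput is the midpoint of its two extremes:

```latex
\[
\overline{T}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
\]
```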

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

    throughput = 1.22 · MSS / (RTT · √L)

• → L = 2·10^-10. Very low
• Require almost perfect link
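The two numbers in this example can be checked with a few lines of Python. The function names are made up for this sketch; the second function simply solves the throughput formula above for L.

```python
def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


window_segments = required_window_segments(10e9, 0.100, 1500)   # about 83,333 segments
loss_rate = required_loss_rate(10e9, 0.100, 1500)               # about 2e-10
```

A loss rate around 2·10^-10 means roughly one lost segment in five billion, which is why the slide calls for an almost perfect link.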

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with Connection 1 throughput on one axis, both axes up to R, and the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2).]
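The convergence argument can be checked with a tiny simulation. This is an idealized sketch, not a model of real TCP: both flows see every loss at the same time, rates are in arbitrary units, and the function name and starting rates are assumptions.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Two synchronized AIMD flows sharing a link; returns their rates after `rounds` steps."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit, which leaves the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from fair: flow 1 at 1 unit, flow 2 at 80 units, on a link of capacity 100.
rate1, rate2 = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.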

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
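The arithmetic behind the parallel-connection example, assuming the bottleneck is shared equally per connection (the function name is made up for this sketch):

```python
from fractions import Fraction


def share_of_link(my_connections: int, other_connections: int) -> Fraction:
    """Fraction of the link rate R an app gets when every connection gets an equal share."""
    return Fraction(my_connections, my_connections + other_connections)


one_connection = share_of_link(1, 9)     # 1 of 10 connections
nine_connections = share_of_link(9, 9)   # 9 of 18 connections
```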

Page 7: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP Seq rsquos and ACKs Seq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKs

ndash seq of next byte expected from other side

ndash cumulative ACK

Q how receiver handles out-of-order segments

ndash A TCP spec doesnrsquot say - up to implementor

Host A Host B

User types

lsquoCrsquo

host ACKs receipt

of echoed lsquoCrsquo

host ACKs receipt of lsquoCrsquo echoes

back lsquoCrsquo

time

simple telnet scenario

CS 4284 Spring 2013

TCP Sender Events data rcvd from app

bull create segment with seq

bull seq is byte-stream number of first data byte in segment

bull start timer if not already running (think of timer as for oldest unacked segment)

bull expiration interval TimeOutInterval

timeout

bull retransmit segment that

caused timeout

bull restart timer

ack rcvd

bull If acknowledges

previously unacked

segments

ndash update what is known to

be acked

ndash start timer if there are

outstanding segments

CS 4284 Spring 2013

TCP

sender (simplified)

NextSeqNum = InitialSeqNum

SendBase = InitialSeqNum

loop (forever)

switch(event)

event data received from application above

create TCP segment with sequence number NextSeqNum

if (timer currently not running)

start timer

pass segment to IP

NextSeqNum = NextSeqNum + length(data)

event timer timeout

retransmit not-yet-acknowledged segment with

smallest sequence number

start timer

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

end of loop forever

Comment

bull SendBase-1 last

cumulatively

ackrsquoed byte

Example

bull SendBase-1 = 71 so

SendBase = 72 say Ack

received with y= 73 so the

rcvr acknowledges up to

including 72 now wants

73+

y is gt SendBase so

know that new data is

acked set SendBase to

73

CS 4284 Spring 2013

TCP retransmission scenarios Host A

loss

tim

eou

t

lost ACK scenario

Host B

X

time

SendBase = 100

Host A

time

premature timeout

Host B

Seq=

92

tim

eou

t S

eq=9

2 t

imeou

t

SendBase = 120

SendBase = 120

Sendbase = 100

CS 4284 Spring 2013

TCP retransmission scenarios

(more) Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase = 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithm

bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Nagle ndash Transmit first byte

ndash Buffer outgoing bytes until ack has been received ndash then send all at once

bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Nagle

bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources

ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)

bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be

fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP

timeout value

bull longer than RTT

ndash but RTT varies

bull too short premature

timeout

ndash unnecessary

retransmissions

bull too long slow

reaction to segment

loss

Q how to estimate RTT

bull SampleRTT measured time

from segment transmission

until ACK receipt

bull SampleRTT will vary want

estimated RTT ldquosmootherrdquo

ndash average several recent

measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving average

bull influence of past sample decreases

exponentially fast

bull typical value = 0125

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus “safety margin”
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
    (typically α = 0.125, β = 0.25)

• then set the timeout interval:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT
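The two moving averages and the timeout rule can be sketched in Python as follows. This is a minimal sketch, not any particular stack's implementation: the class name `RttEstimator` is hypothetical, and seeding the first sample with DevRTT = SampleRTT / 2 is an assumption borrowed from RFC 6298 rather than from the slides.

```python
class RttEstimator:
    """EWMA-based RTT estimator (hypothetical helper, times in milliseconds)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Feed one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumption (RFC 6298 style): seed from the first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Deviation is updated first, using the previous estimate.
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = (
                (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
            )
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

A single outlier sample moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended effect of the 4 · DevRTT term.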

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn’s Algorithm

• Idea: don’t consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, the sender would keep timing out but couldn’t adjust the timeout according to the formula
• Solution: if a measurement can’t be made, use exponential backoff until a new measurement can be made
  - By a factor of 2, with a limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - The TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
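The backoff rule above can be sketched in a few lines of Python. The class `KarnBackoff`, its parameter names, and the 60-second cap are illustrative assumptions; the value passed as `rto_from_sample` stands for whatever the RTT estimator computes from a valid (non-retransmitted) sample.

```python
class KarnBackoff:
    """Tracks the retransmission timeout (RTO) under Karn's rules (sketch)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0):
        self.rto = initial_rto
        self.max_rto = max_rto  # assumed upper limit for the backoff

    def on_timeout(self):
        # No valid measurement is possible: double the timeout, up to the limit.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, segment_was_retransmitted, rto_from_sample):
        # An ACK for a retransmitted segment is ambiguous, so its sample is
        # discarded and the backed-off timeout stays in force.
        if not segment_was_retransmitted:
            self.rto = rto_from_sample
        return self.rto
```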

Fast Retransmit

• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
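The pseudocode above translates fairly directly into Python. The sketch below only records which sequence numbers would be resent; the class name is hypothetical, and it assumes sequence numbers never wrap around.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions (sketch)."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers resent before a timeout

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance the base, reset the counter.
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)
```

Note that the third duplicate is the fourth ACK carrying the same value: one original ACK plus three duplicates.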

TCP: Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

TCP Flow Control (cont’d)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn’t overflow
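As a worked example of the window arithmetic, the following Python sketch computes the advertised window on the receiver and the number of bytes the sender may still transmit. The function names are illustrative only.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may then send 1096 more.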

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn’t picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark’s solution: don’t advertise windows below a certain size
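One way to picture Clark’s rule is a receiver that advertises a zero window until a worthwhile amount of buffer space is free. The slide does not state the threshold; the sketch below assumes min(MSS, half the receive buffer), in the spirit of the receiver-side avoidance described in RFC 1122, and the function name is hypothetical.

```python
def advertised_window(free_space, mss, buffer_size):
    """Window to advertise, suppressing silly (tiny) windows.

    Assumption: the threshold is min(MSS, buffer_size / 2).
    """
    threshold = min(mss, buffer_size // 2)
    return free_space if free_space >= threshold else 0
```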

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

TCP connection establishment
• Q1: Why 3-way and not 2-way handshake?
• Q2: How do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
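The RFC 1948 formula can be sketched as a clock term plus a keyed hash of the connection identifiers. In the sketch below the function name and argument layout are hypothetical, `secret` plays the role of the random() input, and SHA-256 is an assumption made for illustration (RFC 1948 itself suggests MD5).

```python
import hashlib


def initial_sequence_number(src, dst, sport, dport, secret, clock_us):
    """ISN = 4-microsecond clock value + F(connection id, secret), mod 2^32."""
    clock_term = (clock_us // 4) % 2**32
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock_term + offset) % 2**32
```

For a fixed connection the ISN still advances with the clock, so old duplicates are guarded against, while an attacker who does not know the secret cannot derive the offset used for a different connection.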

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C’s IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
    • Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If client continues with ACK, server checks whether the acknowledged sequence number could have been sent by it, then allocates buffers if correct
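The “compute but don’t store” idea behind SYN cookies can be sketched with a keyed hash. This is a deliberately simplified illustration: the function names and the `time_slot` parameter are hypothetical, and real SYN cookie implementations also encode further information (such as the MSS) in the cookie, which is omitted here.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Server ISN derived from the SYN alone; nothing is stored per connection."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot, ack_number):
    """Recompute the cookie and check that the client acknowledged cookie + 1."""
    cookie = make_cookie(secret, src, sport, dst, dport, client_isn, time_slot)
    return ack_number == (cookie + 1) % 2**32
```

Only when `ack_is_valid` succeeds would the server allocate buffers for the connection.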

Sequence Number Summary

• Goals Set 1
  - Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
  - Even after a crash restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  - Don’t allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don’t allow SYN attacks: compute but don’t store initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline with the labels close (client), close (server), timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  - Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline with the labels closing (client), closing (server), timed wait, closed (both sides)]

TCP Connection FSM

[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont’d)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle’s algorithm
  - Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN (“elephant”): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …


TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send through a router with unlimited shared output link buffers; λ_in: original data; λ_out: delivered data]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered data]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three panels (a), (b), (c) plotting λ_out against λ_in, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: What happens as λ_in and λ'_in increase?

[Figure: Host A and Host B send across routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered data]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  Bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (axis marks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
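The additive-increase, multiplicative-decrease rule can be traced round by round with a short Python sketch. The function name and the idea of passing the lossy rounds as a set are illustrative assumptions; the window is measured in units of MSS.

```python
def aimd_trace(start, rounds, loss_rounds, mss=1):
    """Return CongWin after each RTT; rounds listed in loss_rounds see a loss."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            # Multiplicative decrease, never below one segment.
            cong_win = max(cong_win / 2, mss)
        else:
            # Additive increase: one MSS per RTT.
            cong_win = cong_win + mss
        trace.append(cong_win)
    return trace
```

Starting from 8 MSS with a single loss in the third round, the window climbs to 10, drops to 5, and then resumes its linear growth.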

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B, with RTT marked on the time axis]
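Why a per-ACK increment doubles the window each RTT can be seen in a small Python sketch. It assumes one ACK per segment (no delayed ACKs) and no loss; the function name is hypothetical and the window is counted in bytes with MSS defaulting to 1 for readability.

```python
def slow_start_windows(rounds, mss=1):
    """CongWin at the start of each RTT during slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks_this_rtt = cong_win // mss  # one ACK per segment in flight
        for _ in range(acks_this_rtt):
            cong_win += mss              # increment CongWin for every ACK
        history.append(cong_win)
    return history
```

Each RTT delivers as many ACKs as there were segments in flight, so adding one MSS per ACK doubles the window.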

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[Figure: finite state machine with the states slow start, congestion avoidance and fast recovery. Transitions are listed below as event / actions; Λ marks a transition without an event.]

Initial transition (Λ) into slow start:
  cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
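The table maps naturally onto a small event-driven class. The Python sketch below follows the simplified two-state table (SS and CA) rather than a full implementation with a separate fast recovery state; the class and method names are hypothetical, and the floor of 1 MSS on a triple duplicate ACK reflects the commentary column.

```python
class CongestionControl:
    """Simplified sender-side congestion control following the table (sketch)."""

    def __init__(self, mss=1460, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Roughly one MSS per RTT, spread over the ACKs of that RTT.
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Triple duplicate ACK: multiplicative decrease, stay in CA.
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```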

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 · RTT)
• Average steady-state throughput: 0.75 · W/RTT

[Figure: window oscillating between W/2 and W]

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

    throughput = 1.22 · MSS / (RTT · √L)

• L = 2 · 10^-10. Very low
• Require almost perfect link
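The numbers in this example can be checked with a few lines of Python. The function names are illustrative; the second function simply solves the throughput formula above for L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 10 Gbps, a 100 ms RTT and 1500-byte segments, this gives about 83,333 segments in flight and a tolerable loss rate of roughly 2 · 10^-10.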

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with both axes running to R (horizontal axis: Connection 1 throughput) and an “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
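The convergence argument can be reproduced numerically. The Python sketch below is an idealized model under stated assumptions: both flows see the same RTT, both detect loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves their difference.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, difference unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Because each loss halves the gap between the two rates while each increase leaves it untouched, the flows drift toward the equal-share line regardless of where they start.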

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do


TCP Sender Events

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ack rcvd:
• If acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack’ed byte

Example:
• SendBase-1 = 71, so SendBase = 72; say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+
• y > SendBase, so we know that new data is acked; set SendBase to 73
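The simplified sender can be made concrete in Python. This sketch models the timer as a boolean and records every segment handed to IP in a list; the class name and these modelling choices are assumptions for illustration, and sequence-number wrap-around is ignored.

```python
class SimpleTcpSender:
    """Event-driven version of the simplified sender (sketch, no real I/O)."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}          # sequence number -> payload
        self.timer_running = False
        self.wire = []             # (seq, payload) pairs passed to IP

    def send(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            # Retransmit the unacknowledged segment with the smallest seq.
            seq = min(self.unacked)
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before y.
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes and then 20 bytes from an initial sequence number of 92 yields segments with Seq=92 and Seq=100; a single cumulative ACK of 120 acknowledges both.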

TCP retransmission scenarios

[Figure: lost ACK scenario between Host A and Host B (ACK lost, timeout, retransmission; SendBase = 100)]
[Figure: premature timeout scenario between Host A and Host B (Seq=92 timeout; SendBase = 100, then SendBase = 120)]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario between Host A and Host B (one ACK lost before the timeout expires; SendBase = 120)]

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Nagle’s algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off via
  - setsockopt with level IPPROTO_TCP and option TCP_NODELAY
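Disabling Nagle’s algorithm on a socket looks like this in Python, whose `socket` module exposes the same option names as the C sockets API. The helper name `make_nodelay_socket` is hypothetical; the socket is created but not connected.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the TCP protocol level, not the generic socket level.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

This only affects when the sender transmits small segments; it does not change how the peer delays its ACKs.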

CS 4284 Spring 2013

Delayed ACK vs Nagle

bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources

ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)

bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be

fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP

timeout value

bull longer than RTT

ndash but RTT varies

bull too short premature

timeout

ndash unnecessary

retransmissions

bull too long slow

reaction to segment

loss

Q how to estimate RTT

bull SampleRTT measured time

from segment transmission

until ACK receipt

bull SampleRTT will vary want

estimated RTT ldquosmootherrdquo

ndash average several recent

measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving average

bull influence of past sample decreases

exponentially fast

bull typical value = 0125

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin

bull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp

Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTT

ndash Note ACK could be delayed for original segment or early for retransmitted segment

bull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithm

bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network

delay

ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula

bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit

bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmit

bull Time-out period often

relatively long

ndash long delay before

resending lost packet

bull Detect lost segments via

duplicate ACKs

ndash Sender often sends many

segments back-to-back

ndash If segment is lost there will

likely be many duplicate

ACKs

bull If sender receives 3

ACKs for the same data

it supposes that segment

after ACKed data was

lost

ndash fast retransmit resend

segment before timer

expires

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

bull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where

server is flooded with bogus SYN packets with forged

IP source addresses

bull Solution

ndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not

allocate buffers

ndash If client continues with SYNACK check if ACK could

have been sent then allocate buffers if correct

Sequence Number Summary

bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot

reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker

can eavesdrop ndash use PRNG for initial seq number choice

ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
– Client/server agree on an MSS larger than the default (536 bytes outside the same subnet) via the MSS option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), i.e. long fat networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …

Study of TCP: Outline

• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send through a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: panels a, b, c plotting λ_out against λ_in, axes up to R/2; the levels R/3 and R/4 are marked]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B, finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \text{CongWin} / \text{RTT}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
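The window constraint and the rate estimate above can be sketched in a few lines. This is only an illustration: the function names `can_send` and `approx_rate` are made up here and do not come from the slides or from any TCP implementation.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if nbytes more can be sent without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate(cong_win_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win_bytes / rtt_seconds
```

With 500 bytes in flight and CongWin = 2000, RcvWindow = 4000, the sender may add up to 1500 more bytes; the smaller of the two windows is the one that binds.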

TCP AIMD

• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
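A minimal sketch of the AIMD sawtooth, assuming one window update per RTT and a caller-supplied set of rounds in which a loss is detected. The function name, the parameters and the floor of 1 MSS are illustrative assumptions.

```python
def aimd_trace(rounds, loss_rounds, mss=1, start=8):
    """Return the congestion window (in MSS units) at the start of each RTT."""
    cwnd = start
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(6, {3})` climbs 8, 9, 10, 11, then halves to 5.5 after the loss in round 3 and resumes the linear climb.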

TCP Slow Start

• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline, one RTT per round]
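The per-ACK increment and the 20 kbps figure from the example can be checked with a short sketch. It assumes one ACK per full-sized segment (no delayed ACKs) and no losses; `slow_start_windows` is a hypothetical helper name.

```python
def slow_start_windows(rtts, mss=500):
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    windows = []
    for _ in range(rtts):
        windows.append(cwnd)
        acks = cwnd // mss        # assumption: one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss           # increment CongWin by 1 MSS for every ACK
    return windows


# Initial rate for MSS = 500 bytes and RTT = 200 ms, in bits per second.
initial_rate_bps = 500 * 8 / 0.2
```

Because every ACK adds one MSS and a window of n segments produces n ACKs, the window doubles each round.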

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: sender FSM with three states; transitions are listed as event: actions -> next state. Λ denotes a transition without a triggering event.]

Slow start
• Λ (initial): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• Λ, cwnd > ssthresh -> congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0 -> congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment -> slow start
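The three-state machine summarized above can be written down as a small class. This is a sketch under stated assumptions, not a real TCP stack: the class and method names are invented for illustration, windows are in bytes, "ssthresh + 3" is read as 3 MSS (the units are an assumption), and retransmission itself is left out.

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = ssthresh        # ssthresh = 64 KB by default
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh   # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss       # each further dup ACK inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss   # assumption: "+ 3" means 3 MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"
```

Driving the object with a sequence of `on_new_ack()`, `on_dup_ack()` and `on_timeout()` calls reproduces the transitions of the diagram one event at a time.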

Summary: TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase; window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event: ACK receipt for previously unACKed data
State: Slow Start (SS)
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: results in a doubling of CongWin every RTT

Event: ACK receipt for previously unACKed data
State: Congestion Avoidance (CA)
TCP sender action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

Event: Duplicate ACK
State: SS or CA
TCP sender action: increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size over time, oscillating between W/2 and W]

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
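The two numbers in this example can be reproduced by plugging the values into the formulas. The sketch below assumes the throughput formula throughput = 1.22 · MSS / (RTT · √L) with MSS converted to bits; the function names are illustrative.

```python
def required_window_segments(target_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(target_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * target_bps)) ** 2
```

For 10 Gbps, 100 ms and 1500-byte segments this gives a window of about 83,333 segments and a tolerable loss rate of roughly 2·10^-10, matching the slide.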

Equation-Based Control

• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput phase plot with Connection 1 throughput on one axis, both axes up to R, and the equal-bandwidth-share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
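The convergence argument can be tried out numerically. The sketch below is a deliberately simplified model (an assumption, not something from the slides): both flows see every loss at the same time, and a loss occurs whenever their combined rate exceeds the link capacity.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase keeps the gap constant
    return x1, x2
```

Additive increase leaves the difference between the two rates unchanged and each halving cuts it in two, so the rates drift toward the equal-share line.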

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte

Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73: the receiver acknowledges everything up to and including byte 72 and now wants 73+. Since y > SendBase, new data is ACKed; set SendBase to 73.
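The event loop above translates fairly directly into an event-handler class. The following is a sketch only: the class name, the `sent` log standing in for "pass segment to IP", and the boolean standing in for a real timer are all assumptions made for illustration, and sequence-number wraparound is ignored.

```python
class SimpleSender:
    """Toy version of the simplified TCP sender; no real timers or network I/O."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}             # sequence number -> segment data
        self.timer_running = False
        self.sent = []                # log of (seq, data) handed to "IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)   # not-yet-ACKed segment with smallest seq number
            self.sent.append((seq, self.unacked[seq]))
        self.timer_running = bool(self.unacked)

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in done:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

A cumulative ACK with value y removes every segment that ends at or before byte y-1, which is why a single ACK can cover several segments.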

TCP retransmission scenarios

[Figure: two Host A / Host B timelines. Lost ACK scenario: labels loss (X), timeout, SendBase = 100. Premature timeout scenario: labels Seq=92 timeout (twice), SendBase = 100, SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: cumulative ACK scenario, Host A / Host B timeline; labels: loss (X), timeout, SendBase = 120]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver -> TCP receiver action

• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> Delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> Immediately send a single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> Immediately send a duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills the gap
  -> Immediately send ACK, provided that the segment starts at the lower end of the gap
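The four table rows can be expressed as one decision function. This is a sketch, not receiver code from any stack: the function name, its boolean parameters and the returned strings are invented for illustration, and the final fallback for an old duplicate segment is an assumption that the table does not cover.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_open):
    """Pick the ACK policy for an arriving segment, following the table rows."""
    if seg_seq > expected_seq:
        return "send duplicate ACK now"          # out-of-order segment, gap detected
    if seg_seq == expected_seq and gap_open:
        return "send ACK now"                    # segment starts at lower end of gap
    if seg_seq == expected_seq and ack_pending:
        return "send cumulative ACK now"         # ACKs both in-order segments
    if seg_seq == expected_seq:
        return "delay ACK (up to 500 ms)"
    return "send ACK now"                        # assumption: old duplicate segment
```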

Nagle's algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
– If you send 1-byte segments with 20 bytes of TCP header overhead, plus the overhead of lower-layer headers, you waste a large share of network capacity
• Nagle:
– Transmit first byte
– Buffer outgoing bytes until the ACK has been received, then send all at once
• You can turn this off:
– Use setsockopt() with level IPPROTO_TCP and option TCP_NODELAY
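As a minimal sketch, Nagle's algorithm can be disabled from Python's standard `socket` module as shown below. The option lives at the `IPPROTO_TCP` level; the helper name `make_nodelay_socket` is made up for this example.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the sender not to hold back small segments.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

This only affects how the sender batches small writes; it does not change the receiver's delayed-ACK behavior.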

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
– often confused, especially by various online sources
– TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by the TCP receiver
– Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the number of ACKs sent)
• Nagle's algorithm: implemented by the TCP sender
– Delays sending actual data (so that more data can fit into a single segment)

TCP Timeout & Fast Retransmit

TCP Round Trip Time and Timeout

Q: how to set the TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT

RTT Distributions

[Figure: (a) probability density of ACK arrival times in the data link layer; (b) probability density of ACK arrival times for TCP]

TCP Round Trip Time and Timeout

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT in milliseconds (100 to 350) versus time in seconds (1 to 106); series: SampleRTT and EstimatedRTT]

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
(typically, α = 0.125, β = 0.25)

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
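The three update rules fit into a small estimator class. This is a sketch with two assumptions that the slides do not state: DevRTT starts at half of the first sample (as in RFC 2988), and DevRTT is updated before EstimatedRTT so that the deviation is measured against the old estimate. The class name is illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (times in ms)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: RFC 2988 style start value

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms, a second sample of 200 ms moves EstimatedRTT only to 112.5 ms, while the larger deviation pushes the timeout up to 362.5 ms.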

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
– Note: the ACK could be a delayed one for the original segment or an early one for the retransmitted segment
• Choices:
– Associate with original packet: may overestimate true RTT
– Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
– OK, but what if the current timeout value is too small for the network delay?
– In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
– By a factor of 2, with a limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– The TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
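A sketch of the two rules, ignoring ambiguous samples and backing off on timeout, is shown below. The class name, the 64-second cap and the way a fresh timeout value is passed in from the RTT estimator are assumptions for illustration.

```python
class KarnTimer:
    """Retransmission timeout handling in the spirit of Karn's algorithm."""

    def __init__(self, rto, max_rto=64.0):
        self.rto = rto
        self.max_rto = max_rto        # assumption: upper limit for the backoff

    def on_timeout(self):
        """Timer fired: double the timeout, up to the limit."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted, rto_from_estimator):
        """Only ACKs for segments sent exactly once give a usable RTT sample."""
        if not was_retransmitted:
            self.rto = rto_from_estimator
        return self.rto
```

The backed-off value stays in force until an unambiguous sample arrives, at which point the normal formula takes over again.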

Fast Retransmit

• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
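The ACK handler with duplicate counting can be sketched as follows. The class name and the convention of returning the sequence number to resend (or None) are assumptions made here; as in the pseudocode, any ACK that does not advance SendBase is counted as a duplicate.

```python
class DupAckCounter:
    """Track duplicate ACKs and signal when a fast retransmit is due."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_count = 0

    def on_ack(self, y):
        """Return the sequence number to fast-retransmit, or None."""
        if y > self.send_base:
            self.send_base = y        # new data ACKed: reset the duplicate count
            self.dup_count = 0
            return None
        self.dup_count += 1           # duplicate ACK for already ACKed data
        if self.dup_count == 3:
            return y                  # resend the segment starting at byte y
        return None
```

The retransmission fires exactly once, on the third duplicate; further duplicates for the same y do not trigger it again.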

TCP Flow Control

• receive side of a TCP connection has a receive buffer
• app process may be slow at reading from the buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
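Both sides of the flow-control bookkeeping are simple arithmetic. The sketch below uses hypothetical function names and treats the byte counters as plain integers, without wraparound.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still put in flight without exceeding RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender that already has 1000 bytes unACKed may send 1096 more.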

Flow Control: Persistence Timer

• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size


TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
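The sequence and acknowledgement numbers exchanged in the three steps can be traced with a small simulation. It is a sketch: the dictionary layout and function name are invented, and it relies on the standard TCP rule (not spelled out on the slide) that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one, modulo 2^32.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dictionaries."""
    mod = 2 ** 32
    syn = {"flags": "SYN", "seq": client_isn}
    synack = {"flags": "SYN+ACK", "seq": server_isn,
              "ack": (syn["seq"] + 1) % mod}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % mod,
           "ack": (synack["seq"] + 1) % mod}
    return [syn, synack, ack]
```

Each side thereby confirms that it has seen the sequence number the other side chose, which is what the third message adds over a 2-way exchange.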

TCP 3-way handshake

TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial sequence numbers?

3-way Handshake & Delayed Dups

[Figure: (a) normal operation; (b) old SYN appearing out of nowhere; (c) duplicate SYN and duplicate ACK following SYN]

The 3-way handshake is required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP sequence numbers to a clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
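The RFC 1948 formula can be sketched as a clock component plus a keyed hash of the connection identifiers. Several things here are assumptions for illustration: SHA-256 is swapped in as the hash, a random per-process secret plays the role of random(), and the function and parameter names are made up.

```python
import hashlib
import os
import time

_SECRET = os.urandom(16)   # hypothetical per-boot secret standing in for random()


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=_SECRET):
    """ISN = (4-microsecond clock) + F(src, dst, sport, dport, secret), mod 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4                                  # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2 ** 32
```

For a fixed connection 4-tuple the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset used for another 4-tuple.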

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving a SYN must allocate resources
– Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
– Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
– If the client continues with an ACK, check whether it acknowledges a sequence number the server could have sent; allocate buffers if correct
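The "compute but don't store" idea can be sketched with a keyed hash. This is a simplified illustration, not the cookie layout used by real stacks (those also encode an MSS index and a coarse timestamp in specific bit fields): the function names, the `time_slot` parameter and the use of HMAC-SHA-256 are assumptions.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Derive the server's initial sequence number from the SYN; store nothing."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot, ack_number):
    """Recompute the cookie and check that the ACK acknowledges cookie + 1."""
    expected = (make_cookie(secret, src, sport, dst, dport, client_isn, time_slot) + 1) % 2 ** 32
    return ack_number == expected
```

Only after `ack_is_valid` succeeds would the server allocate buffers, so a flood of SYNs from forged addresses consumes no per-connection state.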

Sequence Number Summary

• Goals, Set 1
– Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
– Even after a crash and restart -> hence tie sequence numbers to time -> but don't use a global clock -> hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2
– Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial sequence number choice
– Don't allow SYN attacks: compute, but don't store, the initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0

retransmit missing segment

L

cwnd gt ssthresh

congestion

avoidance

cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fast

recovery

cwnd = cwnd + MSS transmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2 cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeout

ssthresh = cwnd2 cwnd = 1 dupACKcount = 0

retransmit missing segment

ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment

dupACKcount == 3 cwnd = ssthresh dupACKcount = 0

New ACK

slow

start

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment

cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed

new ACK dupACKcount++

duplicate ACK

L

cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0

New ACK

New ACK

New ACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in

congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set

to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2

and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of window size over time, oscillating between W/2 and W]
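The 0.75 factor follows from averaging a window that ramps linearly from W/2 to W. A quick numeric check, under the idealized sawtooth assumption (the function name and sample values are illustrative only):

```python
def average_sawtooth_throughput(w_bytes: float, rtt_s: float, steps: int = 1000) -> float:
    """Average rate (bytes/sec) while the window ramps linearly from W/2 up to W."""
    # Evenly spaced window sizes between W/2 and W, both ends included.
    windows = [w_bytes / 2 + (w_bytes / 2) * i / steps for i in range(steps + 1)]
    return sum(windows) / len(windows) / rtt_s


# Example values (assumed): W = 100,000 bytes, RTT = 100 ms.
avg = average_sawtooth_throughput(100_000, 0.1)
ideal = 0.75 * 100_000 / 0.1
```

Here `avg` and `ideal` agree up to floating-point rounding.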

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

    throughput = 1.22 · MSS / (RTT · √L)

• L = 2·10^-10: very low
• Requires an almost perfect link
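The numbers in this example can be reproduced directly from the throughput formula. A short sketch (variable and function names are chosen for illustration):

```python
import math

MSS_BITS = 1500 * 8   # segment size in bits
RTT = 0.100           # seconds
TARGET = 10e9         # desired throughput, bits/sec

# Window needed to keep the pipe full: W = throughput * RTT / MSS.
window_segments = TARGET * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2


def tcp_throughput(mss_bits: float, rtt: float, loss: float) -> float:
    """Throughput in bits/sec predicted by the loss-rate formula."""
    return 1.22 * mss_bits / (rtt * math.sqrt(loss))
```

`window_segments` comes out at about 83,333 and `loss_rate` at roughly 2.1·10^-10, matching the slide.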

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes up to R, Connection 1 throughput on one axis, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
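A toy simulation shows why the two rules pull the sessions toward the equal-share line. It is a sketch under simplifying assumptions (synchronized rounds, both flows see every loss, made-up function name and parameters), not a model of real TCP dynamics.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int = 500, increase: float = 1.0) -> tuple[float, float]:
    """Run synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, so the gap is unchanged.
            x1, x2 = x1 + increase, x2 + increase
    return x1, x2


x1, x2 = aimd_two_flows(80.0, 10.0, capacity=100.0)
```

Starting from a very unequal split (80 versus 10), the gap shrinks by half at every loss event, so the two rates end up nearly identical.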

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
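The R/10 and R/2 figures follow from assuming every connection gets an equal share of the link. A tiny check (the helper `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing: int, opened: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens `opened`
    connections next to `existing` ones, assuming equal per-connection shares."""
    return Fraction(opened, existing + opened)


one_connection = app_share(9, 1)    # 1 of 10 connections
nine_connections = app_share(9, 9)  # 9 of 18 connections
```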


TCP retransmission scenarios

[Figure: Host A / Host B timelines for two cases. Lost ACK scenario: the ACK is lost (X), the Seq=92 timeout fires, SendBase = 100. Premature timeout: the Seq=92 timeout fires early; SendBase = 100, then SendBase = 120.]

TCP retransmission scenarios (more)

[Figure: Host A / Host B timeline, cumulative ACK scenario: one ACK is lost (X) before the timeout; SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: Arrival of segment that partially or completely fills gap.
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
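The four receiver rules can be sketched as a small decision routine. This is an illustrative model only: the class `AckPolicy`, its method names, and the returned strings are invented for the example, the 500 ms timer is represented by an explicit `on_delayed_ack_timer` call, and old duplicates are simply re-ACKed (an assumption the table does not cover).

```python
class AckPolicy:
    def __init__(self, next_expected: int = 0):
        self.next_expected = next_expected  # seq # of next in-order byte
        self.ack_pending = False            # an in-order segment awaits its delayed ACK
        self.out_of_order = {}              # start -> end of segments buffered above a gap

    def on_segment(self, seq: int, length: int) -> str:
        end = seq + length
        if seq != self.next_expected:
            if seq > self.next_expected:    # gap detected: buffer the segment
                self.out_of_order[seq] = end
            self.ack_pending = False        # the duplicate ACK is cumulative
            return f"immediate duplicate ACK {self.next_expected}"
        # In-order segment (starts at the lower end of any gap).
        self.next_expected = end
        filled_gap = bool(self.out_of_order)
        while self.next_expected in self.out_of_order:
            self.next_expected = self.out_of_order.pop(self.next_expected)
        if filled_gap:
            self.ack_pending = False
            return f"immediate ACK {self.next_expected}"
        if self.ack_pending:
            self.ack_pending = False
            return f"immediate cumulative ACK {self.next_expected}"
        self.ack_pending = True
        return "delay ACK (up to 500 ms)"

    def on_delayed_ack_timer(self) -> str:
        if self.ack_pending:
            self.ack_pending = False
            return f"ACK {self.next_expected}"
        return "nothing to send"
```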

Nagle's algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20-byte TCP header overhead, plus the overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off via setsockopt() with the TCP_NODELAY option (set at level IPPROTO_TCP, not SOL_SOCKET)
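As a minimal sketch, this is how Nagle's algorithm can be switched off from Python's standard `socket` module. The helper name is made up for the example; the socket is never connected, so the snippet only demonstrates the option itself.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the TCP protocol level, not at SOL_SOCKET.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_low_latency_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nagle_disabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Disabling Nagle trades bandwidth efficiency for latency, so it tends to suit interactive request/response traffic rather than bulk transfers.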

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the number of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

RTT Distributions

[Figure] (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.

TCP Round Trip Time and Timeout

    EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

Example RTT estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and Estimated RTT plotted as RTT (milliseconds, axis from 100 to 350) against time (seconds, axis from 1 to 106)]

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT → larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|

    (typically α = 0.125, β = 0.25)

• the timeout interval is then:

    TimeoutInterval = EstimatedRTT + 4 · DevRTT

    where EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
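The three formulas combine into a small estimator. The sketch below makes two assumptions that the slides do not state: DevRTT starts at half the first sample, and DevRTT is updated before EstimatedRTT (both follow the convention of the RTO RFCs). The class and method names are illustrative.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation is measured against the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=0.100)
timeout = est.update(0.200)   # one slow sample pushes the timeout up
```

With a first sample of 100 ms followed by a 200 ms sample, EstimatedRTT moves only to 112.5 ms while the timeout grows to 362.5 ms, which shows how the deviation term dominates the safety margin.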

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
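The backoff rule can be sketched as follows. The class `KarnTimer`, the 60-second cap, and the `fresh_rto` argument (a timeout recomputed elsewhere from a new SampleRTT) are assumptions made for this illustration.

```python
class KarnTimer:
    def __init__(self, initial_rto: float = 1.0, max_rto: float = 60.0):
        self.rto = initial_rto
        self.max_rto = max_rto   # the "limit" on backoff; value assumed

    def on_timeout(self) -> float:
        """A retransmission gives no usable RTT sample: double the timeout."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted: bool, fresh_rto: float) -> float:
        """Use the recomputed timeout only if the ACKed segment was sent once."""
        if not was_retransmitted:
            self.rto = fresh_rto   # unambiguous sample: resume normal estimation
        return self.rto
```

Repeated timeouts grow the timer 2, 4, 8, ... up to the cap; an ACK for a retransmitted segment leaves it unchanged, and the first ACK for a segment sent only once restores the formula-based value.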

Fast Retransmit

• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
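A runnable version of the same logic, as a sketch: the class name is hypothetical, the timer is left as a comment, and "resending" is modelled by recording the sequence number in a list. ACKs below SendBase are ignored, which is a simplification.

```python
class FastRetransmitSender:
    def __init__(self, send_base: int = 0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # seq numbers resent via fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance the window, reset the counter.
            self.send_base = y
            self.dup_acks = 0
            # (a real sender would restart its timer here if data is still unacked)
        elif y == self.send_base:
            # Duplicate ACK for already ACKed data.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:   # one new ACK, then three duplicates
    sender.on_ack(ack)
```

After the loop, the segment starting at byte 100 has been queued for retransmission exactly once.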

TCP: Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
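Both sides of this rule are simple arithmetic. A small sketch with made-up byte counts (the function names are illustrative):

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rcv_window_bytes: int) -> int:
    """Bytes the sender may still put in flight without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rcv_window_bytes - in_flight)


# A 4096-byte buffer holding 2000 unread bytes leaves 2096 bytes of spare room.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```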

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer. Sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data

Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

[Figure] (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
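A sketch of an ISN generator in the spirit of the RFC 1948 formula. Several details are assumptions for illustration: SHA-256 stands in for the hash F (the RFC itself suggested MD5), the per-host `SECRET` plays the role of `random()`, and the clock is passed in as integer microseconds so the example is deterministic.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret; stands in for random() on the slide


def initial_sequence_number(src: str, sport: int, dst: str, dport: int,
                            now_us=None, secret: bytes = SECRET) -> int:
    """ISN = (4 µs clock) + F(src, dst, sport, dport, secret), modulo 2^32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = now_us // 4                                   # ticks once every 4 µs
    material = f"{src}|{sport}|{dst}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32
```

For a fixed connection 4-tuple the ISN advances with the clock, while an attacker who does not know the secret cannot derive one connection's ISN from another's.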

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If client continues with an ACK, check whether its ACK number matches a cookie the server could have sent; allocate buffers only if correct
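The "compute but don't store" idea can be sketched with a keyed hash. This is a simplified illustration, not the cookie layout real kernels use (those also encode an MSS index and a coarse timestamp in specific bits); the function names, the HMAC-SHA-256 choice, and the `counter` parameter are assumptions.

```python
import hashlib
import hmac


def syn_cookie(secret: bytes, src: str, sport: int, dst: str, dport: int,
               client_isn: int, counter: int) -> int:
    """Server ISN derived from the connection parameters; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(secret: bytes, src: str, sport: int, dst: str, dport: int,
                 client_isn: int, counter: int, ack_number: int) -> bool:
    """Recompute the cookie and check that the client ACKed cookie + 1."""
    expected = (syn_cookie(secret, src, sport, dst, dport, client_isn, counter) + 1) % 2**32
    return ack_number == expected
```

Only after `ack_is_valid` succeeds would the server allocate buffers, so a flood of forged SYNs costs it computation but no per-connection state.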

Sequence Number Summary

• Goals, set 1:
  – Guard against old duplicates in one connection → don't reuse sequence numbers until it is reasonably certain that old duplicates have disappeared
  – Even after a crash and restart → hence tie to time → but don't use a global clock → hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute, but don't store, the initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client / server timeline showing close on each side, the client's timed wait, and closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client / server timeline showing closing on each side, the client's timed wait, and closed]

TCP Connection FSM

[Figure: TCP connection state diagram] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …


TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B connected through a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B connected through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

• Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: plots (a), (b), (c) of λ_out versus λ'_in, axes marked R/2, with reference levels R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (axis marked 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline marked in RTTs]
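The slow-start example (MSS = 500 bytes, RTT = 200 msec) can be worked through numerically. A small sketch, with an illustrative function name, that lists the sending rate for the first few RTTs assuming no loss:

```python
def slow_start_rates(mss_bytes: int = 500, rtt_ms: int = 200, rounds: int = 5) -> list[float]:
    """Sending rate in bits/sec for each RTT of slow start, assuming no loss."""
    cwnd = mss_bytes
    rates = []
    for _ in range(rounds):
        rates.append(cwnd * 8 * 1000 / rtt_ms)   # bits sent per RTT / RTT in seconds
        cwnd *= 2                                # CongWin doubles every RTT
    return rates


rates = slow_start_rates()
```

The first entry is the 20 kbps initial rate from the example; each later entry is double the previous one.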

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with three states; each transition is listed below as event: actions]

slow start
  Λ (initial): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  cwnd > ssthresh: Λ (no action), go to congestion avoidance
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

congestion avoidance
  new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: dupACKcount++
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
  dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

fast recovery
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
  timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start


Page 11: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP retransmission scenarios

(more) Host A

loss

tim

eou

t

Cumulative ACK scenario

Host B

X

time

SendBase = 120

CS 4284 Spring 2013

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment with

expected seq All data up to

expected seq already ACKed

Arrival of in-order segment with

expected seq One other

segment has ACK pending

Arrival of out-of-order segment

higher-than-expect seq

Gap detected

Arrival of segment that

partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500ms

for next segment If no next segment

send ACK

Immediately send single cumulative

ACK ACKing both in-order segments

Immediately send duplicate ACK

indicating seq of next expected byte

Immediate send ACK provided that

segment starts at lower end of gap

CS 4284 Spring 2013

Naglersquos algorithm

bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP

header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity

bull Nagle ndash Transmit first byte

ndash Buffer outgoing bytes until ack has been received ndash then send all at once

bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)

CS 4284 Spring 2013

Delayed ACK vs Nagle

bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources

ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)

bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be

fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP

timeout value

bull longer than RTT

ndash but RTT varies

bull too short premature

timeout

ndash unnecessary

retransmissions

bull too long slow

reaction to segment

loss

Q how to estimate RTT

bull SampleRTT measured time

from segment transmission

until ACK receipt

bull SampleRTT will vary want

estimated RTT ldquosmootherrdquo

ndash average several recent

measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving average

bull influence of past sample decreases

exponentially fast

bull typical value = 0125

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin

bull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp

Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTT

ndash Note ACK could be delayed for original segment or early for retransmitted segment

bull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithm

bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network

delay

ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula

bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit

bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmit

bull Time-out period often

relatively long

ndash long delay before

resending lost packet

bull Detect lost segments via

duplicate ACKs

ndash Sender often sends many

segments back-to-back

ndash If segment is lost there will

likely be many duplicate

ACKs

bull If sender receives 3

ACKs for the same data

it supposes that segment

after ACKed data was

lost

ndash fast retransmit resend

segment before timer

expires

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

CS 4284 Spring 2013

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
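A minimal sketch of the ISN formula above, under stated assumptions: F is modeled as a truncated SHA-256 over the connection 4-tuple plus a secret (the slide's random() input), and the clock is passed in as integer microseconds. The choice of hash, the `SECRET` variable, and the function name are illustrative and not taken from the RFC or the slides.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # hypothetical per-boot secret standing in for random()


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = (4-microsecond clock value) + F(src, dst, sport, dport, secret), mod 2^32."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4  # one tick every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2 ** 32
```

For a fixed 4-tuple the ISN still advances with the clock, so old duplicates are unlikely to fall into the new connection's window, while an attacker who does not know the secret cannot easily predict the offset used for a different 4-tuple.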

CS 4284 Spring 2013

When Sequence Numbers Attack

• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C’s IP address
– Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether the acknowledged sequence number is one the server could have sent; allocate buffers only if correct
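The "compute but don't store" idea behind SYN cookies can be sketched as follows. This is a simplified illustration, not the encoding any real stack uses: the cookie here is a truncated HMAC over the 4-tuple, the client's ISN, and a coarse time slot, and all names are assumptions.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Server ISN derived from connection parameters; nothing is stored per SYN."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot, ack_number):
    """Check the final ACK: it must acknowledge cookie + 1 for the current or previous slot."""
    for slot in (time_slot, time_slot - 1):
        cookie = make_cookie(secret, src, sport, dst, dport, client_isn, slot)
        if ack_number == (cookie + 1) % 2 ** 32:
            return True
    return False
```

Because the server can recompute the cookie from the fields of the returning ACK (the client ISN is the ACK's sequence number minus one), it only allocates buffers once the client has shown it can receive packets at its claimed address.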

Sequence Number Summary

• Goals Set 1:
– Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
– Even after a crash/restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
– Don’t allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don’t allow SYN attacks: compute, but don’t store, initial sequence number

CS 4284 Spring 2013


TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline; both sides close, client passes through timed wait before closed]

CS 4284 Spring 2013

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
– Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline; both sides closing, client in timed wait, then both closed]

CS 4284 Spring 2013

TCP Connection FSM

Figure legend: The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

CS 4284 Spring 2013

TCP Connection Management (cont’d)

[Figures: TCP client lifecycle; TCP server lifecycle]

CS 4284 Spring 2013

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No: famous two-army problem

CS 4284 Spring 2013

Summary

• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle’s algorithm
– Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
– SYN attack
– Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

• MSS: Maximum Segment Size option
– Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs (“elephants”), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

CS 4284 Spring 2013


TCP: Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

CS 4284 Spring 2013

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B share unlimited output link buffers at one router; λin: original data; λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share finite output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

Always: λin = λout (goodput)
• a) if no loss, λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)

“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three panels (a, b, c) plotting λout against λ'in; axis marks at R/2, R/3, and R/4]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A and Host B with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λout]

CS 4284 Spring 2013

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

Two broad classes:

End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

CS 4284 Spring 2013

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

CS 4284 Spring 2013

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (axis marks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

CS 4284 Spring 2013

TCP Slow Start

• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A and Host B timeline; the number of segments sent per RTT grows with each round]
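Both slow-start claims, the 20 kbps initial rate and the doubling per RTT, can be checked with a short sketch. It assumes one ACK per segment (no delayed ACKs) and no loss; the function names are illustrative.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate when exactly one MSS is sent per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_s


def slow_start_windows(rounds):
    """Congestion window (in MSS) at the start of each RTT round."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd      # every segment sent this round is ACKed
        cwnd += acks     # +1 MSS per ACK, so the window doubles each round
    return history
```

With MSS = 500 bytes and RTT = 200 msec, `initial_rate_bps(500, 0.2)` gives 20 kbps, and `slow_start_windows(5)` gives windows of 1, 2, 4, 8, 16 segments.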

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
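The difference between the two variants comes down to how the window is reset after a loss event. The sketch below works in units of MSS and assumes, as the slides imply, that Tahoe (which lacks fast recovery) falls back to 1 MSS on every loss event; the function name and string labels are illustrative.

```python
def on_loss(cwnd, event, variant):
    """Return (new_cwnd, new_threshold) in units of MSS after a loss event."""
    if event not in ("timeout", "triple_dup_ack") or variant not in ("tahoe", "reno"):
        raise ValueError("unknown event or variant")
    threshold = max(cwnd / 2, 1)      # Threshold = half of CongWin, never below 1 MSS
    if event == "triple_dup_ack" and variant == "reno":
        return threshold, threshold   # fast recovery: continue linearly from Threshold
    return 1, threshold               # timeout, or any loss in Tahoe: restart slow start
```

With a window of 16 MSS, a triple duplicate ACK leaves Reno at 8 MSS but sends Tahoe back to 1 MSS; a timeout sends both back to 1 MSS.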

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion

CS 4284 Spring 2013

Summary: TCP Congestion Control

[Figure: state machine with states slow start, congestion avoidance, and fast recovery. Transitions are listed per state as event / action; Λ marks an empty event or action.]

Initial transition (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

congestion avoidance
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

fast recovery
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
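The three-state machine on this slide can be written out as a small class. This is a sketch under assumptions: windows are counted in units of MSS rather than bytes, the initial ssthresh of 64 segments is a hypothetical stand-in for the slide's 64 KB, and actual (re)transmission is left out so that only the window bookkeeping remains.

```python
class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, and fast recovery."""

    def __init__(self, ssthresh=64.0):
        self.state = "slow_start"
        self.cwnd = 1.0            # in MSS
        self.ssthresh = ssthresh   # in MSS (hypothetical default)
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                       # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd           # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1.0                       # each dup ACK means a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"           # the missing segment is retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"                  # the missing segment is retransmitted here
```

Feeding the object a sequence of events (new ACKs, duplicate ACKs, timeouts) reproduces the window trajectory one would trace by hand through the diagram.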

CS 4284 Spring 2013

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |

CS 4284 Spring 2013

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window oscillating between W/2 and W over time]
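The 0.75 W/RTT figure is the mean of a window that ramps linearly from W/2 to W. A quick numeric check, assuming the window grows by exactly one segment per RTT and a loss occurs as soon as it reaches W (the function name is illustrative):

```python
def average_sawtooth_throughput(w_segments, rtt_s):
    """Average rate in segments/s over one cycle from W/2 up to W (W assumed even)."""
    windows = range(w_segments // 2, w_segments + 1)  # one window value per RTT round
    sent = sum(windows)                               # segments sent in the cycle
    elapsed = len(windows) * rtt_s                    # cycle duration
    return sent / elapsed
```

For W = 100 segments and RTT = 100 ms this averages 750 segments/s, which is 0.75 · W/RTT.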

CS 4284 Spring 2013

TCP Throughput & Loss

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83333 in-flight segments
• Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$$

• L = 2·10^-10: very low
• Requires an almost perfect link
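The two numbers in this example follow from W = throughput · RTT / MSS and from solving throughput = 1.22 · MSS / (RTT · √L) for L. A short sketch of that arithmetic (function names are illustrative):

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 1500-byte segments, 100 ms RTT, and 10 Gbps, the window comes to about 83333 segments and the tolerable loss rate to roughly 2·10^-10.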

Equation-Based Control

• Note: TCP congestion control forms control loop
– Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

CS 4284 Spring 2013

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput on one axis (up to R), the other connection on the second axis (up to R), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
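The convergence argument can be simulated: additive increase leaves the difference between the two rates unchanged, while each multiplicative decrease halves it. The sketch below assumes both flows see every loss at the same time and have equal RTTs; the function name and the rate units are illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; returns the final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1    # no loss: both add 1, the gap is unchanged
    return x1, x2
```

Starting from a very unequal split such as (90, 10) on a link of capacity 100, the two rates end up nearly equal after enough rounds.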

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do


CS 4284 Spring 2013

TCP ACK generation [RFC 1122, RFC 2581]

| Event at Receiver | TCP Receiver action |
| Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK |
| Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments |
| Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte |
| Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap |

CS 4284 Spring 2013

Nagle’s algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
– If you send 1-byte segments with 20 byte TCP header overhead + overhead of lower layer headers, you get a huge waste of network capacity
• Nagle:
– Transmit first byte
– Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off via setsockopt() with option TCP_NODELAY at level IPPROTO_TCP (the option lives at the TCP level, not SOL_SOCKET)
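As a concrete illustration of turning Nagle off, here is how the option is set through Python's standard socket module. The helper name is an assumption; the constants are the standard ones, and the option is set at the IPPROTO_TCP level.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, so the level is IPPROTO_TCP.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

Small writes on such a socket are sent immediately instead of being held back until the previous data is acknowledged; reading the option back with `getsockopt` returns a nonzero value.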

CS 4284 Spring 2013

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
– often confused, especially by various online sources
– TCP_NODELAY means “disable Nagle”, not “disable delayed ACKs”
• Delayed ACK: implemented by TCP receiver
– Delays sending an acknowledgement (because acks are cumulative, this can reduce the number of acks sent)
• Nagle’s algorithm: implemented by TCP sender
– Delays sending actual data (so that more data can be fit into a single segment)

TCP: Timeout & Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

CS 4284 Spring 2013

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106), plotting SampleRTT and Estimated RTT]

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
(typically α = 0.125, β = 0.25)

TimeoutInterval = EstimatedRTT + 4 · DevRTT
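The three formulas combine into a small estimator. Two details below are assumptions not stated on the slide: the deviation is updated before the estimate (so it uses the previous EstimatedRTT), and the initial DevRTT is set to half the first sample, an initialization borrowed from RFC 6298. The class and method names are illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initialization

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation first, against the old estimate (assumed ordering).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms first sample, a single 200 ms sample moves EstimatedRTT only to 112.5 ms but raises the timeout to 362.5 ms, because the deviation term reacts to the jump.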

CS 4284 Spring 2013

RTT Measurement & Retransmissions

• How should you handle acks for retransmitted segments when measuring RTT?
– Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
– Associate with original packet: may overestimate true RTT
– Associate with retransmitted packet: may underestimate true RTT

CS 4284 Spring 2013

Karn’s Algorithm

• Idea: don’t consider samples from retransmitted segments
– Ok, but what if current timeout value is too small for network delay?
– In this case, would keep timing out, but couldn’t adjust timeout according to formula
• Solution: if measurement can’t be made, use exponential backoff until new measurement can be made
– By factor of 2, with limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– TCP timestamp option can be used to remove ambiguity about which ack matches which packet
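The two rules, skipping ambiguous samples and backing off exponentially until a clean sample arrives, can be sketched as follows. The class name, the 60-second cap, and the way a recomputed timeout is passed in are assumptions made for illustration.

```python
class KarnTimer:
    """Retransmission timeout handling in the spirit of Karn's algorithm."""

    def __init__(self, timeout, max_timeout=60.0):
        self.timeout = timeout
        self.max_timeout = max_timeout   # hypothetical upper limit for the backoff

    def on_timeout(self):
        """Back off by a factor of 2, up to the limit; returns the new timeout."""
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.timeout

    def on_ack(self, was_retransmitted, recomputed_timeout=None):
        """Return True if this ACK may be used as an RTT sample."""
        if was_retransmitted:
            return False                 # ambiguous ACK: keep the backed-off timeout
        if recomputed_timeout is not None:
            self.timeout = recomputed_timeout   # clean sample: resume the normal formula
        return True
```

A caller would feed accepted samples into its RTT estimator and pass the resulting TimeoutInterval back in as `recomputed_timeout`.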

CS 4284 Spring 2013

Fast Retransmit

• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
– fast retransmit: resend segment before timer expires

CS 4284 Spring 2013

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
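The ACK-handling pseudocode can be turned into a runnable sketch. Timers and real transmission are replaced by return values and a list so that the logic is testable; those stand-ins, the class name, and the decision to ignore sequence number wrap-around are assumptions.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records which segments get resent."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase; a real sender would
            # restart the timer here if unacknowledged segments remain.
            self.send_base = y
            self.dup_acks = 0
            return "new_ack"
        # Duplicate ACK for already ACKed data.
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.retransmitted.append(y)   # resend segment starting at y
            return "fast_retransmit"
        return "wait"
```

After one new ACK for byte 200, the third further ACK carrying 200 triggers the retransmission of the segment starting at 200, without waiting for the timer.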

TCP: Flow Control

CS 4284 Spring 2013

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

CS 4284 Spring 2013




TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTT

bull Just after loss window drops to W2 throughput to W2RTT

bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Loss

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull Throughput in terms of loss rate

bull L = 210-10 Very low

bull Require almost perfect link

LRTT

MSS221

Equation-Based Control

bull Note TCP congestion control forms

control loop

ndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)

bull Instead equation-based control uses an

equation to compute sending rate based

on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same

bottleneck link of bandwidth R each should

have average rate of Rk

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair

Consider two competing sessions

bull additive increase gives slope of 1 as throughput increases

bull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increase

loss decrease window by factor of 2

congestion avoidance additive increase

loss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP

bull Multimedia apps often

do not use TCP

ndash do not want rate throttled

by congestion control

bull Instead use UDP

ndash pump audiovideo at

constant rate tolerate

packet loss

bull TCP friendliness

Fairness and parallel TCP

connections

bull nothing prevents app from

opening parallel connections

between 2 hosts

bull Example link of rate R

supporting 9 connections

ndash new app asks for 1 TCP gets

rate R10

ndash new app asks for 9 TCPs gets

R2

bull Thatrsquos what ldquodownload

acceleratorsrdquo do

Page 13: CS 4284 Operating Systems

CS 4284 Spring 2013

Nagle's algorithm

• Consider again apps writing byte-by-byte (or in small chunks)
– If you send 1-byte segments with a 20-byte TCP header, plus the overhead of lower-layer headers, you waste a large share of network capacity
• Nagle
– Transmit first byte
– Buffer outgoing bytes until the ACK has been received, then send them all at once
• You can turn this off
– Use setsockopt() with the TCP_NODELAY option (set at level IPPROTO_TCP, not SOL_SOCKET)
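As a minimal sketch of turning Nagle off, the snippet below sets TCP_NODELAY on a fresh, unconnected socket and reads the option back. The helper name make_nodelay_socket is made up for illustration; note that TCP_NODELAY lives at the IPPROTO_TCP level.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, so the level is IPPROTO_TCP.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

This only affects the sender side; it does not change how the peer delays its ACKs.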

CS 4284 Spring 2013

Delayed ACK vs Nagle

• These two mechanisms have nothing to do with each other
– often confused, especially by various online sources
– TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
– Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the number of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
– Delays sending actual data (so that more data can fit into a single segment)

TCP

Timeout & Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125

CS 4284 Spring 2013

Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT; RTT (milliseconds, 100 to 350) vs. time (seconds)]

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT

(typically α = 0.125, β = 0.25)

Then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT
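The estimator above can be sketched in a few lines of Python. Two choices here are assumptions rather than anything stated on the slide: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT on each later sample. The class name RttEstimator is made up for illustration.

```python
from typing import Optional

ALPHA = 0.125  # weight of the newest sample in EstimatedRTT
BETA = 0.25    # weight of the newest deviation in DevRTT


class RttEstimator:
    """EWMA-based RTT estimator (illustrative sketch)."""

    def __init__(self) -> None:
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (seconds) and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumed initialization for the very first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Assumed ordering: deviation uses the old EstimatedRTT.
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - BETA) * self.dev_rtt + BETA * deviation
            self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator()
first_timeout = estimator.update(0.100)
second_timeout = estimator.update(0.200)
```

A single slow sample moves EstimatedRTT only slightly, but it widens the safety margin through DevRTT.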

CS 4284 Spring 2013

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
– Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices
– Associate with original packet: may overestimate true RTT
– Associate with retransmitted packet: may underestimate true RTT

CS 4284 Spring 2013

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
– OK, but what if the current timeout value is too small for the network delay?
– In this case, sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
– By factor of 2, with limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
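A rough sketch of the two rules: ignore RTT samples from retransmitted segments, and double the timeout on every expiry up to a limit. The 60-second cap, the class name KarnTimer and the recompute_rto callback are assumptions made for illustration, not values from the slide.

```python
from typing import Callable

MAX_RTO = 60.0  # assumed upper limit for the backed-off timeout, in seconds


class KarnTimer:
    """Retransmission timeout with Karn-style sampling and backoff (sketch)."""

    def __init__(self, initial_rto: float) -> None:
        self.rto = initial_rto

    def on_timeout(self) -> float:
        # No usable measurement: back off by a factor of 2, with a limit.
        self.rto = min(self.rto * 2, MAX_RTO)
        return self.rto

    def on_ack(self, sample_rtt: float, was_retransmitted: bool,
               recompute_rto: Callable[[float], float]) -> float:
        if was_retransmitted:
            # Ambiguous sample: keep the backed-off timeout unchanged.
            return self.rto
        # Clean sample: resume the normal estimation formula.
        self.rto = recompute_rto(sample_rtt)
        return self.rto


timer = KarnTimer(initial_rto=1.0)
after_first_timeout = timer.on_timeout()
after_second_timeout = timer.on_timeout()
after_ambiguous_ack = timer.on_ack(0.1, True, lambda s: 3 * s)
after_clean_ack = timer.on_ack(0.1, False, lambda s: 3 * s)
```

Here the lambda stands in for whatever timeout formula the sender normally uses.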

CS 4284 Spring 2013

Fast Retransmit

• Time-out period often relatively long
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost
– fast retransmit: resend segment before timer expires

CS 4284 Spring 2013

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

CS 4284 Spring 2013

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
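The window arithmetic can be checked with a small example. The function names and byte counts below are hypothetical; only the two formulas come from the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    nbytes: int, advertised_window: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# A 4096-byte buffer holding 2000 unread bytes leaves 2096 bytes of room.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
fits = sender_may_send(5000, 4000, 1000, window)       # 1000 + 1000 <= 2096
too_much = sender_may_send(5000, 4000, 1200, window)   # 1000 + 1200 >  2096
```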

CS 4284 Spring 2013

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP: Outline

• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …)
• server: contacted by client
cl = accept(sv, &caddr, …)

Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

CS 4284 Spring 2013

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
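The ISN formula can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the function name initial_sequence_number, the use of SHA-256 truncated to 32 bits as F, and a per-boot random secret standing in for random(). Real stacks differ in these details.

```python
import hashlib
import os
import time
from typing import Optional

SECRET = os.urandom(16)  # assumed per-boot secret, plays the role of random()


def initial_sequence_number(src: str, dst: str, sport: int, dport: int,
                            now: Optional[float] = None,
                            secret: bytes = SECRET) -> int:
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), mod 2^32."""
    if now is None:
        now = time.time()
    clock = int(now * 1_000_000) // 4  # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f_value = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f_value) % 2**32


isn_a = initial_sequence_number("10.0.0.1", "10.0.0.2", 1234, 80, now=100.0)
isn_b = initial_sequence_number("10.0.0.1", "10.0.0.2", 1234, 80, now=101.0)
```

For one 4-tuple the ISN still advances with the clock (250,000 ticks per second), while an off-path attacker who does not know the secret cannot derive the offset used for a different 4-tuple.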

CS 4284 Spring 2013

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether its ACK number matches a cookie the server could have sent; allocate buffers only if correct

Sequence Number Summary

• Goals Set 1
– Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
– Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute, but don't store, initial sequence number

CS 4284 Spring 2013


TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline; both sides close, client enters timed wait, then closed]

CS 4284 Spring 2013

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline; closing, timed wait, closed]

CS 4284 Spring 2013

TCP Connection FSM

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

CS 4284 Spring 2013

TCP Connection Management (cont'd)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No. Famous two-army problem

CS 4284 Spring 2013

Summary

• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

• MSS: Maximum Segment Size option
– Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around of sequence numbers
• …

CS 4284 Spring 2013

Study of TCP: Outline

• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

CS 4284 Spring 2013

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A sends λin (original data) through a router with unlimited shared output link buffers; Host B receives λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

• Always: λin = λout (goodput)
• a) if no loss, λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λout vs. λin, axes marked up to R/2, with levels R/3 and R/4 also marked]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) through routers with finite shared output link buffers; Host B receives λout]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: λout between Host A and Host B]

CS 4284 Spring 2013

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

CS 4284 Spring 2013

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

CS 4284 Spring 2013

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]

CS 4284 Spring 2013

TCP Slow Start

• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: segments exchanged between Host A and Host B over time, one RTT per round]
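The slow-start numbers can be verified with a few lines of Python. The sketch assumes no losses and one ACK per segment, so each RTT doubles the window; the helper name cwnd_after_rtts is made up for illustration.

```python
MSS_BYTES = 500       # example MSS from the slide
RTT_SECONDS = 0.2     # example RTT from the slide (200 msec)

# One MSS per RTT at the start of the connection, in bits per second.
initial_rate_bps = MSS_BYTES * 8 / RTT_SECONDS


def cwnd_after_rtts(rtts: int, initial_cwnd_mss: int = 1) -> int:
    """Congestion window in MSS after the given number of loss-free RTTs."""
    return initial_cwnd_mss * 2 ** rtts


# Window in MSS after 0, 1, 2, 3 and 4 round trips.
growth = [cwnd_after_rtts(n) for n in range(5)]
```

With these values the initial rate is 20 kbps, and ten loss-free round trips (2 seconds) already give a window of 1024 MSS.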

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

CS 4284 Spring 2013

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance and fast recovery; transitions listed below as event / action]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
– new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
– cwnd > ssthresh (Λ) / go to congestion avoidance
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

congestion avoidance
– new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

fast recovery
– duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
– New ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
– timeout / ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

CS 4284 Spring 2013

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
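The event table translates almost directly into code. The sketch below is a simplified model of that table only (no retransmission, no window inflation during fast recovery); the class name CongestionSender and the MSS of 1460 bytes are assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass
class CongestionSender:
    """Simplified sender following the event/state/action table (sketch)."""
    cong_win: float = MSS
    threshold: float = 64 * 1024
    state: str = "SS"          # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                  # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:                    # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


sender = CongestionSender(threshold=3 * MSS)
for _ in range(3):
    sender.on_new_ack()        # 2 MSS, 3 MSS, then 4 MSS > Threshold -> CA
state_after_ramp = sender.state
window_after_ramp = sender.cong_win
for _ in range(3):
    sender.on_duplicate_ack()  # third duplicate halves the window
window_after_loss = sender.cong_win
sender.on_timeout()
```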

CS 4284 Spring 2013

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window oscillating between W/2 and W]
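The 0.75 factor follows from averaging the sawtooth. As a short derivation, assuming the window grows linearly from W/2 to W between consecutive losses and the RTT stays constant:

```latex
\[
\overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} \;\approx\; \frac{\overline{W}}{\mathrm{RTT}} \;=\; 0.75\,\frac{W}{\mathrm{RTT}}
\]
```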

CS 4284 Spring 2013

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

throughput = 1.22 · MSS / (RTT · sqrt(L))

• L = 2·10^-10. Very low!
• Require almost perfect link
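Both numbers in this example can be reproduced by plugging the values into W = throughput · RTT / MSS and solving throughput = 1.22 · MSS / (RTT · sqrt(L)) for L. The variable names below are chosen for illustration.

```python
MSS_BITS = 1500 * 8      # segment size in bits
RTT_SECONDS = 0.1        # 100 ms
TARGET_BPS = 10e9        # 10 Gbps

# Segments that must be in flight to fill the pipe.
window_segments = TARGET_BPS * RTT_SECONDS / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT_SECONDS * TARGET_BPS)) ** 2
```

The window comes out at about 83,333 segments and the tolerable loss rate at roughly 2·10^-10, matching the slide.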

Equation-Based Control

• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plane with Connection 1 throughput on one axis, both axes up to R, and the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
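A toy simulation shows why the two rules converge toward the equal share. The model is an assumption made for illustration: both sessions see every loss at the same time, rates are in arbitrary units, and simulate_aimd is a made-up helper name.

```python
from typing import Tuple


def simulate_aimd(x1: float, x2: float, capacity: float,
                  rounds: int) -> Tuple[float, float]:
    """Two synchronized AIMD flows sharing one bottleneck (toy model)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half, halving the gap too.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both add one unit, so the gap stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


start_gap = abs(80.0 - 10.0)
rate1, rate2 = simulate_aimd(80.0, 10.0, capacity=100.0, rounds=1000)
end_gap = abs(rate1 - rate2)
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the gap shrinks toward zero.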

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
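The parallel-connection example is simple arithmetic, assuming every connection on the link gets an equal share of R. The function name app_share is made up for illustration.

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


one_connection = app_share(existing_connections=9, new_connections=1)
nine_connections = app_share(existing_connections=9, new_connections=9)
```

With 9 connections already on the link, one new connection gets R/10, while nine new connections together get 9/18 = R/2.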

Page 14: CS 4284 Operating Systems

CS 4284 Spring 2013

Delayed ACK vs Nagle

bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources

ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo

bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks

are cumulative can reduce of acks sent)

bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be

fit into a single segment)

TCP

Timeout amp Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q how to set TCP

timeout value

bull longer than RTT

ndash but RTT varies

bull too short premature

timeout

ndash unnecessary

retransmissions

bull too long slow

reaction to segment

loss

Q how to estimate RTT

bull SampleRTT measured time

from segment transmission

until ACK receipt

bull SampleRTT will vary want

estimated RTT ldquosmootherrdquo

ndash average several recent

measurements not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT

bull exponential weighted moving average

bull influence of past sample decreases

exponentially fast

bull typical value = 0125

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin

bull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp

Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTT

ndash Note ACK could be delayed for original segment or early for retransmitted segment

bull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

CS 4284 Spring 2013

Karnrsquos Algorithm

bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network

delay

ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula

bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit

bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmit

bull Time-out period often

relatively long

ndash long delay before

resending lost packet

bull Detect lost segments via

duplicate ACKs

ndash Sender often sends many

segments back-to-back

ndash If segment is lost there will

likely be many duplicate

ACKs

bull If sender receives 3

ACKs for the same data

it supposes that segment

after ACKed data was

lost

ndash fast retransmit resend

segment before timer

expires

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
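From the application's point of view the handshake is hidden inside connect() and accept(). The sketch below uses Python's standard socket module on the loopback interface; the helper names `run_echo_once` and `demo` are invented for this example, and the kernel, not this code, exchanges the SYN, SYNACK, and ACK segments.

```python
import socket
import threading


def run_echo_once(listener: socket.socket) -> None:
    # accept() returns a connection whose three-way handshake has completed.
    conn, _caddr = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))  # echo the first chunk back


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_echo_once, args=(listener,))
    server.start()
    # create_connection() calls connect(); the kernel performs the handshake.
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"hello")
        reply = s.recv(1024)
    server.join()
    listener.close()
    return reply
```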

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

CS 4284 Spring 2013

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
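A minimal sketch of the RFC 1948 idea, under stated assumptions: the function name `isn`, the string encoding of the 4-tuple, and the truncation of an MD5 digest to 32 bits are choices made for this illustration, and `secret` stands in for the random() term in the formula above.

```python
import hashlib
import struct


def isn(clock_us: int, src: str, sport: int, dst: str, dport: int,
        secret: bytes) -> int:
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), mod 2^32."""
    m = (clock_us // 4) & 0xFFFFFFFF  # counter that ticks once every 4 microseconds
    material = f"{src}:{sport}>{dst}:{dport}".encode() + secret
    f = struct.unpack("!I", hashlib.md5(material).digest()[:4])[0]
    return (m + f) & 0xFFFFFFFF
```

Because F depends on the secret, an off-path attacker cannot compute the offset for someone else's connection, while for a fixed 4-tuple the ISN still advances with the clock.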

CS 4284 Spring 2013

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number matches a SYNACK the server could have sent; allocate buffers only if correct
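The "compute but don't store" idea can be sketched as follows. This is a simplified illustration, not the encoding used by any real TCP stack: the names `SECRET`, `make_cookie`, and `check_ack`, the use of HMAC-SHA256, and the `time_slot` parameter are all assumptions made for this example.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical per-server key


def make_cookie(src: str, sport: int, dst: str, dport: int,
                client_isn: int, time_slot: int) -> int:
    """Server ISN derived from the SYN itself, so no per-connection state is kept."""
    msg = f"{src}:{sport}>{dst}:{dport}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src: str, sport: int, dst: str, dport: int,
              client_isn: int, time_slot: int, ack_number: int) -> bool:
    """The final ACK must acknowledge cookie + 1; only then allocate buffers."""
    cookie = make_cookie(src, sport, dst, dport, client_isn, time_slot)
    return ack_number == (cookie + 1) & 0xFFFFFFFF
```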

Sequence Number Summary

• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number

CS 4284 Spring 2013


TCP Connection Management (cont)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline of the close sequence; labels: close, close, closed, timed wait]

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs

[Figure: client/server timeline; labels: closing, closing, closed, timed wait, closed]

CS 4284 Spring 2013

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

CS 4284 Spring 2013

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

CS 4284 Spring 2013

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: this is the famous two-army problem

CS 4284 Spring 2013

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

• MSS: Maximum Segment Size option
  - Client/server agree on an MSS larger than the default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

CS 4284 Spring 2013

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

CS 4284 Spring 2013

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A sends original data at rate λ_in to Host B through a router with unlimited shared output link buffers; delivery rate λ_out]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivery rate]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out versus λ_in; axis marks R/2, R/3, R/4]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivery rate]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

CS 4284 Spring 2013

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

CS 4284 Spring 2013

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

CS 4284 Spring 2013

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window of a long-lived TCP connection over time; y-axis marks 8 Kbytes, 16 Kbytes, 24 Kbytes]
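The AIMD rule can be traced with a few lines of code. This is a sketch under simplifying assumptions: the window is counted in MSS units, one step is one RTT, and the caller supplies the set of RTT rounds in which a loss event is detected. The function name `aimd` is made up for this example.

```python
def aimd(cwnd_mss: float, loss_rounds: set, rounds: int) -> list:
    """Return the congestion window (in MSS) after each RTT round."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd_mss = max(1.0, cwnd_mss / 2)  # multiplicative decrease
        else:
            cwnd_mss += 1.0                    # additive increase: +1 MSS per RTT
        trace.append(cwnd_mss)
    return trace
```

Starting from 8 MSS with a loss in round 3, the window grows to 11 MSS, drops to 5.5 MSS, and then resumes linear growth.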

CS 4284 Spring 2013

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline showing the number of segments sent doubling each RTT]
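The doubling follows directly from the per-ACK increment, as the sketch below shows. It assumes an idealized connection with no loss, one ACK per segment, and a full window sent every RTT; the names `slow_start` and `rate_bps` are invented for this example.

```python
def slow_start(mss: int, rtts: int) -> list:
    """Congestion window in bytes at the start of each RTT, beginning at 1 MSS."""
    cwnd = mss
    trace = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss        # one ACK for each segment sent in this RTT
        for _ in range(acks):
            cwnd += mss           # CongWin grows by 1 MSS per ACK received
        trace.append(cwnd)
    return trace


def rate_bps(cwnd_bytes: int, rtt_s: float) -> float:
    """Sending rate CongWin / RTT, converted to bits per second."""
    return cwnd_bytes * 8 / rtt_s
```

With MSS = 500 bytes and RTT = 200 msec, the initial rate is 500 · 8 / 0.2 = 20 kbps, matching the slide, and the window doubles to 1000, 2000, and 4000 bytes over the next three RTTs.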

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

CS 4284 Spring 2013

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions are listed below as event: actions. Λ marks a transition without an event or without an action.]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
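The state machine above can be written down as a small class. This is a sketch, not a TCP implementation: windows are counted in MSS units rather than bytes, the initial ssthresh is a constructor parameter (the default of 64 segments is an assumption standing in for the 64 KB on the slide), and retransmission itself is left out. The class name `RenoSender` is made up for this example.

```python
class RenoSender:
    """Congestion-control FSM; cwnd and ssthresh are in MSS units."""

    def __init__(self, ssthresh: float = 64.0):
        self.state = "slow_start"
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                       # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd           # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += 1.0                       # each dup ACK: a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                     # fast retransmit would happen here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"
```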

CS 4284 Spring 2013

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size over time oscillating between W/2 and W]

CS 4284 Spring 2013

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
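The numbers in this example can be checked by solving the throughput formula 1.22 · MSS / (RTT · sqrt(L)) for L. The function names below are chosen for this sketch; throughput is in bits per second, so MSS is converted from bytes to bits.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 10 Gbps, 100 ms, and 1500-byte segments this gives a window of about 83,333 segments and a tolerable loss rate of roughly 2 · 10^-10, in line with the slide.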

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

CS 4284 Spring 2013

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput of the two connections plotted against each other, axes marked up to R (one axis: Connection 1 throughput), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
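The convergence argument can be simulated. The sketch assumes a synchronized model that is not stated on the slide: both sessions see a loss in the same round whenever their combined rate exceeds the capacity. The function name `two_flow_aimd` is made up for this example.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple:
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: the gap is unchanged
    return x1, x2
```

Additive increase preserves the difference between the two rates, while every multiplicative decrease halves it, so the flows drift toward the equal-share line regardless of where they start.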

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
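The arithmetic behind the parallel-connection example, assuming the bottleneck is split equally per connection (the idealized fairness goal); `app_share` is a name invented for this sketch.

```python
from fractions import Fraction


def app_share(existing: int, opened: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens `opened` connections."""
    return Fraction(opened, existing + opened)
```

With 9 existing connections, one new connection gets 1/10 of R, while nine new connections get 9/18 = 1/2 of R.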


TCP

Timeout & Fast Retransmit

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

CS 4284 Spring 2013

Example RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout:
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
  $\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
  (typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
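The three formulas fit into a small estimator. Two details below are assumptions not stated on the slides: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the old estimate. The class name `RttEstimator` is made up for this sketch.

```python
class RttEstimator:
    """EWMA-based RTT and timeout estimation (times in any consistent unit)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation

    def update(self, sample_rtt: float) -> None:
        # Deviation first, measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, a second sample of 200 ms moves EstimatedRTT to 112.5 ms and DevRTT to 62.5 ms, giving a timeout of 362.5 ms.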

CS 4284 Spring 2013

RTT Measurement & Retransmissions

• How should you handle acks for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

CS 4284 Spring 2013

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ack matches which packet
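A minimal sketch of the two rules, with hypothetical function names and an assumed backoff limit of 64 seconds: a timeout doubles the timer, and an ACK only resets the timer to the formula-based value when it acknowledges a segment that was never retransmitted.

```python
def on_timeout(rto: float, max_rto: float = 64.0) -> float:
    """Exponential backoff: double the timeout, up to a limit."""
    return min(rto * 2, max_rto)


def on_ack(rto: float, was_retransmitted: bool, computed_rto: float) -> float:
    """Karn's rule: ignore RTT samples from retransmitted segments."""
    if was_retransmitted:
        return rto            # ambiguous sample: keep the backed-off value
    return computed_rto       # clean sample: resume the normal estimate
```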

CS 4284 Spring 2013

Fast Retransmit

• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

CS 4284 Spring 2013

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {   /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y   /* fast retransmit */
        }
    }
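A runnable sketch of the ACK handler, with timer handling omitted and retransmissions recorded in a list instead of being sent. The class name `FastRetransmitSender` and its attributes are invented for this example, and resetting the duplicate count when a new ACK arrives is an assumption added here.

```python
class FastRetransmitSender:
    """Tracks duplicate ACKs and records which sequence numbers get resent."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0         # start counting duplicates for y afresh
        else:
            self.dup_acks += 1        # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # fast retransmit of segment y
```

Feeding the handler the ACK values 200, 200, 200, 200 starting from SendBase 100 advances SendBase once and then triggers exactly one retransmission of the segment starting at 200.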

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

CS 4284 Spring 2013

TCP Flow Control (cont'd)

(Suppose the TCP receiver discards out-of-order segments.)

• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees that the receive buffer doesn't overflow

CS 4284 Spring 2013

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update

CS 4284 Spring 2013

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  - Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number matches a SYNACK the server could have sent; allocate buffers only if correct

Sequence Number Summary

• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number

CS 4284 Spring 2013


TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

• MSS: Maximum Segment Size option
  - Client/server agree on an MSS larger than the default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out versus λ_in; axis marks R/2, R/3, R/4]

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions are listed below as event: actions. Λ marks a transition without an event or without an action.]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in

congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set

to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2

and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

CS 4284 Spring 2013

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?

– Long-lived connection; ignore slow start

• When window is W, throughput is W/RTT

• Just after loss, window drops to W/2, throughput to W/(2 · RTT)

• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of window size over time, oscillating between W/2 and W]
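The 0.75 factor is just the mean of a window that climbs linearly from W/2 to W. A small numeric check, assuming unit-sized steps (the function name `average_window` is made up for this sketch):

```python
def average_window(w_max: int) -> float:
    """Mean window over one sawtooth cycle that climbs from w_max/2 to w_max in unit steps."""
    samples = [w_max / 2 + step for step in range(w_max // 2 + 1)]
    return sum(samples) / len(samples)


ratio = average_window(100) / 100  # fraction of W achieved on average
```

Dividing the mean window by RTT then gives the average throughput of 0.75 W/RTT.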

CS 4284 Spring 2013

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

• Requires window size W = 83,333 in-flight segments

• Throughput in terms of loss rate L:

throughput = 1.22 · MSS / (RTT · √L)

• L = 2 · 10^-10: very low

• Requires an almost perfect link
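The numbers in this example can be reproduced directly from the formula. The sketch below assumes the throughput relation 1.22 · MSS / (RTT · √L) solved for L; the variable names are chosen here for illustration.

```python
MSS_BITS = 1500 * 8   # segment size in bits
RTT = 0.100           # round-trip time in seconds
TARGET_BPS = 10e9     # desired throughput: 10 Gbps

# Window needed to keep the pipe full: bandwidth-delay product in segments.
window_segments = TARGET_BPS * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(int(window_segments), loss_rate)
```

The loss rate comes out at roughly 2 · 10^-10, i.e. about one lost segment in five billion.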

Equation-Based Control

• Note: TCP congestion control forms a control loop

– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)

• Instead, equation-based control uses an equation to compute the sending rate based on this input

• See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair?

Consider two competing sessions:

• additive increase gives slope of 1 as throughput increases

• multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput plotted against the other connection's throughput, both axes running to R, with the equal-bandwidth-share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
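The convergence argument can be checked with a toy simulation. It assumes both sessions see every loss at the same time and have equal RTTs, which is an idealization; `aimd` and its parameters are names invented for this sketch.

```python
def aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Two flows sharing one link: +1 each per round, both halve when the link overflows."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase leaves the gap unchanged
    return x1, x2


a, b = aimd(10.0, 80.0, capacity=100.0, rounds=1000)
```

Each loss halves the difference between the two rates while additive increase preserves it, so the rates drift toward the equal-share line regardless of the starting point.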

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP

• Multimedia apps often do not use TCP

– do not want rate throttled by congestion control

• Instead use UDP:

– pump audio/video at constant rate, tolerate packet loss

• TCP friendliness

Fairness and parallel TCP connections

• nothing prevents app from opening parallel connections between 2 hosts

• Example: link of rate R supporting 9 connections

– new app asks for 1 TCP, gets rate R/10

– new app asks for 9 TCPs, gets R/2

• That's what "download accelerators" do

Page 16: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

• longer than RTT

– but RTT varies

• too short: premature timeout

– unnecessary retransmissions

• too long: slow reaction to segment loss

Q: how to estimate RTT?

• SampleRTT: measured time from segment transmission until ACK receipt

• SampleRTT will vary; want estimated RTT "smoother"

– average several recent measurements, not just current SampleRTT

CS 4284 Spring 2013

RTT Distributions

(a) Probability density of ACK arrival times in the data link layer

(b) Probability density of ACK arrival times for TCP

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT

• exponential weighted moving average

• influence of past sample decreases exponentially fast

• typical value: α = 0.125

CS 4284 Spring 2013

Example RTT estimation: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT; RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106)]

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

• EstimatedRTT plus "safety margin"

– large variation in EstimatedRTT -> larger safety margin

• first, estimate how much SampleRTT deviates from EstimatedRTT:

EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT

DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|

(typically α = 0.125, β = 0.25)

• then set the timeout interval:

TimeoutInterval = EstimatedRTT + 4 · DevRTT
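The three formulas translate into a small estimator. This is a sketch, not a reference implementation: the class name is invented, the initial DevRTT of half the first sample is borrowed from RFC 6298, and DevRTT is updated before EstimatedRTT (also the RFC 6298 ordering), which the slide does not specify.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial deviation (RFC 6298 convention)

    def update(self, sample_rtt: float) -> float:
        # Deviation is measured against the estimate from before this sample.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
timeout = est.update(200.0)              # one much slower sample arrives
```

A single outlier moves EstimatedRTT only slightly but inflates DevRTT, so the timeout grows mostly through the safety margin.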

CS 4284 Spring 2013

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?

– Note: ACK could be delayed for original segment or early for retransmitted segment

• Choices:

– Associate with original packet: may overestimate true RTT

– Associate with retransmitted packet: may underestimate true RTT

CS 4284 Spring 2013

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments

– OK, but what if current timeout value is too small for network delay?

– In this case, would keep timing out, but couldn't adjust timeout according to formula

• Solution: if measurement can't be made, use exponential backoff until new measurement can be made

– By factor of 2, with limit

• Resume normal sampling algorithm afterwards

– Usual sampling period is once per RTT

• See also RFC 2988

– TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
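A minimal sketch of the backoff rule, assuming the fresh timeout value comes from an RTT estimator elsewhere. The class name, method names, and the 60-second cap are illustrative assumptions, not values from the slides.

```python
class RetransmitTimer:
    def __init__(self, timeout: float, max_timeout: float = 60.0):
        self.timeout = timeout
        self.max_timeout = max_timeout   # the "limit" on backoff; 60 s is assumed

    def on_timeout(self) -> None:
        # No usable RTT sample: back off by a factor of 2, up to the limit.
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, was_retransmitted: bool, fresh_timeout: float) -> None:
        # Karn's rule: ignore ACKs for retransmitted segments, keep the backed-off value.
        if not was_retransmitted:
            self.timeout = fresh_timeout


timer = RetransmitTimer(timeout=1.0)
timer.on_timeout()
timer.on_timeout()                       # timeout is now 4.0
timer.on_ack(was_retransmitted=True, fresh_timeout=0.5)
after_ambiguous_ack = timer.timeout
timer.on_ack(was_retransmitted=False, fresh_timeout=0.5)
```

Only the unambiguous ACK resets the timer to the formula-based value.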

CS 4284 Spring 2013

Fast Retransmit

• Time-out period often relatively long:

– long delay before resending lost packet

• Detect lost segments via duplicate ACKs

– Sender often sends many segments back-to-back

– If segment is lost, there will likely be many duplicate ACKs

• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:

– fast retransmit: resend segment before timer expires

CS 4284 Spring 2013

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
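The ACK handler above could look as follows in Python. This is a sketch under assumptions: the retransmission timer is omitted, resent sequence numbers are merely recorded in a list, and the class and attribute names are invented.

```python
class FastRetransmitSender:
    def __init__(self, send_base: int = 0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted: list[int] = []   # sequence numbers resent (for illustration)

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y               # new data acknowledged
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1               # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y) # fast retransmit of segment y


sender = FastRetransmitSender(send_base=100)
sender.on_ack(200)
for _ in range(4):
    sender.on_ack(200)
```

The retransmission fires exactly once, on the third duplicate; the fourth duplicate does not trigger another one.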

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

• receive side of TCP connection has a receive buffer

• app process may be slow at reading from buffer

• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

CS 4284 Spring 2013

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow = RcvBuffer − [LastByteRcvd − LastByteRead]

• Rcvr advertises spare room by including value of RcvWindow in segments

• Sender limits unACKed data to RcvWindow

– guarantees receive buffer doesn't overflow
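The window arithmetic is easy to get off by one, so here is a small worked sketch. Function and parameter names mirror the slide's variables but are otherwise assumptions.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent + nbytes) - last_byte_acked <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

With 2000 bytes still unread in a 4096-byte buffer, the receiver advertises 2096 bytes, and the sender stops once its unACKed data reaches that amount.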

CS 4284 Spring 2013

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data

• Then receiver app reads n bytes

• TCP receiver advertises new window of size RcvWindow = n

• But suppose this advertisement is lost

• Sender would be stuck

– Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

• segment structure

• reliable data transfer

– delayed ACKs

– Nagle's algorithm

• timeout management, fast retransmit

• flow control + silly window syndrome

• connection management

[ Network Address Translation ]

[ Principles of congestion control ]

• TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments

• initialize TCP variables:

– seq. #s

– buffers, flow control info (e.g. RcvWindow)

• client: connection initiator

connect(s, &dstaddr, …)

• server: contacted by client

cl = accept(sv, &caddr, …)

Three-way handshake:

Step 1: client host sends TCP SYN segment to server

– specifies initial seq #

– no data

Step 2: server host receives SYN, replies with SYNACK segment

– server allocates buffers

– specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
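From the application's point of view the handshake is hidden behind `connect` and `accept`. A loopback sketch using Python's standard `socket` module (the address, payload, and variable names are chosen for illustration):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())   # kernel performs SYN / SYNACK / ACK

conn, peer_addr = server.accept()      # returns the already-established connection
client.sendall(b"hello")
data = conn.recv(5)

for s in (conn, client, server):
    s.close()
```

Note that `connect` returns before `accept` is called: the kernel completes the handshake for sockets waiting in the listen queue.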

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment

• Q1: why 3-way and not 2-way handshake?

• Q2: how do sender & receiver determine initial seqnums?

CS 4284 Spring 2013

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock

– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers

• Must also guard against sequence number prediction attack

– Use PRNG; see [RFC 1948], [CERT 2001-09]

• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
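A rough sketch of the RFC 1948 idea. Everything specific here is an assumption for illustration: SHA-256 stands in for F, a random per-process `SECRET` stands in for random(), and the monotonic clock stands in for the 4 µs timer.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret; hypothetical stand-in for random()


def initial_seq(src: str, dst: str, sport: int, dport: int) -> int:
    clock = time.monotonic_ns() // 4000   # ticks once every 4 microseconds
    digest = hashlib.sha256(f"{src}|{dst}|{sport}|{dport}|".encode() + SECRET).digest()
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2**32       # sequence numbers are 32 bits wide


isn = initial_seq("10.0.0.1", "10.0.0.2", 1234, 80)
```

Each connection 4-tuple gets its own sequence space offset, so observing the ISN of one connection does not reveal the ISN another host pair would receive.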

CS 4284 Spring 2013

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next

• By using spoofed source IP C, A can engage in a successful 3-way handshake with B

– B believes it is talking to C, might grant permissions based on C's IP address

– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that

• A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources

– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses

• Solution:

– SYN cookies

• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers

– If client continues with ACK, check whether the acknowledged number could have been sent, then allocate buffers if correct
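The "compute, don't store" idea can be sketched with a keyed hash. This is a simplification: deployed SYN cookies also encode a timestamp and the MSS, and the function names, key, and message layout below are assumptions.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)   # server-side secret, hypothetical


def make_cookie(src: str, sport: int, dst: str, dport: int, client_isn: int) -> int:
    """Server ISN derived from the connection identifiers; no state is stored."""
    msg = f"{src}:{sport}>{dst}:{dport}#{client_isn}".encode()
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(src: str, sport: int, dst: str, dport: int, client_isn: int, ack_number: int) -> bool:
    """Recompute the cookie when the final ACK arrives and compare."""
    expected = (make_cookie(src, sport, dst, dport, client_isn) + 1) % 2**32
    return ack_number == expected


cookie = make_cookie("198.51.100.7", 40000, "203.0.113.1", 80, client_isn=12345)
```

A forged SYN costs the server one hash computation and one SYNACK, but no buffer; only a client that actually received the SYNACK can produce the matching ACK number.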

Sequence Number Summary

• Goals, set 1:

– Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared

– Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs

• Goals, set 2:

– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice

– Don't allow SYN attacks: compute, but don't store, initial sequence number

CS 4284 Spring 2013


TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline of the close exchange; the client passes through timed wait before closed]

CS 4284 Spring 2013

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.

– Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline; both sides closing, client in timed wait, then both closed]

CS 4284 Spring 2013

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

CS 4284 Spring 2013

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

CS 4284 Spring 2013

Closing a Connection

• Note: previous charts showed the normal case

• Can we reliably close a connection if packets (FIN, ACK) can be lost?

– No. Famous two-army problem

CS 4284 Spring 2013

Summary

• TCP segments, acknowledgements & retransmission

– Delayed ACKs, Nagle's algorithm

– Fast retransmit

• RTT estimation & Karn's algorithm

• Flow Control & Silly Window Syndrome

• Connection Management in TCP

• Attacks against TCP's connection management scheme

– SYN attack

– Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

• MSS: Maximum Segment Size option

– Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN

• SACK: selective acknowledgements

• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks

• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers

• …

CS 4284 Spring 2013


TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion:

• informally: "too many sources sending too much data too fast for network to handle"

• different from flow control!

• manifestations:

– long delays (queueing in router buffers)

– lost packets (buffer overflow at routers)

• a top-10 problem!

CS 4284 Spring 2013

Causes/costs of congestion: scenario 1

• two senders, two receivers

• one router, infinite buffers

• no retransmission

• large delays when congested

• (but no reduction in throughput here)

[Figure: Host A and Host B send through a router with unlimited shared output link buffers; λ_in: original data, λ_out: output rate]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

• one router, finite buffers

• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through a router with finite shared output link buffers; λ_in: original data, λ'_in: original data plus retransmitted data, λ_out: output rate]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)

• a) if no loss: λ'_in = λ_in

• b) assume clairvoyant sender: retransmission only when loss is certain

• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

"costs" of congestion:

• more work (retransmissions) for given "goodput"

• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λ_out versus λ_in, with axis marks at R/2, R/3, and R/4]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

• four senders

• multihop paths

• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B among senders on multihop paths through routers with finite shared output link buffers; λ_in: original data, λ'_in: original data plus retransmitted data, λ_out: output rate]

CS 4284 Spring 2013

Causes/costs of congestion: scenario 3

Another "cost" of congestion:

• when a packet is dropped, any upstream transmission capacity used for that packet was wasted

• ultimately leads to congestion collapse

[Figure: Host A, Host B, and λ_out]

CS 4284 Spring 2013

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs

• If packet loss occurs, the needed retransmissions require the offered load to be greater than goodput

• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

Two broad classes:

End-end congestion control:

• no explicit feedback from network

• congestion inferred from end-system observed loss, delay

• approach taken by TCP

Network-assisted congestion control:

• routers provide feedback to end systems

– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)

– explicit rate sender should send at

CS 4284 Spring 2013

TCP Congestion Control

• end-end control (no network assistance)

• sender limits transmission:

LastByteSent − LastByteAcked ≤ min(CongWin, RcvWindow)

• CongWin and RTT influence throughput:

rate = CongWin / RTT bytes/sec

• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?

• loss event = timeout or 3 duplicate ACKs

• TCP sender reduces rate (CongWin) after loss event

• (assumes congestion is primary cause of loss)

Three mechanisms:

• AIMD

• slow start

• fast recovery
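The sending limit and the rate approximation from this slide, written out as a sketch. Function and parameter names follow the slide's variables but are otherwise invented.

```python
def bytes_allowed(last_byte_sent: int, last_byte_acked: int, cong_win: int, rcv_window: int) -> int:
    """How many more bytes may be sent without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt: float) -> float:
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt


room = bytes_allowed(last_byte_sent=5000, last_byte_acked=2000, cong_win=8000, rcv_window=4000)
rate = approx_rate(cong_win=500, rtt=0.2)
```

In the first call the receiver's window (4000 bytes) is the binding constraint, not the congestion window, leaving room for 1000 more bytes.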

CS 4284 Spring 2013

TCP AIMD

Long-lived TCP connection:

• multiplicative decrease: cut CongWin in half after loss event

• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: sawtooth of congestion window versus time, with axis marks at 8, 16, and 24 Kbytes]

CS 4284 Spring 2013

TCP Slow Start

• When connection begins, CongWin = 1 MSS

– Example: MSS = 500 bytes & RTT = 200 msec

– initial rate = 20 kbps

• available bandwidth may be >> MSS/RTT

– desirable to quickly ramp up to respectable rate

• When connection begins, increase rate exponentially fast until first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:

– double CongWin every RTT

– done by incrementing CongWin for every ACK received

• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline showing one, two, then four segments sent in successive RTTs]
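Why one MSS per ACK yields doubling per RTT can be seen in a short simulation. It assumes no loss, no delayed ACKs, and one ACK per segment; `slow_start` is a name invented for this sketch, with the slide's 500-byte MSS as default.

```python
def slow_start(rounds: int, mss: int = 500) -> list[int]:
    """CongWin (bytes) at the start of each RTT, growing by one MSS per ACK."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss     # one ACK comes back per segment sent this round
        cong_win += acks * mss     # each ACK adds one MSS
        history.append(cong_win)
    return history


windows = slow_start(4)
```

After four round trips the window has grown from 500 to 8000 bytes, a factor of 16.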

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

• Variable Threshold

• At loss event, Threshold is set to 1/2 of CongWin just before loss event

• If timeout event, set CongWin to 1 MSS

• If triple-duplicate-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)


Page 17: CS 4284 Operating Systems


CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0

retransmit missing segment

L

cwnd gt ssthresh

congestion

avoidance

cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fast

recovery

cwnd = cwnd + MSS transmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2 cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeout

ssthresh = cwnd2 cwnd = 1 dupACKcount = 0

retransmit missing segment

ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment

dupACKcount == 3 cwnd = ssthresh dupACKcount = 0

New ACK

slow

start

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment

cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed

new ACK dupACKcount++

duplicate ACK

L

cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0

New ACK

New ACK

New ACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in

congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set

to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2

and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTT

bull Just after loss window drops to W2 throughput to W2RTT

bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Loss

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull Throughput in terms of loss rate

bull L = 210-10 Very low

bull Require almost perfect link

LRTT

MSS221

Equation-Based Control

bull Note TCP congestion control forms

control loop

ndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)

bull Instead equation-based control uses an

equation to compute sending rate based

on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same

bottleneck link of bandwidth R each should

have average rate of Rk

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair

Consider two competing sessions

bull additive increase gives slope of 1 as throughput increases

bull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increase

loss decrease window by factor of 2

congestion avoidance additive increase

loss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP

bull Multimedia apps often

do not use TCP

ndash do not want rate throttled

by congestion control

bull Instead use UDP

ndash pump audiovideo at

constant rate tolerate

packet loss

bull TCP friendliness

Fairness and parallel TCP

connections

bull nothing prevents app from

opening parallel connections

between 2 hosts

bull Example link of rate R

supporting 9 connections

ndash new app asks for 1 TCP gets

rate R10

ndash new app asks for 9 TCPs gets

R2

bull Thatrsquos what ldquodownload

acceleratorsrdquo do


TCP Round Trip Time and Timeout

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$

• exponential weighted moving average
• influence of past samples decreases exponentially fast
• typical value: $\alpha = 0.125$
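
To see why the influence of a past sample decays exponentially, the recurrence can be unrolled. The shorthand $E_n$ (EstimatedRTT after the $n$-th sample) and $S_n$ (the $n$-th SampleRTT) is introduced here for illustration and is not on the slide:

```latex
\begin{align*}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha\,S_n + \alpha(1-\alpha)\,S_{n-1} + \alpha(1-\alpha)^2\,S_{n-2} + \dots + (1-\alpha)^n\,E_0
\end{align*}
```

With $\alpha = 0.125$, the weight of a sample shrinks by a factor of 0.875 per update, so it roughly halves every five samples ($0.875^5 \approx 0.51$).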

Example RTT estimation
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT over time; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350]

TCP Round Trip Time and Timeout

Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:

$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
(typically $\alpha = 0.125$, $\beta = 0.25$)

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
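
The three formulas above can be sketched as a small estimator. This is a minimal illustration, not a real stack's code: the class name, the initialization from the first sample (DevRTT set to half of it), and the choice to update DevRTT before EstimatedRTT are assumptions borrowed from common practice; the slide does not specify them.

```python
class RttEstimator:
    """EWMA-based RTT estimator following the slide's formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: seed the estimate with the first sample and the
        # deviation with half of it; the slide leaves this open.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

    def update(self, sample_rtt):
        # Assumption: the deviation uses the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()


if __name__ == "__main__":
    est = RttEstimator(100.0)              # milliseconds
    for sample in (120.0, 95.0, 110.0):
        print(sample, round(est.update(sample), 2))
```

A noisy run of samples widens DevRTT and therefore the safety margin; a steady run lets the timeout settle close to EstimatedRTT.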

RTT Measurement & Retransmissions

• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
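
A rough sketch of how the two rules interact: ambiguous samples are ignored, and each timeout doubles the timer up to a cap. The class and parameter names, the 60-second cap, and the initialization from the first clean sample are assumptions made for this example only.

```python
class KarnTimer:
    """Retransmission timer that skips RTT samples from retransmitted segments."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto          # assumed cap on the backoff
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def on_timeout(self):
        # Exponential backoff: double the timeout, bounded by the limit.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            # Ambiguous sample: keep the backed-off timeout unchanged.
            return self.rto
        if self.estimated_rtt is None:
            # First clean sample seeds the estimator (assumption).
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        # A clean measurement ends the backoff: recompute from the formula.
        self.rto = self.estimated_rtt + 4 * self.dev_rtt
        return self.rto
```

The backed-off value persists until an ACK for a segment that was sent only once arrives, which is the point at which normal sampling resumes.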

Fast Retransmit

• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
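
The ACK handler above can be turned into a small runnable sketch. Timer handling is left out, the duplicate count is reset whenever new data is acknowledged, and the class and return values are names invented for this illustration.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []     # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, start counting afresh.
            self.send_base = y
            self.dup_acks = 0
            return "new_ack"
        # Otherwise this is a duplicate ACK for already ACKed data.
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Fast retransmit: resend the segment starting at y
            # without waiting for the timer to expire.
            self.retransmitted.append(y)
            return "fast_retransmit"
        return "dup_ack"


if __name__ == "__main__":
    sender = FastRetransmitSender(send_base=100)
    for ack in (200, 200, 200, 200):
        print(ack, sender.on_ack(ack))
```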

TCP

Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
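
A minimal sketch of both sides of this rule, using the slide's variable names as parameters. The function names are made up for the example, and byte counters are plain integers rather than wrapping 32-bit sequence numbers.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(last_byte_sent, last_byte_acked, nbytes, advertised_window):
    """Sender side: would nbytes more keep unACKed data within RcvWindow?"""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(window)                                   # spare room in bytes
    print(may_send(5000, 4000, 1000, window))       # fits in the window
    print(may_send(5000, 4000, 1200, window))       # would overflow
```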

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, ...);
• server: contacted by client
    cl = accept(sv, &caddr, ...);

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
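
The three steps can be made concrete by writing out the sequence and acknowledgement numbers each segment carries. This is a sketch with hypothetical names (`Segment`, `three_way_handshake`); it relies on the standard rule that a SYN consumes one sequence number, so each side acknowledges the other's initial number plus one.

```python
from collections import namedtuple

# Hypothetical record type for this illustration only.
Segment = namedtuple("Segment", "sender flags seq ack")

SEQ_SPACE = 2 ** 32     # TCP sequence numbers wrap at 32 bits


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments for the given initial seq numbers."""
    syn = Segment("client", "SYN", client_isn, None)
    synack = Segment("server", "SYN+ACK", server_isn,
                     (client_isn + 1) % SEQ_SPACE)
    ack = Segment("client", "ACK",
                  (client_isn + 1) % SEQ_SPACE,
                  (server_isn + 1) % SEQ_SPACE)
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

Each side thereby confirms the sequence number the other chose, which is what a two-way exchange could not do for the server's number.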

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

[Figure panels:]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
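
A sketch of the RFC 1948 idea: a clock component that ticks every 4 µs plus a keyed hash of the connection 4-tuple. The function name, the use of SHA-256, and keeping the random input as a secret generated once per process are assumptions for this illustration, not a description of any particular stack.

```python
import hashlib
import os
import struct
import time

# Assumption: the random() input is a secret generated once per boot.
SECRET = os.urandom(16)


def initial_sequence_number(src, sport, dst, dport, now_us=None, secret=SECRET):
    """ISN = (4 microsecond clock value + F(4-tuple, secret)) mod 2^32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = (now_us // 4) & 0xFFFFFFFF              # ticks every 4 us
    material = f"{src}:{sport}>{dst}:{dport}".encode() + secret
    digest = hashlib.sha256(material).digest()
    f = struct.unpack("!I", digest[:4])[0]          # first 32 bits of the hash
    return (clock + f) & 0xFFFFFFFF


if __name__ == "__main__":
    print(initial_sequence_number("10.0.0.1", 1234, "10.0.0.2", 80))
```

For one connection 4-tuple the value still advances with the clock, which protects against old incarnations; across different 4-tuples the offset is unpredictable without the secret.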

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether the acknowledged number matches a cookie the server could have sent; allocate buffers only if it does
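
A simplified sketch of the cookie check. Real SYN cookie implementations also pack an MSS index and a timestamp into the 32 bits; here the cookie is just a truncated HMAC over the connection identifiers and a coarse counter, and all names (`make_cookie`, `check_ack`, `counter`, `max_age`) are hypothetical.

```python
import hashlib
import hmac
import struct


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived from the SYN itself, so nothing needs to be stored."""
    msg = f"{src}:{sport}>{dst}:{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_ack(secret, src, sport, dst, dport, client_isn, ack_number,
              counter, max_age=1):
    """Accept the final ACK only if it acknowledges a cookie we could have sent."""
    for age in range(max_age + 1):
        expected = make_cookie(secret, src, sport, dst, dport,
                               client_isn, counter - age)
        if (expected + 1) & 0xFFFFFFFF == ack_number:
            return True     # only now would the server allocate buffers
    return False
```

The server recovers `client_isn` from the ACK segment itself (its sequence number minus one), so no per-connection state is held between the SYN and the final ACK.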

Sequence Number Summary

• Goals, Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is successfully established
• Goals, Set 2
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, the initial sequence number

TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; labels: close, close, timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; labels: closing, closing, timed wait, closed, closed]

TCP Connection FSM

[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
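
Since the diagram itself did not survive extraction, here is a sketch of the two normal paths as an event/action transition table. The state names are the conventional TCP ones and the event strings are labels chosen for this example; unusual events (the light lines) are not modeled.

```python
# (state, event) -> (action or None, next state)
TRANSITIONS = {
    # normal path for a client
    ("CLOSED", "active open"): ("send SYN", "SYN_SENT"),
    ("SYN_SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN"): ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "timeout"): (None, "CLOSED"),
    # normal path for a server
    ("CLOSED", "passive open"): (None, "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN_RCVD"),
    ("SYN_RCVD", "recv ACK"): (None, "ESTABLISHED"),
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK"): (None, "CLOSED"),
}


def run(events, state="CLOSED"):
    """Feed events through the table; return the final state and actions taken."""
    actions = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions


if __name__ == "__main__":
    client = ["active open", "recv SYN+ACK", "close",
              "recv ACK", "recv FIN", "timeout"]
    print(run(client))
```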

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: the famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  - Client/server agree on an MSS larger than the default (536 outside same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• ...

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B sending through unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B sending through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss, $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: panels (a), (b), (c) plotting $\lambda_{out}$ against $\lambda'_{in}$; axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B sending through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, $\lambda_{out}$]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time; y-axis marks at 8, 16 and 24 Kbytes]
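
The shape of the window under AIMD can be reproduced with a few lines. This sketch advances one RTT per step; which rounds see a loss is supplied by the caller, and the function name, parameters and the 1 MSS floor are choices made for this example.

```python
def aimd_trace(initial_cwnd, mss, loss_rounds, rounds):
    """Return CongWin after each RTT round under additive increase,
    multiplicative decrease. loss_rounds is a set of round numbers (1-based)
    in which a loss event is detected."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for rnd in range(1, rounds + 1):
        if rnd in loss_rounds:
            cwnd = max(cwnd / 2, mss)   # multiplicative decrease, floor of 1 MSS
        else:
            cwnd = cwnd + mss           # additive increase: +1 MSS per RTT
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    print(aimd_trace(initial_cwnd=8000, mss=1000, loss_rounds={3}, rounds=5))
```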

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
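
The 20 kbps figure follows directly from sending one MSS per RTT. As a worked check, with a second line that assumes the window doubles every RTT with no loss for $n$ round trips:

```latex
\begin{align*}
\text{initial rate} &= \frac{\text{MSS}}{\text{RTT}}
  = \frac{500 \text{ bytes} \times 8 \text{ bits/byte}}{0.2 \text{ s}}
  = 20{,}000 \text{ bit/s} = 20 \text{ kbps} \\
\text{rate after } n \text{ RTTs} &= 2^{n} \cdot \frac{\text{MSS}}{\text{RTT}},
  \qquad n = 10:\; 1024 \times 20 \text{ kbps} \approx 20.5 \text{ Mbps}
\end{align*}
```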

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with RTT marked against time]

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[State diagram, written out as a transition list; Λ marks a transition without a triggering event]

Initial (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
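
The three states and their transitions can be sketched as a small event-driven class. This is an illustration of the state machine summarized on this slide, not a real stack: segment transmission is omitted, window sizes are in bytes, and the class name, state strings and the 1000-byte MSS in the usage example are assumptions.

```python
class RenoSender:
    """Congestion window bookkeeping for slow start, congestion avoidance
    and fast recovery."""

    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = 64 * 1024       # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh               # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                   # doubles cwnd per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"            # missing segment is resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"                   # missing segment is resent here


if __name__ == "__main__":
    s = RenoSender(mss=1000)
    for _ in range(3):
        s.on_new_ack()
    for _ in range(3):
        s.on_dup_ack()
    print(s.state, s.cwnd, s.ssthresh)
```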

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W over time]
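
The factor 0.75 comes from averaging over one cycle, under the idealization that the window climbs linearly from $W/2$ back to $W$ between consecutive loss events:

```latex
\begin{align*}
\text{average window} &= \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W \\
\text{average throughput} &= \frac{3}{4}\cdot\frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}
\end{align*}
```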

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{Throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• L = $2 \cdot 10^{-10}$: very low
• Requires almost perfect link
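
Both numbers in this example can be checked by hand. The window follows from the bandwidth-delay product, and the loss rate from solving the throughput formula for $L$ (taking MSS as the 1500-byte segment size, i.e. 12,000 bits):

```latex
\begin{align*}
W &= \frac{\text{Throughput} \cdot \text{RTT}}{\text{MSS}}
   = \frac{10^{10} \text{ bit/s} \times 0.1 \text{ s}}{12{,}000 \text{ bit}}
   \approx 83{,}333 \text{ segments} \\
L &= \left(\frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \text{Throughput}}\right)^{2}
   = \left(\frac{1.22 \times 12{,}000}{0.1 \times 10^{10}}\right)^{2}
   = \left(1.464 \times 10^{-5}\right)^{2}
   \approx 2.1 \times 10^{-10}
\end{align*}
```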

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with both axes running to R, x-axis: Connection 1 throughput; the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
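
The geometric argument can be replayed numerically. In this idealized sketch both sessions add the same step each round and both halve whenever their combined rate exceeds the capacity; the function name, the synchronized loss model and the sample values are assumptions for illustration.

```python
def simulate(x1, x2, capacity, rounds, step=1.0):
    """Two AIMD flows sharing one bottleneck; returns their final rates."""
    for _ in range(rounds):
        # congestion avoidance: additive increase for both flows
        x1 += step
        x2 += step
        if x1 + x2 > capacity:
            # loss: both decrease their window by a factor of 2
            x1 /= 2
            x2 /= 2
    return x1, x2


if __name__ == "__main__":
    print(simulate(10.0, 80.0, capacity=100.0, rounds=6))
    print(simulate(10.0, 80.0, capacity=100.0, rounds=200))
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the pair drifts toward the equal-share line.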

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
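
The two rates in the example follow if every connection on the link gets an equal share, so that an application opening $n$ connections next to the 9 existing ones receives:

```latex
\begin{align*}
\text{app share} &= \frac{n}{9 + n}\,R \\
n = 1 &:\quad \frac{1}{10}\,R = R/10 \\
n = 9 &:\quad \frac{9}{18}\,R = R/2
\end{align*}
```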

Page 19: CS 4284 Operating Systems

CS 4284 Spring 2013

Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (

mil

lise

con

ds)

SampleRTT Estimated RTT

CS 4284 Spring 2013

TCP Round Trip Time and Timeout

Setting the timeout

bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin

bull first estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 0125 = 025)

EstimatedRTT = (1-)EstimatedRTT + SampleRTT

CS 4284 Spring 2013

RTT Measurement amp

Retransmissions

bull How should you handle acks for retransmitted segments when measuring RTT

ndash Note ACK could be delayed for original segment or early for retransmitted segment

bull Choices

ndash Associate with original packet may overestimate true RTT

ndash Associate with retransmitted packet may underestimate true RTT

Karn's Algorithm

• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
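A minimal Python sketch of the idea above, under stated assumptions: the class `KarnRto`, its method names, and the 60-second cap are all hypothetical choices for illustration, and the caller is assumed to know whether an acknowledged segment was ever retransmitted.

```python
class KarnRto:
    """Retransmission timeout that ignores ambiguous samples (Karn's idea)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto          # assumed upper limit for the backoff
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def on_timeout(self):
        # No usable measurement: back off by a factor of 2, up to the limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, sample_rtt, was_retransmitted):
        """Return True if the sample was used to recompute the timeout."""
        if was_retransmitted:
            # Ambiguous: the ACK may belong to either transmission.
            return False
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        # A clean sample ends the backoff and resumes the normal formula.
        self.rto = self.estimated_rtt + 4 * self.dev_rtt
        return True
```

The backed-off timeout stays in force until an ACK arrives for a segment that was sent exactly once, at which point normal sampling resumes.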

Fast Retransmit

• Time-out period often relatively long
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost
  – fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {
            /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3) {
                /* fast retransmit */
                resend segment with sequence number y
            }
        }
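The ACK handler above could be made runnable roughly as follows. This is a sketch, not a real TCP sender: `FastRetransmitSender` and its fields are hypothetical, and "resending" is modeled by appending the sequence number to a list.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs as in the slide's event handler."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.next_seq = send_base      # first byte not yet sent
        self.dup_acks = 0
        self.timer_running = False
        self.retransmitted = []        # stand-in for actually resending

    def send(self, nbytes):
        self.next_seq += nbytes
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the dup counter.
            self.send_base = y
            self.dup_acks = 0
            # Keep the timer only while unacknowledged data remains.
            self.timer_running = self.send_base < self.next_seq
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend before the timer expires.
                self.retransmitted.append(y)
```

Resetting the duplicate counter on new data is an added detail; the slide's pseudocode leaves it implicit by counting duplicates per value of y.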

TCP

Flow Control

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
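As a small illustration of the bookkeeping above, the two helpers below compute the advertised window on the receiver and the corresponding check on the sender. The function names are hypothetical; the variable names mirror the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes keeps unACKed data within the advertised window."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + nbytes <= advertised_window
```

For example, with a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes, and a sender with 1000 bytes in flight may send at most 1096 more.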

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size
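One way to picture Clark's rule is a receiver-side filter on the advertised window. The sketch below assumes the commonly used threshold of min(MSS, half the receive buffer); the slide itself only says "a certain size", so the threshold and the function name are assumptions.

```python
def advertised_window(free_space, mss, rcv_buffer):
    """Advertise zero until a worthwhile amount of buffer space is free."""
    # Assumed threshold: one full segment or half the buffer, whichever is smaller.
    threshold = min(mss, rcv_buffer // 2)
    return free_space if free_space >= threshold else 0
```

Holding the window at zero until a full segment fits keeps the sender from transmitting a stream of tiny segments.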

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
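The three steps can be traced with a toy model of the segments exchanged. This sketch assumes, as in standard TCP, that a SYN consumes one sequence number; the function name and the dictionary layout are purely illustrative.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK, and ACK segments as small dictionaries."""
    # Step 1: client sends SYN carrying its initial sequence number, no data.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK: its own ISN, acknowledging the SYN.
    synack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }
    # Step 3: client acknowledges the server's SYN; this segment may carry data.
    ack = {
        "flags": {"ACK"},
        "seq": synack["ack"],
        "ack": (synack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, synack, ack]
```

Each side ends up having echoed the other's initial sequence number plus one, which is what lets both verify the numbers chosen by the peer.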

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
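The RFC 1948 formula can be sketched as below. Several details are assumptions made for illustration: SHA-256 stands in for the hash F, the `secret` argument plays the role of the random component, and the clock is passed in as microseconds so the function stays deterministic.

```python
import hashlib

SEQ_SPACE = 2 ** 32


def initial_sequence_number(src, dst, sport, dport, now_us, secret):
    """ISN = 4-microsecond clock value + F(connection id, secret), mod 2^32."""
    clock = (now_us // 4) % SEQ_SPACE
    # F: a keyed hash over the connection identifiers (hash choice is an assumption).
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % SEQ_SPACE
```

For one connection identifier the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset used for a different connection.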

When Sequence Numbers Attack

• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  – SYN cookies
    • Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
    • If client continues with ACK, check whether the acknowledged number could have been sent; allocate buffers only if correct
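A simplified sketch of the SYN-cookie check follows. Real implementations also pack a coarse timestamp and an MSS index into the cookie bits; here the cookie is just a truncated HMAC, and `counter` is a hypothetical slowly changing value. All names are illustrative.

```python
import hashlib
import hmac

SEQ_SPACE = 2 ** 32


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Compute (but do not store) the server's initial sequence number."""
    msg = f"{src}:{sport}>{dst}:{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Recompute the cookie from the ACK's fields; allocate state only if True."""
    expected = (make_cookie(secret, src, sport, dst, dport, client_isn, counter)
                + 1) % SEQ_SPACE
    return ack_number == expected
```

Because the server can recompute the cookie from fields carried in the client's ACK, a flood of forged SYNs costs it computation but no per-connection memory.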

Sequence Number Summary

• Goals Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s);

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs

[Figure: client/server timing diagram showing close, closing, timed wait, and closed]

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size Option
  – Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around for sequence numbers
• …


TCP

Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send through a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

• Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out against offered load, axes running up to R/2; the values R/3 and R/4 are marked on the λ_out axis]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:

$$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$

• CongWin and RTT influence throughput:

$$\text{rate} = \frac{\text{CongWin}}{\text{RTT}} \ \text{Bytes/sec}$$

• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
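The sender-side limit and the rate estimate can be written as two small helpers. The function names are hypothetical; the variables follow the slide.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit under both windows."""
    in_flight = last_byte_sent - last_byte_acked
    # The effective window is the smaller of the congestion and receive windows.
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_seconds):
    """Rough sending rate in bytes per second: one window per round trip."""
    return cong_win_bytes / rtt_seconds
```

Whichever window is smaller governs: a large CongWin does not help when the receiver advertises little space, and vice versa.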

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (axis 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
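The sawtooth in the figure can be reproduced with a round-by-round simulation. This is a sketch under simple assumptions: one step per RTT, losses given as a set of round indices, and a floor of 1 MSS; the function name is hypothetical.

```python
def aimd_trace(cong_win, mss, loss_rounds, rounds):
    """Return CongWin after each RTT under additive increase, multiplicative decrease."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            # Multiplicative decrease: halve the window (never below 1 MSS).
            cong_win = max(mss, cong_win // 2)
        else:
            # Additive increase: probe for bandwidth with 1 MSS per RTT.
            cong_win += mss
        trace.append(cong_win)
    return trace
```

Plotting the returned list against the round index gives the characteristic sawtooth of a long-lived connection.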

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
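The initial-rate figure in the example follows directly from sending one MSS per round trip:

$$\text{initial rate} = \frac{\text{MSS}}{\text{RTT}} = \frac{500 \text{ bytes} \times 8 \text{ bits/byte}}{0.2 \text{ s}} = 20{,}000 \text{ bits/s} = 20 \text{ kbps}$$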

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timing diagram with RTT marked along the time axis]
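Why a per-ACK increment doubles the window each RTT can be seen in a short sketch. It assumes every segment in the window is acknowledged individually within one round trip, with no loss; the function name is hypothetical.

```python
def slow_start_round(cong_win, mss):
    """Apply one RTT of slow start: +1 MSS for each ACK of the current window."""
    acks_this_round = cong_win // mss   # one ACK per segment sent in this round
    for _ in range(acks_this_round):
        cong_win += mss
    return cong_win
```

Starting from 1 MSS, repeated calls yield 2, 4, 8, ... MSS, since a window of n segments produces n ACKs and therefore n increments.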

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
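The difference between the two variants comes down to how a triple duplicate ACK is treated. The sketch below assumes Tahoe drops to 1 MSS on every loss event, while Reno does so only on timeout; the function name, the event strings, and the 1 MSS floor on Threshold are illustrative choices.

```python
def on_loss(variant, event, cong_win, mss):
    """Return (new CongWin, new Threshold) after a loss event."""
    threshold = max(cong_win // 2, mss)
    if event == "timeout" or variant == "tahoe":
        # Restart from 1 MSS and grow exponentially up to Threshold.
        return mss, threshold
    # Reno on triple duplicate ACK: fast recovery, continue linearly from Threshold.
    return threshold, threshold
```

For a window of 16 segments, Reno resumes at 8 segments after three duplicate ACKs, whereas Tahoe restarts from a single segment.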

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: finite state machine with states slow start, congestion avoidance, and fast recovery; Λ denotes "no event" or "no action". The transitions are listed below.]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
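The state machine summarized on this slide can be sketched as a small event-driven class. Assumptions are labeled in the comments: `RenoSender` is a hypothetical name, the "+ 3" in the fast-recovery entry is read as 3 MSS, and actual transmission is reduced to a retransmission counter.

```python
class RenoSender:
    """Congestion-control state machine: slow start, congestion avoidance, fast recovery."""

    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.state = "slow_start"
        self.retransmissions = 0   # stand-in for "retransmit missing segment"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                  # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                  # inflate for each extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss   # assumption: "+ 3" means 3 MSS
            self.retransmissions += 1
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.retransmissions += 1
        self.state = "slow_start"
```

Feeding the class a sequence of `on_new_ack`, `on_dup_ack`, and `on_timeout` calls traces the same paths through the three states as the arrows in the figure.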

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window size between W/2 and W]
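The 0.75 factor comes from averaging the sawtooth: between losses the window grows linearly from W/2 to W, so its time average is the midpoint of the two values.

$$\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3W}{4} \quad\Longrightarrow\quad \overline{\text{throughput}} = \frac{3}{4}\cdot\frac{W}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}$$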

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\,\sqrt{L}}$$

• L = 2·10^-10: very low
• Requires almost perfect link
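Both numbers in the example can be checked with a few lines of Python. The helper names are hypothetical; the second function simply solves the throughput formula for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)       # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)  # about 2e-10
```

A loss rate near 2·10^-10 means roughly one lost segment in five billion, which is why the slide calls for an almost perfect link.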

Equation-Based Control

• Note: TCP congestion control forms control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: phase plot of the two connections' throughputs, axes up to R (one axis labeled Connection 1 throughput), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
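The convergence argument can be checked numerically. The sketch below is an idealized model, not a TCP simulation: both flows see every loss at the same time, a loss occurs whenever the combined rate exceeds the link capacity, and the function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds, step=1.0):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, so the gap is unchanged.
            x1, x2 = x1 + step, x2 + step
    return x1, x2
```

Additive increase preserves the difference between the two rates while each multiplicative decrease halves it, so the rates drift toward the equal-share line regardless of where they start.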

Fairness (more)

Fairness and UDP

• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections

• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
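The two figures in the example follow from per-connection fairness: each connection gets an equal share, so an application's share is proportional to the number of connections it opens. A small sketch, with a hypothetical helper name and assuming ideal equal sharing:

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection yields 1/10 of R, while nine new connections yield 9/18 = 1/2 of R.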

Page 20: CS 4284 Operating Systems


CS 4284 Spring 2013

TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTT

bull Just after loss window drops to W2 throughput to W2RTT

bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Loss

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull Throughput in terms of loss rate

bull L = 210-10 Very low

bull Require almost perfect link

LRTT

MSS221

Equation-Based Control

bull Note TCP congestion control forms

control loop

ndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)

bull Instead equation-based control uses an

equation to compute sending rate based

on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same

bottleneck link of bandwidth R each should

have average rate of Rk

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair

Consider two competing sessions

bull additive increase gives slope of 1 as throughput increases

bull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increase

loss decrease window by factor of 2

congestion avoidance additive increase

loss decrease window by factor of 2

CS 4284 Spring 2013

Fairness (more)

Fairness and UDP

bull Multimedia apps often

do not use TCP

ndash do not want rate throttled

by congestion control

bull Instead use UDP

ndash pump audiovideo at

constant rate tolerate

packet loss

bull TCP friendliness

Fairness and parallel TCP

connections

bull nothing prevents app from

opening parallel connections

between 2 hosts

bull Example link of rate R

supporting 9 connections

ndash new app asks for 1 TCP gets

rate R10

ndash new app asks for 9 TCPs gets

R2

bull Thatrsquos what ldquodownload

acceleratorsrdquo do

RTT Measurement & Retransmissions

• How should ACKs for retransmitted segments be handled when measuring RTT?
  – Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  – Associate the ACK with the original packet: may overestimate the true RTT
  – Associate the ACK with the retransmitted packet: may underestimate the true RTT

Karn's Algorithm

• Idea: don't consider RTT samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – The TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
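The interplay of Karn's rule and exponential backoff can be sketched in a few lines of Python. This is a minimal illustration, not a real TCP stack: the class name `RtoEstimator` and its method names are hypothetical, and the constants (gains 1/8 and 1/4, initial timeout 3 s, bounds of 1 s and 60 s) are believed to follow RFC 2988 but should be treated as assumptions.

```python
class RtoEstimator:
    """Retransmission timeout estimator with Karn's rule (illustrative sketch)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT (assumed, after RFC 2988)
    BETA = 1 / 4    # gain for the RTT variation (assumed, after RFC 2988)

    def __init__(self, initial_rto=3.0, min_rto=1.0, max_rto=60.0):
        self.srtt = None        # smoothed RTT, unknown until the first sample
        self.rttvar = None      # RTT variation estimate
        self.rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto

    def on_ack(self, sample, retransmitted):
        """Process an RTT sample in seconds; returns the current timeout."""
        if retransmitted:
            # Karn's rule: the sample is ambiguous, so ignore it and keep
            # whatever (possibly backed-off) timeout is in effect.
            return self.rto
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        rto = self.srtt + 4 * self.rttvar
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the timeout, up to the limit."""
        self.rto = min(self.max_rto, self.rto * 2)
        return self.rto
```

A sender that keeps timing out calls `on_timeout()` repeatedly; the first ACK for a segment that was sent only once then replaces the backed-off value with a freshly computed timeout.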

Fast Retransmit

• Time-out period is often relatively long
  – long delay before resending a lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend the segment before the timer expires

Fast retransmit algorithm

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
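The ACK handler above can be turned into a small runnable Python sketch. The class `FastRetransmitSender` and its `retransmitted` list are hypothetical names used only for illustration; timers and actual segment transmission are left out, and resetting the duplicate count on a new ACK is an assumption the pseudocode leaves implicit.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions (sketch only)."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base=0):
        self.send_base = send_base   # oldest unacknowledged byte
        self.dup_acks = 0            # duplicate ACKs seen for send_base
        self.retransmitted = []      # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance the window and reset the count.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                # Resend the segment starting at y before the timer expires.
                self.retransmitted.append(y)
```

Feeding the sender one new ACK for 200 followed by three duplicates triggers exactly one retransmission of the segment starting at 200; further duplicates do not trigger another one.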

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
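As a worked example of the window arithmetic, the following Python sketch computes the advertised window on the receiver and the amount the sender may still transmit. The function names are hypothetical; the variable names follow the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(window, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may transmit without exceeding the window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, window - in_flight)
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may then send 1096 more.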

Flow Control: Persistence Timer

• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments

• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake

TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

The 3-way handshake is required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
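The ISN formula can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name `initial_sequence_number`, the use of SHA-256 as the function F, and a per-boot random `SECRET` standing in for `random()`. It is not the exact construction of RFC 1948.

```python
import hashlib
import os
import time

# Per-boot secret standing in for random() in the formula (assumption).
SECRET = os.urandom(16)


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), mod 2^32."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4  # counter that ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32
```

For a fixed 4-tuple the ISN advances with the clock, which protects against old incarnations of the same connection, while an attacker without the secret cannot derive the offset used for a different 4-tuple.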

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C and might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, check whether the acknowledged number matches a SYNACK the server could have sent, then allocate buffers if correct
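A heavily simplified Python sketch of the SYN cookie idea follows. It is an illustration under stated assumptions, not the encoding used by any real stack: the function names, the HMAC-SHA-256 construction, and the coarse time `counter` argument are all hypothetical, and real SYN cookies additionally pack an MSS index and timestamp bits into the sequence number.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # server-side secret, never sent on the wire


def make_cookie(src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Derive the server's initial sequence number from the SYN alone."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack_number, src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Recompute the cookie for the current and previous time slot and compare."""
    for c in (counter, counter - 1):
        expected = (make_cookie(src, sport, dst, dport, client_isn, c, secret) + 1) % 2**32
        if ack_number == expected:
            return True
    return False
```

The server stores nothing between the SYN and the final ACK: everything needed to validate the ACK is recomputed from the packet headers and the secret, and buffers are allocated only after `ack_is_valid` succeeds.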

Sequence Number Summary

• Goals, set 1
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need the 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, set 2
  – Don't allow hijacking of connections unless the attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute but don't store the initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s);

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline showing close on each side, the client's timed wait, and closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Figure: client/server timeline showing closing on each side, the client's timed wait, and closed]

TCP Connection FSM

[Figure: TCP connection state diagram] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: the famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on a larger-than-default MSS (default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A sends original data at rate λ_in into a router with unlimited shared output link buffers; Host B; delivered rate λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A offers original data at rate λ_in, and original plus retransmitted data at rate λ'_in, to a router with finite shared output link buffers; Host B; delivered rate λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λ_out against offered load, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A offers original data at rate λ_in, and original plus retransmitted data at rate λ'_in, through routers with finite shared output link buffers; Host B; delivered rate λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  bytes/sec
• CongWin is a dynamic function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window of a long-lived TCP connection over time, with axis marks at 8 Kbytes, 16 Kbytes and 24 Kbytes]
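The sawtooth produced by AIMD can be reproduced with a few lines of Python. This is a toy model that advances one RTT per step; the function name `aimd_trace` and the idea of passing in the RTT indices at which losses occur are assumptions made for illustration.

```python
def aimd_trace(rtts, loss_at, mss=1, start=8):
    """Return the congestion window (in MSS units) at the start of each RTT.

    rtts:    number of round trips to simulate
    loss_at: set of RTT indices in which a loss event is detected
    """
    cwnd = start
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(6, {2})` climbs 8, 9, 10, drops to 5 after the loss in round 2, and then resumes the linear climb.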

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B with one RTT marked]
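A small Python sketch shows why adding one MSS per ACK doubles the window every RTT, using the numbers from the slow start example (MSS = 500 bytes, RTT = 200 ms). The function name `slow_start` is hypothetical, and the model assumes every segment in a window is acknowledged within the same RTT and no loss occurs.

```python
def slow_start(rtts, mss=500, rtt_s=0.2):
    """Return (cwnd in bytes, rate in bits/sec) at the start of each RTT."""
    cwnd = mss
    rows = []
    for _ in range(rtts):
        rows.append((cwnd, cwnd * 8 / rtt_s))
        acks = cwnd // mss          # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += mss             # +1 MSS per ACK received
    return rows
```

The first row gives 500 bytes and 20,000 bits/sec (the 20 kbps initial rate), and the window then grows 500, 1000, 2000, 4000 bytes in successive RTTs.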

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: state diagram with the states slow start, congestion avoidance and fast recovery; Λ marks a transition taken without an event. Transitions recovered from the diagram labels:]

Start (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start
  – new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – cwnd > ssthresh (Λ): go to congestion avoidance
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start

Congestion avoidance
  – new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery
  – duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  – new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
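The event table can be expressed as a small state machine. The Python sketch below follows the simplified two-state table (SS and CA) rather than a full Reno implementation with a separate fast-recovery state; the class and method names are hypothetical, and windows are kept as floats in units of bytes or MSS as the caller prefers.

```python
class RenoSender:
    """Simplified TCP sender congestion control, after the event table (sketch)."""

    def __init__(self, mss=1.0, threshold=64.0):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                       # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)  # +1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Keep the window at or above 1 MSS.
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve the threshold and restart from 1 MSS in slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Driving the object with a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls reproduces the exponential, linear and collapse phases described in the table.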

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window oscillating between W/2 and W]
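The 0.75 factor follows from averaging the window over one sawtooth period. As a short worked derivation, assume the window ramps linearly from W/2 to W over a period of length T (T is a symbol introduced here only for the derivation):

```latex
\bar{w}
  = \frac{1}{T}\int_{0}^{T}\left(\frac{W}{2} + \frac{W}{2}\cdot\frac{t}{T}\right)\,dt
  = \frac{W}{2} + \frac{W}{4}
  = \frac{3}{4}\,W,
\qquad
\text{average throughput} = \frac{\bar{w}}{RTT} = 0.75\,\frac{W}{RTT}
```

The mean of a linear ramp is simply the midpoint of its endpoints, which is why the result sits halfway between W/(2·RTT) and W/RTT.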

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    throughput = 1.22 · MSS / (RTT · sqrt(L))
• L = 2·10^-10: very low
• Requires an almost perfect link
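The numbers in this example can be checked with a short calculation. The Python sketch below solves the throughput formula for the window size and for the loss rate; the function names are hypothetical.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

Calling both functions with 10 Gbps, a 100 ms RTT and 1500-byte segments gives a window of roughly 83,333 segments and a tolerable loss rate of about 2·10^-10, matching the figures quoted in the example.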

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with both axes running to R (one axis labeled Connection 1 throughput) and an "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
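The convergence argument can be checked numerically. The Python sketch below is a toy model under strong assumptions: both sessions have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates change by one unit per round. The function name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Simulate two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves their difference.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit, leaving the difference unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from a very unequal split such as `two_flow_aimd(1, 61, 100, 1000)`, the gap between the two rates is halved at every loss event and never grows in between, so the flows end up with nearly equal shares.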

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
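The shares in the parallel-connection example follow from dividing the link equally among all connections. A minimal Python sketch, assuming perfectly fair per-connection sharing (the function name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing, opened):
    """Fraction of the link rate R obtained by an app that opens `opened`
    connections on a link already carrying `existing` connections."""
    return Fraction(opened, existing + opened)
```

With 9 existing connections, `app_share(9, 1)` is 1/10 and `app_share(9, 9)` is 1/2, the two cases listed in the example.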

Page 22: CS 4284 Operating Systems

CS 4284 Spring 2013

Karnrsquos Algorithm

bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network

delay

ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula

bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit

bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT

bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet

CS 4284 Spring 2013

Fast Retransmit

bull Time-out period often

relatively long

ndash long delay before

resending lost packet

bull Detect lost segments via

duplicate ACKs

ndash Sender often sends many

segments back-to-back

ndash If segment is lost there will

likely be many duplicate

ACKs

bull If sender receives 3

ACKs for the same data

it supposes that segment

after ACKed data was

lost

ndash fast retransmit resend

segment before timer

expires

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

bull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where

server is flooded with bogus SYN packets with forged

IP source addresses

bull Solution

ndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not

allocate buffers

ndash If client continues with SYNACK check if ACK could

have been sent then allocate buffers if correct

Sequence Number Summary

bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot

reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker

can eavesdrop ndash use PRNG for initial seq number choice

ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client and server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
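To see why the window scale option matters on long fat networks, the arithmetic can be sketched as follows. The 16-bit window field and the maximum shift of 14 come from the TCP header and RFC 1323; the 100 ms RTT and the helper names are assumptions made for illustration.

```python
MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit receive window field
MAX_SHIFT = 14                # largest window scale shift allowed by RFC 1323


def max_window_bytes(shift):
    """Largest receive window that can be advertised with a given scale shift."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale shift must be in 0..14")
    return MAX_UNSCALED_WINDOW << shift


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


RTT = 0.1  # assumed 100 ms round-trip time
unscaled_bps = max_throughput_bps(max_window_bytes(0), RTT)
scaled_bps = max_throughput_bps(max_window_bytes(MAX_SHIFT), RTT)
```

Without scaling, a 100 ms path is capped at roughly 5.2 Mbps no matter how fast the link is; with the largest shift the cap rises by a factor of 2^14.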

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B, a router with unlimited shared output link buffers; λin: original data; λout]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmits a lost packet after timeout

[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 2

• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)

"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λout against the input rate, with axis marks at R/2, R/3, and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λout]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD

• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window of a long-lived TCP connection over time; y-axis marks at 8, 16, and 24 Kbytes]
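The AIMD rule can be sketched as a per-RTT loop. This is an illustration under simplifying assumptions: the window is counted in whole MSS units, loss events are supplied by the caller rather than detected, and `aimd_trace` is a hypothetical helper name.

```python
def aimd_trace(rtts, loss_at, start_mss=8):
    """Return CongWin (in MSS) at the start of each RTT.

    loss_at is the set of RTT indices in which a loss event is detected.
    """
    cong_win = start_mss
    trace = []
    for rtt in range(rtts):
        trace.append(cong_win)
        if rtt in loss_at:
            cong_win = max(1, cong_win // 2)   # multiplicative decrease
        else:
            cong_win += 1                      # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rtts=10, loss_at={4, 8})
```

Plotting `trace` against the RTT index gives the familiar pattern of slow linear climbs separated by sharp halvings.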

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline with one RTT marked on the time axis]
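A short sketch can check both the 20 kbps starting rate from the slow-start example and the doubling behaviour. It assumes one ACK per segment (no delayed ACKs) and no loss; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate when CongWin = 1 MSS: one segment per round trip."""
    return mss_bytes * 8 / rtt_seconds


def slow_start_windows(rtts):
    """CongWin (in MSS) at the start of each RTT, assuming no loss.

    A window of n segments produces n ACKs in one RTT, and each ACK adds
    1 MSS, so the window doubles every RTT.
    """
    cong_win = 1
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks_this_rtt = cong_win      # one ACK per segment sent
        cong_win += acks_this_rtt     # +1 MSS per ACK
    return windows


rate = initial_rate_bps(mss_bytes=500, rtt_seconds=0.2)
windows = slow_start_windows(5)
```

With MSS = 500 bytes and RTT = 200 msec the starting rate is 4000 bits per 0.2 s, which matches the 20 kbps figure.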

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions as labeled in the figure, written as event: actions. Λ marks a transition with no triggering event.]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase; window grows exponentially
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
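The event table can be turned into a small class, one method per event. This is a sketch of the simplified two-state model in the table, not of a real TCP stack: the class name `RenoSender`, the default MSS of 1000 bytes, the 64 KB starting threshold, and resetting the duplicate-ACK counter after it fires are all assumptions.

```python
class RenoSender:
    """Simplified sender following the event/state table (window in bytes)."""

    def __init__(self, mss=1000, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold   # starting value is an assumption
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                             # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # about 1 MSS per RTT

    def on_duplicate_ack(self):
        """Count the duplicate; the third one is the triple-duplicate event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)         # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender(mss=1000, threshold=4000)
for _ in range(4):
    sender.on_new_ack()              # 1000 -> 5000, crossing the threshold
after_slow_start = (sender.cong_win, sender.state)
for _ in range(3):
    sender.on_duplicate_ack()        # triple duplicate ACK: halve, stay in CA
after_triple_dup = (sender.cong_win, sender.threshold, sender.state)
sender.on_timeout()                  # timeout: back to 1 MSS and slow start
after_timeout = (sender.cong_win, sender.threshold, sender.state)
```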

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size oscillating between W/2 and W]
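The 0.75 factor follows from averaging the window over one cycle. A sketch of the step, assuming the window climbs linearly from W/2 to W between loss events:

```latex
\text{average window} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{3}{4}\cdot\frac{W}{RTT} = 0.75\,\frac{W}{RTT}
```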

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2·10^-10. Very low
• Requires an almost perfect link
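The two numbers in this example can be reproduced directly from the formulas. A minimal sketch; the function names are hypothetical, and the loss formula is simply the throughput equation solved for L.

```python
def window_segments(target_bps, rtt_seconds, mss_bytes):
    """In-flight segments needed so that W * MSS / RTT reaches the target rate."""
    return target_bps * rtt_seconds / (mss_bytes * 8)


def required_loss_rate(target_bps, rtt_seconds, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_seconds * target_bps)) ** 2


W = window_segments(target_bps=10e9, rtt_seconds=0.100, mss_bytes=1500)
L = required_loss_rate(target_bps=10e9, rtt_seconds=0.100, mss_bytes=1500)
```

`W` comes out at about 83,333 segments and `L` at about 2.1·10^-10, in line with the rounded values on the slide.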

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput of the two connections plotted against each other, both axes up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
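The convergence argument can be checked numerically. This sketch assumes an idealized setting that the slide only implies: both flows have the same RTT, both detect loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. `two_flow_aimd` is a hypothetical helper.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Synchronized AIMD for two flows sharing one bottleneck.

    Each round both flows add 1 unit; if the sum exceeds the capacity,
    both detect loss and halve instead. Returns the final (x1, x2).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
gap = abs(x1 - x2)
```

Additive increase leaves the gap between the flows unchanged, while every halving cuts it in half, so a start of 80 versus 10 ends with the two rates nearly equal.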

Fairness (more)

Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
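The R/10 and R/2 figures follow from per-connection fairness: the link is split evenly across connections, not across applications. A small sketch of that arithmetic, with `app_share` as a hypothetical helper:

```python
from fractions import Fraction


def app_share(existing_connections, app_connections):
    """Fraction of R an app gets if every connection gets an equal share."""
    total = existing_connections + app_connections
    return Fraction(app_connections, total)


one_connection = app_share(existing_connections=9, app_connections=1)
nine_connections = app_share(existing_connections=9, app_connections=9)
```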


Fast Retransmit

• Time-out period is often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend the segment before the timer expires

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
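The pseudocode can be made runnable with a few stand-ins. In this sketch the timer and the retransmission are only recorded in a log, and the `unacked` flag replaces the check for not-yet-acknowledged segments; the class and attribute names are hypothetical.

```python
class FastRetransmitSender:
    """ACK handling from the fast retransmit pseudocode, with actions logged."""

    def __init__(self, send_base=0):
        self.send_base = send_base   # oldest unacknowledged byte
        self.dup_acks = 0            # duplicate ACKs seen for send_base
        self.unacked = True          # assumption: some data is still in flight
        self.log = []                # actions taken, for inspection

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new cumulative ACK
            self.dup_acks = 0
            if self.unacked:
                self.log.append("start timer")
        else:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.log.append(f"resend segment {y}")   # fast retransmit


sender = FastRetransmitSender(send_base=100)
for ack in [200, 200, 200, 200]:     # one new ACK followed by three duplicates
    sender.on_ack(ack)
```

The first ACK for 200 advances `send_base`; the next three are duplicates, and the third one triggers the retransmission without waiting for the timer.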

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)

(Suppose the TCP receiver discards out-of-order segments)

• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Receiver advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
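Both sides of this rule are simple arithmetic on byte counters. A minimal sketch using the slide's variable names; the two function names and the example numbers are made up for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit in the receive buffer")
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, nbytes, advertised_window):
    """True if sending nbytes more keeps unACKed data within RcvWindow."""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
fits = sender_may_send(3000, 2000, 1000, window)       # 2000 bytes unACKed
too_much = sender_may_send(3000, 2000, 1200, window)   # 2200 bytes unACKed
```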

Flow Control: Persistence Timer

• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer. The sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data

Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
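The sequence and acknowledgement numbers carried by the three segments can be sketched as below. The `Segment` type and `three_way_handshake` function are hypothetical; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    flags: str        # "SYN", "SYNACK", or "ACK"
    seq: int
    ack: int = 0      # only meaningful when the segment carries an ACK


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged, in order."""
    syn = Segment("SYN", seq=client_isn)                               # Step 1
    synack = Segment("SYNACK", seq=server_isn,
                     ack=(syn.seq + 1) % SEQ_SPACE)                    # Step 2
    ack = Segment("ACK", seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)                    # Step 3
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Because each side echoes the other's initial sequence number back, a stale or duplicated SYN cannot complete the exchange on its own.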

TCP 3-way handshake

TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

The 3-way handshake is required to deal with scenarios (b) and (c).

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
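The RFC 1948 formula can be sketched as follows. Several choices here are assumptions rather than anything the slide specifies: SHA-256 stands in for the hash F, a per-process random secret stands in for random(), and the function name `isn` and its `now` parameter are made up so the clock can be fixed in an example.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret; plays the role of random()


def isn(src, dst, sport, dport, now=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret), mod 2^32."""
    if now is None:
        now = time.time()
    clock = int(now * 1_000_000) // 4              # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    digest = hashlib.sha256(material).digest()     # hash choice is an assumption
    f_value = int.from_bytes(digest[:4], "big")
    return (clock + f_value) % 2 ** 32


first = isn("10.0.0.1", "10.0.0.2", 1234, 80, now=100.0)
later = isn("10.0.0.1", "10.0.0.2", 1234, 80, now=101.0)
```

For one connection four-tuple the value still advances with the clock, which protects against old incarnations, while an outsider who does not know the secret cannot predict the offset used for a different four-tuple.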

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
    • Server computes its initial sequence number and sends the SYNACK, but does not allocate buffers
    – If the client continues with an ACK, check whether its ACK number matches a SYNACK that could have been sent, then allocate buffers if correct
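The idea of computing the server's initial sequence number instead of storing it can be sketched with a keyed hash. This is a simplified illustration, not the layout used by real SYN cookie implementations, which also encode details such as the MSS and a coarse timestamp. The key, the `time_slot` parameter, and both function names are assumptions.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)    # server-side secret (hypothetical)


def make_cookie(src, sport, dst, dport, client_isn, time_slot, key=KEY):
    """Derive the server's initial sequence number from the connection itself."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def check_ack(src, sport, dst, dport, client_isn, time_slot, ack_number, key=KEY):
    """Accept the final ACK only if it acknowledges cookie + 1."""
    cookie = make_cookie(src, sport, dst, dport, client_isn, time_slot, key)
    return ack_number == (cookie + 1) % 2 ** 32


conn = ("192.0.2.7", 40000, "198.51.100.1", 80)
cookie = make_cookie(*conn, client_isn=1000, time_slot=42)
good = check_ack(*conn, client_isn=1000, time_slot=42,
                 ack_number=(cookie + 1) % 2 ** 32)
bad = check_ack(*conn, client_isn=1000, time_slot=42,
                ack_number=(cookie + 2) % 2 ** 32)
```

The server keeps no per-connection state between sending the SYNACK and receiving the ACK; everything it needs to validate the ACK is recomputed from the packet and the secret key.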

Sequence Number Summary

• Goals, Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2:
  – Don't allow hijacking of connections unless the attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute but don't store the initial sequence number


Page 24: CS 4284 Operating Systems

CS 4284 Spring 2013

event ACK received with ACK field value of y

if (y gt SendBase)

SendBase = y

if (there are currently not-yet-acknowledged segments)

start timer

else

increment count of dup ACKs received for y

if (count of dup ACKs received for y = 3)

resend segment with sequence number y

Fast retransmit algorithm

a duplicate ACK for already ACKed segment

fast retransmit

TCP

Flow Control

CS 4284 Spring 2013

TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
    • Server computes its initial sequence number (the cookie) and sends SYNACK – but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged number is one the server could have sent; allocate buffers only if it is correct
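
The SYN-cookie idea can be sketched as follows. This is a simplified illustration, not the encoding used by any real kernel (real cookies also pack an MSS index and a timestamp into specific bits); the HMAC construction, the `counter` time period, and all function names are assumptions.

```python
import hashlib
import hmac

MOD = 2 ** 32
SECRET = b"hypothetical-server-secret"


def make_cookie(src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Derive the server's ISN from the SYN itself, so nothing has to be stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src, sport, dst, dport, client_isn, ack, counter, secret=SECRET):
    """Accept the final ACK only if it acknowledges a cookie we could have sent.

    The current and the previous time period are both tried, so a handshake
    that straddles a period boundary still succeeds.
    """
    for c in (counter, counter - 1):
        expected = (make_cookie(src, sport, dst, dport, client_isn, c, secret) + 1) % MOD
        if ack == expected:
            return True   # only now would the server allocate buffers
    return False
```

Usage: on SYN the server replies with `seq = make_cookie(...)` and forgets the request; on the final ACK it recomputes the cookie from the segment's addresses, ports, and `seq - 1` (the client's initial sequence number) and calls `ack_is_valid`.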

Sequence Number Summary

• Goals Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop – use PRNG for initial seq number choice
  – Don't allow SYN attacks – compute, but don't store, initial sequence number

TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s);

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline for closing a connection; labels: close (client), close (server), timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline; labels: closing (client), closing (server), timed wait, closed (client), closed (server)]

TCP Connection FSM

The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
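
The figure itself did not survive in this transcript, so the following is a sketch of only the two normal paths it describes, written as an event/action table. The state names follow RFC 793 and are an assumption here, since the slide text does not spell them out; the unusual (light-line) transitions are omitted.

```python
# (state, event) -> (action, next_state); "-" means no segment is sent.
TRANSITIONS = {
    # normal path for a client
    ("CLOSED", "connect"): ("send SYN", "SYN_SENT"),
    ("SYN_SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK"): ("-", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN"): ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT", "timeout"): ("-", "CLOSED"),
    # normal path for a server
    ("CLOSED", "listen"): ("-", "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN_RCVD"),
    ("SYN_RCVD", "recv ACK"): ("-", "ESTABLISHED"),
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK"): ("-", "CLOSED"),
}


def run(events, state="CLOSED"):
    """Follow a list of events and return the sequence of states visited."""
    path = [state]
    for event in events:
        _action, state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


CLIENT_EVENTS = ["connect", "recv SYN+ACK", "close", "recv ACK", "recv FIN", "timeout"]
SERVER_EVENTS = ["listen", "recv SYN", "recv ACK", "recv FIN", "close", "recv ACK"]

if __name__ == "__main__":
    print(" -> ".join(run(CLIENT_EVENTS)))
    print(" -> ".join(run(SERVER_EVENTS)))
```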

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet): MSS option on SYN
• SACK – selective acknowledgements
• WSCALE – scale factor for receive window to allow for LFN ("elephant") – Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B; unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out against offered load; axis marks R/2, R/3, R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
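
The sender-side limit and the rate approximation above can be written out directly. This is a sketch using the slide's variable names; the two function names are hypothetical.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit without violating
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt


if __name__ == "__main__":
    # 3000 bytes in flight, congestion window 8000, receiver window 4000
    print(usable_window(5000, 2000, 8000, 4000))
    print(approx_rate(8000, 0.5))
```

Whichever of the two windows is smaller governs, so a slow receiver (small RcvWindow) and a congested path (small CongWin) both throttle the sender through the same check.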

TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window of a long-lived TCP connection against time (sawtooth); window axis marked 8 Kbytes, 16 Kbytes, 24 Kbytes]
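
A toy trace of AIMD reproduces the sawtooth. It assumes, purely for illustration, that a loss event occurs exactly when the window exceeds a fixed `capacity`; real losses depend on the network, and the function name is hypothetical.

```python
def aimd_trace(start, capacity, rtts, mss=1):
    """Window size (in units of MSS) at the start of each RTT."""
    cwnd, trace = start, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease after a loss
        else:
            cwnd += mss                  # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(start=8, capacity=16, rtts=24))
```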

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; labels: RTT, time]
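
The per-ACK increment and the resulting doubling can be checked with a short sketch. It assumes one ACK per segment and no losses; the function names are hypothetical. The second function reproduces the 20 kbps figure from the example above (MSS = 500 bytes, RTT = 200 msec).

```python
def slow_start_rounds(rounds, mss=500):
    """CongWin (bytes) at the start of each RTT during slow start."""
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        acks = cong_win // mss        # one ACK per segment sent this round
        for _ in range(acks):
            cong_win += mss           # CongWin grows by 1 MSS per ACK
    return sizes


def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate in bits/sec when exactly one MSS is sent per RTT."""
    return mss_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    print(slow_start_rounds(5))
    print(initial_rate_bps(500, 200))
```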

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: sender FSM with states slow start, congestion avoidance, and fast recovery; transitions are written below as event / action, with Λ marking an empty event or action]

initial transition
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 -> slow start

slow start
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ -> congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

congestion avoidance
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

fast recovery
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0 -> congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
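
The event/state table translates almost line by line into code. The class below is a sketch of that simplified table only (it is not a full Reno implementation and omits the window inflation during fast recovery); the class and method names are hypothetical.

```python
class TableSender:
    """Congestion-control state driven by the events in the table above."""

    def __init__(self, mss=1000, threshold=64_000):
        self.mss = mss
        self.cong_win = mss          # start in slow start with 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                        # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # +1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicates; the third one is the triple-duplicate event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Usage: feed the object one call per event, for example four `on_new_ack()` calls followed by three `on_duplicate_ack()` calls, and inspect `state`, `cong_win`, and `threshold` after each step.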

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of window size oscillating between W/2 and W]
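
The 0.75 factor follows from averaging the sawtooth, assuming the window climbs linearly from W/2 to W between loss events (an idealization, as the slide title says):

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{\bar{W}}{RTT} = 0.75\,\frac{W}{RTT}
```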

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

  $$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$$

• L = 2·10^-10: very low!
• Requires almost perfect link
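
The two numbers in the example can be reproduced from the formula. This is a sketch with hypothetical function names; it simply solves W = throughput · RTT / MSS and throughput = 1.22 · MSS / (RTT · √L) for W and L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L for which 1.22 * MSS / (RTT * sqrt(L)) equals the throughput."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(window_segments(10e9, 0.1, 1500))      # about 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))   # about 2e-10
```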

Equation-Based Control

• Note: TCP congestion control forms control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes running to R, axis label "Connection 1 throughput", and an "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
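
The convergence argument can be checked numerically. The sketch below assumes two flows with equal RTTs that both see a loss whenever their combined rate exceeds the link capacity; these synchronized losses are an idealization, and the function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Rates of two AIMD flows sharing one bottleneck after `rounds` RTTs."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap constant
    return x1, x2


if __name__ == "__main__":
    print(two_flow_aimd(1, 61, capacity=100, rounds=500))
```

Additive increase moves both flows along a 45-degree line and leaves their difference unchanged, while each multiplicative decrease halves the difference, so the flows drift toward the equal-share line.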

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
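
The parallel-connection arithmetic is simple enough to state as code. This sketch assumes every connection gets an equal share of the link, which is the fairness goal rather than a guarantee; the function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))   # share with a single connection
    print(app_share(9, 9))   # share with nine parallel connections
```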

TCP Flow Control

• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
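
The receiver's window computation, as a sketch using the slide's variable names (the function name and the sanity check are additions for illustration):

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    buffered = last_byte_rcvd - last_byte_read   # bytes received but not yet read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit in the receive buffer")
    return rcv_buffer - buffered


if __name__ == "__main__":
    print(rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000))
```

When the application stops reading, `last_byte_read` stays fixed, the advertised window shrinks toward 0, and the sender has to stop.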

Flow Control: Persistence Timer

• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer: sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark's solution: don't advertise windows below a certain size



TCP Flow Control

bull receive side of TCP connection has a receive buffer

bull speed-matching

service matching

the send rate to the

receiving apprsquos drain

rate bull app process may be

slow at reading from

buffer

sender wonrsquot overflow receiverrsquos buffer by

transmitting too much too fast

flow control

CS 4284 Spring 2013

TCP Flow Control (contrsquod)

(Suppose TCP receiver discards

out-of-order segments)

bull spare room in buffer

= RcvWindow

= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Rcvr advertises spare room by including value of RcvWindow in segments

bull Sender limits unACKed data to RcvWindow

ndash guarantees receive buffer doesnrsquot overflow

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

CS 4284 Spring 2013

Flow Control Silly Window Syndrome

Clarkrsquos solution donrsquot advertise windows below a certain size

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);

Three-way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
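The connect()/accept() pair can be exercised on the loopback interface. This is a minimal sketch in Python rather than the C calls shown on the slide; the function name and payload are made up, and the handshake itself is carried out by the kernel, so the code only shows where it happens.

```python
import socket
import threading


def handshake_demo(payload: bytes = b"hello") -> bytes:
    """Open a loopback TCP connection and return what the server side received."""
    sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sv.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    sv.listen(1)
    addr = sv.getsockname()
    received = []

    def server():
        cl, _caddr = sv.accept()     # hands back a connection whose handshake is done
        with cl:
            received.append(cl.recv(1024))

    t = threading.Thread(target=server)
    t.start()
    # connect(): the kernel sends SYN, waits for SYNACK, then sends ACK
    with socket.create_connection(addr) as s:
        s.sendall(payload)
    t.join()
    sv.close()
    return received[0]


print(handshake_demo())
```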

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: Tie initial TCP seq numbers to clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
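The ISN formula can be sketched as follows. SHA-256 is used here as a stand-in for F, the secret takes the place of the random() input, and the function name, addresses and secret are made up for illustration; a real stack would pick its own hash and keep the secret private.

```python
import hashlib
import socket
import struct


def isn(src, dst, sport, dport, secret, clock_us):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret), modulo 2**32."""
    m = (clock_us // 4) & 0xFFFFFFFF  # counter that ticks once every 4 microseconds
    material = (socket.inet_aton(src) + socket.inet_aton(dst)
                + struct.pack("!HH", sport, dport) + secret)
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")  # per-connection offset
    return (m + f) & 0xFFFFFFFF


# Hypothetical connection; the secret would normally be random and never disclosed.
print(isn("192.0.2.1", "192.0.2.2", 40000, 80, b"demo-secret", clock_us=1_000_000))
```

For a fixed 4-tuple the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset for another connection.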

When Sequence Numbers Attack

• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the "cookie") and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether the acknowledged sequence number could have been sent; allocate buffers only if it is correct
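The idea of computing the sequence number without storing it can be sketched with a keyed hash. This is a simplified illustration, not the encoding any particular operating system uses; the function names, the time-slot counter and the acceptance of one previous slot are assumptions.

```python
import hashlib
import hmac
import socket
import struct


def make_cookie(secret, src, dst, sport, dport, client_isn, counter):
    """Server ISN derived from the 4-tuple, the client ISN and a coarse time counter."""
    msg = (socket.inet_aton(src) + socket.inet_aton(dst)
           + struct.pack("!HHII", sport, dport, client_isn, counter))
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(secret, src, dst, sport, dport, client_isn, counter, ack_number):
    """True if ack_number acknowledges a cookie this server could have sent recently."""
    for c in (counter, (counter - 1) & 0xFFFFFFFF):  # current and previous time slot
        cookie = make_cookie(secret, src, dst, sport, dport, client_isn, c)
        if ack_number == (cookie + 1) & 0xFFFFFFFF:   # ACK number = server ISN + 1
            return True
    return False


conn = ("10.0.0.1", "10.0.0.2", 1234, 80, 1000)       # hypothetical connection
cookie = make_cookie(b"server-secret", *conn, counter=7)
print(ack_is_valid(b"server-secret", *conn, counter=7, ack_number=(cookie + 1) & 0xFFFFFFFF))
```

Because the server can recompute the cookie from fields in the returning ACK, it keeps no per-connection state until the check succeeds.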

Sequence Number Summary

• Goals Set 1:
– Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
– Even after a crash restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline with labels close, close, timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline with labels closing, closing, timed wait, closed, closed]

TCP Connection FSM

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
– Client/server agree on larger than default (536 outside same subnet) MSS: option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around for sequence numbers
• …


TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λin through a router with unlimited shared output link buffers; output rate λout]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; output rate λout]

Causes/costs of congestion: scenario 2

• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout against λin, with axis marks at R/2, R/3, and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; output rate λout]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure labels: Host A, Host B, λout]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$$
• CongWin and RTT influence throughput:
  $$\text{rate} = \frac{\text{CongWin}}{\text{RTT}} \ \text{bytes/sec}$$
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
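The sender-side limit and the rate estimate can be written out directly. This is a minimal sketch; the function names and the numbers are hypothetical.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes that may still be sent: min(CongWin, RcvWindow) minus unacknowledged bytes."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one CongWin per round trip."""
    return cong_win / rtt


# 10,000 bytes in flight; the congestion window (16,000) is the tighter limit here.
print(usable_window(30_000, 20_000, cong_win=16_000, rcv_window=40_000))  # 6000
print(approx_rate(16_000, 0.25))                                          # 64000.0
```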

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (axis marks at 8, 16, 24 Kbytes) over time for a long-lived TCP connection]
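The two AIMD rules produce the familiar sawtooth. This is a small sketch in units of MSS; the starting window and the rounds in which losses occur are invented for the example.

```python
def aimd(rounds, loss_rounds, start=8):
    """Return CongWin (in MSS units) at the start of each RTT."""
    cong_win = start
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(1, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += 1                     # additive increase: +1 MSS per RTT
    return trace


print(aimd(10, loss_rounds={4, 8}))  # [8, 9, 10, 11, 12, 6, 7, 8, 9, 4]
```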

TCP Slow Start

• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: segments exchanged between Host A and Host B over time, one RTT per round]
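Incrementing CongWin once per ACK is what doubles it every RTT, as the sketch below shows. It assumes every segment in a round is ACKed and no loss occurs; the MSS of 500 bytes and RTT of 200 ms are the example values from the slow start slide.

```python
MSS_BYTES = 500
RTT_MS = 200


def slow_start_round(cong_win_segments):
    """One RTT of slow start: each segment sent is ACKed, each ACK adds 1 MSS."""
    acks = cong_win_segments
    for _ in range(acks):
        cong_win_segments += 1
    return cong_win_segments


w = 1
trace = [w]
for _ in range(5):
    w = slow_start_round(w)
    trace.append(w)

rates_bps = [segs * MSS_BYTES * 8 * 1000 // RTT_MS for segs in trace]
print(trace)      # [1, 2, 4, 8, 16, 32]
print(rates_bps)  # [20000, 40000, 80000, 160000, 320000, 640000]
```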

TCP Tahoe vs. Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: state machine with states slow start, congestion avoidance, fast recovery. Λ marks a transition with no triggering event or no action.]

Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:
– new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK: dupACKcount++
– cwnd > ssthresh (Λ): enter congestion avoidance
– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; enter fast recovery
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:
– new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK: dupACKcount++
– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3·MSS, retransmit missing segment; enter fast recovery
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start

Fast recovery:
– duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
– new ACK: cwnd = ssthresh, dupACKcount = 0; enter congestion avoidance
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start
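The state machine on this slide can be sketched as a small class. This is an illustration only: the class name, the 1000-byte MSS and integer arithmetic are assumptions, and retransmission and actual sending are left out.

```python
MSS = 1000  # bytes; hypothetical value for the example


class RenoSender:
    def __init__(self):
        self.state = "slow start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh              # deflate window, leave fast recovery
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += MSS * MSS // self.cwnd    # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS                       # inflate window for each further dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"           # missing segment would be retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow start"                  # missing segment would be retransmitted here


s = RenoSender()
for _ in range(7):
    s.on_new_ack()
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd, s.ssthresh)  # fast recovery 7000 4000
```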

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average steady-state throughput: 0.75 W/RTT
[Figure axis marks: W/2, W]

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
• L = 2·10^-10: very low
• Require almost perfect link
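The two numbers in this example can be rechecked by solving throughput = 1.22·MSS/(RTT·√L) for L, which gives L = (1.22·MSS/(RTT·throughput))². The sketch below uses the slide's values; the variable names are made up.

```python
mss_bytes = 1500
rtt = 0.1           # seconds
target_bps = 10e9   # 10 Gbps

# Window needed to keep the pipe full: bandwidth-delay product in segments.
window_segments = target_bps * rtt / (mss_bytes * 8)

# Loss rate at which the throughput formula yields the target rate.
loss_rate = (1.22 * mss_bytes * 8 / (rtt * target_bps)) ** 2

print(int(window_segments))  # 83333
print(f"{loss_rate:.1e}")    # 2.1e-10
```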

Equation-Based Control

• Note: TCP congestion control forms control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R, x-axis "Connection 1 throughput", the line of equal bandwidth share, and a trajectory alternating between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
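The convergence argument can be checked numerically. This sketch assumes both sessions see a loss in the same round whenever their combined rate exceeds the link capacity, which is a simplification; the starting rates and capacity are invented.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one link; both halve whenever the link is overloaded."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: decrease window by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1  # congestion avoidance: additive increase
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=200)
print(abs(x1 - x2))  # the initial gap of 70 has shrunk to well under 1
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the pair drifts toward the equal-share line.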

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
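The arithmetic behind the parallel-connection example, assuming the link is shared equally per connection, can be sketched as follows. The function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R a new app gets if every connection gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```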


Page 28: CS 4284 Operating Systems

CS 4284 Spring 2013

Flow Control Persistence Timer

bull Suppose sender is blocked cause receiver

application hasnrsquot picked up data

bull Then receiver app reads n bytes

bull TCP receiver advertises new window of size

RcvWindow = n

bull But suppose this advertisement is lost

bull Sender would be stuck

ndash Solution persistence timer sender sends probe after

a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome

Clark’s solution: don’t advertise windows below a certain size
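A minimal sketch of the receiver-side rule. The slide does not say what the “certain size” is; the threshold min(MSS, half the receive buffer) used here is an assumption based on the commonly cited rule from RFC 1122, and the function name is hypothetical.

```python
def advertised_window(free_bytes, buffer_size, mss):
    """Advertise a zero window until a worthwhile amount of space is free."""
    threshold = min(mss, buffer_size // 2)  # assumed threshold, see lead-in
    return free_bytes if free_bytes >= threshold else 0

print(advertised_window(100, 65536, 1460))   # too small, advertise 0
print(advertised_window(2000, 65536, 1460))  # large enough, advertise 2000
```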

Study of TCP: Outline

• segment structure

• reliable data transfer

– delayed ACKs

– Nagle’s algorithm

• timeout management, fast retransmit

• flow control + silly window syndrome

• connection management

[ Network Address Translation ]

[ Principles of congestion control ]

• TCP congestion control

TCP

Connection Management

TCP Connection Management

Recall: TCP sender and receiver establish a “connection” before exchanging data segments

• initialize TCP variables

– seq. #s

– buffers, flow control info (e.g. RcvWindow)

• client: connection initiator

connect(s, &dstaddr, …)

• server: contacted by client

cl = accept(sv, &caddr, …)

Three-way handshake

Step 1: client host sends TCP SYN segment to server

– specifies initial seq #

– no data

Step 2: server host receives SYN, replies with SYNACK segment

– server allocates buffers

– specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data
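The three steps can be sketched as a sequence of segments. This is a toy model for illustration only: the Segment class and its field names are invented here, and real segments carry many more fields.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_no: int = 0

def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a normal handshake."""
    mod = 2 ** 32  # sequence numbers are 32 bits wide
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # The SYN consumes one sequence number, so the server acks client_isn + 1.
    synack = Segment(syn=True, ack=True, seq=server_isn,
                     ack_no=(syn.seq + 1) % mod)
    ack = Segment(syn=False, ack=True, seq=synack.ack_no,
                  ack_no=(synack.seq + 1) % mod)
    return [syn, synack, ack]

for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Each side acknowledges the other side's initial sequence number plus one, which is how each side verifies the number chosen by its peer.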

TCP 3-way handshake

TCP connection establishment

• Q1: why a 3-way and not a 2-way handshake?

• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

The 3-way handshake is required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: tie initial TCP seq numbers to a clock

– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers

• Must also guard against sequence number prediction attacks

– Use a PRNG; see [RFC 1948], [CERT 2001-09]

• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
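A hedged sketch of the formula above. The choice of SHA-256 for F, the per-boot secret standing in for random(), and the function and parameter names are all assumptions made for this example.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # stands in for random(); chosen once per boot

def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4 microsecond clock value + F(connection identifiers, secret)."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4  # advances by one every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2 ** 32

print(initial_sequence_number("10.0.0.1", "10.0.0.2", 40000, 80))
```

For a fixed connection the ISN still advances with the clock, but an outsider who does not know the secret cannot predict the offset F for another connection.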

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next

• By using spoofed source IP C, A can engage in a successful 3-way handshake with B

– B believes it is talking to C and might grant permissions based on C’s IP address

– Attacker on A must suppress the RST packets C is likely to send; a denial-of-service attack can be used for that

• A sends message to compromise B

When SYNs Attack

• Servers receiving a SYN must allocate resources

– Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses

• Solution

– SYN cookies

• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers

– If the client continues with an ACK, check whether the acknowledged number is one the server could have sent; allocate buffers only if it is
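The idea of “compute but don’t store” can be sketched as follows. This is a simplified illustration, not the cookie layout used by any real stack (real SYN cookies also encode the MSS and a coarse timestamp in specific bits); the function names, the HMAC construction and the time_slot parameter are assumptions.

```python
import hashlib
import hmac

def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Derive the server ISN from the connection identifiers and a secret."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")

def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot, ack_no):
    """Recompute the cookie and compare it with the client's ACK number."""
    cookie = make_cookie(secret, src, sport, dst, dport, client_isn, time_slot)
    return ack_no == (cookie + 1) % 2 ** 32

secret = b"server-secret"
cookie = make_cookie(secret, "1.2.3.4", 5555, "5.6.7.8", 80, 1000, 7)
print(ack_is_valid(secret, "1.2.3.4", 5555, "5.6.7.8", 80, 1000, 7,
                   (cookie + 1) % 2 ** 32))
```

Because the server can recompute the cookie from the arriving ACK, it needs no per-connection state until the handshake completes.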

Sequence Number Summary

• Goals Set 1

– Guard against old duplicates in one connection -> don’t reuse sequence numbers until it’s reasonably certain that old duplicates have disappeared

– Even after a crash/restart -> hence tie to time -> but don’t use a global clock -> hence the need for a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is successfully established

• Goals Set 2

– Don’t allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for initial seq number choice

– Don’t allow SYN attacks: compute, but don’t store, the initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline showing close on each side, the client’s timed wait, and closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK

– Enters “timed wait”: will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs

[Figure: client/server timeline showing closing on each side, the client’s timed wait, and closed on both sides]

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont’d)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case

• Can we reliably close a connection if packets (FIN, ACK) can be lost?

– No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission

– Delayed ACKs, Nagle’s algorithm

– Fast retransmit

• RTT estimation & Karn’s algorithm

• Flow Control & Silly Window Syndrome

• Connection Management in TCP

• Attacks against TCP’s connection management scheme

– SYN attack

– Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option

– Client/server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on the SYN

• SACK: selective acknowledgements

• WSCALE: scale factor for the receive window, to allow for LFNs (“elephants”), i.e. Long Fat Networks

• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers

• …


TCP

Congestion Control

Principles of Congestion Control

Congestion:

• informally: “too many sources sending too much data too fast for the network to handle”

• different from flow control

• manifestations:

– long delays (queueing in router buffers)

– lost packets (buffer overflow at routers)

• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers

• one router, infinite buffers

• no retransmission

• large delays when congested

• (but no reduction in throughput here)

[Figure: Host A (λin: original data), Host B (λout), unlimited shared output link buffers]

Causes/costs of congestion: scenario 2

• one router, finite buffers

• sender retransmission of lost packet after timeout

[Figure: Host A (λin: original data; λ'in: original data plus retransmitted data), Host B (λout), finite shared output link buffers]

Causes/costs of congestion: scenario 2

Always: λin = λout (goodput)

• a) if no loss, λ'in = λin

• b) assume clairvoyant sender: retransmission only when loss is certain

• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)

“costs” of congestion:

• more work (retransmissions) for a given “goodput”

• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λout against λ'in, with axes running to R/2; the values R/3 and R/4 are marked in two of the panels]

Causes/costs of congestion: scenario 3

• four senders

• multihop paths

• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A (λin: original data; λ'in: original data plus retransmitted data), Host B (λout), finite shared output link buffers]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:

• when a packet is dropped, any upstream transmission capacity used for that packet was wasted

• ultimately leads to congestion collapse

[Figure: Host A, Host B, λout]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs

• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput

• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control

• no explicit feedback from network

• congestion inferred from end-system observed loss, delay

• approach taken by TCP

Network-assisted congestion control

• routers provide feedback to end systems

– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)

– explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)

• sender limits transmission:

LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)

• CongWin and RTT influence throughput:

rate = CongWin / RTT bytes/sec

• CongWin is a dynamic function of perceived network congestion

How does the sender notice congestion?

• loss event = timeout or 3 duplicate ACKs

• TCP sender reduces rate (CongWin) after loss event

• (assumes congestion is primary cause of loss)

Three mechanisms:

• AIMD

• slow start

• fast recovery
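The sender-side limit and the rate approximation can be written down directly. A minimal sketch, with hypothetical function names and all quantities in bytes and seconds:

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unacked data within both windows."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt

print(can_send(5000, 3000, cong_win=4000, rcv_window=8000, nbytes=1500))  # True
print(can_send(5000, 3000, cong_win=4000, rcv_window=8000, nbytes=2500))  # False
```

In this example the congestion window (4000 bytes) is the binding limit; with a smaller RcvWindow, flow control would take over.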

TCP AIMD

multiplicative decrease: cut CongWin in half after a loss event

additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (8, 16, 24 Kbytes) over time for a long-lived TCP connection]

TCP Slow Start

• When connection begins, CongWin = 1 MSS

– Example: MSS = 500 bytes & RTT = 200 msec

– initial rate = 20 kbps

• available bandwidth may be >> MSS/RTT

– desirable to quickly ramp up to a respectable rate

• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event

– double CongWin every RTT

– done by incrementing CongWin for every ACK received

• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B, marked with RTT and time]
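A short sketch of the slow-start arithmetic: the initial rate for the example values above, and how adding one MSS per ACK doubles CongWin each round trip. It assumes one ACK per segment and no losses; the function names are made up for this example.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate in bits/sec when one MSS is sent per round trip."""
    return mss_bytes * 8 / rtt_seconds

def slow_start_rounds(mss, rounds):
    """CongWin at the start of each round trip during slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss      # one ACK per segment in flight
        for _ in range(acks):
            cong_win += mss         # CongWin grows by one MSS per ACK
        history.append(cong_win)
    return history

print(initial_rate_bps(500, 0.2))   # about 20000 bits/sec, i.e. 20 kbps
print(slow_start_rounds(500, 3))    # [500, 1000, 2000, 4000]
```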

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:

• Variable Threshold

• At loss event, Threshold is set to 1/2 of CongWin just before loss event

• If timeout event, set CongWin to 1 MSS

• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
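The loss reactions listed above can be sketched as one function. The treatment of Tahoe (dropping to 1 MSS on a triple-ack event as well) is an assumption taken from the usual textbook description, since the slide only states that fast recovery was added in Reno; the function name and the 1 MSS floor on Threshold are also choices made for this example.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (new CongWin, new Threshold) after a loss event, in bytes."""
    threshold = max(cong_win // 2, mss)
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    if event == "timeout" or variant == "tahoe":
        return mss, threshold        # restart from 1 MSS in slow start
    return threshold, threshold      # Reno fast recovery: continue linearly

print(on_loss(16000, 1000, "triple_dup_ack", "reno"))   # (8000, 8000)
print(on_loss(16000, 1000, "triple_dup_ack", "tahoe"))  # (1000, 8000)
print(on_loss(16000, 1000, "timeout", "reno"))          # (1000, 8000)
```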

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:

– CongWin is cut in half

– window then grows linearly

• But after timeout event:

– CongWin instead set to 1 MSS

– window then grows exponentially

– to a threshold, then grows linearly

Rationale:

• 3 dup ACKs indicate the network is capable of delivering some segments

• timeout before 3 dup ACKs are received is a stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[Figure: finite state machine with states slow start, congestion avoidance, and fast recovery; transitions listed below]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:

– new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed

– duplicate ACK: dupACKcount++

– cwnd > ssthresh (Λ): go to congestion avoidance

– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:

– new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed

– duplicate ACK: dupACKcount++

– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:

– duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed

– new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance

– timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary

ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT

ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start

Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
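The table maps naturally onto a small event-driven class. The following is a sketch of the simplified two-state model in the table only (no separate fast-recovery state, no retransmission logic); the class and method names are invented for this example, and the 1 MSS floor applies the table's commentary.

```python
class RenoSender:
    """Tracks CongWin, Threshold and state for the simplified SS/CA model."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # about one MSS per RTT, spread over the ACKs of one window
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:       # triple duplicate ACK
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = max(self.cong_win / 2, self.mss)
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

sender = RenoSender(mss=1000, threshold=4000)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win)  # CA 5000
```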

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?

– Long-lived connection; ignore slow start

• When window is W, throughput is W/RTT

• Just after loss, window drops to W/2, throughput to W/(2·RTT)

• Average steady-state throughput: 0.75 W/RTT

[Figure: window size oscillating between W/2 and W]
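The 0.75 factor is just the mean of a window that ramps linearly from W/2 to W. A quick numerical check, assuming an ideal linear ramp (the function name is hypothetical):

```python
def average_sawtooth_window(w, steps=1000):
    """Mean window size over one linear ramp from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)

print(average_sawtooth_window(100.0))  # close to 75.0, i.e. 0.75 * W
```

Dividing this average window by RTT gives the average steady-state throughput of 0.75 W/RTT.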

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

• Requires window size W = 83,333 in-flight segments

• Throughput in terms of loss rate:

$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• L = 2·10^-10. Very low.

• Requires an almost perfect link
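The numbers in this example can be reproduced from the two relations involved, W = throughput · RTT / MSS and throughput = 1.22 · MSS / (RTT · sqrt(L)) solved for L. A small sketch with hypothetical function names:

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L from throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w))   # 83333
print(loss)     # roughly 2e-10
```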


Page 29: CS 4284 Operating Systems


Page 30: CS 4284 Operating Systems

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Connection Management

CS 4284 Spring 2013

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);

Three-way handshake:

Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
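From the application's point of view, the handshake is hidden behind connect() and accept(). The following is a minimal sketch using Python's standard socket module on the loopback interface; the function name, the message contents, and the one-shot server thread are assumptions made for illustration.

```python
import socket
import threading


def run_echo_once():
    # Server side: bind to an ephemeral loopback port and listen.
    sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sv.bind(("127.0.0.1", 0))
    sv.listen(1)
    port = sv.getsockname()[1]

    def serve():
        # accept() returns a connection whose handshake has already completed.
        cl, caddr = sv.accept()
        with cl:
            data = cl.recv(1024)
            cl.sendall(b"ack:" + data)

    t = threading.Thread(target=serve)
    t.start()

    # Client side: connecting makes the kernel send SYN and wait for SYNACK.
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"hello")
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:  # empty read: the server closed its side
                break
            chunks.append(data)
    t.join()
    sv.close()
    return b"".join(chunks)


print(run_echo_once())
```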

TCP 3-way handshake

TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse

• Idea: Tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
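The ISN formula can be sketched in a few lines. This is an illustration only: SHA-256 stands in for F (an assumption; any keyed one-way function fits the idea), `SECRET` plays the role of random(), and `initial_seq` is a hypothetical name.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # per-host secret, standing in for random()


def initial_seq(src, dst, sport, dport, now=None):
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), mod 2^32."""
    now = time.time() if now is None else now
    clock = int(now * 1_000_000) // 4  # advances once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + SECRET
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32


print(initial_seq("10.0.0.1", "10.0.0.2", 40000, 80))
```

For one connection 4-tuple the ISN advances with the clock, while an attacker who does not know the secret cannot predict the offset used for a different 4-tuple.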

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C’s IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, check whether the acknowledged number could have been sent by the server, then allocate buffers if correct
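The “compute but do not store” idea can be sketched as follows. This is a simplified illustration, not the cookie layout used by real TCP stacks: the HMAC construction, the `counter` argument (a coarse time value in practice), and all function names are assumptions.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)  # server-side secret


def make_cookie(src, sport, dst, dport, client_isn, counter):
    """Derive the server's initial sequence number from the SYN itself."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:4], "big")


def on_syn(src, sport, dst, dport, client_isn, counter):
    # Reply with SYNACK (seq = cookie, ack = client_isn + 1); keep no state.
    cookie = make_cookie(src, sport, dst, dport, client_isn, counter)
    return cookie, (client_isn + 1) % 2**32


def on_ack(src, sport, dst, dport, seq, ack, counter):
    # The final ACK carries seq = client_isn + 1 and ack = cookie + 1.
    client_isn = (seq - 1) % 2**32
    expected = make_cookie(src, sport, dst, dport, client_isn, counter)
    return (ack - 1) % 2**32 == expected  # allocate buffers only if True


cookie, ackno = on_syn("10.0.0.1", 4000, "10.0.0.2", 80, 1000, counter=7)
print(on_ack("10.0.0.1", 4000, "10.0.0.2", 80, 1001, (cookie + 1) % 2**32, counter=7))
```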

Sequence Number Summary

• Goals Set 1
  – Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
  – Even after a crash restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  – Don’t allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don’t allow SYN attacks: compute, but don’t store, initial sequence number

TCP Connection Management (cont.)

Closing a connection:

client closes socket: close(s);

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline with labels close, close, timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline with labels closing, closing, timed wait, closed]

TCP Connection FSM

[Figure: TCP connection finite state machine]

The heavy solid line is the normal path for a client.

The heavy dashed line is the normal path for a server.

The light lines are unusual events.

Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont’d)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle’s algorithm
  – Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs (“elephants”), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Congestion Control

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send λ_in (original data) through a router with unlimited shared output link buffers; delivered rate is λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share finite output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a), (b), (c) of λ_out against the input rate, axes marked R/2, with further marks at R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    $\text{rate} = \dfrac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
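The sender-side limit and the rate estimate translate directly into code. A minimal sketch; the function names are hypothetical and all quantities are in bytes and seconds.

```python
def sendable_bytes(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight under both windows."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt


print(sendable_bytes(5000, 3000, cong_win=8000, rcv_window=4000))  # 2000
print(approx_rate(8000, 0.5))  # 16000.0
```

Whichever window is smaller governs: here the receive window (flow control) is the binding limit, not the congestion window.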

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: sawtooth of the congestion window over time for a long-lived TCP connection; window axis marked 8 Kbytes, 16 Kbytes, 24 Kbytes]
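The sawtooth can be reproduced with a short trace. A minimal sketch, assuming one step per RTT and a caller-supplied set of rounds in which a loss event occurs (`aimd_trace` and its parameters are illustrative names):

```python
def aimd_trace(cong_win, mss, loss_rounds, rounds):
    """Return CongWin after each RTT under additive increase, multiplicative decrease."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cong_win = max(mss, cong_win / 2)  # multiplicative decrease
        else:
            cong_win += mss                    # additive increase: +1 MSS per RTT
        trace.append(cong_win)
    return trace


print(aimd_trace(8000, 1000, loss_rounds={4}, rounds=8))
```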

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B with one RTT marked along the time axis]
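The per-ACK increment and the resulting doubling can be traced with the numbers from the slow start example (MSS = 500 bytes, RTT = 200 msec). A minimal sketch, assuming every segment sent in one RTT is acknowledged before the next RTT begins and no loss occurs:

```python
def slow_start_trace(mss_bytes, rtt_ms, rtts):
    """Return (CongWin in bytes, rate in bits/sec) at the start of each RTT."""
    cong_win = mss_bytes
    trace = []
    for _ in range(rtts):
        trace.append((cong_win, cong_win * 8 * 1000 / rtt_ms))
        acks = cong_win // mss_bytes  # one ACK per segment sent in this RTT
        for _ack in range(acks):
            cong_win += mss_bytes     # +1 MSS per ACK doubles the window per RTT
    return trace


for win, rate in slow_start_trace(500, 200, 4):
    print(win, rate)
```

The first entry reproduces the initial rate of 20 kbps; each later RTT doubles both the window and the rate.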

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?

A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
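The difference between the two variants comes down to how the window is reset after a triple-ack event. A minimal sketch, assuming Tahoe treats every loss event like a timeout; `on_loss` and the string labels are hypothetical.

```python
def on_loss(variant, event, cong_win, mss):
    """Return (Threshold, CongWin) after a loss event."""
    threshold = cong_win / 2
    if variant == "reno" and event == "triple_dup_ack":
        return threshold, threshold  # fast recovery: continue linearly from Threshold
    return threshold, mss            # timeout (or any loss in Tahoe): back to 1 MSS


print(on_loss("reno", "triple_dup_ack", 16000, 1000))
print(on_loss("tahoe", "triple_dup_ack", 16000, 1000))
print(on_loss("reno", "timeout", 16000, 1000))
```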

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery; transitions are listed below as event: actions. Λ marks a transition without a triggering event.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Each entry lists Event / State / TCP Sender Action / Commentary.

• Event: ACK receipt for previously unacked data
  State: Slow Start (SS)
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
  Commentary: Resulting in a doubling of CongWin every RTT

• Event: ACK receipt for previously unacked data
  State: Congestion Avoidance (CA)
  Action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

• Event: Triple duplicate ACK
  State: SS or CA
  Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

• Event: Timeout
  State: SS or CA
  Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
  Commentary: Enter slow start

• Event: Duplicate ACK
  State: SS or CA
  Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed
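The event/state/action table maps naturally onto a small state machine. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the duplicate ACK counter is reset on every new ACK, and the simplified two-state model of the table is used (no separate fast recovery state).

```python
from dataclasses import dataclass


@dataclass
class Sender:
    mss: int
    cong_win: float
    threshold: float
    state: str = "SS"   # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                          # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)  # about +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                                 # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)      # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


s = Sender(mss=1000, cong_win=1000, threshold=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win)  # CA 5000
```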

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window size oscillating between W/2 and W]
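The factor 0.75 follows from averaging over one sawtooth period. A short derivation, assuming the window grows linearly from W/2 to W between two loss events (the symbol \bar{W} for the average window is introduced here for illustration):

```latex
\bar{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{\bar{W}}{RTT} = 0.75\,\frac{W}{RTT}
```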

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83333 in-flight segments
• Throughput in terms of loss rate L:
    $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$. Very low!
• Requires almost perfect link
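Both numbers in the example can be recomputed from the formula. A minimal sketch; the function names are hypothetical, rates are in bits/sec and the MSS is in bytes.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(int(window_segments(10e9, 0.1, 1500)))  # 83333
print(required_loss_rate(10e9, 0.1, 1500))    # about 2e-10
```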



Page 32: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP Connection Management Recall TCP sender receiver

establish ldquoconnectionrdquo before

exchanging data segments

bull initialize TCP variables

ndash seq s

ndash buffers flow control info (eg RcvWindow)

bull client connection initiator

connect(s ampdstaddr hellip)

bull server contacted by client

cl=accept(sv

ampcaddrhellip)

Three way handshake

Step 1 client host sends TCP

SYN segment to server

ndash specifies initial seq

ndash no data

Step 2 server host receives SYN

replies with SYNACK segment

ndash server allocates buffers

ndash specifies server initial seq

Step 3 client receives SYNACK

replies with ACK segment

which may contain data

CS 4284 Spring 2013

TCP 3-way handshake

TCP connection establishment bull Q1 why 3-way and not 2-way handshake

bull Q2 how do sender amp receiver determine initial seqnums

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack

• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number matches a SYNACK the server could have sent, then allocate buffers if correct
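The idea of computing, but not storing, the server's initial sequence number can be sketched as follows. This is a simplified illustration, not how any particular kernel implements SYN cookies: the names `make_cookie` and `ack_is_valid`, the HMAC-SHA-256 construction, the `SECRET` value and the coarse time-slot `counter` are all assumptions, and real implementations also encode details such as the MSS into the cookie.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # assumption: known only to the server


def make_cookie(src: str, sport: int, dst: str, dport: int,
                client_isn: int, counter: int, secret: bytes = SECRET) -> int:
    """Derive the server's ISN from the SYN itself; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(src: str, sport: int, dst: str, dport: int,
                 client_isn: int, ack_no: int, counter: int,
                 secret: bytes = SECRET) -> bool:
    """Recompute the cookie for the current and previous time slot and
    check that the client acknowledged cookie + 1."""
    for slot in (counter, counter - 1):
        cookie = make_cookie(src, sport, dst, dport, client_isn, slot, secret)
        if (cookie + 1) % 2 ** 32 == ack_no:
            return True
    return False
```

Only when `ack_is_valid` returns true would the server allocate buffers for the connection, so a flood of SYNs from forged addresses costs it computation but no per-connection state.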

Sequence Number Summary

• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)

Closing a connection: client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline showing close on each side, the client's timed wait, and closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline showing closing on each side, the client's timed wait, and closed]
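The four closing steps can be written as two small transition tables, one per side. This is a sketch under assumptions: the state names (`FIN_WAIT_1`, `CLOSE_WAIT`, and so on) are taken from RFC 793 rather than from the slides, the event strings and the `run` helper are made up for illustration, and only the normal client-initiated close is modeled (no simultaneous FINs, no lost segments).

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[Tuple[str, str], Tuple[str, Optional[str]]]

# (state, event) -> (next state, segment sent in response)
CLIENT_CLOSE: Table = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),      # step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),     # step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "ACK"),      # re-ACK a repeated FIN
    ("TIME_WAIT", "timer expires"): ("CLOSED", None),
}

SERVER_CLOSE: Table = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),   # step 2, first half
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),         # step 2, second half
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),           # step 4
}


def run(table: Table, events: List[str],
        state: str = "ESTABLISHED") -> Tuple[str, List[str]]:
    """Feed events through a table; return the final state and segments sent."""
    sent: List[str] = []
    for event in events:
        state, segment = table[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent
```

The self-loop in `TIME_WAIT` is the "will respond with ACK to received FINs" behaviour: if the client's final ACK is lost, the server retransmits its FIN and the client can still answer it.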

TCP Connection FSM

[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: the famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size Option
  - Client/server agree on larger-than-default MSS (default is 536 outside same subnet); MSS option sent on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
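To see why a window scale factor is needed on long fat networks, one can compare the bandwidth-delay product with the largest window that the 16-bit receive window field can advertise. The sketch below assumes the RFC 1323 encoding, where the advertised window is shifted left by a scale of at most 14 bits; the function name `window_scale_shift` is made up for illustration.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit receive window field
MAX_SHIFT = 14               # largest window scale shift allowed by RFC 1323


def window_scale_shift(bandwidth_bps: float, rtt_s: float) -> int:
    """Smallest shift whose scaled window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_s / 8  # bytes in flight needed to fill the pipe
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < bdp_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift
```

For a 10 Gbps path with a 100 ms RTT the bandwidth-delay product is 125 MB, far beyond the 64 KB an unscaled window allows, so a nonzero shift is required to keep the link busy.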

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B share a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B share a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots (a, b, c) of λ_out versus λ'_in, with axes marked at R/2; marked levels include R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B, routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
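The sender-side limit and the rate estimate from this slide translate directly into two one-line functions. The names `usable_window` and `approx_rate` are made up for this sketch; the variables mirror LastByteSent, LastByteAcked, CongWin and RcvWindow from the slide, all measured in bytes.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit without violating
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt_s
```

With 2000 bytes in flight, `usable_window(5000, 3000, 4000, 8000)` leaves room for 2000 more bytes; here the congestion window, not the receiver's window, is the binding limit.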

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: sawtooth plot of congestion window (axis marked 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
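The sawtooth shape follows from applying the two AIMD rules once per round trip. The sketch below counts the window in units of MSS and takes the set of RTT rounds in which a loss event occurs as an input; the function name `aimd_trace` and the choice to floor the window at 1 MSS are assumptions made for illustration.

```python
from typing import List, Set


def aimd_trace(cwnd_mss: int, loss_rounds: Set[int], rounds: int) -> List[int]:
    """Congestion window (in MSS) at the start of each RTT round."""
    trace = []
    for rnd in range(rounds):
        trace.append(cwnd_mss)
        if rnd in loss_rounds:
            cwnd_mss = max(1, cwnd_mss // 2)  # multiplicative decrease
        else:
            cwnd_mss += 1                     # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(8, {3}, 6)` climbs by one MSS per round until the loss in round 3, after which the window restarts from half its previous value.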

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B time diagram with one RTT marked]
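The doubling per RTT is a consequence of the per-ACK increment: a window of n segments produces n ACKs, and each ACK adds one MSS. The sketch below assumes every segment in the window is acknowledged individually and that no loss occurs; the function names are made up for illustration. It also reproduces the 20 kbps initial rate from the example (MSS = 500 bytes, RTT = 200 msec).

```python
def slow_start_round(cwnd_bytes: int, mss: int) -> int:
    """Window after one RTT of slow start without loss."""
    segments_in_flight = cwnd_bytes // mss
    for _ in range(segments_in_flight):  # one ACK arrives per segment sent
        cwnd_bytes += mss                # each ACK grows the window by 1 MSS
    return cwnd_bytes


def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> int:
    """Rate in bits/sec when exactly one MSS is sent per RTT."""
    return mss_bytes * 8 * 1000 // rtt_ms
```

Starting from 500 bytes, three loss-free rounds give windows of 1000, 2000 and 4000 bytes.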

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance and fast recovery; transitions are written as event / action, and Λ means "no event" or "no action"]

Initial: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 -> slow start

Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ -> congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

Congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment -> fast recovery

Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0 -> congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment -> slow start
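The state machine on this slide can be sketched as a small class with one method per event. This is an illustration under assumptions, not a full TCP sender: the class name `RenoSender`, the `MSS` value of 1000 bytes and the string state names are made up, segment transmission and retransmission are omitted, and the slide's "ssthresh + 3" is read as "ssthresh + 3 MSS" since cwnd is kept in bytes here.

```python
from dataclasses import dataclass

MSS = 1000  # assumption: segment size in bytes, chosen for round numbers


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # each dup ACK: a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                     # loss inferred: fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object three new ACKs, three duplicate ACKs, one new ACK and then a timeout walks it through slow start, fast recovery, congestion avoidance and back to slow start.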

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

TCP Throughput Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of window size between W/2 and W]
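The 0.75 factor is the mean of a window that climbs linearly from W/2 to W, one MSS per RTT, and then drops back. A quick numerical check, with the window counted in MSS and a made-up function name `mean_sawtooth_window` (W is assumed to be even so that W/2 is a whole number of segments):

```python
def mean_sawtooth_window(w_max: int) -> float:
    """Average window over one sawtooth cycle from w_max/2 up to w_max."""
    windows = range(w_max // 2, w_max + 1)  # one sample per RTT round
    return sum(windows) / len(windows)
```

Dividing the result by the RTT gives the average steady-state throughput; for a peak window of 100 segments the mean is three quarters of the peak.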

TCP Throughput & Loss

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83333 in-flight segments
• Throughput in terms of loss rate $L$:

  \[ \text{throughput} = \frac{1.22 \cdot \mathit{MSS}}{\mathit{RTT} \cdot \sqrt{L}} \]

• $L = 2 \cdot 10^{-10}$. Very low!
• Require almost perfect link
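Both numbers in the example can be reproduced from the relation throughput = 1.22 · MSS / (RTT · sqrt(L)) by solving for the window and for L. The function names below are made up for this sketch; rates are in bits per second and the MSS in bytes.

```python
def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Loss rate L that the throughput equation permits at rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

For 1500-byte segments, a 100 ms RTT and a 10 Gbps target, the first function gives a window of about 83333 segments and the second a tolerable loss rate of roughly 2 · 10^-10, i.e. about one loss event per five billion segments.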

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 1 throughput (0 to R) plotted against the competing connection's throughput (0 to R), with the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
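The convergence argument can be checked with a toy simulation of two AIMD flows sharing one bottleneck. The model is deliberately idealized and is an assumption of this sketch: both flows have the same RTT, both see a loss in any round where their combined rate exceeds the capacity, and rates are unitless numbers. The function name `two_flow_aimd` is made up for illustration.

```python
from typing import Tuple


def two_flow_aimd(x1: float, x2: float, capacity: float,
                  rounds: int) -> Tuple[float, float]:
    """Rates of two AIMD flows after the given number of RTT rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: the gap is unchanged
    return x1, x2
```

Additive increase moves the pair parallel to the equal-share line and every multiplicative decrease halves the distance to it, so flows that start at 80 and 10 on a link of capacity 100 end up with nearly identical rates.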

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
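The two figures in the example follow from assuming that the bottleneck is divided equally per connection, so an application's share is its fraction of all connections on the link. The helper name `app_share` is made up for this sketch.

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_connections
    on a link already carrying existing_connections, with per-connection fairness."""
    return Fraction(new_connections, existing_connections + new_connections)
```

With 9 connections already present, one new connection receives 1/10 of R, while nine parallel connections together receive 1/2 of R.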

Page 34: CS 4284 Operating Systems

CS 4284 Spring 2013

3-way Handshake amp Delayed Dups

(a) Normal operation

(b) Old SYN appearing out of nowhere

(c) Duplicate SYN and duplicate ACK following SYN

3-way handshake required to deal with scenarios (b) and (c)

CS 4284 Spring 2013

Sequence Number Reuse

bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a

connection with identical sequence numbers

bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]

bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

bull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where

server is flooded with bogus SYN packets with forged

IP source addresses

bull Solution

ndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not

allocate buffers

ndash If client continues with SYNACK check if ACK could

have been sent then allocate buffers if correct

Sequence Number Summary

bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot

reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker

can eavesdrop ndash use PRNG for initial seq number choice

ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

TCP Miscellaneous

• MSS (Maximum Segment Size) option
  - Client and server agree on an MSS larger than the default (536 bytes outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
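
To see why WSCALE matters on a long fat network, a rough sketch can compare the bandwidth-delay product with the largest window the 16-bit receive window field can advertise. The function name and the 10 Gbps / 100 ms figures are illustrative assumptions; the 65535-byte field limit is the standard TCP header limit.

```python
def wscale_shift(bandwidth_bps, rtt_s):
    """Return (bandwidth-delay product in bytes, smallest window-scale shift
    such that 65535 << shift covers it)."""
    bdp_bytes = bandwidth_bps * rtt_s / 8
    shift = 0
    while (65535 << shift) < bdp_bytes:
        shift += 1
    return bdp_bytes, shift

# Illustrative numbers: a 10 Gbps path with a 100 ms RTT.
bdp, shift = wscale_shift(10e9, 0.100)
print(f"BDP = {bdp:.0f} bytes, needs shift = {shift}")
```

Without scaling, the unscaled 65535-byte window would cap such a connection far below the path's capacity, which is the problem the option addresses.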

Study of TCP: Outline

• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B, a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmits a lost packet after timeout

[Figure: Host A and Host B, a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)

• a) if no loss: λ'_in = λ_in
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λ_out against λ_in; axis marks at R/2, R/3, and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B, routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate the sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
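
The sender-side limit and the rate estimate can be written out as a minimal sketch. The function names and sample numbers are assumptions for illustration; the two formulas are the ones on the slide.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps the unacknowledged data within
    min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)

def approx_rate(cong_win, rtt_s):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt_s

# Illustrative numbers: 2000 bytes in flight, CongWin 4000, RcvWindow 8000.
print(can_send(5000, 3000, 4000, 8000, 1000))   # fits inside CongWin
print(can_send(5000, 3000, 4000, 8000, 2500))   # would exceed CongWin
print(approx_rate(4000, 0.1))
```

Taking the minimum of the two windows means whichever of flow control or congestion control is more restrictive decides how much may be outstanding.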

TCP AIMD

• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window of a long-lived TCP connection versus time; axis marks at 8, 16, and 24 Kbytes]
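
A toy simulation makes the sawtooth visible. It counts the window in whole MSS units and takes the rounds in which a loss occurs as given; both simplifications, and the function name, are assumptions of this sketch.

```python
def aimd(cwnd_mss, rounds, loss_rounds):
    """Return the window (in MSS) after each RTT round.

    Additive increase: +1 MSS per loss-free round.
    Multiplicative decrease: halve the window on a loss round (floor of 1 MSS).
    """
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd_mss = max(1, cwnd_mss // 2)
        else:
            cwnd_mss += 1
        trace.append(cwnd_mss)
    return trace

trace = aimd(cwnd_mss=8, rounds=10, loss_rounds={4, 8})
print(trace)   # [9, 10, 11, 12, 6, 7, 8, 9, 4, 5]
```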

TCP Slow Start

• When the connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
• When the connection begins, increase the rate exponentially fast until the first loss event

TCP Slow Start (more)

• When the connection begins, increase the rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B with one RTT marked; time runs along the vertical axis]
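
The example numbers (MSS = 500 bytes, RTT = 200 msec) can be checked with a short sketch. It assumes every segment in a round is ACKed and that no loss occurs; the function name is made up for illustration.

```python
def slow_start_rates(mss_bytes, rtt_s, rounds):
    """Sending rate in bits/sec for each RTT round of loss-free slow start."""
    cwnd = mss_bytes                      # CongWin starts at 1 MSS
    rates = []
    for _ in range(rounds):
        rates.append(cwnd * 8 / rtt_s)
        acks = cwnd // mss_bytes          # one ACK per segment sent this round
        cwnd += acks * mss_bytes          # +1 MSS per ACK, so the window doubles
    return rates

rates = slow_start_rates(mss_bytes=500, rtt_s=0.2, rounds=4)
print(rates)   # starts at about 20 kbps and doubles each round
```

Adding one MSS per ACK is what produces the doubling: a window of n segments yields n ACKs per round trip, so the window grows by n segments.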

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before the timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
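
The difference between the two variants comes down to the reaction to a triple duplicate ACK. The sketch below works in MSS units. The Reno branch follows the rules above; the Tahoe branch (always restart from 1 MSS) is an assumption drawn from the usual description of Tahoe rather than from this slide, and the function and event names are hypothetical.

```python
def on_loss(variant, event, cong_win):
    """Return (new CongWin, new Threshold) in MSS units after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    threshold = cong_win / 2              # half of CongWin just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, threshold               # restart in slow start from 1 MSS
    return threshold, threshold           # Reno fast recovery: continue linearly

print(on_loss("tahoe", "triple_dup_ack", 16))   # (1, 8.0)
print(on_loss("reno", "triple_dup_ack", 16))    # (8.0, 8.0)
print(on_loss("reno", "timeout", 16))           # (1, 8.0)
```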

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after a timeout event:
  - CongWin is instead set to 1 MSS
  - window then grows exponentially to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with three states: slow start, congestion avoidance, fast recovery. Λ marks an empty event or action. Transitions are listed below as event / action.]

Initial:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ; enter congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; enter fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; enter fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start

Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; enter congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to CA
Commentary: results in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP sender action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter slow start

Event: Duplicate ACK
State: SS or CA
TCP sender action: increment duplicate ACK count for the segment being acked
Commentary: CongWin and Threshold not changed
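
The event table translates almost line for line into a small class. This is a sketch under stated assumptions, not a real TCP implementation: the class and method names are made up, windows are in bytes, and folding the triple-duplicate-ACK row into the duplicate-ACK counter is a choice made for this example.

```python
class RenoSender:
    """Tracks CongWin, Threshold and state (SS or CA) per the event table."""

    def __init__(self, mss=1000, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win   # ~1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

s = RenoSender(mss=1000, threshold=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win)   # CA 5000
```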

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window size over time, a sawtooth between W/2 and W]
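
The 0.75 factor is just the mean of a window that climbs linearly from W/2 to W. A quick numerical check, assuming one window sample per RTT and an even W (the function name is made up):

```python
def average_window(w):
    """Mean window over one sawtooth cycle, sampled once per RTT from W/2 to W."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)

avg = average_window(1000)
print(avg, avg / 1000)   # 750.0 0.75
```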

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2·10^-10: very low
• Requires an almost perfect link
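
Both numbers in the example follow from the stated parameters. The sketch below takes throughput = 1.22 · MSS / (RTT · sqrt(L)) as the loss relation and solves it for L; the function names are illustrative.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:.0f} segments, L = {loss:.2e}")
```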

Equation-Based Control

• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: plot with both axes running to R; labels: Connection 1 throughput, equal bandwidth share, "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" (each shown twice along the trajectory)]
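
The convergence argument can be tried out with two idealized flows. The sketch assumes both flows see every loss at the same moment and have equal RTTs; the rates, capacity, and function name are arbitrary illustration values.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both flows halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase: slope of 1
    return x1, x2

x1, x2 = two_flow_aimd(80.0, 10.0, capacity=100.0, steps=200)
gap = abs(x1 - x2)
print(x1, x2, gap)
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the flows drift toward the equal-share line.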

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want their rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at a constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP connection, gets rate R/10
  - new app asks for 9 TCP connections, gets R/2
• That's what "download accelerators" do
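
The R/10 and R/2 figures follow from per-connection fairness: an app's share is its number of connections divided by the total. A small check, assuming every connection gets an equal slice of the link (the function name is made up):

```python
from fractions import Fraction

def app_share(existing_connections, opened_by_app):
    """Fraction of the link rate R the app gets if all connections share equally."""
    return Fraction(opened_by_app, existing_connections + opened_by_app)

print(app_share(9, 1))   # 1/10
print(app_share(9, 9))   # 1/2
```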

Page 35: CS 4284 Operating Systems

Sequence Number Reuse

• Idea: tie initial TCP sequence numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attacks
  - Use a PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
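
The ISN formula above can be sketched as a clock term plus a keyed hash of the connection 4-tuple. Several details here are assumptions of this example rather than the RFC's wording: SHA-256 stands in for the hash F, the random() input is modeled as a per-boot secret, and the function and parameter names are hypothetical.

```python
import hashlib
import os
import struct
import time

SECRET = os.urandom(16)   # stands in for the random() input; chosen once per boot

def isn(src, dst, sport, dport, now_s=None, secret=SECRET):
    """32-bit initial sequence number: 4 µs clock ticks + F(4-tuple, secret)."""
    now_s = time.time() if now_s is None else now_s
    clock = int(now_s * 1_000_000 / 4)                 # one tick every 4 µs
    msg = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = struct.unpack("!I", hashlib.sha256(msg).digest()[:4])[0]
    return (clock + f) & 0xFFFFFFFF

a = isn("10.0.0.1", "10.0.0.2", 1234, 80, now_s=1.0)
b = isn("10.0.0.1", "10.0.0.2", 1234, 80, now_s=2.0)
print(a, b, (b - a) & 0xFFFFFFFF)   # same 4-tuple: ISNs differ only by the clock
```

For one 4-tuple the ISN still advances with the clock, which protects against old duplicates, while an attacker who does not know the secret cannot derive the offset used for a different 4-tuple.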

When Sequence Numbers Attack

• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C and might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends a message to compromise B

When SYNs Attack

• Servers receiving a SYN must allocate resources
  - Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial sequence number and sends SYN+ACK, but does not allocate buffers
  - If the client continues with an ACK, check whether the number it acknowledges could have been sent by the server; allocate buffers only if correct
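
A simplified sketch of the SYN-cookie idea: the server derives its initial sequence number from the connection parameters and a secret, stores nothing, and recomputes the value when the final ACK arrives. Deployed SYN cookies also pack a time counter and an MSS index into specific bits of the number; this example leaves that out, and the HMAC construction and all names are assumptions made for illustration.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)   # server secret, assumed to be chosen at startup

def make_cookie(src, sport, dst, dport, client_isn, counter, key=KEY):
    """Server ISN computed from the connection parameters; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")

def check_ack(ack_number, src, sport, dst, dport, client_isn, counter, key=KEY):
    """True if ack_number acknowledges a cookie from this or the previous interval."""
    for c in (counter, counter - 1):
        cookie = make_cookie(src, sport, dst, dport, client_isn, c, key)
        if ack_number == (cookie + 1) & 0xFFFFFFFF:    # ACK carries server ISN + 1
            return True
    return False

cookie = make_cookie("192.0.2.7", 40000, "192.0.2.1", 80, client_isn=1000, counter=7)
ack = (cookie + 1) & 0xFFFFFFFF
print(check_ack(ack, "192.0.2.7", 40000, "192.0.2.1", 80, 1000, counter=7))
```

Only after check_ack succeeds would the server allocate buffers, so a flood of SYNs from forged addresses costs it computation but no stored state.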

Sequence Number Summary

• Goals, Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash and restart -> hence tie sequence numbers to time -> but don't use a global clock -> hence the need for a 3-way handshake, in which each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2
  - Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial sequence number choice
  - Don't allow SYN attacks: compute, but don't store, the initial sequence number


Page 36: CS 4284 Operating Systems

CS 4284 Spring 2013

When Sequence Numbers Attack

bull Suppose attacker A can predict sequence

number a host B is going to use next

bull By using spoofed source IP C A can engage in

successful 3-way handshake with B

ndash B believes it is talking to C might grant permissions

based on Crsquos IP address

ndash Attacker on A must suppress the RST packets C is

likely to send ndash use a denial-of-service attack for that

bull A sends message to compromise B

CS 4284 Spring 2013

When SYNs Attack

bull Servers receiving SYN must allocate resources

ndash Opens up possibility of denial-of-service attack where

server is flooded with bogus SYN packets with forged

IP source addresses

bull Solution

ndash SYN cookies

bull Server creates ACK number sends ACK ndash but does not

allocate buffers

ndash If client continues with SYNACK check if ACK could

have been sent then allocate buffers if correct

Sequence Number Summary

bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot

reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared

ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs

bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker

can eavesdrop ndash use PRNG for initial seq number choice

ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number

CS 4284 Spring 2013

CS 4284 Spring 2013

TCP Connection Management (cont)

Closing a connection

client closes socket close(s)

Step 1 client end system

sends TCP FIN control

segment to server

Step 2 server receives FIN

replies with ACK Closes

connection sends FIN

client server

close

close

closed

tim

ed w

ait

CS 4284 Spring 2013

TCP Connection Management (cont)

Step 3 client receives FIN

replies with ACK

ndash Enters ldquotimed waitrdquo - will

respond with ACK to

received FINs

Step 4 server receives ACK

Connection closed

Note with small modification can

handle simultaneous FINs

client server

closing

closing

closed

tim

ed w

ait

closed

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
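The sender-side limit and the rate estimate on this slide can be written down directly. The following is a minimal sketch, not code from the course; the function and parameter names are made up for illustration and all quantities are in bytes.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight without exceeding
    min(CongWin, RcvWindow). Names are illustrative, not from a real stack."""
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt_seconds
```

With 3000 bytes in flight, a congestion window of 8000 bytes and a receive window of 4000 bytes, the receive window is the tighter bound and only 1000 more bytes may be sent.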

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
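The sawtooth that AIMD produces can be traced round by round. This is a small sketch under simplifying assumptions: the window is counted in whole MSS units, one step equals one RTT, and the rounds in which a loss is detected are supplied by the caller. The function name and parameters are hypothetical.

```python
def aimd_trace(rounds, loss_rounds, start=8):
    """Return the congestion window (in MSS) at the start of each RTT round.

    loss_rounds: set of round indices in which a loss event is detected.
    """
    cong_win = start
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(1, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += 1                     # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(8, {3})` climbs from 8 to 11 MSS, halves to 5 MSS after the loss in round 3, and then resumes the linear climb.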

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: segment exchange between Host A and Host B over time, one RTT per round]
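Why a per-ACK increment doubles the window each RTT can be seen in a short simulation. The sketch below assumes every segment is acknowledged individually (no delayed ACKs) and no loss occurs; the function names are illustrative only.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate when exactly one MSS is sent per RTT, in bits per second."""
    return mss_bytes * 8 * 1000 / rtt_ms


def slow_start_rounds(rounds, mss=500):
    """Congestion window in bytes after each RTT round of slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss   # one ACK comes back per segment sent this round
        for _ in range(acks):
            cong_win += mss      # CongWin grows by one MSS per ACK
        history.append(cong_win)
    return history
```

With the slide's numbers (MSS = 500 bytes, RTT = 200 msec) the initial rate is 20000 bits per second, and the window goes 500, 1000, 2000, 4000 bytes in successive rounds.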

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
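The difference between the two variants comes down to how the window is reset after a loss event. The sketch below measures the window in MSS units. The Tahoe branch (dropping to 1 MSS on a triple duplicate ACK as well) is general background about Tahoe rather than something stated on the slide, and the function name and event strings are invented for illustration.

```python
def on_loss(cong_win, event, variant="reno"):
    """Return (threshold, new_cong_win) in MSS units after a loss event.

    event:   "timeout" or "triple_dup_ack"
    variant: "tahoe" or "reno"
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    threshold = cong_win / 2
    if event == "timeout" or variant == "tahoe":
        return threshold, 1          # restart from 1 MSS, slow start up to threshold
    return threshold, threshold      # Reno fast recovery: continue linearly from threshold
```

Starting from a 16 MSS window, a triple duplicate ACK leaves Reno at 8 MSS, while Tahoe (and either variant after a timeout) falls back to 1 MSS with the threshold at 8 MSS.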

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[State diagram, listed here as "event: actions -> next state"; Λ marks a transition with no triggering event]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 -> slow start

Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
• cwnd > ssthresh (Λ) -> congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment -> fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment -> fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0 -> congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
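The three-state diagram translates fairly directly into an event-driven object. What follows is a minimal sketch under stated assumptions, not a real TCP implementation: it tracks only cwnd, ssthresh, the duplicate-ACK counter and the state, it does not send or retransmit anything, the value MSS = 1460 bytes is assumed, and the class and method names are invented for illustration.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # exponential growth: +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                     # each dup ACK means a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"         # the missing segment would be retransmitted here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                # the missing segment would be retransmitted here
```

Feeding it one new ACK, then three duplicate ACKs, then another new ACK walks the sender through slow start, fast recovery and congestion avoidance in turn.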

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: results in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window size oscillating between W/2 and W]
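The 0.75 factor follows from the sawtooth shape. Assuming the window climbs linearly from W/2 to W between losses (the idealization stated on the slide), its time average is the midpoint of the two endpoints:

```latex
\[
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{3}{4}\cdot\frac{W}{RTT} = 0.75\,\frac{W}{RTT}
\]
```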

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
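Both numbers in this example can be checked with a few lines of arithmetic. The sketch below inverts the throughput formula from the slide to solve for L; the function names are made up for illustration.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

Here `w` comes out just above 83,333 segments and `loss` is on the order of 2e-10, matching the slide.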

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput of the two connections plotted against each other, axes running to R, with the “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
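The convergence argument can be replayed numerically. The sketch below is an idealized model, not a packet-level simulation: it assumes both sessions have the same RTT, both see a loss in the same round whenever their combined rate exceeds the link capacity, and rates are in arbitrary units. The function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve the rates of two AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap between them halves
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: the gap stays the same
    return x1, x2


a, b = two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so after enough loss events `a` and `b` are practically equal even though they started at 1 and 80.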

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
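The R/10 and R/2 figures follow from per-connection fairness: if every connection on the link ends up with an equal share, an application's share is proportional to how many connections it opens. A small sketch of that arithmetic, with an invented function name:

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming every TCP connection on the link gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

`app_share(9, 1)` is 1/10 of R, while `app_share(9, 9)` is 1/2 of R.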

When SYNs Attack

• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution: SYN cookies
  – Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged number matches a cookie the server could have sent, then allocate buffers if correct
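The idea of computing a sequence number without storing it can be illustrated with a keyed hash over the connection identifiers. This is a simplified sketch only: the cookie layout, the secret key, and the function names are assumptions made for illustration, and real implementations additionally encode details such as a coarse timestamp and the MSS into the 32 bits.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key known only to the server


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, time_slot):
    """Derive the server's 32-bit initial sequence number from the SYN alone."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack_number, src_ip, src_port, dst_ip, dst_port, client_isn, time_slot):
    """Recompute the cookie when the final ACK arrives; no per-SYN state was kept."""
    cookie = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, time_slot)
    expected_ack = (cookie + 1) % 2**32
    return ack_number == expected_ack
```

A client that really received the SYNACK acknowledges cookie + 1, which the server can verify by recomputation; a flood of SYNs from forged addresses never produces a valid ACK and never causes buffers to be allocated.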

Sequence Number Summary

• Goals Set 1
  – Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  – Don’t allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don’t allow SYN attacks: compute, but don’t store, initial sequence number

TCP Connection Management (cont)

Closing a connection:

client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline showing close on both sides, then timed wait and closed on the client]

TCP Connection Management (cont)

Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline showing closing on both sides, timed wait on the client, then closed on both sides]
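The four steps can be traced as a sequence of (client state, server state) pairs. The state names below are the standard ones from the TCP specification (RFC 793), not labels taken from the slide, and the function is purely illustrative: it models the loss-free normal case and sends nothing.

```python
def close_trace():
    """List the (client, server) states during a normal client-initiated close."""
    trace = [("ESTABLISHED", "ESTABLISHED")]
    # Step 1: client calls close() and sends FIN
    trace.append(("FIN_WAIT_1", "ESTABLISHED"))
    # Step 2: server receives FIN and replies with ACK ...
    trace.append(("FIN_WAIT_2", "CLOSE_WAIT"))
    # ... then closes its side and sends its own FIN
    trace.append(("FIN_WAIT_2", "LAST_ACK"))
    # Step 3: client receives FIN, replies with ACK, enters timed wait
    trace.append(("TIME_WAIT", "LAST_ACK"))
    # Step 4: server receives ACK; its side of the connection is closed
    trace.append(("TIME_WAIT", "CLOSED"))
    # Timed wait expires on the client
    trace.append(("CLOSED", "CLOSED"))
    return trace
```

The client lingers in timed wait after the server is already closed so that it can re-acknowledge a retransmitted FIN if its final ACK was lost.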

TCP Connection FSM

[Figure: TCP connection state machine] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont’d)

[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: the famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle’s algorithm
  – Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN (“elephant”), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around of sequence numbers
• …
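Why long fat networks need a window scale factor can be shown with simple arithmetic: the receive-window field is 16 bits, and the scale option shifts it left by up to 14 bits. The sketch below assumes that limit from RFC 1323; the function names are invented for illustration.

```python
def max_window_bytes(shift):
    """Largest receive window expressible with a given window-scale shift."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return 65535 << shift


def min_shift_for(bytes_in_flight):
    """Smallest shift whose maximum window covers bytes_in_flight, or None."""
    for shift in range(15):
        if max_window_bytes(shift) >= bytes_in_flight:
            return shift
    return None
```

A path with 10 Gbps of bandwidth and 100 ms RTT holds 125,000,000 bytes in flight, far beyond the unscaled 65,535-byte limit; `min_shift_for(125_000_000)` shows that a shift of 11 is the smallest that suffices.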

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B send original data at rate λin through one router with unlimited shared output link buffers; λout is the delivery rate]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through one router with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered data]

Causes/costs of congestion: scenario 2

Always: λin = λout (goodput)

• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)



ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0

retransmit missing segment

L

cwnd gt ssthresh

congestion

avoidance

cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fast

recovery

cwnd = cwnd + MSS transmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2 cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeout

ssthresh = cwnd2 cwnd = 1 dupACKcount = 0

retransmit missing segment

ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment

dupACKcount == 3 cwnd = ssthresh dupACKcount = 0

New ACK

slow

start

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment

cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed

new ACK dupACKcount++

duplicate ACK

L

cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0

New ACK

New ACK

New ACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in

congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set

to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2

and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTT

bull Just after loss window drops to W2 throughput to W2RTT

bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Loss

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull Throughput in terms of loss rate

bull L = 210-10 Very low

bull Require almost perfect link

LRTT

MSS221

Equation-Based Control

bull Note TCP congestion control forms

control loop

ndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)

bull Instead equation-based control uses an

equation to compute sending rate based

on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same

bottleneck link of bandwidth R each should

have average rate of Rk

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair

Consider two competing sessions

bull additive increase gives slope of 1 as throughput increases

bull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increase

loss decrease window by factor of 2

congestion avoidance additive increase

loss decrease window by factor of 2

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
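
The R/10 and R/2 figures follow from per-connection fairness. A minimal sketch, assuming every connection on the link gets an equal share (the function name is hypothetical):

```python
from fractions import Fraction

def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming the bottleneck is split equally per connection."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)

one_conn = app_share(9, 1)    # 1 of 10 connections
nine_conns = app_share(9, 9)  # 9 of 18 connections
```

Fairness is enforced per connection, not per application, so opening more connections buys a larger share.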


TCP Connection Management (cont.)

Closing a connection:
client closes socket: close(s)

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline with labels close (client), close (server), timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline with labels closing (client), closing (server), timed wait, closed]
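
The four closing steps can be sketched as a small transition table. The state names below are the standard ones from the TCP specification (FIN_WAIT_1, CLOSE_WAIT, and so on) rather than the labels on the slide, and the event names (`app_close`, `recv_FIN`, `recv_ACK`, `timer_expired`) are hypothetical. Only the normal close path is covered.

```python
# (state, event) -> (next state, segment sent or None)
TRANSITIONS = {
    # active closer (the client in the slides)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),   # Step 1
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),      # Step 3
    ("TIME_WAIT", "recv_FIN"): ("TIME_WAIT", "ACK"),       # re-ACK a repeated FIN
    ("TIME_WAIT", "timer_expired"): ("CLOSED", None),
    # passive closer (the server in the slides)
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),    # Step 2
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),            # Step 4
}

def run(events, state="ESTABLISHED"):
    """Feed events to one endpoint; return its final state and segments sent."""
    sent = []
    for event in events:
        state, segment = TRANSITIONS[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent

client = run(["app_close", "recv_ACK", "recv_FIN", "timer_expired"])
server = run(["recv_FIN", "app_close", "recv_ACK"])
```

The TIME_WAIT self-loop is what lets the client answer a retransmitted FIN if its final ACK was lost.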

TCP Connection FSM

[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)

[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection

• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on a larger-than-default MSS (default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window, to allow for LFNs ("elephants"), i.e. Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
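
A rough calculation shows why WSCALE matters on long fat networks. The sketch assumes the 16-bit receive window field (maximum 65,535 bytes), the maximum shift count of 14 allowed by RFC 1323, and a 100 ms RTT chosen for illustration; the function names are hypothetical.

```python
def scaled_window_bytes(window_field, shift):
    """Effective receive window: the 16-bit field left-shifted by the scale factor."""
    return window_field << shift

def window_limited_rate_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s

unscaled = window_limited_rate_bps(scaled_window_bytes(65535, 0), 0.1)   # about 5.2 Mbps
scaled = window_limited_rate_bps(scaled_window_bytes(65535, 14), 0.1)    # about 86 Gbps
```

Without scaling, a 100 ms path is capped at a few Mbps regardless of link speed, because the sender can never have more than 64 KB in flight.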

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A sends λ_in (original data) through unlimited shared output link buffers; Host B receives λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) through finite shared output link buffers; Host B receives λ_out]

Causes/costs of congestion: scenario 2

Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)

"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three panels (a, b, c) plotting λ_out against λ_in, with axis marks at R/2 and further marks at R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) through finite shared output link buffers; Host B receives λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \frac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
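
The sender-side limit can be written as a small helper. This is a sketch using the slide's variable names in snake case; the helper names themselves are hypothetical, and all quantities are in bytes.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit without violating
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approximate_rate(cong_win, rtt_s):
    """Rough sending rate in bytes/sec: one congestion window per RTT."""
    return cong_win / rtt_s
```

Taking the minimum of the two windows means the sender respects both flow control (the receiver's limit) and congestion control (the network's limit) with one check.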

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (axis marks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline over successive RTTs]
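
The per-ACK increment and the resulting doubling can be seen in a short sketch. It assumes no loss, one ACK per segment, and that a full window is sent and acknowledged each RTT; the function names are hypothetical.

```python
def slow_start_windows(mss, rtts):
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rtts):
        acks_this_rtt = cong_win // mss   # one ACK per segment in flight
        for _ in range(acks_this_rtt):
            cong_win += mss               # increment CongWin for every ACK
        history.append(cong_win)
    return history

def initial_rate_bps(mss_bytes, rtt_s):
    """Rate with a single MSS in flight per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_s

windows = slow_start_windows(500, 4)   # [500, 1000, 2000, 4000, 8000]
rate = initial_rate_bps(500, 0.2)      # the 20 kbps from the slide's example
```

Adding one MSS per ACK looks linear, but the number of ACKs per RTT grows with the window, so the window doubles each round trip.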

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
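
The loss reactions listed above can be sketched as one function. It assumes that Tahoe, which predates fast recovery, drops to 1 MSS on every loss event; the function and parameter names are hypothetical, and windows are in bytes.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (threshold, new_cong_win) after a loss event.

    event is "timeout" or "triple_dup_ack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    threshold = cong_win // 2                 # half of CongWin just before the loss
    if event == "timeout" or variant == "tahoe":
        new_cong_win = mss                    # restart from 1 MSS (slow start)
    else:
        new_cong_win = max(threshold, mss)    # Reno fast recovery: resume at Threshold
    return threshold, new_cong_win
```

The only difference between the variants in this sketch is the triple-ack case, where Reno skips slow start and continues linearly from Threshold.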

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions are listed below as event: actions.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

(Table columns: Event, State, TCP Sender Action, Commentary)

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
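
The event table maps naturally onto a small class. This is a simplified sketch of the table only, not a full TCP sender: windows are integer byte counts, retransmission is omitted, and resetting the duplicate-ACK counter after a triple duplicate ACK is an assumption the table does not spell out. The class and method names are hypothetical.

```python
class RenoSender:
    """Congestion state following the Event / State / Action table."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # CongWin + MSS * (MSS / CongWin): about 1 MSS per RTT in total
            self.cong_win += self.mss * self.mss // self.cong_win

    def on_duplicate_ack(self):
        """Count duplicates; the third one triggers the triple-duplicate row."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win // 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0        # assumption: restart counting after reacting

    def on_timeout(self):
        self.threshold = self.cong_win // 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

A single duplicate ACK leaves CongWin and Threshold untouched, matching the last row of the table; only the third duplicate or a timeout changes them.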



Page 41: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP

Connection

FSM The heavy solid line is the

normal path for a client

The heavy dashed line is the

normal path for a server

The light lines are unusual

events

Each transition is labeled by

the event causing it and the

action resulting from it

separated by a slash

CS 4284 Spring 2013

TCP Connection Management (contrsquod)

TCP client lifecycle TCP server lifecycle

CS 4284 Spring 2013

Closing a Connection

bull Note previous charts showed normal case

bull Can we reliably close a connection if

packets (FIN ACK) can be lost

ndash No Famous two-army problem

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

Summary: TCP Congestion Control

[State diagram, written out as a transition list; Λ marks a transition with no triggering event]

Start:
• Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• Λ when cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
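The three-state machine summarized on this slide (slow start, congestion avoidance, fast recovery) can be sketched as a small class. This is a rough model under stated assumptions, not a real TCP implementation: the class name `RenoSender`, the MSS value of 1460 bytes, and the reading of "+ 3" as 3 MSS are all choices made for this example, and actual (re)transmission is left to the caller.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; illustrative value, not taken from the slides


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024        # 64 KB initial threshold, as on the slide
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # another segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS    # assumption: "+ 3" means 3 MSS
            self.state = "fast_recovery"           # caller retransmits missing segment

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                  # caller retransmits missing segment


sender = RenoSender()
sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd / MSS, sender.ssthresh / MSS)
```

Keeping the window arithmetic separate from packet transmission makes each transition easy to compare against the diagram.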

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
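A short calculation shows why the per-ACK increment MSS · (MSS/CongWin) in congestion avoidance amounts to about one MSS per RTT. It assumes one ACK per segment and treats CongWin as roughly constant within a single RTT; $N$ is a symbol introduced here for the number of segments in flight.

```latex
N = \frac{\mathrm{CongWin}}{\mathrm{MSS}}, \qquad
\Delta\mathrm{CongWin}_{\text{per RTT}}
  \approx N \cdot \mathrm{MSS} \cdot \frac{\mathrm{MSS}}{\mathrm{CongWin}}
  = \frac{\mathrm{CongWin}}{\mathrm{MSS}} \cdot \frac{\mathrm{MSS}^2}{\mathrm{CongWin}}
  = \mathrm{MSS}
```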

TCP Throughput: Idealized

• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window over time, oscillating between W/2 and W]
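The 0.75 factor comes from averaging the window over one sawtooth cycle. The step below assumes the window climbs linearly from W/2 to W between consecutive loss events, so its time average is the midpoint of the two values; $\overline{W}$ is notation introduced here for that average.

```latex
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\quad\Longrightarrow\quad
\text{average throughput} = \frac{\overline{W}}{RTT} = \frac{0.75\,W}{RTT}
```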

TCP Throughput & Loss

• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate: $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low!
• Requires almost perfect link
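The two numbers in this example can be reproduced from the window relation W = throughput · RTT / MSS and the loss-rate formula throughput = 1.22 · MSS / (RTT · sqrt(L)). A minimal sketch, with function names invented for illustration:

```python
def required_window_segments(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * target_bps)) ** 2


# Slide example: 1500-byte segments, 100 ms RTT, 10 Gbps target
print(required_window_segments(10e9, 0.1, 1500))   # roughly 83,333 segments
print(required_loss_rate(10e9, 0.1, 1500))         # roughly 2e-10
```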

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness: Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: plot with Connection 1 throughput on one axis, both axes running to R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
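The convergence argument can be checked numerically. The sketch below assumes both sessions see a loss in the same round whenever their combined rate exceeds the link capacity, which is a simplification; the function name `aimd_two_flows` and the starting rates are arbitrary choices for the example.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Simulate two AIMD flows that both see loss whenever x1 + x2 > capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase keeps the gap unchanged
    return x1, x2


# Start far from the fair share: 80 vs 10 on a link of capacity 100
x1, x2 = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))   # the gap between the two flows shrinks toward 0
```

Additive increase moves the operating point parallel to the equal-share line and each halving moves it toward the origin, so the difference between the two rates is cut in half at every loss.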

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
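The R/10 and R/2 figures follow from assuming every TCP connection on the bottleneck gets an equal share. A small sketch of that arithmetic, with the function name `app_share` chosen for illustration:

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening `new_connections`,
    assuming every TCP connection gets an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))   # 1/10
print(app_share(9, 9))   # 1/2
```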


TCP Connection Management (cont'd)

[Figure: TCP client lifecycle and TCP server lifecycle]

Closing a Connection

• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary

• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous

• MSS: Maximum Segment Size option
  – Client/server agree on larger-than-default MSS (default is 536 outside same subnet); MSS option sent on SYN
• SACK – selective acknowledgements
• WSCALE – scale factor for receive window to allow for LFN ("elephant") – Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around for sequence numbers
• ...

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B connected through a router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B connected through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2

• Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)

"Costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three plots, (a), (b) and (c), of λ_out against the input rate, with both axes marked at R/2; the plots also carry the levels R/3 and R/4]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λ_in and λ'_in increase?

[Figure: Host A and Host B connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse

[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin/RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
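The sender-side limit, that unacknowledged data in flight may not exceed min(CongWin, RcvWindow), can be sketched as a simple check. The function names and the per-segment test are assumptions made for illustration, not part of the slides.

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cong_win: int, rcv_window: int, segment_len: int) -> bool:
    """True if sending `segment_len` more bytes keeps unacknowledged data
    within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + segment_len <= min(cong_win, rcv_window)


def approx_rate_bytes_per_sec(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate: about one congestion window per round-trip time."""
    return cong_win / rtt_s


print(can_send(3000, 1000, cong_win=4000, rcv_window=8000, segment_len=1000))
print(can_send(3000, 1000, cong_win=2500, rcv_window=8000, segment_len=1000))
```

The first call is allowed (3000 bytes would be in flight against a limit of 4000); the second is not, because the congestion window, not the receive window, is the tighter bound.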

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: congestion window (axis marked at 8, 16 and 24 Kbytes) over time for a long-lived TCP connection]
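The sawtooth in the figure can be reproduced with a toy model. This sketch assumes a loss event occurs whenever the window reaches a fixed level, which is a simplification of real network behaviour; `aimd_trace` and its parameters (all in Kbytes) are hypothetical names for this example.

```python
def aimd_trace(start_kb: float, loss_at_kb: float, mss_kb: float, rtts: int) -> list[float]:
    """CongWin per RTT: +1 MSS each RTT, halved once it reaches loss_at_kb."""
    cong_win, trace = start_kb, []
    for _ in range(rtts):
        trace.append(cong_win)
        if cong_win >= loss_at_kb:
            cong_win /= 2            # multiplicative decrease after a loss event
        else:
            cong_win += mss_kb       # additive increase: probing for bandwidth
    return trace


print(aimd_trace(start_kb=8, loss_at_kb=16, mss_kb=2, rtts=10))
```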



Page 44: CS 4284 Operating Systems

CS 4284 Spring 2013

Summary

bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm

ndash Fast retransmit

bull RTT estimation amp Karnrsquos algorithm

bull Flow Control amp Silly Window Syndrome

bull Connection Management in TCP

bull Attacks against TCPrsquos connection management scheme ndash SYN attack

ndash Sequence number prediction attacks

CS 4284 Spring 2013

TCP Miscellaneous

bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536

outside same subnet) MSS option on SYN

bull SACK ndash selective acknowledgements

bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks

bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers

bull hellip

CS 4284 Spring 2013

Study of TCP Outline

bull segment structure

bull reliable data transfer ndash delayed ACKs

ndash Naglersquos algorithm

bull timeout management fast retransmit

bull flow control + silly window syndrome

bull connection management

[ Network Address Translation ]

[ Principles of congestion control ]

bull TCP congestion control

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion

Summary: TCP Congestion Control

[State diagram, listed per state as event / action → next state; Λ marks a transition with no triggering event]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start

Slow start
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh (Λ) → congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start

Fast recovery
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
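The three-state sender can be sketched as a small Python class. This is an illustration under stated assumptions, not a real TCP stack: the class name `RenoSender`, the method names, and the value `MSS = 1000` are hypothetical, and retransmission and segment transmission are left as comments.

```python
MSS = 1000  # bytes; illustrative value, not taken from the slides


class RenoSender:
    """Sketch of a sender with slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS                     # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:                                    # congestion avoidance
            self.cwnd += MSS * MSS / self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS                     # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"
            # a real sender would retransmit the missing segment here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow start"
        # a real sender would retransmit the missing segment here


sender = RenoSender()
for _ in range(7):
    sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```

Keeping the three event handlers separate mirrors the diagram: each handler first checks the current state, then applies that state's action and transition.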

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window oscillating between W/2 and W]
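The 0.75 factor is the mean of a window that ramps linearly from W/2 to W, that is (W/2 + W)/2 = 0.75 W. A small numerical check, assuming an idealized linear ramp sampled at evenly spaced points (the function name `average_window` is hypothetical):

```python
def average_window(w_max, steps=1000):
    """Mean window over one sawtooth period: linear ramp from W/2 to W."""
    samples = [w_max / 2 + (w_max / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


W = 100
print(average_window(W) / W)  # close to 0.75
```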

TCP Throughput & Loss

• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low!
• Require almost perfect link
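The numbers in this example can be reproduced by solving the throughput formula for W and for L. A short sketch, with hypothetical function names and throughput expressed in bits per second:

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.1, 1500)
L = required_loss_rate(10e9, 0.1, 1500)
print(f"W is about {W:.0f} segments, L is about {L:.1e}")
```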

Equation-Based Control

• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R]

TCP Fairness

Why is TCP fair?

Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with Connection 1 throughput on one axis, axes bounded by R, and the “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
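The convergence argument can be illustrated with a toy simulation of two AIMD flows. It rests on a strong assumption that is not spelled out on the slide: both flows detect loss in the same round whenever their combined rate exceeds the capacity. The function name `two_flow_aimd` and the starting rates are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck with synchronized loss."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap fixed
    return x1, x2


x1, x2 = two_flow_aimd(10, 70, capacity=100, rounds=200)
print(f"flow 1: {x1:.2f}, flow 2: {x2:.2f}, gap: {abs(x1 - x2):.2f}")
```

Additive increase moves both flows along a 45-degree line and leaves their difference unchanged, while each halving cuts the difference in two, so the rates drift toward the equal-share line.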

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
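The arithmetic behind the parallel-connection example, assuming the bottleneck is shared equally per connection (the function name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of link rate R that an app opening new_conns connections gets."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))  # share of R with one connection
print(app_share(9, 9))  # share of R with nine connections
```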

Page 45: CS 4284 Operating Systems

TCP Miscellaneous

• MSS: Maximum Segment Size Option
  – Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK – selective acknowledgements
• WSCALE – scale factor for receive window to allow for LFN (“elephant”) – Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …
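To see why a scale factor is needed on long fat networks, the sketch below computes the smallest shift count whose scaled window covers the bandwidth-delay product. It assumes the 16-bit receive window field (maximum 65535) and the limit of 14 on the shift count from RFC 1323; the function name `min_window_shift` is hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit receive window field
MAX_SHIFT = 14               # upper bound on the shift count in RFC 1323


def min_window_shift(throughput_bps, rtt_s):
    """Smallest shift whose scaled window covers bandwidth * RTT (in bytes)."""
    needed_bytes = throughput_bps * rtt_s / 8
    for shift in range(MAX_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= needed_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scaled window")


# 10 Gbps at 100 ms RTT needs about 125 MB in flight.
print(min_window_shift(10e9, 0.1))
```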

Study of TCP: Outline

• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control

Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1

• two senders, two receivers
• one router, infinite buffers
• no retransmission

• large delays when congested
• (but no reduction in throughput here)

[Figure: Host A and Host B sending through a router with unlimited shared output link buffers; $\lambda_{in}$: original data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

• one router, finite buffers
• sender retransmission of lost packet after timeout

[Figure: Host A and Host B sending through a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2

• Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss, $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)

“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt

[Figure: three panels (a, b, c) plotting $\lambda_{out}$ against $\lambda_{in}$, each axis running to R/2; the levels R/3 and R/4 are marked on the $\lambda_{out}$ axis]

Causes/costs of congestion: scenario 3

• four senders
• multihop paths
• timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: Host A and Host B sending through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse

[Figure: Host A, Host B, $\lambda_{out}$]

Reasons for Congestion Control

• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case



Page 47: CS 4284 Operating Systems

TCP

Congestion Control

CS 4284 Spring 2013

Principles of Congestion Control

Congestion

bull informally ldquotoo many sources sending too much

data too fast for network to handlerdquo

bull different from flow control

bull manifestations

ndash long delays (queueing in router buffers)

ndash lost packets (buffer overflow at routers)

bull a top-10 problem

CS 4284 Spring 2013

Causescosts of congestion scenario 1

bull two senders two

receivers

bull one router infinite

buffers

bull no retransmission

bull large delays when congested

bull (but no reduction in throughput here)

unlimited shared

output link buffers

Host A lin original data

Host B

lout

CS 4284 Spring 2013

Causescosts of congestion scenario 2

bull one router finite buffers

bull sender retransmission of lost packet after

timeout

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 2 Always lin = lout (goodput)

bull a) if no loss lrsquoin = lin

bull b) assume clairvoyant sender retransmission only when loss certain

bull c) retransmission of both delayed and lost packets makes lrsquoin larger for

same lout (every packet transmitted twice)

ldquocostsrdquo of congestion

bull more work (retrans) for given ldquogoodputrdquo

bull unneeded retransmissions link carries multiple copies of pkt

b

R2

R2 lin

lout

a c

R2

R2 lin

lout

R4

R2

R2 lin

lout

R3

CS 4284 Spring 2013

Causescosts of congestion scenario 3

bull four senders

bull multihop paths

bull timeoutretransmit

Q what happens as lin

and lrsquoin increase

finite shared output

link buffers

Host A lin original data

Host B

lout

lin original data plus

retransmitted data

CS 4284 Spring 2013

Causescosts of congestion scenario 3

Another ldquocostrdquo of congestion

bull when packet is dropped any upstream transmission

capacity used for that packet was wasted

bull ultimately leads to congestion collapse

H

o

s

t

A

H

o

s

t

B

lo

u

t

CS 4284 Spring 2013

Reasons for Congestion Control

bull Congested networks increase delay even

if no packet loss occurs

bull If packet loss occurs needed

retransmission require offered load to be

greater than goodput

bull Downstream losses waste upstream

transmission capacity leading to

congestion collapse in the worst case

CS 4284 Spring 2013

Congestion Control Approaches

End-end congestion

control

bull no explicit feedback from

network

bull congestion inferred from

end-system observed

loss delay

bull approach taken by TCP

Network-assisted congestion control

bull routers provide feedback to end systems

ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)

ndash explicit rate sender should send at

Two broad classes

CS 4284 Spring 2013

TCP Congestion Control

bull end-end control (no network

assistance)

bull sender limits transmission

LastByteSent-LastByteAcked

min(CongWin RcvWindow)

bull CongWin and RTT influence throughput

bull CongWin is dynamic function of

perceived network congestion

How does sender notice

congestion

bull loss event = timeout or 3

duplicate acks

bull TCP sender reduces rate (CongWin) after loss event

bull (assumes congestion is

primary cause of loss)

Three mechanisms

bull AIMD

bull slow start

bull fast recovery

rate = CongWin

RTT

Bytessec

CS 4284 Spring 2013

TCP AIMD

8 Kbytes

16 Kbytes

24 Kbytes

time

congestion

window

multiplicative decrease cut CongWin in half

after loss event

additive increase increase CongWin by 1 MSS

every RTT in the

absence of loss events

probing

Long-lived TCP connection

CS 4284 Spring 2013

TCP Slow Start

bull When connection begins CongWin = 1 MSS ndash Example MSS = 500

bytes amp RTT = 200 msec

ndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly

ramp up to respectable rate

bull When connection

begins increase rate

exponentially fast until

first loss event

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

Summary: TCP Congestion Control

[State diagram, transcribed as one list of event: action entries per state; Λ denotes "no event" or "no action"]

Initial transition (Λ, into slow start):

– cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

Slow start:

– new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed

– duplicate ACK: dupACKcount++

– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment (remain in slow start)

– cwnd > ssthresh: Λ, go to congestion avoidance

– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

Congestion avoidance:

– new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed

– duplicate ACK: dupACKcount++

– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start

– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

Fast recovery:

– duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed

– new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance

– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially

• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly

• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold

• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control

[Table with columns Event, State, TCP Sender Action, Commentary; one block per row]

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start”
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
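The event table can be read as a small state machine. The sketch below follows the table rows only; it is not a complete TCP sender (no retransmission, no separate fast-recovery state). The class name `RenoSender`, the method names, and the MSS value of 1460 bytes are assumptions made for the example.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value for illustration


@dataclass
class RenoSender:
    cong_win: float = MSS          # starts at 1 MSS
    threshold: float = 64 * 1024   # initial Threshold of 64 KB
    state: str = "SS"              # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


sender = RenoSender()
for _ in range(4):
    sender.on_new_ack()
sender.on_timeout()
print(sender)
```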

TCP Throughput: Idealized

• What’s the average throughput of TCP as a function of window size and RTT?

– Long-lived connection; ignore slow start

• When window is W, throughput is W/RTT

• Just after loss, window drops to W/2, throughput to W/(2·RTT)

• Average steady-state throughput: 0.75 W/RTT

[Figure: sawtooth of the window size oscillating between W/2 and W]
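The 0.75 factor is the mean of a window that ramps linearly from W/2 to W. A quick numerical check, assuming an idealized sawtooth sampled at evenly spaced points (the function name and the sample count are arbitrary choices):

```python
def average_throughput(w_max, rtt, steps=1000):
    """Average of W/RTT over one sawtooth period rising linearly from W/2 to W."""
    windows = [w_max / 2 + (w_max / 2) * i / steps for i in range(steps + 1)]
    return sum(windows) / len(windows) / rtt


w, rtt = 100.0, 0.1
print(average_throughput(w, rtt), 0.75 * w / rtt)
```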

TCP Throughput & Loss

• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput

• Requires window size W = 83,333 in-flight segments

• Throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• L = $2 \cdot 10^{-10}$: very low

• Requires an almost perfect link
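The two numbers in this example can be reproduced directly. The sketch assumes the usual relation throughput = 1.22 · MSS / (RTT · √L) solved for L; the function names are illustrative only.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """Number of in-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L obtained by solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(round(w), loss)
```

With the slide's inputs, this gives roughly 83,333 segments and a loss rate of about 2·10^-10.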

Equation-Based Control

• Note: TCP congestion control forms a control loop

– Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ACK events)

• Instead, equation-based control uses an equation to compute the sending rate based on this input

• See RFC 5348 for more info

TCP Fairness

Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness: Why is TCP fair?

Consider two competing sessions:

• additive increase gives slope of 1 as throughput increases

• multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with both axes running up to R (x-axis: Connection 1 throughput) and the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
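The convergence argument can be illustrated with a toy simulation. It assumes two flows with equal RTTs that both see a loss whenever their combined rate exceeds the link capacity; `simulate_aimd` and its parameters are hypothetical names chosen for this sketch.

```python
def simulate_aimd(x1, x2, capacity, rounds=500):
    """Two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both halve (multiplicative decrease)
        else:
            x1, x2 = x1 + 1, x2 + 1      # no loss: both add 1 (additive increase)
    return x1, x2


a, b = simulate_aimd(80.0, 10.0, capacity=100.0)
print(a, b, abs(a - b))
```

Additive increase leaves the gap between the two rates unchanged, and each halving cuts the gap in half, so the rates drift toward the equal-share line.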

Fairness (more)

Fairness and UDP:

• Multimedia apps often do not use TCP

– do not want rate throttled by congestion control

• Instead use UDP:

– pump audio/video at constant rate, tolerate packet loss

• TCP friendliness

Fairness and parallel TCP connections:

• nothing prevents app from opening parallel connections between 2 hosts

• Example: link of rate R supporting 9 connections

– new app asks for 1 TCP, gets rate R/10

– new app asks for 9 TCPs, gets R/2

• That’s what “download accelerators” do
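The parallel-connection example is simple arithmetic if every connection on the link gets an equal share (an idealization). The helper name `app_share` is made up for this sketch, and the result is expressed as a fraction of R.

```python
from fractions import Fraction


def app_share(existing_connections, opened_by_app):
    """Fraction of link rate R the new app gets when all connections share equally."""
    total = existing_connections + opened_by_app
    return Fraction(opened_by_app, total)


print(app_share(9, 1))   # one new connection among ten
print(app_share(9, 9))   # nine new connections among eighteen
```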

Principles of Congestion Control

Congestion:

• informally: “too many sources sending too much data too fast for network to handle”

• different from flow control

• manifestations:

– long delays (queueing in router buffers)

– lost packets (buffer overflow at routers)

• a top-10 problem

Causes/costs of congestion: scenario 1

• two senders, two receivers

• one router, infinite buffers

• no retransmission

• large delays when congested

• (but no reduction in throughput here)

[Figure: Host A and Host B send through a router with unlimited shared output link buffers; λ_in: original data, λ_out: delivered data]

Causes/costs of congestion: scenario 2

• one router, finite buffers

• sender retransmission of lost packet after timeout

[Figure: Host A and Host B send through a router with finite shared output link buffers; λ_in: original data, λ′_in: original data plus retransmitted data, λ_out: delivered data]

Causes/costs of congestion: scenario 2

• Always: λ_in = λ_out (goodput)

• a) if no loss, λ′_in = λ_in

• b) assume clairvoyant sender: retransmission only when loss is certain

• c) retransmission of both delayed and lost packets makes λ′_in larger for the same λ_out (every packet transmitted twice)

“Costs” of congestion:

• more work (retransmissions) for a given “goodput”

• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a, b, c) of λ_out against the offered load, axes running to R/2; marked maxima include R/3 and R/4]
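The plateau values in the three plots can be related to the average number of transmissions per delivered packet. This is a back-of-the-envelope sketch, not a queueing model: it assumes each sender can push at most R/2 onto the link, and it infers 1.5 transmissions per packet for case (b) from the R/3 value in the figure. All values are in units of R.

```python
from fractions import Fraction


def goodput(link_share, transmissions_per_packet):
    """Goodput when each delivered packet costs this many transmissions on average."""
    return link_share / transmissions_per_packet


half = Fraction(1, 2)                   # each sender's share of R
print(goodput(half, 1))                 # (a) no retransmissions
print(goodput(half, Fraction(3, 2)))    # (b) assumed 1.5 transmissions per packet
print(goodput(half, 2))                 # (c) every packet transmitted twice
```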

Causes/costs of congestion: scenario 3

• four senders

• multihop paths

• timeout/retransmit

Q: what happens as λ_in and λ′_in increase?

[Figure: Host A and Host B send over multihop paths through routers with finite shared output link buffers; λ_in: original data, λ′_in: original data plus retransmitted data, λ_out: delivered data]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:

• when a packet is dropped, any upstream transmission capacity used for that packet was wasted

• ultimately leads to congestion collapse

[Figure: Host A, Host B, and λ_out]

Reasons for Congestion Control

• Congested networks increase delay even if no packet loss occurs

• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput

• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control:

• no explicit feedback from network

• congestion inferred from end-system observed loss, delay

• approach taken by TCP

Network-assisted congestion control:

• routers provide feedback to end systems

– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)

– explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)

• sender limits transmission:

LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)

• CongWin and RTT influence throughput:

rate = CongWin / RTT bytes/sec

• CongWin is a dynamic function of perceived network congestion

How does sender notice congestion?

• loss event = timeout or 3 duplicate ACKs

• TCP sender reduces rate (CongWin) after loss event

• (assumes congestion is primary cause of loss)

Three mechanisms:

• AIMD

• slow start

• fast recovery
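The sender-side limit and the rate estimate translate directly into code. The sketch uses the variable names from the slide; the function names `bytes_allowed` and `approx_rate` are assumptions for illustration.

```python
def bytes_allowed(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """How many more bytes may be sent without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Approximate sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt


print(bytes_allowed(5000, 2000, cong_win=8000, rcv_window=4000))
print(approx_rate(8000, 0.2))
```

In the first call the receiver window (4000 bytes) is the binding limit, so only 1000 more bytes may be sent while 3000 are in flight.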

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event

• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window over time for a long-lived TCP connection, a sawtooth with y-axis ticks at 8, 16, and 24 Kbytes]
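The sawtooth can be generated with a few lines. This sketch assumes the window is counted in segments and that loss events occur in RTTs chosen by the caller; `aimd_trace` and `loss_rtts` are hypothetical names.

```python
def aimd_trace(start, loss_rtts, rtts, mss=1):
    """Congestion window used in each RTT under additive increase, multiplicative decrease."""
    window = start
    trace = []
    for t in range(rtts):
        trace.append(window)
        if t in loss_rtts:
            window = max(window / 2, mss)   # multiplicative decrease after a loss
        else:
            window += mss                   # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(start=8, loss_rtts={4}, rtts=7))
```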

TCP Slow Start

• When connection begins, CongWin = 1 MSS

– Example: MSS = 500 bytes & RTT = 200 msec

– initial rate = 20 kbps

• available bandwidth may be >> MSS/RTT

– desirable to quickly ramp up to respectable rate

• When connection begins, increase rate exponentially fast until first loss event
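The initial-rate example, and the number of doublings needed to reach a target rate, can be checked as follows. The target of 10 Mbps is an arbitrary value chosen for illustration, and the function names are not from the slides.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate with a window of one MSS: MSS bits per round-trip time."""
    return mss_bytes * 8 / rtt_s


def rtts_to_reach(target_bps, mss_bytes, rtt_s):
    """Number of RTTs of doubling (no loss) until the rate reaches the target."""
    window, rounds = 1, 0
    while window * mss_bytes * 8 / rtt_s < target_bps:
        window *= 2
        rounds += 1
    return rounds


print(initial_rate_bps(500, 0.2))        # the slide's 20 kbps example
print(rtts_to_reach(10e6, 500, 0.2))     # doublings to reach an assumed 10 Mbps
```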

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:

– double CongWin every RTT




Page 51: CS 4284 Operating Systems

Causes/costs of congestion: scenario 2

Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)

"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet

[Figure: three plots (a), (b), (c) of λout against λin, each axis marked R/2; additional marks at R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A and Host B connected through routers with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 3

Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse

[Figure: λout, with Host A and Host B]

Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches

Two broad classes:

End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin/RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
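The sender-side limit and the rate estimate from this slide can be expressed directly. This is only a sketch: the variable names mirror the slide (LastByteSent, LastByteAcked, CongWin, RcvWindow), while the function names and the byte counts in the example are made up for illustration.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unacknowledged data within
    min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/second: one window per round-trip time."""
    return cong_win / rtt

# 600 bytes are in flight; the effective window is min(2000, 1500) = 1500.
print(can_send(1000, 400, 2000, 1500, 500))
print(can_send(1000, 400, 2000, 1500, 1000))
```

Taking the minimum of the two windows is what lets one sender respect both the receiver (flow control) and the network (congestion control) at once.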

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
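The sawtooth produced by AIMD can be reproduced with a few lines. The sketch below steps the window once per RTT and takes the set of RTT indices at which a loss event occurs as an input; that input, the function name, and the numbers in the example are assumptions for illustration, and the floor of 1 MSS is borrowed from the sender-action table later in the deck.

```python
def aimd_trace(cong_win, mss, loss_rounds, rounds):
    """Return CongWin (bytes) at the start of each RTT.

    loss_rounds is the set of RTT indices in which a loss event occurs.
    """
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(mss, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += mss                     # additive increase
    return trace

print(aimd_trace(8000, 1000, {4}, 8))
# [8000, 9000, 10000, 11000, 12000, 6000, 7000, 8000]
```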

TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timeline along a time axis, with one RTT marked]
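Two small calculations make the slow-start numbers concrete: the initial rate for the 500-byte / 200 msec example, and why adding one MSS per ACK doubles the window each RTT. The sketch assumes one ACK per full-sized segment and no losses; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate in bits/second when exactly one MSS is sent per RTT."""
    return mss_bytes * 8 / rtt_s

def after_one_rtt(cong_win, mss):
    """CongWin after all segments of the current window have been ACKed,
    adding one MSS per ACK (assumes one ACK per segment, no loss)."""
    acks = cong_win // mss
    for _ in range(acks):
        cong_win += mss
    return cong_win

print(initial_rate_bps(500, 0.2))  # 500 bytes every 200 msec, about 20 kbps
print(after_one_rtt(4000, 1000))   # 4 ACKs arrive, window goes 4000 -> 8000
```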

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
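The different reactions to the two kinds of loss event can be sketched as one function. The Reno branch follows the slides. The Tahoe branch (reset to 1 MSS on every loss event) is not spelled out in the slides and is included here from general knowledge of the protocol, as an assumption. The function and argument names are hypothetical.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (new CongWin, new Threshold) after a loss event.

    event   : "timeout" or "3dupack"
    variant : "reno" (fast recovery) or "tahoe" (assumed: always restart
              from 1 MSS, since fast recovery was added in Reno)
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError("unknown loss event: " + event)
    threshold = max(cong_win // 2, mss)  # half the window, at least 1 MSS
    if event == "timeout" or variant == "tahoe":
        return mss, threshold            # back to slow start
    return threshold, threshold          # cut in half, then grow linearly

print(on_loss(16000, 1000, "3dupack"))           # (8000, 8000)
print(on_loss(16000, 1000, "timeout"))           # (1000, 8000)
print(on_loss(16000, 1000, "3dupack", "tahoe"))  # (1000, 8000)
```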

Summary: TCP Congestion Control

[State diagram with three states: slow start, congestion avoidance, fast recovery. Transitions are listed below as event: actions. Λ marks a transition with no event or no action.]

Initial transition (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start
• duplicate ACK: dupACKcount++
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
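The state diagram summarized on this slide can be sketched as a small class. This is an illustration rather than a faithful TCP implementation: it tracks only cwnd, ssthresh and the duplicate-ACK counter, does no actual transmission, uses a made-up MSS of 1000 bytes, and reads the "+ 3" in the diagram as 3 MSS (an assumption). The class and method names are hypothetical.

```python
MSS = 1000  # bytes; illustrative value, not from the slides

class RenoSender:
    """Congestion-control state only: no segments are actually sent."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh           # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                    # doubles cwnd per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                    # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS  # "+ 3" read as 3 MSS
            self.state = "fast_recovery"         # retransmit would go here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                # retransmit would go here

s = RenoSender()
s.on_new_ack()
for _ in range(3):
    s.on_dup_ack()
print(s.state, s.cwnd, s.ssthresh)
```

Keeping one method per event mirrors the diagram: each arrow becomes a branch on the current state inside the handler for its event.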

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT

[Figure: window oscillating between W/2 and W]
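The 0.75 W/RTT figure is the midpoint of a window that climbs linearly from W/2 to W. The sketch below computes it in closed form and checks it against a sampled sawtooth. It assumes the idealized model of this slide (steady state, no slow start); the function names are hypothetical.

```python
def average_throughput(w, rtt):
    """Average throughput when the window climbs linearly from W/2 to W."""
    return (w / 2 + w) / 2 / rtt

def sawtooth_mean(w):
    """Mean window over one sawtooth, sampled once per RTT (w in segments)."""
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)

print(average_throughput(100, 1.0))  # 75.0, i.e. 0.75 * W / RTT
print(sawtooth_mean(100))            # 75.0
```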

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• To reach 10 Gbps: $L = 2 \cdot 10^{-10}$, which is very low
• Requires an almost perfect link
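The numbers in this example can be reproduced with two short calculations: the window needed to fill the pipe, and the loss rate obtained by solving throughput = 1.22 · MSS / (RTT · sqrt(L)) for L. The function names are hypothetical; the inputs are the ones from the slide.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Loss rate L that yields rate_bps, from
    throughput = 1.22 * MSS / (RTT * sqrt(L)) solved for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
print(required_loss_rate(10e9, 0.1, 1500))        # about 2e-10
```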



Congestion Control Approaches

Two broad classes:

End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

Network-assisted congestion control
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control

• end-end control (no network assistance)
• sender limits transmission:

  $$LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$$

• CongWin and RTT influence throughput:

  $$\text{rate} = \frac{CongWin}{RTT} \ \text{bytes/sec}$$

• CongWin is a dynamic function of perceived network congestion

How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)

Three mechanisms:
• AIMD
• slow start
• fast recovery
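The sender-side limit can be written as a small helper. This is a sketch with made-up function and parameter names; all quantities are in bytes.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit while keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one CongWin per RTT."""
    return cong_win / rtt


print(usable_window(12000, 8000, cong_win=8000, rcv_window=16000))  # 4000
```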

TCP AIMD

• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: congestion window (axis ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
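A per-RTT trace of the two AIMD rules, as a minimal sketch. The function name, the set of loss rounds, and the 1-MSS floor are assumptions for illustration.

```python
def aimd_trace(cong_win, mss, loss_rounds, rounds):
    """CongWin (bytes) after each RTT; loss_rounds holds the RTT indices
    at which a loss event is assumed to occur."""
    trace = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cong_win = max(mss, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += mss                     # additive increase
        trace.append(cong_win)
    return trace


print(aimd_trace(8000, 1000, {3}, 6))
# [9000, 10000, 11000, 5500, 6500, 7500]
```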

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• Available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
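The 20 kbps figure is MSS/RTT expressed in bits per second. The sketch below also counts how many doublings are needed to reach a target rate. The helper names and the 10 Mbps target are illustrative assumptions.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """Rate when exactly one MSS is sent per RTT."""
    return mss_bytes * 8 / rtt_s


def rtts_to_reach(target_bps, mss_bytes, rtt_s):
    """Number of per-RTT doublings before the rate first reaches target_bps."""
    segments, rtts = 1, 0
    while segments * mss_bytes * 8 / rtt_s < target_bps:
        segments *= 2
        rtts += 1
    return rtts


print(round(initial_rate_bps(500, 0.2)))   # 20000
print(rtts_to_reach(10e6, 500, 0.2))       # 9
```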

TCP Slow Start (more)

• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B, with one RTT marked on the time axis]
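Why a per-ACK increment doubles the window each RTT: every segment sent in a round produces one ACK, and every ACK adds one MSS. A minimal sketch, assuming no loss and no threshold; the function name is made up.

```python
def slow_start_rounds(mss, rounds):
    """CongWin (bytes) at the start of each RTT during slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss      # one ACK per segment sent this round
        for _ in range(acks):
            cong_win += mss         # per-ACK increment of one MSS
        history.append(cong_win)
    return history


print(slow_start_rounds(500, 4))  # [500, 1000, 2000, 4000, 8000]
```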

TCP Tahoe vs Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
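The loss reaction can be condensed into one function. This is a simplified sketch; it assumes that Tahoe, lacking fast recovery, treats a triple-ack like a timeout. The names are hypothetical.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (cong_win, threshold) after a loss event.

    event is "timeout" or "triple_ack"; variant is "tahoe" or "reno".
    """
    threshold = cong_win // 2
    if event == "triple_ack" and variant == "reno":
        return threshold, threshold   # fast recovery: continue linearly
    return mss, threshold             # restart from 1 MSS in slow start


print(on_loss(16000, 1000, "triple_ack", "reno"))   # (8000, 8000)
print(on_loss(16000, 1000, "triple_ack", "tahoe"))  # (1000, 8000)
print(on_loss(16000, 1000, "timeout", "reno"))      # (1000, 8000)
```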

Timeouts vs 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[Figure: sender state machine with states slow start, congestion avoidance, and fast recovery. Transitions are listed below as event: actions; Λ marks an empty event or action.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• duplicate ACK: dupACKcount++
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
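The state machine on this slide can be sketched as a small class. It is a teaching model, not a real TCP stack: retransmission and actual segment transmission are left out, the class and method names are invented, and "ssthresh + 3" is read as ssthresh + 3 MSS (an assumption).

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = 64 * 1024       # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh   # deflate the window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss       # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss       # inflate the window per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:          # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(mss=1000)
for _ in range(3):
    s.on_new_ack()                      # cwnd grows from 1000 to 4000
for _ in range(3):
    s.on_dup_ack()                      # third duplicate enters fast recovery
print(s.state, s.cwnd, s.ssthresh)      # fast_recovery 5000.0 2000.0
```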

Summary: TCP Congestion Control

• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |

Page 59: CS 4284 Operating Systems

CS 4284 Spring 2013

TCP Slow Start (more)

bull When connection begins

increase rate

exponentially until first

loss event

ndash double CongWin every

RTT

ndash done by incrementing CongWin for every ACK

received

bull Summary initial rate is

slow but ramps up

exponentially fast

Host A

RT

T

Host B

time

CS 4284 Spring 2013

TCP Tahoe vs Reno

Q When should the exponential increase switch to linear

A When CongWin gets to 12 of its value before timeout

Implementation

bull Variable Threshold

bull At loss event Threshold is set to 12 of CongWin just before loss event

bull If timeout event set CongWin to 1

bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

CS 4284 Spring 2013

Timeouts vs 3-dup ACKs

bull After 3 dup ACKs

ndash CongWin is cut in half

ndash window then grows linearly

bull But after timeout event

ndash CongWin instead set to 1 MSS

ndash window then grows exponentially

ndash to a threshold then grows linearly

bull 3 dup ACKs indicates

network capable of

delivering some segments

bull timeout before 3 dup

ACKs received is stronger

ldquomore alarmingrdquo indicator

of congestion

Rationale

CS 4284 Spring 2013

Summary TCP Congestion Control

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0

retransmit missing segment

L

cwnd gt ssthresh

congestion

avoidance

cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fast

recovery

cwnd = cwnd + MSS transmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2 cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeout

ssthresh = cwnd2 cwnd = 1 dupACKcount = 0

retransmit missing segment

ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment

dupACKcount == 3 cwnd = ssthresh dupACKcount = 0

New ACK

slow

start

timeout

ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment

cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed

new ACK dupACKcount++

duplicate ACK

L

cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0

New ACK

New ACK

New ACK

CS 4284 Spring 2013

Summary TCP Congestion Control

bull When CongWin is below Threshold sender in

slow-start phase window grows exponentially

bull When CongWin is above Threshold sender is in

congestion-avoidance phase window grows linearly

bull When a triple duplicate ACK occurs Threshold set

to CongWin2 and CongWin set to Threshold

bull When timeout occurs Threshold set to CongWin2

and CongWin is set to 1 MSS

CS 4284 Spring 2013

TCP Sender Congestion Control Event State TCP Sender Action Commentary

ACK receipt for

previously

unacked data

Slow Start

(SS)

CongWin = CongWin + MSS

If (CongWin gt Threshold)

set state to CA

Resulting in a doubling of

CongWin every RTT

ACK receipt for

previously

unacked data

Congestion

Avoidance

(CA)

CongWin = CongWin+MSS

(MSSCongWin)

Additive increase resulting in

increase of CongWin by 1

MSS every RTT

Triple duplicate

ACK

SS or CA Threshold = CongWin2

CongWin = Threshold

Set state to CA

Fast recovery implementing

multiplicative decrease

CongWin will not drop below

1 MSS

Timeout SS or CA Threshold = CongWin2

CongWin = 1 MSS

Set state to ldquoSlow Startrdquo

Enter slow start

Duplicate ACK SS or CA Increment duplicate ACK

count for segment being

acked

CongWin and Threshold not

changed

CS 4284 Spring 2013

TCP Throughput Idealized

bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start

bull When window is W throughput is WRTT

bull Just after loss window drops to W2 throughput to W2RTT

bull Average steady-state throughput 75 WRTT

W

W2

CS 4284 Spring 2013

TCP Throughput amp Loss

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull Throughput in terms of loss rate

bull L = 210-10 Very low

bull Require almost perfect link

LRTT

MSS221

Equation-Based Control

bull Note TCP congestion control forms

control loop

ndash Inputs round-trip time ldquoloss eventsrdquo (which

are samples of timeout events + 3-ack events)

bull Instead equation-based control uses an

equation to compute sending rate based

on this input

bull See RFC 5348 for more info

CS 4284 Spring 2013

TCP Fairness

CS 4284 Spring 2013

Fairness goal if k TCP sessions share same

bottleneck link of bandwidth R each should

have average rate of Rk

TCP connection 1

bottleneck router

capacity R

TCP connection 2

TCP Fairness

CS 4284 Spring 2013

Why is TCP fair

Consider two competing sessions

bull additive increase gives slope of 1 as throughput increases

bull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

congestion avoidance additive increase

loss decrease window by factor of 2

congestion avoidance additive increase

loss decrease window by factor of 2

Fairness (more)

Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness

Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP connection, gets rate R/10
  – new app asks for 9 TCP connections, gets R/2
• That's what "download accelerators" do
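The arithmetic behind the parallel-connection example, assuming the bottleneck splits its rate equally among all connections (the helper name is hypothetical):

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R won by an app that opens new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```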

TCP Tahoe vs. Reno

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
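The loss reaction described above can be sketched as a single function. Names and the MSS-based units are assumptions for illustration; the Tahoe branch reflects that Tahoe, lacking fast recovery, restarts from 1 MSS after any loss.

```python
def on_loss(cong_win: float, event: str, variant: str = "reno"):
    """Return (new CongWin, new Threshold) in MSS units after a loss event."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    threshold = cong_win / 2            # half of CongWin just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1.0, threshold           # back to 1 MSS, then slow start
    return threshold, threshold         # Reno fast recovery: continue linearly


print(on_loss(16, "triple_dup_ack", "reno"))   # (8.0, 8.0)
print(on_loss(16, "triple_dup_ack", "tahoe"))  # (1.0, 8.0)
print(on_loss(16, "timeout", "reno"))          # (1.0, 8.0)
```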

Timeouts vs. 3-dup ACKs

• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin is instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly

Rationale
• 3 dup ACKs indicate the network is capable of delivering some segments
• A timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control

[State diagram, given here as a transition list. Λ marks a transition with no triggering event or no action.]

Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start

Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: Λ → congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start

Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
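The state machine above can be sketched as a small class. This is a simplified model under stated assumptions: the window is counted in MSS units, the initial ssthresh of 64 is a placeholder for the slide's 64 KB, and retransmission and segment transmission are omitted. The class and method names are hypothetical.

```python
MSS = 1.0  # window counted in MSS units (assumption)


class RenoSender:
    def __init__(self, ssthresh: float = 64.0):
        self.state = "slow_start"
        self.cwnd = 1.0 * MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                       # doubles cwnd per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                       # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"           # retransmit would go here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0 * MSS
        self.dup_acks = 0
        self.state = "slow_start"                  # retransmit would go here


s = RenoSender(ssthresh=8.0)
for _ in range(8):
    s.on_new_ack()
print(s.state, s.cwnd)  # congestion_avoidance 9.0
```

Each event handler corresponds to one group of arrows in the diagram, which makes it easy to compare the code against the transition list line by line.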

Summary: TCP Congestion Control

• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
