Advanced Networks 2002
Transport Layer
Michalis Faloutsos
Many slides from Kurose-Ross

Transport Layer Functionality
Hide network from application layer
Transport layer resides at end points
Sees the network as a black box
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

Transport Layers of the Internet
TCP: reliable protocol
• Guarantees end-to-end delivery
• Self-controls rate: congestion and flow control
• Connection oriented: handshake, state
• Ordered delivery of packets to application
UDP: unreliable protocol
• Non-regulated sending rate
• Multiplexing-demultiplexing

TCP: What and How
For more: RFCs 793, 1122, 1323, 2018, 2581
full duplex data:
• bi-directional data flow in same connection
• MSS: maximum segment size
connection-oriented:
• handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
flow controlled:
• sender will not overwhelm receiver
point-to-point:
• one sender, one receiver
reliable, in-order byte stream:
• no “message boundaries”
pipelined:
• TCP congestion and flow control set window size
send & receive buffers
[Figure: application writes data -> socket door -> TCP send buffer -> segment -> TCP receive buffer -> socket door -> application reads data]

TCP segment structure
Header layout (each row is 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
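
The header layout above can be decoded in a few lines. The following is a minimal sketch, not a full parser: it reads only the fixed 20-byte part of the header and skips options. The function name `parse_tcp_header` and the sample values are assumptions made for illustration.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "head len / flags" word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper; options are skipped)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # "!" = network byte order; H = 16-bit field, I = 32-bit field.
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    header_len = (off_flags >> 12) * 4  # head len counts 32-bit words
    flags = sorted(name for name, bit in FLAG_BITS.items() if off_flags & bit)
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "window": window,
        "checksum": checksum, "urgent_ptr": urgent,
        "data": segment[header_len:],
    }


# Made-up segment: ports 1234 -> 23, Seq=42, ACK=79, PSH+ACK set, one data byte.
raw = struct.pack("!HHIIHHHH", 1234, 23, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"C"
hdr = parse_tcp_header(raw)
```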

TCP overview
TCP is a sliding window protocol
• Sender can have (Window) bytes in flight
Operates with cumulative ACKs
It includes control for the sending rate
• Flow control: receiver-set sending rate
• Congestion control: network-aware sending rate
[Figure: Congwin]

TCP seq. #’s and ACKs
Seq. #’s:
• byte stream “number” of first byte in segment’s data
ACKs:
• seq # of next byte expected from other side
• cumulative ACK
Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, up to implementor
[Figure: simple telnet scenario, time running downward]
Host A -> Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)
Host B -> Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
Host A -> Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
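
The numbers in the telnet scenario follow one rule: a reply's Seq is the ACK it just received, and its ACK is the received Seq plus the number of data bytes. A small sketch of that rule, with `reply_numbers` as a hypothetical helper name:

```python
def reply_numbers(seq, ack, data_len):
    """Return (seq, ack) for the segment sent in reply to one carrying data_len bytes."""
    reply_seq = ack             # continue our own byte stream where the peer expects it
    reply_ack = seq + data_len  # next byte we expect from the peer
    return reply_seq, reply_ack


a_first = (42, 79)                   # Host A: Seq=42, ACK=79, 1 byte ('C')
b_echo = reply_numbers(*a_first, 1)  # Host B ACKs 'C' and echoes it back
a_ack = reply_numbers(*b_echo, 1)    # Host A ACKs the echoed 'C'
print(b_echo, a_ack)
```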

TCP in a nutshell
I. Slow start phase (actually this is fast increase)
• Start with a window of 1 (or 2)
• Successful ACK: increase window by 1 max-size segment
• Do this up to a threshold: ssthresh
II. Congestion control phase
• Increase window by 1 max-size segment every RTT
• Drop window in half if there is congestion
Congestion is detected through:
• Packet loss: duplicate ACKs
• Timer expiration

TCP Congestion Control
end-end control (no network assistance)
transmission rate limited by congestion window size, Congwin, over segments:
[Figure: Congwin]
w segments, each with MSS bytes, sent in one RTT:
throughput = (w * MSS) / RTT bytes/sec

TCP congestion control: Intuition
TCP is “probing” for usable bandwidth:
ideally: transmit as fast as possible (Congwin as large as possible) without loss
increase Congwin until loss (congestion)
loss: decrease Congwin, then begin probing (increasing) again

TCP congestion control:
TCP has two “phases”
• slow start: start from small, increase quickly
• congestion avoidance: Additive Increase Multiplicative Decrease
important variables:
• Congwin
• threshold: defines the boundary between the slow start phase and the congestion control phase

TCP Slowstart
Slowstart algorithm
initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR Congwin > threshold)
exponential increase (per RTT) in window size
loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
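
To see why a per-ACK increment produces doubling per RTT, the slowstart pseudocode can be traced round by round. This is a simplified sketch that assumes no loss and one ACK per segment; `slow_start` is a hypothetical name.

```python
def slow_start(threshold):
    """Return Congwin (in segments) at the start of each RTT during slow start."""
    congwin = 1
    per_rtt = [congwin]
    while congwin <= threshold:
        acks = congwin  # assumption: every segment sent this RTT is ACKed
        for _ in range(acks):
            congwin += 1  # Congwin++ for each segment ACKed
            if congwin > threshold:
                break  # slow start ends as soon as the threshold is passed
        per_rtt.append(congwin)
    return per_rtt


print(slow_start(8))
```

With a threshold of 8 segments, the window goes 1, 2, 4, 8 and then stops at 9, the first value above the threshold.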

Why Call it Slow Start ?
The original version of TCP suggested that the sender transmit as much as the Advertised Window permitted.
Routers may not be able to cope with this “burst” of transmissions.
Slow start is slower than the above version -- ensures that a transmission burst does not happen at once.

TCP Congestion Avoidance
Congestion avoidance
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
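
The two phases can be combined into a per-RTT trace. The sketch below is a coarse approximation of the Tahoe behaviour described above (window counted in whole segments, loss decided by the caller, threshold never below 1); `tahoe_rtt` and the starting values are assumptions, not part of the slides.

```python
def tahoe_rtt(congwin, threshold, loss):
    """Advance one RTT; return the new (congwin, threshold) in segments."""
    if loss:
        # threshold = Congwin/2, Congwin = 1, then slow start again.
        return 1, max(congwin // 2, 1)
    if congwin <= threshold:
        # Slow start: doubling, stopping just past the threshold.
        return min(2 * congwin, threshold + 1), threshold
    # Congestion avoidance: one extra segment per RTT.
    return congwin + 1, threshold


congwin, threshold = 1, 8
trace = []
for rtt in range(12):
    trace.append(congwin)
    congwin, threshold = tahoe_rtt(congwin, threshold, loss=(rtt == 7))
print(trace)
```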

TCP Congestion: Real Life is Hairy!
Remember: bytes vs packets!
CW += MSS * MSS/CW
Thres = Max(2 * MSS, InFlightData/2)
MSS: max segment size
InFlightData: un-ACK-ed data
Congestion avoidance
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart
RFC 2581: TCP Congestion Control
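
In byte terms, each ACK adds MSS * MSS/CW, so a full window of ACKs adds roughly one MSS per RTT. A short sketch of the two byte-based formulas, with hypothetical function names and an assumed MSS of 1460 bytes:

```python
def on_ack_congestion_avoidance(cw, mss):
    """Per-ACK window growth in bytes: CW += MSS * MSS / CW."""
    return cw + mss * mss / cw


def threshold_after_loss(in_flight, mss):
    """Thres = Max(2 * MSS, InFlightData / 2), all in bytes."""
    return max(2 * mss, in_flight / 2)


mss = 1460
cw = 10 * mss  # window of 10 segments
start = cw
for _ in range(10):  # one ACK per segment in the window, i.e. about one RTT
    cw = on_ack_congestion_avoidance(cw, mss)
growth = cw - start  # slightly under one MSS
```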

TCP Fairness and AIMD
Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity
TCP congestion avoidance:
AIMD: additive increase, multiplicative decrease
• increase window by 1 per RTT
• decrease “window” by factor of 2 on loss event
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
Two competing sessions:
Additive increase gives slope of 1, as throughput increases
Multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (x-axis, 0 to R) vs. Connection 2 throughput (y-axis, 0 to R), with the “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
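
The fairness argument can be checked numerically: additive increase leaves the gap between the two sessions unchanged, while each multiplicative decrease halves it. The sketch below assumes both sessions see every loss at the same time and share one RTT, which is an idealization; `aimd_two_flows` is a hypothetical name.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two throughputs under synchronized AIMD; return the final pair."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase: the gap is unchanged
    return x1, x2


x1, x2 = aimd_two_flows(1, 80, capacity=100, rounds=300)
print(abs(x1 - x2))
```

Starting from a very unequal split (1 versus 80), the difference shrinks toward zero, which is the movement toward the equal bandwidth share line in the figure.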

Macroscopic Description of Throughput
Assume window toggling: W/2 to W
High rate: W * MSS / RTT
Low rate: W * MSS / (2 * RTT)
Rate increases linearly between the two extremes
Average throughput:
• 0.75 * W * MSS / RTT
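
Because the rate climbs linearly between the two extremes, the average is simply their midpoint, which gives the 0.75 factor. A quick numeric check with made-up values (W = 20 segments, MSS = 1000 bytes, RTT = 0.5 s); the function name is hypothetical:

```python
def average_throughput(w, mss, rtt):
    """Midpoint of the low and high rates, in bytes/sec."""
    high = w * mss / rtt
    low = w * mss / (2 * rtt)
    return (low + high) / 2


avg = average_throughput(20, 1000, 0.5)
print(avg, 0.75 * 20 * 1000 / 0.5)
```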

TCP: reliable data transfer
Simplified sender, assuming:
• one way data transfer
• no flow, congestion control
[Figure: sender state machine with a single “wait for event” state]
event: data received from application above -> create, send segment
event: timer timeout for segment with seq # y -> retransmit segment
event: ACK received, with ACK # y -> ACK processing

TCP sender
Simplified TCP sender
sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch(event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* TCP fast retransmit */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
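
The three events of the simplified sender can be tried out without a network. The sketch below is one possible rendering, under loud assumptions: timers are replaced by a dictionary of unACKed segments, "pass segment to IP" is a list append, and the class and method names are invented for illustration.

```python
class SimpleTcpSender:
    """Event handlers of the simplified sender; timers and IP are stubbed out."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}  # seq -> data; stands in for the per-segment timers
        self.dup_acks = 0
        self.sent = []     # log of (seq, data) handed to "IP"

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data       # "start timer" for this segment
        self.sent.append((seq, data))  # "pass segment to IP"
        self.nextseqnum += len(data)

    def on_timeout(self, seq):
        if seq in self.unacked:
            self.sent.append((seq, self.unacked[seq]))  # retransmit

    def on_ack(self, y):
        if y > self.sendbase:
            # Cumulative ACK: everything below y is delivered.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
            self.dup_acks = 0
        else:
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.sent.append((y, self.unacked[y]))  # fast retransmit


sender = SimpleTcpSender(92)
sender.on_app_data(b"12345678")  # Seq=92, 8 bytes
sender.on_app_data(b"x" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)               # first segment ACKed
for _ in range(3):
    sender.on_ack(100)           # three duplicate ACKs trigger fast retransmit
```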

TCP Receiver: ACK generation [RFC 1122, RFC 2581]
Event: in-order segment arrival, no gaps, everything else already ACKed
TCP Receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
TCP Receiver action: immediately send single cumulative ACK
Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills gap
TCP Receiver action: immediate ACK if segment starts at lower end of gap
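
TCP: retransmission scenarios
[Figure: lost ACK scenario, time running downward]
Host A -> Host B: Seq=92, 8 bytes data
Host B -> Host A: ACK=100 (lost, marked X)
Host A: timeout
Host A -> Host B: Seq=92, 8 bytes data (retransmission)
Host B -> Host A: ACK=100
[Figure: premature timeout, cumulative ACKs, time running downward]
Host A -> Host B: Seq=92, 8 bytes data
Host A -> Host B: Seq=100, 20 bytes data
Host B -> Host A: ACK=100
Host B -> Host A: ACK=120
Host A: Seq=92 timeout (before ACK=100 arrives)
Host A -> Host B: Seq=92, 8 bytes data (retransmission)
Host B -> Host A: ACK=120
Host A: Seq=100 timeout interval also shown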

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
longer than RTT
• note: RTT will vary
too short: premature timeout
• unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
• ignore retransmissions, cumulatively ACKed segments
SampleRTT will vary, want estimated RTT “smoother”
• use several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Exponential weighted moving average
influence of given sample decreases exponentially fast
typical value of x: 0.1
Setting the timeout
EstimatedRTT plus “safety margin”
large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
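
The three formulas translate directly into a small estimator. This is a sketch under stated assumptions: the estimate is seeded with the first sample, Deviation starts at 0, and the difference is taken against the estimate from before the update. `RttEstimator` is a hypothetical class name.

```python
class RttEstimator:
    """EWMA of RTT samples plus a deviation-based safety margin."""

    def __init__(self, first_sample, x=0.1):
        self.x = x
        self.estimated_rtt = first_sample  # assumption: seed with the first sample
        self.deviation = 0.0               # assumption: no variation seen yet

    def update(self, sample_rtt):
        # Assumption: |SampleRTT - EstimatedRTT| uses the pre-update estimate.
        diff = abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        self.deviation = (1 - self.x) * self.deviation + self.x * diff
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0, x=0.5)  # x=0.5 only to keep the numbers round
timeout_ms = est.update(200.0)
```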

A problem
[Figure: two Sender/Receiver timelines, (a) and (b), each showing an original transmission, a retransmission, and an ACK]
• When there are retransmissions, it is unclear if the ACK is for the original transmission or for a retransmission.
• How do we overcome this ?

The Karn/Partridge Algorithm
Take SampleRTT measurements only for segments that have been sent once !
This eliminates the possibility that wrong RTT estimates are factored into the estimation.
Another change -- Each time TCP retransmits, it sets the next timeout to 2 X Last timeout --> This is called the Exponential Back-off (primarily for avoiding congestion).
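
Both rules fit in a few lines. The sketch below is illustrative only: `KarnTimer` and its method names are invented, and a real stack would feed the accepted samples into its RTT estimator rather than a list.

```python
class KarnTimer:
    """Sample RTT only for segments sent once; double the timeout on retransmit."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.samples = []  # accepted SampleRTT values

    def on_retransmit(self):
        self.timeout *= 2  # exponential back-off: 2 X last timeout

    def on_ack(self, measured_rtt, transmissions):
        # An ACK for a retransmitted segment is ambiguous, so it is ignored.
        if transmissions == 1:
            self.samples.append(measured_rtt)


timer = KarnTimer(timeout=1.0)
timer.on_retransmit()
timer.on_retransmit()
timer.on_ack(0.3, transmissions=2)  # ambiguous, not sampled
timer.on_ack(0.2, transmissions=1)  # sampled
```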

Jacobson Karels Algorithm
An issue with the Karn/Partridge scheme is that it does not take into account the variation between RTT samples.
New method proposed -- the Jacobson Karels Algorithm.
Estimated RTT = Estimated RTT + δ X Difference
• Difference = Sample RTT - Estimated RTT
Deviation = Deviation + δ X (|Difference| - Deviation)
Timeout = μ X Estimated RTT + φ X Deviation
δ is a gain between 0 and 1 (the role played by x in the earlier formulas).
The values of μ and φ are computed based on experience -- typically μ = 1 and φ = 4.

Silly Window Syndrome
Suppose an MSS worth of data is collected and advertised window is MSS/2.
What should the sender do ? -- transmit half full segments or wait to send a full MSS when window opens ?
Early implementations were aggressive -- transmit MSS/2.
Aggressively doing this would consistently result in small segment sizes -- called the Silly Window Syndrome.

Issues ..
We cannot eliminate the possibility of small segments being sent.
However, we can introduce methods to coalesce small chunks.
• Delaying ACKs -- receiver does not send ACKs as soon as it receives segments.
How long to delay ? Not very clear.
• Ultimate solution falls to the sender -- when should I transmit ?

Nagle’s Algorithm
If sender waits too long --> bad for interactive connections.
If it does not wait long enough -- silly window syndrome.
How do we solve this?
Timer -- clock based
• If both available data and Window ≥ MSS, send full segment.
• Else, if there is unACKed data in flight, buffer new data until ACK returns.
• Else, send new data now.
Note -- Socket interface allows some applications to turn off Nagle’s algorithm by setting the TCP_NODELAY option.
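
The three-way decision above can be written as a small function, and the socket option can be set through Python's standard `socket` module. `nagle_decision` and `disable_nagle` are hypothetical helper names; the decision function only models the rule and does not touch a real connection.

```python
import socket


def nagle_decision(available, window, mss, unacked_in_flight):
    """Apply the three rules in order; sizes are in bytes."""
    if available >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK"
    return "send now"


def disable_nagle(sock):
    """Turn off Nagle's algorithm on a TCP socket via TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


print(nagle_decision(available=100, window=4096, mss=1460, unacked_in_flight=True))
```

An interactive application such as a remote shell would typically call `disable_nagle` once, right after creating or accepting the socket.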

TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
initialize TCP variables:
• seq. #s
• buffers, flow control info (e.g. RcvWindow)
client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);
server: contacted by client
Socket connectionSocket = welcomeSocket.accept();

TCP Set-up
Three way handshake:
Step 1: client end system sends TCP SYN control segment to server
• specifies initial seq #
Step 2: server end system receives SYN, replies with SYNACK control segment
• ACKs received SYN
• allocates buffers
• specifies server-to-client initial seq. #
Step 3: Client replies with an ACK (using server’s seq number)
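
The sequence and acknowledgement numbers in the three steps can be written out explicitly. The sketch relies on the fact that a SYN consumes one sequence number; the function name, the dictionary layout, and the initial sequence numbers are made up for illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments as dictionaries, in order."""
    syn = {"flags": "SYN", "seq": client_isn}
    # The SYN counts as one byte, so the server expects client_isn + 1 next.
    synack = {"flags": "SYN+ACK", "seq": server_isn, "ack": client_isn + 1}
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```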

TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Last ACK is never ACK-ed!!
[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close); client replies with ACK and enters timed wait; connection closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
• Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Last ACK is never ACK-ed
[Figure: client (closing) sends FIN; server replies with ACK, then sends its own FIN (closing); client replies with ACK and enters timed wait; both sides end in closed]