1
CSE 524: Lecture 12
Transport Layer (Part 3)
2
Transport Layer
• Last class
  – CIDR exam question
  – Specific transport layers
    • UDP
• This class
  – TCP
3
TL: TCP and Transport Layer Functions
• Demux to upper layer
• Quality of service
• Security
• Delivery semantics
• Flow control
• Congestion control
• Reliable data transfer
4
TL: TCP Overview RFCs: 793, 1122, 1323, 2018, 2581
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
  – protocol implemented at ends (“fate-sharing”)
• flow and congestion controlled:
  – sender will not overwhelm receiver or network
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer; application reads data through the socket door]
5
TL: TCP header
[Figure: TCP segment format, 32 bits wide, one row per line]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  application data (variable length)

Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
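The header layout above can be decoded with a few lines of code. The following is a minimal sketch, assuming the standard 20-byte fixed header with no interpretation of options; the function name `parse_tcp_header` and the sample segment are hypothetical and only for illustration.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4  # head len is counted in 32-bit words
    flags = offset_flags & 0x3F            # low six bits hold U A P R S F
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)],
        "window": window, "checksum": checksum, "urgent_ptr": urgent,
        "payload": segment[header_len:],   # application data follows any options
    }

# Hypothetical SYN segment: port 1234 -> 80, seq 42, no options, no data.
example = struct.pack("!HHIIHHHH", 1234, 80, 42, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(example)
```

The "!" in the format string selects network byte order, which is how the fields appear on the wire.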
6
TL: TCP connections
• TCP sender, receiver establish “connection” before exchanging data segments
  – initialize TCP variables:
    • Initial sequence #s
    • Buffers, flow control info (e.g. RcvWindow)
    • Window scaling
• client: connection initiator
• server: contacted by client
• Java API
  – client: Socket clientSocket = new Socket("hostname", portNumber);
  – server: Socket connectionSocket = welcomeSocket.accept();
7
TL: TCP connections
• Three way handshake:
  – Step 1: client end system sends TCP SYN control segment to server
    • specifies initial seq #
    • should be random to prevent spoofing ( http://www.rfc-editor.org/rfc/rfc1948.txt )
  – Step 2: server end system receives SYN, replies with SYNACK control segment
    • ACKs received SYN
    • allocates buffers
    • specifies server's initial seq. #
  – Step 3: client receives SYNACK control segment, replies with ACK and potentially data
    • ACKs received SYNACK
    • goes to established state
8
TL: TCP Connection Establishment
• A and B must agree on initial sequence number selection
• 3-way handshake
[Figure: 3-way handshake between A and B]
  A → B: SYN + Seq A
  B → A: SYN + ACK-A + Seq B
  A → B: ACK-B
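The sequence and acknowledgement numbers carried by the three handshake segments can be sketched as follows. This is an illustration only: the function name and tuple layout are hypothetical, the random ISN stands in for the more careful scheme of RFC 1948, and it relies on the fact that a SYN consumes one sequence number.

```python
import secrets

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three control segments as (sender, flags, seq, ack) tuples."""
    if client_isn is None:
        client_isn = secrets.randbits(32)  # unpredictable ISN (assumption: plain random)
    if server_isn is None:
        server_isn = secrets.randbits(32)
    syn = ("A", "SYN", client_isn, None)
    # B acknowledges A's SYN: next byte expected is client_isn + 1
    syn_ack = ("B", "SYN+ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # A acknowledges B's SYN and moves to the established state
    ack = ("A", "ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=100, server_isn=300)
```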
9
TL: TCP Sequence Number Selection
• Why not simply choose 0?
• Must avoid overlap with earlier incarnation
• Client machine seq #0, initiates connection to server with seq #0.
  – Client sends one byte and machine crashes
  – Client reboots and initiates connection again
  – Server thinks new incarnation is the same as old connection
10
TL: TCP Sequence Number Selection
• Why is selecting a random ISN important?
• Suppose machine X selects ISN based on predictable sequence
• Fred has .rhosts to allow login to X from Y
• Evil Ed attacks
  – Disables host Y (denial of service attack)
  – Make a bunch of connections to host X
  – Determine ISN pattern and guess next ISN
  – Fake pkt1: [<src Y><dst X>, guessed ISN]
  – Fake pkt2: desired command
  – Attack popularized by K. Mitnick
11
TL: TCP ISN selection and spoofing attacks
[Figure: Ed (attacker), host Y (trusted), host X (target; .rhosts lists Y)]
1. Flood Y continuously
2. Spoof TCP SYN from Y, with spoofed Y ISN
3. X sends TCP SYNACK (ACK of spoofed Y ISN, plus X ISN) to Y: PACKET DROPPED!
4. Send ACK with guess of X’s ISN, as if you received TCP SYNACK
5. Send pre-canned rlogin/rsh messages (rsh echo “Ed” >> .rhosts), spoof acknowledgements
6. Real acks dropped, so Y does not reset connection
7. Door now open, rlogin to X from Ed directly
12
TL: TCP connection setup
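[Figure: TCP connection setup state diagram; transitions written as event / action]
  CLOSED → LISTEN: passive OPEN / create TCB
  CLOSED → SYN SENT: active OPEN / create TCB, snd SYN
  LISTEN → SYN RCVD: rcv SYN / snd SYN ACK
  LISTEN → SYN SENT: APP SEND / snd SYN
  LISTEN → CLOSED: CLOSE / delete TCB
  SYN SENT → CLOSED: CLOSE / delete TCB
  SYN SENT → SYN RCVD: rcv SYN / snd ACK
  SYN SENT → ESTAB: rcv SYN, ACK / snd ACK
  SYN RCVD → ESTAB: rcv ACK of SYN
  SYN RCVD → (teardown): CLOSE / send FIN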
13
TL: TCP connections
Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs

Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
  – duplicate acks sent when out-of-order packet received

See web trace
Java API: connectionSocket.receive(); clientSocket.send();

[Figure: simple telnet scenario, time running downward]
  1. User types ‘C’. Host A → Host B: Seq=42, ACK=79, data = ‘C’
  2. Host B ACKs receipt of ‘C’, echoes back ‘C’. Host B → Host A: Seq=79, ACK=43, data = ‘C’
  3. Host A ACKs receipt of echoed ‘C’. Host A → Host B: Seq=43, ACK=80
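The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the received segment plus the number of data bytes it carried. A minimal sketch, with a hypothetical helper name:

```python
SEQ_SPACE = 2 ** 32

def ack_for(seq: int, data: bytes) -> int:
    """ACK number = sequence number of the next byte expected from the other side."""
    return (seq + len(data)) % SEQ_SPACE

# Host A -> Host B: Seq=42, data 'C'; B's reply acknowledges it.
ack_from_b = ack_for(42, b"C")
# Host B -> Host A: Seq=79, data 'C' (the echo); A's reply acknowledges it.
ack_from_a = ack_for(79, b"C")
# A pure ACK carries no data, so it does not advance the sequence number.
ack_of_empty = ack_for(43, b"")
```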
14
TL: TCP connections
Closing a connection:
Client-initiated close (reverse process for server-initiated close)
Java API: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
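[Figure: client/server timeline. client closes and sends FIN; server replies with ACK, closes and sends FIN; client replies with ACK and enters timed wait; connection closed]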
15
TL: TCP connections
Step 3: client receives FIN, replies with ACK.
– Enters “timed wait” - will respond with ACK to received FINs
Step 4: server, receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
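[Figure: client/server timeline. client (closing) sends FIN; server replies with ACK, then (closing) sends FIN; client replies with ACK and enters timed wait; server closed on receiving the ACK; client closed after timed wait]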
16
TL: TCP Connection Tear-down
[Figure: Sender/Receiver timeline with messages labeled FIN, FIN-ACK, FIN, FIN-ACK, Data write, Data ack]
17
TL: TCP Connection Tear-down
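[Figure: TCP connection tear-down state diagram; transitions written as event / action]
  ESTAB → FIN WAIT-1: CLOSE / send FIN
  ESTAB → CLOSE WAIT: rcv FIN / send ACK
  FIN WAIT-1 → FIN WAIT-2: rcv ACK of FIN
  FIN WAIT-1 → CLOSING: rcv FIN / snd ACK
  FIN WAIT-1 → TIME WAIT: rcv FIN+ACK / snd ACK
  FIN WAIT-2 → TIME WAIT: rcv FIN / snd ACK
  CLOSING → TIME WAIT: rcv ACK of FIN
  CLOSE WAIT → LAST-ACK: CLOSE / snd FIN
  LAST-ACK → CLOSED: rcv ACK
  TIME WAIT → CLOSED: Timeout=2msl / delete TCB
  (from connection setup) → FIN WAIT-1: CLOSE / send FIN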
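The tear-down states can be walked through with a small transition table. This is a sketch under the assumption that the arcs follow the RFC 793 state diagram; the table and function names are hypothetical.

```python
# (state, event) -> (action, next state)
TEARDOWN = {
    ("ESTAB", "CLOSE"): ("snd FIN", "FIN WAIT-1"),
    ("ESTAB", "rcv FIN"): ("snd ACK", "CLOSE WAIT"),
    ("FIN WAIT-1", "rcv ACK of FIN"): (None, "FIN WAIT-2"),
    ("FIN WAIT-1", "rcv FIN"): ("snd ACK", "CLOSING"),
    ("FIN WAIT-1", "rcv FIN+ACK"): ("snd ACK", "TIME WAIT"),
    ("FIN WAIT-2", "rcv FIN"): ("snd ACK", "TIME WAIT"),
    ("CLOSING", "rcv ACK of FIN"): (None, "TIME WAIT"),
    ("CLOSE WAIT", "CLOSE"): ("snd FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv ACK"): (None, "CLOSED"),
    ("TIME WAIT", "timeout 2MSL"): ("delete TCB", "CLOSED"),
}

def run(state, events):
    """Apply events in order; return the final state and the actions taken."""
    actions = []
    for event in events:
        action, state = TEARDOWN[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions

# The side that closes first passes through TIME WAIT.
active = run("ESTAB", ["CLOSE", "rcv ACK of FIN", "rcv FIN", "timeout 2MSL"])
# The side that receives the first FIN never enters TIME WAIT.
passive = run("ESTAB", ["rcv FIN", "CLOSE", "rcv ACK"])
```

Only the active closer pays the 2*MSL wait, which is why it matters which end closes first.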
18
TL: Time Wait Issues
• Cannot close connection immediately after receiving FIN
  – What if a new connection restarts and uses same sequence number?
• Web servers, not clients, close connection first
  – Established → Fin-Waits → Time-Wait → Closed
  – Why would this be a problem?
• Time-Wait state lasts for 2 * MSL
  – MSL should be 120 seconds (is often 60s)
  – Servers often have order of magnitude more connections in Time-Wait
19
TL: TCP connections
[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]
20
TL: TCP Demux to upper layer
multiplexing/demultiplexing:
• based on sender, receiver port numbers, IP addresses
  – source, dest port #s in each segment
  – recall: well-known port numbers for specific applications
  – Servers wait on well known ports (/etc/services)

Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message)]
21
TL: TCP Demux to upper layer
[Figure: port use, simple telnet app]
  host A → server B: source port: x, dest. port: 23
  server B → host A: source port: 23, dest. port: x

[Figure: port use, Web server]
  Web client host A → Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80
  Web client host C → Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80
  Web client host C → Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
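The Web server example works because TCP demultiplexes on the full (source IP, source port, dest IP, dest port) tuple, not on the destination port alone. A minimal sketch; the class name, socket labels and port values for x and y are hypothetical.

```python
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, int, str, int]  # (source IP, source port, dest IP, dest port)

class Demux:
    """Maps each TCP connection to the socket that should receive its data."""

    def __init__(self) -> None:
        self.connections: Dict[ConnKey, str] = {}

    def register(self, key: ConnKey, socket_name: str) -> None:
        self.connections[key] = socket_name

    def deliver(self, key: ConnKey) -> Optional[str]:
        # Unknown tuples return None; a real stack would typically answer with RST.
        return self.connections.get(key)

X, Y = 5000, 5001  # stand-ins for source ports x and y
demux = Demux()
demux.register(("A", X, "B", 80), "socket-1")
demux.register(("C", X, "B", 80), "socket-2")  # same source port, different host
demux.register(("C", Y, "B", 80), "socket-3")  # same host, different source port
```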
22
TL: TCP Flow control
• TCP is a sliding window protocol
  – For window size n, can send up to n bytes without receiving an acknowledgement
  – When the data is acknowledged then the window slides forward
• Each packet advertises a window size
  – Indicates number of bytes the receiver has space for
• Original TCP always sent entire window
  – Congestion control now limits this
23
TL: TCP Flow control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  – RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow

receiver buffering:
  RcvBuffer = size of TCP Receive Buffer
  RcvWindow = amount of spare room in Buffer
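The receiver and sender rules can be written out as arithmetic. This is a sketch only: the byte-counter names (last byte received, read, sent, ACKed) are assumptions borrowed from common textbook presentations and do not appear on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer: buffer size minus data not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    advertised_window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window

# Receiver: 4096-byte buffer, 3000 bytes received, application has read 1000.
window = rcv_window(4096, 3000, 1000)
# Sender: 1000 bytes already in flight against that window.
fits = sender_may_send(5000, 4000, window, 1000)
too_much = sender_may_send(5000, 4000, window, 1200)
```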
24
TL: TCP Flow control
• What happens if window is 0?
  – Receiver updates window when application reads data
  – What if this update is lost?
    • Deadlock
• TCP Persist timer
  – Sender periodically sends window probe packets
  – Receiver responds with ACK and up-to-date window advertisement
25
TL: TCP flow control enhancements
• Problem: (Clark, 1982)
  – If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
• What happens if window is small?
  – Small packet problem known as “Silly window syndrome”
    • Receiver advertises one byte window
    • Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
26
TL: TCP flow control enhancements
• Solutions to silly window syndrome
• Clark (1982)
  – receiver avoidance
  – prevent receiver from advertising small windows
  – increase advertised receiver window by min(MSS, RecvBuffer/2)
• Nagle’s algorithm (1984)
  – sender avoidance
  – prevent sender from unnecessarily sending small packets
  – http://www.rfc-editor.org/rfc/rfc896.txt
    • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged”
    • Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
    • Works for idle connections (no deadlock)
    • Works for telnet (send one-byte packets immediately)
    • Works for bulk data transfer (delay sending)
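The sender-side rule can be sketched as a single decision function. It is a simplification that ignores the receiver window and any timers, and the function name and parameters are hypothetical.

```python
def nagle_should_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether buffered data may be sent now under Nagle's rule."""
    if pending_bytes <= 0:
        return False              # nothing to send
    if pending_bytes >= mss:
        return True               # a full-sized segment can always go out
    return unacked_bytes == 0     # a small segment only if nothing is outstanding

MSS = 1460  # assumed segment size for the examples

idle_keystroke = nagle_should_send(1, MSS, unacked_bytes=0)     # idle telnet connection
second_keystroke = nagle_should_send(1, MSS, unacked_bytes=1)   # previous byte not ACKed yet
bulk = nagle_should_send(MSS, MSS, unacked_bytes=10 * MSS)      # bulk transfer, full segment
```

The second keystroke waits in the send buffer until the first is acknowledged, so several keystrokes can then leave in one segment.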
27
TL: TCP reliable data transfer
• Segment integrity
• Acknowledgement generation
• Retransmission
28
TL: TCP RDT segment integrity
• Checksum included in header
• Is it sufficient to just checksum the packet contents?
• No, need to ensure correct source/destination
  – Pseudoheader: portions of IP hdr that are critical
  – Checksum covers Pseudoheader, transport hdr, and packet body
  – Layer violation, redundant with parts of IP checksum
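A sketch of the checksum computation over pseudoheader, transport header and body, assuming IPv4 (where the pseudoheader holds source address, destination address, a zero byte, the protocol number and the TCP length). The function names and sample addresses are hypothetical.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudoheader followed by the TCP header and body."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)

# Hypothetical segment with a zeroed checksum field (bytes 16-17) and 3 bytes of data.
header = struct.pack("!HHIIHHHH", 1234, 80, 42, 0, (5 << 12) | 0x02, 65535, 0, 0)
segment = header + b"hi!"
value = tcp_checksum("10.0.0.1", "10.0.0.2", segment)
filled = segment[:16] + struct.pack("!H", value) + segment[18:]
```

A receiver recomputes the checksum over the filled-in segment and expects 0; a segment delivered to the wrong address fails the check because the pseudoheader differs.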
29
TL: TCP RDT acks and timeouts
• TCP’s reliable data transfer approach
  – Cumulative acknowledgements
    • Receiver sends back the byte number it expects to receive next
    • Out of order packets generate duplicate acknowledgements
      – Receive 1, Ack 2
      – Receive 4, Ack 2
      – Receive 3, Ack 2
      – Receive 2, Ack 5
  – Retransmissions
    • Sender sends segment and sets a timer
    • Waits for an acknowledgement indicating segment was received
      – Send 1
      – Wait for Ack 2
      – No Ack 2 and timer expires
      – Send 1 again
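The duplicate-ACK example can be reproduced with a few lines. This sketch numbers whole packets as the slide example does (real TCP counts bytes), and the function name is hypothetical.

```python
def cumulative_acks(arrivals, first_expected=1):
    """Return the ACK sent after each arrival: the lowest number not yet received."""
    received = set()
    expected = first_expected
    acks = []
    for number in arrivals:
        received.add(number)           # out-of-order packets are buffered
        while expected in received:    # advance past everything now contiguous
            expected += 1
        acks.append(expected)
    return acks

acks = cumulative_acks([1, 4, 3, 2])
```

The repeated ACK 2 is the duplicate acknowledgement; once packet 2 fills the gap, a single ACK 5 covers packets 2 through 4.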
30
TL: TCP RDT acks and timeouts
simplified sender, assuming
• one way data transfer
• no flow, congestion control

[Figure: sender loops in a “wait for event” state]
  event: data received from application above → create, send segment
  event: timer timeout for segment with seq # y → retransmit segment
  event: ACK received, with ACK # y → ACK processing
31
TL: TCP RDT acks and timeouts
Simplified TCP sender

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)
  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y
  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKS received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */
32
TL: TCP delayed acknowledgements
• Problem:
  – In request/response programs, you send separate ACK and Data packets for each transaction
    • Delay ACK in order to send ACK back along with data
• Solution:
  – Don’t ACK data immediately
    • Wait 200ms (must be less than 500ms – why?)
    • Must ACK every other packet
    • Must not delay duplicate ACKs
  – Without delayed ACK: 40 byte ack + data packet
  – With delayed ACK: data packet includes ACK
  – See web trace example
  – Extensions for asymmetric links
    • See later part of lecture
33
TL: TCP ACK generation [RFC 1122, RFC 2581]
Event → TCP Receiver action

• in-order segment arrival, no gaps, everything else already ACKed
  → delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending
  → immediately send single cumulative ACK
• out-of-order segment arrival, higher-than-expect seq. #, gap detected
  → send duplicate ACK, indicating seq. # of next expected byte
• arrival of segment that partially or completely fills gap
  → immediate ACK if segment starts at lower end of gap
34
TL: TCP retransmission
• Wait at least one RTT before retransmitting packet
• Importance of accurate RTT estimators:
  – Estimator too low → unneeded retransmissions
  – Estimator too high → poor throughput, slow reaction to segment loss
• RTT estimator must adapt to change in RTT
  – But not too fast, or too slow!
• Backing off the retransmission timeout
  – Exponential backoff
  – Double retransmission timer interval after every loss until successful retransmission
35
TL: TCP retransmission scenarios
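[Figure: lost ACK scenario, Host A and Host B, time running downward]
  Host A → Host B: Seq=92, 8 bytes data
  Host B → Host A: ACK=100 (lost, marked X)
  timeout at Host A
  Host A → Host B: Seq=92, 8 bytes data (retransmission)
  Host B → Host A: ACK=100

[Figure: premature timeout, cumulative ACKs, Host A and Host B, time running downward]
  Host A → Host B: Seq=92, 8 bytes data
  Host A → Host B: Seq=100, 20 bytes data
  Host B → Host A: ACK=100
  Host B → Host A: ACK=120
  Seq=92 timeout at Host A before the ACKs arrive
  Host A → Host B: Seq=92, 8 bytes data (retransmission)
  Host B → Host A: ACK=120
  Seq=100 timeout interval also shown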
36
TL: Initial Round-trip Estimator
• Round trip times exponentially averaged:
    EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
  – Recommended value for x: 0.1-0.2
    • 0.125 for most TCP’s
  – Influence of given sample decreases exponentially fast
• Retransmit timer set to β * RTT, where β = 2
  – Every time timer expires, RTO exponentially backed-off
  – Like Ethernet
• Not good at preventing spurious timeouts
37
TL: Jacobson’s Retransmission Timeout
• Key observation:
  – At high loads round trip variance is high
  – Need larger safety margin with larger variations in RTT
• Solution:
  – Base RTO value on RTT and the standard deviation of RTT
38
TL: Jacobson’s Retransmission Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|

Setting the timeout
• EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4*Deviation
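The three formulas can be exercised with a short sketch. Two choices here are assumptions not stated on the slide: the deviation is updated before the new sample is folded into EstimatedRTT, and it starts at half the first sample (the initial value used in RFC 6298). The class name is hypothetical.

```python
class RttEstimator:
    """Tracks EstimatedRTT and Deviation and derives the retransmission timeout."""

    def __init__(self, first_sample: float, x: float = 0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2   # assumed starting value

    def update(self, sample_rtt: float) -> float:
        # Deviation uses the estimate from before this sample is averaged in.
        self.deviation = ((1 - self.x) * self.deviation
                          + self.x * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation

estimator = RttEstimator(first_sample=100.0)   # milliseconds
initial_timeout = estimator.timeout()
steady_timeout = estimator.update(100.0)       # a sample equal to the estimate
spike_timeout = estimator.update(300.0)        # a sudden jump in RTT
```

Stable samples shrink the safety margin, while a single large sample widens it immediately.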
39
TL: Retransmission Ambiguity
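[Figure: two A-to-B timelines, each showing an original transmission, an RTO, a retransmission and an ACK, with one transmission marked lost (X). SampleRTT is ambiguous because the ACK may belong to either the original transmission or the retransmission]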
40
TL: Karn’s algorithm
• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
  – Don’t count RTT sample on ACKs for this segment
  – Keep backed off time-out for next packet
  – Reuse RTT estimate only after one successful transmission
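Karn's rules combine with exponential backoff as in the following sketch. The class name is hypothetical, and the function that turns a sample into a new RTO is passed in as a parameter so that any estimator can be used.

```python
class KarnTimer:
    """Retransmission timer that ignores RTT samples from retransmitted segments."""

    def __init__(self, rto: float):
        self.rto = rto

    def on_timeout(self) -> float:
        self.rto *= 2                 # exponential backoff after each loss
        return self.rto

    def on_ack(self, sample_rtt: float, was_retransmitted: bool, rto_from_sample) -> float:
        if was_retransmitted:
            return self.rto           # ambiguous sample: keep the backed-off timeout
        self.rto = rto_from_sample(sample_rtt)   # clean sample: recompute the RTO
        return self.rto

def double_sample(sample):
    """Stand-in estimator for the example: RTO = 2 * sample."""
    return 2 * sample

timer = KarnTimer(rto=1.0)
after_loss = timer.on_timeout()
after_ambiguous_ack = timer.on_ack(0.5, True, double_sample)
after_clean_ack = timer.on_ack(0.5, False, double_sample)
```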
41
TL: Timer Granularity
• Many TCP implementations set RTO in multiples of 200, 500, or 1000 ms
• Why?
  – Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
  – Make timer interrupts efficient
42
TL: TCP Congestion Control
• Motivated by ARPANET congestion collapse
  – Flow control, but no congestion control
  – Sender sends as much as the receiver resources will allow
  – Go-back-N on loss, burst out advertised window
• Congestion control
  – Extending control to network resources
  – Underlying design principle: packet conservation
    • At equilibrium, inject packet into network only when one is removed
    • Basis for stability of physical systems (fluid model)
• Why was this not working before?
  – No equilibrium
    • Solved by self-clocking and congestion window
  – Spurious retransmissions
    • Solved by accurate RTO estimation (see earlier discussion)
  – Resource limitations prevent equilibrium
    • Solved by congestion window and congestion avoidance algorithms
43
TL: TCP congestion control basics
• Keep a congestion window, cwnd
  – Book calls this “Congwin”
  – Denotes how much network is able to absorb
  – Sender’s maximum window:
    • Min (receiver’s advertised window, cwnd)
  – Sender’s actual window:
    • Max window - unacknowledged segments
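The two window rules amount to a min and a subtraction. A minimal sketch in bytes, with a hypothetical function name:

```python
def usable_window(advertised_window: int, cwnd: int, unacked: int) -> int:
    """How much more the sender may put on the wire right now."""
    max_window = min(advertised_window, cwnd)   # limited by receiver and by network
    return max(0, max_window - unacked)         # never negative if a window shrinks

network_limited = usable_window(advertised_window=8000, cwnd=4000, unacked=1500)
receiver_limited = usable_window(advertised_window=2000, cwnd=4000, unacked=500)
blocked = usable_window(advertised_window=2000, cwnd=4000, unacked=2500)
```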
44
TL: TCP Congestion Control
• end-end control (no network assistance)
• transmission rate limited by congestion window size, cwnd, over segments
• w segments, each with MSS bytes, sent in one RTT:

    throughput = (w * MSS) / RTT bytes/sec

[Figure: window of cwnd segments]
45
TL: TCP congestion control:
• two “phases”
  – slow start
  – congestion avoidance
• important variables:
  – cwnd
  – ssthresh: defines threshold between slow start phase and congestion control phase (Book calls this threshold)
• useful reference
  – http://www.aciri.org/floyd/papers/sacks.ps.Z
• “probing” for usable bandwidth:
  – ideally: transmit as fast as possible (cwnd as large as possible) without loss
  – increase cwnd until loss (congestion)
  – loss: decrease cwnd, then begin probing (increasing) again
46
TL: TCP slow start
• Start the self-clocking behavior of TCP
  – Use acks to clock sending new data
  – Do not send entire advertised window in one shot

[Figure: self-clocking between Sender and Receiver, with spacings labeled Pb, Pr, Ar, Ab, As]
47
TL: TCP slow start
• exponential increase (per RTT) in window size
  – Window actually increases to W in RTT * log2(W)
  – Can overshoot window and cause packet loss

Slowstart algorithm:
  initialize: cwnd = 1
  for (each segment ACKed)
      cwnd++
  until (loss event OR cwnd > ssthresh)

[Figure: Host A / Host B timeline: one segment in the first RTT, then two segments, then four segments]
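The per-ACK increment produces the per-RTT doubling, which a short simulation makes visible. This sketch assumes no loss and that every segment of a window is ACKed within one RTT; the function name is hypothetical.

```python
def slow_start(ssthresh: int, max_rtts: int):
    """Return cwnd (in segments) at the start of each RTT, ending when cwnd > ssthresh."""
    cwnd = 1
    per_rtt = [cwnd]
    for _ in range(max_rtts):
        acked_this_rtt = cwnd              # one ACK per segment sent in this RTT
        for _ in range(acked_this_rtt):
            cwnd += 1                      # cwnd++ for each segment ACKed
            if cwnd > ssthresh:            # leave slow start
                return per_rtt + [cwnd]
        per_rtt.append(cwnd)
    return per_rtt

growth = slow_start(ssthresh=100, max_rtts=3)
until_threshold = slow_start(ssthresh=16, max_rtts=10)
```

The window reaches W after about log2(W) round trips, matching the RTT * log2(W) figure above.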
48
TL: TCP slow start example
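[Figure: slow start example. Timeline marked 0R, 1R, 2R, 3R (scale: one RTT, one pkt time). Packet 1 is sent at 0R, packets 2-3 at 1R, packets 4-7 at 2R, packets 8-15 at 3R; ACKs 1, 2-3 and 4-7 return in between]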
49
TL: TCP slow start sequence plot
[Figure: slow start sequence plot. Axes: Time vs. Sequence No]