1 CSE 524: Lecture 12 Transport Layer (Part 3). 2 Transport Layer Last class –CIDR exam question...

1

CSE 524: Lecture 12

Transport Layer (Part 3)

2

Transport Layer

• Last class
  – CIDR exam question
  – Specific transport layers
    • UDP

• This class
  – TCP

3

TL: TCP and Transport Layer Functions

• Demux to upper layer
• Quality of service
• Security
• Delivery semantics
• Flow control
• Congestion control
• Reliable data transfer

4

TL: TCP Overview RFCs: 793, 1122, 1323, 2018, 2581

• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
  – protocol implemented at ends (“fate-sharing”)
• flow and congestion controlled:
  – sender will not overwhelm receiver or network
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers

[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]

5

TL: TCP header

[Figure: TCP segment structure, 32 bits wide]

Header fields, in order:
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flag bits U A P R S F, rcvr window size
• checksum, ptr urgent data
• Options (variable length)
• application data (variable length)

Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
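The header layout can be made concrete with a minimal Python sketch that packs and unpacks the fixed 20-byte part of the header (no options). The helper names `pack_header` and `unpack_header` are illustrative, not part of any library, and the checksum is left at zero here.

```python
import struct

# Fixed part of the TCP header: ports, seq #, ack #, head len + flags,
# rcvr window size, checksum, urgent data pointer (network byte order).
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def pack_header(src_port, dst_port, seq, ack, flags, window,
                checksum=0, urgent=0):
    """Build a 20-byte header with no options (head len = 5 words)."""
    flag_value = sum(FLAG_BITS[name] for name in flags)
    offset_and_flags = (5 << 12) | flag_value
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, window, checksum, urgent)

def unpack_header(raw):
    """Decode the fixed header fields from the first 20 bytes."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len_bytes": (off_flags >> 12) * 4,
        "flags": sorted(n for n, bit in FLAG_BITS.items() if off_flags & bit),
        "window": window, "checksum": checksum, "urgent": urgent,
    }

header = pack_header(1234, 80, seq=42, ack=79,
                     flags=["SYN", "ACK"], window=4096)
fields = unpack_header(header)
print(len(header), fields["flags"], fields["header_len_bytes"])
```

The head len field counts 32-bit words, which is why the sketch multiplies it by 4 to get bytes.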

6

TL: TCP connections

• TCP sender, receiver establish “connection” before exchanging data segments
  – initialize TCP variables:
    • Initial sequence #s
    • Buffers, flow control info (e.g. RcvWindow)
    • Window scaling
• client: connection initiator
• server: contacted by client
• Java API
  Socket clientSocket = new Socket("hostname", portNumber);
  Socket connectionSocket = welcomeSocket.accept();

7

TL: TCP connections

• Three way handshake:
  – Step 1: client end system sends TCP SYN control segment to server
    • specifies initial seq #
    • should be random to prevent spoofing ( http://www.rfc-editor.org/rfc/rfc1948.txt )
  – Step 2: server end system receives SYN, replies with SYNACK control segment
    • ACKs received SYN
    • allocates buffers
    • specifies server's initial seq. # (server -> client direction)
  – Step 3: client receives SYNACK control segment, replies with ACK and potentially data
    • ACKs received SYNACK
    • goes to established state





8

TL: TCP Connection Establishment

• A and B must agree on initial sequence number selection

• 3-way handshake

A -> B: SYN + Seq A
B -> A: SYN + ACK-A + Seq B
A -> B: ACK-B
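The numbers exchanged in the 3-way handshake can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and drawing each ISN uniformly at random is an assumption made for simplicity rather than the scheme a real stack uses.

```python
import secrets

SEQ_MOD = 2 ** 32   # sequence numbers are 32 bits and wrap around

def three_way_handshake():
    """Return the three control segments as (sender, flags, seq, ack)."""
    isn_a = secrets.randbits(32)   # A's initial sequence number (assumed random)
    isn_b = secrets.randbits(32)   # B's initial sequence number (assumed random)

    syn = ("A", "SYN", isn_a, None)
    # A SYN consumes one sequence number, so B acknowledges isn_a + 1.
    syn_ack = ("B", "SYN+ACK", isn_b, (isn_a + 1) % SEQ_MOD)
    ack = ("A", "ACK", (isn_a + 1) % SEQ_MOD, (isn_b + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]

segments = three_way_handshake()
for sender, flags, seq, ack in segments:
    print(sender, flags, "seq =", seq, "ack =", ack)
```

After the third segment both sides know each other's initial sequence number, which is the agreement the slide asks for.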

9

TL: TCP Sequence Number Selection

• Why not simply choose 0?
• Must avoid overlap with earlier incarnation
• Client machine with seq #0 initiates connection to server with seq #0.
  – Client sends one byte and machine crashes
  – Client reboots and initiates connection again
  – Server thinks new incarnation is the same as old connection

10

TL: TCP Sequence Number Selection

• Why is selecting a random ISN important?
• Suppose machine X selects ISN based on predictable sequence
• Fred has .rhosts to allow login to X from Y
• Evil Ed attacks
  – Disables host Y – denial of service attack
  – Make a bunch of connections to host X
  – Determine ISN pattern and guess next ISN
  – Fake pkt1: [<src Y><dst X>, guessed ISN]
  – Fake pkt2: desired command
  – Attack popularized by K. Mitnick

11

TL: TCP ISN selection and spoofing attacks

[Figure: Ed attacks X, whose .rhosts trusts Y]

1. Ed floods Y continuously
2. Ed spoofs a TCP SYN from Y to X, with a spoofed Y ISN
3. X replies to Y with TCP SYNACK (ACK of spoofed Y ISN, X's ISN): PACKET DROPPED!
4. Ed sends an ACK with a guess of X’s ISN, as if he had received the TCP SYNACK
5. Ed sends pre-canned rlogin/rsh messages with spoofed acknowledgements: rsh echo “Ed” >> .rhosts
6. Real acks are dropped, so Y does not reset the connection
7. Door now open: rlogin to X from Ed directly

12

TL: TCP connection setup

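[Figure: TCP connection setup state diagram. Transitions shown as event / action]

• CLOSED -> LISTEN: passive OPEN / create TCB
• CLOSED -> SYN SENT: active OPEN / create TCB, snd SYN
• LISTEN -> SYN RCVD: rcv SYN / snd SYN ACK
• LISTEN -> SYN SENT: APP SEND / snd SYN
• LISTEN -> CLOSED: CLOSE / delete TCB
• SYN SENT -> CLOSED: CLOSE / delete TCB
• SYN SENT -> SYN RCVD: rcv SYN / snd ACK
• SYN SENT -> ESTAB: rcv SYN, ACK / snd ACK
• SYN RCVD -> ESTAB: rcv ACK of SYN
• SYN RCVD -> (tear-down): CLOSE / send FIN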

13

TL: TCP connections

Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs

Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
  – duplicate acks sent when out-of-order packet received

See web trace

Java API
  connectionSocket.receive();
  clientSocket.send();

[Figure: simple telnet scenario, time running downward]
  1. Host A -> Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)
  2. Host B -> Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
  3. Host A -> Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
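The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the next byte expected. A tiny Python sketch (the helper name `next_ack` is illustrative) reproduces them:

```python
def next_ack(seq, payload_len):
    """ACK number = sequence number of the next byte expected."""
    return seq + payload_len

# Host A's byte 'C' starts at sequence number 42; Host B's echo starts at 79.
a_seq, b_seq = 42, 79
ack_from_b = next_ack(a_seq, len(b"C"))   # B received byte 42, expects 43
ack_from_a = next_ack(b_seq, len(b"C"))   # A received byte 79, expects 80
print(ack_from_b, ack_from_a)
```

Because counting is by bytes rather than segments, a segment carrying 8 bytes would advance the ACK number by 8.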

14

TL: TCP connections

Closing a connection:

Client-initiated close (reverse process for server-initiated close)

Java API: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

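[Figure: client and server timelines. Client close sends FIN, server replies ACK; server close sends FIN, client replies ACK and stays in timed wait before closed]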

15

TL: TCP connections

Step 3: client receives FIN, replies with ACK.

– Enters “timed wait” - will respond with ACK to received FINs

Step 4: server, receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: same client and server timelines. Both sides pass through closing after sending FIN; the client waits in timed wait, and both sides end closed]

16

TL: TCP Connection Tear-down

[Figure: tear-down exchange between Sender and Receiver. Messages shown: FIN, FIN-ACK, Data write, Data ack, FIN, FIN-ACK]

17

TL: TCP Connection Tear-down

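[Figure: TCP connection tear-down state diagram. Transitions shown as event / action]

• ESTAB -> FIN WAIT-1: CLOSE / send FIN
• ESTAB -> CLOSE WAIT: rcv FIN / send ACK
• FIN WAIT-1 -> FIN WAIT-2: rcv ACK of FIN
• FIN WAIT-1 -> CLOSING: rcv FIN / snd ACK
• FIN WAIT-1 -> TIME WAIT: rcv FIN+ACK / snd ACK
• FIN WAIT-2 -> TIME WAIT: rcv FIN / snd ACK
• CLOSING -> TIME WAIT: rcv ACK of FIN
• CLOSE WAIT -> LAST-ACK: CLOSE / snd FIN
• LAST-ACK -> CLOSED: rcv ACK
• TIME WAIT -> CLOSED: Timeout=2msl / delete TCB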
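The tear-down diagram can be read as a lookup table from (state, event) to (action, next state). The Python sketch below encodes the transitions as they appear in the standard TCP state diagram; the table name and the `run` helper are illustrative only.

```python
# (state, event) -> (action, next state); None means no segment is sent.
TEARDOWN = {
    ("ESTAB", "CLOSE"): ("snd FIN", "FIN WAIT-1"),
    ("ESTAB", "rcv FIN"): ("snd ACK", "CLOSE WAIT"),
    ("FIN WAIT-1", "rcv ACK of FIN"): (None, "FIN WAIT-2"),
    ("FIN WAIT-1", "rcv FIN"): ("snd ACK", "CLOSING"),
    ("FIN WAIT-1", "rcv FIN+ACK"): ("snd ACK", "TIME WAIT"),
    ("FIN WAIT-2", "rcv FIN"): ("snd ACK", "TIME WAIT"),
    ("CLOSING", "rcv ACK of FIN"): (None, "TIME WAIT"),
    ("CLOSE WAIT", "CLOSE"): ("snd FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv ACK"): (None, "CLOSED"),
    ("TIME WAIT", "Timeout=2msl"): ("delete TCB", "CLOSED"),
}

def run(state, events):
    """Follow a list of events and return every state visited."""
    path = [state]
    for event in events:
        _action, state = TEARDOWN[(state, event)]
        path.append(state)
    return path

# The side that closes first (active close) ends up in TIME WAIT.
active = run("ESTAB", ["CLOSE", "rcv ACK of FIN", "rcv FIN", "Timeout=2msl"])
# The side that receives the first FIN (passive close) does not.
passive = run("ESTAB", ["rcv FIN", "CLOSE", "rcv ACK"])
print(" -> ".join(active))
print(" -> ".join(passive))
```

Only the active closer passes through TIME WAIT, which is why it matters which side closes first.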

18

TL: Time Wait Issues

• Cannot close connection immediately after receiving FIN
  – What if a new connection restarts and uses same sequence number?
• Web servers, not clients, close connection first
  – Established -> Fin-Waits -> Time-Wait -> Closed
  – Why would this be a problem?
• Time-Wait state lasts for 2 * MSL
  – MSL should be 120 seconds (is often 60s)
  – Servers often have order of magnitude more connections in Time-Wait

19

TL: TCP connections

[Figures: TCP client lifecycle; TCP server lifecycle]

20

TL: TCP Demux to upper layer

Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

multiplexing/demultiplexing:
• based on sender, receiver port numbers, IP addresses
  – source, dest port #s in each segment
  – recall: well-known port numbers for specific applications
  – Servers wait on well known ports (/etc/services)

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]

21

TL: TCP Demux to upper layer

[Figure: port use: simple telnet app]
  host A -> server B: source port: x, dest. port: 23
  server B -> host A: source port: 23, dest. port: x

[Figure: port use: Web server]
  Web client host A -> Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80
  Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80
  Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
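The Web server example shows why TCP demultiplexes on the full 4-tuple rather than the destination port alone. A minimal Python sketch, with a plain dictionary standing in for the kernel's connection table and made-up port values for x and y:

```python
# One entry per connection, keyed by (src IP, src port, dst IP, dst port).
connections = {}

def open_connection(src_ip, src_port, dst_ip, dst_port, handler):
    connections[(src_ip, src_port, dst_ip, dst_port)] = handler

def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the handler for an arriving segment, or None if none matches."""
    return connections.get((src_ip, src_port, dst_ip, dst_port))

X, Y = 5001, 5002   # hypothetical values for the source ports x and y
open_connection("A", X, "B", 80, "connection A:x")
open_connection("C", X, "B", 80, "connection C:x")
open_connection("C", Y, "B", 80, "connection C:y")

print(demux("A", X, "B", 80))
print(demux("C", X, "B", 80))
print(demux("C", Y, "B", 80))
```

All three segments carry dest. port 80, yet each reaches a different connection because the source IP or source port differs.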

22

TL: TCP Flow control

• TCP is a sliding window protocol
  – For window size n, can send up to n bytes without receiving an acknowledgement
  – When the data is acknowledged then the window slides forward
• Each packet advertises a window size
  – Indicates number of bytes the receiver has space for
• Original TCP always sent entire window
  – Congestion control now limits this

23


flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  – RcvWindow field in TCP segment

sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow

receiver buffering
  RcvBuffer = size of TCP Receive Buffer
  RcvWindow = amount of spare room in Buffer
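A short Python sketch of the bookkeeping on both sides. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are assumptions borrowed from common textbook notation, not names defined on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receive buffer, as advertised to the sender."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read
    return rcv_buffer - buffered

def sender_may_send(last_byte_sent, last_byte_acked, advertised, nbytes):
    """True if nbytes more keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised

window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, window, 1000))
print(sender_may_send(5000, 4000, window, 1500))
```

When the application stops reading, `rcv_window` shrinks toward 0 and the sender has to stop.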

24


• What happens if window is 0?
  – Receiver updates window when application reads data
  – What if this update is lost?
    • Deadlock
• TCP Persist timer
  – Sender periodically sends window probe packets
  – Receiver responds with ACK and up-to-date window advertisement

25

TL: TCP flow control enhancements

• Problem: (Clark, 1982)
  – If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
• What happens if window is small?
  – Small packet problem known as “Silly window syndrome”
    • Receiver advertises one byte window
    • Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)

26

TL: TCP flow control enhancements

• Solutions to silly window syndrome
• Clark (1982)
  – receiver avoidance
  – prevent receiver from advertising small windows
  – increase advertised receiver window by min(MSS, RecvBuffer/2)
• Nagle’s algorithm (1984)
  – sender avoidance
  – prevent sender from unnecessarily sending small packets
  – http://www.rfc-editor.org/rfc/rfc896.txt
    • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged”
    • Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
    • Works for idle connections (no deadlock)
    • Works for telnet (send one-byte packets immediately)
    • Works for bulk data transfer (delay sending)
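Nagle's rule can be sketched as a small Python class. This is a simplified model under stated assumptions: the class and method names are hypothetical, ACKs always cover everything outstanding, and the receiver window is ignored.

```python
class NagleSender:
    """Toy sender: full-sized segments go out at once, small ones wait for ACKs."""

    def __init__(self, mss):
        self.mss = mss
        self.pending = b""   # written by the application, not yet sent
        self.unacked = 0     # bytes sent and not yet acknowledged
        self.sent = []       # segments handed to the network

    def _send(self, segment):
        self.sent.append(segment)
        self.unacked += len(segment)

    def _flush(self):
        # Full-sized segments are never held back.
        while len(self.pending) >= self.mss:
            self._send(self.pending[:self.mss])
            self.pending = self.pending[self.mss:]
        # A small segment is sent only when nothing is outstanding.
        if self.pending and self.unacked == 0:
            self._send(self.pending)
            self.pending = b""

    def write(self, data):
        self.pending += data
        self._flush()

    def ack_all(self):
        self.unacked = 0
        self._flush()

s = NagleSender(mss=4)
s.write(b"a")    # idle connection: sent immediately
s.write(b"b")    # held, because b"a" is still unacknowledged
s.write(b"c")    # held and coalesced with b"b"
s.ack_all()      # the ACK releases the coalesced b"bc"
print(s.sent)
```

The first keystroke leaves immediately, and later keystrokes are batched until the ACK arrives, so at most one small segment is ever outstanding.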






27

TL: TCP reliable data transfer

• Segment integrity
• Acknowledgement generation
• Retransmission

28

TL: TCP RDT segment integrity

• Checksum included in header
• Is it sufficient to just checksum the packet contents?
• No, need to ensure correct source/destination
  – Pseudoheader – portions of IP hdr that are critical
  – Checksum covers pseudoheader, transport hdr, and packet body
  – Layer violation, redundant with parts of IP checksum
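A minimal Python sketch of the checksum over pseudoheader plus segment, assuming IPv4 and the usual pseudoheader layout (source address, destination address, a zero byte, the protocol number, and the TCP length). The function names and the example addresses are illustrative.

```python
import socket
import struct

def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over the IPv4 pseudoheader followed by TCP header and body."""
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, socket.IPPROTO_TCP, len(segment))
    return internet_checksum(pseudo + segment)

# A 20-byte header with the checksum field (bytes 16-17) zeroed, plus 2 data bytes.
segment = struct.pack("!HHIIHHHH", 1234, 80, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"hi"
csum = tcp_checksum("10.0.0.1", "10.0.0.2", segment)
filled = segment[:16] + struct.pack("!H", csum) + segment[18:]

# A receiver recomputes over the filled-in segment; 0 means it verifies.
print(tcp_checksum("10.0.0.1", "10.0.0.2", filled))
# The same segment delivered to the wrong destination no longer verifies.
print(tcp_checksum("10.0.0.1", "10.0.0.3", filled))
```

The second check is the point of the pseudoheader: a segment delivered to the wrong address fails verification even though its own bytes are intact.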

29

TL: TCP RDT acks and timeouts

• TCP’s reliable data transfer approach
  – Cumulative acknowledgements
    • Receiver sends back the byte number it expects to receive next
    • Out of order packets generate duplicate acknowledgements
      – Receive 1, Ack 2
      – Receive 4, Ack 2
  – Retransmissions
    • Sender sends segment and sets a timer
    • Waits for an acknowledgement indicating segment was received
      – Send 1
      – Wait for Ack 2
      – No Ack 2 and timer expires
      – Send 1 again

30


simplified sender, assuming
• one way data transfer
• no flow, congestion control

[Figure: sender loops in “wait for event” and handles three events]
• event: data received from application above -> create, send segment
• event: timer timeout for segment with seq # y -> retransmit segment
• event: ACK received, with ACK # y -> ACK processing

31


Simplified TCP sender

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* TCP fast retransmit */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
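The same three events can be written as a small runnable Python class. This is a sketch under simplifying assumptions: timers are represented only by the set of unACKed segments, sequence number wraparound is ignored, and the class and method names are illustrative.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # seq # -> data; an entry stands in for a running timer
        self.dup_acks = 0
        self.wire = []       # (seq #, data) segments passed to IP

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data              # "start timer" for this segment
        self.wire.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self, y):
        self.wire.append((y, self.unacked[y]))   # retransmit segment y

    def on_ack(self, y):
        if y > self.sendbase:                 # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]         # "cancel timers" below y
            self.sendbase = y
            self.dup_acks = 0
        else:                                 # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.wire.append((y, self.unacked[y]))   # fast retransmit

sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"12345678")     # Seq=92, 8 bytes
sender.on_app_data(b"x" * 20)       # Seq=100, 20 bytes
sender.on_ack(100)                  # first segment acknowledged
for _ in range(3):
    sender.on_ack(100)              # three duplicate ACKs for byte 100
print(sender.sendbase, sender.nextseqnum, [seq for seq, _ in sender.wire])
```

The third duplicate ACK triggers a resend of the segment starting at byte 100 without waiting for its timer.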

32

TL: TCP delayed acknowledgements

• Problem:
  – In request/response programs, you send separate ACK and Data packets for each transaction
    • Delay ACK in order to send ACK back along with data
• Solution:
  – Don’t ACK data immediately
    • Wait 200ms (must be less than 500ms – why?)
    • Must ACK every other packet
    • Must not delay duplicate ACKs
  – Without delayed ACK: 40 byte ack + data packet
  – With delayed ACK: data packet includes ACK
  – See web trace example
  – Extensions for asymmetric links
    • See later part of lecture

33

TL: TCP ACK generation [RFC 1122, RFC 2581]

Event -> TCP Receiver action

• in-order segment arrival, no gaps, everything else already ACKed
  -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

• in-order segment arrival, no gaps, one delayed ACK pending
  -> immediately send single cumulative ACK

• out-of-order segment arrival, higher-than-expect seq. #, gap detected
  -> send duplicate ACK, indicating seq. # of next expected byte

• arrival of segment that partially or completely fills gap
  -> immediate ACK if segment starts at lower end of gap

34

TL: TCP retransmission

• Wait at least one RTT before retransmitting packet
• Importance of accurate RTT estimators:
  – Estimator too low -> unneeded retransmissions
  – Estimator too high -> poor throughput, slow reaction to segment loss
• RTT estimator must adapt to change in RTT
  – But not too fast, or too slow!
• Backing off the retransmission timeout
  – Exponential backoff
  – Double retransmission timer interval after every loss until successful retransmission

35

TL: TCP retransmission scenarios

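[Figure: lost ACK scenario, time running downward]
  1. Host A -> Host B: Seq=92, 8 bytes data
  2. Host B -> Host A: ACK=100 (lost, X)
  3. Host A: timeout, retransmits the segment
  4. Host B -> Host A: ACK=100

[Figure: premature timeout, cumulative ACKs]
  Host A runs timers for Seq=92 and Seq=100; Host B replies with ACK=100 and ACK=120. The Seq=92 timeout fires before its ACK arrives, so Host A retransmits, and Host B answers with the cumulative ACK=120.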

36

TL: Initial Round-trip Estimator

• Round trip times exponentially averaged:

  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT

  – Recommended value for x: 0.1-0.2
    • 0.125 for most TCP’s
  – Influence of given sample decreases exponentially fast
• Retransmit timer set to β*RTT, where β = 2
  – Every time timer expires, RTO exponentially backed-off
  – Like Ethernet
• Not good at preventing spurious timeouts

37

TL: Jacobson’s Retransmission Timeout

• Key observation:
  – At high loads round trip variance is high
  – Need larger safety margin with larger variations in RTT
• Solution:
  – Base RTO value on RTT and standard deviation (σRTT)

38

TL: Jacobson’s Retransmission Timeout

EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT

Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|

Setting the timeout
• EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4*Deviation
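The three formulas can be turned into a small Python class to see how the timeout reacts to samples. This is a sketch with assumptions that the slide does not state: the first sample seeds EstimatedRTT, Deviation starts at half that sample, and EstimatedRTT is updated before Deviation, following the order in which the formulas are listed.

```python
class RttEstimator:
    def __init__(self, first_sample, x=0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2    # assumed starting value

    def update(self, sample_rtt):
        """Fold in one SampleRTT (seconds) and return the new Timeout."""
        x = self.x
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = ((1 - x) * self.deviation
                          + x * abs(sample_rtt - self.estimated_rtt))
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=0.100)
steady_1 = est.update(0.100)   # steady RTT: deviation decays
steady_2 = est.update(0.100)   # timeout keeps shrinking toward the RTT
spike = est.update(0.500)      # one slow sample widens the safety margin
print(round(steady_1, 4), round(steady_2, 4), round(spike, 4))
```

With steady samples the safety margin shrinks; a single large sample raises both the estimate and the deviation, so the timeout grows.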

39

TL: Retransmission Ambiguity

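[Figure: retransmission ambiguity between A and B, two timelines. Each shows an original transmission, the RTO, a retransmission, and an ACK. In one, SampleRTT is measured from the original transmission; in the other, from the retransmission. A loss is marked X in the second timeline.]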

40

TL: Karn’s algorithm

• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
  – Don’t count RTT sample on ACKs for this segment
  – Keep backed off time-out for next packet
  – Reuse RTT estimate only after one successful transmission

41

TL: Timer Granularity

• Many TCP implementations set RTO in multiples of 200, 500, 1000ms
• Why?
  – Avoid spurious timeouts – RTTs can vary quickly due to cross traffic
  – Make timer interrupts efficient

42

TL: TCP Congestion Control

• Motivated by ARPANET congestion collapse
  – Flow control, but no congestion control
  – Sender sends as much as the receiver resources will allow
  – Go-back-N on loss, burst out advertised window
• Congestion control
  – Extending control to network resources
  – Underlying design principle: packet conservation
    • At equilibrium, inject packet into network only when one is removed
    • Basis for stability of physical systems (fluid model)
• Why was this not working before?
  – No equilibrium
    • Solved by self-clocking and congestion window
  – Spurious retransmissions
    • Solved by accurate RTO estimation (see earlier discussion)
  – Resource limitations prevent equilibrium
    • Solved by congestion window and congestion avoidance algorithms

43

TL: TCP congestion control basics

• Keep a congestion window, cwnd
  – Book calls this “Congwin”
  – Denotes how much network is able to absorb
  – Sender’s maximum window:
    • Min (receiver’s advertised window, cwnd)
  – Sender’s actual window:
    • Max window - unacknowledged segments

44

TL: TCP Congestion Control

• end-end control (no network assistance)
• transmission rate limited by congestion window size, cwnd, over segments:

[Figure: window of cwnd segments]

• w segments, each with MSS bytes sent in one RTT:

throughput = (w * MSS) / RTT Bytes/sec
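As a worked example of the throughput formula, here is a short Python sketch. The values w = 10 segments, MSS = 1460 bytes and RTT = 250 ms are hypothetical, chosen only for illustration.

```python
def throughput_bytes_per_sec(w, mss, rtt):
    """w segments of MSS bytes each, delivered once per RTT (in seconds)."""
    return w * mss / rtt

rate = throughput_bytes_per_sec(w=10, mss=1460, rtt=0.25)
print(rate)   # bytes per second
```

Doubling w doubles the rate, while doubling the RTT halves it, so the window has to grow with the RTT to keep a path busy.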

45

TL: TCP congestion control:

• two “phases”
  – slow start
  – congestion avoidance
• important variables:
  – cwnd
  – ssthresh: defines threshold between the slow start phase and the congestion avoidance phase (Book calls this threshold)
• useful reference
  – http://www.aciri.org/floyd/papers/sacks.ps.Z
• “probing” for usable bandwidth:
  – ideally: transmit as fast as possible (cwnd as large as possible) without loss
  – increase cwnd until loss (congestion)
  – loss: decrease cwnd, then begin probing (increasing) again

46

TL: TCP slow start

• Start the self-clocking behavior of TCP
  – Use acks to clock sending new data
  – Do not send entire advertised window in one shot

[Figure: self-clocking between Sender and Receiver, with spacings labeled Pb, Pr, Ar, Ab, As]

47

TL: TCP slow start

• exponential increase (per RTT) in window size
  – Window actually increases to W in RTT * log2(W)
  – Can overshoot window and cause packet loss

Slowstart algorithm
  initialize: cwnd = 1
  for (each segment ACKed)
      cwnd++
  until (loss event OR cwnd > ssthresh)

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one burst per RTT]
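The per-ACK increment can be simulated in a few lines of Python to see the per-RTT doubling. This sketch assumes no loss events and that every segment in a window is ACKed within one RTT; the function name and the `max_rtts` cap are illustrative.

```python
def slow_start(ssthresh, max_rtts=10):
    """Return cwnd (in segments) at the end of each RTT until cwnd > ssthresh."""
    cwnd = 1
    per_rtt = [cwnd]
    for _ in range(max_rtts):
        acks_this_rtt = cwnd            # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                   # cwnd++ for each segment ACKed
            if cwnd > ssthresh:
                break
        per_rtt.append(cwnd)
        if cwnd > ssthresh:
            break
    return per_rtt

history = slow_start(ssthresh=16)
print(history)
```

Each RTT delivers as many ACKs as there were segments in flight, so adding one segment per ACK doubles the window every RTT, matching the one, two, four pattern in the figure.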

48

TL: TCP slow start example

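[Figure: slow start example timeline marked 0R, 1R, 2R, 3R, one RTT apart, with one packet time shown. Packet 1 is sent at 0R, packets 2-3 at 1R, packets 4-7 at 2R, and packets 8-15 at 3R]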

49

TL: TCP slow start sequence plot

[Figure: slow start sequence plot, Sequence No versus Time]
