CSCI-1680 Transport Layer I Based partly on lecture notes by Rodrigro Foncesa, David Mazières, Phil...

34
CSCI-1680 Transport Layer I Based partly on lecture notes by Rodrigro Foncesa, David Mazières, Theophilus Benson

Transcript of CSCI-1680 Transport Layer I Based partly on lecture notes by Rodrigro Foncesa, David Mazières, Phil...

CSCI-1680Transport Layer I

Based partly on lecture notes by Rodrigro Foncesa, David Mazières, Phil Levis, John Jannotti

Theophilus Benson

Today

• Transport Layer– UDP– TCP Intro

• Connection Establishment

Transport Layer

• Transport protocols sit on top of network layer

• Problem solved: communication among processes– Application-level multiplexing (“ports”)– Error detection, reliability, etc.

UDP – User Datagram Protocol

• Unreliable, unordered datagram service

• Adds multiplexing, checksum

• End points identified by ports– Scope is an IP address (interface)

• Checksum aids in error detection

UDP Header

UDP Checksum

• Uses the same algorithm as the IP checksum– Set Checksum field to 0– Sum all 16-bit words, adding any carry bits to the

LSB– Flip bits to get checksum (except 0xffff->0xffff)– To check: sum whole packet, including sum, should

get 0xffff

• How many errors?– Catches any 1-bit error– Not all 2-bit errors

• Optional in IPv4: not checked if value is 0

Pseudo Header

• UDP Checksum is computed over pseudo-header prepended to the UDP header– For IPv4: IP Source, IP Dest, Protocol (=17), plus UDP

length

• What does this give us?

• What is a problem with this?– Is UDP a layer on top of IP?

0 7 8 15 16 23 24 31+--------+--------+--------+--------+| source address |+--------+--------+--------+--------+| destination address | +--------+--------+--------+--------+| zero |protocol| UDP length | +--------+--------+--------+--------+

UDP Does No Provide

• Reliability

• In-order delivery (or help with reordered packets)

Why not use Link Layer Reliability?

• Review: reliability on the link layer

Problem Mechanism

Acknowledgments + TimeoutDropped Packets

Duplicate Packets Sequence Numbers

Packets out of order Receiver Window

Keeping the pipe full Sliding Window (Pipelining)

• Single link: things were easy…

Transport Layer Reliability

• Extra difficulties– Multiple hosts (problems with share?)– Multiple hops (problems with diff loss/capacity)– Multiple potential paths

• Need for connection establishment, tear down– Analogy: dialing a number versus a direct line

• Varying RTTs– Both across connections and during a

connection– Why do they vary? What do they influence?

Extra Difficulties (cont.)

• Out of order packets– Not only because of drops/retransmissions– Can get very old packets (up to 120s), must not

get confused

• Unknown resources at other end– Must be able to discover receiver buffer: flow

control

• Unknown resources in the network– Should not overload the network– But should use as much as safely possible– Congestion Control (next class)

TCP

A Short History of TCP

• 1974: 3-way handshake

• 1978: IP and TCP split

• 1983: January 1st, ARPAnet switches to TCP/IP

• 1984: Nagle predicts congestion collapses

• 1986: Internet begins to suffer congestion collapses– LBL to Berkeley drops from 32Kbps to 40bps

• 1987/8: Van Jacobson fixes TCP, publishes seminal paper*: (TCP Tahoe)

• 1990: Fast transmit and fast recovery added (TCP Reno)

* Van Jacobson. Congestion avoidance and control. SIGCOMM ’88

TCP – Transmission Control Protocol

• Service model: “reliable, connection oriented, full duplex byte stream”– Endpoints: <IP Address, Port>

• Flow control– If one end stops reading, writes at other eventually

stop/fail

• Congestion control– Keeps sender from overloading the network

TCP

• Specification– RFC 793 (1981), RFC 1222 (1989, some corrections),

RFC 5681 (2009, congestion control), …

• Was born coupled with IP, later factored out– Why?

• End-to-end protocol– Minimal assumptions on the network– All mechanisms run on the end points

• Alternative idea:– Provide reliability, flow control, etc, link-by-link– Does it work?

Why not provide (*) on the network layer?

• Cost– These functionalities are not free: don’t

burden those who don’t need them

• Conflicting– Timeliness and in-order delivery, for example

• Insufficient– Example: reliability

* may be security, reliability, ordering guarantees, …

UDP v TCP

UDP

• Pretty much just IP with ports

• If you have a real-time app that doesn’t want to be slowed down by TCP

TCP

• Reliable

• Connection oriented

• Full duplex byte stream

TCP Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Acknowledgment Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data | |U|A|P|R|S|F| | | Offset| Reserved |R|C|S|S|Y|I| Window | | | |G|K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Checksum | Urgent Pointer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Header Fields

• Ports: multiplexing

• Sequence number– Correspond to bytes, not packets!

• Acknowledgment Number– Next expected sequence number

• Window: willing to receive– Lets receiver limit SWS (even to 0) for flow control

• Data Offset: # of 4 byte (header + option bytes)

• Flags, Checksum, Urgent Pointer

TCP Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Acknowledgment Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Data | |U|A|P|R|S|F| | | Offset| Reserved |R|C|S|S|Y|I| Window | | | |G|K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Checksum | Urgent Pointer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| Options | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+| data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Header Flags

• URG: whether there is urgent data

• ACK: ack no. valid (all but first segment)

• PSH: push data to the application immediately

• RST: reset connection

• SYN: synchronize, establishes connection

• FIN: close connection

Establishing a Connection

• Three-way handshake– Two sides agree on respective initial

sequence nums

• If no one is listening on port: server sends RST

• If server is overloaded: ignore SYN• If no SYN-ACK: retry, timeout

Listen, Accept…

Accept returns

Connect

Connection Termination

• FIN bit says no more data to send– Caused by close or shutdown– Both sides must send FIN to close a connection

• Typical closeFIN

ACK

FIN

ACK

Close

Close

FIN_WAIT_1

CLOSE_WAIT

FIN_WAIT_2

LAST_ACKTIME_WAIT

CLOSED

CLOSED

2MSL

I’m done. good bye

I’m done. Good byte

No, I didn’t get all your data

Sure stop talking

Sure stop talking

Summary of TCP States

Passive close:Can still send!Active close:

Can still receive

Conn

ectio

n Es

tabl

ishm

ent

Unsynchronized

Synchronized

From: The TIME−WAIT state in TCP and Its Effect on Busy Servers, Faber and TouchInfocom 1999

Good byeSure good

byteDone.

TIME_WAIT

• Why do you have to wait for 2MSL in TIME_WAIT?– What if last ack is severely delayed, AND– Same port pair is immediately reused for a new connection?

• Solution: active closer goes into TIME_WAIT– Waits for 2MSL (Maximum Segment Lifetime)

• Can be problematic for active servers– OS has too many sockets in TIME_WAIT, can accept less

connections• Hack: send RST and delete socket, SO_LINGER = 0

– OS won’t let you re-start server because port in use• SO_REUSEADDR lets you rebind

First Goal

• We should not send more data than the receiver can take: flow control

• When to send data?– Sender can delay sends to get larger

segments

• How much data to send?– Data is sent in MSS-sized segments

• Chosen to avoid fragmentation

Flow Control

• Part of TCP specification (even before 1988)

• Receiver uses window header field to tell sender how much space it has

Flow Control

• Receiver: AdvertisedWindow (how much room left)

= MaxRcvBuffer – ((NextByteExpected-1) – LastByteRead)

• Sender: LastByteSent – LastByteAcked <= AdvertisedWindow

Flow Control

• Advertised window can fall to 0– How?– Sender eventually stops sending, blocks

application

• Sender keeps sending 1-byte segments until window comes back > 0

When to Transmit?

• Nagle’s algorithm• Goal: reduce the overhead of small

packetsIf available data and window >= MSS

Send a MSS segmentelse

If there is unAcked data in flightbuffer the new data until ACK arrives

elsesend all the new data now

• Receiver should avoid advertising a window <= MSS after advertising a window of 0

Delayed Acknowledgments

• Goal: Piggy-back ACKs on data– Delay ACK for 200ms in case application

sends data– If more data received, immediately ACK

second segment– Note: never delay duplicate ACKs (if missing a

segment)

• Warning: can interact very badly with Nagle– Temporary deadlock– Can disable Nagle with TCP_NODELAY– Application can also avoid many small writes

Limitations of Flow Control

• Network may be the bottleneck

• Signal from receiver not enough!

• Sending too fast will cause queue overflows, heavy packet loss

• Flow control provides correctness

• Need more for performance: congestion control

Next class: Second goal

• We should not send more data than the network can take: congestion control