11 Transport

download 11 Transport

of 80

Transcript of 11 Transport

  • 8/11/2019 11 Transport

    1/80

    CS 4700 / CS 5700Network Fundamentals

    Lecture 11: Transport

    (UDP, but mostly TCP)

    Revised 7/27/2013

  • 8/11/2019 11 Transport

    2/80

    Transport Layer2

    Function:

    Demultiplexing of data streams

    Optional functions:

    Creating long lived connections Reliable, in-order packet delivery

    Error detection

    Flow and congestion control

    Key challenges: Detecting and responding to

    congestion

    Balancing fairness against high

    utilization

    Application

    PresentationSession

    Transport

    NetworkData Link

    Physical

  • 8/11/2019 11 Transport

    3/80

    UDP

    TCP

    Congestion Control

    Evolution of TCP

    Problems with TCP

    Outline3

  • 8/11/2019 11 Transport

    4/80

    The Case for Multiplexing4

    Datagram network

    No circuits

    No connections

    Clients run many applications

    at the same time

    Who to deliver packets to?

    IP header protocol field 8 bits = 256 concurrent streams

    Insert Transport Layer to

    handle demultiplexing Packet

    Network

    Data LinkPhysical

    Transport

  • 8/11/2019 11 Transport

    5/80

    Demultiplexing Traffic5

    Endpoints identified by

    Network

    Transport

    Application

    P1 P2 P3 P4 P6 P7P5

    Host 1 Host 2 Host 3

    Unique port for

    each application

    Applications

    share the samenetwork

    Server applications

    communicate withmultiple clients

  • 8/11/2019 11 Transport

    6/80

    Layering, Revisited6

    Application

    Transport

    Network

    Data Link

    Physical

    Host 1 Router Host 2

    Physical

    Lowest level end-to-end protocol (in theory)

    Transport header only read by source and destination

    Routers view transport header as payload

    Application

    Transport

    Network

    Data Link

    Physical

    Network

    Data Link

    Layers communicatepeer-to-peer

  • 8/11/2019 11 Transport

    7/80

    User Datagram Protocol (UDP)7

    Simple, connectionless datagram C sockets: SOCK_DGRAM

    Port numbers enable demultiplexing

    16 bits = 65535 possible ports

    Port 0 is invalid

    Checksum for error detection

    Detects (some) corrupt packets

    Does not detect dropped, duplicated, or reordered packets

    Destination Port

    0 16 31

    Payload Length

    Source Port

    Checksum

  • 8/11/2019 11 Transport

    8/80

    Uses for UDP8

    Invented after TCP

    Why?

    Not all applications can tolerate TCP

    Custom protocols can be built on top of UDP

    Reliability? Strict ordering?

    Flow control? Congestion control?

    ExamplesRTMP, real-time media streaming (e.g. voice, video)

    Facebook datacenter protocol

  • 8/11/2019 11 Transport

    9/80

    UDP

    TCP

    Congestion Control

    Evolution of TCP

    Problems with TCP

    Outline9

  • 8/11/2019 11 Transport

    10/80

    Options

    Transmission Control Protocol10

    Reliable, in-order, bi-directional byte streams

    Port numbers for demultiplexing

    Virtual circuits (connections)

    Flow controlCongestion control, approximate fairness

    Destination Port

    0 16 31

    Sequence Number

    Source Port

    Acknowledgement Number

    Advertised Window

    Urgent Pointer

    Flags

    Checksum

    Why these

    features?

    4

    HLen

  • 8/11/2019 11 Transport

    11/80

    Connection Setup11

    Why do we need connection setup?

    To establish state on both hosts

    Most important state: sequence numbers

    Count the number of bytes that have been sent

    Initial value chosen at random

    Why?

    Important TCP flags (1 bit each)

    SYNsynchronization, used for connection setup

    ACKacknowledge received data

    FINfinish, used to tear down connection

  • 8/11/2019 11 Transport

    12/80

    Three Way Handshake12

    Each side:

    Notifies the other of starting sequence number

    ACKs the other sides starting sequence number

    Client Server

    Why

    Sequence # +1?

  • 8/11/2019 11 Transport

    13/80

    Connection Setup Issues13

    Connection confusion

    How to disambiguate connections from the same host?

    Random sequence numbers

    Source spoofing

    Kevin Mitnick

    Need good random number generators!

    Connection state managementEach SYN allocates state on the server

    SYN flood = denial of service attack

    Solution: SYN cookies

  • 8/11/2019 11 Transport

    14/80

    Connection Tear Down14

    Either side can initiate

    tear down

    Other side may

    continue sending dataHalf open connection

    shutdown()

    Acknowledge the lastFIN

    Sequence number + 1

    Client Server

  • 8/11/2019 11 Transport

    15/80

    Sequence Number Space15

    TCP uses a byte stream abstraction

    Each byte in each stream is numbered

    32-bit value, wraps around

    Initial, random values selected during setup

    Byte stream broken down into segments (packets)

    Size limited by the Maximum Segment Size (MSS)

    Set to limit fragmentation Each segment has a sequence number

    Segment 8 Segment 9 Segment 10

    13450 14950 16050 17550

  • 8/11/2019 11 Transport

    16/80

    Bidirectional Communication16

    Each side of the connection can send and receive

    Different sequence numbers for each direction

    Client ServerSeq. Ack. Seq. Ack.1 23

    23 1461

    1461 753

    753 2921Data and ACK in

    the same packet

    23 1

  • 8/11/2019 11 Transport

    17/80

    Flow Control17

    Problem: how many packets should a sendertransmit?

    Too many packets may overwhelm the receiver

    Size of the receivers buffers may change over time Solution: sliding window

    Receiver tells the sender how big their buffer is

    Called the advertised window

    For window size n, sender may transmit nbytes withoutreceiving an ACK

    After each ACK, the window slides forward

    Window may go to zero!

  • 8/11/2019 11 Transport

    18/80

    Flow Control: Sender Side18

    Sequence Number

    Src. Port

    Acknowledgement NumberWindow

    Urgent Pointer

    Flags

    Checksum

    HL

    Packet Sent

    Dest. PortSrc. Port

    Acknowledgement NumberWindow

    Urgent Pointer

    Flags

    Checksum

    HL

    Packet Received

    Dest. Port

    Sequence Number

    ACKed Sent To Be Sent Outside Window

    Window

    App WriteMust be buffered

    until ACKed

  • 8/11/2019 11 Transport

    19/80

    Sliding Window Example19

    Time Time

    TCP is ACK Clocked

    Short RTTquick ACK window slides quickly Long RTTslow ACK window slides slowly

  • 8/11/2019 11 Transport

    20/80

    What Should the Receiver ACK?

    1. ACK every packet

    2. Use cumulative ACK, where an ACK for

    sequence nimplies ACKS for all k< n

    3. Use negative ACKs (NACKs), indicating which

    packet did not arrive

    4. Use selective ACKs (SACKs), indicating those

    that did arrive, even if not in order SACK is an actual TCP extension

    20

    20

  • 8/11/2019 11 Transport

    21/80

    Sequence Numbers, Revisited21

    32 bits, unsigned

    Why so big?

    For the sliding window you need

    |Sequence # Space| > 2 * |Sending Window Size|

    232> 2 * 216

    Guard against stray packets

    IP packets have a maximum segment lifetime (MSL) of120 seconds

    i.e. a packet can linger in the network for 3 minutes

    Sequence number would wrap around at 286Mbps

    What about GigE? PAWS algorithm + TCP options

  • 8/11/2019 11 Transport

    22/80

    Silly Window Syndrome22

    Problem: what if the window size is very small?

    Multiple, small packets, headers dominate data

    Equivalent problem: sender transmits packets one

    byte at a time

    1. for (int x = 0; x < strlen(data); ++x)2. write(socket, data + x, 1);

    Header Data Header Data Header Data Header Data

  • 8/11/2019 11 Transport

    23/80

    Nagles Algorithm23

    1. If the window >= MSS and available data >= MSS:

    Send the data

    2. Elif there is unACKed data:

    Enqueue data in a buffer (send after a timeout)3. Else: send the data

    Problem: Nagles Algorithm delays transmissions

    What if you need to send a packet immediately?

    1. int flag = 1;

    2. setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,

    (char *) &flag, sizeof(int));

    Send a full

    packet

    Send a non-full packet if

    nothing else is happening

  • 8/11/2019 11 Transport

    24/80

    Error Detection24

    Checksum detects (some) packet corruption

    Computed over IP header, TCP header, and data

    Sequence numbers catch sequence problems

    Duplicates are ignored

    Out-of-order packets are reordered or dropped

    Missing sequence numbers indicate lost packets

    Lost segments detected by senderUse timeoutto detect missing ACKs

    Need to estimate RTT to calibrate the timeout

    Sender must keep copies of all data until ACK

  • 8/11/2019 11 Transport

    25/80

    Retransmission Time Outs (RTO)25

    Problem: time-out is linked to round trip time

    RTO

    RTO

    Timeout is

    too short

    What about

    if timeout istoo long?

  • 8/11/2019 11 Transport

    26/80

    Round Trip Time Estimation26

    Original TCP round-trip estimator

    RTT estimated as a moving average new_rtt = (old_rtt) + (1)(new_sample)

    Recommended : 0.8-0.9 (0.875 for most TCPs)

    RTO = 2 * new_rtt (i.e. TCP is conservative)

    Sample

  • 8/11/2019 11 Transport

    27/80

    RTT Sample Ambiguity27

    Karnsalgorithm: ignore samples for retransmitted

    segments

    RTO

    RTO

    Sample Sample?

  • 8/11/2019 11 Transport

    28/80

    UDP

    TCP

    Congestion Control

    Evolution of TCP Problems with TCP

    Outline28

  • 8/11/2019 11 Transport

    29/80

    What is Congestion?29

    Load on the network is higher than capacity

    Capacity is not uniform across networks

    Modem vs. Cellular vs. Cable vs. Fiber Optics

    There are multiple flows competing for bandwidth Residential cable modem vs. corporate datacenter

    Load is not uniform over time

    10pm, Sunday night = Bittorrent Game of Thrones

  • 8/11/2019 11 Transport

    30/80

    Why is Congestion Bad?30

    Results in packet loss

    Routers have finite buffers, packets must be dropped

    Practical consequences

    Router queues build up, delayincreases

    Wasted bandwidth from retransmissions

    Low network goodput

  • 8/11/2019 11 Transport

    31/80

    The Danger of Increasing Load31

    Kneepoint after which

    Throughput increases

    very slow

    Delay increases fast In an M/M/1 queue

    Delay = 1/(1utilization)

    Cliffpoint after which Throughput0

    Delay

    Congestion

    Collapse

    Load

    Load

    Go

    odput

    Delay

    Knee Cliff

    Ideal point

  • 8/11/2019 11 Transport

    32/80

    Cong. Control vs. Cong. Avoidance32

    CongestionCollapseGoodp

    ut

    Knee Cliff

    Load

    Congestion Avoidance:

    Stay left of the knee

    Congestion Control:

    Stay left of the cliff

  • 8/11/2019 11 Transport

    33/80

    Advertised Window, Revisited33

    Does TCPs advertised window solve congestion?

    NO

    The advertised window only protects the receiver

    A sufficiently fast receiver can max the window

    What if the network is slower than the receiver?

    What if there are other concurrent flows?

    Key pointsWindow size determines send rate

    Window must be adjusted to prevent congestion

    collapse

  • 8/11/2019 11 Transport

    34/80

    Goals of Congestion Control34

    1. Adjusting to the bottleneck bandwidth

    2. Adjusting to variations in bandwidth

    3. Sharing bandwidth between flows4. Maximizing throughput

  • 8/11/2019 11 Transport

    35/80

    General Approaches35

    Do nothing, send packets indiscriminately

    Many packets will drop, totally unpredictable performance

    May lead to congestion collapse

    Reservations Pre-arrange bandwidth allocations for flows

    Requires negotiation before sending packets

    Must be supported by the network

    Dynamic adjustment Use probes to estimate level of congestion

    Speed up when congestion is low

    Slow down when congestion increases

    Messy dynamics, requires distributed coordination

  • 8/11/2019 11 Transport

    36/80

    TCP Congestion Control36

    Each TCP connection has a window

    Controls the number of unACKed packets

    Sending rate is ~ window/RTT

    Idea: vary the window size to control the send rate

    Introduce a congestion window at the sender

    Congestion control is sender-side problem

  • 8/11/2019 11 Transport

    37/80

    Congestion Window (cwnd)37

    Limits how much data is in transit

    Denominated in bytes

    1.

    wnd= min(cwnd, adv_wnd);2. effective_wnd= wnd

    (last_byte_sentlast_byte_acked);

    last_byte_acked last_byte_sent

    wnd

    effective_wnd

  • 8/11/2019 11 Transport

    38/80

    Two Basic Components38

    1. Detect congestion

    Packet dropping is most reliably signal

    Delay-based methods are hard and risky

    How do you detect packet drops? ACKs Timeout after not receiving an ACK

    Several duplicate ACKs in a row (ignore for now)

    2. Rate adjustment algorithm

    Modify cwnd

    Probe for bandwidth

    Responding to congestion

    Except on

    wireless

    networks

  • 8/11/2019 11 Transport

    39/80

    Rate Adjustment39

    Recall: TCP is ACK clocked

    Congestion = delay = long wait between ACKs

    No congestion = low delay = ACKs arrive quickly

    Basic algorithmUpon receipt of ACK: increase cwnd

    Data was delivered, perhaps we can send faster

    cwndgrowth is proportional to RTT

    On loss: decrease cwnd

    Data is being lost, there must be congestion

    Question: increase/decrease functions to use?

  • 8/11/2019 11 Transport

    40/80

    Utilization and Fairness40

    Flow 1 Throughput

    Flo

    w

    2Throughput

    Max

    throughput

    for flow 2

    Zero

    throughput

    for flow 1 Maxthroughput

    for flow 1

    Zero

    throughput

    for flow 2

    Less than full

    utilization

    More than full

    utilization

    (congestion)

    Ideal point

    Max efficiency

    Perfect fairness

    Equal

    throughput

    (fairness)

  • 8/11/2019 11 Transport

    41/80

    Multiplicative Increase, Additive Decrease

    41

    Not stable!

    Veers away

    from fairness

    Flow 1 Throughput

    Flow

    2Through

    put

  • 8/11/2019 11 Transport

    42/80

    Additive Increase, Additive Decrease

    42

    Stable

    But does not

    converge to

    fairness

    Flow 1 Throughput

    Flow

    2Through

    put

  • 8/11/2019 11 Transport

    43/80

    Multiplicative Increase, Multiplicative Decrease

    43

    Stable

    But does not

    converge to

    fairness

    Flow 1 Throughput

    Flow

    2Through

    put

  • 8/11/2019 11 Transport

    44/80

    Additive Increase, Multiplicative Decrease

    44

    Converges to

    stable and fair

    cycle

    Symmetricaround y=x

    Flow 1 Throughput

    Flow

    2Through

    put

  • 8/11/2019 11 Transport

    45/80

    Implementing Congestion Control

    Maintains three variables:

    cwnd: congestion window

    adv_wnd: receiver advertised window

    ssthresh: threshold size (used to update cwnd)

    For sending, use: wnd= min(cwnd, adv_wnd)

    Two phases of congestion control

    1. Slow start (cwnd< ssthresh) Probe for bottleneck bandwidth

    2. Congestion avoidance (cwnd>= ssthresh)

    AIMD

    45

    45

  • 8/11/2019 11 Transport

    46/80

    Slow Start

    Goal: reach knee quickly

    Upon starting (or restarting) aconnection

    cwnd=1 ssthresh= adv_wnd

    Each time a segment is ACKed, cwnd++

    Continues until

    ssthreshis reachedOr a packet is lost

    Slow Start is not actually slow

    cwndincreases exponentially

    46

    LoadG

    oodput

    Knee Cliff

  • 8/11/2019 11 Transport

    47/80

    Slow Start Example47

    cwnd= 1

    cwnd= 2

    cwnd= 4

    cwnd= 8

    cwndgrows rapidly

    Slows down when

    cwnd >= ssthresh

    Or a packet drops

  • 8/11/2019 11 Transport

    48/80

    Congestion Avoidance

    AIMD mode

    ssthreshis lower-bound guess about location of

    the knee

    Ifcwnd >= ssthresh theneach time a segment is ACKed

    increment cwnd by 1/cwnd (cwnd += 1/cwnd).

    So cwndis increased by one only if all segmentshave been acknowledged

    48

  • 8/11/2019 11 Transport

    49/80

    Congestion Avoidance Example49

    0

    2

    4

    6

    8

    10

    12

    14

    t=0

    t=2

    t=4

    t=6

    Round Trip Times

    cwnd

    (insegments)

    SlowStart

    cwnd>= ssthresh

    cwnd= 1

    cwnd= 2

    cwnd= 4

    cwnd= 8

    cwnd= 9

    ssthresh = 8

  • 8/11/2019 11 Transport

    50/80

    TCP Pseudocode

    Initially:cwnd = 1;ssthresh = adv_wnd;

    New ack received:

    if (cwnd < ssthresh)/* Slow Start*/cwnd = cwnd + 1;

    else/* Congestion Avoidance */

    cwnd = cwnd + 1/cwnd;Timeout:/* Multiplicative decrease */ssthresh = cwnd/2;cwnd = 1;

    50

  • 8/11/2019 11 Transport

    51/80

    The Big Picture

    Time

    cwnd

    Timeout

    Slow Start

    Congestion

    Avoidance

    51

    ssthresh

  • 8/11/2019 11 Transport

    52/80

    UDP

    TCP

    Congestion Control

    Evolution of TCP Problems with TCP

    Outline52

  • 8/11/2019 11 Transport

    53/80

    The Evolution of TCP53

    Thus far, we have discussed TCP TahoeOriginal version of TCP

    However, TCP was invented in 1974!

    Today, there are many variants of TCP Early, popular variant: TCP Reno

    Tahoe features, plus

    Fast retransmit Fast recovery

  • 8/11/2019 11 Transport

    54/80

    TCP Reno: Fast Retransmit54

    Problem: in Tahoe, if

    segment is lost, there

    is a long wait until the

    RTO Reno: retransmit after

    3 duplicate ACKs

    cwnd= 1

    cwnd= 2

    cwnd= 4

    3 Duplicate

    ACKs

  • 8/11/2019 11 Transport

    55/80

    TCP Reno: Fast Recovery

    After a fast-retransmit set cwndto ssthresh/2

    i.e. dont reset cwndto 1

    Avoid unnecessary return to slow start

    Prevents expensive timeouts

    But when RTO expires still do cwnd= 1

    Return to slow start, same as Tahoe

    Indicates packets arent being delivered at all i.e. congestion must be really bad

    55

  • 8/11/2019 11 Transport

    56/80

    Fast Retransmit and Fast Recovery

    At steady state, cwndoscillates around the optimal

    window size

    TCP always forces packet drops

    56

    Time

    cw

    nd

    Timeout

    Slow Start

    Congestion Avoidance

    Fast Retransmit/Recovery

    ssthresh

    Timeout

  • 8/11/2019 11 Transport

    57/80

    Many TCP Variants57

    Tahoe: the original

    Slow start with AIMD

    Dynamic RTO based on RTT estimate

    Reno: fast retransmit and fast recovery NewReno: improved fast retransmit

    Each duplicate ACK triggers a retransmission

    Problem: >3 out-of-order packets causes pathologicalretransmissions

    Vegas: delay-based congestion avoidance

    And many, many, many more

  • 8/11/2019 11 Transport

    58/80

    TCP in the Real World58

    What are the most popular variants today?

    Key problem: TCP performs poorly on high bandwidth-

    delay product networks (like the modern Internet)

    Compound TCP (Windows) Based on Reno

    Uses two congestion windows: delay based and loss based

    Thus, it uses a compoundcongestion controller

    TCP CUBIC (Linux) Enhancement of BIC (Binary Increase Congestion Control)

    Window size controlled by cubic function

    Parameterized by the time Tsince the last dropped packet

  • 8/11/2019 11 Transport

    59/80

    High Bandwidth-Delay Product59

    Key Problem: TCP performs poorly when

    The capacity of the network (bandwidth) is large

    The delay (RTT) of the network is large

    Or, when bandwidth * delay is large b * d = maximum amount of in-flight data in the network

    a.k.a. the bandwidth-delay product

    Why does TCP perform poorly?

    Slow start and additive increase are slow to converge

    TCP is ACK clocked

    i.e. TCP can only react as quickly as ACKs are received

    Large RTTACKs are delayedTCP is slow to react

  • 8/11/2019 11 Transport

    60/80

    Poor Performance of TCP Reno CC60

    Bottleneck Bandwidth (Mb/s)

    50 flows in both directions

    Buffer = BW x Delay

    RTT = 80 ms

    Round Trip Delay (sec)

    50 flows in both directions

    Buffer = BW x Delay

    BW = 155 Mb/s

  • 8/11/2019 11 Transport

    61/80

    Goals61

    Fast window growth

    Slow start and additive increase are too slow when

    bandwidth is large

    Want to converge more quickly Maintain fairness with other TCP varients

    Window growth cannot be too aggressive

    Improve RTT fairness TCP Tahoe/Reno flows are not fair when RTTs vary

    widely

    Simple implementation

  • 8/11/2019 11 Transport

    62/80

    Compound TCP Implementation62

    Default TCP implementation in Windows

    Key idea: split cwnd into two separate windows

    Traditional, loss-based window

    New, delay-based window wnd= min(cwnd + dwnd, adv_wnd)

    cwnd is controlled by AIMD

    dwndis the delay window

    Rules for adjustingdwnd: If RTT is increasing, decrease dwnd(dwnd>= 0)

    If RTT is decreasing, increase dwnd

    Increase/decrease are proportional to the rate of change

  • 8/11/2019 11 Transport

    63/80

    Low

    RTT

    High

    RTT

    Compound TCP Example

    Aggressiveness corresponds to changes in RTT

    Advantages: fast ramp up, more fair to flows with different RTTs

    Disadvantage: must estimate RTT, which is very challenging

    63

    Time

    cwnd

    Timeout

    Slow Start

    TimeoutSlower

    cwnd

    growth

    Fastercwnd

    growth

    C C C

  • 8/11/2019 11 Transport

    64/80

    TCP CUBIC Implementation64

    Default TCP implementation in Linux

    Replace AIMD with cubic function

    =

    3

    +

    Ca constant scaling factor a constant fraction for multiplicative decrease

    Ttime since last packet drop

    cwndmax

    cwnd when last packet dropped

    TCP CUBIC E l

  • 8/11/2019 11 Transport

    65/80

    TCP CUBIC Example

    Less wasted bandwidth due to fast ramp up

    Stable region and slow acceleration help maintain fairness

    Fast ramp up is more aggressive than additive increase

    To be fair to Tahoe/Reno, CUBIC needs to be less aggressive

    65

    Time

    cwnd

    Timeout

    Slow Start

    CUBIC Function

    cwndmax

    Fast

    ramp up

    StableRegion

    Slowly accelerate to

    probe for bandwidth

    Si l ti f CUBIC Fl

  • 8/11/2019 11 Transport

    66/80

    Simulations of CUBIC Flows66

    CUBIC

    CUBIC

    RenoReno

    D l i TCP V i t

  • 8/11/2019 11 Transport

    67/80

    Deploying TCP Variants

    TCP assumes all flows employ TCP-like

    congestion control

    TCP-friendly or TCP-compatible

    Violated by UDP :( If new congestion control algorithms are

    developed, they must be TCP-friendly

    Be wary of unforeseen interactionsVariants work well with others like themselves

    Different variants competing for resources may trigger

    unfair, pathological behavior

    67

  • 8/11/2019 11 Transport

    68/80

    C TCP O ti

  • 8/11/2019 11 Transport

    69/80

    Common TCP Options69

    Window scaling SACK: selective acknowledgement

    Maximum segment size (MSS)

    Timestamp

    Options

    Destination Port0 16 31

    Sequence Number

    Source Port

    Acknowledgement Number

    Advertised WindowUrgent Pointer

    FlagsChecksum

    4

    HLen

    Wi d S li

  • 8/11/2019 11 Transport

    70/80

    Window Scaling70

    Problem: the advertised window is only 16-bits

    Effectively caps the window at 65536B, 64KB

    Example: 1.5Mbps link, 513ms RTT

    (1.5Mbps * 0.513s) = 94KB

    64KB / 94KB = 68%of maximum possible speed

    Solution: introduce a window scaling value

    wnd= adv_wnd

  • 8/11/2019 11 Transport

    71/80

    SACK: Selective Acknowledgment71

    Problem: duplicate ACKs only

    tell us about 1 missing packet

    Multiple rounds of dup ACKs

    needed to fill all holes Solution: selective ACK

    Include received, out-of-order

    sequence numbers in TCP header

    Explicitly tells the sender about

    holes in the sequence

    Oth C O ti

  • 8/11/2019 11 Transport

    72/80

    Other Common Options72

    Maximum segment size (MSS)

    Essentially, what is the hosts MTU

    Saves on path discovery overhead

    TimestampWhen was the packet sent (approximately)?

    Used to prevent sequence number wraparound

    PAWS algorithm

    I ith TCP

  • 8/11/2019 11 Transport

    73/80

    Issues with TCP73

    The vast majority of Internet traffic is TCP

    However, many issues with the protocol

    Lack of fairness

    Synchronization of flowsPoor performance with small flows

    Really poor performance on wireless networks

    Susceptibility to denial of service

    F i

  • 8/11/2019 11 Transport

    74/80

    Fairness74

    Problem: TCP throughput depends on RTT

    1 Mbps 1 Mbps

    1 Mbps1 Mbps1 Mbps

    100 ms

    1000 ms

    ACK clocking makes TCP inherently unfair

    Possible solution: maintain a separate delay window

    Implemented by Microsofts Compound TCP

    S h i ti f Fl

  • 8/11/2019 11 Transport

    75/80

    Synchronization of Flows

    Ideal bandwidth sharing

    75

    cwnd

    cwnd

    cwnd

    Oscillating, but high overallutilization

    In reality, flows synchronize

    One flow causesall flows to drop

    packets

    Periodic lulls oflow utilization

    S ll Fl

  • 8/11/2019 11 Transport

    76/80

    Small Flows76

    Problem: TCP is biased against short flows

    1 RTT wasted for connection setup (SYN, SYN/ACK)

    cwndalways starts at 1

    Vast majority of Internet traffic is short flowsMostly HTTP transfers,

  • 8/11/2019 11 Transport

    77/80

    Wireless Networks77

    Problem: Tahoe and Reno assume loss = congestion

    True on the WAN, bit errors are very rare

    False on wireless, interference is very common

    TCP throughput ~ 1/sqrt(drop rate)Even a few interference drops can kill performance

    Possible solutions:

    Break layering, push data link info up to TCPUse delay-based congestion detection (TCP Vegas)

    Explicit congestion notification (ECN)

    More on this next week

    Denial of Service

  • 8/11/2019 11 Transport

    78/80

    Denial of Service78

    Problem: TCP connections require state

    Initial SYN allocates resources on the server

    State must persist for several minutes (RTO)

    SYN flood: send enough SYNs to a server toallocate all memory/meltdown the kernel

    Solution: SYN cookies

    Idea: dont store initial state on the serverSecurely insert state into the SYN/ACK packet

    Client will reflect the state back to the server

    SYN Cookies

  • 8/11/2019 11 Transport

    79/80

    SYN Cookies79

    Did the client really send me a SYN recently?

    Timestamp: freshness check

    Cryptographic hash: prevents spoofed packets

    Maximum segment size (MSS)

    Usually stated by the client during initial SYN

    Server should store this value

    Reflect the clients value back through them

    Sequence NumberTimestamp310 5

    MSS8

    Crypto Hash of Client IP & Port

    SYN Cookies in Practice

  • 8/11/2019 11 Transport

    80/80

    SYN Cookies in Practice80

    Advantages

    Effective at mitigating SYN floods

    Compatible with all TCP versions

    Only need to modify the serverNo need for client support

    Disadvantages

    MSS limited to 3 bits, may be smaller than clientsactual MSS

    Server forgets all other TCP options included with the

    clients SYN