COT 4600 Operating Systems Fall 2009

COT 4600 Operating Systems Fall 2009 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 3:00-4:00 PM


Page 1: COT 4600 Operating Systems Fall 2009


Page 2: COT 4600 Operating Systems Fall 2009


Lecture 27 Schedule

- Tuesday, November 24: Project phase 4 and HW 6 are due.
- Tuesday, December 1st: Research project presentations (instead of a final exam).
- Thursday, December 3rd: Class review.

Last time: more on scheduling; network properties (Chapter 7, available online from the publisher of the textbook); layers; the data link layer.

Today: Transport protocols; end-to-end problems

Next Time: Project discussion

Page 3: COT 4600 Operating Systems Fall 2009

Requirements of application-level processes. Transport protocols are expected to:

- guarantee message delivery;
- deliver messages in the same order they are sent;
- deliver at most one copy of each message;
- support arbitrarily large messages;
- support synchronization between sender and receiver;
- allow the receiver to apply flow control to the sender;
- support multiple application processes on each host.

Page 4: COT 4600 Operating Systems Fall 2009

Best-effort networks

Properties of best-effort networks. A best-effort network may:

- drop messages;
- re-order messages;
- deliver duplicate copies of a given message;
- limit messages to some finite size;
- deliver messages after an arbitrarily long delay.

Challenge: develop algorithms that turn the less-than-desirable properties of the underlying network into the high-level services required by application programs.

Page 5: COT 4600 Operating Systems Fall 2009

UDP

Provides a process-to-process transport service:

- unreliable and unordered datagram service;
- no flow control;
- multiplexing.

Pseudo-header: fields from the IP header, (1) protocol number, (2) source IP address, (3) destination IP address, plus the UDP length field.

UDP checksum: optional in IPv4 but mandatory in IPv6; computed over pseudo-header + UDP header + data.
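A minimal Python sketch of the UDP checksum over IPv4, assuming the standard Internet checksum (the 16-bit one's complement of the one's-complement sum of all 16-bit words, as specified for UDP in RFC 768) and the IPv4 pseudo-header layout: source address, destination address, a zero byte, protocol number 17, and the UDP length. The function names are hypothetical.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol number for UDP


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    length = 8 + len(payload)  # the UDP header is 8 bytes
    pseudo_header = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                     + struct.pack("!BBH", 0, UDP_PROTOCOL, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    checksum = ~ones_complement_sum16(pseudo_header + header + payload) & 0xFFFF
    # A computed 0 is sent as 0xFFFF, because 0 means "no checksum" in IPv4
    return checksum or 0xFFFF
```

A receiver repeats the sum over the pseudo-header, the header carrying the received checksum, and the data; an undamaged datagram sums to 0xFFFF.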

Page 6: COT 4600 Operating Systems Fall 2009

UDP ports

Endpoints are identified by ports. Servers run at well-known ports (e.g., a Web server at port 80, a Domain Name System (DNS) server at port 53); see /etc/services on Unix.
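Port-based demultiplexing can be observed with Python's standard socket module on the loopback interface. This is a sketch: binding to port 0 asks the OS for a free ephemeral port instead of a well-known one, and the helper name udp_roundtrip is hypothetical.

```python
import socket


def udp_roundtrip(message: bytes) -> tuple[bytes, int, int]:
    """Send one datagram between two loopback sockets.

    Returns (data received, source port seen by the receiver, sender's own port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        client.bind(("127.0.0.1", 0))
        server.settimeout(2.0)         # do not block forever if the datagram is lost
        server_port = server.getsockname()[1]
        client.sendto(message, ("127.0.0.1", server_port))
        data, (_, src_port) = server.recvfrom(2048)
        return data, src_port, client.getsockname()[1]
    finally:
        server.close()
        client.close()
```

The receiver learns the sender's port from each datagram, which is how a UDP server replies to many clients through one socket. On a typical Unix system, socket.getservbyname("domain", "udp") consults the same service database as /etc/services and returns 53.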

Page 7: COT 4600 Operating Systems Fall 2009

UDP header

Page 8: COT 4600 Operating Systems Fall 2009

Reliable service: TCP

- Connection-oriented.
- Byte-stream: the sending process writes some number of bytes; TCP breaks them into segments and sends them via IP; the receiving process reads some number of bytes.
- Full duplex.
- Flow control: keeps the sender from overrunning the receiver.
- Congestion control: keeps the sender from overrunning the network.
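The byte-stream abstraction can be sketched in Python: the sender cuts the stream into segments numbered by the sequence number of their first byte, and the receiver reassembles them in order while discarding duplicates, which is how a byte stream hides reordering and duplication by a best-effort network. This is a simplified illustration (no sequence-number wraparound, no window limits), and segment and Reassembler are hypothetical names.

```python
def segment(stream: bytes, isn: int, mss: int) -> list[tuple[int, bytes]]:
    """Cut a byte stream into (sequence number, payload) pairs of at most mss bytes."""
    return [(isn + off, stream[off:off + mss]) for off in range(0, len(stream), mss)]


class Reassembler:
    """Receiver that delivers bytes in order despite reordering and duplicates."""

    def __init__(self, isn: int):
        self.next_expected = isn   # sequence number of the next in-order byte
        self.out_of_order = {}     # seq -> payload, held until the gap is filled
        self.delivered = bytearray()

    def receive(self, seq: int, payload: bytes) -> int:
        if seq + len(payload) <= self.next_expected:
            return self.next_expected  # only old bytes: a duplicate, drop it
        self.out_of_order[seq] = payload
        progress = True
        while progress:
            progress = False
            for s in sorted(self.out_of_order):
                data = self.out_of_order[s]
                if s + len(data) <= self.next_expected:
                    del self.out_of_order[s]   # stale copy, already delivered
                    progress = True
                    break
                if s <= self.next_expected:
                    # segment covers the next expected byte: deliver its new part
                    self.delivered += data[self.next_expected - s:]
                    self.next_expected = s + len(data)
                    del self.out_of_order[s]
                    progress = True
                    break
        return self.next_expected  # cumulative ACK: the next byte expected
```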

Page 9: COT 4600 Operating Systems Fall 2009

End-to-end issues

TCP is based on the sliding window protocol used at the data link level, but the situation is very different:

- Potentially connects many different hosts: needs explicit connection establishment and termination.
- Potentially different RTTs: needs an adaptive timeout mechanism.
- Potentially long delays in the network: must be prepared for the arrival of very old packets.
- Potentially different capacity at the destination: must accommodate different amounts of buffering.
- Potentially different network capacity: must be prepared for network congestion.

Page 10: COT 4600 Operating Systems Fall 2009

End-to-end issues (cont’d)

At the link level, congestion is visible in the queue of packets at the sender. Network congestion is much harder to detect, cope with, and prevent.

TCP uses the sliding window algorithm on an end-to-end basis to provide reliable, ordered delivery.

X.25 uses the sliding window protocol on a hop-by-hop basis. A sequence of hop-by-hop guarantees does not add up to an end-to-end guarantee. End-to-end argument: a function should not be provided at the lower levels of a system unless it can be completely and correctly implemented at that level.

Yet, as a performance optimization, a function may be provided incompletely at a low level. For example, error detection is provided on a hop-by-hop basis: why forward a corrupted packet all the way to the destination?

Page 11: COT 4600 Operating Systems Fall 2009

Segment format

Each connection is identified by the 4-tuple <SrcPort, SrcIPAddr, DstPort, DstIPAddr>.

Sliding window + flow control: Acknowledgment, SequenceNum, AdvertisedWindow.

Flags: SYN, FIN, RESET, PUSH, URG, ACK.

Checksum: pseudo-header + TCP header + data.
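A small Python sketch of two details on this slide: the 4-tuple used as a dictionary key to demultiplex connections, and the flag bits. The bit values (FIN = 0x01, SYN = 0x02, RESET = 0x04, PUSH = 0x08, ACK = 0x10, URG = 0x20) follow the TCP specification; the helper names and the documentation-range addresses are illustrative.

```python
from typing import NamedTuple

# Flag bits as they appear in the TCP header flags field
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RESET": 0x04, "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}


def encode_flags(*names: str) -> int:
    value = 0
    for name in names:
        value |= FLAG_BITS[name]
    return value


def decode_flags(value: int) -> list[str]:
    return [name for name, bit in FLAG_BITS.items() if value & bit]


class ConnectionKey(NamedTuple):
    """<SrcPort, SrcIPAddr, DstPort, DstIPAddr> identifies one connection."""
    src_port: int
    src_ip: str
    dst_port: int
    dst_ip: str


# Two clients on the same host talking to the same server port are still
# distinct connections because their source ports differ.
connections = {
    ConnectionKey(5001, "192.0.2.10", 80, "198.51.100.1"): "connection A",
    ConnectionKey(5002, "192.0.2.10", 80, "198.51.100.1"): "connection B",
}
```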

Page 12: COT 4600 Operating Systems Fall 2009

The state transition diagram of a TCP connection. Two types of events trigger a transition:

- a segment arrives from the peer;
- the local application process invokes an operation on TCP.


Page 13: COT 4600 Operating Systems Fall 2009

Connection establishment and termination

Connection establishment is an asymmetric activity: a server does a passive open; a client does an active open. The two parties have to agree upon a set of parameters: the starting sequence numbers the two sides plan to use for their respective byte streams.

Three-way handshake.

Connection termination is symmetric: each side has to close the connection independently. One side can do a close, meaning that it can no longer send data, but the other side may keep its half of the connection open and continue sending data.

Page 14: COT 4600 Operating Systems Fall 2009

Open a connection

The information exchanged during the three-way handshake:

- the initial sequence number (SN) the client wants to use;
- the initial sequence number (SN) the server wants to use;
- SYN: flag indicating that a sequence number is sent;
- SYN+ACK: flags indicating that a sequence number is sent together with the next sequence number expected (ACK).

The three-way handshake:

- a client does an active open, sends a SYN, and moves to the SYN_SENT state;
- when the SYN arrives at the server, the server moves to the SYN_RECVD state and sends a segment with the SYN+ACK flags on;
- when the client receives the segment with the SYN+ACK flags on, it moves to the ESTABLISHED state and sends an ACK; when that ACK arrives, the server also moves to ESTABLISHED.
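The exchange can be traced with a short Python sketch. It assumes the standard TCP conventions that each side picks a random initial sequence number, that a SYN consumes one sequence number (so it is acknowledged with ISN + 1), and that the client's final ACK moves the server from SYN_RECVD to ESTABLISHED; the function name is hypothetical.

```python
import random

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(seed=None):
    rng = random.Random(seed)
    client_isn = rng.randrange(SEQ_MOD)
    server_isn = rng.randrange(SEQ_MOD)
    client, server = "CLOSED", "LISTEN"  # the server has done a passive open
    trace = []

    # 1. Client active open: send SYN with its initial sequence number
    syn = {"flags": {"SYN"}, "seq": client_isn}
    client = "SYN_SENT"
    trace.append(("client->server", syn))

    # 2. SYN arrives: server moves to SYN_RECVD and replies with SYN+ACK
    server = "SYN_RECVD"
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (syn["seq"] + 1) % SEQ_MOD}
    trace.append(("server->client", synack))

    # 3. SYN+ACK arrives: client checks the ACK, moves to ESTABLISHED, sends ACK
    assert synack["ack"] == (client_isn + 1) % SEQ_MOD
    client = "ESTABLISHED"
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % SEQ_MOD,
           "ack": (server_isn + 1) % SEQ_MOD}
    trace.append(("client->server", ack))

    # The final ACK arrives at the server
    server = "ESTABLISHED"
    return client, server, trace
```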

Page 15: COT 4600 Operating Systems Fall 2009


Page 16: COT 4600 Operating Systems Fall 2009

Close a connection

Three combinations of transitions to close a connection:

- This side closes first: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED
- The other side closes first: ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED
- Both sides close at the same time: ESTABLISHED → FIN_WAIT_1 → CLOSING → TIME_WAIT → CLOSED

Moving from TIME_WAIT to CLOSED takes twice the maximum lifetime of an IP datagram in the Internet (the maximum lifetime is taken to be 120 sec, so the wait is 240 sec).

Reason: the local side of the connection sent an ACK segment in response to a FIN segment but does not know whether the ACK reached the other side. If it did not, the other side may retransmit the FIN, and that FIN might arrive late and be acted upon by a new incarnation of the same connection.
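The three closing sequences can be encoded as a transition table and checked in Python. The event names are hypothetical labels ("recv FIN" means a FIN arrives and an ACK is sent back; "close" means the application closes and a FIN is sent), and MSL = 120 s is the maximum datagram lifetime quoted above.

```python
MSL_SECONDS = 120                  # maximum lifetime of an IP datagram
TIME_WAIT_SECONDS = 2 * MSL_SECONDS

# (current state, event) -> next state
CLOSE_TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv FIN"): "CLOSING",        # simultaneous close
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",
    ("CLOSING", "recv ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",    # the other side closes first
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run_close(events, state="ESTABLISHED"):
    """Return the sequence of states visited; a KeyError means an illegal event."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path
```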

Page 17: COT 4600 Operating Systems Fall 2009

Sliding window revisited. Each byte has a sequence number; ACKs are cumulative.

Sending side:
LastByteAcked ≤ LastByteSent
LastByteSent ≤ LastByteWritten
Bytes between LastByteAcked and LastByteWritten must be buffered.

Receiving side:
LastByteRead < NextByteExpected
Bytes between LastByteRead and LastByteRcvd must be buffered.

Page 18: COT 4600 Operating Systems Fall 2009

Flow control

Sender buffer size: MaxSendBuffer
Receive buffer size: MaxRcvBuffer

Receiving side:
LastByteRcvd - NextByteRead ≤ MaxRcvBuffer
AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)

Page 19: COT 4600 Operating Systems Fall 2009

Flow control (cont’d)

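Receiving side (continued): NextByteExpected ≤ LastByteRcvd + 1

Sending side:
LastByteSent - LastByteAcked ≤ AdvertisedWindow
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
LastByteWritten - LastByteAcked ≤ MaxSendBuffer

Block the sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application is trying to write.

Always send an ACK in response to an arriving data segment. Persist when AdvertisedWindow = 0: keep probing the receiver so that the sender learns when the window reopens.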
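The sending-side rules can be sketched as a small Python class. It uses the slide's variable names and assumes LastByteAcked is the highest byte acknowledged so far and that the peer's AdvertisedWindow arrives with each ACK; the class and method names are hypothetical, and the persist timer is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TcpSender:
    max_send_buffer: int
    last_byte_acked: int = 0
    last_byte_sent: int = 0
    last_byte_written: int = 0
    advertised_window: int = 0  # latest value received from the peer

    def effective_window(self) -> int:
        # EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
        in_flight = self.last_byte_sent - self.last_byte_acked
        return max(0, self.advertised_window - in_flight)

    def write(self, y: int) -> bool:
        """Application writes y bytes; False means the writer would block."""
        if (self.last_byte_written - self.last_byte_acked) + y > self.max_send_buffer:
            return False
        self.last_byte_written += y
        return True

    def send(self) -> int:
        """Transmit as many buffered bytes as the effective window allows."""
        n = min(self.effective_window(), self.last_byte_written - self.last_byte_sent)
        self.last_byte_sent += n
        return n

    def on_ack(self, last_byte_acked: int, advertised_window: int) -> None:
        # ACKs are cumulative, so the acknowledged point never moves backwards
        self.last_byte_acked = max(self.last_byte_acked, last_byte_acked)
        self.advertised_window = advertised_window
```

For example, with MaxSendBuffer = 100 and an advertised window of 30, the application can write 80 bytes, only 30 go out, and a further 30-byte write blocks until ACKs free buffer space.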

Page 20: COT 4600 Operating Systems Fall 2009

Protecting against wraparound

A byte with a sequence number x may be sent at one time and then on the same connection a byte with the same sequence number x may be sent again.

Wraparound is governed by the 32-bit SequenceNum. The maximum lifetime of an IP datagram is 120 sec, so the wraparound time must be at least 120 sec. This is fine for slow links but no longer sufficient for optical networks.

Bandwidth and time until wraparound:

| Bandwidth | Time until wraparound |
|---|---|
| T1 (1.5 Mbps) | 6.4 hours |
| Ethernet (10 Mbps) | 57 minutes |
| T3 (45 Mbps) | 13 minutes |
| FDDI (100 Mbps) | 6 minutes |
| STS-3 (155 Mbps) | 4 minutes |
| STS-12 (622 Mbps) | 55 seconds |
| STS-24 (1.2 Gbps) | 28 seconds |
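The table follows from dividing the 2^32-byte sequence space by the link bandwidth. A quick Python check, assuming the nominal rates listed in the table:

```python
SEQ_SPACE_BYTES = 2 ** 32   # the 32-bit SequenceNum numbers bytes
MAX_LIFETIME_S = 120        # maximum lifetime of an IP datagram


def wraparound_seconds(bandwidth_bps: float) -> float:
    """Time to send 2**32 bytes at the given bit rate."""
    return SEQ_SPACE_BYTES * 8 / bandwidth_bps


links = {"T1": 1.5e6, "Ethernet": 10e6, "T3": 45e6, "FDDI": 100e6,
         "STS-3": 155e6, "STS-12": 622e6, "STS-24": 1.2e9}

for name, bps in links.items():
    t = wraparound_seconds(bps)
    print(f"{name:9s} {t:10.1f} s  safe: {t >= MAX_LIFETIME_S}")
```

Only the STS-12 and STS-24 rows fall below the 120-second lifetime.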

Page 21: COT 4600 Operating Systems Fall 2009

Keeping the pipe full

The sequence number space (SequenceNum, 32 bits) must be at least twice as large as the window size (AdvertisedWindow, 16 bits). It is.

The window size (the number of bytes in transit) is given by the AdvertisedWindow field (16 bits).

The higher the bandwidth, the larger the window needed to keep the pipe full.

Essentially, we regard the network as a storage system holding an amount of data equal to bandwidth x delay.

Page 22: COT 4600 Operating Systems Fall 2009

Required window size for a 100 msec RTT:

| Bandwidth | Delay x bandwidth product |
|---|---|
| T1 (1.5 Mbps) | 18 KB |
| Ethernet (10 Mbps) | 122 KB |
| T3 (45 Mbps) | 549 KB |
| FDDI (100 Mbps) | 1.2 MB |
| STS-3 (155 Mbps) | 1.8 MB |
| STS-12 (622 Mbps) | 7.4 MB |
| STS-24 (1.2 Gbps) | 14.8 MB |
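The entries are bandwidth x RTT converted to bytes (the table's KB and MB are powers of 1024). A short Python check, which also compares the result with the largest value a 16-bit AdvertisedWindow can carry (65535 bytes):

```python
RTT_SECONDS = 0.100
MAX_ADVERTISED_WINDOW = 2 ** 16 - 1  # largest value of a 16-bit field


def window_needed_bytes(bandwidth_bps: float, rtt_s: float = RTT_SECONDS) -> float:
    """Delay x bandwidth product in bytes: the data needed to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8


for name, bps in [("T1", 1.5e6), ("Ethernet", 10e6), ("FDDI", 100e6)]:
    need = window_needed_bytes(bps)
    print(f"{name:9s} {need / 1024:8.1f} KB  fits in 16 bits: {need <= MAX_ADVERTISED_WINDOW}")
```

Only the T1 row fits in a 16-bit window; every faster link in the table needs more.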

Page 23: COT 4600 Operating Systems Fall 2009

Adaptive retransmission: original algorithm

Measure SampleRTT for each segment/ACK pair.

Compute a weighted average of the RTT:
EstimatedRTT = a x EstimatedRTT + b x SampleRTT, where a + b = 1, a is between 0.8 and 0.9, and b is between 0.1 and 0.2

Set the timeout based on EstimatedRTT: TimeOut = 2 x EstimatedRTT
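A minimal Python sketch of the original estimator. The choice a = 0.875 (so b = 0.125) is an assumption within the slide's range, and the class name is hypothetical.

```python
class OriginalRtoEstimator:
    def __init__(self, initial_rtt: float, a: float = 0.875):
        if not 0.8 <= a <= 0.9:
            raise ValueError("a should be between 0.8 and 0.9")
        self.a = a
        self.b = 1 - a  # a + b = 1
        self.estimated_rtt = initial_rtt

    def sample(self, sample_rtt: float) -> None:
        # EstimatedRTT = a x EstimatedRTT + b x SampleRTT
        self.estimated_rtt = self.a * self.estimated_rtt + self.b * sample_rtt

    def timeout(self) -> float:
        # TimeOut = 2 x EstimatedRTT
        return 2 * self.estimated_rtt
```

With EstimatedRTT = 1.0 s and a sample of 2.0 s, the estimate moves only to 1.125 s, so a single outlier barely shifts the timeout.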

Page 24: COT 4600 Operating Systems Fall 2009

Karn/Partridge algorithm

Do not sample the RTT when retransmitting. Double the timeout after each retransmission.
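A sketch combining both rules with the original estimator. It assumes a = 0.875 and starts EstimatedRTT at half the initial timeout; the names are hypothetical. Samples for retransmitted segments are skipped because the ACK cannot be matched to a particular transmission.

```python
class KarnPartridgeTimer:
    def __init__(self, initial_timeout: float, a: float = 0.875):
        self.a = a
        self.estimated_rtt = initial_timeout / 2  # assumed starting estimate
        self.timeout = initial_timeout

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return  # ambiguous sample: do not update the estimate
        self.estimated_rtt = self.a * self.estimated_rtt + (1 - self.a) * sample_rtt
        self.timeout = 2 * self.estimated_rtt

    def on_timeout(self) -> None:
        self.timeout *= 2  # exponential backoff after each retransmission
```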

Page 25: COT 4600 Operating Systems Fall 2009

Jacobson/Karels algorithm

New calculation for the average RTT:
Diff = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + (d x Diff)
Deviation = Deviation + d x (|Diff| - Deviation)
where d is a fraction between 0 and 1

Consider the variance when setting the timeout value:
TimeOut = m x EstimatedRTT + f x Deviation, where m = 1 and f = 4

Notes: the algorithm is only as good as the granularity of the clock (500 ms on Unix); an accurate timeout mechanism is important to congestion control (later).
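A minimal Python sketch of the Jacobson/Karels update with m = 1 and f = 4 as on the slide. The value d = 0.125 and the initial Deviation of 0 are assumptions; the class name is hypothetical.

```python
class JacobsonKarels:
    def __init__(self, initial_rtt: float, d: float = 0.125, m: float = 1.0, f: float = 4.0):
        self.estimated_rtt = initial_rtt
        self.deviation = 0.0  # assumed starting value
        self.d, self.m, self.f = d, m, f

    def sample(self, sample_rtt: float) -> None:
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.d * diff                       # EstimatedRTT update
        self.deviation += self.d * (abs(diff) - self.deviation)   # mean deviation update

    def timeout(self) -> float:
        # TimeOut = m x EstimatedRTT + f x Deviation
        return self.m * self.estimated_rtt + self.f * self.deviation
```

Unlike the original algorithm, a burst of variable samples widens the timeout through the Deviation term, while steady samples let it shrink toward EstimatedRTT.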

Page 26: COT 4600 Operating Systems Fall 2009

Congestion control and resource allocation

Network resources: link bandwidth and router buffer space. Allocating them is a global optimization problem, hence very hard. Flow control versus congestion control (the allocation should be fair). Network model:

a) Packet-switched network: congestion is often unavoidable.

Page 27: COT 4600 Operating Systems Fall 2009

Congestion control and resource allocation

b) Connectionless flows: datagram (no state) | virtual circuit (hard state) | flows (soft state).

Flow: a sequence of packets between a source/destination pair following the same route through the network; established explicitly or implicitly.

Soft state: need not be explicitly created and removed by signaling.

Page 28: COT 4600 Operating Systems Fall 2009

Congestion control and resource allocation

c) Underlying service model: best-effort (assume for now); multiple qualities of service (later).

Taxonomy of resource allocation mechanisms:
(1) router-centric / host-centric
(2) reservation-based / feedback-based
(3) window-based / rate-based

Page 29: COT 4600 Operating Systems Fall 2009

Evaluation: fairness; power (ratio of throughput to delay).

Fairness criteria
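The slide's fairness criteria figure is not reproduced here. As an assumption, the sketch uses one widely cited criterion, Jain's fairness index f(x1, ..., xn) = (Σ xi)^2 / (n Σ xi^2), which equals 1 when all flows get equal throughput and 1/n when one flow gets everything, alongside the power metric from the slide. Function names are hypothetical.

```python
def power(throughput: float, delay: float) -> float:
    """Power = throughput / delay (higher is better)."""
    return throughput / delay


def jain_fairness_index(throughputs: list[float]) -> float:
    """(sum x)^2 / (n * sum x^2); 1.0 means perfectly equal shares."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))
```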

Page 30: COT 4600 Operating Systems Fall 2009

Queuing disciplines

First-In-First-Out (FIFO) with tail drop: does not discriminate between traffic sources.

Priority queuing.

Fair Queuing (FQ): explicitly segregates traffic based on flows, one queue per flow; ensures that no flow captures more than its share of capacity.

Weighted Fair Queuing (WFQ): a weight determines how many bits to transmit each time the router serves a queue.

Page 31: COT 4600 Operating Systems Fall 2009

Problem: packets are not all the same length. We really want bit-by-bit round robin, but it is not feasible to interleave bits, so we schedule on a packet basis and simulate bit-by-bit round robin by determining when each packet would finish.

For a single flow: suppose the clock ticks each time a bit is transmitted; let $P_i$ denote the length of packet $i$; let $S_i$ denote the time when the router starts to transmit packet $i$; let $F_i$ denote the time when it finishes transmitting packet $i$. Then $F_i = S_i + P_i$.

Page 32: COT 4600 Operating Systems Fall 2009

When does the router start transmitting packet $i$? If the packet arrived before the router finished packet $i-1$ from this flow, then immediately after the last bit of packet $i-1$, at $F_{i-1}$. If there are no current packets for this flow, then it starts transmitting when the packet arrives (call this time $A_i$). Thus: $F_i = \max(F_{i-1}, A_i) + P_i$.

For multiple flows: calculate $F_i$ for each packet that arrives on each flow; treat all the $F_i$ values as timestamps; the next packet to transmit is the one with the lowest timestamp.
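A simplified Python sketch of this rule. It computes finish times exactly as on the slides (arrival times in bit-time ticks, per flow $F_i = \max(F_{i-1}, A_i) + P_i$) and, whenever the link is free, sends the packet with the smallest timestamp among those that have already arrived. Full fair queuing uses a virtual clock driven by the number of active flows, which this sketch does not model; the names are hypothetical.

```python
import heapq


def finish_times(packets):
    """packets: list of (arrival, length) for one flow, in arrival order."""
    times, prev_finish = [], 0
    for arrival, length in packets:
        prev_finish = max(prev_finish, arrival) + length  # F_i = max(F_{i-1}, A_i) + P_i
        times.append(prev_finish)
    return times


def fq_schedule(flows):
    """flows: dict flow_id -> list of (arrival, length). Returns transmit order."""
    pending = []
    for flow_id, packets in flows.items():
        for idx, ((arrival, length), f) in enumerate(zip(packets, finish_times(packets))):
            pending.append((arrival, f, flow_id, idx, length))
    pending.sort()  # process arrivals in time order

    ready, order, clock, i = [], [], 0, 0
    while i < len(pending) or ready:
        # Admit every packet that has arrived by the current time
        while i < len(pending) and pending[i][0] <= clock:
            _, f, flow_id, idx, length = pending[i]
            heapq.heappush(ready, (f, flow_id, idx, length))
            i += 1
        if not ready:
            clock = pending[i][0]  # link idle: jump to the next arrival
            continue
        _, flow_id, idx, length = heapq.heappop(ready)  # lowest timestamp first
        order.append((flow_id, idx))
        clock += length  # non-preemptive: the whole packet is sent
    return order
```

In the second usage case below, a short packet from flow B arrives while a long packet from A is already on the wire; B has the smaller timestamp but must still wait, which is the "not perfect" case noted on the next slide.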

Page 33: COT 4600 Operating Systems Fall 2009

Not perfect: the router cannot preempt the packet currently being transmitted. Example:

Page 34: COT 4600 Operating Systems Fall 2009

TCP congestion control

In the late 1980s, Van Jacobson proposed augmenting TCP with a congestion control mechanism.

Each source attempts to estimate how much capacity is available in the network. The source uses (a) the implicit feedback provided by the acknowledgments to determine when it is safe to insert a new packet into the network (self-clocking), and (b) timeouts to detect congestion.

Three inter-related mechanisms:

- Additive Increase/Multiplicative Decrease: a new state variable, a window that combines flow control and congestion control.
- Slow Start: a mechanism that allows a connection to reach its window size quickly.
- Fast Retransmit: a heuristic that retransmits a lost segment without waiting for a timeout.

Page 35: COT 4600 Operating Systems Fall 2009

Additive Increase/Multiplicative Decrease

Objective: adjust to changes in the available capacity. A new state variable per connection, CongestionWindow, limits how much data the source has in transit:
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)

Idea: increase CongestionWindow when congestion goes down; decrease CongestionWindow when congestion goes up.

Question: how does the source determine whether or not the network is congested?

Answer: a timeout occurs. A timeout signals that a packet was lost; packets are seldom lost due to transmission errors; hence a lost packet implies congestion.

Page 36: COT 4600 Operating Systems Fall 2009

Additive Increase/Multiplicative Decrease

Algorithm: increment CongestionWindow by one packet per RTT (linear increase); divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease).

In practice: increment a little for each ACK, where MSS is the maximum segment size:
Increment = MSS x (MSS / CongestionWindow)
CongestionWindow += Increment
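A minimal Python sketch of the per-ACK additive increase and the halving on timeout. The MSS of 1460 bytes and the one-MSS floor on CongestionWindow are assumptions; the class name is hypothetical.

```python
MSS = 1460  # assumed maximum segment size in bytes


class Aimd:
    def __init__(self, mss: int = MSS):
        self.mss = mss
        self.cwnd = float(mss)  # CongestionWindow in bytes, starting at one segment

    def on_ack(self) -> None:
        # Increment = MSS x (MSS / CongestionWindow): about one MSS per RTT in total
        self.cwnd += self.mss * (self.mss / self.cwnd)

    def on_timeout(self) -> None:
        # Multiplicative decrease, never below one segment (assumed floor)
        self.cwnd = max(float(self.mss), self.cwnd / 2)

    def max_window(self, advertised_window: int) -> float:
        # MaxWin = MIN(CongestionWindow, AdvertisedWindow)
        return min(self.cwnd, advertised_window)
```

With CongestionWindow = 10 segments, a full window of 10 ACKs adds a little less than one MSS, because each increment is computed against the slightly larger window left by the previous one.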

Page 37: COT 4600 Operating Systems Fall 2009

Packets in transit during additive increase with one packet being added each RTT

Example trace: saw-tooth behavior

Page 38: COT 4600 Operating Systems Fall 2009

Slow start

Objective: determine the available capacity in the first place. It takes too long to ramp up a connection when starting from scratch.

Idea: double the number of packets TCP has in transit every RTT:

- begin with CongestionWindow = 1 packet;
- double CongestionWindow each RTT (increment by 1 packet for each ACK).

Exponential growth, but slower than sending everything in one blast. Used:

- when first starting a connection;
- when a connection goes dead waiting for a timeout.
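A quick Python illustration of why one extra packet per ACK doubles the window each RTT, assuming no losses, one ACK per segment (no delayed ACKs), and CongestionWindow measured in segments. The function name is hypothetical.

```python
def slow_start_rounds(rtts: int) -> list[int]:
    """CongestionWindow (in segments) at the start of each RTT."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd          # every segment sent this RTT is acknowledged
        for _ in range(acks):
            cwnd += 1        # slow start: +1 segment per ACK
        history.append(cwnd)
    return history
```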

Page 39: COT 4600 Operating Systems Fall 2009

Example trace

Rapid growth until about 0.4 sec. Then packets get lost and the window stops growing, since no new ACKs arrive. A timeout occurs at about 2 sec; the window then becomes about half of what it was before (17 KB versus 34 KB) and grows back towards that value.

Why packets get lost: assume that the network can only support 20 packets from this source. When those 20 packets reach the destination and their ACKs return, the window is doubled to 40. The extra 20 packets, half of the window, are lost.

Page 40: COT 4600 Operating Systems Fall 2009

Fast retransmit and fast recovery

Problem: coarse-grained TCP timeouts lead to idle periods.

Fast retransmit: use duplicate ACKs to trigger retransmission.

Page 41: COT 4600 Operating Systems Fall 2009

Results

Fast recovery: skip the slow start phase; go directly to half of the last successful CongestionWindow.
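A simplified Python sketch of the duplicate-ACK trigger. The threshold of three duplicate ACKs is the value commonly used by TCP implementations and is an assumption here, as is measuring CongestionWindow in segments; the fast recovery step simply halves the window instead of restarting slow start. Names are hypothetical.

```python
DUP_ACK_THRESHOLD = 3  # assumed threshold for fast retransmit


class FastRetransmitSender:
    def __init__(self, cwnd_segments: int):
        self.cwnd = cwnd_segments
        self.last_ack = None
        self.dup_count = 0
        self.retransmitted = []  # segments resent without waiting for a timeout

    def on_ack(self, ack: int) -> None:
        """ack = number of the next segment the receiver expects."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                self.retransmitted.append(ack)      # fast retransmit the missing segment
                self.cwnd = max(1, self.cwnd // 2)  # fast recovery: halve, no slow start
        else:
            self.last_ack = ack  # new data acknowledged: reset the duplicate count
            self.dup_count = 0
```

Usage: if segment 3 is lost, the receiver keeps acknowledging 3 for every later segment; the third duplicate triggers one retransmission of segment 3, and further duplicates do not trigger more.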

Page 42: COT 4600 Operating Systems Fall 2009

Congestion avoidance mechanisms

TCP's strategy: control congestion once it happens, by repeatedly increasing the load in an effort to find the point at which congestion occurs, and then backing off.

Alternative strategy: predict when congestion is about to happen, and reduce the rate at which hosts send data just before packets start being discarded.

We call this congestion avoidance, to distinguish it from congestion control.

Two possibilities:

- router-centric: DECbit and RED gateways;
- host-centric: TCP Vegas.