Flow Control and Reliability Control in WebTP Ye Xia 10/24/00.

Flow Control and Reliability Control in WebTP

Ye Xia

10/24/00

Outline

• Motivating problems

• Recall known ideas and go through simple facts about flow control

• Flow control examples: TCP (BSD) and Credit-based flow control for ATM

• WebTP challenges and tentative solutions

Motivating Problem

• Suppose a packet is lost at F1’s receive buffer. Should the pipe’s congestion window be reduced?

Answer

• Typically, no.

– Imagine flow F1 fails.

• OK if loss at the receive buffer is rare.

• Essentially, we need flow/congestion control at the flow level. (Note: the pipe’s congestion control is not end-to-end.)

• To address this problem, we need to design a feedback control scheme and a receiver buffer management scheme. Reliability control also complicates our design.

Possible Partial Solutions

• Solution 1

– Distinguish losses at the receive buffer from network losses.

– Cut the pipe’s congestion window only for network losses.

– Otherwise, slow down only the corresponding flow.

• Solution 2

– Make sure losses at the receive buffer never happen or rarely happen.

Two Styles of Flow Control

• Link provides binary information: congested/not congested, or loss/no loss. Source decreases or increases its traffic intensity incrementally. Call it TCPC.

– E.g., TCP congestion control

• Link provides complete buffer information. Source finds the right operating point. Call it CRDT.

– TCP’s window flow control

– Various credit-based flow control schemes

Comparison: TCPC and CRDT

• TCPC

– Lossy

– Traffic intensity varies slowly and oscillates.

• CRDT

– No loss

– Handles bursty traffic well; handles bursty receiver link well

Simple Scenario

• If C_s(t) <= C_r(t), for all t, flow control is not needed. (B_r = 0)

• Otherwise, buffer may absorb temporary traffic overload. More likely, feedback-based flow control is necessary.

Flow Control Definition and Classification

• Link or links directly or indirectly generate feedback.

• (Virtual) source calculates or adjusts data forwarding speed to avoid overloading links.

• Known examples in ATM: explicit rate, binary rate, credit-based control. What is TCP?

• New inventions: hop-by-hop explicit rate, end-to-end precise window allocation

Classification (two options per dimension):

Source: calculate rate/window | calculate credit/window

Calculation style: try and see (or iterated algorithm), e.g. linear increase/multiplicative decrease | precise calculation, requires allocation phase or out-of-band control

Feedback: exact feedback, e.g. credit or rate | binary feedback, e.g. mark or loss ratio

Control loop: hop-by-hop | multi-hop

Flow Control Goals (Kung and Morris 95)

• Low loss ratio

• High link utilization

• Fair

• Robust, e.g. against loss of control information

• Simple control and parameter tuning

• Reasonable cost

• Perhaps we should stress

– Small buffer sizes

What is TCP?

• Control is divided into two parts.

• (Network) congestion control: (congestion)-window-based algorithm

– Linear/sublinear increase and multiplicative decrease of window.

– Congestion window can be thought of as both rate (Win/RTT) and credit.

– Uses binary loss information.

– Multi-hop

• (End-to-end) flow control resembles a credit scheme, with a credit update protocol.

• Can the end-to-end flow control be treated the same as congestion control? Maybe, but …
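The linear increase and multiplicative decrease rule above can be sketched as follows. This is only an illustration under stated assumptions: one update per round trip, a window counted in packets, and hypothetical names (aimd_step, run_aimd) and constants (increase by 1, halve on loss) that are not taken from TCP or WebTP code.

```python
def aimd_step(cwnd: float, loss: bool, incr: float = 1.0, decr: float = 0.5) -> float:
    """One per-round-trip update of a congestion window (in packets).

    Binary feedback only: the source learns "loss" or "no loss".
    """
    if loss:
        # Multiplicative decrease, never below one packet.
        return max(1.0, cwnd * decr)
    # Linear (additive) increase.
    return cwnd + incr


def run_aimd(losses, cwnd: float = 1.0):
    """Apply aimd_step over a sequence of per-round-trip loss indications."""
    trace = []
    for loss in losses:
        cwnd = aimd_step(cwnd, loss)
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # Three loss-free round trips, one loss, then recovery.
    print(run_aimd([False, False, False, True, False]))
```

The resulting window divided by RTT gives the rate interpretation (Win/RTT) mentioned above.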

TCP Receiver Flow Control

• Multiple TCP connections share the same physical buffer: need buffer management so that

– One connection does not take all buffers, effectively shutting out other connections.

– Deadlock may be prevented.

• Packet re-assembly

TCP Receiver Buffer Management

• Time-varying physical buffer size B_r(t), shared by n TCP connections.

• BSD implementation: receiver of connection i can buffer no more than B_i amount of data at any time. Source i tries not to overflow a buffer of size B_i.

• If TCPC-styled control is used, it is hard to guarantee not exceeding the buffer size B_i.

• Buffers are not reserved. It is possible B_i > B_r(t), for some time t.

Possible Deadlock

• Example: Two connections, each with B_i = 4.

• Suppose B_r = 4. At this point, physical buffer runs out, reassembly cannot continue.

• Deadlock can be avoided if we allow dropping received packets. Implications for reliability control (e.g. connection 1):

– OK with TCP, because packets 4 and 5 have not been acked

– WebTP may have already acked 4 and 5

Connection 1: … 2 3 4 5 6


Deadlock Prevention

• Simple solution: completely partition the buffer, i.e. the sum of the B_i is at most B_r. Inefficient. Also, what if B_r varies?

• When only one packet worth of free buffer is left, drop any incoming packet unless it fills the first gap.

• More buffer will be freed later when the initial segment of data is consumed by the application.

• In the following, the next incoming packet accepted is either 3 for conn. 1, or 5 for conn. 2.

• Performance unclear




Why do we need receiver buffer?

• Part of flow/congestion control when C_s(t) > C_r(t)

– In TCPC, a certain amount of buffer is needed to get reasonable throughput. (For optimality issues, see [Mitra92] and [Fendick92].)

– In CRDT, also for good throughput.

– Question: in CRDT, the complete buffer information is passed to the source. Why don’t we pass the complete rate information so that buffering is not needed? This is the idea of explicit rate control. But in a non-differentiable system, rate is not well-defined. Somehow, the time interval for defining rate needs to adapt to the control objective. CRDT does that and therefore handles bursty traffic well.

• Buffering is beneficial for data re-assembly.

Buffering for Flow Control: Example

• Suppose link capacities are constant. Suppose C_s >= C_r. To reach throughput C_r, B_r should be

– C_r * RTT, in a naïve but robust CRDT scheme

– (C_s - C_r) * C_r * RTT / C_s, if C_r is known to the sender.

– 0, if C_r is known to the sender and the sender never sends bursts at a rate greater than C_r.

– Note: upstream can estimate C_r
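As a worked example, the three buffer sizes above can be evaluated numerically. The sketch below simply codes the formulas from this slide; the function name, the scheme labels and the sample numbers (packets per millisecond, RTT in milliseconds) are assumptions made for illustration.

```python
def receiver_buffer_needed(c_s: float, c_r: float, rtt: float, scheme: str) -> float:
    """Receiver buffer B_r needed to reach throughput C_r, for constant capacities.

    scheme is a hypothetical label for the three cases on the slide:
      "naive_crdt": naive but robust credit scheme
      "known_rate": C_r is known to the sender
      "paced":      C_r is known and the sender never bursts above C_r
    """
    if c_s < c_r:
        raise ValueError("the slide assumes C_s >= C_r")
    if scheme == "naive_crdt":
        return c_r * rtt
    if scheme == "known_rate":
        return (c_s - c_r) * c_r * rtt / c_s
    if scheme == "paced":
        return 0.0
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Assumed numbers: C_s = 10 packets/ms, C_r = 5 packets/ms, RTT = 20 ms.
    for scheme in ("naive_crdt", "known_rate", "paced"):
        print(scheme, receiver_buffer_needed(10, 5, 20, scheme))
```

With these numbers, knowing C_r halves the buffer relative to the naive scheme, and pacing removes the need for it.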

Re-assembly Buffer Sizing

• Without it, throughput can suffer. (By how much?)

• Example: Send n packets in a block, with iid delays. If B = 1, roughly ½ + e packets will be received on average. If B = n, all n packets will be received.

• Buffer size depends on network delay, loss, packet reordering behaviors. Can we quantify this?

Question: How do we put the two together? Re-assembly buffer size can simply be a threshold number, e.g. TCP.

Example: (Actual) buffer size B = 6. But we allow packets 3 and 12 to coexist in the buffer.

Credit-Based Control (Kung and Morris 95)

• Hop-by-hop, exact feedback, precise calculation at the source.

• Overview of steps

– Before forwarding packets, sender needs to receive credits from receiver

– At various times, receiver sends credits to sender, indicating available receive buffer size

– Sender decrements its credit balance after forwarding a packet.

• Typically ensures no buffer overflow

• Works well over a wide range of network conditions, e.g. bursty traffic.

Credit Update Protocol (Kung and Morris 95)

Credit-Based Control: Buffer Size

• Crd_Bal = Buf_Alloc - (Tx_Cnt - Fwd_Cnt)

• Update credit once every N2 packets

• Credit computation is for the worst case but tight. Suppose the receiver does not forward any data on the interval [T_1, T_4]. No data will be lost.

• For N connections, maximal bandwidth: BW = Buf_Alloc / (RTT + N2 * N)

• Total buffer size: N * C_r * (RTT + N2 * N)

[Figure: timeline. T_1: receiver sends Fwd_Cnt. T_2: sender computes Crd_Bal. T_3: sender finished sending Crd_Bal data. T_4: all Crd_Bal data received.]
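The sender side of the credit update can be sketched as below. This is a minimal illustration, not the protocol of [Kung and Morris 95]: the class name and methods are hypothetical, the sender is assumed to start with a full credit balance, and propagation delay and credit loss are ignored.

```python
class CreditSender:
    """Sender-side bookkeeping for one connection under credit-based control."""

    def __init__(self, buf_alloc: int):
        self.buf_alloc = buf_alloc  # Buf_Alloc: receiver buffer reserved for us
        self.tx_cnt = 0             # Tx_Cnt: packets forwarded so far
        self.crd_bal = buf_alloc    # Crd_Bal: assumed to start at Buf_Alloc

    def try_send(self) -> bool:
        """Forward one packet if credit is available."""
        if self.crd_bal <= 0:
            return False
        self.tx_cnt += 1
        self.crd_bal -= 1
        return True

    def on_credit(self, fwd_cnt: int) -> None:
        """Process a credit update carrying the receiver's Fwd_Cnt."""
        # Tx_Cnt - Fwd_Cnt bounds the data that may still occupy the
        # receiver buffer, so the rest of Buf_Alloc is safe to use.
        self.crd_bal = self.buf_alloc - (self.tx_cnt - fwd_cnt)


if __name__ == "__main__":
    s = CreditSender(buf_alloc=4)
    print([s.try_send() for _ in range(5)])  # the fifth attempt is refused
    s.on_credit(fwd_cnt=2)                   # receiver has forwarded 2 packets
    print(s.crd_bal)
```

Because the balance is recomputed from absolute counters, a lost credit update is repaired by the next one.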

Adaptive Credit-Based Control (Kung and Chang 95)

• Idea: make buffer size proportional to actual bandwidth, for each connection.

• For each connection and on each allocation interval, Buf_Alloc = (M/2 - TQ - N) * (VU/TU)

– M: buffer size

– TQ: current buffer occupancy

– VU: amount of data forwarded for the connection

– TU: amount forwarded for all N connections

• M = 4 * RTT + 2 * N

• Easy to show no losses. But can allocation be controlled precisely?

• Once bandwidth is introduced, the scheme can no longer handle bursts well.
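A small numeric sketch of the allocation formula above. The function names and the sample values are assumptions; RTT is assumed to be expressed in packets so that M is a packet count, and an idle interval (TU = 0) is handled by allocating nothing, which the slide does not specify.

```python
def shared_buffer_size(rtt_pkts: int, n: int) -> int:
    """M = 4 * RTT + 2 * N, with RTT assumed to be measured in packets."""
    return 4 * rtt_pkts + 2 * n


def adaptive_buf_alloc(m: float, tq: float, n: int, vu: float, tu: float) -> float:
    """Buf_Alloc = (M/2 - TQ - N) * (VU/TU) for one connection."""
    if tu == 0:
        return 0.0  # assumption: nothing was forwarded in this interval
    return (m / 2 - tq - n) * (vu / tu)


if __name__ == "__main__":
    n = 4
    m = shared_buffer_size(rtt_pkts=23, n=n)  # 100 packets
    tq = 10                                   # current occupancy
    forwarded = [30, 15, 15, 0]               # VU per connection
    tu = sum(forwarded)
    for vu in forwarded:
        print(vu, adaptive_buf_alloc(m, tq, n, vu, tu))
```

Connections that forwarded more data in the last interval get a proportionally larger share of the M/2 - TQ - N packets being divided.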

Comparison with Rate Control

• Credit-based control requires large buffer size for high throughput.

– Needs adaptive buffer allocation scheme.

• Perfect rate control eliminates the buffer requirement.

– Calculate rate allocation before or at the beginning of transmission.

– Use the allocated rate until traffic condition changes.

– Works well when traffic pattern does not change much; otherwise needs at least a delay-bandwidth product amount of buffers to avoid losses.

– Not easy to design rate allocation algorithms.

  • Explicit rate calculation

  • Linear increase and multiplicative decrease: need delay-bandwidth product amount of buffers for good throughput

• Performance difference depends on network traffic pattern

BSD - TCP Flow Control

• Receiver advertises free buffer space: win = Buf_Alloc - Que_siz

• Sender can send [snd_una, snd_una + snd_win - 1], where snd_win = win and snd_una is the oldest unACKed number.

[Figure: sequence numbers 1 2 3 … 11 …, divided into "sent and ACKed", "sent not ACKed", "can send ASAP", and "can’t send until window moves". snd_una marks the oldest unACKed number, snd_nxt marks the next send number, and the window snd_win = 6 is advertised by the receiver.]
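The window rule above can be written out as a short sketch. It counts whole packets rather than bytes, and the function names and the sample values of snd_una and snd_nxt are assumptions chosen for illustration, not BSD code.

```python
def advertised_window(buf_alloc: int, que_siz: int) -> int:
    """Receiver side: win = Buf_Alloc - Que_siz (free buffer space)."""
    return buf_alloc - que_siz


def sendable_now(snd_una: int, snd_nxt: int, snd_win: int):
    """Sender side: sequence numbers that may be sent immediately.

    The usable window is [snd_una, snd_una + snd_win - 1]; numbers below
    snd_nxt inside it are already sent but not yet ACKed.
    """
    last = snd_una + snd_win - 1
    return list(range(snd_nxt, last + 1))


if __name__ == "__main__":
    # Assumed state: oldest unACKed number 4, next send number 7, window 6.
    print(sendable_now(snd_una=4, snd_nxt=7, snd_win=6))
    print(advertised_window(buf_alloc=6, que_siz=3))
```

When snd_nxt is already past the right edge of the window, the list is empty and the sender waits for the window to move.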

Equivalence of TCP and Credit-Based Flow Control

• The two credit/window update protocols are “equivalent”, assuming no packet losses in the network and no packet mis-ordering.

• TCP is inherently more complicated due to the need for reliability control. The receiver needs to tell the sender HOW MANY and WHICH packets have been forwarded.

• Regarding “WHICH”, TCP takes the simplistic approach of ACKing the first un-received data.

TCP Example

[Figure: sender sequence space 1 2 3 … 11 …, with snd_una marked and snd_win = 3.]

• Receiver: ACKs 4, win = 3. (Total buffer size = 6)

• Sender: sends 4 again

[Figure: sender sequence space 3 4 5 … 13 …, after 4 is received at the receiver, with snd_una marked and snd_win = 6.]

WebTP Packet Header Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Packet Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Acknowledged Vector                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           ADU Name                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Segment Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |U|A|R|S|F|R|E|F|P| C | P |                             |
| Offset|R|C|S|Y|I|E|N|A|T| C | C |             RES             |
|       |G|K|T|N|N|L|D|S|Y| A | L |                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Window             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

WebTP: Reliability Control

• A flow can be reliable (in the TCP sense) or unreliable (in the UDP sense).

• Shared feedback for reliability and for congestion control.

• Reliable flow uses TCP-styled flow control and data re-assembly. A loss at the receiver due to flow-control buffer overflow is not distinguished from a loss at the pipe. But this should be rare.

• Unreliable flow: losses at receiver due to overflowing B_i are not reported back to the sender. For simplicity, no window flow control is used. (Is the window information useful?)

WebTP: Buffer Management

• Each flow gets a fixed upper bound on queue size, say B_i. B_i >= B_r is possible.

• Later on, B_i will adapt to the speed of the application.

• Receiver of a flow maintains rcv_nxt and rcv_adv, with B_i = rcv_adv - rcv_nxt + 1.

• Packets outside [rcv_nxt, rcv_adv] are rejected.
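The acceptance rule above can be sketched as a per-flow receiver. This is an illustration only: the class and method names are hypothetical, packets are numbered individually, and rcv_nxt is assumed to advance when in-sequence data is handed to the application.

```python
class FlowReceiver:
    """Per-flow receive window with a fixed bound B_i on buffered packets."""

    def __init__(self, b_i: int, rcv_nxt: int = 1):
        self.b_i = b_i
        self.rcv_nxt = rcv_nxt  # next in-sequence packet expected
        self.held = set()       # packet numbers currently buffered

    @property
    def rcv_adv(self) -> int:
        # From B_i = rcv_adv - rcv_nxt + 1.
        return self.rcv_nxt + self.b_i - 1

    def on_packet(self, num: int) -> bool:
        """Buffer the packet if it lies in [rcv_nxt, rcv_adv], else reject it."""
        if not (self.rcv_nxt <= num <= self.rcv_adv):
            return False
        self.held.add(num)
        return True

    def window(self) -> int:
        """Free buffer space for this flow, as carried in the Window field."""
        return self.b_i - len(self.held)

    def deliver_in_sequence(self):
        """Hand the initial in-sequence run to the application."""
        delivered = []
        while self.rcv_nxt in self.held:
            self.held.remove(self.rcv_nxt)
            delivered.append(self.rcv_nxt)
            self.rcv_nxt += 1
        return delivered


if __name__ == "__main__":
    r = FlowReceiver(b_i=6, rcv_nxt=4)
    for num in (5, 6, 7):
        r.on_packet(num)
    print(r.window(), r.on_packet(10))  # window is 3; packet 10 is outside [4, 9]
```

Once packet 4 arrives, the whole run can be delivered and the window reopens to B_i.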

WebTP Example

[Figure: sequence space 1 2 3 … 11 …, with snd_una and snd_nxt marked and snd_win = 3. The figure also marks rcv_nxt and rcv_adv.]

• Receiver: (positively) ACKs 5, 6, and 7, win = 3. (B_i = 6)

• Sender: can send 4, 8 and 9, subject to congestion control

[Figure: sequence space 5 6 7 … 15 …, after 4, 8 and 9 are received at the receiver, with snd_una and snd_nxt marked and snd_win = 6.]

WebTP: Deadlock Prevention (Reliable Flows)

• Deadlock prevention: pre-allocate bN buffer spaces, b >= 1, where N = max. number of flows allowed.

• When dynamic buffer runs out, enter deadlock prevention mode. In this mode,

– each flow accepts only up to b in-sequence packets.

– when a flow uses up b buffers, it won’t be allowed to use any buffers until b buffers are freed.

• We guard against the case where all but one flow is still responding. In practice, we only need N to be some reasonably large number.

• b = 1 is sufficient, but can be greater than 1 for performance reasons.
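One possible reading of the deadlock prevention mode is sketched below. The class, its methods and the pool labels are hypothetical; the caller is assumed to pass the flow's current rcv_nxt and to advance it after each accepted in-sequence packet, and freeing of the dynamic pool is left out.

```python
class SharedReceiveBuffer:
    """Dynamic pool shared by all flows plus b reserved buffers per flow."""

    def __init__(self, dynamic_size: int, n_flows: int, b: int = 1):
        self.dynamic_free = dynamic_size
        self.b = b
        self.reserved_used = [0] * n_flows
        self.blocked = [False] * n_flows

    def admit(self, flow: int, pkt_num: int, rcv_nxt: int):
        """Return the pool that holds the packet, or None if it is dropped."""
        if self.dynamic_free > 0:
            self.dynamic_free -= 1
            return "dynamic"
        # Deadlock prevention mode: only in-sequence packets, and only
        # while the flow has not exhausted its b reserved buffers.
        if pkt_num != rcv_nxt or self.blocked[flow]:
            return None
        self.reserved_used[flow] += 1
        if self.reserved_used[flow] == self.b:
            self.blocked[flow] = True
        return "reserved"

    def release_reserved(self, flow: int) -> None:
        """The application consumed one packet held in a reserved buffer."""
        if self.reserved_used[flow] > 0:
            self.reserved_used[flow] -= 1
        if self.reserved_used[flow] == 0:
            # All b buffers are free again, so the flow may reuse them.
            self.blocked[flow] = False


if __name__ == "__main__":
    buf = SharedReceiveBuffer(dynamic_size=2, n_flows=2, b=1)
    print(buf.admit(0, 5, rcv_nxt=3), buf.admit(1, 7, rcv_nxt=6))
    print(buf.admit(0, 6, rcv_nxt=3), buf.admit(0, 3, rcv_nxt=3))
```

Since an in-sequence packet can always be accepted while a reserved buffer is free, each responding flow keeps delivering data to its application and frees buffers over time.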

WebTP: Feedback Scheme

• The Window field in packet header is for each flow.

• Like TCP, it is the current free buffer space for the flow.

• When a flow starts, use the FORCE bit (FCE) for immediate ACK from the flow.

• To inform the sender about the window size, flow generates an ACK for every 2 received packets (MTU).

• Pipe generates an ACK for every k packets.

• ACK can be piggybacked in the reverse data packets.

Acknowledgement Example: Four Flows

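[Figure: data packets arriving at the receiver, with flows 1 to 4 interleaved as 1 2 3 4 1 2 3 4 …, and timelines of the ACKs generated by the pipe and by each of Flow 1, Flow 2, Flow 3 and Flow 4.]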

Result:

With some randomness in the traffic, 62 ACKs are generated for every 100 data packets.

Summary of Issues

• Find control scheme suitable for both pipe level and flow level.

– Reconcile network control and last-hop control

– Note that feedback for congestion control and reliability control is entangled.

• Buffer management at receiver

– Buffer sizing

  • for re-assembly

  • for congestion control

– Deadlock prevention

Correctness of Protocol and Algorithm

• Performance typically deals with average cases, and can be studied by model-based analysis or simulation.

• What about correctness?

– Very often in networking, failures are more of a concern than poor performance.

• Correctness of many distributed algorithms in the networking area has not been proven.

• What can be done?

– Need formal description

– Need methods of proof

• Some references for protocol verification: I/O Automata ([Lynch88]), Verification of TCP ([Smith97])

References

[Mitra92] Debasis Mitra, “Asymptotically Optimal Design of Congestion Control for High Speed Data Networks”, IEEE Transactions on Communications, Vol. 10, No. 2, Feb. 1992.

[Fendick92] Kerry W. Fendick, Manoel A. Rodrigues and Alan Weiss, “Analysis of a Rate-Based Feedback Control Strategy for Long Haul Data Transport”, Performance Evaluation 16 (1992), pp. 67-84.

[Kung and Morris 95] H.T. Kung and Robert Morris, “Credit-Based Flow Control for ATM Networks”, IEEE Network Magazine, March 1995.

[Kung and Chang 95] H.T. Kung and Koling Chang, “Receiver-Oriented Adaptive Buffer Allocation in Credit-Based Flow Control for ATM Networks”, Proc. Infocom ’95.

[Smith97] Mark Smith, “Formal Verification of TCP and T/TCP”, PhD thesis, Department of EECS, MIT, 1997.

[Lynch88] Nancy Lynch and Mark Tuttle, “An Introduction to Input/Output Automata”, Technical Memo MIT/LCS/TM-373, Laboratory for Computer Science, MIT, 1988.
