School of Computing Science Simon Fraser University

52
1 School of Computing Science Simon Fraser University CMPT 771/471: Internet Architecture and CMPT 771/471: Internet Architecture and Protocols Protocols Transport Layer Transport Layer Instructor: Dr. Mohamed Hefeeda Instructor: Dr. Mohamed Hefeeda

description

School of Computing Science Simon Fraser University CMPT 771/471: Internet Architecture and Protocols Transport Layer Instructor: Dr. Mohamed Hefeeda. Review of Basic Networking Concepts. Internet structure Protocol layering and encapsulation Internet services and socket programming - PowerPoint PPT Presentation

Transcript of School of Computing Science Simon Fraser University

Page 1: School of Computing Science Simon Fraser University

1

School of Computing Science

Simon Fraser University

CMPT 771/471: Internet Architecture and CMPT 771/471: Internet Architecture and ProtocolsProtocols

Transport LayerTransport Layer

Instructor: Dr. Mohamed HefeedaInstructor: Dr. Mohamed Hefeeda

Page 2: School of Computing Science Simon Fraser University

2

Review of Basic Networking Concepts

Internet structure Protocol layering and encapsulation Internet services and socket programming Network Layer

Network types: Circuit switching, Packet switching Addressing, Forwarding, Routing

Transport layer Reliability, congestion and flow control TCP, UDP

Link Layer Multiple Access Protocols Ethernet

Page 3: School of Computing Science Simon Fraser University

3

Transport services and protocols

provide logical communication between app processes running on different hosts

transport protocols run in end systems

send side: breaks app messages into segments, passes to network layer

rcv side: reassembles segments into messages, passes to app layer

more than one transport protocol available to apps

Internet: TCP and UDP

application

transportnetworkdata linkphysical

application

transportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysicalnetwork

data linkphysical

logical end-end transport

Page 4: School of Computing Science Simon Fraser University

4

Transport vs. network layer

network layer: logical communication between hosts

transport layer: logical communication between processes

relies on, enhances, network layer services

Household analogy:

12 kids sending letters to 12 kids

processes = kids app messages = letters

in envelopes hosts = houses transport protocol = Ann

and Bill network-layer protocol =

postal service

Page 5: School of Computing Science Simon Fraser University

5

Multiplexing/demultiplexing

application

transport

network

link

physical

P1 application

transport

network

link

physical

application

transport

network

link

physical

P2P3 P4P1

host 1 host 2host 3

= process= socket

delivering received segmentsto correct socket

Demultiplexing at rcv host:gathering data from multiplesockets, enveloping data with header (later used for demultiplexing)

Multiplexing at send host:

Page 6: School of Computing Science Simon Fraser University

6

Connectionless demux

ClientIP:B

P2

client IP: A

P1P1P3

serverIP: C

SP: 6428

DP: 9157

SP: 6428

DP: 5775

SP: 5775

DP: 6428SP: 9157

DP: 6428

UDP socket identified by: (dst IP, dst Port) datagrams with different src IPs and/or src ports are directed to same socket

Page 7: School of Computing Science Simon Fraser University

7

Connection-oriented demux

ClientIP:B

P1

client IP: A

P1P2P4

serverIP: C

SP: 9157

DP: 80

SP: 9157

DP: 80

P5 P6 P3

D-IP:CS-IP: A

D-IP:C

S-IP: B

SP: 5775

DP: 80

D-IP:CS-IP: B

TCP socket identified by 4-tuple: (src IP, src Port, dst IP, dst Port)

Page 8: School of Computing Science Simon Fraser University

8

UDP: User Datagram Protocol [RFC 768]

“no frills,” “bare bones” Internet transport protocol

“best effort” service, UDP segments may be:

lost delivered out of order

to app Connectionless:

no handshaking between UDP sender, receiver

each UDP segment handled independently of others

Why is there a UDP? no connection

establishment (which can add delay)

simple: no connection state at sender, receiver

small segment header no congestion control: UDP

can blast away as fast as desired

Page 9: School of Computing Science Simon Fraser University

9

UDP

often used for streaming multimedia apps

loss tolerant rate sensitive

other UDP uses DNS SNMP

reliable transfer over UDP: add reliability at application layer

application-specific error recovery!

source port # dest port #

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength, in

bytes of UDPsegment,including

header

Page 10: School of Computing Science Simon Fraser University

10

Reliable data transfer

important in application, transport, and link layers top-10 list of important networking topics!

characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Page 11: School of Computing Science Simon Fraser University

11

Pipelined (Sliding Window) Protocols

Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts

range of sequence numbers must be increased buffering at sender and/or receiver

Two generic forms of pipelined protocols: go-Back-N, selective repeat

Page 12: School of Computing Science Simon Fraser University

12

Go-Back-N

Sender: k-bit seq # in pkt header “window” of up to N, consecutive unack’ed pkts allowed

ACK(n): ACKs all pkts up to and including seq # n -- cumulative ACK may receive duplicate ACKs (see receiver)

timer for each in-flight pkt timeout(n): retransmit pkt n and all higher seq # pkts in window

i.e., go back to n

Page 13: School of Computing Science Simon Fraser University

13

GBN inaction

Window size, N = 4

Go back to 2

Page 14: School of Computing Science Simon Fraser University

14

Go-Back-N

Do you see potential problems with GBN?

Consider high-speed links with long delays (called large bandwidth-delay product pipes)

GBN can fill that pipe by having large N many unACKed pkts could be in the pipe A single lost pkt could cause re-transmission of huge

number (up to N) of pkts waste of bandwidth

Solutions??

Page 15: School of Computing Science Simon Fraser University

15

Selective Repeat

receiver individually acknowledges correctly received pkts

buffers pkts, as needed, for eventual in-order delivery to upper layer

sender only resends pkts for which ACK not received sender timer for each unACKed pkt

sender window N consecutive seq #’s again limits seq #s of sent, unACKed pkts

Page 16: School of Computing Science Simon Fraser University

16

Selective repeat: sender, receiver windows

Page 17: School of Computing Science Simon Fraser University

17

TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581

full duplex data: bi-directional data flow

in same connection MSS: maximum

segment size

connection-oriented: handshaking (exchange

of control msgs) init’s sender, receiver state before data exchange

flow controlled: sender will not

overwhelm receiver

point-to-point: one sender, one receiver

reliable, in-order byte stream:

no “message boundaries”

pipelined: TCP congestion and flow

control set window size

send & receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

Page 18: School of Computing Science Simon Fraser University

18

TCP segment structure

source port # dest port #

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberReceive window

Urg data pnterchecksum

FSRPAUheadlen

notused

Options (variable length)

URG: urgent data (generally not used)

ACK: ACK #valid

PSH: push data now(generally not used)

RST, SYN, FIN:connection estab(setup, teardown

commands)

# bytes rcvr willingto accept

countingby bytes of data(not segments!)

Internetchecksum

(as in UDP)

Page 19: School of Computing Science Simon Fraser University

19

TCP reliable data transfer

TCP creates rdt service on top of IP’s unreliable service

Pipelined segments Cumulative acks TCP uses single

retransmission timer

Retransmissions are triggered by:

timeout events duplicate acks

Initially consider simplified TCP sender:

ignore duplicate acks ignore flow control,

congestion control

Page 20: School of Computing Science Simon Fraser University

20

TCP sender events:

data rcvd from app: Create segment with seq

# seq # is byte-stream

number of first data byte in segment

start timer if not already running (think of timer as for oldest unacked segment)

expiration interval: TimeOutInterval

timeout: retransmit segment that

caused timeout restart timer

Ack rcvd: If acknowledges

previously unacked segments

update what is known to be acked

start timer if there are outstanding segments

Page 21: School of Computing Science Simon Fraser University

21

TCP sender(simplified)

NextSeqNum = InitialSeqNum SendBase = InitialSeqNum

loop (forever) { switch(event)

event: data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event: timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer

event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) start timer }

} /* end of loop forever */

Page 22: School of Computing Science Simon Fraser University

22

SendBase= 120

TCP: retransmission scenarios

Host A

Seq=100, 20 bytes data

ACK=100

time

premature timeout

Host B

Seq=92, 8 bytes data

ACK=120

Seq=92, 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92, 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92, 8 bytes data

ACK=100

time

Seq=

92

tim

eout

SendBase= 100

SendBase= 120

Sendbase= 100

Page 23: School of Computing Science Simon Fraser University

23

TCP retransmission scenarios (more)

Host A

Seq=92, 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100, 20 bytes data

ACK=120

time

SendBase= 120

Page 24: School of Computing Science Simon Fraser University

24

TCP Round Trip Time and Timeout

If TCP timeout is too short: premature timeout unnecessary

retransmissions too long: slow reaction to segment loss

Q: how to set TCP timeout value? Based on Round Trip Time (RTT), but RTT itself varies with

time! We need to estimate current RTT

RTT Estimation SampleRTT: measured time from segment transmission

until ACK receipt ignore retransmissions SampleRTT will vary, want estimated RTT “smoother” average several recent measurements, not just current SampleRTT

Page 25: School of Computing Science Simon Fraser University

25

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially fast typical value: = 0.125

Page 26: School of Computing Science Simon Fraser University

26

Example RTT estimation:

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Page 27: School of Computing Science Simon Fraser University

27

TCP Round Trip Time and Timeout

Setting the timeout

EstimtedRTT plus safety margin large variation in EstimatedRTT -> larger safety margin

first estimate how much SampleRTT deviates from EstimatedRTT:

TimeoutInterval = EstimatedRTT + 4*DevRTT

DevRTT = (1-)*DevRTT + *|SampleRTT - EstimatedRTT|

(typically, = 0.25)

Then set timeout interval:

Page 28: School of Computing Science Simon Fraser University

28

Fast Retransmit

Time-out period often relatively long:

long delay before resending lost packet

Detect lost segments via duplicate ACKs.

Sender often sends many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:

fast retransmit: resend segment before timer expires

Page 29: School of Computing Science Simon Fraser University

29

TCP Connection Management: opening

TCP: 3-way handshakeStep 1: client host sends TCP SYN

segment to server specifies initial seq # no data

Step 2: server host receives SYN, replies with SYNACK segment

server allocates buffers specifies server initial seq. #

Step 3: client receives SYNACK, replies with ACK segment, which may contain data

client

SYN=1, seq= client_isn

server

SYN=1, seq=server_isn,

ack=client_isn+1

SYN=0, seq=client_isn+1,

ack=server_isn+1

conn. request

conn. granted

Q. How would a hacker exploit TCP 3-way handshake to bring a server down?

A. SYN Flood DoS attack

Page 30: School of Computing Science Simon Fraser University

30

TCP Connection Management: closing

Step 1: client end system sends TCP FIN segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN

Step 3: client receives FIN, replies with ACK

Enters “timed wait” – may need to re-send ACK to received FINs

Step 4: server, receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closedti

med w

ait

closed

Page 31: School of Computing Science Simon Fraser University

31

TCP Connection Management

TCP clientlifecycle

TCP serverlifecycle

Page 32: School of Computing Science Simon Fraser University

32

TCP Flow Control

receive side of TCP connection has a receive buffer:

speed-matching service: matching the send rate to the receiving app’s drain rate

app process may be slow at reading from buffer

sender won’t overflow

receiver’s buffer bytransmitting too

much, too fast

flow control

Page 33: School of Computing Science Simon Fraser University

33

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

spare room in buffer= RcvWindow

= RcvBuffer-[LastByteRcvd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow

guarantees receive buffer doesn’t overflow

Page 34: School of Computing Science Simon Fraser University

34

Congestion Control

Congestion: sources send too much data for network to handle

different from flow control, which is e2e Congestion results in …

lost packets (buffer overflow at routers)• more work (retransmissions) for given “goodput”

long delays (queueing in router buffers)• Premature (unneeded) retransmissions

Waste of upstream links’ capacity • Pkt traversed several links, then dropped at

congested router

Page 35: School of Computing Science Simon Fraser University

35

Approaches towards congestion control

End-end congestion control: no explicit feedback from

network congestion inferred from

end-system observed loss, delay

approach taken by TCP

Network-assisted congestion control:

routers provide feedback to end systems

single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)

explicit rate sender should send at

Two broad approaches towards congestion control:

Page 36: School of Computing Science Simon Fraser University

36

TCP congestion control: Approach

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Approach: probe for usable bandwidth in network increase transmission rate until loss occurs then

decrease Additive increase, multiplicative decrease (AIMD)

time

Rat

e (C

ongW

in)

Saw toothbehavior: probing

for bandwidth

Page 37: School of Computing Science Simon Fraser University

37

TCP Congestion Control

Sender keeps a new variable, Congestion Window (CongWin), and limits unacked bytes to:

LastByteSent - LastByteAcked min {CongWin, RcvWin}

For our discussion: assume RcvWin is large enough

Roughly, what is the sending rate as a function of CongWin? Ignore loss and transmission delay

Rate = CongWin/RTT (bytes/sec)

So, rate and CongWin are somewhat synonymous

Page 38: School of Computing Science Simon Fraser University

38

TCP Congestion Control

Congestion occurs at routers (inside the network) Routers do not provide any feedback to TCP

How can TCP infer congestion? From its symptoms: timeout or duplicate acks Define loss event ≡ timeout or 3 duplicate acks TCP decreases its CongWin (rate) after a loss event

TCP Congestion Control Algorithm: three components AIMD: additive increase, multiplicative decrease slow start Reaction to timeout events

Page 39: School of Computing Science Simon Fraser University

39

AIMD

additive increase: (congestion avoidance phase) increase CongWin by 1 MSS every RTT until loss detected TCP increases CongWin by: MSS x (MSS/CongWin) for every

ACK received Ex. MSS = 1,460 bytes and CongWin = 14,600 bytes With every ACK, CongWin is increased by 146 bytes

multiplicative decrease: cut CongWin in half after loss

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Con

gWin

Page 40: School of Computing Science Simon Fraser University

40

TCP Slow Start

When connection begins, CongWin = 1 MSS Example: MSS = 500 bytes & RTT = 200 msec initial rate = CongWin/RTT = 20 kbps

available bandwidth may be >> MSS/RTT desirable to quickly ramp up to respectable rate

Slow start: When connection begins, increase rate exponentially fast

until first loss event. How can we do that? double CongWin every RTT. How? Increment CongWin by 1 MSS for every ACK received

Page 41: School of Computing Science Simon Fraser University

41

TCP Slow Start (cont’d)

Increment CongWin by 1 MSS for every ACK

Summary: initial rate is slow but ramps up exponentially fast

Host A

one segment

RTT

Host B

time

two segments

four segments

Page 42: School of Computing Science Simon Fraser University

42

Reaction to a Loss event

TCP Tahoe (Old) Threshold = CongWin / 2 Set CongWin = 1 Slow start till threshold Then Additive Increase // congestion avoidance

TCP Reno (most current TCP implementations) If 3 dup acks // fast retransmit

• Threshold = CongWin / 2• Set CongWin = Threshold // fast recovery • Additive Increase

Else // timeout• Same as TCP Tahoe

Page 43: School of Computing Science Simon Fraser University

43

Reaction to a Loss event (cont’d)

Why differentiate between 3 dup acks and timeout? 3 dup ACKs indicate network capable of

delivering some segments timeout indicates a “more alarming” congestion scenario

3 dup acks

CongWin

Page 44: School of Computing Science Simon Fraser University

44

TCP Congestion Control: Summary

Initially

Threshold is set to large value (65 Kbytes), has no effect

CongWin = 1 MSS

Slow Start (SS): CongWin grows exponentially

till a loss event occurs (timeout or 3 dup ack) or reaches Threshold

Congestion Avoidance (CA): CongWin grows linearly

3 duplicate ACK occurs:

Threshold = CongWin/2; CongWin = Threshold; CA

Timeout occurs:

Threshold = CongWin/2; CongWin = 1 MSS; SS till Threshold

Page 45: School of Computing Science Simon Fraser University

TCP Throughput Analysis

Understand the fundamental relationship between

Packet loss probability, RTT, and TCP performance (throughput)

We present simple model, with several assumptions

Yet it still provides useful insights See Ch 5 of [HJ04] for a summary of more detailed

models with references to the original papers

45

Page 46: School of Computing Science Simon Fraser University

TCP Throughput Analysis

Any TCP model must capture Window Dynamics (internal and deterministic)

• Controlled internally by the TCP algorithms. • Depends on the particular flavor of TCP • We assume TCP Reno (the most common)

Packet Loss Process (external and uncertain)• Models the aggregate of network conditions at all

nodes in the TCP connection path• Typically modeled as a Stochastic Process with

probability p that a packet loss occurs• TCP responds by reducing the window size

We usually analyze the steady state Ignore the slow start phase (transient) Although many connections finish within slow start,

because they send only a few kilobytes

46

Page 47: School of Computing Science Simon Fraser University

Notations

X(t): Throughput at time t (transmission rate) W(t): window size at time t RTT: Round Trip Time

X(t) = W(t)/RTT

What does the above equation implicitly assume? Increasing X(t) has negligible effects on the

queuing delay in the network RTT remains constant

47

Page 48: School of Computing Science Simon Fraser University

Simple (Periodic) Model

time (RTT)

W/2

W

loss occurs

period

Packet losses occur with constant probability p

TCP window starts at W/2 grows to W, then halves, repeat forever …

W(t) packets transmitted each RTT

W(t+1) = W(t) + 1 each round until a loss occurs

48

Page 49: School of Computing Science Simon Fraser University

T

Simple (Periodic) Model

Compute the steady state throughput as a function of average loss probability p.

T

p

LengthPeriod

PeriodaDuringSentPacketsofAveragepX

/1#)(

49

Page 50: School of Computing Science Simon Fraser University

T

Simple (Periodic) Model

T: period between detecting packet losses T = RTT * W /2Now, we find W as a function of p. How? Compute the number of packets sent during a period and equate it to 1/p. (Size of the green area):

W/2 * (W/2 + W) / 2 = 1/p W = sqrt(8/3p)

50

Page 51: School of Computing Science Simon Fraser University

Simple (Periodic) Model

Inverse Square-Root-p Law

TCP throughput is inversely proportional to RTT and square root of packet loss probability p

pRTTpX

2

31)(

51

Page 52: School of Computing Science Simon Fraser University

In More Realistic Models …

Packet loss probability is not constant and is bursty

Consider effect of duplicate ACKs and Timeouts Consider receiver window limit

52