Information Network 1
Transport layer: TCP

Youki Kadobayashi

NAIST

Copyright(C)2016 Youki Kadobayashi. All rights reserved.

Transport layer: a bird's-eye view

[Figure: hosts (H) and routers (R) along a path, labelled IP and Transport; two protocol stacks, each listing Application, Presentation, Session, Transport, Network, Data Link, Physical]

Routers don’t maintain per-host state

Hosts maintain state for each transport-layer endpoint

Functions provided by the transport layer

Communication between processes
  Identify process
  Identify inter-process channel

Interface for upper layer
  Connection-oriented (virtual circuit)
  Connectionless (datagram)

Competition and arbitration of network resource
  Flow control
  Congestion control

Next lecture

[Figure: processes P and Q, joined by the inter-process channel (P, Q)]

Transport protocols in the Internet protocol suite

TCP (RFC793): Transmission Control Protocol
  Connection-oriented
  Multiple functions for reliability

SCTP (RFC4960)

UDP (RFC768): User Datagram Protocol
  Connectionless
  IP & process identification
  Packets can be lost

DCCP (RFC4340)

For self study

Advanced topic


Identifying process and connection in TCP

Unique identification of a process: (IP, port)

Unique identification of a TCP connection: (source IP, source port, destination IP, destination port)

[Figure: hosts 163.221.52.100 and 203.178.136.36 with port labels 1040, 80, 22, 2137; a process is identified by an endpoint such as (163.221.52.100, 1040) or (203.178.136.36, 22); two connections are drawn between the hosts]
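A minimal sketch of the identification rule above, assuming a hypothetical in-memory connection table. The type names are illustrative only, and pairing port 2137 with the same client host is an assumption; the addresses and the other ports come from the slide.

```python
from typing import Dict, NamedTuple


class Endpoint(NamedTuple):
    """(IP, port): identifies one process on one host."""
    ip: str
    port: int


class ConnectionId(NamedTuple):
    """The 4-tuple that identifies one TCP connection."""
    src: Endpoint
    dst: Endpoint


# Hypothetical connection table: 4-tuple -> connection state.
connections: Dict[ConnectionId, str] = {}

server = Endpoint("203.178.136.36", 22)
conn_a = ConnectionId(Endpoint("163.221.52.100", 1040), server)
conn_b = ConnectionId(Endpoint("163.221.52.100", 2137), server)  # assumed pairing

connections[conn_a] = "ESTABLISHED"
connections[conn_b] = "ESTABLISHED"

# Same server endpoint and same client IP, yet two distinct connections,
# because the source ports differ.
print(len(connections))
```

The server process is one endpoint, but every client port gives a separate key, which is why a single listening port can serve many connections at once.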

TCP service model (1)

Connection-oriented
  Virtual Circuit
    Looks like a circuit, but virtual: no physical wires between specific endpoints
  Adapts to speed
    Adapts to the speed of intermediate networks as well as the speed of endpoints

Establishing a TCP connection

3-way handshake
  SYN, SYN-ACK, ACK
  Ensure full-duplex communication

TCP header fields:
  16-bit source port | 16-bit destination port
  32-bit sequence number
  32-bit acknowledgment number
  4-bit hlen | reserved | flags | 16-bit window size
  16-bit TCP checksum | 16-bit urgent pointer
  flags: URG ACK PSH RST SYN FIN

[Figure: message sequence SYN, SYN-ACK, ACK between the two endpoints]

Establishing TCP connection: a concrete example with Wireshark

    # tshark -i en0 -n -f 'port 80'
    tcpdump: listening on de0
    Capturing on en0
    0.164720 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=16 TSval=1135377929 TSecr=0 SACK_PERM=1
    0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
    0.195322 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=1 Ack=1 Win=131760 Len=0 TSval=1135377958 TSecr=190559467

Replying with Ack = (received sequence number + 1) acknowledges the SYN
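The sequence and acknowledgment numbers in a 3-way handshake can be sketched as follows. This is an illustration under simplifying assumptions (no options, no loss); the `Segment` type and function name are hypothetical. With both initial sequence numbers set to 0 it reproduces the relative Seq/Ack values of the capture.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = Segment("SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the server acknowledges client_isn + 1.
    syn_ack = Segment("SYN,ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # The client acknowledges the server's SYN in the same way.
    ack = Segment("ACK", seq=(syn.seq + 1) % SEQ_SPACE, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=0, server_isn=0):
    print(segment)
```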

Meanings of Wireshark output

time source IP -> dest IP TCP source port > dest port [ flags ] Seq=n Ack=n …

0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
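As a rough sketch, a summary line in the format above can be split into its fields with a regular expression. The function name and the dictionary keys are assumptions for illustration, and the pattern only targets the one-line TCP format shown on this slide; other tshark versions may print lines differently.

```python
import re
from typing import Any, Dict

SUMMARY_RE = re.compile(
    r"^(?P<time>\d+\.\d+) (?P<src>\S+) -> (?P<dst>\S+) TCP "
    r"(?P<sport>\d+) > (?P<dport>\d+) \[(?P<flags>[^\]]+)\] (?P<fields>.*)$"
)


def parse_summary(line: str) -> Dict[str, Any]:
    """Split one TCP summary line (format as on the slide) into named fields."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized summary line")
    record: Dict[str, Any] = {
        "time": float(match["time"]),
        "src": (match["src"], int(match["sport"])),
        "dst": (match["dst"], int(match["dport"])),
        "flags": match["flags"].split(", "),
    }
    # The remaining tokens are KEY=VALUE pairs such as Seq=0 or Win=50137.
    for token in match["fields"].split():
        key, _, value = token.partition("=")
        record[key] = int(value)
    return record


line = ("0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 "
        "Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1")
record = parse_summary(line)
print(record["src"], record["dst"], record["flags"], record["Ack"])
```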


Virtual Circuit (2): TCP connection release

[Figure: each side calls close; message sequence FIN, ACK of FIN, FIN, ACK of FIN]

    5.749366 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [FIN, ACK] Seq=659 Ack=2449 Win=401096 Len=0 TSval=190560021 TSecr=1135378482
    5.749464 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=2449 Ack=660 Win=131104 Len=0 TSval=1135383482 TSecr=190560021
    5.749650 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [FIN, ACK] Seq=2449 Ack=660 Win=131104 Len=0 TSval=1135383482 TSecr=190560021
    5.765279 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [ACK] Seq=660 Ack=2450 Win=401096 Len=0 TSval=190560024 TSecr=1135383482

TCP connection reset

RST
  Abortive release
  Nonexistent port (e.g., dead process)

telnet to sh.naist.jp, port 80 will result in connection reset:

    0.000000 10.0.1.148 -> 163.221.10.10 TCP 63436 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=16 TSval=1137289324 TSecr=0 SACK_PERM=1
    0.018841 163.221.10.10 -> 10.0.1.148 TCP 80 > 63436 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

If you kill “Dropbox”:

    55.992214 10.0.1.148 -> 108.160.163.51 TCP 62991 > 80 [RST] Seq=310 Win=0 Len=0

Virtual Circuit Summary: TCP state transition diagram

State transition: trigger / response
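The "trigger / response" notation can be sketched as a lookup table. This is a partial sketch covering only the common open and close paths of the RFC 793 state machine; simultaneous open/close and RST handling are left out, and the event names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (current state, trigger) -> (next state, segment sent in response or None)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("LISTEN", "recv_syn"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "recv_syn_ack"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "recv_ack"): ("ESTABLISHED", None),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "recv_fin"): ("CLOSE_WAIT", "ACK"),
    ("FIN_WAIT_1", "recv_ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_fin"): ("TIME_WAIT", "ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ack"): ("CLOSED", None),
    ("TIME_WAIT", "timeout_2msl"): ("CLOSED", None),
}


def run(events: List[str], state: str = "CLOSED") -> List[str]:
    """Apply the triggers in order and return the states visited."""
    visited = []
    for event in events:
        state, response = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


# A client that opens a connection and is then closed by the server first.
print(run(["active_open", "recv_syn_ack", "recv_fin", "close", "recv_ack"]))
```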

Troubleshooting TCP with state machine

netstat

    $ netstat
    Active Internet connections
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp4       0      0  45.1.20.101.57456      74.125.235.138.http    SYN_SENT
    tcp4       0      0  45.1.20.101.57455      ey-in-f101.1e100.http  ESTABLISHED
    tcp4       0      0  45.1.20.101.57454      74.125.235.148.http    ESTABLISHED

SYN_SENT: SYN sent, awaiting SYN+ACK response

Many other open-source tools are available: lsof, trpt, tcptraceroute, tcptrace, tcpflow, etc.
Try some of these tools in the provided VM image.

Self study: observe connection setup/release

In the provided virtual machine,

1. Observe connection setup/release; e.g., by using a browser within the VM

2. Observe connection reset by terminating a program; e.g., by skill firefox

3. Observe connection reset by connecting to a nonexistent port on a working machine; e.g., by telnet sh.naist.jp 80

Try many tools: wireshark, tshark, tshark -V

Buffered transfer: adapts to varying speed of endpoints, as well as intermediate networks

[Figure: two hosts joined by a TCP connection; on each host a process calls Read()/Write() (block/unblock) on a send buffer and a receive buffer held in the OS kernel]

TCP service model (2)

Byte-stream service
  Upper layer can’t see boundaries between packets
  No boundary: structuring (framing) is needed at upper layer

Full duplex
  Two independent streams in a single connection

Reliable
  Masks packet reordering, duplication, discard and bit error

[Figure: TCP being viewed as a byte-stream service carrying the bytes H, E, L, L, O; labels: OK, OK]
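Since the byte stream carries no message boundaries, the upper layer has to add its own framing. One common approach is a length prefix; the sketch below assumes a 4-byte big-endian length, which is an illustrative choice and not something the slides prescribe.

```python
import struct
from typing import List

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassemble messages from arbitrarily sized chunks of a byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every message that is now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages


# The stream is delivered in 3-byte pieces that ignore message boundaries.
stream = encode_frame(b"HELLO") + encode_frame(b"OK")
decoder = FrameDecoder()
received: List[bytes] = []
for i in range(0, len(stream), 3):
    received.extend(decoder.feed(stream[i:i + 3]))
print(received)
```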

How does TCP implement reliable stream service?

ACK: Acknowledgment
  Active acknowledgment
    Explicitly acknowledge the receipt of a packet
  Duplicate ACK
    Implicitly communicate packet loss information

Timeout and Retransmission
  Whenever the sender doesn’t receive an ACK within the retransmission timeout, a “Timeout” event is triggered
  Retransmission: assuming that transmission has failed
  Exponential back-off: 3, 6, 12, …
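The back-off sequence 3, 6, 12, … can be generated as below. This is a small sketch; the function name is hypothetical and the 64-second upper bound is an assumption (a cap of this kind is common in implementations, but the slide does not state one).

```python
from typing import List


def backoff_schedule(initial: float = 3.0, attempts: int = 5, cap: float = 64.0) -> List[float]:
    """Timeout used for each successive retransmission attempt, in seconds."""
    timeouts = []
    timeout = initial
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(timeout * 2, cap)  # double after every timeout, up to the cap
    return timeouts


print(backoff_schedule())
```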

ACK: Acknowledgment

[Figure: the sender holds "Nara Institute of Science and Technology" and the receiver has "Nara Insti" so far; regions labelled user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged; sequence markers 10 and 16]

Piggybacking: Exploiting full-duplex channel

[Figure (note: rough sketch): two hosts, each acting as both sender and receiver. One direction carries "Nara Institute of Science and Technology" (received so far: "Nara Insti"), the other carries "Graduate School of Information Science" (received so far: "Graduate S"). Each direction has its own regions labelled user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged]

Duplicate ACK: implicit communication of packet loss

[Figure: the sender holds "Nara Institute of Science and Technology"; the receiver side shows "Nara Institute o"; one packet is marked as lost; sequence markers 10, 16, 16; regions labelled user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged]
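How a lost segment leads to duplicate ACKs can be sketched with a receiver that sends cumulative ACKs. This is a simplified model with a hypothetical function name: segments are assumed not to overlap partially, and the 6-byte segmentation of the example string is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def cumulative_acks(arrivals: Iterable[Tuple[int, bytes]]) -> List[int]:
    """Return the ACK number the receiver sends after each arriving segment.

    Each arrival is (sequence number of its first byte, payload).
    """
    expected = 0                          # next in-order byte the receiver waits for
    out_of_order: Dict[int, bytes] = {}   # segments that arrived ahead of a gap
    acks = []
    for seq, payload in arrivals:
        if seq >= expected:
            out_of_order[seq] = payload
        # Deliver every buffered segment that has become contiguous.
        while expected in out_of_order:
            expected += len(out_of_order.pop(expected))
        acks.append(expected)             # always ACK the next byte still missing
    return acks


data = b"Nara Institute of Science and Technology"
# The segment covering bytes 10..15 is lost at first and arrives last (retransmission).
arrivals = [(0, data[0:10]), (16, data[16:22]), (22, data[22:28]), (10, data[10:16])]
print(cumulative_acks(arrivals))
```

The repeated ACK number tells the sender which byte is still missing, without any explicit loss report.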

From byte-stream to packets: TCP header

[Figure: IP header, followed by a TCP segment = TCP header + TCP data]

TCP header (20 octets without options):
  16-bit source port | 16-bit destination port
  32-bit sequence number
  32-bit acknowledgment number
  4-bit hlen | reserved | flags | 16-bit window size
  16-bit TCP checksum | 16-bit urgent pointer
  (options)
  (TCP data)

Chop appropriate length from byte-stream buffer & add TCP header
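The fixed 20-octet header layout can be decoded with Python's `struct` module. This is a sketch with hypothetical function and key names; it handles only the six flags listed on the slides and ignores options.

```python
import struct
from typing import Any, Dict

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> Dict[str, Any]:
    """Decode the fixed 20-octet part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("need at least 20 octets")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,  # hlen counts 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# Build a SYN-ACK header by hand (hlen = 5 words, flags = SYN | ACK) and decode it.
raw = struct.pack("!HHIIHHHH", 80, 63428, 0, 1, (5 << 12) | 0x12, 50137, 0, 0)
header = parse_tcp_header(raw)
print(header["src_port"], header["dst_port"], sorted(header["flags"]), header["header_len"])
```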

Nagle algorithm

Q. If you add 20-byte + 20-byte headers (IP + TCP) to 1 byte of data, isn’t that overhead big?

Nagle algorithm (RFC896)
  At most one small segment may be unacknowledged in the network at any time.
  In case of short RTT:
    send packets with little buffering
    acceptable overhead, as LAN bandwidth is abundant
  In case of long RTT:
    reduce overhead, due to WAN bandwidth constraints
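The overhead question and the Nagle rule can be put into numbers with a short sketch. The decision function is a simplified reading of the rule (it ignores the receiver's window), both function names are hypothetical, and the MSS of 1460 is taken from the capture earlier in the slides.

```python
def header_overhead(payload_bytes: int, header_bytes: int = 40) -> float:
    """Fraction of a packet occupied by IP + TCP headers (20 + 20 octets, no options)."""
    return header_bytes / (header_bytes + payload_bytes)


def nagle_can_send(queued_bytes: int, unacked_bytes: int, mss: int = 1460) -> bool:
    """Decide whether queued data may be sent now under a simplified Nagle rule."""
    if queued_bytes == 0:
        return False
    if queued_bytes >= mss:
        return True             # a full-sized segment is never held back
    return unacked_bytes == 0   # a small segment goes out only if nothing is in flight


print(f"{header_overhead(1):.1%} of a 1-byte packet is header")
print(nagle_can_send(queued_bytes=1, unacked_bytes=0))
print(nagle_can_send(queued_bytes=1, unacked_bytes=1))
```

With a short RTT the ACK returns quickly, so little data is held back; with a long RTT more bytes accumulate per segment, which is where the overhead reduction comes from.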

Self study: observe packet loss & retransmission

Within the provided virtual machine,

1. Observe duplicate ACK; emulate a lossy network with:
   # tc qdisc add dev eth1 root netem loss 10%

2. Observe the effect of the Nagle algorithm; emulate a satellite network with:
   # tc qdisc add dev eth1 root netem delay 500ms

Questions?

Summary thus far

Transport layer

The transport protocol on the Internet: TCP
  TCP: service model and features
  Efficiency: ACK, piggybacking, Nagle algorithm
  TCP connection establishment and release
  Diagnosis: tools + state machine knowledge

Transport protocol challenges

With a number of unknown parameters:
  Number of active communications
  Bottleneck bandwidth
  Error rate
how can we accommodate as many communications as possible, without collapsing the network itself?

[Figure: nodes 1 … n on one side and nodes 1 … k on the other]

Key ideas: probing, estimation, self-policing, macro-level stability

Flow control and congestion control: Competition and Arbitration of Network Resource

Flow control
  Absorb difference of transmission rate
  Recovery from sequence error
  Recovery from duplication, packet drops and bit errors

Congestion control
  Sharing the bandwidth while suppressing congestion
  Fair sharing of bandwidth

Characteristics of TCP Flow Control

End to end
  No global assignment of resources
  Estimate available bandwidth at individual hosts
  Routers do not explicitly allocate resources
    Implicit signaling through packet drops

Scalable
  A host-based autonomous method scales better, because it doesn’t need state management inside the network.
  Autonomous, less state management, scalable

TCP: key characteristics

Very simple algorithm

Macroscopic self-stabilization
  TCP doesn't assume the presence of greedy nodes.

Elimination of global control system
  Reject the idea of an intermediate policing system

Modest performance across almost all data-links
  As opposed to optimal performance in a specific condition

Stepwise improvements over 30 years

Flow Control on TCP

Control available bandwidth
  Sliding window
    Using sequence number, not packet number
  Window size

Control transmission interval of packets
  ACK clocking

Miscellaneous
  Error detection with TCP checksum
  Packet drop detection with duplicate ACK, timeout
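The error detection mentioned above uses the 16-bit ones' complement Internet checksum (RFC 1071). The sketch below computes it over raw bytes only; the real TCP checksum additionally covers a pseudo-header with the IP addresses, which is omitted here. The function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))

# A receiver recomputes the checksum over data plus the transmitted checksum;
# a result of zero means no error was detected.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))
```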

Sliding window

[Figure: the sender holds "Nara Institute of Science and Technology" and the receiver has "Nara Insti" so far; the window size spans the sequence numbers between the markers 10 and 16; regions labelled user data arrives, packets in flight, sent but unacknowledged, sent and acknowledged]
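A byte-oriented sliding window can be sketched as follows. This is a minimal sender-side model with a hypothetical class name; the window of 6 bytes (matching the span between markers 10 and 16) and the 3-byte segment size are assumptions chosen to keep the trace short, and retransmission is not modelled.

```python
from typing import List, Tuple


class SlidingWindowSender:
    """Sequence numbers count bytes, not packets."""

    def __init__(self, data: bytes, window: int, segment_size: int) -> None:
        self.data = data
        self.window = window              # bytes allowed in flight
        self.segment_size = segment_size  # largest payload per segment
        self.snd_una = 0                  # oldest unacknowledged byte
        self.snd_nxt = 0                  # next byte to send

    def sendable(self) -> List[Tuple[int, bytes]]:
        """Emit every segment that currently fits inside the window."""
        segments = []
        limit = min(self.snd_una + self.window, len(self.data))
        while self.snd_nxt < limit:
            end = min(self.snd_nxt + self.segment_size, limit)
            segments.append((self.snd_nxt, self.data[self.snd_nxt:end]))
            self.snd_nxt = end
        return segments

    def on_ack(self, ack: int) -> None:
        """A cumulative ACK slides the left edge of the window forward."""
        if self.snd_una < ack <= self.snd_nxt:
            self.snd_una = ack


sender = SlidingWindowSender(b"Nara Institute of Science and Technology", window=6, segment_size=3)
print(sender.sendable())   # fills the window
print(sender.sendable())   # window is full, nothing more may be sent
sender.on_ack(3)           # the first 3 bytes are acknowledged
print(sender.sendable())   # the window slides and 3 more bytes go out
```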

Congestion Control on TCP

Fair-share model: fair distribution of bandwidth

End to end

Increase and decrease of window size
  Additive increase, multiplicative decrease (AIMD)
  AIMD is known to be self-stabilizing (R. Jain, et al., “Analysis of the increase and decrease algorithms for congestion avoidance in computer networks”, 1989)

Switch increasing strategy of window size

Detect congestion through packet drops
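The self-stabilizing behaviour of AIMD can be observed in a toy simulation. The model is an assumption made for illustration: two flows share one bottleneck, both see congestion in the same round, the increase is 1 unit and the decrease factor is 1/2. The function name is hypothetical.

```python
from typing import List, Tuple


def aimd(rates: Tuple[float, float], capacity: float, rounds: int) -> List[Tuple[float, float]]:
    """Simulate two flows sharing one bottleneck under AIMD, one step per round."""
    x, y = rates
    history = [(x, y)]
    for _ in range(rounds):
        if x + y > capacity:
            x, y = x / 2, y / 2   # multiplicative decrease on congestion
        else:
            x, y = x + 1, y + 1   # additive increase otherwise
        history.append((x, y))
    return history


history = aimd(rates=(1.0, 50.0), capacity=60.0, rounds=200)
first_gap = abs(history[0][0] - history[0][1])
last_gap = abs(history[-1][0] - history[-1][1])
print(first_gap, last_gap)
```

Additive increase leaves the gap between the two flows unchanged, while every multiplicative decrease halves it, so the shares drift toward equality.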

Increasing Window size

TCP switches the increasing strategy of the congestion window (cwnd) at the slow start threshold (ssthresh)

(Outline of the algorithm) On receiving an ACK:

    if (cwnd < ssthresh) {
        /* slow start: exponential increase (cwnd doubles every RTT) */
        cwnd += 1;
    } else {
        /* congestion avoidance: additive increase (about +1 per RTT) */
        cwnd += 1.0 / cwnd;   /* cwnd is a real number, counted in segments */
    }

Increasing Window size

Slow start: exponential increase

Congestion avoidance: additive increase

Effectiveness: see V. Jacobson, “Congestion Avoidance and Control”, SIGCOMM’88.

Source: TCP/IP Illustrated, Vol.1

Decreasing Window size

(Outline of the algorithm) On detecting packet drop:

    ssthresh = cwnd / 2;
    if (timeout) {
        cwnd = 1;
    }

Source: TCP/IP Illustrated, Vol.1
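The two outlines (increase on ACK, decrease on drop) can be combined into a small simulation. This is a sketch that assumes every segment in flight is acknowledged once per round trip and counts cwnd in segments; the function names are hypothetical.

```python
from typing import List, Tuple


def on_ack(cwnd: float, ssthresh: float) -> float:
    """Window growth per received ACK, in units of segments."""
    if cwnd < ssthresh:
        return cwnd + 1.0        # slow start: exponential increase
    return cwnd + 1.0 / cwnd     # congestion avoidance: additive increase


def on_drop(cwnd: float, timeout: bool) -> Tuple[float, float]:
    """Return (cwnd, ssthresh) after a detected packet drop."""
    ssthresh = cwnd / 2
    if timeout:
        cwnd = 1.0
    return cwnd, ssthresh


def simulate_rtts(rtts: int, ssthresh: float) -> List[float]:
    """Record cwnd at the start of each round trip, with no losses."""
    cwnd = 1.0
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        for _ in range(int(cwnd)):   # one ACK per segment sent in this round
            cwnd = on_ack(cwnd, ssthresh)
    return trace


trace = simulate_rtts(rtts=8, ssthresh=8.0)
print([round(value, 2) for value in trace])
print(on_drop(cwnd=16.0, timeout=True))
```

The trace doubles each round until it reaches ssthresh and then grows by roughly one segment per round trip.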

Demo: observe slow start & congestion avoidance

Start Wireshark

Access speed test website: http://www.speedtest6.com

Filter the TCP stream with: ip.src==27.120.100.45

Plot the Time-Sequence Graph
  Select one of the filtered packets
  Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (Stevens)
  Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (tcptrace)

Questions?

Packet drop

Detection with 2 methods:
  Duplicate ACK
  Retransmission Time Out (RTO)

How do we decide when to time out?

→ RTO ~ RTT Estimator

RTO Calculation

    Err = M - A
    A <- A + g * Err
    D <- D + h * (|Err| - D)
    RTO = A + 4D

M: measured RTT
A: smoothed RTT
D: smoothed mean deviation
g: gain for the average (1/8)
h: gain for the deviation (1/4)

Source: TCP/IP Illustrated, Vol.1
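The RTO update can be written out directly from the four formulas. A minimal sketch: the class name is hypothetical, and the starting values of A and D are supplied by the caller because the slide does not specify how they are initialized.

```python
class RtoEstimator:
    """Smoothed RTT (A) and mean deviation (D), with RTO = A + 4D."""

    def __init__(self, a: float, d: float, g: float = 1 / 8, h: float = 1 / 4) -> None:
        self.a = a  # smoothed RTT
        self.d = d  # smoothed mean deviation
        self.g = g  # gain for the average
        self.h = h  # gain for the deviation

    @property
    def rto(self) -> float:
        return self.a + 4 * self.d

    def update(self, m: float) -> float:
        """Fold in one RTT measurement m and return the new RTO."""
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto


estimator = RtoEstimator(a=1.0, d=0.5)   # assumed starting point, in seconds
print(estimator.update(2.0))             # a single slow sample raises the RTO
```

Because D reacts to the size of the error, a burst of variable RTT samples widens the timeout faster than the average alone would.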

RTO calculation

Source: A Quick Tour Around TCP, http://web.eecs.utk.edu/~dunigan/tcptour/

Hands-on: observe the effect of loss/delay

Start Wireshark

Start iperf
  Install iperf3 on the VM: $ sudo apt-get install iperf3
  Run iperf3: $ iperf3 -c 163.221.52.133 -p 50001 -t 2

Configure tc
  Loss: $ sudo tc qdisc add dev eth1 root netem loss 10%
    To confirm: $ sudo tc qdisc show
    To remove: $ sudo tc qdisc del dev eth1 root netem loss 10%
  Delay: $ sudo tc qdisc add dev eth1 root netem delay 500ms

Observe the effect of loss/delay
  Select one of the filtered packets
  Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (Stevens)
  Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (tcptrace)

Summary

Transport layer

The transport protocol on the Internet: TCP
  TCP: service model and features
  Efficiency: ACK, piggybacking, Nagle algorithm
  TCP connection establishment and release
  Flow control and congestion control in TCP