FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.


Page 1: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

FAST TCP

Cheng Jin, David Wei, Steven Low

netlab.CALTECH.edu

Page 2: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Acknowledgments

Caltech: Bunn, Choe, Doyle, Hegde, Jayaraman, Newman, Ravot, Singh, X. Su, J. Wang, Xia

UCLA: Paganini, Z. Wang

CERN: Martin

SLAC: Cottrell

Internet2: Almes, Shalunov

MIT Haystack Observatory: Lapsley, Whitney

TeraGrid: Linda Winkler

Cisco: Aiken, Doraiswami, McGugan, Yip

Level(3): Fernes

LANL: Wu

Page 3: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Outline

Motivation & approach
FAST architecture
Window control algorithm
Experimental evaluation

skip: theoretical foundation

Page 4: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Congestion control

[Figure: network with source rates x_i(t) and link congestion measures p_l(t)]

Example congestion measure p_l(t)
  Loss (Reno)
  Queueing delay (Vegas)

Page 5: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

TCP/AQM

Congestion control is a distributed asynchronous algorithm to share bandwidth

It has two components
  TCP: adapts sending rate (window) to congestion
  AQM: adjusts & feeds back congestion information

They form a distributed feedback control system
  Equilibrium & stability depend on both TCP and AQM
  And on delay, capacity, routing, #connections

[Figure: feedback loop between TCP sources with rates x_i(t) and AQM links with congestion measures p_l(t)]

TCP: Reno, Vegas

AQM: DropTail, RED, REM/PI, AVQ

Page 6: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Difficulties at large window

Equilibrium problem
  Packet level: AI too slow, MD too drastic
  Flow level: required loss probability too small

Dynamic problem
  Packet level: must oscillate on binary signal
  Flow level: unstable at large window
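To make "required loss probability too small" concrete, here is a minimal sketch that inverts the Mathis formula (window ≈ sqrt(3/(2p)) ≈ 1.225/sqrt(p) packets) to find the loss probability Reno would need in order to sustain a given rate. The 10 Gbps link, 100 ms RTT, and 1500-byte packets are illustrative assumptions, not figures from the slides.

```python
import math


def reno_equilibrium_window(loss_prob: float) -> float:
    """Mathis formula: average Reno window (in packets) at loss probability p."""
    return math.sqrt(1.5 / loss_prob)


def required_loss_probability(rate_bps: float, rtt_s: float,
                              packet_bytes: int = 1500) -> float:
    """Loss probability at which the Mathis window equals the
    bandwidth-delay product of the path (all inputs are assumptions)."""
    window_pkts = rate_bps * rtt_s / (packet_bytes * 8)
    return 1.5 / window_pkts ** 2


# Illustrative path: 10 Gbps, 100 ms RTT, 1500-byte packets.
p = required_loss_probability(10e9, 0.1)
print(f"window = {reno_equilibrium_window(p):.0f} pkts, required p = {p:.2e}")
```

Under these assumptions the window is about 83,000 packets, so the tolerable loss probability is on the order of 1e-10, which is far below what most real paths deliver.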

Page 7: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Packet & flow level

Reno TCP

Packet level
  ACK: W ← W + 1/W
  Loss: W ← W – 0.5W

Flow level
  Equilibrium: window in pkts (Mathis formula)
  Dynamics

Page 8: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Reno TCP

Packet level
  Designed and implemented first

Flow level
  Understood afterwards

Flow level dynamics determines
  Equilibrium: performance, fairness
  Stability

Design flow level equilibrium & stability
Implement flow level goals at packet level

Page 9: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Reno TCP

Packet level
  Designed and implemented first

Flow level
  Understood afterwards

Flow level dynamics determines
  Equilibrium: performance, fairness
  Stability

Packet level design of FAST, HSTCP, STCP guided by flow level properties

Page 10: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Packet level

Reno: AIMD(1, 0.5)
  ACK: W ← W + 1/W
  Loss: W ← W – 0.5W

HSTCP: AIMD(a(w), b(w))
  ACK: W ← W + a(w)/W
  Loss: W ← W – b(w)W

STCP: MIMD(a, b)
  ACK: W ← W + 0.01
  Loss: W ← W – 0.125W

FAST
  RTT: W ← W · baseRTT/RTT + α
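The packet-level rules on this slide can be sketched as small update functions. This is an illustration only: the function names are made up here, HSTCP is omitted because the slide does not give a(w) and b(w), and the FAST rule is written in the simplified per-RTT form W ← W · baseRTT/RTT + α, where the additive term α is an assumption based on the published description of FAST.

```python
def reno_on_ack(w: float) -> float:
    """Reno additive increase: about one packet per RTT."""
    return w + 1.0 / w


def reno_on_loss(w: float) -> float:
    """Reno multiplicative decrease: halve the window."""
    return w - 0.5 * w


def stcp_on_ack(w: float) -> float:
    """STCP increase: fixed 0.01 per ACK, i.e. multiplicative per RTT."""
    return w + 0.01


def stcp_on_loss(w: float) -> float:
    """STCP multiplicative decrease by 12.5%."""
    return w - 0.125 * w


def fast_on_rtt(w: float, base_rtt: float, rtt: float, alpha: float) -> float:
    """Simplified FAST update, applied once per RTT (alpha is an assumed parameter)."""
    return w * base_rtt / rtt + alpha


# One RTT's worth of ACKs (about 1000) at a window of 1000 packets.
w_reno = w_stcp = 1000.0
for _ in range(1000):
    w_reno = reno_on_ack(w_reno)
    w_stcp = stcp_on_ack(w_stcp)
print(w_reno, w_stcp)
```

At a window of 1000 packets, Reno gains roughly one packet per RTT while STCP gains roughly ten. The FAST update has a fixed point where the window exceeds the bandwidth-delay product by α packets, for example `fast_on_rtt(1200, 0.1, 0.12, 200)` stays at about 1200.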

Page 11: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Flow level: Reno, HSTCP, STCP, FAST

Similar flow level equilibrium, in pkts/sec (Mathis formula)

Constant = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP)
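As a rough illustration of how the three constants above enter the equilibrium, the sketch below evaluates a response function of the form window = constant / p^exponent. The constants are the ones on the slide; the exponents (0.5, 0.835, 1) are assumptions taken from the commonly published response functions of Reno, HSTCP, and STCP, and do not appear in this transcript.

```python
# (constant, exponent) pairs; the exponents are assumptions, see the lead-in.
RESPONSE = {
    "Reno": (1.225, 0.5),
    "HSTCP": (0.120, 0.835),
    "STCP": (0.075, 1.0),
}


def equilibrium_window(protocol: str, loss_prob: float) -> float:
    """Equilibrium window in packets at a given loss probability."""
    constant, exponent = RESPONSE[protocol]
    return constant / loss_prob ** exponent


def equilibrium_rate(protocol: str, loss_prob: float, rtt_s: float) -> float:
    """Equilibrium rate in pkts/sec: window divided by round-trip time."""
    return equilibrium_window(protocol, loss_prob) / rtt_s


for name in RESPONSE:
    print(name, round(equilibrium_window(name, 1e-4), 1))
```

At the same loss probability, the larger exponents let HSTCP and STCP hold a larger window than Reno, which is the sense in which the equilibria are similar in form but differ in scale.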

Page 12: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Flow level: Reno, HSTCP, STCP, FAST

Different gain and utility U_i
  They determine equilibrium and stability

Different congestion measure p_i
  Loss probability (Reno, HSTCP, STCP)
  Queueing delay (Vegas, FAST)

Common flow level dynamics!

window adjustment = control gain × flow level goal

Page 13: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Implementation strategy

Common flow level dynamics

window adjustment = control gain × flow level goal

Small adjustment when close, large far away
  Need to estimate how far current state is wrt target
  Scalable

Window adjustment independent of p_i
  Depends only on current window
  Difficult to scale

Page 14: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Outline

Motivation & approach
FAST architecture
Window control algorithm
Experimental evaluation

skip: theoretical foundation

Page 15: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Architecture

[Figure labels: RTT timescale, Loss recovery, <RTT timescale]

Page 16: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Architecture

Each component
  designed independently
  upgraded asynchronously

Page 17: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Architecture

Each component
  designed independently
  upgraded asynchronously

Window Control

Page 18: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

FAST-TCP basic idea

Uses delay as congestion measure
  Delay provides finer congestion info
  Delay scales correctly with network capacity
  Can operate with low queuing delay

[Figure labels: Loss, Queue Delay, Window, C, FAST, Loss Based TCP]

Page 19: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Window control algorithm

Full utilization
  regardless of bandwidth-delay product

Globally stable
  exponential convergence

Fairness
  weighted proportional fairness
  parameter α
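The convergence claim can be illustrated with a small simulation of one flow on one bottleneck. This is a sketch under stated assumptions: the update w ← min(2w, (1 − γ)w + γ(baseRTT/RTT · w + α)) follows the form given in the published FAST TCP paper rather than anything written on this slide, the queueing model is a simple fluid approximation, and all parameter values are made up.

```python
def fast_window_update(w: float, base_rtt: float, rtt: float,
                       alpha: float, gamma: float) -> float:
    """One periodic FAST window update (form assumed from the FAST TCP paper)."""
    target = (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha)
    return min(2 * w, target)


def simulate_single_flow(capacity_pps: float, base_rtt: float, alpha: float,
                         gamma: float = 0.5, w0: float = 10.0,
                         steps: int = 200) -> list[float]:
    """Iterate the update for one flow over one link.

    Fluid assumption: packets beyond the bandwidth-delay product sit in the
    queue, so RTT = max(base_rtt, w / capacity).
    """
    w = w0
    history = [w]
    for _ in range(steps):
        rtt = max(base_rtt, w / capacity_pps)
        w = fast_window_update(w, base_rtt, rtt, alpha, gamma)
        history.append(w)
    return history


# Hypothetical link: 10,000 pkts/sec, 100 ms propagation delay, alpha = 200 pkts.
history = simulate_single_flow(capacity_pps=10_000, base_rtt=0.1, alpha=200)
print(history[:6], history[-1])
```

In this toy setting the window settles at the bandwidth-delay product (1000 packets) plus α, so the link stays full with about α packets queued, and the distance to that point halves at every step once a queue has formed.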

Page 20: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Outline

Motivation & approach
FAST architecture
Window control algorithm
Experimental evaluation
  Abilene-HENP network
  Haystack Observatory
  DummyNet

Page 21: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Abilene Test

OC48

OC192

(Yang Xia, Harvey Newman, Caltech)

Periodic losses every 10 mins

Page 22: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

(Yang Xia, Harvey Newman, Caltech)

Periodic losses every 10 mins

Page 23: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

(Yang Xia, Harvey Newman, Caltech)

Periodic losses every 10 mins

FAST backs off to make room for Reno

Page 24: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Haystack Experiments

Lapsley, MIT Haystack

Page 25: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Haystack - 1 Flow (Atlanta-> Japan)

• Iperf used to generate traffic.
• Sender is a Xeon 2.6 GHz.
• Window was constant: burstiness in rate due to host processing and ACK spacing.

Lapsley, MIT Haystack

Page 26: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Haystack – 2 Flows from 1 machine (Atlanta -> Japan)

Lapsley, MIT Haystack

Page 27: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Linux Loss Recovery

Timeout

All outstanding packets marked as lost.
1. SACKs reduce lost packets
2. Lost packets retransmitted slowly as cwnd is capped at 1 (bug).

Page 28: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

DummyNet Experiments

Experiments using emulated network.
800 Mbps emulated bottleneck in DummyNet.

Sender PC: Dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22

DummyNet PC (800 Mbps): Dual Xeon 3.06 GHz, 2 GB, FreeBSD 5.1

Receiver PC: Dual Xeon 2.6 GHz, 2 GB, Intel GbE, Linux 2.4.22

Page 29: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Dynamic sharing: 3 flows

[Figure panels: FAST, Linux]

Dynamic sharing on Dummynet
  capacity = 800Mbps
  delay = 120ms
  3 flows
  iperf throughput
  Linux 2.4.x (HSTCP: UCL)

Page 30: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Dynamic sharing: 3 flows

[Figure panels: FAST, Linux, HSTCP, BIC]

Steady throughput

Page 31: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

[Figure panels: FAST, Linux, STCP, HSTCP; rows: throughput, loss, queue; duration: 30min]

Dynamic sharing on Dummynet
  capacity = 800Mbps
  delay = 120ms
  14 flows
  iperf throughput
  Linux 2.4.x (HSTCP: UCL)

Page 32: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

[Figure panels: FAST, Linux, HSTCP, BIC; rows: throughput, loss, queue; duration: 30min]

Room for mice!

Page 33: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Average Queue vs Buffer Size

Dummynet
  capacity = 800Mbps
  delay = 200ms
  1 flow
  buffer size: 50, …, 8000 pkts

(S. Hegde, B. Wydrowski, etc., Caltech)

Page 34: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Is large queue necessary for high throughput?

Page 35: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom March 2004

-release: April 2004

Source freely available for any non-profit use

netlab.caltech.edu/FAST

Page 36: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Aggregate throughput

ideal performance

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Page 37: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Aggregate throughput

small window: 800 pkts

large window: 8000

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

Page 38: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Fairness

Jain’s index

HSTCP ~ Reno

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
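For readers unfamiliar with the metric on this slide, Jain's fairness index for n flows with throughputs x_1, ..., x_n is (Σx)² / (n · Σx²). It equals 1 when all flows get the same throughput and 1/n when one flow takes everything. A minimal sketch, with made-up throughput values:

```python
def jain_index(throughputs: list[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one flow with nonzero throughput")
    total = sum(throughputs)
    return total * total / (n * sum_sq)


# Hypothetical per-flow throughputs in Mbps.
print(jain_index([100.0, 100.0, 100.0]))   # equal shares
print(jain_index([300.0, 0.0, 0.0]))       # one flow takes the whole link
print(jain_index([400.0, 250.0, 150.0]))   # uneven shares
```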

Page 39: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Stability

Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

stable in diverse scenarios

Page 40: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

FAST TCP: motivation, architecture, algorithms, performance. IEEE Infocom March 2004

-release: April 2004

Source freely available for any non-profit use

netlab.caltech.edu/FAST

Page 41: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

BACKUP Slides

Page 42: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

IP Rights

Caltech owns IP rights
  applicable more broadly than TCP
  leave all options open

IP freely available if FAST TCP becomes IETF standard

Code available on FAST website for any non-commercial use

Page 43: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

WAN in Lab

Caltech: John Doyle, Raj Jayaraman, George Lee, Steven Low (PI), Harvey Newman, Demetri Psaltis, Xun Su, Yang Xia

Cisco: Bob Aiken, Vijay Doraiswami, Chris McGugan, Steven Yip

netlab.caltech.edu

NSF

Page 44: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Key Personnel
  Steven Low, CS/EE
  Harvey Newman, Physics
  John Doyle, EE/CDS
  Demetri Psaltis, EE

Cisco
  Bob Aiken
  Vijay Doraiswami
  Chris McGugan
  Steven Yip

Raj Jayaraman, CS
Xun Su, Physics
Yang Xia, Physics
George Lee, CS

2 grad students
3 summer students
Cisco engineers

Page 45: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Spectrum of tools

[Figure axes: log(cost) versus log(abstraction); categories: math, simulation, emulation, live network, WAN in Lab]

math: Mathis formula, Optimization, Control theory, Nonlinear model, Stochastic model

simulation: NS, SSFNet, QualNet, JavaSim

emulation: DummyNet, EmuLab, ModelNet, WAIL

live network: PlanetLab, Abilene, NLR, DataTAG, CENIC, WAIL, etc.

?

…we use them all

Page 46: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Spectrum of tools

math, simulation, emulation, live network, WAN in Lab

Distance High High High

Speed High High Low

Realism High High Low

Traffic High Low Low

Configurable Low Medium High

Monitoring Low Medium High

Cost High Medium Low

Critical in development, e.g. Web100

Page 47: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Goal

State-of-the-art hybrid WAN
  High speed, large distance
    2.5G – 10G
    50 – 200ms
  Wireless devices connected by optical core

Controlled & repeatable experiments
Reconfigurable & evolvable
Built-in monitoring capability

Page 48: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

WAN in Lab

5-year plan
  6 Cisco ONS15454
  4 routers
  10s servers
  Wireless devices
  800km fiber
  ~100ms RTT

[Figure: WAN in Lab topology. Optical network of ONS15454 nodes at Sites A to F with ML-Series network modules; four CISCO7613 bottleneck routers; 10GE 100 km links; OSPF areas 10, 20, 30, 40; Linux server farms, Itanium 10GE servers, and wireless components on 192.168.x/24 and 10.0.x/24 subnets]

V. Doraiswami (Cisco), R. Jayaraman (Caltech)

Page 49: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

WAN in Lab

Year-1 plan
  3 Cisco ONS 15454
  2 routers
  10s servers
  Wireless devices

[Figure: year-1 topology. Optical network of ONS15454 nodes at Sites A, B, D, plus ONS15454 units to support additional ML-Series cards; two CISCO7613 bottleneck routers; 10GE 100 km link; OSPF areas 10 and 20; servers, Itanium 10GE servers, and wireless components on 192.168.x/24 and 10.0.x/24 subnets]

V. Doraiswami (Cisco), R. Jayaraman (Caltech)

Page 50: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Hybrid Network

Scenarios:
  Ad hoc network
  Cellular network
  Sensor network

How does the optical core support wireless edges?

X. Su (Caltech)

Page 51: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Experiments

Transport & network layer
  TCP, AQM, TCP/IP interaction

Wireless hybrid networking
  Wireless media delivery
  Fixed wireless access
  Sensor networks

Optical control plane

Grid computing
  UltraLight

Page 52: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Unique capabilities

WAN in Lab
  Capacity: 2.5 – 10 Gbps
  Delay: 0 – 100 ms round trip
  Delay: 0 – 400 ms round trip

Configurable & evolvable
  Topology, rate, delays, routing
  Always at cutting edge

Flexible, active debugging
  Passive monitoring, AQM

Integral part of R&A networks
  Transition from theory, implementation, demonstration, deployment
  Transition from lab to marketplace

Global resource
  Part of global infrastructure UltraLight led by Newman

[Figure labels: Calren2/Abilene, Chicago, Amsterdam, CERN, Geneva, SURFNet, StarLight, WAN in Lab (Caltech); research & production networks; Multi-Gbps, 50-200ms delay; Experiment]

Page 53: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Network debugging

Performance problems in real network
  Simulation will miss
  Emulation might miss
  Live network hard to debug

WAN in Lab
  Passive monitoring inside network
  Active debugging possible

Page 54: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Passive monitoring

[Figure labels: Fiber splitter, DAG, RAID, Timestamp, Header, GPS, Monitor]

No overhead on system

Can capture full info at OC48
  UofWaikato’s DAG card captures at OC48 speed
  Can filter if necessary
  Disk speed = 2.5Gbps*40/1500 = 66Mbps

Monitors synchronized by GPS or cheaper alternatives

Data stored for offline analysis

D. Wei (Caltech)
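The disk-speed figure on this slide is a simple ratio. The sketch below reproduces the arithmetic, assuming (as the numbers 40 and 1500 suggest) that only a 40-byte header is stored for each 1500-byte packet on a 2.5 Gbps link; the function name and defaults are illustrative.

```python
def header_capture_rate_bps(link_bps: float, header_bytes: int = 40,
                            packet_bytes: int = 1500) -> float:
    """Storage rate needed when only packet headers are written to disk.

    Assumes a fully loaded link carrying packets of a single size.
    """
    return link_bps * header_bytes / packet_bytes


rate = header_capture_rate_bps(2.5e9)
print(f"{rate / 1e6:.1f} Mbps")  # about 66.7 Mbps, which the slide shows as 66Mbps
```

Smaller packets raise the required rate, since the header makes up a larger share of each packet; with 500-byte packets the same link would need about 200 Mbps.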

Page 55: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Passive monitoring

D. Wei (Caltech)

[Figure labels: Fiber splitter, DAG, RAID, Timestamp, Header, GPS, Monitor; servers, routers, monitors]

Web100, MonALISA

Page 56: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

UltraLight testbed

UltraLight team (Newman)

Page 57: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Status

Hardware
  Optical transport design: finalized
  IP infrastructure design: finalized (almost)
  Wireless infrastructure design: finalized
  Price negotiation/ordering/delivery: summer 04

Software
  Passive monitoring: summer student
  Management software: 2005 -

Physical lab
  Renovation: to be completed by summer 04

Page 58: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Status

[Figure: timeline, 2003 to 2007]
  NSF funds: 10/03
  ARO funds: 5/04
  usable testbed: 12/04
  useful testbed: 12/05
  Other timeline items: fundraising, hardware design, physical building, monitoring, traffic generation, connected UltraLight, expansion, support, management

[Figure: WAN in Lab topology, repeated from the 5-year plan slide]

Page 59: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

[Figure labels: CS Dept, Jorgensen Lab, NetLab, WAN in Lab]

G. Lee, R. Jayaraman, E. Nixon (Caltech)

Page 60: FAST TCP Cheng Jin David Wei Steven Low netlab.CALTECH.edu.

Summary

Testbed driven by research agenda
  Rich and strong networking effort
  Integrated approach: theory + implementation + experiments
  “A network that can break”

Integral part of real testbeds
  Part of global infrastructure UltraLight led by Harvey Newman (Caltech)

Integrated monitoring & measurement facility
  Fiber splitter passive monitors
  MonALISA