FAST Protocols for High Speed Network
David Wei @ netlab, Caltech
For HENP WG, Feb 1st 2003
FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST
Internet: a distributed feedback control system
• TCP: adapts sending rate to congestion
• AQM: feeds back congestion information

[Block diagram: sources and links in closed loop; forward routing R_f(s) maps source rates x to link rates y, backward routing R_b'(s) maps link prices p to path prices q]
• Source rate (TCP): x_i(t) = w_i(t) / (d_i + q_i(t))  (window over propagation plus queueing delay)
• Link price (AQM): d/dt p_l(t) = (y_l(t) - c_l) / c_l  (queueing delay builds when input y_l exceeds capacity c_l)

TCP / AQM
Theory
Calren2/Abilene
Chicago
Amsterdam
CERN
Geneva
SURFNet
StarLight
WAN in Lab (Caltech)
research & production networks
Multi-Gbps, 50-200 ms delay
Experiment
155Mb/s
slow start
equilibrium
FAST recovery
FAST retransmit
timeout
10Gb/s
Implementation
Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
Industry: Doraiswami (Cisco), Yip (Cisco)
Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
Partners CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
People
FAST project
• Goal: protocols (TCP/AQM) for ultrascale networks
  – Bandwidth: 10 Mbps to >100 Gbps
  – Delay: 50-200 ms
  – Research: theory, algorithms, design, implementation, demo, deployment
• Urgent need:
  – Large amounts of data to share (500 TB at SLAC)
  – Typical SLAC file transfer ~1 TB (15 min at 10 Gbps)
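The 15-minute figure checks out as a back-of-the-envelope calculation:

```python
# Back-of-the-envelope check of the slide's transfer-time claim:
# moving a ~1 TB file over a 10 Gbps link.
file_bits = 1e12 * 8        # 1 TB (decimal) in bits
rate_bps = 10e9             # 10 Gbps
seconds = file_bits / rate_bps
print(f"{seconds:.0f} s =~ {seconds / 60:.1f} min")  # 800 s =~ 13.3 min
```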
HEP Network (DataTAG)
National networks: NL: SURFnet, UK: SuperJANET4, It: GARR-B, Fr: Renater
Backbones: GEANT, ABILENE, ESNET, CALREN
Exchange points: GENEVA, New York, STAR-TAP, STARLIGHT
Wave Triangle
• 2.5 Gbps wavelength triangle in 2002
• 10 Gbps triangle in 2003
Newman (Caltech)
Projected performance
Ns-2 simulations: capacity = 155 Mbps, 622 Mbps, 2.5 Gbps, 5 Gbps, 10 Gbps; 100 sources, 100 ms round-trip propagation delay
Projected capacity by year: '01: 155 Mbps, '02: 622 Mbps, '03: 2.5 Gbps, '04: 5 Gbps, '05: 10 Gbps
J. Wang (Caltech)
Throughput as function of the time
Chicago -> CERN
Linux kernel 2.4.19
Traffic generated by iperf (throughput measured over the last 5 s)
TCP single stream
RTT = 119ms MTU = 1500
Duration of the test : 2 hours
[Plot: throughput (Mb/s, 0-500) vs. time (0-7000 s)]
By Sylvain Ravot (Caltech)
Current TCP (Linux Reno)
As MTU increases…
1.5K, 4K, 9K …
[Plot: throughput (Mb/s, 0-1000) vs. time (0-3000 s), one curve per MTU]
MTU=1498
MTU=3998
MTU=8988
By Sylvain Ravot (Caltech)
Current TCP (Linux Reno)
Better?????
By Some Dreamers (Somewhere)
FAST
Network: CERN (Geneva) - SLAC (Sunnyvale), GE, standard MTU
Sunnyvale -> CERN
Linux kernel 2.4.18 with FAST enabled
RTT = 180 ms
MTU = 1500
By C. Jin & D. Wei (Caltech)
Theoretical Background
Congestion control
• Each source i sends at rate x_i(t)
• Routing: R_li = 1 if source i uses link l, R_li = 0 if it does not
• Aggregate rate at link l: y_l(t) = Σ_i R_li x_i(t)
• Example congestion measure p_l(t):
  – Loss (Reno)
  – Queueing delay (Vegas)
• Feedback loop: TCP sets x_i(t); AQM computes p_l(t) from y_l(t) and feeds it back
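The R_li routing notation and the aggregate rate y_l = Σ_i R_li x_i can be made concrete with a toy example (the topology, rates, and prices below are made up purely for illustration):

```python
# Toy network: 2 links, 3 sources. R[l][i] = 1 iff source i uses link l.
R = [[1, 1, 0],   # link 0 is used by sources 0 and 1
     [0, 1, 1]]   # link 1 is used by sources 1 and 2
x = [100.0, 50.0, 80.0]          # source rates x_i (Mbps)

# Aggregate rate on each link: y_l = sum_i R_li * x_i
y = [sum(R[l][i] * x[i] for i in range(3)) for l in range(2)]
print(y)   # [150.0, 130.0]

# Price fed back to each source: q_i = sum_l R_li * p_l
p = [0.25, 0.5]                  # per-link congestion prices p_l
q = [sum(R[l][i] * p[l] for l in range(2)) for i in range(3)]
print(q)   # [0.25, 0.75, 0.5]
```

Source 1 traverses both links, so it sees the sum of both link prices; this is exactly the end-to-end congestion signal the TCP side reacts to.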
TCP/AQM
• Congestion control is a distributed asynchronous algorithm to share bandwidth
• It has two components– TCP: adapts sending rate (window) to congestion
– AQM: adjusts & feeds back congestion information
• They form a distributed feedback control system
  – Equilibrium & stability depend on both TCP and AQM
  – And on delay, capacity, routing, #connections
TCP (x_i(t)): Reno, Vegas
AQM (p_l(t)): DropTail, RED, REM/PI, AVQ
Methodology
Protocol (Reno, Vegas, RED, REM/PI, …) modeled as a dynamical system:
  x(t+1) = F(p(t), x(t))   (TCP)
  p(t+1) = G(p(t), x(t))   (AQM)
• Equilibrium: performance (throughput, loss, delay), fairness (utility)
• Dynamics: local stability, cost of stabilization
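For intuition, the iteration x(t+1) = F(p(t), x(t)), p(t+1) = G(p(t), x(t)) can be simulated with simple stand-in laws. This sketch uses a log-utility source law and a gradient price update on one bottleneck link; F and G here are illustrative choices, not the actual Reno/RED/REM dynamics:

```python
# Minimal sketch of the iteration x(t+1) = F(p(t), x(t)), p(t+1) = G(p(t), x(t))
# for a single bottleneck link shared by two sources.
c = 100.0                 # link capacity
gamma = 1e-4              # price step size (small enough for stability)
p = 0.05                  # initial link price
n = 2                     # number of sources sharing the link

for t in range(5000):
    x = [1.0 / p] * n                   # F: x_i = U_i'^{-1}(p) with U_i = log
    y = sum(x)                          # aggregate link rate
    p = max(1e-6, p + gamma * (y - c))  # G: price rises iff demand y exceeds c

print(round(sum(x), 1), round(p, 3))    # equilibrium: y = c, p = n/c
```

The equilibrium of this toy system is exactly the slide's point: it is determined jointly by the source law F and the link law G, and the step size gamma (standing in for delay and capacity effects) decides whether the loop is stable.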
Goal: Fast AQM Scalable TCP
• Equilibrium properties– Uses end-to-end delay (and loss) as congestion measure
– Achieves any desired fairness, expressed by utility function
– Very high bandwidth utilization (99% in theory)
• Stability properties– Stability for arbitrary delay, capacity, routing & load
– Good performance• Negligible queueing delay & loss introduced by the protocol
• Fast response
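For flavor, here is a minimal sketch of a delay-based window update with the properties listed above. It follows the form of the later-published FAST TCP rule w <- (1-g)w + g(baseRTT/RTT * w + alpha); the constants and the fixed-queueing-delay model are illustrative assumptions, not the deployed kernel code:

```python
# Sketch of a delay-based window update in the spirit of FAST:
#   w <- (1 - g) * w + g * (baseRTT / RTT * w + alpha)
# At equilibrium the flow keeps ~alpha packets queued in the network.
alpha = 200.0       # target packets buffered along the path
g = 0.5             # smoothing gain
base_rtt = 0.180    # propagation delay (s)
w = 100.0           # congestion window (packets)

queue_delay = 0.010                 # assume a fixed 10 ms queueing delay
for _ in range(500):
    rtt = base_rtt + queue_delay
    w = (1 - g) * w + g * (base_rtt / rtt * w + alpha)

# Fixed point: w * (1 - base_rtt/rtt) = alpha  ->  w = alpha * rtt / queue_delay
print(round(w))     # converges to 3800 packets
```

Because the update reacts to queueing delay rather than loss, it can hold the window near the bandwidth-delay product without driving the queue to overflow, which is where the "negligible queueing delay & loss" property comes from.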
Implementation and Experiment
Implementation
First version (demonstrated at the Supercomputing conference, Nov 2002):
• Sender-side kernel modification (well suited to file-sharing services)
• Challenges:
  – Effects ignored in theory
  – Large window sizes and high speed
SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
Network Topology
Sunnyvale Baltimore
Chicago
Geneva
3000 km, 1000 km, 7000 km
C. Jin, D. Wei, S. Low
FAST Team and Partners
FAST TCP
• Standard MTU
• Peak window = 14,255 pkts
• Throughput averaged over > 1 hr
• 925 Mbps single flow/GE card: 9.28 petabit-meter/sec, 1.89 times the Internet2 Land Speed Record (LSR)
• 8.6 Gbps with 10 flows: 34.0 petabit-meter/sec, 6.32 times the LSR
• 21 TB transferred in 6 hours with 10 flows
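Two of the headline numbers can be cross-checked from first principles (a rough consistency sketch; the formulas are the standard ones, not taken from the slides):

```python
# Rough consistency checks on the SC2002 headline numbers.
# 1) Bandwidth-delay product: a 925 Mbps flow at 180 ms RTT must keep
#    roughly rate * RTT worth of 1500-byte packets in flight, which
#    should be comparable to the reported 14,255-packet peak window.
bdp_pkts = 925e6 * 0.180 / (1500 * 8)
print(round(bdp_pkts))          # ~13875 packets in flight

# 2) Distance-bandwidth product, the Internet2 Land Speed Record metric:
#    925 Mbps over the ~10,037 km CERN-Sunnyvale path.
pbm = 925e6 * 10037e3 / 1e15    # petabit-meters per second
print(round(pbm, 2))            # 9.28
```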
Highlights
[Bar chart: FAST throughput vs. Internet2 Land Speed Record (I2 LSR) for 1, 2, 7, 9, and 10 flows, on the Geneva-Sunnyvale and Baltimore-Sunnyvale paths]
FAST BMPS

Experiment (date)            Flows  Bmps (peta)  Thruput (Mbps)  Distance (km)  Delay (ms)  MTU (B)  Duration (s)  Transfer (GB)  Path
Alaska-Amsterdam (9.4.2002)      1         4.92             401         12,272           -        -            13          0.625  Fairbanks, AK - Amsterdam, NL
MS-ISI (29.3.2000)               2         5.38             957          5,626           -    4,470            82            8.4  MS, WA - ISI, VA
Caltech-SLAC (19.11.2002)        1         9.28             925         10,037         180    1,500         3,600            387  CERN - Sunnyvale
Caltech-SLAC (19.11.2002)        2        18.03           1,797         10,037         180    1,500         3,600            753  CERN - Sunnyvale
Caltech-SLAC (18.11.2002)        7        24.17           6,123          3,948          85    1,500        21,600         15,396  Baltimore - Sunnyvale
Caltech-SLAC (19.11.2002)        9        31.35           7,940          3,948          85    1,500         4,030          3,725  Baltimore - Sunnyvale
Caltech-SLAC (20.11.2002)       10        33.99           8,609          3,948          85    1,500        21,600         21,647  Baltimore - Sunnyvale

Mbps = 10^6 b/s; GB = 2^30 bytes
• C. Jin, D. Wei, S. Low
• FAST Team and Partners
FAST aggregate throughput
• Standard MTU
• Utilization averaged over > 1 hr

Flows:                1      2      7      9       10
Average utilization:  95%    92%    90%    90%     88%
Duration:             1 hr   1 hr   6 hr   1.1 hr  6 hr

C. Jin, D. Wei, S. Low
FAST vs Linux TCP (2.4.18-3)
• Standard MTU
• Utilization averaged over 1 hr
• Two paths tested (1G and 2G), each running Linux TCP (txq=100), Linux TCP (txq=10000), and FAST

Average utilization (Linux TCP txq=100 / Linux TCP txq=10000 / FAST):
• 19% / 27% / 92% on one path
• 16% / 48% / 95% on the other
C. Jin (Caltech)
Trial Deployment
FAST kernel installed:
• SLAC: Les Cottrell, et al.
  www-iepm.slac.stanford.edu/monitoring/bulk/fast
• FermiLab: Michael Ernst, et al.
Coming soon:
• 10-Gbps NIC testing (Sunnyvale - CERN)
• Internet2
• …
Detailed Information:
• Home Page: http://Netlab.caltech.edu/FAST
• Theory: http://netlab.caltech.edu/FAST/overview.html
• Implementation & Testing: http://netlab.caltech.edu/FAST/software.html
• Publications: http://netlab.caltech.edu/FAST/publications.html
FAST
netlab.caltech.edu/FAST
• Theory
D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
• PrototypeC. Jin, D. Wei
• Experiment/facilities
  – Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  – CERN: O. Martin, P. Moroni
  – Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  – DataTAG: E. Martelli, J. P. Martin-Flatin
  – Internet2: G. Almes, S. Corbato
  – Level(3): P. Fernes, R. Struble
  – SCinet: G. Goddard, J. Patton
  – SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  – StarLight: T. deFanti, L. Winkler
  – TeraGrid: L. Winkler
• Major sponsors
ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Acknowledgments
Thanks
Questions?