e-Science All Hands Meeting, 1-4 Sep 03, R. Hughes-Jones, Manchester
High Bandwidth High Throughput in the MB-NG & DataTAG Projects
Richard Hughes-Jones, Stephen Dallison, Gareth Fairey, Dept. of Physics and Astronomy, University of Manchester
Robin Tasker, Daresbury Laboratory, CLRC
Miguel Rio, Yee Ting Li, Dept. of Physics and Astronomy, University College London
Topology of the MB-NG Network
[Figure: network diagram. Key: Gigabit Ethernet; 2.5 Gbit POS Access; MPLS Admin. Domains. Shows the MB-NG UCL, Manchester and RAL Domains joined through the UKERNA Development Network, with Cisco 7609 Edge and Boundary Routers and hosts man01, man02, man03, lon01, lon02, lon03.]
DataTAG Testbed
[Figure: DataTAG testbed diagram, last update 20030701, alongside the CERN/Caltech production network Chicago - Geneva.
Geneva: r04gva (Cisco 7606), r05gva (Juniper M10), r06gva (Alcatel 7770), cernh4 and cernh7 (Cisco 7609), s01gva (Extreme S1i), s02gva (Cisco 5505, management), hosts w01gva-w06gva, w20gva, v02gva, v03gva.
Chicago: r04chi (Cisco 7609), r05chi (Juniper M10), r06chi (Alcatel 7770), ar3-chicago (Cisco 7606), s01chi (Extreme S5i), Cisco 2950 (management), hosts w01chi-w06chi, v10chi-v13chi.
Transmission equipment: ONS15454, Alcatel 1670.
Links: stm16 (DTag), stm4 (DTag), stm16 (Colt, backup + projects), stm16 (FranceTelecom), stm16 (Swisscom), stm64 (GC), STM64, CCC tunnel.
Connected networks and sites: GEANT, SURFNET, CANARIE, CESNET, VTHD/INRIA, SWITCH, CNAF (w01bol).
Key: 1000baseSX; 1000baseT; SDH/Sonet; 10GbaseLX.]
End Hosts: how good are they really?
End Hosts: b2b & end-to-end UDP Tests
Tests with UDPmon
Supermicro P4DP6, PCI: 64 bit 66 MHz
Max throughput 975 Mbit/s
20% CPU utilisation on the receiver for packets > 1000 bytes; 40% CPU utilisation for smaller packets
Latency 6.1 ms & well behaved
Latency slope 0.0761 µs/byte
B2B expect: 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188)
End-to-end path crosses 6 routers
Jitter small: 2-3 µs FWHM
[Figure, lon3-man1: received wire rate (Mbit/s) against the delay between sending the frames (µs), for packet sizes of 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes.]
[Figures: UDP jitter histograms for 1472 byte packets at w=100, w=200 and w=300; horizontal axis latency in µs.]
[Figure, lon3-man1: latency (µs) against message length (bytes), with the linear fit y = 0.0761x + 6075.9.]
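The back-to-back expectation quoted on this slide is the sum of the per-byte costs of the sending PCI bus, the Gigabit Ethernet wire and the receiving PCI bus. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the nominal 66 MHz clock and 8 bytes per PCI cycle are assumptions used to reproduce the slide's figures, not values taken from the measurement code.

```python
def us_per_byte(bytes_per_second: float) -> float:
    """Time to move one byte, in microseconds, at the given transfer rate."""
    return 1e6 / bytes_per_second


# Assumed nominal rates (not measured values).
PCI_64BIT_66MHZ = 8 * 66e6   # 64-bit bus moves 8 bytes per clock at 66 MHz
GIGABIT_ETHERNET = 1e9 / 8   # 1 Gbit/s expressed in bytes per second


def expected_b2b_slope() -> float:
    """Sending PCI bus + Gigabit Ethernet wire + receiving PCI bus, in us/byte."""
    return (us_per_byte(PCI_64BIT_66MHZ)
            + us_per_byte(GIGABIT_ETHERNET)
            + us_per_byte(PCI_64BIT_66MHZ))


def fitted_latency_us(message_bytes: int,
                      slope: float = 0.0761,
                      intercept: float = 6075.9) -> float:
    """Latency predicted by the lon3-man1 linear fit quoted on the slide."""
    return slope * message_bytes + intercept


if __name__ == "__main__":
    print(f"expected b2b slope: {expected_b2b_slope():.4f} us/byte")
    print(f"fitted latency at 1472 bytes: {fitted_latency_us(1472):.1f} us")
```

The measured end-to-end slope (0.0761 µs/byte) is several times the back-to-back expectation, which is consistent with the packet being stored and forwarded at each of the routers on the path.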
Signals on the PCI bus
1472 byte packets every 15 µs, Intel Pro/1000
PCI: 64 bit 33 MHz: 82% usage
PCI: 64 bit 66 MHz: 65% usage; data transfers half as long
[Figures: PCI bus traces for the two cases, with the phases labelled Send setup, Send PCI, Data Transfers, Receive PCI and Receive Transfers.]
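The "half as long" observation follows from the bus clock alone. The Python sketch below, with hypothetical function names, estimates the ideal time a 1472 byte payload occupies a 64-bit PCI bus and the data rate offered by one such packet every 15 µs. It assumes one 8-byte word per clock and ignores arbitration, descriptor and register traffic, so it does not attempt to reproduce the measured 82% and 65% usage figures.

```python
def pci_transfer_us(payload_bytes: int, bus_width_bits: int, clock_hz: float) -> float:
    """Ideal data-phase time in microseconds, assuming one bus word per clock."""
    bytes_per_cycle = bus_width_bits // 8
    cycles = -(-payload_bytes // bytes_per_cycle)  # ceiling division
    return cycles / clock_hz * 1e6


def offered_rate_mbit(payload_bytes: int, spacing_us: float) -> float:
    """User data rate for one packet every spacing_us (bits per us == Mbit/s)."""
    return payload_bytes * 8 / spacing_us


if __name__ == "__main__":
    for clock in (33e6, 66e6):
        t = pci_transfer_us(1472, 64, clock)
        print(f"{clock / 1e6:.0f} MHz: {t:.2f} us per 1472 byte packet")
    print(f"offered rate: {offered_rate_mbit(1472, 15):.0f} Mbit/s")
```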
Interrupt Coalescence Investigations
Kernel parameters for socket buffer size: rtt*BW
TCP mem-mem lon2-man1:
Tx 64, Tx-abs 64, Rx 0, Rx-abs 128: 820-980 Mbit/s +- 50 Mbit/s
Tx 64, Tx-abs 64, Rx 20, Rx-abs 128: 937-940 Mbit/s +- 1.5 Mbit/s
Tx 64, Tx-abs 64, Rx 80, Rx-abs 128: 937-939 Mbit/s +- 1 Mbit/s
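The socket buffer rule on this slide, rtt*BW, is the bandwidth-delay product of the path. As a rough illustration, the Python sketch below computes it and requests matching send and receive buffers through the standard socket options. The helper names are hypothetical; the 6 ms and 6.2 ms round-trip times and the 1 Gbit/s rate are taken from figures quoted elsewhere in this talk, and the operating system may cap or adjust the size it actually grants.

```python
import socket


def bdp_bytes(rtt_s: float, bandwidth_bit_s: float) -> int:
    """Bandwidth-delay product in bytes: the data in flight on a full pipe."""
    return round(rtt_s * bandwidth_bit_s / 8)


def make_buffered_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of buffer_bytes.

    The kernel may clamp the request to its configured maximum.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


if __name__ == "__main__":
    print(bdp_bytes(0.006, 1e9))    # about 750 kbytes for a 6 ms rtt at 1 Gbit/s
    print(bdp_bytes(0.0062, 1e9))   # slightly larger for the 6.2 ms MB-NG rtt
    s = make_buffered_socket(1_000_000)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```

Buffers should be set before the connection is established so that the TCP window scaling negotiated at connection setup can reflect them.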
txqueuelen vs sendstalls
The Tx queue is located between the IP stack & the NIC driver
TCP treats ‘Queue full’ as congestion!
Results for Lon - Man
Select txqueuelen = 2000
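To put the chosen queue length in context, the Python sketch below reads an interface's current transmit queue length, builds the `ip link` command that would change it, and estimates how long a full queue takes to drain. It is a sketch for a present-day Linux host, not the procedure used in these tests: the helper names are hypothetical, the `/sys/class/net/<iface>/tx_queue_len` path is the usual sysfs location, and the 1500 byte frame size and 1 Gbit/s link rate are assumptions.

```python
from pathlib import Path
from typing import List, Optional


def read_txqueuelen(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the interface's transmit queue length, or None if unavailable."""
    path = Path(sysfs_root) / iface / "tx_queue_len"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def set_txqueuelen_command(iface: str, length: int = 2000) -> List[str]:
    """Build (but do not run) the command that sets the queue length; needs root."""
    return ["ip", "link", "set", "dev", iface, "txqueuelen", str(length)]


def queue_drain_ms(length: int, frame_bytes: int = 1500, link_bit_s: float = 1e9) -> float:
    """Time in milliseconds to transmit a full queue of frames at the link rate."""
    return length * frame_bytes * 8 / link_bit_s * 1e3


if __name__ == "__main__":
    print(read_txqueuelen("eth0"))                  # "eth0" is a placeholder name
    print(" ".join(set_txqueuelen_command("eth0")))
    print(f"{queue_drain_ms(2000):.0f} ms to drain 2000 full-size frames")
```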
Network Investigations
Network Bottlenecks
Backbones 2.5 and 10 Gbit: usually good (in Europe)
Access links need care: GEANT-NRN and Campus - SuperJANET4
NNW - SJ4 Access: given as an example of good forward planning:
10 November 2002: 1 Gbit link
24 February 2003, 26 Feb 2003: upgraded to 2.5 Gbit
Trunking: use of multiple 1 Gbit Ethernet links
24 Hours HighSpeed TCP mem-mem
TCP mem-mem lon2-man1
Tx 64, Tx-abs 64, Rx 64, Rx-abs 128: 941.5 Mbit/s +- 0.5 Mbit/s
TCP sharing man1-lon2
1 stream every 60 s: man1-lon2, man2-lon2, man3-lon2
Sample every 10 ms
1 Stream: Average 940 Mbit/s; no Dup ACKs, no SACKs, no sendstalls
2 Streams: Average ~500 Mbit/s; many Dup ACKs, cwnd reduced
2 Streams: Average ~300 Mbit/s
2 TCP streams man1-lon2
2 Streams:
Dips in throughput due to Dup ACKs
~4 losses/sec. A bit regular?
Cwnd decreases: 1 point 33%; ramp starts at 62%; slope 70 bytes/µs
[Figure: traces for the two streams; time scale marker 1 sec.]
TCP Protocol Stack Comparisons
Standard TCP, HighSpeed TCP, Scalable TCP
Kernel on the receiver dropped packets periodically
MB-NG Network: rtt 6.2 ms, recovery time 1.6 s
DataTAG Network: rtt 119 ms, recovery time 590 s (9.8 min)
Recovery time: $\tau = \dfrac{C \cdot RTT^{2}}{2 \cdot MSS}$
Throughput of the DataTAG network was a factor of ~5 lower than that on the MB-NG network
[Figures: one plot for MB-NG and one for DataTAG.]
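The recovery times quoted on this slide can be checked against the usual estimate for standard TCP after a single loss, C * RTT^2 / (2 * MSS): the window is halved and then regrows by one segment per round trip. The Python sketch below evaluates it. The 1 Gbit/s capacity and 1500 byte MSS are assumptions (the slide does not state them), chosen because they reproduce the quoted 1.6 s and 590 s.

```python
def recovery_time_s(capacity_bit_s: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Time for standard TCP to regain full rate after one loss.

    After a loss the congestion window halves, then grows by one MSS per
    round trip, so it needs (C * RTT / MSS) / 2 round trips to recover.
    """
    mss_bits = mss_bytes * 8
    return capacity_bit_s * rtt_s ** 2 / (2 * mss_bits)


if __name__ == "__main__":
    # Assumed 1 Gbit/s path capacity; rtt values are those quoted on the slide.
    for name, rtt in (("MB-NG", 6.2e-3), ("DataTAG", 119e-3)):
        t = recovery_time_s(1e9, rtt)
        print(f"{name}: rtt {rtt * 1e3:.1f} ms -> recovery {t:.1f} s ({t / 60:.1f} min)")
```

Because the recovery time grows with the square of the round-trip time, the roughly 19 times longer rtt on DataTAG turns seconds into minutes.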
Application Throughput
MB-NG SuperJANET4 Development Network
[Figure: network diagram. Key: Gigabit Ethernet; 2.5 Gbit POS Access; 2.5 Gbit POS core; MPLS Admin. Domains. Shows UCL, MCC and MAN connected across the SJ4 Dev network through OSM-1OC48-POS-SS interfaces, with end-host PCs at the edges, two of them with 3ware RAID0.]
GridFTP Throughput, HighSpeed TCP
RAID0 disk tests: 120 Mbytes/s read, 100 Mbytes/s write
Int Coal 64 128; txqueuelen 2000; TCP buffer 1 Mbyte (rtt*BW = 750 kbytes)
Interface throughput: data rate 520 Mbit/s
Same for B2B tests
So it's not that simple!
[Figure: interface throughput, showing data traffic and TCP ACK traffic.]
GridFTP Throughput + Web100
Throughput (Mbit/s): alternates between 600/800 Mbit/s and zero
Cwnd smooth
No dup ACKs / send stalls / timeouts
http data transfers, HighSpeed TCP
Bulk data moved by web servers
Apache web server, out of the box!
Prototype client: curl http library
1 Mbyte TCP buffers
2 Gbyte file
Throughput ~720 Mbit/s
Cwnd: some variation
No dup ACKs / send stalls / timeouts
BaBar Case Study: Disk Performance
BaBar disk server:
Tyan Tiger S2466N motherboard
1 64 bit 66 MHz PCI bus
Athlon MP2000+ CPU
AMD-760 MPX chipset
3Ware 7500-8 RAID5
8 * 200 GB Maxtor IDE 7200 rpm disks
Note the VM parameter readahead max
Disk to memory (read): max throughput 1.2 Gbit/s (150 MBytes/s)
Memory to disk (write): max throughput 400 Mbit/s (50 MBytes/s) [not as fast as RAID0]
BaBar Case Study: Throughput & PCI Activity
3Ware forces the PCI bus to 33 MHz
BaBar Tyan to MB-NG SuperMicro:
Network mem-mem: 619 Mbit/s
Disk-disk throughput with bbcp: 40-45 Mbytes/s (320-360 Mbit/s)
PCI bus effectively full!
[Figures: PCI activity while reading from the RAID5 disks and while writing to the RAID5 disks.]
Conclusions
The MB-NG Project has achieved:
Continuous memory to memory data transfers with an average user data rate of 940 Mbit/s for over 24 hours using the HighSpeed TCP stack.
Sustained high throughput data transfers of 2 GByte files between RAID0 disk systems using GridFTP and bbcp.
Transfers of 2 GByte files using the http protocol from the standard Apache web server and HighSpeed TCP that achieved data rates of ~725 Mbit/s.
Ongoing operation and comparison of different Transport Protocols - Optical Switched Networks
Detailed investigation of Routers, NICs & end-host performance.
Working with e-Science groups to get high performance to the user.
Sustained data flows at Gigabit rates are achievable
Use Server quality PCs not Supermarket PCs + care with interfaces
Be kind to the Wizards!
More Information: Some URLs
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC Tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
Backup Slides
EU Review Demo
Consisted of:
RAID0 disk - GridFTP - data over TCP streams - GridFTP - RAID0 disk
Dante monitoring, node monitoring, site monitoring
Throughput on the day!
Data: ~400 Mbit/s
[Figure: interface throughput on the day, showing data and TCP ACKs.]
Some Measurements of Throughput CERN - SARA
Using the GÉANT Backup Link, 1 GByte file transfers
Standard TCP (txlen 100, 25 Jan 03): average throughput 167 Mbit/s
Users see 5 - 50 Mbit/s!
HighSpeed TCP (txlen 2000, 26 Jan 03): average throughput 345 Mbit/s
Scalable TCP (txlen 2000, 27 Jan 03): average throughput 340 Mbit/s
[Figures: one plot per TCP stack of interface rate (Mbit/s, Out and In) and receive rate (Mbit/s) against time. Blue: data. Red: TCP ACKs.]
What the Users Really Find: CERN - RAL using production GÉANT
CMS Tests, 8 streams
50 Mbit/s @ 15 MB buffer
Firewall: 100 Mbit/s
NNW - SJ4 Access: 1 Gbit link
[Figure, CERN - RAL 12 Dec 02: throughput (Mbit/s), total rate and rate per stream, against time (0.5 hr).]