100G networking within JASMIN
Campus network engineering for data-intensive science workshop
October 19th 2016
Jonathan Churchill, JASMIN Infrastructure Manager
(STFC Scientific Computing Dept.)
JASMIN is a world-leading, unique hybrid of:
• 16PB high-performance storage (~250GByte/s)
• High-performance computing (~4,000 cores)
• 35PB archive and Elastic Tape
• Non-blocking networking (> 3Tbit/sec), and Optical Private Network WANs
• Coupled with cloud hosting capabilities
Cloud is here!
[Figure: JASMIN machine room layout. Labels: JASMIN 1; JASMIN 2; JASMIN 3; JASMIN 3.5; JASMIN 4,5 (2016–20 …); LOTUS; DMZ; 17PB; 40PB; 5,000; “Cloud lives here”; “100G lives here”.]
Storage and servers are distributed over the fabric network.
JASMIN “Fabric” Networking
The need for speed
347Gbps = 34,700 broadband connections
[Diagram: a row of leaf switches labelled JC2-LSW1, cabled to six spine switches labelled JC2-SP1.]
Leaf: 16 x MSX1024B-1BFS, each 48 x 10GbE + 12 x 40GbE
48 x 16 = 768 10GbE ports, non-blocking
16 x 12 x 40GbE = 192 40GbE ports
Spine: S1036 = 32 x 40GbE
192 ports / 32 = 6
Total: 192 40GbE cables
1,900 @ 10GbE ports
• Non-blocking. Zero contention (48 x 10Gb = 12 x 40Gb uplinks)
• Low latency (250ns L3 per switch/router)
• Cheap(er)
• But it's all layer 3 routed (ECMP OSPF)
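The port arithmetic behind the "non-blocking" claim can be checked in a few lines. This is only a sketch using the figures quoted on the slide; the variable names are illustrative and not taken from any JASMIN tooling.

```python
# Leaf/spine sizing check, using the figures quoted on the slide.
LEAF_SWITCHES = 16        # 16 x MSX1024B-1BFS
DOWNLINKS_PER_LEAF = 48   # 10GbE ports per leaf
UPLINKS_PER_LEAF = 12     # 40GbE uplinks per leaf
SPINE_PORTS = 32          # 40GbE ports per spine, as quoted on the slide

down_gbps = DOWNLINKS_PER_LEAF * 10   # capacity facing the servers
up_gbps = UPLINKS_PER_LEAF * 40       # capacity facing the spines
oversubscription = down_gbps / up_gbps  # 1.0 means no contention

edge_ports = LEAF_SWITCHES * DOWNLINKS_PER_LEAF    # 10GbE ports in total
uplink_cables = LEAF_SWITCHES * UPLINKS_PER_LEAF   # 40GbE cables in total
spines_needed = uplink_cables // SPINE_PORTS

print(f"per leaf: {down_gbps}G down, {up_gbps}G up, ratio {oversubscription}")
print(f"{edge_ports} x 10GbE edge ports, {uplink_cables} x 40GbE cables, "
      f"{spines_needed} spines")
```

A ratio of 1.0 is what makes each leaf non-blocking: every 10GbE port can run at line rate towards the spines at the same time.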
954 Routes
Bandwidth?? Data via the DTZ, through the IaaS firewall
DTZ bandwidth: 1:1 match to IaaS hypervisors.
Data rates inside PaaS = IaaS?
• How can we give IaaS cloud tenants data access at rates similar to those “inside” JASMIN (PaaS)? – aka 100Gbit/s
Non-blocking data access inside JASMIN
[Diagram: spine switches SP1–SP4 above leaf switches LSW1, LSW2, LSW3, LSW21 … LSWn. Labels: host001, host002 … host024; host025, host026, host027; iSCSI; underlay networks; 172.26.66.0/26; 172.26.66.64/26; 172.16.136.0/24; 172.16.137.0/24; 24x 10Gbps; ~10x 10-12Gbps per “Bladeset”; 12x 40Gb ECMP uplinks per switch/router.]
A non-blocking path to the IaaS cloud needs to duplicate or fit into this fabric, and still be 1:1 using 10Gb servers.
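How flows spread over the 12 ECMP uplinks can be illustrated with a toy per-flow hash. This is a sketch under stated assumptions: real switches use their own hardware hash functions, `pick_uplink` is a hypothetical helper, and the host addresses are made up inside the subnets shown on the slide.

```python
import hashlib

UPLINKS = 12  # 40Gb ECMP uplinks per switch/router, from the slide


def pick_uplink(src_ip, dst_ip, proto, src_port, dst_port, n_paths=UPLINKS):
    """Map a flow's 5-tuple to one of n_paths equal-cost uplinks.

    Hypothetical illustration: SHA-256 stands in for the switch's hash.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# 1000 made-up flows from one host to one iSCSI target (TCP port 3260),
# differing only in source port.
flows = [("172.26.66.1", "172.16.136.10", "tcp", 40000 + i, 3260)
         for i in range(1000)]

counts = [0] * UPLINKS
for flow in flows:
    counts[pick_uplink(*flow)] += 1

print("flows per uplink:", counts)
```

Because the hash is per flow, every packet of a given flow takes the same uplink. Many parallel flows spread across all 12 links, but a single flow can never exceed one 40Gb link.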
Non-blocking data access to JASMIN IaaS via 100G?
[Diagram: the same fabric (spine switches SP1–SP4; leaf switches LSW1, LSW2, LSW3, LSW20 … LSWn; hosts host001–host024 and host025–host027; iSCSI; underlay networks; 172.26.66.0/26; 172.26.66.64/26; 172.16.136.0/24; 172.16.137.0/24; ~10x 10-12Gbps per “Bladeset”; 24x 10Gbps; 12x 40Gb ECMP uplinks per switch/router), with additions: host-100G-1 and host-100G-2; vmhost1, vmhost2 … vmhost24 at 24x 10Gbps; 1:10 server to client; “Blessed” private subnet.]
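The "1:10 server to client" label implies a large drop in data-server count compared with the 1:1 design. A minimal sketch of that arithmetic, using only the numbers on the slide (the variable names are illustrative):

```python
import math

VM_HOSTS = 24       # vmhost1 .. vmhost24 on the slide
CLIENT_GBPS = 10    # each hypervisor has a 10Gbps link
SERVER_RATIO = 10   # "1:10 server to client": one 100G port per ten 10G clients

aggregate_gbps = VM_HOSTS * CLIENT_GBPS          # total client-side bandwidth
servers_1to1 = VM_HOSTS                          # one 10Gb server per hypervisor
ports_100g = math.ceil(VM_HOSTS / SERVER_RATIO)  # 100G ports for the same rate

print(f"aggregate client bandwidth: {aggregate_gbps} Gbit/s")
print(f"10Gb servers at 1:1: {servers_1to1}; 100G ports at 1:10: {ports_100g}")
```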
Hardware
• Mellanox ConnectX-4 dual-port 100Gb QSFP+ DA
– Dell R730XD servers
– VXLAN/NVGRE and erasure coding offload in h/w
• Dual Mellanox MSN2100 16-port x 100G switch/routers
Potential issues
• Blocking “backdoor” access across the fabric
– Port ingress/egress ACLs?
• …but trunked VLANs or VXLANs at hypervisor port(s)
• Performance impact of VXLAN terminations
• 100G on the server at all
– cf. the 1->10Gb kernel tuning transition
– 2x 100Gb ports exceed PCIe 3 bandwidth, which limits the host to 120Gb
• Can the software keep up?
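The PCIe limit can be estimated from first principles. The sketch below assumes the adapter sits in a PCIe 3.0 x16 slot (the slot width is not stated on the slide); PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding. The 120Gb figure is the one quoted on the slide.

```python
# PCIe 3.0 link bandwidth versus a dual-port 100Gb adapter.
GT_PER_LANE = 8.0      # PCIe 3.0 transfer rate per lane, GT/s
LANES = 16             # assumption: x16 slot
ENCODING = 128 / 130   # 128b/130b line encoding

pcie_gbps = GT_PER_LANE * LANES * ENCODING  # before protocol overhead
nic_gbps = 2 * 100                          # two 100Gb ports
quoted_gbps = 120                           # usable figure from the slide

usable_gbps = min(nic_gbps, quoted_gbps)
fraction = usable_gbps / nic_gbps

print(f"PCIe 3.0 x16 after encoding: {pcie_gbps:.1f} Gbit/s")
print(f"usable: {usable_gbps} of {nic_gbps} Gbit/s ({fraction:.0%})")
```

The encoded link rate comes out a little above the quoted 120Gb; packet and protocol overhead on the bus plausibly accounts for the gap. Either way, the second 100Gb port cannot be driven at line rate at the same time as the first.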
100G server software?
[Diagram: host-100G-1 running tomcat-1, tomcat-2, tomcat-3 and tomcat-4 behind apache/nginx load balancing.]
• OPeNDAP
– Parallel servers and threads?
– CPU and RAM implications
– JVM memory issues?
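To get a feel for what "can the software keep up" means, the sketch below divides a 100Gb link across the four Tomcat instances in the diagram and hands requests out round-robin. Round-robin is only one common load-balancing policy, used here as an assumption; the slide does not say which policy would be configured.

```python
from itertools import cycle

BACKENDS = ["tomcat-1", "tomcat-2", "tomcat-3", "tomcat-4"]  # from the diagram
LINK_GBPS = 100

# Rate each instance must sustain to fill the link, if load is spread evenly.
per_backend_gbps = LINK_GBPS / len(BACKENDS)

# Round-robin dispatch of eight illustrative requests.
next_backend = cycle(BACKENDS)
assignments = [next(next_backend) for _ in range(8)]

print(f"each backend must sustain {per_backend_gbps} Gbit/s")
print(assignments)
```

Even with four parallel servers, each instance has to serve well above the rate of a single 10Gb link, which is why the thread, CPU, RAM and JVM questions matter.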
Summary
• Target: provide “non-blocking” data access to JASMIN IaaS.
• Use of 100Gb networking:
– Reduces server count
– Scalable for growing infrastructure
• Experimental. Many potential issues to resolve:
– Fabric routing egress/ingress ACLs
– 100G kernel tuning?
– Can the software keep up?
Questions