High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

32
Cray User Group Meeting 24-28 May 1999, Minneapolis,USA High Speed Supercomputer Communications in Broadband Networks [email protected] et al. 1 High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger Research Center Jülich GmbH [email protected] Helmut Grund, Ferdinand Hommes, Eva Pless GMD - German National Research Center for Information Technology [email protected], [email protected], [email protected]

description

High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger Research Center Jülich GmbH [email protected] Helmut Grund, Ferdinand Hommes, Eva Pless GMD - German National Research Center for Information Technology - PowerPoint PPT Presentation

Transcript of High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Page 1: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

1

High Speed Supercomputer Communicationsin Broadband Networks

Ralph NiederbergerResearch Center Jülich GmbH

[email protected]

Helmut Grund, Ferdinand Hommes, Eva PlessGMD - German National Research Center for Information Technology

[email protected], [email protected], [email protected]

Page 2: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

2

Introduction

• Introduction

• GTB West – Goals, Projects, Timeframes and Configuration– Super Computer Impediments and Solutions

• Status of Cray Super Computer Communications• Future Tests• Summary

Page 3: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

3

• New kinds of Microprocessors and expansion of internal storage lead to new kinds of supercomputing systems solving best different kinds of problems.

• Two mostly known types of supercomputers are massively parallel systems and vector systems.

• A new kind of supercomputer is the Metacomputer.

• A Metacomputer distributes an application onto 2 or more equal or distinct machines which are coupled dynamically via an external network.

• This distribution may be done by quality (functional distribution) or by quantity.

Introduction

Page 4: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

4

Distribution of an application onto more than one system only

recommended, if computation time can be decreased significantly.

This depends ondegree of parallalization and

time of communication between processes

Communication time depends on:communication medium and protocol

length of communication link

number of intermediate systems

performance of communicating systems (cpu, internal communication, ...)

Introduction

Page 5: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

5

GTB - WestProject sponsored by BMBF and DFN with financial participation of the project

partners

Partners:

Research Center Jülich GmbH http://www.fz-juelich.de

GMD - German National Research Center for Information Technologyhttp://www.gmd.de

Deutsches Klimarechenzentrum http://www.dkrz.de

Alfred Wegener Inst. for Polar & Marine Res. http://www.awi.de

Pallas GmbH http://www.pallas.de

o.tel.o http://www.o-tel-o.de

Runtime: Aug, 1st 1997 - Jan, 31th 2000

More Info: http://www.fz-juelich.de/gigabit

Page 6: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

6

GTB - West Projects

• Giganet - Configuration, Management and Performance Analysis of the Gigabit Testbed

• Methods and Tools, Software Support

• Solute Transport in Ground Water

• Algorithmic Analysis of Magnetoenzephalography Data

• Complex Visualization over a Gigabit WAN

• Multimedia applications in a Gigabit WAN

• Distributed calculations of climate and weather models

• Porting Parallel and Distributed Applications from CEC CISPAR Project

Page 7: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

7

GTB West - Goals • Demonstrate the usefulness of high speed wide-area communication

networks for scientific computing

• Engage in selected applications which are known to need very high communication bandwidth

• Major objective:

– coupling of architecturally different supercomputers

i.e. vector computers and massively parallel computers

to build a new kind of metacomputer

• strengthen the know how in

– high speed computer communications,

– metacomputing in LAN and WAN environments

– coupling of the super computer centers in Germany

Page 8: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

8

Status (Phase 1)

• Installation on top of o.tel.o high tension cables

• most problems at last mile

• at GMD underground workings necessary

• at Research Center Jülich installation of fiber cables together with

hot water supply

• o.tel.o offers SDH infrastructure and uses Lucent technologies

• no major problems using o.tel.o trunc lines have been found

Page 9: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

9

Trunc lines (Phase 1)

Repeater

Page 10: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

10

Status

• 622 Mbit/s link stable for one year• CRAY T3E Supercomputer connected with 155 Mbit/s ATM• FZJ GMD link upgrade (622 Mbit/s 2.4 Gbit/s): End of July 1998 • Aug. 5. 1998:

– first ATM-WAN connection with 2.4 Gbit/s user data (8 Workstations with 155 / 622 Mbit/s interfaces)

– 96.4% (TCP)– 99.97% (UDP) (high packet loss)

• Beta test FORE ASX- 4000 ended• Beta test HiPPI to ATM gateway (SUN and SGI) ongoing• Throughput and delay measurement ongoing • Monitoring and accounting of trunc line with

HP-OpenView at GMD

Page 11: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

11

IBM SP2 internals

external machine

external machine

....

....

10 Mbps

SP2-Nodes:

external machine

external machine

800 Mbps

external machine

external machine

....

622 Mbps

internal Ethernet

external Ethernet

....

155 Mbps800 Mbps

10 Mbps

.... ....

HP-Switch HP-Switch

HIPPISwitch

HP-Switch

ATMSwitch

Frame 1Frame 3 Frame 2

Page 12: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

12

Cray T3E internals

800 Mbps

external machine

external machine

....

.... .... ....

FDDIEthernet

communicationnodes:

4 proc

GigaRing

10 Mbps

....

HIPPISwitch

FDDIRing

HIPPI

T3E-processors: 4 proc 4 proc 4 proc 4 proc

external machine

external machine

800 Mbps

external machine

100 Mbps

800 Mbps

GigaRing

T3E-3D-Thorus

ATM

ATMSwitch

155 Mbps

external machine

GigaRing

external machine

external machine

....

Page 13: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

13

Cray T3E internals (2)

GigaRingGigaRing GigaRing

D-MPN MPN HPN

Support/OS PE

User PE

I/O Controller

EthernetATM FDDIATM

MPN:

Sbus-System with 200 MB

I/O Controller

I/O Controller

Device-Treiber PE

Page 14: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

14

Current problem:Communication throughput within and between supercomputers differs extremly

Example:Cray/T3E with internal communication throughput of 500 MB/s bidirectional into three dimensions (3D torus)

High speed external connections: (Fast-) Ethernet (10-100 Mb/s), FDDI (100 Mb/s) ,

HiPPI (800 Mb/s-1600 Mb/s), Super HiPPI ( 6400 Mb/s ),

ATM 155 Mb/s, 622 Mb/s - 2.4 Gb/s, Gigabit-Ethernet (1Gb/s),

Impediments

Page 15: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

15

Cray SystemsNetwork Environment

CRAY/T3E 256

155 Mb/s

ATM

Essential

HiPPI

EPS1004

CRAY/T3E 512

FDDIConcentrator

CiscoRouter

CiscoRouter

CRAY/J90 File Server

CRAY/J90Compute Server

CRAY/T90

JuNet

World Wide Internet

Connecting a Cray system with n systems 2 * n PVC entries

Page 16: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

16

PVC configurationnot recommended

sp2l18-a2

192.168.110.18

sp2l19-a2

192.168.110.19

sp2l20-a2

192.168.110.20

sp2l21-a2

192.168.110.21

sp2l22-a2

192.168.110.22

sp2l23-a2

192.168.110.23

sp2l24-a2

192.168.110.24

sp2l25-a2

192.168.110.25

fzj256

192.168.110.30

fzj512

192.168.110.31

fzjt90

192.168.110.32

fzjtst-1zam449192.168.110.33

fzjtst-2zam448192.168.110.34

fzjtst-3

192.168.110.35

fzjtst-4

192.168.110.36

fzjtst-5zam169192.168.110.37

fzjtst-6

192.168.110.38

fzjgatezam002192.168.110.39

gigaclipsrvzam304e1192.168.110.40

sp2l18-a2 118 110 134

sp2l19-a2 119 111 135

sp2l20-a2 120 112 136

sp2l21-a2 121 113 137

sp2l22-a2 122 114 138sp2l23-a2 123 115 139

sp2l24-a2 124 116 140

sp2l25-a2 125 117 141fzj256 118 119 120 121 122 123 124 125 101 102 103 104 105 106 107 108 109

fzj512 110 111 112 113 114 115 116 117 101 126 127 128 129 130 131 132 133

fzjt90 134 135 136 137 138 139 140 141 102 126 142 143 144 145 146 147 148fzjtst-1zam449 103 127 142

fzjtst-2zam448 104 128 143

fzjtst-3 105 129 144fzjtst-4 106 130 145

fzjtst-5zam169 107 131 146

fzjtst-6 108 132 147

fzjgatezam002 109 133 148

gigaclipsrvzam304e1

Page 17: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

17

High speed communication

Alternatives communicating between

CRAY/T3E and IBM/SP2 • rawHiPPI (800 Mb/s)

– HiPPI Tunneling (622 Mb/s, currently MTU 9180)

– HiPPI Sonet Extender (currently 155 Mb/s or 932 Mb/s)

• TCP/IP via HiPPI (622 Mb/s, currently MTU 9180 because of routing)

• nativeATM (155 Mb/s, 622 Mb/s) (Hardware ?, Software ?)

• TCP/IP via ATM (155 Mb/s, 622 Mb/s) (Hardware ?)

Page 18: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

18

Throughput considerations• Transmission time in fiber optics cables

tt = length of medium / (0,66 * c) with c = 300.000 km/s

additionally delays in routers, switches etc.

ttopt = 100 km / (0,66 * 300.000 km/s) = 1/2000 s = 0,5 ms

use path mtu discovery

apply socket buffers to bandwidth delay product

• BDP = (B * RTT) = 622 Mb/s * 0.5 ms 311 kb 40 kB

• use setsockopt to set:

– SO_SNDBUF und SO_RCVBUF 1 MB

– TCP_NODELAY=1 and TCP_WINSHIFT=4

Page 19: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

19

Throughput considerations (2)

155 Mbit/s 622 Mbit/s 2,4 Gbit/s %Capacity 155,520 622,080 2.488,320 100SDH overhead 5,760 23,040 92,160 3,7ATM overhead 14,128 56,513 226,053 9,1ATM user data 135,632 542,527 2.170,107 87,2ATM cells 353.208 1.412.830 5.651.321CIP TCP 9140 overhead 1,118 4,474 17,896 0,7CIP TCP 9140 user data 134,513 538,053 2.152,211 86,5CIP TCP 9140 packete/s 1840 7.358 29.434LANE TCP 1460 overhead 6,711 26,844 107,375 4,3LANE TCP 1460 user data 128,921 515,683 2.062,732 82,9LANE TCP 1460 packete/s 11.038 44.151 176.604

Values: Mbit/s

Page 20: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

20

Supercomputer - Impediments

CRAY T3E communication throughput measured

• Maximum of 115 Mb/s via TCP/IP over ATM MTU 9180 (Default MTU from standard)

• Maximum of 430 Mb/s via TCP/IP over HiPPI MTU 64 KB because of IP-Header fields

• Maximum of 530 Mb/s via raw HiPPI no real MTU limitation

Netperf between SUN Ultra/60 and SGI Origin 200 maximum of 535 Mb/s user data via 622 Mb/s ATM

Page 21: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

21

HIPPISwitch

FZJ

155 Mbit/s

Cisco LS1010

800 Mbit/s

8 x GMD

Legende:

IBM SP2

SUNatm-fore

atm-sun

SGI

Fore ASX-4000

2,4 Gbit/s

SUNatm-fore

atm-sun

SUNatm-sun

atm-fore

CRAY T3E256 Proc

Fore ASX-4000

SUN Ultra 22 Proc

SUN Ultra 60 2 Proc SGI O200

SUN Ultra 22 Proc

Cisco A 100

SUN Enterprise 4000

CRAY T3E512 Proc

SUN

HIPPISwitch

622 Mbit/sCRAY T90

16 Proc

Fore ASX-1000

Fore ASX-1000

Fore ASX-1000

Fore ASX-1000

Net topology GMD - FZJ

Page 22: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

22

Gigabit Testbed West Network Layout

FZJ GMD

SUN

HiPPI/Sbus

IBM /SP2CRAY/T3E

SGI/SUN

HiPPI/PCI

HiPPI 800 Mb/s

MTU 64 K

Gigabit Testbed West

110 km

ASX4000ASX4000

2.4 Gb/s

ATM

CiscoRouter

CiscoRouter

HiPPI 800 Mb/s

MTU 64 KATM 622 Mb/s 64K MTU

ATM 155 / 622 Mb/s 9K MTU

Page 23: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

23

Gigabit Tests July, 30th 1998

2.4 Gbps

ATM/SDH

622 Mbps

GMD

2.4 Gbit Interface

ATMSwitch

filou

SUN Enterprise 5000

baloo

SUN Ultra 60

Page 24: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

24

Gigabit Speed RecordAugust, 5th 1998

2.4 Gbps

ATM/SDH

3 * 622 Mbps

622 Mbps

ATMSwitch

622 Mbps

622 Mbps

622 Mbps

622 Mbps

ATMSwitch

155 Mbps

155 Mbps

622 Mbps

622 Mbps

FZJ GMD

Page 25: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

25

Problem: Interrupt rate of CRAY/T3E systems

Solution:

Create two logical networks upon one physical network

• network 1 with 64k MTU between gateway systems (exact MTU 65280)

as specified for CRAY systems on HiPPI networks

• network 2 with 9.180 MTU between directly connected ATM systems

Advantage:

MTU-Path-Discovery on the end systems will find maximum value to use.

Gigabit Testbed West Connecting CRAY T3E and IBM SP2 via separate network

MTU: 9180 4356 1500 9180 65280

Page 26: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

26

StatusCRAY HiPPI Testbed configuration

CRAY/T3E 512CRAY/T3E 256CRAY/J90CRAY/T90 CRAY/J90

Parallel

HiPPI card

Serial HiPPI card

2 4 6 8 90 1 3 5 7 10 11 12 13 14 15E

ther

net m

odul

e

134.94.72.1134.94.72.4

134.94.72.5 134.94.72.2 134.94.72.3 192.168.115.10192.168.115.6

192.168.115.26(gmdsp2)

HiPPI-Switch

192.168.115.25

Fore ASX4000

192.168.116.36192.168.110.49192.168.110.36

192.168.115.9SGI O200

192.168.115.5SUN Ultra 60

192.168.116.49

192.168.110.3192.168.116.3

(gmdsun)

Fore ASX4000

HPN1HPN1 HPN1 HPN1 HPN1

Page 27: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

27

Communication nominal and real throughput

FZJ GMD

CRAY T90

IBM SP2

ATM/SDH

ATMSwitch

ATMSwitch

CRAY T3E/256

H/A-router

HIPPI

Switch

H/A-router

HIPPI

Switch

Real: 430 Mbps 430 Mbps 530 Mbps 530 Mbps 530 Mbps 370 Mbps 370 Mbps

Nominal: 800 Mbps 800 Mbps 622 Mbps 2.4 Gbps 622 Mbps 800 Mbps 800 Mbps

CRAY T3E/512

Page 28: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

28

Gigabit Testbed West TCP-Gateway-Layout (Beta-Tests in Jülich)

250

CRAY/T3E (256)

SUNHiPPI/PCI

ATM 622 Mb/s MTU 9180 or 64 K

Serial HiPPI

800 Mb/s MTU 64 K

Parallel HiPPI800 Mb/s MTU 64 K

CRAY/T3E (512)

2 4 6 8 90 1 3 5 7 10 11 12 13 14 15

Eth

erne

t mod

ule

430 370 350 315

320 380 440

430 (direct)

350 (direct) 270 (gate)

340 (gate)

415

535

Serial HiPPI

800 Mb/s MTU 64 K

SGIHiPPI/PCI

Page 29: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

29

• Solve HiPPI problem. Using large MTU sizes (65280 kB) does not work correctly

• Testing the other Cray Systems with HiPPI to ATM gateway (T90, J90)

• Testing different configurations if testbed is available

– using 2 HPN1

– using 2 Communication nodes within CRAY/T3E

– using one Gateway for more than one machine

– using same HiPPI device for local and remote communication

– using multiple HiPPI devices for advanced throughput

Future TestsCRAY HiPPI Testbed configuration

Page 30: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

30

Possible future test scenario

Internal communication:

M1 Mm, N1 Nn

External communication:

Mm-k+1,Mm-k+2,..Mm (Multiplex of k HiPPI interfaces)

IP over HiPPI IP over ATM IP over HiPPI

Nn-j,Nn-j+1,...Nn (Multiplex of j HiPPI interfaces)

GatewayGateway

: :

multiple

HiPPI

ATM 2.4 Gb/s

multiple

HiPPIASX4000ASX4000

ATM 622 Mb/s

4*ATM 622 Mb/s

ATM 622 Mb/s

4*ATM 622 Mb/s

Page 31: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

31

Summary

• No problems left with 2,4 Gbit/s ATM/SDH trunc line

• Workstation systems can generate and transfer datastreams saturating a 622 Mbit/s ATM link

• Coupling of supercomputer systems over WANs with high bandwidth currently only possible with an HIPPI to ATM gateway solution and special configuration

But time is ready for gigabit transmissions.

Page 32: High Speed Supercomputer Communications in Broadband Networks Ralph Niederberger

Cray User Group Meeting 24-28 May 1999, Minneapolis,USA

High Speed Supercomputer Communications in Broadband Networks

[email protected] et al.

32

Summary

• Applications are capable using gigabit networks.

• Metacomputing may become reality in LAN as well as in WAN environments

• Therefore supercomputer system designers have to prepare their systems with gigabit communication interfaces

„The net is the computer and the computer is the net“

((SuperComputer) Communications)!= (Super (ComputerCommunications))