U.C. Berkeley and LBNL

1
U.C. Berkeley and LBNL U.C. Berkeley and LBNL • Global address space languages support • Global pointers and distributed arrays • User controls layout of data across nodes • Direct read and write to remote memory • Single Program Multiple Data (SPMD) control • Similar to using threads, but with remote accesses • Global synchronization, barriers • Languages: UPC, Co-Array Fortran, Titanium • GASNet - A common communication system tailored for global address space languages Distributed Data Structures Latency Performance GASNet Goals • Language-independence: Compatibility with several global-address space languages and compilers • UPC, Titanium, Co-array Fortran, possibly others.. • Hide language- or compiler-specific details, such as shared-pointer representation • Hardware-independence: variety of parallel architectures & OS's • SMP: Linux/UNIX SMP's, Origin 2000, etc. • Clusters of uniprocessors or SMP's: IBM SP, Compaq AlphaServer, Linux/UNIX clusters, etc. • Support many high-performance networks: MPI, Myrinet/GM, Quadrics/elan, IBM/LAPI, Infiniband • Ease of implementation on new hardware • Allow quick prototype implementations • Implementations can leverage performance features of hardware • Provide both portability & performance GASNet Core API Global Address Space Languages GASNet Extended API Bandwidth Performance Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick Parry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick Compiler-generated code Compiler-specific runtime system GASNet Extended API GASNet Core API Network Hardware • Wider interface that includes more complicated operations • We provide a reference implementation of the extended API in terms of the core API • Implementors can choose to directly implement any subset for performance - leverage hardware support for higher-level operations • Most basic required network primitives • Implemented directly on each platform • Minimal set of network functions needed to support a working implementation • General enough to implement everything else • Based heavily on active messages paradigm • Provides powerful extensibility mechanism G ASN etPut/G etLatency (m in overm sg sz) 0 5 10 15 20 25 30 35 40 mpi- refext elan- refext elan- elan mpi- refext elan- refext elan- elan mpi- refext gm -gm mpi- refext gm -gm m icroseconds put_nb get_nb quadrics -falcon quadrics -lem ieux myrinet-millennium myrinet-alvarez G ASN etP ut/G etB ulk B andwidth (m ax overm sg sz) 0 50 100 150 200 250 300 mpi- refext elan- refext elan- elan mpi- refext elan- refext elan- elan mpi- refext gm -gm mpi- refext gm -gm M B/sec put_nb_bulk get_nb_bulk quadrics -falcon quadrics -lem ieux myrinet-millennium myrinet-alvarez G ASNetM yrinet/G M Bandw idth 0 20 40 60 80 100 120 140 160 180 200 0 10000 20000 30000 40000 50000 60000 70000 size (bytes) M B/sec get_bulk (blocking) get_bulk (non-blocking) put_bulk (blocking) put_bulk (non-blocking) G ASNetM yrinet/G M Latency 0 10 20 30 40 50 60 1 10 100 1000 10000 size (bytes) m icroseconds get(blocking) get(non-blocking) put(blocking) put(non-blocking) G ASNetQ uadrics/elan B andw idth 0 50 100 150 200 250 300 0 10000 20000 30000 40000 50000 60000 70000 size (bytes) M B/sec get_bulk (blocking) get_bulk (non-blocking) put_bulk (blocking) put_bulk (non-blocking) G ASNetQ uadrics/elan Latency 0 2 4 6 8 10 12 14 16 18 20 1 10 100 1000 10000 size (bytes) m icroseconds get(blocking) get(non-blocking) put(blocking) put(non-blocking) http:// http:// upc.nersc.gov upc.nersc.gov [email protected] [email protected]

description

Compiler-generated code. Compiler-specific runtime system. GASNet Extended API. GASNet Core API. Network Hardware. Distributed Data Structures. U.C. Berkeley and LBNL. http://upc.nersc.gov. [email protected]. Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, - PowerPoint PPT Presentation

Transcript of U.C. Berkeley and LBNL

Page 1: U.C. Berkeley and LBNL

U.C. Berkeley and LBNLU.C. Berkeley and LBNL

• Global address space languages support

• Global pointers and distributed arrays

• User controls layout of data across nodes

• Direct read and write to remote memory

• Single Program Multiple Data (SPMD) control

• Similar to using threads, but with remote accesses

• Global synchronization, barriers

• Languages: UPC, Co-Array Fortran, Titanium

• GASNet - A common communication system tailored for global address space languages

Distributed Data Structures

Latency Performance

GASNet Goals• Language-independence: Compatibility with several global-address

space languages and compilers

• UPC, Titanium, Co-array Fortran, possibly others..

• Hide language- or compiler-specific details, such as shared-pointer representation

• Hardware-independence: variety of parallel architectures & OS's

• SMP: Linux/UNIX SMP's, Origin 2000, etc.

• Clusters of uniprocessors or SMP's: IBM SP, Compaq AlphaServer, Linux/UNIX clusters, etc.

• Support many high-performance networks:MPI, Myrinet/GM, Quadrics/elan, IBM/LAPI, Infiniband

• Ease of implementation on new hardware

• Allow quick prototype implementations

• Implementations can leverage performance features of hardware

• Provide both portability & performance

GASNet Core API

Global Address Space Languages

GASNet Extended API

Bandwidth Performance

Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Kathy YelickParry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick

Compiler-generated code

Compiler-specific runtime system

GASNet Extended API

GASNet Core API

Network Hardware

• Wider interface that includes more complicated operations

• We provide a reference implementation of the extended API in terms of the core API

• Implementors can choose to directly implement any subset for performance - leverage hardware support for higher-level operations

• Most basic required network primitives• Implemented directly on each platform

• Minimal set of network functions needed to support a working implementation

• General enough to implement everything else• Based heavily on active messages paradigm

• Provides powerful extensibility mechanism

GASNet Put/Get Latency(min over msg sz)

0

5

10

15

20

25

30

35

40

mpi-refext

elan-refext

elan-elan

mpi-refext

elan-refext

elan-elan

mpi-refext

gm-gm mpi-refext

gm-gm

mic

rose

cond

s

put_nbget_nb

quadrics -falcon quadrics - lemieux myrinet - millennium myrinet - alvarez

GASNet Put/Get Bulk Bandwidth (max over msg sz)

0

50

100

150

200

250

300

mpi-refext

elan-refext

elan-elan

mpi-refext

elan-refext

elan-elan

mpi-refext

gm-gm mpi-refext

gm-gm

MB

/sec

put_nb_bulkget_nb_bulk

quadrics - falcon quadrics - lemieux myrinet - millennium myrinet - alvarez

GASNet Myrinet/GM Bandwidth

0

20

40

60

80

100

120

140

160

180

200

0 10000 20000 30000 40000 50000 60000 70000size (bytes)

MB

/sec get_bulk (blocking)

get_bulk (non-blocking)put_bulk (blocking)put_bulk (non-blocking)

GASNet Myrinet/GM Latency

0

10

20

30

40

50

60

1 10 100 1000 10000size (bytes)

mic

rose

cond

s

get (blocking)get (non-blocking)put (blocking)put (non-blocking)

GASNet Quadrics/elan Bandwidth

0

50

100

150

200

250

300

0 10000 20000 30000 40000 50000 60000 70000size (bytes)

MB

/sec

get_bulk (blocking)get_bulk (non-blocking)put_bulk (blocking)put_bulk (non-blocking)

GASNet Quadrics/elan Latency

0

2

4

6

8

10

12

14

16

18

20

1 10 100 1000 10000size (bytes)

mic

rose

cond

s

get (blocking)get (non-blocking)put (blocking)put (non-blocking)

http://upc.nersc.govhttp://upc.nersc.gov [email protected]@lbl.gov