BigBen @ PSC

Transcript
Pages 1–3: BigBen @ PSC (title slides)

Page 4: BigBen @ PSC

BigBen Features

Compute Nodes

• 2068 nodes running Catamount (QK) microkernel

• Seastar interconnect in a 3-D torus configuration

• No external connectivity (no TCP)

• All inter-node communication is over Portals

• Applications use MPI, which is implemented on top of Portals (a minimal example follows this list)
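As a concrete illustration of that stack, here is a minimal MPI ring-pass program of the kind that runs on the Catamount compute nodes. The program itself is hypothetical; the point is that the application sees only MPI, while the library carries every send and receive over Portals on the Seastar interconnect.

/* Minimal MPI sketch (hypothetical program; assumes >= 2 ranks). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pass a token around the ring; each hop is a Portals transfer. */
    if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("token returned to rank 0: %d\n", token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}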

Service & I/O (SIO) Nodes

• 22 nodes running SuSE Linux

• Also on the Seastar interconnect

• SIO nodes can have PCI-X hardware installed, defining unique roles for each

• 2 SIO nodes are currently connected externally to the ETF network with 10GigE cards

Page 5: BigBen @ PSC

Portals Direct I/O (PDIO) Details

• Portals-to-TCP routing
  – PDIO daemons aggregate hundreds of Portals data streams into a configurable number of outgoing TCP streams (sketched after this list)
  – Heterogeneous Portals (both QK + Linux nodes)

• Explicit parallelism
  – Configurable # of Portals receivers (on SIO nodes)
    • Distributed across multiple 10GigE-connected Service & I/O (SIO) nodes
  – Corresponding # of TCP streams (to the WAN)
    • One per PDIO daemon
  – A parallel TCP receiver in the Goodhue booth
    • Supports a variable/dynamic number of connections
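A minimal sketch of the many-to-few aggregation idea, not the actual PDIO implementation: each message is framed with a (source, length) header and multiplexed round-robin onto a smaller set of TCP sockets, so each outgoing stream carries a self-describing interleave that the parallel receiver can demultiplex. The frame layout and the forward_message helper are hypothetical.

#include <stdint.h>
#include <unistd.h>

/* Hypothetical frame header; not the PDIO wire format. */
typedef struct {
    uint32_t source;   /* originating Portals stream (e.g. sender rank) */
    uint32_t length;   /* payload bytes that follow */
} frame_header_t;

/* Forward one message from stream `source` over one of `num_tcp`
 * already-connected TCP sockets, chosen round-robin by source id. */
static int forward_message(const int *tcp_socks, int num_tcp,
                           uint32_t source, const void *buf, uint32_t len)
{
    int fd = tcp_socks[source % num_tcp];
    frame_header_t hdr = { source, len };
    if (write(fd, &hdr, sizeof hdr) != (ssize_t)sizeof hdr)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;   /* short writes treated as errors in this sketch */
    return 0;
}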

Page 6: BigBen @ PSC

Portals Direct I/O (PDIO) Details

• Utilizing the ETF network
  – 10GigE end-to-end
  – Benchmarked >1 Gbps in testing

• Inherent flow-control feedback to the application
  – The aggregation protocol allows TCP transmission, or even remote file system performance, to throttle the data streams coming out of the application (!)

• Variable message sizes and file metadata supported

• Multi-threaded ring buffer in the PDIO daemon (sketched after this list)
  – Allows the Portals receiver, TCP sender, and computation to proceed asynchronously
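A minimal sketch of such a ring buffer, assuming pthreads and 64 slots of 1 MiB each; PDIO's actual slot layout and sizes are not given in the slides. The Portals-receiver thread pushes while the TCP-sender thread pops, so receive, send, and computation overlap.

#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 64          /* assumed ring length */
#define SLOT_BYTES (1 << 20)   /* assumed 1 MiB slot size */

typedef struct {
    char   data[RING_SLOTS][SLOT_BYTES];
    size_t len[RING_SLOTS];
    int    head, tail, count;
    pthread_mutex_t lock;      /* init with PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t  not_full, not_empty;
} ring_t;

/* Producer (Portals receiver): blocks when the ring is full. */
void ring_push(ring_t *r, const char *buf, size_t n)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == RING_SLOTS)
        pthread_cond_wait(&r->not_full, &r->lock);
    memcpy(r->data[r->head], buf, n);
    r->len[r->head] = n;
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    pthread_cond_signal(&r->not_empty);
    pthread_mutex_unlock(&r->lock);
}

/* Consumer (TCP sender): drains slots as fast as the WAN permits. */
size_t ring_pop(ring_t *r, char *out)
{
    pthread_mutex_lock(&r->lock);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->lock);
    size_t n = r->len[r->tail];
    memcpy(out, r->data[r->tail], n);
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    pthread_cond_signal(&r->not_full);
    pthread_mutex_unlock(&r->lock);
    return n;
}

The blocking push is exactly the feedback path the slides describe: when TCP transmission or the remote file system slows down, the ring fills and the application's writes throttle.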

Page 7: BigBen @ PSC

Portals Direct I/O (PDIO) Config

• User-configurable/tunable parameters (see the sketch after this list):
  – Network targets
    • Can be different for each job
  – Number of streams
    • Can be tuned for optimal host/network utilization
  – TCP network buffer size
    • Can be tuned for maximum throughput over the WAN
  – Ring buffer size/length
    • Controls total memory utilization of PDIO daemons
  – Number of Portals writers
    • Can be any subset of the running application’s processes
  – Remote filename(s)
    • File metadata are propagated through the full chain, per write
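One way to picture these tunables together is a single parameter block; all field names and example values below are illustrative assumptions, not PDIO's actual interface.

#include <stddef.h>

/* Hypothetical parameter block gathering the tunables listed above. */
typedef struct {
    const char *targets[8];    /* network targets (host:port), per job    */
    int  num_streams;          /* outgoing TCP streams (one per daemon)   */
    int  tcp_buf_bytes;        /* TCP socket buffer size, tuned for WAN   */
    int  ring_slots;           /* ring buffer length per daemon           */
    int  ring_slot_bytes;      /* ring buffer slot size                   */
    int  num_writers;          /* subset of app ranks doing Portals writes */
    const char *remote_file;   /* remote filename, carried per write      */
} pdio_params_t;

/* Example: 4 streams, 4 MiB socket buffers, 64 x 1 MiB ring slots. */
static const pdio_params_t example = {
    .targets         = { "recv0.example.org:9000", "recv1.example.org:9000" },
    .num_streams     = 4,
    .tcp_buf_bytes   = 4 << 20,
    .ring_slots      = 64,
    .ring_slot_bytes = 1 << 20,
    .num_writers     = 128,
    .remote_file     = "ppm_output.dat",
};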

Page 8: BigBen @ PSC

[Diagram: Compute Nodes and I/O Nodes at PSC, the ETF network, and a Steering link to iGRID]

HPC resource and renderer waiting…

Page 9: BigBen @ PSC

[Diagram: pdiod daemons on the I/O Nodes at PSC, recv processes at iGRID, connected over the ETF network]

Launch PPM job, PDIO daemons, and iGRID recv’ers

Page 10: BigBen @ PSC

[Diagram as on Page 9]

Aggregate data via Portals

Page 11: BigBen @ PSC

[Diagram as on Page 9]

Route traffic to ETF net

Page 12: BigBen @ PSC

[Diagram as on Page 9]

Recv data @ iGRID

Page 13: BigBen @ PSC

[Diagram as on Page 9, with a renderer added at iGRID]

Render real-time data

Page 14: BigBen @ PSC

[Diagram as on Page 13, with a steering input path leading back from iGRID to the running job]

Send steering data back to active job

Page 15: BigBen @ PSC

[Diagram as on Page 14, steering input still flowing]

Dynamically update rendering