BigBen @ PSC
BigBen Features
Compute Nodes
• 2068 nodes running Catamount (QK) microkernel
• SeaStar interconnect in a 3-D torus configuration
• No external connectivity (no TCP)
• All inter-node communication is over Portals
• Applications use MPI, which is implemented on top of Portals
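Since compute nodes have no TCP and all traffic rides on Portals, application code is just ordinary MPI; the Cray software stack maps it onto Portals underneath. A minimal sketch of what such a compute-node program looks like (the buffer size, tag, and gather-to-rank-0 pattern are illustrative, not taken from the slides):

```c
/* Minimal MPI sketch: compute nodes communicate only via MPI, which the
 * Cray XT3 software stack implements on top of Portals over the SeaStar
 * torus.  Buffer size and tag are illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double buf[1024] = {0};                 /* per-rank simulation output */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* each worker ships its chunk to rank 0; the MPI layer turns
         * this into Portals messages on the interconnect */
        MPI_Send(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < size; src++)
            MPI_Recv(buf, 1024, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        printf("rank 0 gathered %d chunks\n", size - 1);
    }

    MPI_Finalize();
    return 0;
}
```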
Service & I/O (SIO) Nodes
• 22 nodes running SuSE Linux
• Also on the Seastar interconnect
• SIO nodes can have PCI-X hardware installed, defining unique roles for each
• 2 SIO nodes are externally connected to ETF with 10GigE cards (currently)
Portals Direct I/O (PDIO) Details
• Portals-to-TCP routing (sketch below)
  – PDIO daemons aggregate hundreds of Portals data streams into a configurable number of outgoing TCP streams
  – Heterogeneous Portals (both QK + Linux nodes)
• Explicit parallelism
  – Configurable # of Portals receivers (on SIO nodes)
    • Distributed across multiple 10GigE-connected Service & I/O (SIO) nodes
  – Corresponding # of TCP streams (to the WAN)
    • One per PDIO daemon
  – A parallel TCP receiver in the Goodhue booth
    • Supports a variable/dynamic number of connections
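The routing bullets above amount to a fan-in: each daemon drains many incoming Portals streams and multiplexes them onto one outgoing TCP connection. A minimal sketch of that forwarding loop, assuming a hypothetical pdio_poll_portals() helper in place of the real Portals receive path (the length-prefix framing is also an assumption):

```c
/* Sketch of one PDIO daemon's forwarding loop.  pdio_poll_portals() is a
 * hypothetical stand-in for the daemon's Portals receive path; the TCP
 * side uses ordinary POSIX sockets. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

struct pdio_msg { uint32_t len; char data[64 * 1024]; };

/* hypothetical: blocks until a message arrives from any compute node */
int pdio_poll_portals(struct pdio_msg *m);

int pdio_forward(const char *wan_ip, uint16_t port)
{
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(port) };
    inet_pton(AF_INET, wan_ip, &dst.sin_addr);

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(sock, (struct sockaddr *)&dst, sizeof dst) < 0)
        return -1;

    struct pdio_msg m;
    /* many Portals streams in, one TCP stream out */
    while (pdio_poll_portals(&m) == 0) {
        uint32_t n = htonl(m.len);           /* simple length-prefix framing */
        write(sock, &n, sizeof n);
        write(sock, m.data, m.len);
    }
    close(sock);
    return 0;
}
```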
Portals Direct I/O (PDIO) Details
• Utilizing the ETF network
  – 10GigE end-to-end
  – Benchmarked >1 Gbps in testing
• Inherent flow-control feedback to application
  – Aggregation protocol allows TCP transmission or even remote file system performance to throttle the data streams coming out of the application (!)
• Variable message sizes and file metadata supported
• Multi-threaded ring buffer in the PDIO daemon
  – Allows the Portals receiver, TCP sender, and computation to proceed asynchronously
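One way to realize the multi-threaded ring buffer described above is a classic bounded buffer: the Portals-receiver thread blocks when the ring is full, which is exactly how a slow WAN link or remote file system ends up throttling the application. The names and sizes below are illustrative, not the actual PDIO implementation:

```c
/* Decoupling ring buffer sketch: the Portals-receiver thread pushes
 * slots, the TCP-sender thread pops them, and a full buffer blocks the
 * producer, which is how back-pressure reaches the application. */
#include <pthread.h>
#include <string.h>

#define SLOTS 64
#define SLOT_BYTES (64 * 1024)

static char ring[SLOTS][SLOT_BYTES];
static size_t lens[SLOTS];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void ring_push(const char *data, size_t len)   /* Portals receiver side */
{
    pthread_mutex_lock(&lock);
    while (count == SLOTS)                     /* slow WAN => producer waits */
        pthread_cond_wait(&not_full, &lock);
    memcpy(ring[head], data, len);
    lens[head] = len;
    head = (head + 1) % SLOTS;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

size_t ring_pop(char *out)                     /* TCP sender side */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    size_t len = lens[tail];
    memcpy(out, ring[tail], len);
    tail = (tail + 1) % SLOTS;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return len;
}
```

With the receiver and sender on separate threads, Portals traffic, TCP transmission, and the application's computation can all make progress independently, as the slide states.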
Portals Direct I/O (PDIO) Config
• User-configurable/tunable parameters:
  – Network targets
    • Can be different for each job
  – Number of streams
    • Can be tuned for optimal host/network utilization
  – TCP network buffer size
    • Can be tuned for maximum throughput over the WAN
  – Ring buffer size/length
    • Controls total memory utilization of PDIO daemons
  – Number of Portals writers
    • Can be any subset of the running application’s processes
  – Remote filename(s)
    • File metadata are propagated through the full chain, per write
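As a rough illustration only, those tunables could be gathered into a per-job configuration like the struct below; the field names, types, and values are assumptions, since the slides do not show the actual PDIO configuration format:

```c
/* Illustrative only: the slides list the tunables but not their concrete
 * names or defaults, so this struct and its values are assumptions. */
struct pdio_config {
    const char *targets[4];   /* network targets; can differ per job        */
    int   n_streams;          /* # of TCP streams / PDIO daemons            */
    int   tcp_buf_bytes;      /* TCP buffer size, tuned for WAN throughput  */
    int   ring_slots;         /* ring buffer length => daemon memory use    */
    int   n_writers;          /* # of Portals writers (subset of app ranks) */
    const char *remote_file;  /* remote filename, carried per write         */
};

static const struct pdio_config example = {
    .targets       = { "igrid-recv0.example.org", "igrid-recv1.example.org" },
    .n_streams     = 4,
    .tcp_buf_bytes = 4 * 1024 * 1024,
    .ring_slots    = 64,
    .n_writers     = 256,
    .remote_file   = "ppm_frame_%05d.raw",
};
```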
[Diagram sequence: PPM job on PSC Compute Nodes → PDIO daemons (pdiod) on I/O Nodes → ETF network → parallel receivers (recv) at iGRID → renderer, with steering input flowing back to the running job]
• HPC resource and renderer waiting…
• Launch PPM job, PDIO daemons, and iGRID recv’ers
• Aggregate data via Portals
• Route traffic to ETF net
• Recv data @ iGRID
• Render real-time data
• Send steering data back to active job
• Dynamically update rendering
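On the receiving end, “supports a variable/dynamic number of connections” suggests a listener that accepts however many pdiod streams show up and services each on its own thread before handing data to the renderer. A sketch under that assumption (the port number and the length-prefix framing are hypothetical, matching the sender sketch above):

```c
/* Sketch of an iGRID-side parallel TCP receiver: one listening socket,
 * one thread per incoming pdiod connection.  Port and framing are
 * assumptions, not taken from the slides. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static void *recv_stream(void *arg)
{
    int conn = (int)(intptr_t)arg;
    uint32_t n;
    char buf[64 * 1024];

    /* read length-prefixed messages until the pdiod closes the stream */
    while (read(conn, &n, sizeof n) == sizeof n) {
        size_t len = ntohl(n), got = 0;
        while (got < len) {
            ssize_t r = read(conn, buf + got, len - got);
            if (r <= 0) goto done;
            got += r;
        }
        /* hand buf/len to the renderer here */
    }
done:
    close(conn);
    return NULL;
}

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family      = AF_INET,
                                .sin_port        = htons(5000), /* assumed */
                                .sin_addr.s_addr = INADDR_ANY };
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, 16);

    for (;;) {                                 /* variable # of connections */
        int conn = accept(lsock, NULL, NULL);
        pthread_t t;
        pthread_create(&t, NULL, recv_stream, (void *)(intptr_t)conn);
        pthread_detach(t);
    }
}
```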