Planned Machines: ASCI Purple, ALC and M&IC MCR Presented to SOS7 Mark Seager [email protected]...

Planned Machines: ASCI Purple, ALC and M&IC MCR

Presented to SOS7

Mark Seager

[email protected]

925-423-3141

ICCD ADH for Advanced Technology

Lawrence Livermore National Laboratory

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.


Q1: What is unique in structure and function of your machine?

Purple’s unique structure is fat SMPs with 16 rails of Federation interconnect

MCR+ALC’s unique structure is the shared global file system

However, the most important point is that applications are highly mobile between Purple, MCR+ALC, White, Q, and other clusters of SMP systems…

Purple’s unique structure is fat SMPs with 16 rails of interconnect

Purple System
• 100 TF/s peak; 30-45 TF/s delivered on sPPM+UMT2000
• 50 TB memory, 2.0 PB of disk @ 108 GB/s delivered
• 197 × 64-way Armada SMPs with 16 Federation links each
• 4 login/network nodes
• Login/network nodes for login/NFS
• 8 × 10 Gb/s for parallel FTP on each login node
• All external networking is 1-10 Gb/s Ethernet
• Clustered I/O services for a cluster-wide file system
• Fibre Channel 2 I/O attach does not extend
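The headline numbers on this slide can be turned into per-node and aggregate figures with simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes), that 100 TF/s is the peak rate, that memory is spread evenly over the 197 SMPs, and that parallel-FTP bandwidth is the nominal line rate with no protocol overhead; the function name `purple_summary` is hypothetical.

```python
# Back-of-envelope figures derived from the Purple system slide.
# Assumptions (not from the slide): decimal units, 100 TF/s is peak,
# memory is spread evenly over the SMPs, FTP links run at line rate.

SMP_NODES = 197            # 197 x 64-way Armada SMPs
CPUS_PER_SMP = 64
PEAK_TFLOPS = 100.0        # 100 TF/s
MEMORY_TB = 50.0           # 50 TB memory
LOGIN_NODES = 4            # 4 login/network nodes
PFTP_LINKS_PER_LOGIN = 8   # 8 x 10 Gb/s parallel FTP per login node
PFTP_LINK_GBITS = 10.0


def purple_summary() -> dict:
    """Return derived system figures (hypothetical helper)."""
    total_cpus = SMP_NODES * CPUS_PER_SMP
    return {
        "total_cpus": total_cpus,
        # Peak rate per CPU, in GF/s.
        "peak_gflops_per_cpu": PEAK_TFLOPS * 1000.0 / total_cpus,
        # Average memory per SMP, in GB.
        "memory_gb_per_smp": MEMORY_TB * 1000.0 / SMP_NODES,
        # Nominal parallel-FTP bandwidth over all login nodes, in GB/s
        # (8 bits per byte; protocol overhead ignored).
        "pftp_gbytes_per_s": LOGIN_NODES * PFTP_LINKS_PER_LOGIN
        * PFTP_LINK_GBITS / 8.0,
    }


if __name__ == "__main__":
    for key, value in purple_summary().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the machine has 12,608 CPUs, roughly 7.9 GF/s peak per CPU, about 254 GB of memory per SMP, and 40 GB/s of nominal parallel-FTP bandwidth across the four login nodes.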

Programming/Usage Model
• Application launch over all compute nodes, up to 8,192 tasks
• 1 MPI task/CPU and shared memory, full 64-bit support
• Scalable MPI (MPI_Allreduce, buffer space)
• Likely usage: multiple MPI tasks/node with 4-16 OpenMP threads per MPI task
• Single STDIO interface
• Parallel I/O to a single file, or multiple serial I/O (1 file/MPI task)
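The "likely usage" bullet describes a hybrid layout: several MPI tasks per SMP, each running 4-16 OpenMP threads. A minimal sketch of the task counts such a layout produces, assuming every CPU of a 64-way node is used and taking the 191 batch/interactive/visualization nodes from the system diagram as the node count; `hybrid_layout` is a hypothetical helper, not part of any IBM or MPI tool.

```python
# Hybrid MPI + OpenMP task layout on 64-way SMPs (illustrative sketch).
# From the slide: 64 CPUs per SMP, 4-16 threads per task, 8,192-task launch.
# Assumption: every CPU runs exactly one thread.

CPUS_PER_NODE = 64
MAX_LAUNCH_TASKS = 8192


def hybrid_layout(nodes: int, threads_per_task: int) -> tuple[int, int, bool]:
    """Return (tasks_per_node, total_tasks, fits_launch_limit)."""
    if threads_per_task <= 0 or CPUS_PER_NODE % threads_per_task != 0:
        raise ValueError("threads_per_task must evenly divide the CPUs per node")
    tasks_per_node = CPUS_PER_NODE // threads_per_task
    total_tasks = nodes * tasks_per_node
    return tasks_per_node, total_tasks, total_tasks <= MAX_LAUNCH_TASKS


if __name__ == "__main__":
    for threads in (1, 4, 8, 16):
        per_node, total, ok = hybrid_layout(191, threads)
        print(f"{threads:>2} threads/task: {per_node} tasks/node, "
              f"{total} tasks, within launch limit: {ok}")
```

With these assumptions, one MPI task per CPU on all 191 nodes would need 12,224 tasks, above the 8,192-task launch figure, while 4-16 threads per task gives between 3,056 and 764 tasks.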

Figure: Purple system diagram. 191 parallel batch/interactive/visualization nodes and four login nodes (NFS/login and login/network) on the system data and control networks, with 16 Federation links per SMP in four switch planes; I/O nodes attach to a Fibre Channel 2 I/O network.

Unique feature of ALC+MCR is the Lustre Lite shared file system†

†Cluster-wide file system leverages DOE/NNSA ASCI PathForward open-source Lustre development
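Lustre spreads a striped file's data round-robin over several object storage targets (OSTs), so large reads and writes are served by many servers at once. A minimal sketch of that RAID-0 style mapping from a file offset to an OST slot and an offset inside that OST's object, assuming a fixed stripe size and stripe count; the function `locate` and its parameters are illustrative and are neither the Lustre API nor the MCR/ALC configuration.

```python
# Round-robin (RAID-0 style) striping of one file over Lustre OSTs.
# stripe_size / stripe_count are illustrative parameters, not values
# from the MCR/ALC configuration; ost_slot indexes the file's own
# stripe set, not a global OST number.


def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a file byte offset to (ost_slot, offset within that OST object)."""
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0 and sizes must be positive")
    stripe_number = offset // stripe_size        # which stripe-sized chunk
    ost_slot = stripe_number % stripe_count      # round-robin over OSTs
    # Each full pass over the stripe set adds one stripe_size to every object.
    object_offset = (stripe_number // stripe_count) * stripe_size \
        + offset % stripe_size
    return ost_slot, object_offset


if __name__ == "__main__":
    MIB = 1 << 20
    for off in (0, MIB, 3 * MIB + 5, 4 * MIB + 10):
        print(off, locate(off, stripe_size=MIB, stripe_count=4))
```

Consecutive stripe-sized chunks land on different OSTs, which is why many clients writing different regions of one file can proceed in parallel.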

Figure: MCR and ALC clusters sharing a single Lustre file system.
• MCR: 1,116 P4 compute nodes, 2 login nodes with 4 Gb-Enet, 2 service nodes, 2 metadata (fail-over) servers, and 32 gateway nodes @ 140 MB/s delivered Lustre I/O over 2x1GbE, on a 1,152-port (10x96D32U+4x96D32U) QsNet Elan3 network with 100BaseT control
• ALC: 924 P4 compute nodes, 2 service nodes, metadata servers, and gateway nodes, on a 960-port (10x96D32U+4x80D48U) QsNet Elan3 network with 100BaseT control
• GbEnet federated switches; aggregated OSTs for a single Lustre file system
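Reading the gateway label as 140 MB/s delivered per gateway node (an assumption), the aggregate Lustre I/O rate through the 32 gateways and the efficiency of each gateway's 2x1GbE links follow directly. A minimal sketch using decimal megabytes and the raw 1 Gb/s line rate; `gateway_bandwidth` is a hypothetical name.

```python
# Aggregate Lustre I/O through the gateway nodes (illustrative sketch).
# Assumptions: 140 MB/s is per gateway, gateways do not contend,
# MB is decimal, and 1 Gb/s Ethernet carries 125 MB/s raw.

GATEWAY_NODES = 32
DELIVERED_MB_S_PER_GW = 140
GBE_LINKS_PER_GW = 2
GBE_LINK_MB_S = 1000 / 8    # 1 Gb/s = 125 MB/s before protocol overhead


def gateway_bandwidth() -> tuple[int, float, float]:
    """Return (aggregate MB/s, raw MB/s per gateway, per-gateway efficiency)."""
    aggregate_mb_s = GATEWAY_NODES * DELIVERED_MB_S_PER_GW
    raw_per_gw = GBE_LINKS_PER_GW * GBE_LINK_MB_S
    efficiency = DELIVERED_MB_S_PER_GW / raw_per_gw
    return aggregate_mb_s, raw_per_gw, efficiency


if __name__ == "__main__":
    agg, raw, eff = gateway_bandwidth()
    print(f"aggregate: {agg} MB/s, raw per gateway: {raw} MB/s, "
          f"efficiency: {eff:.0%}")
```

Under this reading the gateways deliver about 4.5 GB/s in total, with each gateway using 56% of its 250 MB/s raw link capacity.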

Q2: What characterizes your applications? Examples are: intensities of message passing, memory utilization, computing, I/O, and data.

Applications are characterized as multi-physics package simulations

• All applications are compute/communications intensive
• Each package pushes the performance envelope along a different dimension
  – Some packages are MPI latency dominated
  – Some packages are MPI bandwidth dominated
  – Memory bandwidth is a critical factor, but expensive memory subsystems don’t perform much better than commodity ones…
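The latency-dominated versus bandwidth-dominated distinction can be made concrete with the standard Hockney ("alpha-beta") model of message cost, T(n) = alpha + n/beta: messages much smaller than alpha*beta bytes are latency dominated, and much larger ones are bandwidth dominated. A minimal sketch; the latency and bandwidth values are hypothetical placeholders, not measurements of Purple, MCR, or ALC, and the model ignores contention.

```python
# Hockney ("alpha-beta") model of point-to-point message time:
#   T(n) = alpha + n / beta
# alpha = latency in seconds, beta = bandwidth in bytes/s.
# ALPHA and BETA below are hypothetical, not measured values.

ALPHA = 5e-6    # hypothetical 5 microsecond latency
BETA = 1e9      # hypothetical 1 GB/s bandwidth


def message_time(n_bytes: float, alpha: float, beta: float) -> float:
    return alpha + n_bytes / beta


def crossover_bytes(alpha: float, beta: float) -> float:
    """Message size at which the latency and bandwidth terms are equal."""
    return alpha * beta


def dominant_term(n_bytes: float, alpha: float, beta: float) -> str:
    return "latency" if n_bytes < crossover_bytes(alpha, beta) else "bandwidth"


if __name__ == "__main__":
    print(f"crossover: about {crossover_bytes(ALPHA, BETA):,.0f} bytes")
    for n in (8, 1_000, 100_000, 10_000_000):
        t = message_time(n, ALPHA, BETA)
        print(f"{n:>10} B: {t * 1e6:10.1f} us, {dominant_term(n, ALPHA, BETA)}")
```

A package that sends many small messages (for example, frequent small reductions) sits on the latency side of the crossover, while one that exchanges large halos sits on the bandwidth side, so the two stress different parts of the interconnect.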

Q3: What prior experience guided you to this choice?

• Mission and applications
• Budgets
• Politics
• Delivered performance
• Balanced risk and cost-performance

Strategic approach: straddle multiple curves to balance the risk and opportunity of new disruptive technologies

Three complementary curves…

1. Delivers to today’s stockpile’s demanding needs
  – Production environment
  – For “must have” deliverables now

2. Delivers the transition to the next generation
  – “Near production” environment
  – Provides cycles for science
  – Provides cycles for stockpile
  – Leading to next-generation production systems
  – These are the capacity systems in a strategic capacity/capability mix

3. Delivers an affordable path to petaFLOP/s
  – Research environment, leading the transition to petaflop systems?
  – Are there other paths to a breakthrough regime by 2006-7?

Figure: Performance vs. time for several technology curves: mainframes (RIP); vendor-integrated SMP clusters (IBM SP, HP SC); IA32/IA64/AMD + Linux; cell-based (IBM BG/L). Markers for Today and FY05; straddle strategy for stability and preeminence. Cost points: $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, $170K/TF.

Any given technology curve is ultimately limited by Moore’s Law
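The cost points quoted for each system translate directly into how much capability a fixed budget buys on each curve. A minimal sketch, assuming a hypothetical $10M budget and taking each $/TF figure at face value; the labels for the two points the chart does not attribute to a named system ($500K/TF and $170K/TF) are placeholders.

```python
# TF purchasable with a fixed budget at each $/TF point from the chart.
# The $10M budget is hypothetical; the two unnamed points keep
# placeholder labels because the chart does not name a system for them.

COST_PER_TF = {
    "White": 10e6,
    "Q": 7e6,
    "Purple C": 2e6,
    "MCR": 1.2e6,
    "$500K/TF point": 500e3,
    "$170K/TF point": 170e3,
}


def tf_for_budget(budget_dollars: float) -> dict[str, float]:
    """Return TF bought at each price point for the given budget."""
    return {name: budget_dollars / cost for name, cost in COST_PER_TF.items()}


if __name__ == "__main__":
    for name, tf in tf_for_budget(10e6).items():
        print(f"{name:>15}: {tf:6.2f} TF")
```

At these rates $10M buys about 1 TF at White’s price, 5 TF at Purple C’s, 8.3 TF at MCR’s, and roughly 59 TF at $170K/TF, which is the gap the straddle strategy is trying to exploit without betting everything on the newest curve.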

Q4: Other than your own machine, what are the best and worst machines for your needs? And why?

Clusters of SMPs with a full OS on each node make system administration and programming much easier, but scalability is an issue

Vectors suck
  – A 10x potential speed-up from vectorization on Cray YMP-class machines yielded only a 1.5-2x boost in delivered performance for stockpile codes
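One way to read the vectorization numbers is through Amdahl’s law: if a fraction f of run time speeds up by a factor s and the rest does not, the overall speed-up is 1/((1 - f) + f/s). A minimal sketch, assuming the 10x figure is the speed-up of the vectorizable portion itself; this is an illustrative reading, not the author’s analysis.

```python
# Amdahl's law applied to vectorization (illustrative reading).
# f = fraction of run time that vectorizes, s = speed-up of that fraction.
# Assumption: the slide's 10x is s, and 1.5-2x is the overall speed-up.


def amdahl_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)


def implied_fraction(observed: float, s: float) -> float:
    """Solve 1/observed = (1 - f) + f/s for f."""
    return (1.0 - 1.0 / observed) / (1.0 - 1.0 / s)


if __name__ == "__main__":
    for observed in (1.5, 2.0):
        f = implied_fraction(observed, 10.0)
        print(f"{observed}x overall -> vectorized fraction {f:.0%} "
              f"(check: {amdahl_speedup(f, 10.0):.2f}x)")
```

With s = 10, observed speed-ups of 1.5-2x imply that only about 37-56% of the run time was effectively vectorized; the unvectorized remainder caps the benefit no matter how fast the vector units are.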
