Planned Machines: ASCI Purple, ALC and M&IC MCR
Presented to SOS7
Mark Seager
[email protected]
925-423-3141
ICCD ADH for Advanced Technology
Lawrence Livermore National Laboratory
This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.
Q1: What is unique in structure and function of your machine?
• Purple’s unique structure is fat SMPs with 16 rails of Federation interconnect
• MCR+ALC’s unique structure is the shared global file system
• However, the most important point is that applications are highly mobile between Purple, MCR+ALC, White, Q and other clusters of SMP systems…
Purple’s unique structure is fat SMPs with 16 rails of interconnect
Purple System
• 100 TF/s + 30-45 TF/s delivered on sPPM+UMT2000
• 50 TB memory, 2.0 PB of disk @ 108 GB/s delivered
• 197 x 64-way Armada SMP w/ 16 Federation links
• 4 Login/network nodes
• Login/network nodes for login/NFS
• 8x10 Gb/s for parallel FTP on each Login
• All external networking is 1-10 Gb/s Ethernet
• Clustered I/O services for cluster wide file system
• Fibre Channel 2 I/O attach does not extend
Programming/Usage Model
• Application launch over all compute nodes up to 8,192 tasks
• 1 MPI task/CPU and Shared Memory, full 64b support
• Scalable MPI (MPI_allreduce, buffer space)
• Likely usage
– Multiple MPI tasks/node with 4-16 OpenMP/MPI task
• Single STDIO interface
• Parallel I/O to single file, multiple serial I/O (1 file/MPI task)
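The "likely usage" bullet above implies a range of node layouts. The following is a minimal sketch of that arithmetic, assuming every one of the 197 SMPs can run tasks and that OpenMP threads map one-to-one onto CPUs; the function names are hypothetical and are not part of any Purple software.

```python
CPUS_PER_NODE = 64   # "64-way" SMP, from the slide
SMP_NODES = 197      # SMP count from the slide; treating all as usable is an assumption
MAX_TASKS = 8192     # application launch limit quoted on the slide


def tasks_per_node(threads_per_task: int, cpus_per_node: int = CPUS_PER_NODE) -> int:
    """MPI tasks that fit on one node when each task runs this many OpenMP threads."""
    if threads_per_task <= 0 or cpus_per_node % threads_per_task != 0:
        raise ValueError("threads per task must evenly divide the CPUs per node")
    return cpus_per_node // threads_per_task


def max_tasks(threads_per_task: int, nodes: int = SMP_NODES) -> int:
    """Largest job in MPI tasks, capped by the launch limit."""
    return min(MAX_TASKS, nodes * tasks_per_node(threads_per_task))


if __name__ == "__main__":
    # 1 thread per task is the pure-MPI case; 4-16 is the hybrid range on the slide.
    for threads in (1, 4, 8, 16):
        print(f"{threads:2d} threads/task: {tasks_per_node(threads):2d} tasks/node, "
              f"up to {max_tasks(threads)} tasks")
```

Under these assumptions only the pure-MPI layout reaches the 8,192-task limit; the hybrid layouts are bounded by the node count instead.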
[Figure: Purple system diagram. 191 parallel batch/interactive/visualization nodes; four NFS/login nodes on the login network; system data and control networks; 16 Federation links per SMP in four switch planes; I/O nodes attached to a Fibre Channel 2 I/O network.]
Unique feature of ALC+MCR is Lustre Lite shared file system†
†Cluster wide file system leverages DOE/NNSA ASCI PathForward Open Source Lustre development
[Figure: ALC+MCR system diagram. Two clusters share a single Lustre file system built from aggregated OSTs behind GbEnet federated switches.
• First cluster: 1,116 P4 compute nodes, 2 login nodes with 4 Gb-Enet, 2 service nodes, 1,152-port (10x96D32U+4x96D32U) QsNet Elan3, 100BaseT control
• Second cluster: 924 P4 compute nodes, 2 service nodes, 960-port (10x96D32U+4x80D48U) QsNet Elan3, 100BaseT control
• 2 MetaData (fail-over) servers (MDS)
• 32 gateway (GW) nodes @ 140 MB/s delivered Lustre I/O over 2x1GbE]
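As a rough check on the gateway figures in the diagram, the sketch below multiplies them out. It assumes the 140 MB/s is per gateway, that all 32 gateways deliver concurrently, and that 1 Gb/s equals 125 MB/s (decimal units); none of these readings is confirmed by the slide, and the function names are hypothetical.

```python
GATEWAYS = 32             # gateway nodes, from the diagram
DELIVERED_MB_S = 140.0    # delivered Lustre I/O, assumed to be per gateway
LINKS_PER_GATEWAY = 2     # "2x1GbE"
LINK_GBIT_S = 1.0


def aggregate_mb_s(gateways: int = GATEWAYS, per_gateway: float = DELIVERED_MB_S) -> float:
    """Total delivered bandwidth if every gateway runs at the quoted rate."""
    return gateways * per_gateway


def wire_fraction(per_gateway: float = DELIVERED_MB_S,
                  links: int = LINKS_PER_GATEWAY,
                  link_gbit_s: float = LINK_GBIT_S) -> float:
    """Delivered rate as a fraction of the raw Ethernet rate of one gateway."""
    raw_mb_s = links * link_gbit_s * 1000.0 / 8.0  # Gb/s -> MB/s, decimal units
    return per_gateway / raw_mb_s


if __name__ == "__main__":
    print(f"aggregate: {aggregate_mb_s() / 1000.0:.2f} GB/s")
    print(f"fraction of raw 2x1GbE: {wire_fraction():.0%}")
```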
Q2: What characterizes your applications? Examples are: intensities of message passing, memory utilization, computing, IO, and data.
• Applications characterized as multi-physics package simulations
• All applications compute/comms intensive
• Each package pushes performance envelope along a different dimension
– Some packages are MPI latency dominated
– Some packages are MPI BW dominated
– Memory BW is critical factor, but expensive memory subsystems don’t perform much better than commodity ones…
Q3: What prior experience guided you to this choice?
• Mission and Applications
• Budgets
• Politics
• Delivered performance
• Balanced risk and cost performance
Strategic Approach: straddle multiple curves to balance risk and opportunity of new disruptive technologies
Three complementary curves…
1. Delivers to today’s stockpile’s demanding needs
– Production environment
– For “must have” deliverables now
2. Delivers transition for next generation
– “Near production” environment
– Provides cycles for science
– Provides cycles for stockpile
– Leading to next generation production systems
– These are the capacity systems in a strategic capacity/capability mix
3. Delivers affordable path to petaFLOP/s
– Research environment, leading transition to petaflop systems?
– Are there other paths to a breakthrough regime by 2006-7?
[Figure: Performance versus time for successive technology curves: Mainframes (RIP); vendor integrated SMP cluster (IBM SP, HP SC); IA32/IA64/AMD + Linux; cell-based (IBM BG/L). Time markers: Today, FY05. Cost annotations: $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, $170K/TF. Caption: Straddle strategy for stability and preeminence.]
Any given technology curve is ultimately limited by Moore’s Law
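The cost annotations on the chart can be compared directly. The sketch below uses only the four figures the chart attaches to a named machine; the $500K/TF and $170K/TF points are left out because the transcript does not say which systems they belong to. The helper name is hypothetical.

```python
# Dollars per teraFLOP/s for the machines named on the chart.
COST_PER_TF = {
    "White": 10.0e6,
    "Q": 7.0e6,
    "Purple C": 2.0e6,
    "MCR": 1.2e6,
}


def cost_ratio(baseline: str, other: str, table: dict = COST_PER_TF) -> float:
    """How many times more the baseline costs per TF than the other machine."""
    return table[baseline] / table[other]


if __name__ == "__main__":
    for name in COST_PER_TF:
        ratio = cost_ratio("White", name)
        print(f"White costs {ratio:.1f}x as much per TF as {name}")
```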
Q4: Other than your own machine, for your needs what are the best and worst machines? And why?
• Clusters of SMPs with full node OS make system administration and programming much easier, but scalability is an issue
• Vectors suck
– 10x potential speed-up from vectorization on Cray YMP class machines yielded only 1.5-2x in delivered performance boost to stockpile codes
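One way to read the vectorization numbers is through Amdahl's law, S = 1 / ((1 - f) + f / s), where s is the speed-up of the vectorized part and f is the fraction of run time it covers. The slide does not say this is the explanation; the sketch below simply assumes Amdahl's law applies and solves for f. Function names are hypothetical.

```python
def amdahl_speedup(fraction: float, part_speedup: float) -> float:
    """Overall speed-up when `fraction` of the run time is sped up by `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def vectorized_fraction(overall_speedup: float, part_speedup: float) -> float:
    """Invert Amdahl's law: the fraction of run time that must have been vectorized."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / part_speedup)


if __name__ == "__main__":
    potential = 10.0  # "10x potential speed-up" from the slide
    for delivered in (1.5, 2.0):  # "1.5-2x" delivered, from the slide
        f = vectorized_fraction(delivered, potential)
        print(f"delivered {delivered:.1f}x -> vectorized fraction of about {f:.2f}")
```

Under this reading, roughly 37% to 56% of the run time would have been vectorizable, which is consistent with the modest delivered boost.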