Transcript of Dr. Dong Chen, IBM T.J. Watson Research Center, Yorktown Heights, NY
Overview of the Blue Gene supercomputers
– Supercomputer trends
– Blue Gene/L and Blue Gene/P architecture
– Blue Gene applications

Terminology:
– FLOPS = Floating Point Operations Per Second
– Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18
– Peak speed vs. sustained speed

Top 500 list (top500.org):
– Based on the Linpack benchmark: solve a dense linear system A x = b, where A is an N x N dense matrix; total FP operations ~ 2/3 N^3 + 2 N^2
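The operation count above translates directly into runtime estimates. A minimal sketch; the problem size and the 1 PF/s sustained rate below are illustrative choices, not figures from the talk:

```python
# Rough Linpack work estimate: solving A x = b for a dense N x N matrix
# costs about 2/3 N^3 + 2 N^2 floating-point operations (LU factorization
# plus the triangular solves).

def linpack_flops(n):
    """Approximate FP operation count for an N x N dense solve."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def runtime_seconds(n, sustained_flops):
    """Estimated wall time at a given sustained (not peak) rate."""
    return linpack_flops(n) / sustained_flops

# Example (hypothetical): N = 10^6 at a sustained 1 PF/s (10^15 FLOPS)
t = runtime_seconds(10**6, 1e15)
print(f"{linpack_flops(10**6):.3e} FLOPs, ~{t:.0f} s at 1 PF/s sustained")
```

At N = 10^6 the N^3 term dominates: about 6.7 x 10^17 operations, or roughly eleven minutes at a sustained petaflop.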
Green 500 list (green500.org):
– Rates Top 500 supercomputers in FLOPS/Watt
Supercomputer speeds over time
[Chart: peak speed (flops) vs. year, 1940–2020, from ENIAC and UNIVAC (vacuum tubes) through the IBM 701/704/7090, IBM Stretch, CDC 6600/7600, Cray, NEC SX, ASCI, and Red Storm machines up to BG/L, BG/P, BG/Q and next-generation systems, spanning roughly 1.00E+02 to 1.00E+17 flops.]
© 2007 IBM Corporation
CMOS Scaling in Petaflop Era
– Three decades of exponential clock-rate (and electrical power!) growth has ended
– Instruction Level Parallelism (ILP) growth has ended
– Single-threaded performance improvement is dead (Bill Dally)
– Yet Moore's Law continues in transistor count
– Industry response: multi-core, i.e. double the number of cores every 18 months instead of the clock frequency (and power!)
Source: “The Landscape of Computer Architecture,” John Shalf, NERSC/LBNL, presented at ISC07, Dresden, June 25, 2007
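The "double the cores" response compounds quickly. A small illustration; the 4-core starting point is an assumption for the sake of the example, not a figure from the slide:

```python
# If frequency scaling has stopped, peak FLOPS growth must come from
# core count. Sketch of "double the cores every 18 months".

def cores_after(years, start_cores=4, doubling_months=18):
    """Projected cores per chip after `years`, assuming a fixed doubling time."""
    return start_cores * 2 ** (years * 12 / doubling_months)

for y in (0, 3, 6):
    print(f"+{y} yr: ~{cores_after(y):.0f} cores per chip")
```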
[Chart: TOP500 Performance Trend – Rmax performance (GFlops), June 1993 through June 2010, for the #1, #10, and #500 systems and the total aggregate performance.]
Over the long haul IBM has demonstrated continued leadership in various TOP500 metrics, even as performance continues its relentless growth.
Source: www.top500.org
Blue Square Markers Indicate IBM Leadership
– IBM has the most aggregate performance for the last 22 lists
– IBM has the #1 system for 10 of the last 12 lists (13 in total)
– IBM has the most systems in the Top 10 for the last 14 lists
– IBM has the most systems for 14 of the last 22 lists
(Most recent list shown: total aggregate 32.43 PF; #1 system 1.759 PF; #10 system 433.2 TF; #500 system 24.67 TF)
President Obama Honors IBM's Blue Gene Supercomputer With National Medal Of Technology And Innovation
Ninth time IBM has received the nation's most prestigious tech award; Blue Gene has led to breakthroughs in science, energy efficiency and analytics.

WASHINGTON, D.C. - 18 Sep 2009: President Obama recognized IBM (NYSE: IBM) and its Blue Gene family of supercomputers with the National Medal of Technology and Innovation, the country's most prestigious award given to leading innovators for technological achievement. President Obama will personally bestow the award at a special White House ceremony on October 7. IBM, which earned the National Medal of Technology and Innovation on eight other occasions, is the only company recognized with the award this year.

Blue Gene's speed and expandability have enabled business and science to address a wide range of complex problems and make more informed decisions - not just in the life sciences, but also in astronomy, climate, simulations, modeling and many other areas. Blue Gene systems have helped map the human genome, investigated medical therapies, safeguarded nuclear arsenals, simulated radioactive decay, replicated brain power, flown airplanes, pinpointed tumors, predicted climate trends, and identified fossil fuels - all without the time and money that would have been required to physically complete these tasks.

The system also reflects breakthroughs in energy efficiency. With the creation of Blue Gene, IBM dramatically shrank the physical size and energy needs of a computing system whose processing speed would otherwise have required a dedicated power plant capable of powering thousands of homes. The influence of the Blue Gene supercomputer's energy-efficient design and computing model can be seen today across the Information Technology industry. Today, 18 of the top 20 most energy-efficient supercomputers in the world are built on IBM high-performance computing technology, according to the latest Supercomputing 'Green500 List' announced by Green500.org in July 2009.
Blue Gene Roadmap
• BG/L (5.7 TF/rack) – 130 nm ASIC (1999–2004 GA)
  – 104 racks, 212,992 cores, 596 TF/s, 210 MF/W; dual-core system-on-chip
  – 0.5/1 GB/node
• BG/P (13.9 TF/rack) – 90 nm ASIC (2004–2007 GA)
  – 72 racks, 294,912 cores, 1 PF/s, 357 MF/W; quad-core SOC, DMA
  – 2/4 GB/node
  – SMP support, OpenMP, MPI
• BG/Q (209 TF/rack) – 20 PF/s
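The MF/W figures above imply total system power, since power = peak flops / efficiency. A quick check, treating the roadmap numbers as exact:

```python
# Back out total system power from peak performance and power efficiency.
# Inputs are the roadmap figures: BG/L 596 TF at 210 MF/W, BG/P 1 PF at 357 MF/W.

def system_megawatts(peak_tf, mflops_per_watt):
    """Total power in MW for a system of `peak_tf` teraflops at the given MF/W."""
    watts = peak_tf * 1e12 / (mflops_per_watt * 1e6)
    return watts / 1e6

print(f"BG/L 104 racks: ~{system_megawatts(596, 210):.1f} MW")
print(f"BG/P 72 racks:  ~{system_megawatts(1000, 357):.1f} MW")
```

Both generations come out near 2.8 MW: BG/P delivers roughly 1.7x the peak performance in the same power envelope.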
IBM Blue Gene/P Solution: Expanding the Limits of Breakthrough Science
IBM® System Blue Gene®/P Solution © 2007 IBM Corporation
Blue Gene Technology Roadmap
– 2004: Blue Gene/L (PPC 440 @ 700 MHz), scalable to 595 TFlops
– 2007: Blue Gene/P (PPC 450 @ 850 MHz), scalable to 3.56 PF
– 2010: Blue Gene/Q (Power multi-core), scalable to 100 PF

Note: All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
BlueGene/L System Buildup
– Chip: 2 processors; 2.8/5.6 GF/s, 4 MB
– Compute Card: 2 chips (1x2x1); 5.6/11.2 GF/s, 2.0 GB
– Node Card: 32 chips (4x4x2), 16 compute cards, 0–2 I/O cards; 90/180 GF/s, 32 GB
– Rack: 32 node cards; 2.8/5.6 TF/s, 1 TB
– System: 64 racks (64x32x32); 180/360 TF/s, 64 TB
BlueGene/L Compute ASIC
[Block diagram] Two PPC 440 CPUs (one doubling as I/O processor), each with 32k/32k L1 caches and a "Double FPU", connect over a 4:1 PLB to small L2 caches with snooping, a multiported shared SRAM buffer, and a shared L3 directory (with ECC) for 4 MB of embedded DRAM, usable as L3 cache or memory. A DDR controller with ECC drives 512/1024 MB of external DDR (128-bit data + 16-bit ECC). On-chip network interfaces: torus (6 out and 6 in, each link at 1.4 Gbit/s), collective (3 out and 3 in, each link at 2.8 Gbit/s), global interrupt (4 global barriers or interrupts), Gbit Ethernet, and JTAG access.
© 2006 IBM Corporation
IBM Research | BlueGene Systems
Double Floating-Point Unit
– Two replicas of a standard single-pipe PowerPC FPU: primary FPR P0–P31, secondary FPR S0–S31
– 2 x 32 64-bit registers
– Attached to the PPC440 core using the APU interface
  – Issues instructions across the APU interface
  – Instruction decode performed in the Double FPU
  – Separate APU interface from the LSU provides up to 16 B of data for quadword loads and stores
  – Datapath width is 16 bytes, feeding the two FPUs 8 bytes each every cycle
– Two FP multiply-add operations per cycle
– 2.8 GF/s peak
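The 2.8 GF/s peak follows directly from the clock and the pipe counts. A sanity check, assuming BG/L's 700 MHz PPC440 clock from the roadmap:

```python
# Peak per-core FP rate for BG/L's Double FPU: two fused multiply-add
# pipelines, each retiring one FMA (2 flops) per cycle.

clock_hz = 700e6        # PPC 440 clock on BG/L
fma_pipes = 2           # primary + secondary FPU
flops_per_fma = 2       # multiply + add count as two flops

peak = clock_hz * fma_pipes * flops_per_fma
print(f"{peak / 1e9:.1f} GF/s peak")   # 2.8 GF/s, matching the slide
```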
Blue Gene/L Memory Characteristics

Memory (node / 64k-node system):
– L1: 32 kB/32 kB per processor
– L2: 2 kB per processor
– SRAM: 16 kB
– L3: 4 MB (ECC) per node
– Main store: 512 MB (ECC) per node; 32 TB system

Bandwidth:
– L1 to registers: 11.2 GB/s, independent R/W and instruction
– L2 to L1: 5.3 GB/s, independent R/W and instruction
– L3 to L2: 11.2 GB/s
– Main (DDR): 5.3 GB/s

Latency:
– L1 miss, L2 hit: 13 processor cycles (pclks)
– L2 miss, L3 hit: 28 pclks (EDRAM page hit/page miss)
– L2 miss (main store): 75 pclks for DDR closed-page access (L3 disabled/enabled)
Blue Gene Interconnection Networks

3-Dimensional Torus
– Interconnects all compute nodes (65,536)
– Virtual cut-through hardware routing
– 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
– Communications backbone for computations
– 0.7/1.4 TB/s bisection bandwidth, 67 TB/s total bandwidth

Global Collective Network
– One-to-all broadcast functionality
– Reduction operations functionality
– 2.8 Gb/s of bandwidth per link; latency of tree traversal 2.5 µs
– ~23 TB/s total binary-tree bandwidth (64k machine)
– Interconnects all compute and I/O nodes (1024)

Low-Latency Global Barrier and Interrupt
– Round-trip latency 1.3 µs

Control Network
– Boot, monitoring and diagnostics

Ethernet
– Incorporated into every node ASIC
– Active in the I/O nodes (1:64)
– All external comm. (file I/O, control, user interaction, etc.)
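The per-node torus figure is just the link arithmetic: each node has 6 outgoing and 6 incoming links, each at 1.4 Gb/s.

```python
# Aggregate per-node torus bandwidth on BG/L from the per-link rate.

links_per_node = 12      # 6 out + 6 in
link_gbit = 1.4          # Gb/s per link

node_GBps = links_per_node * link_gbit / 8   # bits -> bytes
print(f"{node_GBps:.1f} GB/s per node")      # 2.1 GB/s, as on the slide
```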
BlueGene/P System Buildup
– Chip: 4 processors; 13.6 GF/s, 8 MB EDRAM
– Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR2 (4.0 GB 6/30/08)
– Node Card: 32 chips (4x4x2), 32 compute cards, 0–1 I/O cards; 435 GF/s, 64 (128) GB
– Rack: 32 node cards, cabled 8x8x16; 13.9 TF/s, 2 (4) TB
– System: 72 racks (72x32x32); 1 PF/s, 144 (288) TB
BlueGene/P Compute ASIC
[Block diagram] Four PPC 450 CPUs, each with 32k I1/32k D1 L1 caches and a Double FPU, connect through snoop filters to private L2 caches and, via multiplexing switches and a DMA engine, to two shared 4 MB eDRAM banks (L3 cache or on-chip memory) with shared L3 directories (with ECC) and arbitration. Two DDR-2 controllers with ECC drive the 13.6 GB/s DDR-2 DRAM bus (512-bit data + 72-bit ECC). The chip also carries shared SRAM and a hybrid PMU with 256x64b SRAM. Network interfaces: torus (6 links, 3.4 Gb/s bidirectional each), collective (3 links, 6.8 Gb/s bidirectional each), global barrier (4 global barriers or interrupts), 10 Gbit Ethernet, and JTAG access.
Blue Gene/P Memory Characteristics

Memory (per node):
– L1: 32 kB/32 kB
– L2: 2 kB per processor
– L3: 8 MB (ECC) per node
– Main store: 2–4 GB (ECC) per node

Bandwidth:
– L1 to registers: 6.8 GB/s instruction read, 6.8 GB/s data read, 6.8 GB/s write
– L2 to L1: 5.3 GB/s, independent R/W and instruction
– L3 to L2: 13.6 GB/s
– Main (DDR): 13.6 GB/s

Latency:
– L1 hit: 3 processor cycles (pclks)
– L1 miss, L2 hit: 13 pclks
– L2 miss, L3 hit: 46 pclks (EDRAM page hit/page miss)
– L2 miss (main store): 104 pclks for DDR closed-page access (L3 disabled/enabled)
BlueGene/P Interconnection Networks

3-Dimensional Torus
– Interconnects all compute nodes (73,728)
– Virtual cut-through hardware routing
– 3.4 Gb/s on all 12 node links (5.1 GB/s per node)
– 0.5 µs latency between nearest neighbors, 5 µs to the farthest node; MPI: 3 µs latency for one hop, 10 µs to the farthest
– Communications backbone for computations
– 1.7/3.9 TB/s bisection bandwidth, 188 TB/s total bandwidth

Collective Network
– One-to-all broadcast functionality
– Reduction operations functionality
– 6.8 Gb/s of bandwidth per link per direction
– Latency of one-way tree traversal 1.3 µs, MPI 5 µs
– ~62 TB/s total binary-tree bandwidth (72k machine)
– Interconnects all compute and I/O nodes (1152)

Low-Latency Global Barrier and Interrupt
– Latency of one way to reach all 72K nodes 0.65 µs, MPI 1.6 µs
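The bisection figures for both machines can be reproduced from the torus geometry: cutting the machine in half across its longest dimension severs two planes of links (the torus wraps around), each plane Y*Z links wide, counted in both directions.

```python
# Bisection bandwidth of an X x Y x Z torus, cut across the X dimension.
# X's length does not enter; only the Y*Z cross-section and the wrap-around do.

def torus_bisection_TBps(x, y, z, link_gbps):
    planes = 2                  # wrap-around means the cut severs two planes
    links = planes * y * z
    directions = 2              # each link is bidirectional
    return links * directions * link_gbps / 8 / 1000   # Gb/s -> TB/s

print(f"BG/L 64x32x32 @ 1.4 Gb/s: {torus_bisection_TBps(64, 32, 32, 1.4):.2f} TB/s")
print(f"BG/P 72x32x32 @ 3.4 Gb/s: {torus_bisection_TBps(72, 32, 32, 3.4):.2f} TB/s")
```

The results, ~0.72 TB/s and ~1.74 TB/s, match the 0.7 and 1.7 TB/s bisection figures quoted on the BG/L and BG/P network slides.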
November 2007 Green 500
[Chart: Linpack GFLOPS/W by system – BG/L 0.21, BG/P 0.37, SGI 8200 0.15, HP Cluster 0.08, Cray Sandia 0.05, Cray ORNL 0.05, Cray NERSC 0.02, JS21 BSC 0.09]
Relative power, space and cooling efficiencies (published specs per peak performance)
[Chart: Racks/TF, kW/TF, Sq Ft/TF and Tons/TF for Sun/Constellation, Cray/XT4 and SGI/ICE relative to IBM BG/P, on a 0–400% scale]
System Power Efficiency (Linpack GF/Watt)
[Chart: BG/L 2005: 0.23; BG/P 2007: 0.37; SGI NASA-Ames 2010: 0.25; RoadRunner 2008: 0.44; Cray XT5 2009: 0.25; TianHe-1A 2010: 0.635; Fujitsu K 2010: 0.829; Titech 2010: 0.958; BG/Q prototype 2010: 1.68]
Source: www.top500.org
HPCC 2009
– IBM BG/P, 0.557 PF peak (40 racks): Class 1: #1 on G-RandomAccess (117 GUPS); Class 2: #1
– Cray XT5, 2.331 PF peak: Class 1: #1 on G-HPL (1533 TF/s), #1 on EP-Stream (398 TB/s), #1 on G-FFT (11 TF/s)
Source: www.top500.org
Main Memory Capacity per Rack
[Chart comparing LRZ IA64, Cray XT4, ASC Purple, RoadRunner, BG/P, Sun TACC, and SGI ICE]
Peak Memory Bandwidth per Node (byte/flop)
[Chart comparing BG/P 4-core, Roadrunner, Cray XT3 2-core, Cray XT5 4-core, POWER5, Itanium 2, Sun TACC, and SGI ICE, on a 0–2 byte/flop scale]
Main Memory Bandwidth per Rack
[Chart comparing LRZ Itanium, Cray XT5, ASC Purple, RoadRunner, BG/P, Sun TACC, and SGI ICE]
Interprocessor Peak Bandwidth per Node (byte/flop)
[Chart comparing BG/L,P, Cray XT5 4c, Cray XT4 2c, NEC ES, Power5, Itanium 2, Sun TACC, x86 cluster, Dell Myrinet, and Roadrunner, on a 0–0.8 byte/flop scale]
Failures per Month per TF
From: http://acts.nersc.gov/events/Workshop2006/slides/Simon.pdf
Execution Modes in BG/P per Node
(Hardware abstractions in black: node, cores. Software abstractions in blue: processes P0–P3, threads T0–T3.)
– SMP Mode: 1 process, 1–4 threads/process
– Dual Mode: 2 processes, 1–2 threads/process
– Quad Mode (VNM): 4 processes, 1 thread/process
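The three modes partition the same four cores differently, which also changes how much node memory each process sees. A small accounting sketch; the 2 GB/node figure is the base BG/P configuration from the buildup slide:

```python
# Process/thread accounting for the three BG/P execution modes.
# Every mode must fully occupy the node's 4 cores.

NODE_CORES = 4
NODE_MEM_GB = 2.0   # base BG/P configuration (4 GB option also existed)

MODES = {
    "SMP":  {"processes": 1, "threads_per_process": 4},
    "Dual": {"processes": 2, "threads_per_process": 2},
    "VNM":  {"processes": 4, "threads_per_process": 1},
}

for name, m in MODES.items():
    # sanity: processes x threads always equals the core count
    assert m["processes"] * m["threads_per_process"] == NODE_CORES
    mem = NODE_MEM_GB / m["processes"]
    print(f"{name}: {m['processes']} proc x {m['threads_per_process']} thr, "
          f"{mem:.1f} GB/process")
```

The trade-off is visible in the memory column: VNM gives the most MPI ranks but the least memory per rank, which is why SMP mode with OpenMP threads suits memory-hungry codes.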
Next Generation HPC
– Many core
– Expensive memory
– Two-tiered programming model
BG/P Software Overview | IBM Confidential © 2007 IBM Corporation
Blue Gene Software Hierarchical Organization
– Compute nodes are dedicated to running the user application, and almost nothing else - a simple compute node kernel (CNK)
– I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, signaling, debugging, and termination
– Service node performs system management services (e.g., partitioning, heart beating, monitoring errors) - transparent to application software
– Front-end nodes, file system
– 10 Gb Ethernet (functional) and 1 Gb Ethernet (control)
Noise measurements (from Adolfy Hoisie)
Blue Gene/P System Architecture
[Diagram] Compute nodes (C-Node 0 … C-Node n) run the user applications on CNK; I/O nodes run Linux with ciod and an fs client, connecting to the compute nodes over the collective (tree) and torus networks and to file servers and front-end nodes over the functional 10 Gb Ethernet. The service node - running MMCS, DB2, LoadLeveler, and the system console - manages the machine over the control 1 Gb Ethernet, reaching the hardware through FPGA/JTAG and I2C.
BG/P Software Stack Source Availability

I/O and compute nodes:
– Application: ESSL, MPI, GPSHMEM, GA, MPI-IO, XL runtime, open toolchain runtime
– System: CNK, CIOD, Linux kernel, message layer, messaging SPIs, node SPIs, totalviewd
– Firmware: common node services; hardware init, RAS, recovery, mailbox, diags; bootloader
– Hardware: compute nodes, I/O nodes, node cards, link cards, service card

Service node / front-end nodes:
– User/sched: ISV schedulers and debuggers, LoadLeveler, mpirun, Bridge API, BG Nav, HPC Toolkit, DB2, CSM, GPFS (1), PerfMon
– System: High-Level Control System (MMCS): partitioning, job management and monitoring, RAS, administrator interfaces, CIODB
– Firmware: Low-Level Control System: power on/off, hardware probe, hardware init, parallel monitoring, parallel boot, mailbox
– Hardware: service node (SN), front-end nodes (FEN)

Key:
– Closed: no source provided; not buildable
– Closed: buildable source; no redistribution of derivative works allowed under license
– New open source community under CPL license; active IBM participation
– New open source reference implementation licensed under CPL
– Existing open source communities under various licenses; BG code will be contributed and/or a new sub-community started

Notes:
1. GPFS does have an open build license available which customers may utilize.
Areas Where BG is Used
– Weather/climate modeling (government / industry / universities)
– Computational fluid dynamics: airplane and jet engine design, chemical flows, turbulence (engineering / aerospace)
– Seismic processing (petroleum, nuclear industry)
– Particle physics: lattice gauge QCD
– Systems biology: classical and quantum molecular dynamics (pharma / medical insurance / hospitals / universities)
– Modeling complex systems (pharma / business / government / universities)
– Large database search
– Nuclear industry
– Astronomy (universities)
– Portfolio analysis via Monte Carlo (banking / finance / insurance)
LLNL Applications
IDC Technical Computing Systems Forecast
– Bio Sci: genomics, proteomics, pharmacogenomics, pharma research, bioinformatics, drug discovery
– Chem Eng: chemical engineering: molecular modeling, computational chemistry, process design
– CAD: mechanical CAD, 3D wireframe (mostly graphics)
– CAE: computer-aided engineering: finite element modeling, CFD, crash, solid modeling (cars, aircraft, …)
– DCC&D: digital content creation and distribution
– Econ Fin: economic and financial modeling, econometric modeling, portfolio management, stock market modeling
– EDA: electronic design and analysis: schematic capture, logic synthesis, circuit simulation, system modeling
– Geo Sci: geo sciences and geo engineering: seismic analysis, oil services, reservoir modeling
– Govt Lab: government labs and research centers: government-funded R&D
– Defense: surveillance, signal processing, encryption, command, control, communications, intelligence, geospatial image management, weapon design
– Software Engineering: development and testing of technical applications
– Technical Management: product data management, maintenance records management, revision control, configuration management
– Academic: university-based R&D
– Weather: atmospheric modeling, meteorology, weather forecasting
What is driving the need for more HPC cycles?
– Materials science
– Climate modeling
– Genome sequencing
– Biological modeling
– Pandemic research
– Fluid dynamics
– Drug discovery
– Geophysical data processing
– Financial modeling
HPC Use Cases

Capability
– Calculations not possible on small machines
– Usually these calculations involve systems where many disparate scales are modeled
– One scale defines required work per "computation step"; a different scale determines total time to solution
– Examples: protein folding (10^-15 s to 1 s); refined grids in weather forecasting (10 km today -> 1 km in a few years); full simulation of the human brain
– Useful as proofs of concept

Complexity
– Calculations which seek to combine multiple components to produce an integrated model of a complex system
– Individual components can have significant computational requirements
– Coupling between components requires that all components be modeled simultaneously
– As components are modeled, changes in interfaces are constantly transferred between the components
– Examples: water cycle modeling in climate/environment; geophysical modeling for oil recovery; virtual fab; multisystem / coupled systems modeling
– Critical to manage multiple scales in physical systems

Understanding
– Repetition of a basic calculation many times with different model parameters, inputs and boundary conditions
– Goal is to develop a clear understanding of behavior, dependencies, and sensitivities of the solution over a range of parameters
– Examples: multiple independent simulations of hurricane paths to develop probability estimates of possible paths and strength; thermodynamics of protein/drug interactions; sensitivity analysis in oil reservoir modeling; optimization of aircraft wing design
– Essential to develop parameter understanding and sensitivity analysis
Complexity: Modern Integrated Water Management
– Partner ecosystem: climatologists, environmental observation systems companies, sensor companies, environmental sciences consultants, engineering services companies, subject matter experts, universities
– Sensors: physical, chemical, biological, environmental; in-situ and remotely sensed; planning and placement
– Physical models: climate, hydrological, meteorological, ecological
– Analyses/model strategy: stochastic models & statistics, machine learning, optimization; model selection, integration & coupling, validation, temporal/spatial scales
– Enabling IT: HPC, visualization, data management
– Time horizons: historical – present – near future – seasonal – long term – far future
– Advanced Water Management Reference IT Architecture
Overall Efficiencies of BG Applications - Major Scientific Advances
1. Qbox (DFT) LLNL: 56.5%; 2006 Gordon Bell Award, 64 L racks, 16 P
   CPMD IBM: 30%; highest scaling 64 L
   MGDC: highest scaling 32 P
2. ddcMD (classical MD) LLNL: 27.6%; 2005 Gordon Bell Award, 64 L
   New ddcMD LLNL: 17.4%; 2007 Gordon Bell Award, 104 L
   MDCASK LLNL, SPaSM LANL: highest scaling 64 L
   LAMMPS SNL: highest scaling 64 L, 32 P
   RXFF, GMD: highest scaling 64 L
   Rosetta UW: highest scaling 20 L
   AMBER: 4 L
3. Quantum chromodynamics CPS: 30%; 2006 Gordon Bell Special Award, 64 L, 32 P
   MILC, Chroma: 32 P
4. sPPM (CFD) LLNL: 18%; highest scaling 64 L
   Miranda, Raptor LLNL: highest scaling 64 L
   DNS3D: highest scaling 32 P
   NEK5 (thermal hydraulics) ANL: 22%, 32 P
   HYPO4D, PLB (lattice Boltzmann): 32 P
5. ParaDis (dislocation dynamics) LLNL: highest scaling 64 L
6. WRF (weather) NCAR: 10%; highest scaling 64 L
   POP (oceanography): highest scaling 8 L
   HOMME (climate) NCAR: 12%; highest scaling 32 L, 24Ki P
7. GTC (plasma physics) PPPL: 7%; highest scaling 20 L, 32 P
   Nimrod GA: 17%
8. FLASH (Supernova Ia): highest scaling 64 L, 40 P
   Cactus (general relativity): highest scaling 16 L, 32 P
9. DOCK5, DOCK6: highest scaling 32 P
10. Argonne v18 nuclear potential: 16%; 2010 Bonner Prize, 32 P
11. "Cat" brain: 2009 Gordon Bell Special Award, 36 P
High Performance Computing Trends: three distinct phases
– Past: exponential growth in processor performance, mostly through CMOS technology advances
– Near term: exponential (or faster) growth in level of parallelism
– Long term: power cost = system cost; invention required

The curve is indicative not only of peak performance but also of performance/$.
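The chart's 1.5-year doubling time compounds to roughly three orders of magnitude every 15 years, which is how peak speeds span 10^2 to 10^17 flops over the decades plotted. A quick check of the compounding:

```python
# Compound growth at a fixed doubling time (the chart's 1.5-year figure).

def growth_factor(years, doubling_years=1.5):
    """Multiplicative peak-speed growth over `years`."""
    return 2 ** (years / doubling_years)

print(f"10 years: ~{growth_factor(10):.0f}x")   # about two orders of magnitude
print(f"15 years: ~{growth_factor(15):.0f}x")   # about three orders of magnitude
```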
Supercomputer Peak Performance
[Chart: peak speed (flops) vs. year introduced, 1940–2020, from 1E+2 to 1E+17 flops; doubling time = 1.5 yr. Systems plotted: ENIAC (vacuum tubes), UNIVAC, IBM 701, IBM 704, IBM 7090 (transistors), IBM Stretch, CDC 6600 (ICs), CDC 7600, CDC STAR-100 (vectors), ILLIAC IV, CRAY-1, Cyber 205, X-MP2 (parallel vectors), CRAY-2, X-MP4, S-810/20, SX-2, Y-MP8, i860 (MPPs), Delta, CM-5, Paragon, NWT, SX-3/44, VP2600/10, T3D, CP-PACS, T3E, ASCI Red, ASCI Red Option, SX-4, SX-5, Blue Pacific, ASCI White, Earth Simulator, ASCI Purple, Red Storm, Blue Gene/L, Blue Gene/P, Blue Gene/Q. Milestones: 1 PF in 2008, 10 PF in 2011. Phases marked: Past, Near Term, Long Term.]