What Supercomputers Still Can't Do – a Reflection on the State of the Art in CSE
Horst D. Simon
Associate Laboratory Director, Computing Sciences
Director, NERSC
CIS'04, Shanghai, P.R. China
December 16, 2004
http://www.nersc.gov/~simon
Overview
• Introducing NERSC and Computing Sciences at Berkeley Lab
• Current Trends in Supercomputing (High-End Computing)
• What Supercomputers Do
• What Supercomputers Can’t Do
NERSC Serves the Scientific Community
NERSC Center Overview
• Funded by DOE, annual budget $38M, about 60 staff
– Traditional strategy to invest equally in the newest compute platform, staff, and other resources
• Supports open, unclassified, basic research
• Close collaborations between universities and NERSC in computer science and computational science
NERSC System Architecture
[Diagram: NERSC systems interconnected by Gigabit and Jumbo Gigabit Ethernet, 10/100 Megabit Ethernet, and ESnet (OC-48, 2400 Mbps)]
• HPPS: 12 IBM SP servers, 15 TB of cache disk, 8 STK robots, 44,000 tape slots, 20 × 200 GB drives, 60 × 20 GB drives, maximum capacity 5–8 PB, running HPSS
• IBM SP NERSC-3 "Seaborg": 6,656 processors (peak 10 TFlop/s), 7.8 TB of memory, 44 TB of disk. Ratio = (8,7)
• PDSF: 400 processors (peak 375 GFlop/s), 360 GB of memory, 35 TB of disk, Gigabit and Fast Ethernet. Ratio = (1,93)
• LBNL "Alvarez" cluster: 174 processors (peak 150 GFlop/s), 87 GB of memory, 1.5 TB of disk, Myrinet 2000. Ratio = (.6,100)
• Visualization server "escher": SGI Onyx 3400, 12 processors, 2 InfiniteReality 4 graphics pipes, 24 GB of memory, 4 TB of disk
• Symbolic manipulation server (SGI); testbeds and servers
Ratio = (RAM bytes per flop, disk bytes per flop)
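The balance ratios quoted above are simple arithmetic; a minimal Python sketch (the helper name `balance_ratio` is ours) reproduces the PDSF figure from the specs listed on this slide:

```python
def balance_ratio(peak_flops, ram_bytes, disk_bytes):
    """Return (RAM bytes per peak flop, disk bytes per peak flop)."""
    return (ram_bytes / peak_flops, disk_bytes / peak_flops)

# PDSF, as listed above: 375 GFlop/s peak, 360 GB memory, 35 TB disk
ram_per_flop, disk_per_flop = balance_ratio(375e9, 360e9, 35e12)
print(round(ram_per_flop, 1), round(disk_per_flop))  # 1.0 93
```

The result matches the slide's PDSF ratio of (1, 93): roughly one byte of RAM and 93 bytes of disk per flop of peak performance.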
NERSC Capability Plan
[Chart: planned capability in TFlop/s by year, 2005–2010, for the NERSC-3, NCS, NCSb, NERSC-5L, NERSC-6L, and cluster systems; milestone values shown include 13, 81, 202, 229, and 248 TFlop/s.]
Overview
• Introducing NERSC and Computing Sciences at Berkeley Lab
• Current Trends in Supercomputing (High-End Computing)
• What Supercomputers Do
• What Supercomputers Can’t Do
Technology Trends: Microprocessor Capability
• 2× transistors per chip every 1.5 years ("Moore's Law")
Microprocessors have become smaller, denser, and more powerful.
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.
Slide source: Jack Dongarra
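The compounding behind this trend is easy to quantify; a small sketch (function name ours) shows that doubling every 18 months yields roughly a 100× improvement per decade:

```python
def transistor_projection(t0_count, years, doubling_years=1.5):
    """Project transistor count assuming a doubling every 1.5 years."""
    return t0_count * 2 ** (years / doubling_years)

# Doubling every 18 months compounds to roughly 100x per decade:
growth_per_decade = transistor_projection(1, 10)
print(round(growth_per_decade))  # 102, i.e. ~100x per decade
```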
TOP 500 Performance Development
[Chart: TOP500 performance, 1993–2004, on a log scale from 100 Mflop/s to 1 Pflop/s. The sum of all 500 systems grew from 1.167 TFlop/s to 1.127 PFlop/s; the #1 system (N=1) grew from 59.7 GFlop/s (Fujitsu NWT, NAL) to 70.72 TFlop/s (IBM BlueGene/L); the #500 entry grew from 0.4 GFlop/s to 850 GFlop/s, roughly the performance of a 2004 laptop. Intermediate #1 systems: Intel ASCI Red (Sandia), IBM ASCI White (LLNL), NEC Earth Simulator.]
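A back-of-envelope extrapolation through the two N=1 endpoints on this chart (not the TOP500's own projection method; the function name is ours) already lands the first petaflop system in the timeframe the next slide projects:

```python
import math

def years_to_reach(perf0, perf1, t0, t1, target):
    """Fit exponential growth through two data points and extrapolate."""
    rate = math.log(perf1 / perf0) / (t1 - t0)      # growth rate per year
    return t0 + math.log(target / perf0) / rate

# N=1 from the chart: 59.7 GFlop/s (1993) -> 70.72 TFlop/s (2004)
print(round(years_to_reach(59.7e9, 70.72e12, 1993, 2004, 1e15)))  # 2008
```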
TOP 500 Performance Projection
[Chart: extrapolation of the TOP500 trend lines (SUM, N=1, N=500) from 1993 to 2015 on a log scale from 100 Mflop/s to 1 Eflop/s, with the DARPA HPCS program target marked.]
Asian Countries
[Chart: number of TOP500 systems installed in Asian countries, 1993–2004 (scale 0–150), broken out by Japan, South Korea, China, India, and others.]
Supercomputing Today
• Microprocessors have made desktop computing in 2004 what supercomputing was in 1993.
• Massive parallelism has changed the "high end" completely.
• Today, clusters of symmetric multiprocessors are the standard supercomputer architecture.
• The microprocessor revolution will continue with little attenuation for at least another 10 years.
• Continued discussion over architecture for high-end computing (custom versus commodity).
Overview
• Introducing NERSC and Computing Sciences at Berkeley Lab
• Current Trends in Supercomputing (High-End Computing)
• What Supercomputers Do
• What Supercomputers Can’t Do
What Supercomputers Do
- Introducing Computational Science and Engineering (CSE)
- Four important observations about CSE, illustrated by examples from NERSC
Simulation: The Third Pillar of Science
• Traditional scientific and engineering paradigm:
(1) Do theory or paper design.
(2) Perform experiments or build a system.
• Limitations:
– Too difficult: build large wind tunnels.
– Too expensive: build a throw-away passenger jet.
– Too slow: wait for climate or galactic evolution.
– Too dangerous: weapons, drug design, climate experimentation.
• Computational science paradigm:
(3) Use high-performance computer systems to simulate the phenomenon, based on known physical laws and efficient numerical methods.
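As a toy instance of step (3), the sketch below simulates one-dimensional heat diffusion directly from the governing physical law (the heat equation) with a simple explicit finite-difference method; all parameters are illustrative:

```python
def diffuse(u, alpha=0.1, steps=100):
    """Explicit finite-difference update for du/dt = alpha * d2u/dx2."""
    u = list(u)
    for _ in range(steps):
        u = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
             if 0 < i < len(u) - 1 else u[i]          # fixed boundaries
             for i in range(len(u))]
    return u

# A hot spot in the middle of a cold rod smooths out over time.
u0 = [0.0] * 20
u0[10] = 100.0
u = diffuse(u0)
print(max(u) < 100.0)  # True: the peak has spread into its neighbors
```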
Computational Science – Third Pillar of Science
Many programs in DOE need dramatic advances in simulation capabilities to meet their mission goals; the SciDAC program was created in 2001.
Application areas: subsurface transport; health effects and bioremediation; combustion; materials; fusion energy; components of matter; global climate.
Computational Science and Engineering (CSE)
• CSE is a widely accepted label for an evolving field concerned with the science of and the engineering of systems and methodologies to solve computational problems arising throughout science and engineering
• CSE is characterized by:
– Multi-disciplinary
– Multi-institutional
– Requiring high-end resources
– Large teams
– Focus on community software
• CSE is not "just programming" (and not CS)
• Teraflop/s computing is necessary but not sufficient
Reference: Petzold, L., et al., Graduate Education in CSE, SIAM Rev., 43 (2001), 163–177.
First Observation about CSE
1. CSE permits us to ask new scientific questions
• The increased computational capability available today lets us do more of the same (scaling to larger problems, more refinement, etc.),
• but it is most effectively used when addressing qualitatively new science questions.
High-Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
Wintertime precipitation: as model resolution becomes finer, results converge toward observations.
Tropical Cyclones and Hurricanes
Research by: Michael Wehner, Berkeley Lab, Phil Duffy, and G. Bala, LLNL
• Hurricanes are extreme events with large impacts on human and natural systems
• Characterized by high vorticity (winds), very low pressure centers, and upper air temperature warm anomalies
• Wind speeds on the Saffir-Simpson Hurricane Scale:
– Category one: 74–95 mph (64–82 kt or 119–153 km/hr)
– Category two: 96–110 mph (83–95 kt or 154–177 km/hr)
– Category three: 111–130 mph (96–113 kt or 178–209 km/hr)
– Category four: 131–155 mph (114–135 kt or 210–249 km/hr)
– Category five: >155 mph (>135 kt or >249 km/hr)
How will the hurricane cycle change as the mean climate changes?
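The Saffir-Simpson thresholds listed above translate directly into a lookup, which is essentially how storm counts are categorized in model output; a small sketch (function name ours):

```python
def saffir_simpson_category(wind_mph):
    """Map sustained wind speed (mph) to a Saffir-Simpson hurricane
    category, using the thresholds listed on the slide (0 = below
    hurricane strength)."""
    if wind_mph < 74:
        return 0
    for category, upper in ((1, 95), (2, 110), (3, 130), (4, 155)):
        if wind_mph <= upper:
            return category
    return 5

print([saffir_simpson_category(w) for w in (70, 80, 120, 160)])  # [0, 1, 3, 5]
```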
Tropical Cyclones in Climate Models
• Tropical cyclones are not generally seen in integrations of global atmospheric general circulation models at climate model resolutions (T42 ~ 300 km).
• In fact, in CCM3 at T239 (50 km), the lowest pressure attained is 995 mb. No realistic cyclones are simulated.
• However, in high resolution simulations of the finite volume dynamics version of CAM2, strong tropical cyclones are common.
Finite Volume Dynamics CAM
• Run in an "AMIP" mode:
– Specified sea surface temperature and sea ice extent
– Integrated from 1979 to 2000
• We are studying four resolutions:
– B: 2° × 2.5°
– C: 1° × 1.25°
– D: 0.5° × 0.625°
– E: 0.25° × 0.375°
• Processor configuration and cost (IBM SP3):
– B: 64 processors, 10 wall-clock hours / simulated year
– C: 160 processors, 22 wall-clock hours / simulated year
– D: 640 processors, 33 wall-clock hours / simulated year
– E: 640 processors, 135 wall-clock hours / simulated year
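The figures above imply steep cost growth with resolution; multiplying processors by wall-clock time gives CPU-hours per simulated year (the labels are ours), a roughly 135× jump from the coarsest to the finest grid:

```python
# (processors, wall-clock hours per simulated year) from the slide
runs = {
    "B (2 x 2.5 deg)":     (64,  10),
    "C (1 x 1.25 deg)":    (160, 22),
    "D (0.5 x 0.625 deg)": (640, 33),
    "E (0.25 x 0.375 deg)": (640, 135),
}
cpu_hours = {res: procs * hours for res, (procs, hours) in runs.items()}
for res, cost in cpu_hours.items():
    print(f"{res}: {cost:,} CPU-hours per simulated year")
```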
New Science Question: Hurricane Statistics
[Table: simulated annual tropical cyclone counts, 1979–1982, versus observations; values shown include >25, ~30, and 40 for the Northwest Pacific Basin and ~6, ~12, and ? for the Atlantic Basin.]
Work in progress; results to be published later this year.
What is the effect of different climate scenarios on number and severity of tropical storms?
Second Observation about CSE
2. CSE makes most progress when applied mathematics and computer science are tightly integrated into the project
• Increasing computer power alone will not give us sufficient capability to solve the most important problems
• Teraflop/s is necessary but not sufficient
Application in Combustion: Block-Structured AMR
(J. Bell and P. Colella, LBNL)
Each level is a union of rectangular patches. Each grid patch is:
• Logically structured, rectangular
• Refined in space and time by evenly dividing coarse grid cells
• Dynamically created/destroyed to track time-dependent features
• In parallel, grids are distributed based on a work estimate
[Figure: Level 0, Level 1, and Level 2 of a block-structured hierarchical grid (Berger and Colella, 1989)]
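The flag-and-refine idea behind this can be sketched in one dimension: cells where the solution jumps sharply are flagged, and only flagged regions get finer cells. Real block-structured AMR does this hierarchically in both space and time across rectangular patches; this toy (function names ours) only illustrates the refinement criterion:

```python
def flag_cells(u, threshold):
    """Flag cell i when the jump to its right neighbor is large."""
    return [abs(u[i + 1] - u[i]) > threshold for i in range(len(u) - 1)]

def refine(u, flags, ratio=2):
    """Evenly subdivide flagged coarse cells (linear interpolation)."""
    fine = []
    for i, flagged in enumerate(flags):
        if flagged:
            fine.extend(u[i] + (u[i + 1] - u[i]) * k / ratio
                        for k in range(ratio))
        else:
            fine.append(u[i])
    fine.append(u[-1])
    return fine

u = [0.0, 0.0, 0.0, 1.0, 1.0]   # a sharp front between cells 2 and 3
flags = flag_cells(u, 0.5)
print(flags)                     # only the cell at the front is flagged
print(refine(u, flags))          # extra resolution appears only there
```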
Experiment and Simulation
Experiment by R. Cheng in the LBNL combustion lab; simulations by J. Bell and M. Day, LBNL, using NERSC.
V-Flame Simulation Stats
• AMR statistics
• Run on seaborg.nersc.gov, 256 CPUs, 2 steps/hr
• In 2004, the Berkeley Lab group is the only group capable of fully detailed simulations of laboratory-scale methane flames. Groups employing traditional simulation techniques are severely limited, even on vector-parallel supercomputers.
Third Observation about CSE
3. The most promising algorithms are a poor match for today’s most popular system architectures
SciDAC Algorithm Success Story
• A general sparse solver, Parallel SuperLU, developed at Berkeley Lab by Sherry Li, has been incorporated into NIMROD
• Improvement in NIMROD execution time by a factor of five to ten on the NERSC IBM SP: "This would be the equivalent of three to five years progress in computing hardware."
• Sustained performance of sparse solvers on current architectures is less than 10% of peak
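The solver family named above is accessible today through SciPy, whose `scipy.sparse.linalg.splu` wraps the sequential SuperLU library (NIMROD uses the distributed-memory variant, SuperLU_DIST); a minimal sketch on a 1D Poisson matrix:

```python
import numpy as np
from scipy.sparse import diags, csc_matrix
from scipy.sparse.linalg import splu

n = 100
# Tridiagonal 1D Poisson operator, stored in compressed sparse column form
A = csc_matrix(diags([-1, 2, -1], [-1, 0, 1], shape=(n, n)))
b = np.ones(n)

lu = splu(A)          # sparse LU factorization via SuperLU
x = lu.solve(b)
print(np.allclose(A @ x, b))  # True
```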
Near-Term Science Breakthroughs Enabled by Computing

Nanoscience
• Goal: simulate the synthesis and predict the properties of multi-component nanosystems
• Methods: quantum molecular dynamics, quantum Monte Carlo, iterative eigensolvers, dense linear algebra, parallel 3D FFTs
• Breakthrough target: simulate nanostructures with hundreds to thousands of atoms, as well as transport, optical, and other properties

Combustion
• Goal: predict combustion processes to provide efficient, clean, and sustainable energy
• Methods: explicit finite difference, implicit finite difference, zero-dimensional physics, adaptive mesh refinement, Lagrangian particle methods
• Breakthrough target: simulate laboratory-scale flames with high-fidelity representations of the governing physical processes

Fusion
• Goal: understand high-energy-density plasmas and develop an integrated simulation of a fusion reactor
• Methods: multi-physics and multi-scale methods, particle methods, regular and irregular access, nonlinear solvers, adaptive mesh refinement
• Breakthrough target: simulate the ITER reactor

Climate
• Goal: accurately detect and attribute climate change, predict future climate, and engineer mitigation strategies
• Methods: finite difference methods, FFTs, regular and irregular access, simulation ensembles
• Breakthrough target: a full ocean/atmosphere climate model with 0.125-degree spacing, with an ensemble of 8–10 runs

Astrophysics
• Goal: determine through simulations and analysis of observational data the origin, evolution, and fate of the universe, the nature of matter and energy, and galaxy and stellar evolution
• Methods: multi-physics and multi-scale methods, dense linear algebra, parallel 3D FFTs, spherical transforms, particle methods, adaptive mesh refinement
• Breakthrough target: simulate the explosion of a supernova with a full 3D model
Science Drives Architecture
State-of-the-art computational science requires increasingly diverse and complex algorithms. Only balanced systems that can perform well on a variety of problems will meet future scientists' needs. Data-parallel and scalar performance are both important.
[Table: science areas (nanoscience, combustion, fusion, climate, astrophysics) versus required algorithm classes (multi-physics and multi-scale, dense linear algebra, FFTs, particle methods, AMR, data parallelism, irregular control flow); astrophysics requires all seven classes, and every area requires at least five.]
New Science Presents New Architecture Challenges
Future high end computing requires an architecture capable of achieving high performance across a spectrum of key state-of-the-art applications
• Data parallel algorithms do well on machines with high memory bandwidth (vector or superscalar)
• Irregular control flow requires excellent scalar performance
• Spectral and other methods require high bisection bandwidth
Scalar Performance Increasingly Important
• Cannot use dense methods for the largest systems because of their O(N³) algorithmic scaling; need sparse and adaptive methods with irregular control flow
• Complex microphysics results in complex inner loops
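A back-of-envelope flop count makes the O(N³) argument concrete: dense LU factorization costs about (2/3)N³ flops, while a sparse solve scales roughly with the number of nonzeros. The sparse constants below are illustrative assumptions (≈5 nonzeros per row, a fill/iteration factor of 100), not measured values:

```python
def dense_flops(n):
    """Approximate flop count of dense LU factorization."""
    return (2 / 3) * n ** 3

def sparse_flops_estimate(n, nnz_per_row=5, work_factor=100):
    """Rough O(nnz) model of a sparse solve; constants are illustrative."""
    return work_factor * nnz_per_row * n

for n in (10 ** 4, 10 ** 6):
    print(f"N={n:>9,}: dense ~{dense_flops(n):.1e} "
          f"vs sparse ~{sparse_flops_estimate(n):.1e} flops")
```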
“It would be a major step backward to acquire a new platform that could reach the 100 Tflop level for only a few applications that had ‘clean’ microphysics. Increasingly realistic models usually mean increasingly complex microphysics. Complex microphysics is not amenable to [simple vector operations].”
– Doug Swesty, SUNY Stony Brook
Overview
• Introducing NERSC and Computing Sciences at Berkeley Lab
• Current Trends in Supercomputing (High-End Computing)
• What Supercomputers Do
• What Supercomputers Still Can’t Do
Projected Performance Development
[Chart: the TOP500 performance projection again (SUM, N=1, N=500), 1993–2015, on a log scale from 100 Mflop/s to 1 Eflop/s, with BlueGene/L and the DARPA HPCS program target marked.]
The Exponential Growth of Computing, 1900–1998
[Chart: growth of computing power from the Hollerith Tabulator through the Bell Calculator Model 1, ENIAC, IBM 704, IBM 360 Model 75, and Cray 1 to the Pentium II PC. Adapted from Kurzweil, The Age of Spiritual Machines.]

The Exponential Growth of Computing, 1900–2100
[Chart: the same trend extrapolated through 2100. Adapted from Kurzweil, The Age of Spiritual Machines.]

Growth of Computing Power and "Mental Power"
Hans Moravec, CACM 10, 2003, pp. 90–97
Why This Simplistic View Is Wrong
• Unsuitability of current architectures
– Teraflop systems are focused on excelling in computing, only one of the six (or eight) dimensions of human intelligence
• Fundamental lack of mathematical models for cognitive processes
– That is why we are not using the most powerful computers today for cognitive tasks
• Complexity limits
– We don't even know yet how to model turbulence; how then do we model thought?

"The computer model turns out not to be helpful in explaining what people actually do when they think and perceive" – Hubert Dreyfus, p. 189
Example: one of the biggest success stories of machine intelligence, the chess computer "Deep Blue", did not teach us anything about how a chess grandmaster thinks.
Six Dimensions of Intelligence
1. Verbal-Linguistic: the ability to think in words and to use language to express and appreciate complex concepts
2. Logical-Mathematical: makes it possible to calculate, quantify, consider propositions and hypotheses, and carry out complex mathematical operations
3. Spatial: the capacity to think in and orient oneself within a physical three-dimensional environment
4. Bodily-Kinesthetic: the ability to manipulate objects and fine-tune physical skills
5. Musical: sensitivity to pitch, melody, rhythm, and tone
6. Interpersonal: the capacity to understand and interact effectively with others
Howard Gardner. Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books, 1983, 1993.
Building New Models
• About 1/3 of the human brain is probably dedicated to processing visual information
• We have only very rudimentary knowledge of the principles of human vision computing
• A research project by Don Glaser at UC Berkeley investigates the mapping from the retina to the visual cortex
– Attempts to model "optical illusions" and simple movement of objects in the visual cortex
– Current models are limited to about 10^5 neurons
– Project at NERSC in 2005
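The cited cortex models are far richer than this, but the building block of most large-scale neural simulations of this era is a simplified unit such as the leaky integrate-and-fire neuron sketched below; all parameters are illustrative, and the function name is ours:

```python
def simulate_lif(input_current, v_rest=0.0, v_thresh=1.0, tau=10.0, dt=1.0):
    """Leaky integrate-and-fire neuron: leak toward rest, integrate input,
    emit a spike and reset when the membrane potential crosses threshold."""
    v, spikes = v_rest, []
    for t, i_in in enumerate(input_current):
        v += dt * (-(v - v_rest) / tau + i_in)   # leaky integration
        if v >= v_thresh:
            spikes.append(t)
            v = v_rest                           # reset after a spike
    return spikes

# Constant supra-threshold drive produces a regular spike train.
spikes = simulate_lif([0.2] * 50)
print(spikes)  # [6, 13, 20, 27, 34, 41, 48]
```

Scaling networks of even such simple units to the ~10^5-neuron regime mentioned above is what makes this a supercomputing problem.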
Fourth Observation about CSE
4. There are vast areas of science and engineering where CSE has not even begun to make an impact
– current list of CSE applications is almost the same as 15 years ago
– current set of architectures is capturing only a small subset of human cognitive abilities
– in many scientific areas there is still an almost complete absence of computational models
See also: Y. Deng, J. Glimm, and D. H. Sharp, Perspectives on Parallel Computing, Daedalus Vol 12 (1992) 31-52.
Major Application Areas of CSE
• Science
– Global climate modeling
– Astrophysical modeling
– Biology: genomics, protein folding, drug design
– Computational chemistry
– Computational material sciences and nanosciences
• Engineering
– Crash simulation
– Semiconductor design
– Earthquake and structural modeling
– Computational fluid dynamics
– Combustion
• Business
– Financial and economic modeling
– Transaction processing, web services, and search engines
• Defense
– Nuclear weapons: test by simulations
– Cryptography
This list from 2004 is identical to a list from 1992!
Conclusions
• CSE has become well established in the US and is at the threshold of enabling significant scientific breakthroughs
• CSE permits us to ask new scientific questions
• CSE makes most progress when applied mathematics and computer science are tightly integrated
• CSE has tremendous research opportunities for computer scientists and applied mathematicians
• The most promising algorithms are a poor match for today’s most popular system architectures
• There are vast areas of science and engineering where computational modeling has not even begun to make an impact (e.g., cognitive computing)