How fast are fast computers?
Xing Cai, October 26, 1998
Overview
• Modern fast computers at a glance
• Fast computers & scientific computing
• A closer look at SC performance
• Current situation & future trends
• Concluding remarks
An indirect answer
The slowest fast computer is faster than the fastest slow computer.
• Performance ranking of the world's 500 most powerful computers
• LINPACK benchmark (floating-point intensive)
• J. Dongarra, H. Meuer, E. Strohmaier
• Reported every 6 months since June 1993
• A useful correction to peak performance figures
http://www.top500.org
[Chart with a performance axis spanning KFlops, MFlops, GFlops and TFlops]
http://www.top500.org
Rank  Vendor/Type      Rmax (GFlops)  Rpeak (GFlops)  #Proc  Location        Field
1     Intel ASCI Red   1,338          1,830           9,152  Sandia Lab, US  Research
2     SGI T3E 1200     891.5          1,296           1,080  Government, US  Classified
3     SGI T3E 900      634.2          1,123           1,248  Government, US  Classified
4     SGI T3E 900      450.5          756             840    UK MET          Research
5     SGI T3E          448.6          614.4           1,024  NASA            Research
6     Hitachi/Tsukuba  368.2          614.4           2,048  Univ. Tsukuba   Academic
7     SGI T3E          342.8          470.4           784    MPG, Germany    Research
120   SGI Origin2000   40.25          49.92           128    UiB, Norway     Academic
135   SGI T3E          38.58          52.80           88     NTNU, Norway    Academic
(TOP500 list, 6/98)
ASCI Red TFLOPS
85 cabinets, 9216 Intel Pentium Pro processors
http://www.sandia.gov/ASCI/Red/main.html
Some “high-end” computers
• SGI Cray T3E 1200
• SGI Cray Origin 2000
• Fujitsu VPP 700
• NEC SX-4
• IBM RS/6000 SP
Vendor overview
http://www.netlib.org/utk/people/JackDongarra/top500-698/
Vendor overview
http://www.netlib.org/utk/people/JackDongarra/top500-698/
Scientific computing: 50 years
ENIAC - world’s 1st electronic computer for scientific computing
Advance in hardware
• Rapid advance of microprocessor technology
• World’s most powerful computer
  – ENIAC: 330 Flops, 1946
  – Digital Alpha-21164 processor: 1.2 GFlops, 1997
• World’s most powerful computing site
  – ONR: 583.73 KFlops, 1956
  – NSA: 4,088.76 GFlops, 1998-Oct-14
http://www.cnct.com/~gunter
“If the car industry had made equal progress, you could buy a car for a few dollars, drive across the US in a few minutes, and park it in your pocket!”
Scientific computing today
http://www.psc.edu/science/projects.html
Earth & environment
DNA modelling & medical research
Grand challenge
“Fundamental problem in science or engineering, with potentially broad economic, political and/or scientific impact, that could be advanced by applying high performance computing resources.”
Keyword: simulation
Numerical simulation
[Diagram: physical phenomenon, mathematical model, algorithm, software, hardware]
3rd paradigm of science!
Advance in numerics
• Solution of Poisson’s equation $-\nabla^2 u = f$
  – leads to a linear system $Ax = b$ with a sparse matrix $A$
• For “standard” size $n = 10^6$ (100x100x100 grid)
  – Multigrid: 14.42 seconds, 56 MBytes
  – Banded LU: 232.96 days, 160 GBytes
• Operation count
  – Banded LU: $O(n^{7/3})$
  – Jacobi: $O(n^2)$
  – Conj. Grad.: $O(n^{3/2})$
  – Multigrid: $O(n)$
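As a rough illustration of what the exponents above mean, the sketch below compares each method's operation count with multigrid's at $n = 10^6$. It assumes every method has the same constant factor, which real solvers do not, so only orders of magnitude are meaningful. The exponents alone predict about $10^8$ times more work for banded LU, while the timings on the slide (232.96 days vs 14.42 seconds) differ by about $1.4 \times 10^6$, so constant factors clearly matter too.

```python
# Work of each Poisson solver relative to multigrid, using only the
# asymptotic exponents from the slide. Assumption: identical constant
# factors for all methods (not true of real implementations).
EXPONENTS = {
    "Banded LU": 7.0 / 3.0,
    "Jacobi": 2.0,
    "Conj. Grad.": 1.5,
    "Multigrid": 1.0,
}


def work_relative_to_multigrid(n):
    """Return {method: n**p / n} for n unknowns."""
    return {name: n ** p / n for name, p in EXPONENTS.items()}


for name, ratio in work_relative_to_multigrid(10 ** 6).items():
    print(f"{name:12s} {ratio:8.1e} x multigrid")
```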
How fast (and big) should fast computers be?
Global weather prediction
• Navier-Stokes equations on a 3D grid covering the earth
• 100 m cells, 100 levels: $5 \times 10^{12}$ cells
• 5 variables per cell: 200 TBytes
• 100 Flops/cell/minute
• Required performance: 8 TFlops
There is never enough computing power?
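The weather estimate is back-of-envelope arithmetic, reproduced below as a sketch. The earth's surface area (about $5.1 \times 10^{14}$ m²) and 8-byte floating-point variables are assumptions not stated on the slide, and "100 Flops/cell/minute" is read as 100 Flops per cell for every minute of run time. The result is about $5.1 \times 10^{12}$ cells, about 204 TBytes and about 8.5 TFlops, which round to the slide's figures.

```python
# Back-of-envelope sizing for the global weather model.
EARTH_SURFACE_M2 = 5.1e14   # assumption: approximate surface area of the earth, m^2
CELL_EDGE_M = 100.0         # horizontal cell size from the slide
LEVELS = 100                # vertical levels from the slide
VARS_PER_CELL = 5           # variables stored per cell
BYTES_PER_VAR = 8           # assumption: 64-bit floating-point values
FLOPS_PER_CELL_PER_MIN = 100


def weather_requirements():
    """Return (cells, memory in bytes, sustained Flops per second)."""
    columns = EARTH_SURFACE_M2 / CELL_EDGE_M ** 2
    cells = columns * LEVELS
    memory_bytes = cells * VARS_PER_CELL * BYTES_PER_VAR
    flops_per_second = cells * FLOPS_PER_CELL_PER_MIN / 60.0
    return cells, memory_bytes, flops_per_second


cells, memory, rate = weather_requirements()
print(f"cells:  {cells:.1e}")
print(f"memory: {memory / 1e12:.0f} TBytes")
print(f"speed:  {rate / 1e12:.1f} TFlops")
```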
Electrical potential depolarization in the human heart
• Grid node spacing: 2 nodes/mm
• Estimated 3D grid: 4,200,000 nodes
• Estimated CPU time on one processor
  – CPU time per node: 3.3 seconds
  – total: 4,200,000 x 3.3 s = 160 days
• Elapsed physical time: 300 ms
http://www.ifi.uio.no/~xingca/HEART/
We need parallel computing
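The heart estimate can be checked, and extended to several processors, with the sketch below. The parallel figures assume perfect speedup (no communication, no load imbalance, no serial parts), which is an optimistic assumption added here, not a claim from the talk. Under it, 128 processors would still need about 30 hours.

```python
SECONDS_PER_DAY = 86_400


def serial_days(nodes, cpu_seconds_per_node):
    """Single-processor CPU time, in days."""
    return nodes * cpu_seconds_per_node / SECONDS_PER_DAY


def ideal_parallel_days(nodes, cpu_seconds_per_node, processors):
    """Wall-clock days on `processors` CPUs, assuming perfect speedup
    (no communication cost, no load imbalance, no serial parts)."""
    return serial_days(nodes, cpu_seconds_per_node) / processors


NODES = 4_200_000
CPU_PER_NODE = 3.3  # seconds, from the slide

print(f"1 processor: {serial_days(NODES, CPU_PER_NODE):.0f} days")
for p in (16, 128, 1024):
    hours = ideal_parallel_days(NODES, CPU_PER_NODE, p) * 24
    print(f"{p} processors: {hours:.1f} hours (ideal)")
```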
Parallel computing
• We are approaching the limit of single-microprocessor performance
• We want to run larger simulations
• We want shorter simulation time
• More cost-effective computing
Oil reservoir simulation
Simulation of 1000 days of gas injection
• Single-processor workstation simulation
  – one day for 80,000 unknowns
  – 10 days for 800,000 unknowns
  – 200 days for 32,000,000 unknowns (impossible)
• Efficient parallel computing
  – 128-processor IBM SP
  – 23 minutes for 32,000,000 unknowns (PETSc)
Importance of efficient parallel computing!
http://www.mcs.anl.gov/petsc/petsc.html
Main question
The actual performance of real-life SC applications is well below peak performance. Why?
LINPACK benchmark revisited
• Direct solution of dense matrix systems
• Limited application in SC
• Simple data structure
• Close to an artificial test problem
• Only a more realistic upper bound on achievable performance: real applications can expect about 20% of the reported performance
Characteristics of SC
• Data-intensive computing
  – 1 GFlops requires a memory bandwidth of 24 GB/s (example: DAXPY)
  – Memory hierarchy
• Complex data structures
  – Sparse matrices
  – Structured vs. unstructured grids
  – Adaptive grid refinement
• Communication & synchronization
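The DAXPY bandwidth figure follows from counting memory traffic per element, as in the sketch below. It assumes 8-byte doubles and no cache reuse: each element loads `x[i]` and `y[i]` and stores `y[i]`, 24 bytes in total. With the multiply and the add counted as two operations the sketch gives 12 GB/s per GFlops; counting each multiply-add pair as one operation (`flops_per_element=1`) gives the 24 GB/s quoted on the slide. Either way, every operation needs fresh data from memory, so DAXPY is limited by memory bandwidth rather than by the processor.

```python
def daxpy(a, x, y):
    """BLAS-style DAXPY: y <- a*x + y, updating y in place."""
    for i in range(len(y)):
        y[i] += a * x[i]
    return y


def daxpy_bandwidth(flops_per_second, flops_per_element=2, bytes_per_element=24):
    """Memory traffic (bytes/s) needed to sustain a Flop rate in DAXPY.

    Each element loads x[i] and y[i] and stores y[i]: 3 x 8 bytes = 24 bytes,
    with no reuse from cache. The default counts the multiply and the add
    as two separate floating-point operations.
    """
    elements_per_second = flops_per_second / flops_per_element
    return elements_per_second * bytes_per_element


print(daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
print(f"{daxpy_bandwidth(1e9) / 1e9:.0f} GB/s per GFlops (2 flops/element)")
print(f"{daxpy_bandwidth(1e9, flops_per_element=1) / 1e9:.0f} GB/s per GFlops (1 op/element)")
```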
Multigrid method
• Well suited for large sparse systems
  – asymptotically optimal operation count
  – fewer than 100 floating-point operations per unknown
• Complex data structure
• Relatively low performance
Stals & Rüde: Techniques for improving the data locality of iterative methods
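The slide lists multigrid's properties without showing the algorithm. Below is a minimal sketch of one V-cycle for the 1D model problem $-u'' = f$ on (0, 1) with zero boundary values and $n = 2^k - 1$ interior points. It assumes weighted Jacobi smoothing ($\omega = 2/3$), full-weighting restriction and linear interpolation, which are common textbook choices and not necessarily those behind the timings in this talk; all function names are hypothetical. Even in 1D, the hierarchy of grids makes the data handling more involved than LINPACK's single dense matrix.

```python
import math


def residual(u, f, h):
    """Return r = f - A u, where A = tridiag(-1, 2, -1) / h**2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps (in place); damps oscillatory error."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi


def restrict(r):
    """Full weighting: 2m + 1 fine-grid values -> m coarse-grid values."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation: m coarse-grid values -> 2m + 1 fine-grid values."""
    m = len(e)
    out = [0.0] * n_fine
    for j in range(m):
        out[2 * j + 1] = e[j]               # points shared with the coarse grid
    for i in range(0, n_fine, 2):           # points between coarse points
        left = e[i // 2 - 1] if i // 2 >= 1 else 0.0
        right = e[i // 2] if i // 2 < m else 0.0
        out[i] = 0.5 * (left + right)
    return out


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; returns the improved approximation."""
    if len(u) == 1:
        return [0.5 * h * h * f[0]]         # coarsest grid: solve 2u/h^2 = f
    smooth(u, f, h, pre)
    r_coarse = restrict(residual(u, f, h))
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h, pre, post)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse, len(u)))]
    smooth(u, f, h, post)
    return u


def solve_poisson(k=6, cycles=10):
    """Solve -u'' = pi^2 sin(pi x); the exact solution is sin(pi x)."""
    n = 2 ** k - 1
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]
    u = [0.0] * n
    history = []                            # max-norm residual after each cycle
    for _ in range(cycles):
        u = v_cycle(u, f, h)
        history.append(max(abs(r) for r in residual(u, f, h)))
    error = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
    return history, error
```

Calling `solve_poisson()` should show the residual shrinking by a roughly constant factor per cycle, independent of the grid size, while the remaining error against $\sin(\pi x)$ is dominated by the $O(h^2)$ discretization error.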
Architecture bottleneck
• Imbalance between processor speed and memory access speed
  – Processor speed: annual increase >= 60%
  – Memory access speed: annual increase 5%-10%
• Inter-processor communication latency & bandwidth
• Memory size
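The growth rates above compound, as the sketch below shows. It assumes the rates stay constant over the period, which is a simplification. With processors improving 60% per year and memory 5% to 10% per year, the processor-memory gap widens by a factor of roughly 40 to 70 in ten years.

```python
def processor_memory_gap(years, cpu_growth, mem_growth):
    """Factor by which processor speed outgrows memory access speed
    after `years` of compound annual growth at the given rates."""
    return ((1.0 + cpu_growth) / (1.0 + mem_growth)) ** years


for mem in (0.05, 0.10):
    gap = processor_memory_gap(10, cpu_growth=0.60, mem_growth=mem)
    print(f"memory +{mem:.0%}/year: gap grows {gap:.0f}x in 10 years")
```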
SC software today
• Inefficient (not very cache-aware)
• Not very portable
• Not very easy to maintain
• Not very user-friendly
• Hard to program real-life applications
• Limited automatic parallelization by compilers
  – Hard to program parallel codes
O-O numerical software
• Better representation of mathematics
• Effective use of manpower
• Stable code, easy maintenance
• Good flexibility & extensibility
• Structured & efficient parallelization
• Needs care for efficiency
• Standard is not settled yet
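To illustrate the O-O idea, the sketch below separates an abstract operator interface from a compressed sparse row (CSR) storage class, so a generic conjugate gradient solver works with any operator that can compute $y = Ax$. The names `LinearOperator`, `CSRMatrix`, `laplacian_1d` and `conjugate_gradient` are hypothetical and not taken from any particular library; this is a sketch of the design, not a tuned implementation.

```python
import math
from abc import ABC, abstractmethod


class LinearOperator(ABC):
    """Abstract operator: a solver only needs y = A x, not the storage."""

    @property
    @abstractmethod
    def size(self):
        """Number of rows (and columns)."""

    @abstractmethod
    def apply(self, x):
        """Return A x as a new list."""


class CSRMatrix(LinearOperator):
    """Sparse matrix in compressed sparse row (CSR) format."""

    def __init__(self, n, row_ptr, col_idx, values):
        self.n = n
        self.row_ptr = row_ptr    # row i occupies entries row_ptr[i]:row_ptr[i+1]
        self.col_idx = col_idx    # column index of each stored entry
        self.values = values      # value of each stored entry

    @property
    def size(self):
        return self.n

    def apply(self, x):
        y = [0.0] * self.n
        for i in range(self.n):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                y[i] += self.values[k] * x[self.col_idx[k]]
        return y


def laplacian_1d(n):
    """CSR matrix for tridiag(-1, 2, -1) of order n."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return CSRMatrix(n, row_ptr, col_idx, values)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(op, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients for a symmetric positive definite LinearOperator.

    Uses only the abstract interface; returns (x, iterations).
    """
    x = [0.0] * op.size
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)
    rr = dot(r, r)
    for it in range(max_iter):
        if math.sqrt(rr) < tol:
            return x, it
        ap = op.apply(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * apk for ri, apk in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pk for ri, pk in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

For example, `conjugate_gradient(laplacian_1d(50), [1.0] * 50)` solves a small 1D Laplacian system. Switching to another storage scheme, or to a matrix-free operator, only requires another `LinearOperator` subclass; the solver stays untouched.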
Trend in architecture
http://www.netlib.org/utk/people/JackDongarra/top500-698/
Trend in CPU technology
http://www.netlib.org/utk/people/JackDongarra/top500-698/
Future trends
• Progress of semiconductor technology
  – over $10^9$ transistors per chip in the future
  – increased on-chip parallelism
• Architecture changes are needed
• Impact on scientific computing
  – Rüde: Technological trends and their impact on the future of supercomputers
• Different levels of parallelism
Metacomputing
• Demand for enormous computing power
  – US Air Force battle simulation (8 US supercomputing centers)
  – Unicore project (linking supercomputers in Germany and the US)
• Better utilization of idle computing power
• “Seamless web”: heterogeneous computing
• Need a balanced system connected by high-speed networks
• Need a scalable distributed operating system
Supercomputers in future
• ASCI Option White: IBM, 10 TFlops
• 100 TFlops computers in the near future
• Petaflops ($10^{15}$ Flops)
  – 10,000-1,000,000 processors
  – feasible and “affordable” in 2010?
Some observations
• HPSC is a small but exciting field
• Supercomputers adopt commodity technology
• Affordable parallel systems are available
  – SMP, distributed shared memory
  – clusters of shared memory machines
  – parallel computing standards appearing
• The scientific software industry is still in its early stage
Challenges for SC
• Numerics
  – faster algorithms
  – good data locality
  – low communication requirements
• Software
  – efficient (performance, manpower)
  – high-level problem solving environments
• Hardware
  – changes of architecture
Some citations
‘Intentions of the scientific users strongly differ from the industrial users.’
Ulrich Trottenberg, GMD
‘There’s a future for high-performance parallel computing out there.’
Tony Hey, Univ. Southampton
‘Allow datastructures and algorithms to guide us to the appropriate architecture.’
John Vrolyk, SGI senior vice president
The whole picture
We are in the same boat...
[Diagram: supercomputer vendor, scientific computing, industry, government, general public?]
Concluding remarks
• Huge potential of scientific computing
• More real-life applications to come
• Growing demand for computing power
• Scientific computing needs advances in
  – numerical algorithms
  – software technology
  – hardware
Quiz
What was the world’s fastest computer on June 2nd, 1998?
‘It was a HP notebook used on Space shuttle “Discovery” to compute orbital position. The speed was 17,500 mph.’
Jack Dongarra