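Symposium Groene ICT en duurzaamheid: Nieuwe energie in het hoger onderwijs (Symposium on Green ICT and Sustainability: New Energy in Higher Education)
Jezelf Groen Rekenen met Supercomputers (Computing Yourself Green with Supercomputers)
Walter Lioen, Group Leader Supercomputing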
SURFsara offers an integrated ICT research infrastructure and provides services in the areas of computing, data storage, visualization, networking, cloud and e-Science.
SARA was founded in 1971 as an Amsterdam computing center by the two Amsterdam universities (UvA and VU) and the current CWI.
Independent as of 1995.
Founded Vancis in 2008, offering ICT services and ICT products to enterprises, universities, and educational and healthcare institutions.
As of 1 January 2013, SARA (from then on SURFsara) forms part of the SURF Foundation.
First supercomputer in the Netherlands in 1984 (Control Data Cyber 205). Hosting the national supercomputer(s) ever since.
Jezelf Groen Rekenen met Supercomputers Walter Lioen January 30, 2014 2
What is a Supercomputer?
A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation.
Consequently, the specification of a supercomputer is constantly changing.
Rule of thumb: a supercomputer is at least 1,000 to 10,000, up to 100,000, times faster than an
Large-scale scientific computing
Simulation of processes that are otherwise
- Impossible in practice
- Too expensive
- Too dangerous
- Too extended
Examples
- Astronomy
  - How did the universe begin?
  - How do stars form and evolve?
- Weather Prediction, Climatology
- Nuclear Physics
- Aerodynamics (cars, planes, rockets)
- Biology (proteins, DNA, drugs)
- Medical sciences (bone formation, blood flow)
HPL, the High-Performance Linpack benchmark, solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers.
For Tianhe-2, the number 1 as of June 2013 (3,120,000 cores, 54.9 PFlop/s, 17.8 MW):
- n = 9,960,000
Computational kernel: DGEMM (matrix multiply)
- Extremely efficient on all processors (in cache)
Limiting factors:
- Speed of interconnect
- Speed to (local accelerator) memory (e.g. for GPGPU)
However, far more important: application speed
- In Amsterdam a Ferrari is useless (speed-wise)
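To give a feeling for what the HPL problem size n = 9,960,000 means, here is a rough back-of-the-envelope sketch. It assumes the operation count conventionally credited to an HPL run, 2/3 n^3 + 3/2 n^2, and it uses the 54.9 PFlop/s figure from the slide as an upper bound on the rate. The function names are illustrative only.

```python
# Back-of-the-envelope HPL figures for the Tianhe-2 run quoted on the slide.
N = 9_960_000            # matrix order n, from the slide
PEAK_FLOPS = 54.9e15     # 54.9 PFlop/s, from the slide
BYTES_PER_DOUBLE = 8     # 64-bit double precision


def hpl_matrix_bytes(n: int) -> int:
    """Memory needed to store the dense n x n coefficient matrix."""
    return BYTES_PER_DOUBLE * n * n


def hpl_flop_count(n: int) -> float:
    """Operation count conventionally credited to an HPL run of order n
    (assumption: 2/3 n^3 + 3/2 n^2)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def seconds_at_rate(n: int, flops_per_second: float) -> float:
    """Run time if the machine sustained the given rate throughout."""
    return hpl_flop_count(n) / flops_per_second


if __name__ == "__main__":
    print(f"matrix size : {hpl_matrix_bytes(N) / 1e12:.0f} TB")
    print(f"operations  : {hpl_flop_count(N):.3e} flop")
    print(f"time at peak: {seconds_at_rate(N, PEAK_FLOPS) / 3600:.1f} h")
```

Under these assumptions the matrix alone occupies roughly 0.8 PB, and the run would take more than three hours even if the machine sustained its quoted 54.9 PFlop/s the whole time.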
Top500 iPad 2 performance
An A5 processor core of an iPad 2 is as fast as a four-processor Cray 2 supercomputer (1.951 GFlop/s).
In 1985 an eight-processor Cray 2 was the fastest supercomputer in the world.
The iPad 2 would still have been listed in the Top500 of 1994.
Green500: MFlop/s / Watt
November 2013 Green500 List observations:
Ranks 1-10: (Intel Xeon + NVIDIA K20)
- commodity processors with GPGPUs (graphics processing units)
Rank 1: TSUBAME-KFC (Japan, Ivy Bridge + NVIDIA K20x)
- 4,503.17 MFlop/s / W (first time > 4 GFlop/s / W)
- An exaflop system would require 222 MW (DARPA's target is > 1 EFlop/s using < 20 MW)
Rank 4: Piz Daint (Switzerland, Cray XC30, Sandy Bridge + NVIDIA K20x)
- 3,185.91 MFlop/s / W
- the greenest petaflop supercomputer
- the current Top500 #6
Rank 12: (USA, Blue Gene/Q)
- 2,299.15 MFlop/s / W
- highest ranked non-heterogeneous (CPU only) system
Rank 40: Tianhe-2 (China, Ivy Bridge + Xeon Phi)
- 1,901.54 MFlop/s / W
- the current Top500 #1
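The 222 MW figure follows directly from the rank 1 efficiency, and the same arithmetic shows how far the listed systems are from the DARPA target. The sketch below simply scales each efficiency from the slide to 1 EFlop/s. It assumes efficiency stays constant at scale, which is an idealization, and the function names are illustrative.

```python
EXAFLOP = 1e18  # 1 EFlop/s expressed in Flop/s

# MFlop/s per watt, as listed on the slide (November 2013 Green500).
GREEN500_NOV_2013 = {
    "TSUBAME-KFC (rank 1)": 4503.17,
    "Piz Daint (rank 4)": 3185.91,
    "Blue Gene/Q (rank 12)": 2299.15,
    "Tianhe-2 (rank 40)": 1901.54,
}


def exaflop_power_mw(mflops_per_watt: float) -> float:
    """Power in MW needed for 1 EFlop/s at the given efficiency."""
    watts = EXAFLOP / (mflops_per_watt * 1e6)
    return watts / 1e6


def required_efficiency(target_flops: float, budget_watts: float) -> float:
    """Efficiency in MFlop/s per watt needed to hit a target within a budget."""
    return target_flops / budget_watts / 1e6


if __name__ == "__main__":
    for name, eff in GREEN500_NOV_2013.items():
        print(f"{name:24s} {exaflop_power_mw(eff):6.0f} MW for 1 EFlop/s")
    needed = required_efficiency(EXAFLOP, 20e6)
    print(f"needed for 1 EFlop/s in 20 MW: {needed:,.0f} MFlop/s / W")
```

Reaching 1 EFlop/s within 20 MW requires 50,000 MFlop/s / W, about eleven times the efficiency of the rank 1 system.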
SURFsara National Supercomputing History
Year  Machine                           Rpeak (GFlop/s)     kW  GFlop/s / kW
1984  CDC Cyber 205 1-pipe                          0.1    250        0.0004
1988  CDC Cyber 205 2-pipe                          0.2    250        0.0008
1991  Cray Y-MP/4128                               1.33    200        0.0067
1994  Cray C98/4256                                   4    300        0.0133
1997  Cray C916/121024                               12    500         0.024
2000  SGI Origin 3800                             1,024    300           3.4
2004  SGI Origin 3800 + Altix 3700                3,200    500           6.4
2007  IBM p575 Power5+                           14,592    375            40
2008  IBM p575 Power6                            62,566    540           116
2009  IBM p575 Power6                            64,973    560           116
2013  Bull bullx B710 (DLC) + R428              270,950    245          1106
2014  Bull bullx B515 (NVIDIA K40)             >200,000                 3333
2014  Bull bullx complete system             >1,000,000   >520         >1923
Moore's Law (1965)
The number of transistors on an integrated circuit doubles every 2 years.
Because of faster transistors, the speed doubles every 18 months.
The clock speed stopped doubling a couple of years ago.
Nowadays the number of cores doubles.
Moore noted that if car manufacturers had something like this, cars would get 100,000 miles to the gallon and it would be cheaper to buy a Rolls Royce than park it. (Cars would also be only a half an inch long.)
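As a rough check of the 18-month doubling rule against the SURFsara history table, the sketch below computes the doubling time implied by two data points from that table: the 1984 Cyber 205 (0.1 GFlop/s, 0.0004 GFlop/s / kW) and the 2013 Bull system (270,950 GFlop/s, 1106 GFlop/s / kW). It assumes smooth exponential growth between the two years, which is a simplification, and the helper names are illustrative.

```python
import math


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Doubling time implied by growth from start to end over the given span,
    assuming smooth exponential growth."""
    return years / math.log2(end / start)


def growth_factor(years: float, doubling_years: float) -> float:
    """Growth predicted by a fixed doubling time."""
    return 2.0 ** (years / doubling_years)


if __name__ == "__main__":
    span = 2013 - 1984
    # Values taken from the history table.
    t_perf = doubling_time_years(0.1, 270_950, span)
    t_eff = doubling_time_years(0.0004, 1106, span)
    print(f"Rpeak doubling time       : {12 * t_perf:.1f} months")
    print(f"GFlop/s per kW doubling   : {12 * t_eff:.1f} months")
    print(f"18-month rule over {span} y : x{growth_factor(span, 1.5):,.0f}")
    print(f"observed Rpeak growth     : x{270_950 / 0.1:,.0f}")
```

With these two data points, both peak performance and performance per kW doubled roughly every 16 months, slightly faster than the 18-month rule of thumb.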
Phase 1 (production June 2013, total peak performance 271 TFlop/s)
Direct Liquid Cooled thin node islands
- 360 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node
- 180 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node
Fat node island
- 32 fat nodes, 4 × 8-core Intel Sandy Bridge CPUs/node, 256 GB/node
Total
- 13,968 cores, 41.75 TB memory, 2.4 PB disk
- Interconnect: InfiniBand 56 Gbit/s bandwidth, 3 µs latency
- Top 500 November 2013: # 184
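The Phase 1 totals can be roughly reproduced from the node counts. The sketch below is a sanity check under two assumptions that are not on the slide: each core performs 8 double-precision flops per clock cycle (AVX on Sandy Bridge and Ivy Bridge), and the fat-node CPUs run at 2.7 GHz, a value chosen here purely for illustration because the slide gives no clock speed for them. The `Island` class is a hypothetical helper.

```python
from dataclasses import dataclass

GB_PER_TB = 1024        # the slide's 41.75 TB matches binary units
FLOPS_PER_CYCLE = 8     # assumption: 8 DP flops/cycle/core with AVX


@dataclass(frozen=True)
class Island:
    name: str
    nodes: int
    cpus_per_node: int
    cores_per_cpu: int
    mem_gb_per_node: int
    clock_ghz: float

    @property
    def cores(self) -> int:
        return self.nodes * self.cpus_per_node * self.cores_per_cpu

    @property
    def memory_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node

    @property
    def peak_gflops(self) -> float:
        return self.cores * self.clock_ghz * FLOPS_PER_CYCLE


PHASE1 = [
    Island("thin island, 360 nodes", 360, 2, 12, 64, 2.4),
    Island("thin island, 180 nodes", 180, 2, 12, 64, 2.4),
    # Clock speed is NOT given on the slide; 2.7 GHz is an assumed value.
    Island("fat island", 32, 4, 8, 256, 2.7),
]

if __name__ == "__main__":
    mem_tb = sum(i.memory_gb for i in PHASE1) / GB_PER_TB
    peak_tflops = sum(i.peak_gflops for i in PHASE1) / 1000
    print(f"memory: {mem_tb:.2f} TB")
    print(f"peak  : {peak_tflops:.0f} TFlop/s")
```

With these assumptions the memory comes out at 41.75 TB and the peak at about 271 TFlop/s, in line with the slide. The node counts as listed multiply out to 13,984 cores, slightly more than the 13,968 stated, and the sketch does not attempt to explain that difference.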
Phase 1.5 (scheduled production 2014 Q2, total peak performance ~ 470 TFlop/s)
Addition of accelerator island
- 66 nodes, 2 Intel Ivy Bridge CPUs/node, 2 NVIDIA Tesla K40 GPGPUs/node
Phase 2 (scheduled production 2014 H2, total peak performance > 1 PFlop/s)
On-demand addition of thin node islands with latest Intel Haswell CPUs
All thin compute nodes use Direct Liquid Cooling
- inlet temperature 30 °C: warm water cooling
- free cooling if outdoor temperature < 30 °C
  - in Amsterdam: 99.1% of days
- (Cartesius system) Power Usage Effectiveness 1.2 (typical PUE for cold water cooling: 1.4; air cooling: 1.6)
System requirements based on detailed usage analysis
- which user applications
- actual memory usage
- I/O profiles
Optimized price/performance
- TCO: total budget covering investment, energy, cooling, housing, and UPS (storage only)
- performance: application throughput using the 7 most relevant applications (# jobs / lifetime)
- maximization of application throughput / TCO (optimization of power-related costs vs. investment costs) left as an exercise for the vendor during the procurement
  - result: using slower processors (lower clock frequency)
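To make the PUE figures quoted above concrete: PUE is total facility power divided by IT equipment power, so the overhead for cooling and power distribution is (PUE - 1) times the IT load. The sketch below compares the three PUE values from the slide. The 245 kW IT load is an illustrative assumption borrowed from the history table (whether that figure excludes cooling is not stated), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760

# PUE values quoted on the slide.
PUE_BY_COOLING = {
    "warm water (Cartesius)": 1.2,
    "cold water (typical)": 1.4,
    "air (typical)": 1.6,
}


def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power for a given IT load and PUE."""
    return it_power_kw * pue


def overhead_power_kw(it_power_kw: float, pue: float) -> float:
    """Power spent on everything except the IT equipment itself."""
    return it_power_kw * (pue - 1.0)


def annual_overhead_mwh(it_power_kw: float, pue: float) -> float:
    """Overhead energy per year, assuming a constant load."""
    return overhead_power_kw(it_power_kw, pue) * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    it_load_kw = 245.0  # illustrative assumption, see lead-in
    for cooling, pue in PUE_BY_COOLING.items():
        print(f"{cooling:24s} PUE {pue}: "
              f"{overhead_power_kw(it_load_kw, pue):5.0f} kW overhead, "
              f"{annual_overhead_mwh(it_load_kw, pue):6.0f} MWh/year")
```

For the same IT load, a PUE of 1.6 implies three times the overhead of a PUE of 1.2.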
On demand growth
- minimizes idle time
- use of latest technology maximizes value for money
  - higher performance
  - lower energy
- (less good for Top500 ranking)
On demand growth: accelerator island (NVIDIA K40)
- Phase 1 and Phase 2 (both CPU only) are general purpose
- accelerators are more special purpose
  - can deliver more MFlop/s / Watt
  - efficient use of accelerators requires
    - suitable applications
    - investment in programming effort
- proven interest of more than 10 research groups
Scalable Hybrid Architecture
PRACE-2IP prototype: Bull System @ CSC, Finland
EU collaboration: CSC, SURFsara, CSCS
- 44 nodes with two Intel Xeon Phi 7120X co-processors
- 37 nodes with two NVIDIA K40 GPGPUs
SURFsara research topics:
Programming paradigms
- Application porting to accelerator + MPI
Energy policies
- Dynamic Voltage and Frequency Scaling (DVFS): adjust frequency and voltage of the CPU. The actual workload determines which frequency/voltage is chosen.
- Dynamic Power Management (DPM): power off when a device becomes idle. Activation temporarily uses more energy.
- Maybe a hybrid policy, e.g. a mix of DPM and DVFS, is preferable.
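Why neither policy wins outright can be illustrated with a toy energy model. The sketch below is not the prototype's actual policy. It assumes the textbook CMOS relation in which dynamic power scales with V² · f, plus a constant static power while the device is on and a fixed wake-up energy for DPM. All constants and operating points are hypothetical.

```python
from dataclasses import dataclass

C_EFF = 10.0  # hypothetical effective capacitance, in W / (V^2 * GHz)


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volt: float


def active_power_w(op: OperatingPoint, p_static_w: float) -> float:
    """Dynamic power (C * V^2 * f) plus static power while running."""
    return C_EFF * op.volt**2 * op.freq_ghz + p_static_w


def run_time_s(gcycles: float, op: OperatingPoint) -> float:
    """Time to execute the given number of giga-cycles at this frequency."""
    return gcycles / op.freq_ghz


def energy_dvfs_j(gcycles, op, period_s, p_static_w):
    """DVFS: run at a (lower) operating point, stay powered on when done."""
    t = run_time_s(gcycles, op)
    if t > period_s:
        raise ValueError("work does not fit in the period at this frequency")
    return active_power_w(op, p_static_w) * t + p_static_w * (period_s - t)


def energy_dpm_j(gcycles, op, p_static_w, e_wake_j):
    """DPM: run, then power off; pay the wake-up energy once."""
    return active_power_w(op, p_static_w) * run_time_s(gcycles, op) + e_wake_j


if __name__ == "__main__":
    fast = OperatingPoint(freq_ghz=2.4, volt=1.0)   # hypothetical
    slow = OperatingPoint(freq_ghz=1.2, volt=0.8)   # hypothetical
    for p_static in (5.0, 20.0):
        dvfs = energy_dvfs_j(100, slow, 100.0, p_static)
        dpm = energy_dpm_j(100, fast, p_static, e_wake_j=50.0)
        print(f"static {p_static:4.1f} W: DVFS {dvfs:7.1f} J, DPM {dpm:7.1f} J")
```

In this model the better policy flips with the static power: when it is low, running slowly at a lower voltage uses less energy, and when it is high, finishing fast and powering off does. That sensitivity is one reason a hybrid policy can be attractive.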
Measuring Energy Consumption of Applications
MRA Cluster Green Software
- SEFLab: Software Energy Footprint Lab (founded by SIG and HvA) R&D project
- SURFsara one of the seven partners
Provide insight into energy consumption
- Total consumption after run
- Consumption during run (time curve)
Using sensors in modern CPUs (RAPL)
- CPU cores
- Memory controller
- PAPI to read hardware counters
- Correlate with performance measurements (Flop/s/Watt)
Using sensors on node (IPMI)
- Memory
- Disk drives
- Network card
Use SLURM (Cartesius batch system)
- Link with resource manager
- Energy consumption in job report
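As an illustration of the "total consumption after run" measurement, the sketch below reads the RAPL package energy counter before and after a piece of work. It does not use PAPI as the project does. Instead it assumes a Linux system that exposes RAPL through the powercap sysfs interface at `/sys/class/powercap/intel-rapl:0`, where reading the counter may require elevated privileges. The `measure` helper is hypothetical, and the wrap-around handling allows for at most one counter wrap.

```python
import time
from pathlib import Path

# Assumption: Linux powercap interface, CPU package 0.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path: Path) -> int:
    """Read a microjoule value from a sysfs file."""
    return int(path.read_text().strip())


def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Energy between two counter readings, allowing one wrap-around."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    return end_uj + max_range_uj - start_uj


def measure(func, rapl_dir: Path = RAPL_DIR):
    """Run func() and return (result, joules, seconds) for CPU package 0."""
    counter = rapl_dir / "energy_uj"
    max_range = read_uj(rapl_dir / "max_energy_range_uj")
    e0, t0 = read_uj(counter), time.perf_counter()
    result = func()
    e1, t1 = read_uj(counter), time.perf_counter()
    return result, energy_delta_uj(e0, e1, max_range) / 1e6, t1 - t0


if __name__ == "__main__":
    try:
        _, joules, seconds = measure(lambda: sum(i * i for i in range(10**7)))
        print(f"{joules:.2f} J in {seconds:.2f} s "
              f"(average {joules / seconds:.1f} W)")
    except OSError as err:
        print(f"RAPL counters not readable on this system: {err}")
```

Sampling the same counter periodically during the run, instead of only at the start and end, yields the time curve mentioned above.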
Prof. dr. ir. Bendiks Jan Boersma (TU Delft)
Studies
- conversion of heat into work or movement
- conversion of movement into electricity
- interaction between liquids and their environment
- fluid mechanics
Lower resistance in pipe networks using agents, polymers or chemicals
- Gasunie during cold winters
- Oil companies: Trans-Alaska pipeline
- Drilling of oil wells
- Fire fighting in situations where the water must be sprayed twice as high or far
Tw