Jezelf Groen Rekenen met Supercomputers - SURF · Jezelf Groen Rekenen met Supercomputers –...

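Symposium on Green ICT and Sustainability: New Energy in Higher Education (Symposium Groene ICT en duurzaamheid: Nieuwe energie in het hoger onderwijs)

Jezelf Groen Rekenen met Supercomputers (Calculating Yourself Green with Supercomputers)

Walter Lioen <[email protected]>, Group Leader Supercomputing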

About SURFsara

•  SURFsara offers an integrated ICT research infrastructure and provides services in the areas of computing, data storage, visualization, networking, cloud and e-Science.

•  SARA was founded in 1971 as an Amsterdam computing center by the two Amsterdam universities (UvA and VU) and the current CWI.

•  Independent as of 1995.

•  Founded Vancis in 2008, offering ICT services and ICT products to enterprises, universities, and educational and healthcare institutions.

•  As of 1 January 2013, SARA – from then on SURFsara – forms part of the SURF Foundation.

•  First supercomputer in The Netherlands in 1984 (Control Data Cyber 205). Hosting the national supercomputer(s) ever since.

Jezelf Groen Rekenen met Supercomputers – Walter Lioen, January 30, 2014

What is a Supercomputer?

•  A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation

•  Consequently, the specification of a supercomputer is constantly changing

•  Rule of thumb: a supercomputer is at least 1,000 to 10,000, and up to 100,000, times faster than an average PC


Why supercomputing?

•  Large scale scientific computing: simulation of processes that are otherwise
-  Impossible in practice
-  Too expensive
-  Too dangerous
-  Too extended

•  Examples
-  Astronomy
   -  How did the universe begin?
   -  How do stars form and evolve?
-  Weather Prediction, Climatology
-  Nuclear Physics
-  Aerodynamics (cars, planes, rockets)
-  Biology (proteins, DNA, drugs)
-  Medical sciences (bone formation, blood flow)


Top500: PFlop/s

•  HPL, the High-Performance Linpack benchmark, solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers

•  For Tianhe-2, nr. 1 as of June 2013 (3,120,000 cores, 54.9 PFlop/s peak, 17.8 MW):
-  matrix dimension n = 9,960,000

•  Computational kernel: DGEMM (matrix multiply)

•  Extremely efficient on all processors (in cache)

•  Limiting factors:
-  Speed of interconnect
-  Speed to (local accelerator) memory (for e.g. GPGPU)

•  However, far more important: application speed

•  “In Amsterdam a Ferrari is useless (speed-wise)”
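To give a feel for the size of the Tianhe-2 HPL run, the following is a rough sketch, not part of the original slides. It assumes the customary HPL operation count of 2/3·n³ + 3/2·n² and treats the 54.9 PFlop/s quoted above as the theoretical peak rate, so the runtime it prints is only a lower bound. The function names are illustrative.

```python
# Back-of-the-envelope sizing of one HPL run: a dense n x n system in
# double precision (8 bytes per matrix element).
# Assumption: the customary HPL operation count 2/3*n^3 + 3/2*n^2.

def hpl_matrix_bytes(n: int) -> int:
    """Storage for the dense n x n coefficient matrix in double precision."""
    return 8 * n * n

def hpl_flops(n: int) -> float:
    """Floating-point operations counted for solving the system."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2

def min_runtime_hours(n: int, rate_flops_per_s: float) -> float:
    """Runtime if the machine sustained rate_flops_per_s throughout."""
    return hpl_flops(n) / rate_flops_per_s / 3600.0

n = 9_960_000        # matrix dimension from the slide
peak = 54.9e15       # 54.9 PFlop/s, taken here as the peak rate

print(f"matrix storage : {hpl_matrix_bytes(n) / 1e12:.0f} TB")
print(f"operations     : {hpl_flops(n):.3e} flop")
print(f"runtime at peak: {min_runtime_hours(n, peak):.1f} h (lower bound)")
```

Under these assumptions the matrix alone occupies roughly 794 TB and the run takes more than three hours even at peak speed, which illustrates why interconnect and memory speed become limiting factors.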


Top500 – iPad 2 performance

•  An A5 processor core of an iPad 2 is as fast as a four processor Cray 2 supercomputer (1.951 GFlop/s)

•  In 1985 an eight processor Cray 2 was the fastest supercomputer in the world

•  The iPad 2 would still have been listed in the Top500 of 1994


Green500: MFlop/s / Watt

November 2013 Green500 List observations:

•  Rank 1 – 10: (Intel Xeon + NVIDIA K20)
-  commodity processors with GPGPUs (graphics processing units)

•  Rank 1: TSUBAME-KFC (Japan, Ivy Bridge + NVIDIA K20x)
-  4,503.17 MFlop/s / W (first time > 4 GFlop/s / W)
-  At this efficiency, an exaflop system would require 222 MW (DARPA’s target is > 1 EFlop/s using < 20 MW)

•  Rank 4: Piz Daint (Switzerland, Cray XC30, Sandy Bridge + NVIDIA K20x)
-  3,185.91 MFlop/s / W
-  the greenest petaflop supercomputer
-  the current Top500 #6

•  Rank 12: (USA, Blue Gene/Q)
-  2,299.15 MFlop/s / W
-  highest ranked non-heterogeneous (CPU only) system

•  Rank 40: Tianhe-2 (China, Ivy Bridge + Xeon Phi)
-  1,901.54 MFlop/s / W
-  the current Top500 #1
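The 222 MW figure follows directly from the rank 1 efficiency. The short sketch below (function and variable names are illustrative, not from the slides) extrapolates each listed efficiency to 1 EFlop/s and computes the efficiency a 20 MW exaflop system would need. It assumes efficiency stays constant when scaling up, which is optimistic.

```python
# Extrapolate Green500 efficiencies (MFlop/s per watt) to an exaflop system.
# Assumption: efficiency is unchanged when the system is scaled up.

EXAFLOP = 1e18  # flop/s

green500_nov2013 = {          # values as quoted on the slide
    "TSUBAME-KFC (rank 1)": 4503.17,
    "Piz Daint (rank 4)": 3185.91,
    "Blue Gene/Q (rank 12)": 2299.15,
    "Tianhe-2 (rank 40)": 1901.54,
}

def exaflop_power_mw(mflops_per_watt: float) -> float:
    """Power in MW needed to reach 1 EFlop/s at the given efficiency."""
    watts = EXAFLOP / (mflops_per_watt * 1e6)
    return watts / 1e6

def required_efficiency(target_flops: float, power_budget_w: float) -> float:
    """Efficiency in MFlop/s per watt needed to hit a target within a budget."""
    return target_flops / power_budget_w / 1e6

for name, eff in green500_nov2013.items():
    print(f"{name:24s} {exaflop_power_mw(eff):6.0f} MW for 1 EFlop/s")

needed = required_efficiency(EXAFLOP, 20e6)
print(f"20 MW budget needs {needed:,.0f} MFlop/s / W")
```

A 20 MW budget corresponds to 50,000 MFlop/s / W, about eleven times the efficiency of the November 2013 rank 1 system.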


SURFsara National Supercomputing History


Year  Machine                       Rpeak (GFlop/s)     kW  GFlop/s / kW
1984  CDC Cyber 205 1-pipe                      0.1    250        0.0004
1988  CDC Cyber 205 2-pipe                      0.2    250        0.0008
1991  Cray Y-MP/4128                           1.33    200        0.0067
1994  Cray C98/4256                               4    300        0.0133
1997  Cray C916/121024                           12    500         0.024
2000  SGI Origin 3800                         1,024    300           3.4
2004  SGI Origin 3800 + Altix 3700            3,200    500           6.4
2007  IBM p575 Power5+                       14,592    375            40
2008  IBM p575 Power6                        62,566    540           116
2009  IBM p575 Power6                        64,973    560           116
2013  Bull bullx B710 (DLC) + R428          270,950    245          1106
2014  Bull bullx B515 (NVIDIA K40)         >200,000    <60         >3333
2014  Bull bullx complete system         >1,000,000   >520         >1923
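The last column is simply peak performance divided by power draw. The sketch below recomputes it for the rows with exact figures (the 2014 rows only give bounds and are left out). The data structure and names are illustrative.

```python
# Recompute the "GFlop/s / kW" column from Rpeak and power.
# Rows with only bounds (the 2014 entries) are omitted.

history = [
    # (year, machine, Rpeak in GFlop/s, power in kW)
    (1984, "CDC Cyber 205 1-pipe", 0.1, 250),
    (1988, "CDC Cyber 205 2-pipe", 0.2, 250),
    (1991, "Cray Y-MP/4128", 1.33, 200),
    (1994, "Cray C98/4256", 4, 300),
    (1997, "Cray C916/121024", 12, 500),
    (2000, "SGI Origin 3800", 1024, 300),
    (2004, "SGI Origin 3800 + Altix 3700", 3200, 500),
    (2007, "IBM p575 Power5+", 14592, 375),
    (2008, "IBM p575 Power6", 62566, 540),
    (2009, "IBM p575 Power6", 64973, 560),
    (2013, "Bull bullx B710 (DLC) + R428", 270950, 245),
]

def gflops_per_kw(rpeak_gflops: float, power_kw: float) -> float:
    """Energy efficiency as peak GFlop/s delivered per kW of power."""
    return rpeak_gflops / power_kw

efficiency = {year: gflops_per_kw(rpeak, kw) for year, _, rpeak, kw in history}

for year, machine, rpeak, kw in history:
    print(f"{year}  {machine:30s} {efficiency[year]:10.4f} GFlop/s / kW")

gain = efficiency[2013] / efficiency[1984]
print(f"efficiency gain 1984 -> 2013: about {gain:,.0f}x")
```

Computed this way, the 2007 row comes out at roughly 39 GFlop/s / kW, which the slide appears to round to 40.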


Moore’s Law (1965)

•  The number of transistors on an integrated circuit doubles every 2 years

•  Because of faster transistors, the speed doubles every 18 months

•  The clock speed stopped doubling a couple of years ago

•  Nowadays the number of cores doubles

• Moore noted that if car manufacturers had something like this, cars would get 100,000 miles to the gallon and it would be cheaper to buy a Rolls Royce than park it. (Cars would also be only a half an inch long.)
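As a small worked example of what a fixed doubling period means, the sketch below compares the "doubles every 18 months" rule with the growth in peak performance between the 1984 and 2013 systems in the national supercomputing history listed earlier (0.1 and 270,950 GFlop/s). The helper names are illustrative, and comparing a whole-system peak with a per-chip rule is only indicative.

```python
import math

def growth_factor(years: float, doubling_period_years: float) -> float:
    """Growth after `years` if the quantity doubles every doubling period."""
    return 2.0 ** (years / doubling_period_years)

def implied_doubling_period(v_start: float, v_end: float, years: float) -> float:
    """Doubling period (in years) that turns v_start into v_end over `years`."""
    return years / math.log2(v_end / v_start)

years = 2013 - 1984
predicted = growth_factor(years, 1.5)                     # 18-month rule
observed = 270950 / 0.1                                   # Rpeak ratio
period = implied_doubling_period(0.1, 270950, years)

print(f"18-month rule over {years} years: about {predicted:,.0f}x")
print(f"observed Rpeak growth        : about {observed:,.0f}x")
print(f"implied doubling period      : {period * 12:.1f} months")
```

The national systems grew somewhat faster than the 18-month rule, which is plausible because each new system also added more processors, not just faster ones.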


Cartesius – specs

Phase 1 (production June 2013, total peak performance 271 TFlop/s)

•  Direct Liquid Cooled thin node islands
-  360 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node
-  180 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node

•  Fat node island
-  32 fat nodes, 4 × 8-core Intel Sandy Bridge CPUs/node, 256 GB/node

•  Total
-  13,968 cores, 41.75 TB memory, 2.4 PB disk
-  Interconnect: InfiniBand 56 Gbit/s bandwidth, 3 µs latency
-  Top500 November 2013: #184

Phase 1.5 (scheduled production 2014 Q2, total peak performance ~ 470 TFlop/s)

•  Addition of accelerator island
-  66 nodes, 2 × Intel Ivy Bridge CPUs/node, 2 × NVIDIA Tesla K40 GPGPUs/node

Phase 2 (scheduled production 2014 H2, total peak performance > 1 PFlop/s)

•  On-demand addition of thin node islands with latest Intel Haswell CPUs
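Some of the Phase 1 totals can be reproduced from the per-node figures. The sketch below is a rough check and rests on one assumption not stated on the slide: that each Ivy Bridge core performs 8 double-precision floating-point operations per cycle (AVX). The fat nodes are left out of the peak estimate because their clock frequency is not given.

```python
# Rough check of the Cartesius Phase 1 totals from the per-node specs.
# Assumption (not on the slide): 8 double-precision flop per cycle per core.

THIN_NODES = 360 + 180          # two thin node islands
THIN_CORES_PER_NODE = 2 * 12    # 2 x 12-core CPUs
THIN_CLOCK_HZ = 2.4e9
THIN_MEM_GB = 64

FAT_NODES = 32
FAT_MEM_GB = 256

FLOP_PER_CYCLE = 8              # assumed AVX throughput per core

# Memory: the slide's "41.75 TB" matches a conversion of 1 TB = 1024 GB.
total_mem_gb = THIN_NODES * THIN_MEM_GB + FAT_NODES * FAT_MEM_GB
total_mem_tb = total_mem_gb / 1024

# Peak performance of the thin nodes only.
thin_peak_tflops = (
    THIN_NODES * THIN_CORES_PER_NODE * THIN_CLOCK_HZ * FLOP_PER_CYCLE / 1e12
)

print(f"total memory  : {total_mem_tb:.2f} TB")
print(f"thin node peak: {thin_peak_tflops:.1f} TFlop/s of the 271 TFlop/s total")
```

Under that assumption the thin nodes account for roughly 249 TFlop/s, leaving the remainder of the 271 TFlop/s to the fat node island.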


Cartesius – Greenness

•  All thin compute nodes use Direct Liquid Cooling
-  inlet temperature 30°C: warm water cooling
-  free cooling if outdoor temperature < 30°C (in Amsterdam: 99.1% of days)
-  (Cartesius System) Power Usage Effectiveness 1.2 (typical PUE for cold water cooling: 1.4; air cooling: 1.6)

•  System requirements based on detailed usage analysis
-  which user applications
-  actual memory usage
-  I/O profiles

•  Optimized price/performance
-  TCO: total budget = investment + energy + cooling + housing + UPS (storage only)
-  performance: application throughput using the 7 most relevant applications (# jobs / lifetime)
-  maximization of application throughput / TCO (optimization of power related costs vs. investment costs) left as an “exercise” for the vendor during the procurement
-  result: using “slower” processors (lower clock frequency)
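To make the PUE figures concrete: PUE is total facility power divided by the power used by the IT equipment itself. The sketch below compares the three PUE values mentioned above for a hypothetical IT load of 245 kW. That number is borrowed from the 2013 row of the history table earlier in the deck, and treating it as pure IT load running all year is an assumption made here for illustration.

```python
# Facility power and yearly energy for a given IT load and PUE.
# PUE = total facility power / IT equipment power.
# Assumption: a constant IT load of 245 kW, all year round.

HOURS_PER_YEAR = 8760

def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total power drawn by the facility, including cooling overhead."""
    return it_power_kw * pue

def annual_energy_mwh(power_kw: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in MWh used at a constant power over the given hours."""
    return power_kw * hours / 1000.0

it_load_kw = 245.0
pue_values = {
    "warm water cooling (Cartesius)": 1.2,
    "cold water cooling (typical)": 1.4,
    "air cooling (typical)": 1.6,
}

for label, pue in pue_values.items():
    total = facility_power_kw(it_load_kw, pue)
    print(f"{label:32s} {total:6.1f} kW, {annual_energy_mwh(total):7.1f} MWh/year")

saved_kw = facility_power_kw(it_load_kw, 1.6) - facility_power_kw(it_load_kw, 1.2)
print(f"saved vs. air cooling: {annual_energy_mwh(saved_kw):.1f} MWh/year")
```

With these illustrative numbers, moving from a PUE of 1.6 to 1.2 saves about 98 kW of continuous overhead.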


Cartesius – Greenness

•  On demand growth
-  minimizes idle time
-  using the latest technology maximizes value for money
   -  higher performance
   -  lower energy
-  (less good for Top500 ranking)

•  On demand growth: accelerator island (NVIDIA K40)
-  Phase 1 and Phase 2 (both CPU only) are general purpose
-  accelerators are more special purpose
   -  can deliver more MFlop/s / Watt
-  efficient use of accelerators requires
   -  suitable applications
   -  investment in programming effort
-  proven interest of more than 10 research groups


Scalable Hybrid Architecture

PRACE-2IP prototype: Bull System @ CSC, Finland

EU collaboration: CSC, SURFsara, CSCS

•  44 nodes with two Intel Xeon Phi 7120X co-processors

•  37 nodes with two NVIDIA K40 GPGPUs

SURFsara research topics:

•  Programming paradigms
-  Application porting to accelerator + MPI

•  Energy policies
-  Dynamic Voltage and Frequency Scaling (DVFS): adjust the frequency and voltage of the CPU. The actual workload determines which frequency/voltage is chosen.
-  Dynamic Power Management (DPM): power off when the device becomes idle. Activation temporarily uses more energy.
-  Maybe a hybrid policy, e.g. a mix of DPM and DVFS, is preferable.
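Why a hybrid policy might be preferable can be illustrated with a toy energy model. Everything in the sketch below is an assumption for illustration, not a measurement or the policy studied in the project: dynamic power is taken to scale with the cube of the frequency (voltage scaled linearly with frequency), static power is constant while the device is on, and powering off costs a fixed wake-up energy. All numbers are hypothetical.

```python
# Toy comparison of two energy policies for a job that needs `busy_s`
# seconds at full speed and must finish within `deadline_s` seconds.
# Assumptions: dynamic power ~ f^3 (voltage scaled linearly with f),
# constant static power while powered on, fixed wake-up energy for DPM.

def dpm_energy_j(busy_s, deadline_s, p_dyn_max_w, p_static_w, wake_j):
    """DPM: run at full speed, power off until the deadline, pay wake-up cost."""
    if busy_s > deadline_s:
        raise ValueError("job cannot meet the deadline even at full speed")
    return (p_dyn_max_w + p_static_w) * busy_s + wake_j

def dvfs_energy_j(busy_s, deadline_s, p_dyn_max_w, p_static_w):
    """DVFS: slow down so the job just meets the deadline; stay powered on."""
    if busy_s > deadline_s:
        raise ValueError("job cannot meet the deadline even at full speed")
    slowdown = busy_s / deadline_s          # f / f_max
    p_dyn_w = p_dyn_max_w * slowdown ** 3
    return (p_dyn_w + p_static_w) * deadline_s

# Hypothetical device: 80 W dynamic power at full speed, 500 J wake-up cost.
for p_static in (20.0, 80.0):
    dpm = dpm_energy_j(50.0, 100.0, 80.0, p_static, 500.0)
    dvfs = dvfs_energy_j(50.0, 100.0, 80.0, p_static)
    better = "DVFS" if dvfs < dpm else "DPM"
    print(f"static {p_static:4.0f} W: DPM {dpm:6.0f} J, DVFS {dvfs:6.0f} J -> {better}")
```

In this model DVFS wins when static power is low and DPM wins when it is high, so which policy is better depends on the device and the workload.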


Measuring Energy Consumption of Applications

•  MRA Cluster Green Software
-  SEFLab – Software Energy Footprint Lab (founded by SIG and HvA) R&D project
-  SURFsara is one of the seven partners

•  Provide insight into energy consumption
-  Total consumption after run
-  Consumption during run (time curve)

•  Using sensors in modern CPUs (RAPL)
-  CPU cores
-  Memory controller
-  PAPI to read hardware counters
-  Correlate with performance measurements (Flop/s/Watt)

•  Using sensors on node (IPMI)
-  Memory
-  Disk drives
-  Network card

•  Use SLURM (Cartesius batch system)
-  Link with resource manager
-  Energy consumption in job report
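As an illustration of how RAPL counters yield "total consumption after run", here is a minimal sketch. It does not use PAPI as the project does. It swaps in the Linux powercap sysfs files (`energy_uj` and `max_energy_range_uj`), which expose the same RAPL counters on many Linux systems. The directory path, the choice of package 0, and the function names are assumptions, and reading the files may require elevated privileges.

```python
import time
from pathlib import Path

# Assumed location of the RAPL package-0 domain in the Linux powercap sysfs tree.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy used between two counter readings, allowing for one wrap-around."""
    if after >= before:
        return after - before
    return after + (max_range - before)

def read_uj(name: str, rapl_dir: Path = RAPL_DIR) -> int:
    """Read one integer value (in microjoules) from the RAPL directory."""
    return int((rapl_dir / name).read_text())

def measure(func, rapl_dir: Path = RAPL_DIR):
    """Run func() and return (result, joules used, average watts)."""
    max_range = read_uj("max_energy_range_uj", rapl_dir)
    e0, t0 = read_uj("energy_uj", rapl_dir), time.perf_counter()
    result = func()
    e1, t1 = read_uj("energy_uj", rapl_dir), time.perf_counter()
    joules = energy_delta_uj(e0, e1, max_range) / 1e6
    return result, joules, joules / (t1 - t0)

# Example use on a machine that exposes the counters:
#   result, joules, watts = measure(lambda: sum(i * i for i in range(10**7)))
#   print(f"{joules:.2f} J, {watts:.1f} W average")
```

Sampling `energy_uj` periodically during the run instead of only before and after would give the time curve mentioned above. Dividing a measured Flop/s figure by the average watts gives the Flop/s/Watt correlation.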


Energy Technology

•  Prof. dr. ir. Bendiks Jan Boersma (TU Delft)

•  Studies
-  conversion of heat into work or movement
-  conversion of movement into electricity
-  interaction between liquids and their environment
-  fluid mechanics

•  Lower resistance in pipe networks using agents, polymers or chemicals
-  Gasunie during cold winters
-  Oil companies: Trans-Alaska pipeline
-  Drilling of oil wells
-  Fire fighting in situations where the water must be sprayed twice as high or far

•  Two images of axial velocity in a cross-section of a pipe flow. The pictures show the friction Reynolds number


Sustainable Energy

•  Dr. Evgeny Pidko (TU/e assistant professor)

•  Field of study: computational catalysis for sustainable energy technologies

•  Combining theory and experiment to understand mechanisms of catalytic reactions on a molecular level

•  Computational studies using state-of-the-art quantum chemical methods

•  Used to formulate design rules for new and improved catalytic systems

•  Studying different processes related to the conversion of biomass and carbon dioxide to value-added chemicals and fuel components (fuels)

•  Research also focuses on more classical chemical systems in order to make technologies greener


Thank you for listening!

