
Symposium on Green ICT and Sustainability: New Energy in Higher Education

Computing Yourself Green with Supercomputers (Jezelf Groen Rekenen met Supercomputers)

Walter Lioen, Group Leader Supercomputing

About SURFsara

SURFsara offers an integrated ICT research infrastructure and provides services in the areas of computing, data storage, visualization, networking, cloud and e-Science.

SARA was founded in 1971 as an Amsterdam computing center by the two Amsterdam universities (UvA and VU) and the current CWI.

Independent as of 1995.

Founded Vancis in 2008, offering ICT services and ICT products to enterprises, universities, and educational and healthcare institutions.

As of 1 January 2013, SARA (from then on SURFsara) forms part of the SURF Foundation.

First supercomputer in The Netherlands in 1984 (Control Data Cyber 205). Hosting the national supercomputer(s) ever since.

Jezelf Groen Rekenen met Supercomputers Walter Lioen January 30, 2014 2

What is a Supercomputer?

A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation.

Consequently, the specification of a supercomputer is constantly changing.

Rule of thumb: a supercomputer is at least 1,000 to 10,000, and up to 100,000, times faster than an average PC.


Why supercomputing?

Large scale scientific computing

Simulation of processes that are otherwise
- Impossible in practice
- Too expensive
- Too dangerous
- Too extended

Examples
- Astronomy
  - How did the universe begin?
  - How do stars form and evolve?
- Weather Prediction, Climatology
- Nuclear Physics
- Aerodynamics (cars, planes, rockets)
- Biology (proteins, DNA, drugs)
- Medical sciences (bone formation, blood flow)


Top500: PFlop/s

HPL, the High-Performance Linpack benchmark, solves a (random) dense linear system in double precision (64-bit) arithmetic on distributed-memory computers.

For Tianhe-2, number 1 as of June 2013 (3,120,000 cores, 54.9 PFlop/s, 17.8 MW):
- n = 9,960,000

Computational kernel: DGEMM (matrix multiply)
- Extremely efficient on all processors (in cache)

Limiting factors:
- Speed of interconnect
- Speed to (local accelerator) memory (e.g. for GPGPU)

However, far more important: application speed
- In Amsterdam a Ferrari is useless (speed-wise)
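To give a feel for the problem size quoted for Tianhe-2, the sketch below estimates the memory footprint, run time and energy of one HPL run with n = 9,960,000. It assumes the conventional HPL operation count of 2/3 n^3 + 3/2 n^2 and, optimistically, that the machine sustains the quoted 54.9 PFlop/s and draws the quoted 17.8 MW for the whole run. The function names are illustrative only.

```python
def hpl_flops(n: int) -> float:
    """Floating-point operations conventionally credited to an n x n HPL solve."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Storage for the dense n x n matrix in double precision (8 bytes each)."""
    return n * n * bytes_per_element


def run_time_seconds(n: int, flops_per_second: float) -> float:
    """Run time if the given rate were sustained for the whole solve (assumption)."""
    return hpl_flops(n) / flops_per_second


n = 9_960_000          # problem size quoted on the slide
rate = 54.9e15         # Flop/s, figure quoted on the slide
power_watts = 17.8e6   # W, figure quoted on the slide

seconds = run_time_seconds(n, rate)
energy_mwh = power_watts * seconds / 3.6e9  # 1 MWh = 3.6e9 J

print(f"matrix size : {matrix_bytes(n) / 1e12:.0f} TB")
print(f"operations  : {hpl_flops(n):.3e}")
print(f"run time    : {seconds / 3600:.1f} h")
print(f"energy      : {energy_mwh:.0f} MWh")
```

Under these assumptions the matrix alone occupies roughly 0.8 PB and a single run takes a few hours, which is why the interconnect and memory speed listed above matter so much.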


Top500: iPad 2 performance

An A5 processor core of an iPad 2 is as fast as a four-processor Cray 2 supercomputer (1.951 GFlop/s).

In 1985 an eight-processor Cray 2 was the fastest supercomputer in the world.

The iPad 2 would still have been listed in the Top500 of 1994.


Green500: MFlop/s / Watt

November 2013 Green500 List observations:

Ranks 1 to 10: (Intel Xeon + NVIDIA K20)
- commodity processors with GPGPUs (graphics processing units)

Rank 1: TSUBAME-KFC (Japan, Ivy Bridge + NVIDIA K20x)
- 4,503.17 MFlop/s / W (first time > 4 GFlop/s / W)
- At this efficiency an exaflop system would require 222 MW (DARPA's target is > 1 EFlop/s using < 20 MW)

Rank 4: Piz Daint (Switzerland, Cray XC30, Sandy Bridge + NVIDIA K20x)
- 3,185.91 MFlop/s / W
- the greenest petaflop supercomputer
- the current Top500 #6

Rank 12: (USA, Blue Gene/Q)
- 2,299.15 MFlop/s / W
- highest ranked non-heterogeneous (CPU only) system

Rank 40: Tianhe-2 (China, Ivy Bridge + Xeon Phi)
- 1,901.54 MFlop/s / W
- the current Top500 #1
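The 222 MW figure follows directly from dividing one exaflop per second by the rank 1 efficiency. The sketch below repeats that arithmetic for the four systems listed and also derives the efficiency a 20 MW exaflop machine would need. It assumes the Green500 efficiency would scale linearly to an exaflop, which is a simplification; the names are illustrative.

```python
EXAFLOP = 1e18  # Flop/s


def power_needed_mw(rate_flops: float, mflops_per_watt: float) -> float:
    """Power in MW to sustain rate_flops at the given efficiency (linear scaling assumed)."""
    watts = rate_flops / (mflops_per_watt * 1e6)
    return watts / 1e6


# Efficiencies in MFlop/s per W, as listed for the November 2013 Green500
systems = {
    "TSUBAME-KFC (rank 1)": 4503.17,
    "Piz Daint (rank 4)": 3185.91,
    "Blue Gene/Q (rank 12)": 2299.15,
    "Tianhe-2 (rank 40)": 1901.54,
}

for name, efficiency in systems.items():
    print(f"{name:24s} {power_needed_mw(EXAFLOP, efficiency):6.0f} MW per EFlop/s")

# Efficiency required to deliver 1 EFlop/s within a 20 MW budget
required_mflops_per_watt = EXAFLOP / 20e6 / 1e6
print(f"required for 20 MW: {required_mflops_per_watt:,.0f} MFlop/s / W")
```

The last line works out to 50 GFlop/s per W, roughly eleven times the efficiency of the rank 1 system.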


SURFsara National Supercomputing History

Year  Machine                        Rpeak (GFlop/s)    kW  GFlop/s / kW
1984  CDC Cyber 205 1-pipe                       0.1   250        0.0004
1988  CDC Cyber 205 2-pipe                       0.2   250        0.0008
1991  Cray Y-MP/4128                            1.33   200        0.0067
1994  Cray C98/4256                                4   300        0.0133
1997  Cray C916/121024                            12   500         0.024
2000  SGI Origin 3800                          1,024   300           3.4
2004  SGI Origin 3800 + Altix 3700             3,200   500           6.4
2007  IBM p575 Power5+                        14,592   375            40
2008  IBM p575 Power6                         62,566   540           116
2009  IBM p575 Power6                         64,973   560           116
2013  Bull bullx B710 (DLC) + R428           270,950   245          1106
2014  Bull bullx B515 (NVIDIA K40)          >200,000                3333
2014  Bull bullx complete system          >1,000,000  >520         >1923
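The efficiency column of the history table is simply peak performance divided by power. A minimal sketch of that calculation, using only the rows for which both figures are given exactly (the 2014 rows are lower bounds and are left out), shows how far the ratio has moved between 1984 and 2013. The variable and function names are illustrative.

```python
# (year, machine, Rpeak in GFlop/s, power in kW), copied from the history table
history = [
    (1984, "CDC Cyber 205 1-pipe", 0.1, 250),
    (1988, "CDC Cyber 205 2-pipe", 0.2, 250),
    (1991, "Cray Y-MP/4128", 1.33, 200),
    (1994, "Cray C98/4256", 4, 300),
    (1997, "Cray C916/121024", 12, 500),
    (2000, "SGI Origin 3800", 1_024, 300),
    (2004, "SGI Origin 3800 + Altix 3700", 3_200, 500),
    (2007, "IBM p575 Power5+", 14_592, 375),
    (2008, "IBM p575 Power6", 62_566, 540),
    (2009, "IBM p575 Power6", 64_973, 560),
    (2013, "Bull bullx B710 (DLC) + R428", 270_950, 245),
]


def gflops_per_kw(rpeak_gflops: float, power_kw: float) -> float:
    """Energy efficiency as used in the table: peak GFlop/s per kW."""
    return rpeak_gflops / power_kw


for year, machine, rpeak, kw in history:
    print(f"{year}  {machine:30s} {gflops_per_kw(rpeak, kw):12.4f}")

first = gflops_per_kw(history[0][2], history[0][3])
last = gflops_per_kw(history[-1][2], history[-1][3])
improvement = last / first
print(f"efficiency improvement 1984 to 2013: {improvement:,.0f}x")
```

On these figures the 2013 system delivers close to three million times more peak performance per kW than the 1984 machine, while drawing about the same power.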

Moore's Law (1965)

The number of transistors on an integrated circuit doubles every 2 years.

Because of faster transistors, the speed doubles every 18 months.

The clock speed stopped doubling a couple of years ago.

Nowadays the number of cores doubles.

Moore noted that if car manufacturers had something like this, cars would get 100,000 miles to the gallon and it would be cheaper to buy a Rolls Royce than park it. (Cars would also be only a half an inch long.)
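The 18-month doubling rule can be compared with the national supercomputer history given earlier. The sketch below takes the first entry (1984, 0.1 GFlop/s) and the 2013 entry (270,950 GFlop/s) and computes the average doubling time they imply. This is a two-point estimate only, and the function names are illustrative.

```python
import math


def doubling_time_months(value_start: float, value_end: float, years: float) -> float:
    """Average doubling time implied by growth from value_start to value_end."""
    doublings = math.log2(value_end / value_start)
    return years * 12 / doublings


def growth_factor(years: float, doubling_months: float) -> float:
    """Growth over a period if the quantity doubles every doubling_months."""
    return 2 ** (years * 12 / doubling_months)


years = 2013 - 1984
observed = doubling_time_months(0.1, 270_950, years)
print(f"observed doubling time: {observed:.1f} months")
print(f"growth at 18-month doubling over {years} years: {growth_factor(years, 18):,.0f}x")
print(f"actual growth over {years} years: {270_950 / 0.1:,.0f}x")
```

By this estimate the peak performance of the national systems doubled a little faster than every 18 months on average.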


Cartesius specs

Phase 1 (production June 2013, total peak performance 271 TFlop/s)

Direct Liquid Cooled thin node islands
- 360 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node
- 180 thin nodes, 2 × 12-core 2.4 GHz Intel Ivy Bridge CPUs/node, 64 GB/node

Fat node island
- 32 fat nodes, 4 × 8-core Intel Sandy Bridge CPUs/node, 256 GB/node

Total
- 13,968 cores, 41.75 TB memory, 2.4 PB disk
- Interconnect: InfiniBand 56 Gbit/s bandwidth, 3 µs latency
- Top500 November 2013: #184

Phase 1.5 (scheduled production 2014 Q2, total peak performance ~ 470 TFlop/s)

Addition of accelerator island
- 66 nodes, 2 Intel Ivy Bridge CPUs/node, 2 NVIDIA Tesla K40 GPGPUs/node

Phase 2 (scheduled production 2014 H2, total peak performance > 1 PFlop/s)

On-demand addition of thin node islands with latest Intel Haswell CPUs
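The Phase 1 totals can be rebuilt from the per-node figures. The sketch below does so under two assumptions that are not on the slide: each core performs 8 double-precision operations per cycle (AVX without fused multiply-add), and the fat-node CPUs run at 2.7 GHz. The class and variable names are illustrative.

```python
from dataclasses import dataclass

FLOPS_PER_CYCLE = 8  # assumption: AVX double precision, no FMA


@dataclass(frozen=True)
class NodeType:
    name: str
    nodes: int
    sockets: int
    cores_per_socket: int
    mem_gb: int
    clock_ghz: float

    @property
    def cores(self) -> int:
        return self.nodes * self.sockets * self.cores_per_socket

    @property
    def peak_gflops(self) -> float:
        return self.cores * self.clock_ghz * FLOPS_PER_CYCLE


phase1 = [
    NodeType("thin island A", 360, 2, 12, 64, 2.4),
    NodeType("thin island B", 180, 2, 12, 64, 2.4),
    NodeType("fat island", 32, 4, 8, 256, 2.7),  # 2.7 GHz is an assumption
]

total_cores = sum(t.cores for t in phase1)
total_mem_gb = sum(t.nodes * t.mem_gb for t in phase1)
total_peak_gflops = sum(t.peak_gflops for t in phase1)

print(f"cores : {total_cores:,}")
print(f"memory: {total_mem_gb:,} GB = {total_mem_gb / 1024:.2f} TB (1 TB = 1024 GB)")
print(f"peak  : {total_peak_gflops / 1000:.2f} TFlop/s")
```

With these assumptions the memory and peak figures come out in line with the 41.75 TB and 271 TFlop/s quoted for Phase 1. Multiplying out the node counts as listed gives 13,984 cores, 16 more than the 13,968 quoted in the totals; the sketch does not attempt to explain that difference.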


Cartesius Greenness

All thin compute nodes use Direct Liquid Cooling
- inlet temperature 30 °C: warm water cooling
- free cooling if outdoor temperature < 30 °C (in Amsterdam: 99.1% of days)
- (Cartesius System) Power Usage Effectiveness 1.2 (typical PUE for cold water cooling: 1.4; air cooling: 1.6)

System requirements based on detailed usage analysis
- which user applications
- actual memory usage
- I/O profiles

Optimized price/performance
- TCO: total budget covering investment, energy, cooling, housing, and UPS (storage only)
- performance: application throughput using the 7 most relevant applications (# jobs / lifetime)
- maximization of application throughput / TCO (optimization of power-related costs vs. investment costs), left as an exercise for the vendor during the procurement
- result: using slower processors (lower clock frequency)
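Power Usage Effectiveness is the ratio of total facility power to the power drawn by the IT equipment itself, so the PUE values above translate directly into cooling and distribution overhead. The sketch below applies the three quoted PUE values to the 245 kW listed for the 2013 system in the history table. It assumes that figure is the IT load and that the load is constant all year, both simplifications; the names are illustrative.

```python
HOURS_PER_YEAR = 8760


def facility_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power for a given IT load and PUE (PUE = facility / IT)."""
    return it_load_kw * pue


def overhead_kwh_per_year(it_load_kw: float, pue: float) -> float:
    """Annual energy spent on everything other than the IT equipment."""
    return (facility_kw(it_load_kw, pue) - it_load_kw) * HOURS_PER_YEAR


it_load_kw = 245  # from the history table; assumed here to be the IT load

cooling_options = {
    "warm water (Cartesius)": 1.2,
    "cold water (typical)": 1.4,
    "air (typical)": 1.6,
}

for label, pue in cooling_options.items():
    print(
        f"{label:24s} PUE {pue}: {facility_kw(it_load_kw, pue):5.0f} kW total, "
        f"{overhead_kwh_per_year(it_load_kw, pue) / 1000:6.0f} MWh/year overhead"
    )

saving_mwh = (overhead_kwh_per_year(it_load_kw, 1.6)
              - overhead_kwh_per_year(it_load_kw, 1.2)) / 1000
print(f"saving versus air cooling: {saving_mwh:.0f} MWh/year")
```

Under these assumptions, moving from a PUE of 1.6 to 1.2 saves roughly 860 MWh per year for the same computing load.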


Cartesius Greenness

On demand growth
- minimizes idle time
- using the latest technology maximizes value for money
  - higher performance
  - lower energy
- (less good for Top500 ranking)

On demand growth: accelerator island (NVIDIA K40)
- Phase 1 and Phase 2 (both CPU only) are general purpose
- accelerators are more special purpose
  - can deliver more MFlop/s / Watt
  - efficient use of accelerators requires
    - suitable applications
    - investment in programming effort
- proven interest of more than 10 research groups


Scalable Hybrid Architecture

PRACE-2IP prototype: Bull System @ CSC, Finland
- EU collaboration: CSC, SURFsara, CSCS
- 44 nodes with two Intel Xeon Phi 7120X co-processors
- 37 nodes with two NVIDIA K40 GPGPUs

SURFsara research topics:

Programming paradigms
- Application porting to accelerator + MPI

Energy policies
- Dynamic Voltage and Frequency Scaling (DVFS): adjust the frequency and voltage of the CPU. The actual workload determines which frequency/voltage is chosen.
- Dynamic Power Management (DPM): power off when a device becomes idle. Activation temporarily uses more energy.
- Maybe a hybrid policy, e.g. a mix of DPM and DVFS, is preferable.
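The trade-offs behind the two energy policies can be illustrated with a toy model. The sketch below is not SURFsara's policy or measurement: it assumes the textbook relation that dynamic CPU power scales with capacitance × voltage² × frequency plus a constant static power, and that waking a powered-off device costs a fixed amount of energy. All numeric values are hypothetical.

```python
def dvfs_energy_joules(cycles: float, freq_hz: float, volt: float,
                       c_eff: float, p_static_w: float) -> float:
    """Energy to run a fixed number of cycles at one frequency/voltage setting."""
    runtime_s = cycles / freq_hz
    p_dynamic_w = c_eff * volt**2 * freq_hz  # textbook dynamic-power model
    return (p_dynamic_w + p_static_w) * runtime_s


def dpm_break_even_seconds(wake_energy_j: float, p_idle_w: float,
                           p_off_w: float) -> float:
    """Idle time beyond which powering off saves energy despite the wake-up cost."""
    return wake_energy_j / (p_idle_w - p_off_w)


# Hypothetical workload and settings
cycles = 2.4e12
c_eff = 2e-8                        # effective switched capacitance in farad
settings = [(2.4e9, 1.0), (1.2e9, 0.8)]  # (frequency in Hz, voltage in V)

for p_static in (10.0, 20.0):       # two hypothetical static-power levels in W
    for freq, volt in settings:
        energy = dvfs_energy_joules(cycles, freq, volt, c_eff, p_static)
        print(f"static {p_static:4.0f} W, {freq / 1e9:.1f} GHz: {energy:8.0f} J")

break_even = dpm_break_even_seconds(wake_energy_j=50.0, p_idle_w=30.0, p_off_w=2.0)
print(f"DPM pays off for idle periods longer than {break_even:.2f} s")
```

In this toy model the lower frequency saves energy when static power is small but costs energy when static power is large, because the job runs twice as long. That is one reason the best setting depends on the actual workload and why a hybrid of DVFS and DPM may be preferable.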


Measuring Energy Consumption of Applications

MRA Cluster Green Software
- SEFLab: Software Energy Footprint Lab (founded by SIG and HvA)
- R&D project
- SURFsara is one of the seven partners

Provide insight into energy consumption
- Total consumption after run
- Consumption during run (time curve)

Using sensors in modern CPUs (RAPL)
- CPU cores
- Memory controller
- PAPI to read hardware counters
- Correlate with performance measurements (Flop/s/Watt)

Using sensors on node (IPMI)
- Memory
- Disk drives
- Network card

Use SLURM (Cartesius batch system)
- Link with resource manager
- Energy consumption in job report
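The slide names PAPI as the way to read the RAPL counters. As a simpler stand-in, the sketch below reads the same kind of counter through the Linux powercap sysfs interface, which exposes a cumulative energy value in microjoules per CPU package. The path is the usual location on Intel systems but is an assumption here, reading it may require elevated privileges, and the helper names are illustrative.

```python
import time
from pathlib import Path

# Assumed location of the RAPL domain for CPU package 0 on a Linux system
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(domain: Path, name: str) -> int:
    """Read one integer value (microjoules) from a powercap file."""
    return int((domain / name).read_text())


def energy_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy between two counter readings, allowing for one counter wraparound."""
    delta = end_uj - start_uj
    if delta < 0:  # the counter wrapped past its maximum value
        delta += max_range_uj
    return delta / 1e6


def measure(workload, domain: Path = RAPL_DOMAIN):
    """Run workload() and return (joules, seconds) for the CPU package."""
    max_range = read_uj(domain, "max_energy_range_uj")
    e0, t0 = read_uj(domain, "energy_uj"), time.perf_counter()
    workload()
    e1, t1 = read_uj(domain, "energy_uj"), time.perf_counter()
    return energy_delta_joules(e0, e1, max_range), t1 - t0


if __name__ == "__main__":
    try:
        joules, seconds = measure(lambda: sum(i * i for i in range(5_000_000)))
        print(f"{joules:.2f} J in {seconds:.2f} s, average {joules / seconds:.1f} W")
    except OSError:
        print("RAPL counters not available or not readable on this machine")
```

This gives the total consumption after a run. Sampling `energy_uj` at intervals during the run would give the time curve, and dividing a measured Flop/s figure by the average power gives the Flop/s/Watt correlation mentioned above.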


Prof. dr. ir. Bendiks Jan Boersma (TU Delft)

Studies
- conversion of heat into work or movement
- conversion of movement into electricity
- interaction between liquids and their environment
- fluid mechanics

Lower resistance in pipe networks using agents, polymers or chemicals
- Gasunie during cold winters
- Oil companies: Trans-Alaska pipeline
- Drilling of oil wells
- Fire fighting in situations where the water must be sprayed twice as high or far

Tw