IBM Presentations: Smart Planet Template · 1. Chip 16 cores 2. Module Single Chip 4. Node Card 32...
New #1 – K Computer at RIKEN Advanced Institute for Computational Science - Japan
• Fujitsu K Computer: SPARC64 VIIIfx CPUs, 8-core, 2.0 GHz
• 8 floating point ops per cycle
• Custom Tofu interconnect
• Approx. 800 racks total, water cooled
Jun’11:
• 17,136 (nodes) x 4 (sockets) x 8 (cores) x 8 (FP/cycle) x 2.0 (GHz) = 8.773632 PFlops (Rpeak)
• 8.162 PFlops Rmax, 93% Linpack efficiency
• 3.2 times previous #1
• 13.8% of Jun’11 aggregate throughput
• Computer power consumption: 9.898 Megawatts (824.6 MF/W)
Nov’11:
• 22,032 (nodes), 11.280 PF Rpeak
• 10.51 PFlops Rmax
• 14.2% of Nov’11 aggregate throughput
• Computer power consumption: 12.66 Megawatts (830.2 MF/W)
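The Rpeak product on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function and dictionary names are made up for illustration, and pairing 10.51 PFlops and 12.66 MW with the 22,032-node configuration is an assumption, although it reproduces the quoted 830.2 MF/W.

```python
def rpeak_pflops(nodes, sockets_per_node, cores_per_socket, flops_per_cycle, ghz):
    """Theoretical peak in PFlops: every core retires flops_per_cycle each cycle."""
    gflops = nodes * sockets_per_node * cores_per_socket * flops_per_cycle * ghz
    return gflops / 1e6  # 1 PFlops = 1e6 GFlops


# Figures as quoted on the slide (grouping by list edition is an assumption).
jun11 = {"nodes": 17_136, "rmax_pf": 8.162, "power_mw": 9.898}
nov11 = {"nodes": 22_032, "rmax_pf": 10.51, "power_mw": 12.66}


def summarize(entry):
    """Return (Rpeak in PFlops, Linpack efficiency, MFlops per watt)."""
    rpeak = rpeak_pflops(entry["nodes"], 4, 8, 8, 2.0)
    efficiency = entry["rmax_pf"] / rpeak
    # PFlops per MW equals 1e9 flops per watt, i.e. 1000 MFlops per watt.
    mflops_per_watt = entry["rmax_pf"] / entry["power_mw"] * 1000
    return rpeak, efficiency, mflops_per_watt


for label, entry in (("Jun'11", jun11), ("Nov'11", nov11)):
    rpeak, eff, mfw = summarize(entry)
    print(f"{label}: Rpeak={rpeak:.3f} PF, efficiency={eff:.1%}, {mfw:.1f} MF/W")
```

Rmax divided by Rpeak gives the Linpack efficiency, and Rmax divided by power gives the MF/W figure, so the numbers on the slide are consistent with one another.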
Fortran C/C++
MPI OpenMP
CUDA OpenCL
OpenACC
HMPP
MIC
Scout
X10
CAF
Cilk
StarSs
G.Array
UPC
Chapel
The Mont-Blanc Project
• To develop a European exascale approach
• Based on embedded power-efficient technology
Taken from Alex Ramirez’s presentation, BSC
The Mont-Blanc Project
Integrated system design built from mobile / embedded components
• ARM multicore processors
  – Nvidia Tegra / Denver, Calxeda, Marvell Armada, ST-Ericsson Nova A9600, …
• Mobile accelerators
  – Mobile GPU (Nvidia GT 500M etc.)
  – Embedded GPU (Nvidia Tegra, ARM Mali T604)
• Low power 10 GbE switches
The Mont-Blanc Project
• Exploit massive number of low-power processors
• Sustain performance with lower-bandwidth components (i.e., interconnect and memory)
• Programmability
IBM Blue Gene/Q
1. Chip: 16 cores
2. Module: single chip
3. Compute Card: one single-chip module, 16 GB DDR3 memory
4. Node Card: 32 compute cards, optical modules, link chips, torus
5a. Midplane: 16 node cards
5b. I/O Drawer: 8 I/O cards w/ 16 GB, 8 PCIe Gen2 slots
6. Rack: 2 midplanes; 1, 2 or 4 I/O drawers
7. System: 96 racks @ 20 PF/s
Per Rack
Peak Performance: 209 TF
Sustained (Linpack): ~170+ TF
Power: ~100 kW
Power Efficiency: ~2 GF/W
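The packaging hierarchy multiplies out to the per-rack and system totals. The sketch below only uses counts quoted on the slide; the constant names are illustrative, and 1 TB is taken as 1024 GB.

```python
# Counts as quoted in the Blue Gene/Q packaging hierarchy.
CORES_PER_CHIP = 16
MEMORY_GB_PER_COMPUTE_CARD = 16
COMPUTE_CARDS_PER_NODE_CARD = 32
NODE_CARDS_PER_MIDPLANE = 16
MIDPLANES_PER_RACK = 2
RACKS_IN_SYSTEM = 96
PEAK_TF_PER_RACK = 209

# One compute card carries one single-chip module, so it is one node.
nodes_per_rack = (COMPUTE_CARDS_PER_NODE_CARD
                  * NODE_CARDS_PER_MIDPLANE
                  * MIDPLANES_PER_RACK)
cores_per_rack = nodes_per_rack * CORES_PER_CHIP
memory_tb_per_rack = nodes_per_rack * MEMORY_GB_PER_COMPUTE_CARD / 1024

system_nodes = nodes_per_rack * RACKS_IN_SYSTEM
system_cores = cores_per_rack * RACKS_IN_SYSTEM
system_peak_pf = PEAK_TF_PER_RACK * RACKS_IN_SYSTEM / 1000

print(f"per rack: {nodes_per_rack} nodes, {cores_per_rack} cores, "
      f"{memory_tb_per_rack:.0f} TB memory")
print(f"system:   {system_nodes} nodes, {system_cores} cores, "
      f"{system_peak_pf:.1f} PF peak")
```

The 96-rack total comes out at roughly 20 PF, matching the "96 racks @ 20 PF/s" figure.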
Scalability
Blue Gene/Q Ultra Low Power, Dense Parallel System
BG/Q
Processor: 64-bit PowerPC
Processor Frequency: 1.6 GHz
Nodes/Rack x Cores: 1024 x 16
Memory/Core: 1 GB
Memory Bandwidth: 43 GB/s
Cores/Rack: 16384
Peak Performance/Rack: 209.7 TF
Average Power/Rack: 65 kW
Availability: 1H12
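The table does not state how many floating-point operations a core completes per cycle, but the figure can be backed out of the quoted peak, core count and clock. A short sketch, with illustrative variable names:

```python
# Values as quoted in the BG/Q table.
NODES_PER_RACK = 1024
CORES_PER_NODE = 16
FREQ_GHZ = 1.6
PEAK_TF_PER_RACK = 209.7

cores_per_rack = NODES_PER_RACK * CORES_PER_NODE

# Peak per core in GFlops, then divide by the clock to get flops per cycle.
gflops_per_core = PEAK_TF_PER_RACK * 1000 / cores_per_rack
implied_flops_per_cycle = gflops_per_core / FREQ_GHZ

# Rounding to a whole number of flops per cycle is an assumption of this sketch.
flops_per_cycle = round(implied_flops_per_cycle)
peak_tf = cores_per_rack * FREQ_GHZ * flops_per_cycle / 1000

print(f"implied flops/cycle/core: {implied_flops_per_cycle:.3f}")
print(f"peak with {flops_per_cycle} flops/cycle: {peak_tf:.1f} TF per rack")
```

The implied value rounds to 8 flops per cycle per core, and recomputing the peak with that value reproduces the quoted 209.7 TF per rack.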
Application   Performance difference (%)   Power consumption difference (%)
CP2K          8.5                          16.3
SEISSOL       10.9                         18.2
GADGET        7.2                          18.7
LBDC          4.5                          16.5
NAMD          6.0                          19.7
WRF           4.5                          13.6
Lesli3d       1.7                          13.1
GemsFDTD      1.3                          13.1
BQCD          0.0                          13.6
WALBERLA      0.0                          13.7
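One way to read the table is in terms of energy to solution. The sketch below rests on assumptions the slide does not confirm: that both columns are reductions relative to the same baseline run, and that runtime is inversely proportional to performance. Under those assumptions, energy is power times runtime; the function name and the dictionary are illustrative.

```python
# (performance difference %, power consumption difference %) as listed in the table.
measurements = {
    "CP2K": (8.5, 16.3),
    "SEISSOL": (10.9, 18.2),
    "GADGET": (7.2, 18.7),
    "LBDC": (4.5, 16.5),
    "NAMD": (6.0, 19.7),
    "WRF": (4.5, 13.6),
    "Lesli3d": (1.7, 13.1),
    "GemsFDTD": (1.3, 13.1),
    "BQCD": (0.0, 13.6),
    "WALBERLA": (0.0, 13.7),
}


def energy_ratio(perf_loss_pct, power_saving_pct):
    """Energy to solution relative to baseline, ASSUMING both inputs are reductions.

    Relative power is (1 - power saving); relative runtime is 1 / (1 - performance
    loss) because runtime is taken to be inversely proportional to performance.
    """
    relative_power = 1 - power_saving_pct / 100
    relative_runtime = 1 / (1 - perf_loss_pct / 100)
    return relative_power * relative_runtime


for app, (perf, power) in measurements.items():
    change = (energy_ratio(perf, power) - 1) * 100
    print(f"{app:10s} energy to solution change: {change:+.1f}%")
```

Under this reading every application listed would use less energy per run, because the power column exceeds the performance column in every row. A different reading of the columns would give different numbers.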
– MTTDL ~ 7 years (simulated; disk MTTF = 600,000 hours, Weibull failure distribution, 100 PB usable)
Software / De-clustered RAID
[Figure: read and write activity after a disk failure across 22 HDDs, Traditional RAID vs. Declustered RAID]
Thank you