Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for...

12
Post-K Computer Development, Updates for SC’18 Copyright 2018 FUJITSU LIMITED 0

Transcript of Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for...

Page 1: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Post-K Computer Development, Updates for SC’18

Copyright 2018 FUJITSU LIMITED0

Page 2: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Fujitsu’s High-end HPC Development

Fujitsu has been leading HPC technologies for over 40 years

Copyright 2018 FUJITSU LIMITED

© RIKEN

Ranked Top500 No.1 in 2011Competitive in various fields

K computer

HPCGNo.3(2018)

Graph500No.1(2018)

Gordon Bell Prize Finalist

(2016)

PRIMEHPC FX10

PRIMEHPCFX100

Next Japanese flagship supercomputerunder development

Post-K computer

1

Page 3: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

RIKEN and Fujitsu are currently developing Post-K computer, the most advanced general-purpose supercomputer, in the world

Copyright 2018 FUJITSU LIMITED

Japanese National Project

2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022

K ComputerDevelopment

OperationInstall /Tuning

MassProduct

Prototype・Detailed DesignBasic

DesignPost-K

K ComputerOperation

Calendar Year

Post-K computer is optimized to achieve superior performance in real applications as next Japanese flagship system

2

Page 4: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Copyright 2018 FUJITSU LIMITED

Post-K Computer Goals and Approaches

Low power consumption

User convenience

Ability to produce ground-breaking

results

Application performanceApplication

performanceApplication

performance

Approach

1. A64FX CPU

2. Compiler and Runtime

3. LLIO (Lightweight Layered IO-Accelerator)

3

Page 5: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

A64FX CPU

Achieves high performance in HPC and AI applications

Arm Scalable vector extension (SVE), high-bandwidth caches and memory

(D|S|H)GEMM and INT (16b/8b) GEMM > 90%, STREAM Triad > 80%

Copyright 2018 FUJITSU LIMITED

A64FX(Post-K computer)

SPARC64 VIIIfx(K computer)

ISA (Base + Extension) Armv8.2-A + SVE SPARC-V9 + HPC-ACE

Process Technology 7 nm 45 nm

Peak Performance > 2.7 TFLOPS 128 GFLOPS

SIMD 512-bit 128-bit

# of Cores 48+4 8

Memory Peak B/W 1024 GB/s 64 GB/s

CMG: Core Memory GroupHBM2 8GiBHBM2 8GiBHBM2 8GiBHBM2 8GiB

Performance> 2.7 TFLOPS

CMG

L1 Cache > 11.0 TB/s (BF ratio = 4)

L2 Cache > 3.6 TB/s (BF ratio = 1.3)

512-bit wide SIMD 2x FMAs

CoreCore

12x Computing Cores + 1x Assistant Core

Memory 1024 GB/s (BF ratio =~0.37)

L2 Cache 8MiB, 16way

256GB/s

>115GB/s

>57GB/s

Core Core

L1D 64KiB, 4way

>230GB/s

>115GB/s

4

Page 6: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Compiler and Runtime

Fujitsu’s compiler and runtime libraries exploit the hardware capabilities along three dimensions

Support Fortran, C/C++, Python software development environment

Copyright 2018 FUJITSU LIMITED

Memory accessperformance

Computational performance

• Software prefetch• Loop-blocking

• Out-of-order• 512-bit SVE

• Software pipelining with Loop fission

• Auto-vectorization with SVE

Thread-parallel performance

• CMG & SVE optimized math library

• OpenMP 5.0 API & fast barrierCompiler & Runtime

Hardware capabilities• 48 cores in 4 CMG• Inter-core barrier

• Hardware prefetch• Stacked memory; HBM2

5

Page 7: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Preliminary Performance Evaluation Results

A64FX’s instruction set and compiler achieve high performance on loop of math function

Step 1. Armv8 coding

Step 2. + SVE + accel.Instruction coding

Step 3. + Inlined by compiler

Step 4. + Applied software pipelining by compiler

Copyright 2018 FUJITSU LIMITED

7.6 5.4 7.6 3.624.0

14.6

34.1

14.7

38.3

22.8

66.8

28.9

FLOAT EXP DOUBLE EXP FLOAT SIN DOUBLE SIN

Rel

ativ

e to

Ste

p.1

Step.2 Step.3 Step.4

6

Page 8: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Copyright 2018 FUJITSU LIMITED

LLIO (Lightweight Layered IO-Accelerator)

Boosts I/O performancew/o modifying Apps

Exploits SSD as a shared cache of persistent filesystem (PFS)

Provides two kinds of temporary filesystems for I/O optimization

Shared/Local temporary filesystem

Persistent filesystem (FEFS)

IO Network

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

CN

SSD

CN

CN

CN

CN

SSD

CN

CN

CN

CN

SSD

CN

CN

CN

CN

SSD

CN

CN

CN

CN

Compute node (CN) cluster

LLIO

data

POSI

X A

PI /

MPI

-IO

Application

Au

tom

ated

cac

hin

g

data

acce

ss t

o P

FS

Tran

spar

ent

7

Page 9: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Copyright 2018 FUJITSU LIMITED

Post-K Computer Current Status

CPU powered-on, OS running

System design verification and testing are underway

Preliminary performance evaluation started

Development Proceeding on Schedule8

Page 10: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

Post-K computer’s high-density mounting achieves over 1 PFlops per rack

Post-K Computer Hardware Features

Copyright 2018 FUJITSU LIMITED

Shelf24 CMUs/Shelf

(48 Nodes)

Rack8 Shelves/Rack

(384 Nodes)

1CPU/Node CMU (CPU Memory Unit)2 Nodes/CMU

9

Page 11: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

(Cont.) Post-K Computer Hardware Features

Copyright 2018 FUJITSU LIMITED

• High-efficiency water cooling uniton the CPU memory unit (CMU)provides 100% water-cooling

Conventional connection

• Single action connection of electric connectors and water couplers achieve compact CMU

CMU

High-density mounting, shortened transmission distance between CPUs

High-efficiency water cooling unit

Electric connectors and water couplers

• The back-to-back layout and cable box shorten the cables length

Post-K computerconnection

Cable box

10

Page 12: Post-K Computer Development, Updates for SC 18 · Title: Post-K Computer Development, Updates for SC 18 Author: Fujitsu Limited Created Date: 11/28/2018 2:08:28 PM

11