Page 1:

Future of Green Multicore Computing

Hironori Kasahara
IEEE Computer Society President Elect 2017, President 2018
Professor, Dept. of Computer Science & Engineering
Director, Advanced Multicore Processor Research Institute
Waseda University, Tokyo, Japan
URL: http://www.kasahara.cs.waseda.ac.jp/

Waseda Univ. Green Computing R&D Center

Page 3:

IEEE Computer Society: 60,000+ members; volunteer-led organization; 200 technical conferences; industry-oriented "Rock Stars"; 17 scholarly journals and 13 magazines; awards program; Digital Library with more than 550,000 articles and papers; 400 local and regional chapters; 40 technical committees.

Page 4:

IEEE Computer Society BoG (Board of Governors), Feb. 1, 2017

Page 5:

Toward 2018
1. Refining content and services to further improve the satisfaction of CS members;
2. Considering an incentive for volunteers to further accelerate CS activities and promptly provide technical benefits for people around the globe; To express appreciation to volunteers: CS Point (Mileage) System: Annual & Life Time Honor, Premier Seating, Premier Registration, Distinguished Reviewer, etc.
3. Offering more attractive services for practitioners in industry;
4. Providing the world’s best educational content and historical treasures for future generations, which only the CS can create with our pioneering researchers (for example, the Multicore Compiler Video Series found at www.computer.org/web/education/multicore-video-series);
5. Thinking about sustainable membership fees while considering the diversity of economic situations within the 10 regions;
6. Cooperating with other IEEE societies and sister societies in a timely and efficient manner;
7. Intelligibly introducing the latest computer-related technologies to younger generations, including children, so that they can realize their technological dreams.

Page 6:

Multicores for Performance and Low Power

[Figure: chip floorplan showing Core#0 to Core#7 and on-chip blocks labelled SNC0, SNC1, DBSC, DDRPAD, GCPG, CSM, LBSC, SHWY, URAM, DLRAM, ILRAM, D$, I$ and VSWC]

IEEE ISSCC08: Paper No. 4.5, M. Ito, … and H. Kasahara, “An 8640 MIPS SoC with Independent Power-off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler”

Power ∝ Frequency × Voltage²

(Voltage ∝ Frequency) → Power ∝ Frequency³

If frequency is reduced to 1/4 (e.g., 4 GHz → 1 GHz), power is reduced to 1/64 and performance falls to 1/4.

<Multicores> If 8 cores are integrated on a chip, power is still 1/8 and performance becomes 2 times.

Power consumption is one of the biggest problems for performance scaling, from smartphones to cloud servers and supercomputers (the “K” computer draws more than 10 MW).
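The scaling argument on this slide can be checked with a few lines of arithmetic. The sketch below assumes, as the slide does, that voltage scales with frequency so that per-core power goes as frequency cubed. It further assumes ideal linear scaling across cores, which real programs only approximate. The function names are illustrative and do not come from the slides.

```python
def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to one core at full frequency, assuming P ∝ f^3 per core."""
    return cores * freq_scale ** 3


def relative_performance(freq_scale: float, cores: int = 1) -> float:
    """Throughput relative to one core at full frequency, assuming ideal scaling."""
    return cores * freq_scale


# One core slowed from 4 GHz to 1 GHz (frequency scale 1/4).
single = (relative_power(0.25), relative_performance(0.25))
# Eight such slowed cores on one chip.
multi = (relative_power(0.25, cores=8), relative_performance(0.25, cores=8))

print(single)  # (0.015625, 0.25)  i.e. power 1/64, performance 1/4
print(multi)   # (0.125, 2.0)      i.e. power 1/8, performance 2x
```

Under these assumptions the eight-core configuration doubles throughput at one eighth of the original power, which is the trade-off the slide states.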

Page 7:

Earthquake Simulation “GMS” on Fujitsu M9000 SPARC CC-NUMA Server

With 128 cores, the OSCAR compiler gave a 100 times speedup against 1-core execution and a 211 times speedup against 1 core using the Sun (Oracle) Studio compiler.

Page 8:

Power Reduction of MPEG2 Decoding to 1/4 on 8 Core Homogeneous Multicore RP-2 by OSCAR Parallelizing Compiler

MPEG2 Decoding with 8 CPU cores
Without Power Control (Voltage: 1.4 V): Avg. Power 5.73 [W]
With Power Control (Frequency, Resume Standby: Power shutdown & Voltage lowering 1.4 V-1.0 V): Avg. Power 1.52 [W]
73.5% Power Reduction

[Figure: two panels, each labelled 0 to 7, comparing execution without and with power control]

Page 9:

Demo of NEDO Multicore for Real Time Consumer Electronics at the Council of Science and Engineering Policy on April 10, 2008

CSTP Members
Prime Minister: Mr. Y. FUKUDA
Minister of State for Science, Technology and Innovation Policy: Mr. F. KISHIDA
Chief Cabinet Secretary: Mr. N. MACHIMURA
Minister of Internal Affairs and Communications: Mr. H. MASUDA
Minister of Finance: Mr. F. NUKAGA
Minister of Education, Culture, Sports, Science and Technology: Mr. K. TOKAI
Minister of Economy, Trade and Industry: Mr. A. AMARI

Page 10:

Green Computing Systems R&D Center, Waseda University
Supported by METI (Mar. 2011 Completion)
Beside Subway Waseda Station, Near Waseda Univ. Main Campus

<R & D Target>
Hardware, Software, Application for Super Low-Power Manycore
More than 64 cores
Natural air cooling (No fan)
Cool, Compact, Clear, Quiet
Operational by Solar Panel

<Industry, Government, Academia>
Hitachi, Fujitsu, NEC, Renesas, Olympus, Toyota, Denso, Mitsubishi, Toshiba, OSCAR Technology, etc.

<Ripple Effect>
Low CO2 (Carbon Dioxide) Emissions
Creation of Value Added Products
Automobiles, Medical, IoT, Servers

Hitachi SR16000: Power7 128 core SMP
Fujitsu M9000: SPARC VII 256 core SMP

Page 11:

OSCAR Parallelizing Compiler
To improve effective performance, cost-performance and software productivity and reduce power

Multigrain Parallelization: coarse-grain parallelism among loops and subroutines, near fine grain parallelism among statements in addition to loop parallelism

Data Localization: Automatic data management for distributed shared memory, cache and local memory

Data Transfer Overlapping: Data transfer overlapping using Data Transfer Controllers (DMAs)

Power Reduction: Reduction of consumed power by compiler control DVFS and Power gating with hardware supports.

[Figure: task graph with nodes numbered 1 to 33, partitioned into Data Localization Groups dlg0, dlg1, dlg2 and dlg3]
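To make the idea of coarse-grain task parallelism concrete, here is a minimal sketch of greedy list scheduling over a small task dependence graph. It is a generic textbook technique and not the OSCAR compiler's actual scheduling algorithm. The four-task graph, the costs and the name `list_schedule` are all hypothetical.

```python
from typing import Dict, List, Tuple


def list_schedule(cost: Dict[str, int],
                  deps: Dict[str, List[str]],
                  cores: int) -> Dict[str, Tuple[int, int, int]]:
    """Greedy list scheduling. Returns task -> (core, start, finish)."""
    placed: Dict[str, Tuple[int, int, int]] = {}
    core_free = [0] * cores  # time at which each core becomes idle
    remaining = set(cost)
    while remaining:
        # A task is ready once all of its predecessors have been placed.
        ready = sorted(t for t in remaining
                       if all(p in placed for p in deps.get(t, [])))
        if not ready:
            raise ValueError("dependence graph contains a cycle")
        # Simple priority: longest ready task first (ties broken by name).
        task = max(ready, key=lambda t: cost[t])
        earliest = max((placed[p][2] for p in deps.get(task, [])), default=0)
        # Pick the core on which the task can start soonest.
        core = min(range(cores), key=lambda c: max(core_free[c], earliest))
        start = max(core_free[core], earliest)
        finish = start + cost[task]
        placed[task] = (core, start, finish)
        core_free[core] = finish
        remaining.remove(task)
    return placed


# Hypothetical graph: A feeds B and C, which both feed D.
cost = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
schedule = list_schedule(cost, deps, cores=2)
makespan = max(finish for _, _, finish in schedule.values())
print(schedule)
print("makespan:", makespan, "sequential:", sum(cost.values()))
```

In this toy graph B and C run side by side on the two cores, so the schedule finishes in 6 time units instead of the 9 needed sequentially.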

Page 12:

Multicore Program Development Using OSCAR API V2.0

OSCAR: Optimally Scheduled Advanced Multiprocessor
API: Application Program Interface

Sequential Application Program in Fortran or C (Consumer Electronics, Automobiles, Medical, Scientific computation, etc.)

Waseda OSCAR Parallelizing Compiler
• Coarse grain task parallelization
• Data Localization
• DMAC data transfer
• Power reduction using DVFS, Clock/Power gating

Manual parallelization / power reduction

Accelerator Compiler/User: Add “hint” directives before a loop or a function to specify it is executable by the accelerator with how many clocks

Parallelized API F or C program
• Proc0: Thread 0, Code with directives
• Proc1: Thread 1, Code with directives
• Accelerator 1 Code
• Accelerator 2 Code

OSCAR API for Homogeneous and/or Heterogeneous Multicores and manycores: Directives for thread generation, memory, data transfer using DMA, power managements

Generation of parallel machine codes using sequential compilers
• Low Power Homogeneous Multicore Code Generation: API Analyzer, Existing sequential compiler
• Low Power Heterogeneous Multicore Code Generation: API Analyzer (Available from Waseda), Existing sequential compiler
• Server Code Generation: OpenMP Compiler

Executable on various multicores
• Homogeneous Multicores from Vendor A (SMP servers)
• Heterogeneous Multicores from Vendor B
• Shared memory servers

Hitachi, Renesas, NEC, Fujitsu, Toshiba, Denso, Olympus, Mitsubishi, Esol, Cats, Gaio, 3 univ.

Page 13:

Cancer Treatment Carbon Ion Radiotherapy
National Institute of Radiological Sciences (NIRS)

8.9 times speedup by 12 processors: Intel Xeon X5670 2.93 GHz 12 core SMP (Hitachi HA8000)
55 times speedup by 64 processors: IBM Power 7 64 core SMP (Hitachi SR16000)

(Previous best was 2.5 times speedup on 16 processors with hand optimization)

Page 14:

Engine Control by multicore with Denso

Although parallel processing of engine control on multicores has so far been very difficult, Denso and Waseda achieved a 1.95 times speedup on a 2-core V850 multicore processor.

[Figure: 1 core vs. 2 cores]

Hard real-time automobile engine control by multicore

Page 15:

OSCAR Compile Flow for Simulink Applications

Simulink model → C code: Generate C code using Embedded Coder

OSCAR Compiler
(1) Generate MTG → Parallelism
(2) Generate Gantt chart → Scheduling in a multicore
(3) Generate parallelized C code using the OSCAR API → Multiplatform execution (Intel, ARM, SH, etc.)

Page 16:

Speedups of MATLAB/Simulink Image Processing on Various 4core Multicores (Intel Xeon, ARM Cortex A15 and Renesas SH4A)

Road Tracking, Image Compression: http://www.mathworks.co.jp/jp/help/vision/examples
Buoy Detection: http://www.mathworks.co.jp/matlabcentral/fileexchange/44706-buoy-detection-using-simulink
Color Edge Detection: http://www.mathworks.co.jp/matlabcentral/fileexchange/28114-fast-edges-of-a-color-image--actual-color--not-converting-to-grayscale-/
Vessel Detection: http://www.mathworks.co.jp/matlabcentral/fileexchange/24990-retinal-blood-vessel-extraction/

Page 17:

Parallel Processing of Face Detection on Manycore, Highend and PC Server

The OSCAR compiler gives an 11.55 times speedup on 16 cores against 1 core on the SR16000 Power7 high-end server.

[Figure: Automatic Parallelization of Face Detection; speedup ratio vs. number of cores]
Number of cores: 1, 2, 4, 8, 16
tilepro64 gcc: 1.00, 1.72, 3.01, 5.74, 9.30
SR16k (Power7 8core*4cpu*4node) xlc: 1.00, 1.93, 3.57, 6.46, 11.55
rs440 (Intel Xeon 8core*4cpu) icc: 1.00, 1.93, 3.67, 6.46, 10.92
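A speedup figure is easier to judge once it is converted to parallel efficiency (speedup divided by core count). The sketch below does this for the curve that ends at the 11.55 quoted above. Assigning the intermediate chart values to the SR16000 series is an assumption based on the order of the chart legend, and the function name is illustrative.

```python
cores = [1, 2, 4, 8, 16]
# Values read from the chart; assumed to be the SR16k (Power7) curve,
# the one whose 16-core point matches the 11.55 quoted in the slide text.
speedup = [1.00, 1.93, 3.57, 6.46, 11.55]


def parallel_efficiency(speedup_value: float, n_cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup_value / n_cores


efficiency = {n: parallel_efficiency(s, n) for n, s in zip(cores, speedup)}
for n, e in efficiency.items():
    print(f"{n:2d} cores: efficiency {e:.2f}")
```

On these numbers, efficiency falls from about 0.97 on 2 cores to about 0.72 on 16 cores.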

Page 18:

An Image of Static Schedule for Heterogeneous Multi-core with Data Transfer Overlapping and Power Control

[Figure: static schedule chart; axis: TIME]

Page 19:

33 Times Speedup Using OSCAR Compiler and OSCAR API on RP-X (Optical Flow with a hand-tuned library)

Speedups against a single SH processor:
1SH: 1 (3.4 [fps])
2SH: 2.29
4SH: 3.09
8SH: 5.4
2SH+1FE: 18.85
4SH+2FE: 26.71
8SH+4FE: 32.65 (111 [fps])

Page 20:

Power Reduction in a real-time execution controlled by OSCAR Compiler and OSCAR API on RP-X (Optical Flow with a hand-tuned library)

Without Power Reduction: Average 1.76 [W]
With Power Reduction by OSCAR Compiler: Average 0.54 [W]
1 cycle: 33 [ms] → 30 [fps]
70% of power reduction

Page 21:

Low-Power Optimization with OSCAR API

[Figure: Scheduled Result by OSCAR Compiler, showing tasks MT1, MT2, MT3 and MT4 and a Sleep period on VC0 and VC1]

Generated Code Image by OSCAR Compiler: functions main_VC0() (beginning with MT1) and main_VC1() (beginning with MT2), whose bodies also contain MT3, MT4, Sleep and the following directives:

#pragma oscar fvcontrol \
    (1,(OSCAR_CPU(),100))

#pragma oscar fvcontrol \
    ((OSCAR_CPU(),0))
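Directives such as fvcontrol can only be placed where a static schedule has idle time. As a rough illustration (not the OSCAR compiler's actual method), the sketch below finds the idle gaps of one core in a fixed schedule and keeps only those long enough to be worth a power-state change. The busy intervals, the `min_sleep` threshold and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in arbitrary time units


def idle_gaps(busy: List[Interval], horizon: int) -> List[Interval]:
    """Return the idle intervals of one core within [0, horizon]."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in sorted(busy):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < horizon:
        gaps.append((cursor, horizon))
    return gaps


def sleep_plan(busy: List[Interval], horizon: int, min_sleep: int) -> List[Interval]:
    """Keep only gaps long enough to amortize a (hypothetical) transition overhead."""
    return [(s, e) for s, e in idle_gaps(busy, horizon) if e - s >= min_sleep]


# Hypothetical schedule: core 0 is always busy, core 1 waits between two tasks.
core0_busy = [(0, 5), (5, 10)]
core1_busy = [(0, 3), (8, 10)]
print(sleep_plan(core0_busy, horizon=10, min_sleep=2))  # []
print(sleep_plan(core1_busy, horizon=10, min_sleep=2))  # [(3, 8)]
```

A compiler that knows the schedule ahead of time could lower frequency or power off a core at the start of such a gap and restore it before the next task, which is the pattern the code image on this slide suggests.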

Page 22:

Automatic Power Reduction for MPEG2 Decode on Android Multicore

ODROID X2, ARM Cortex-A9, 4 cores

• On 3 cores, automatic power reduction control reduced power to 1/7 of that without power reduction control.
• 3 cores with the compiler power reduction control reduced power to 1/3 of that of ordinary 1-core execution.

[Figure: Power Consumption [W] vs. No. of Processor Cores]
No. of Processor Cores: 1, 2, 3
Without Power Reduction: 0.97, 1.88, 2.79
With Power Reduction: 0.63, 0.46, 0.37
Reduction at the same core count: 2/3 (-35.0%) on 1 core, 1/4 (-75.5%) on 2 cores, 1/7 (-86.7%) on 3 cores
3 cores with power reduction against 1 core without: 1/3 (-61.9%)

http://www.youtube.com/channel/UCS43lNYEIkC8i_KIgFZYQBQ
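The percentages on this slide can be reproduced from the plotted wattages. The short check below assumes the first row of chart values (0.97, 1.88, 2.79) is the series without power reduction and the second (0.63, 0.46, 0.37) the series with it. The results agree with the slide's figures to within the rounding of the plotted values.

```python
without_control = {1: 0.97, 2: 1.88, 3: 2.79}  # watts, by number of cores
with_control = {1: 0.63, 2: 0.46, 3: 0.37}     # watts, by number of cores


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which power drops going from `before` to `after`."""
    return (1.0 - after / before) * 100.0


# Same core count, with vs. without compiler power control.
same_cores = {n: reduction_percent(without_control[n], with_control[n])
              for n in without_control}
# Three cores with power control vs. one ordinary core.
vs_single = reduction_percent(without_control[1], with_control[3])

for n, pct in same_cores.items():
    print(f"{n} core(s): -{pct:.1f}%")
print(f"3 cores with control vs. 1 core without: -{vs_single:.1f}%")
```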

Page 23:

Power Reduction on Intel Haswell for Real-time Optical Flow

Intel CPU Core i7 4770K
For HD 720p (1280x720) moving pictures, 15 fps (Deadline 66.6 [ms/frame])

Power was reduced to 1/4 (9.6 W) by the compiler power optimization on the same 3 cores (41.6 W). Power with 3 cores was reduced to 1/3 (9.6 W) against 1 core (29.3 W).

[Figure: average power consumption [W] vs. number of PE]
Number of PE: 1PE, 2PE, 3PE
Without power control: 29.29, 36.59, 41.58
With power control: 24.17, 12.21, 9.60
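The real-time constraint on this slide follows directly from the frame rate. The sketch below derives the per-frame deadline and the power ratios from the numbers above. The helper names and the sample frame times passed to `meets_deadline` are illustrative assumptions, not measurements from the slides.

```python
def frame_deadline_ms(fps: float) -> float:
    """Time budget per frame, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


def meets_deadline(frame_time_ms: float, fps: float) -> bool:
    """True if one frame is processed within its real-time budget."""
    return frame_time_ms <= frame_deadline_ms(fps)


deadline = frame_deadline_ms(15)
print(f"deadline at 15 fps: {deadline:.2f} ms")

# Power ratios from the chart values (watts).
ratio_same_cores = 41.58 / 9.60   # 3 cores without control vs. with control
ratio_vs_single = 29.29 / 9.60    # 1 core without control vs. 3 cores with control
print(f"{ratio_same_cores:.2f}x and {ratio_vs_single:.2f}x lower power")

# Hypothetical frame times, only to show how the check is used.
print(meets_deadline(60.0, 15), meets_deadline(70.0, 15))
```

As long as every frame finishes inside the budget, any remaining slack is time in which cores can run slower or sleep, which is where the power saving comes from.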

Page 24:

Automatic Parallelization of JPEG-XR for Drinkable Inner Camera (Endo Capsule)

Waseda U. & Olympus

10 times more speedup needed after parallelization for 128 cores of Power 7. Less than 35mW power consumption is required.

Speed-ups on TILEPro64 Manycore: 55 times speedup with 64 cores
Number of cores: 1, 2, 4, 8, 16, 32, 64
Speedup: 1.00, 1.96, 3.95, 7.86, 15.82, 30.79, 55.11
Execution time: 1 core 10.0 [s], 64 cores 0.18 [s]
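One way to read a 55 times speedup on 64 cores is through Amdahl's law, which relates speedup to the fraction of work that stays serial. The sketch below inverts the law to find the serial fraction implied by the chart value 55.11. Treating the workload as following Amdahl's model is an assumption made here for illustration; the slides do not state it.

```python
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


def implied_serial_fraction(speedup: float, n_cores: int) -> float:
    """Invert Amdahl's law: the serial fraction that would yield `speedup`."""
    return (n_cores / speedup - 1.0) / (n_cores - 1.0)


f = implied_serial_fraction(55.11, 64)
print(f"implied serial fraction: {f:.4f}")
print(f"check: {amdahl_speedup(f, 64):.2f}x on 64 cores")
# The displayed times give a slightly different ratio, presumably because
# they are rounded to two significant digits.
print(f"10.0 s / 0.18 s = {10.0 / 0.18:.1f}")
```

Under this model, less than 0.3% of the work can remain serial for the measured speedup to be reachable.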

Page 25:

OSCAR Vector Multicore and Compiler for Embedded to Servers with OSCAR Technology

Target: Solar Powered with compiler power reduction. Fully automatic parallelization and vectorization including local memory management and data transfer.

[Figure: architecture diagram. Multicore Chip (×4 chips) containing Cores, each with CPU, Vector, Data Transfer Unit, Local Memory, Distributed Shared Memory and Power Control Unit; Compiler Co-designed Interconnection Network; Compiler co-designed Connection Network; On-chip Shared Memory; Centralized Shared Memory]

Page 26:

[Photos: Fujitsu VPP500/NWT: PE Unit, Cabinet, Cabinet (open)]

Copyright 2008 FUJITSU LIMITED

Page 27:

OSCAR Technology
Started up on Feb. 28, 2013: Licensing all the patents and the OSCAR compiler from Waseda Univ.

CEO: Dr. T. Ono (Ex-CEO of First Section-listed Company, VP of National Univ., Invited Prof. of Waseda U.)
Executives:
Mr. T. Ito (Visiting Prof. Tokyo Agricult. and Eng. U.)
Prof. K. Shirai (Ex-President of Waseda U., Chairman of Japanese Open Univ.)
CTO: Mr. M. Takamura (Ex-Fellow Fujitsu Lab., Fujitsu VPP500, 5000 & NWT Development Leader)
Mr. K. Ashida (Ex-VP Sumitomo Trading, Ashida Consult. CEO, A leader of Business World)
Auditor: Dr. S. Matsuda (Prof. Emeritus Waseda U., Ex-President Ventures and Entrepreneurs Society)
Advisors:
Dr. T. Sato (Patent Attorney, Ex-President of Patent Attorneys Assoc., Gov. IP Committee)
Ms. K. Ishiguro (Lawyer, Supreme Court Trainer)
Mr. A. Fukuda (Leader of Alumni Assoc.)
Prof. K. Kimura (Waseda Univ.)
Prof. H. Kasahara (Waseda Univ.)

[Photo: Fujitsu VPP5000]

Copyright 2015

Page 28:

Summary

Waseda University Green Computing Systems R&D Center, supported by METI, has been researching low-power, high-performance Green Multicore hardware, software and applications with government and industry, including Hitachi, Fujitsu, NEC, Renesas, Denso, Toyota, Olympus and OSCAR Technology.

The OSCAR Automatic Parallelizing and Power Reducing Compiler has achieved speedup and/or power reduction for scientific applications including “Earthquake Wave Propagation”, medical applications including “Cancer Treatment Using Carbon Ion” and “Drinkable Inner Camera”, and industry applications including “Automobile Engine Control”, “Smartphone” and “Wireless Communication Base Band Processing”, on various multicores from different vendors including Intel, ARM, IBM, AMD, Qualcomm, Freescale, Renesas and Fujitsu.

In automatic parallelization, it achieved a 110 times speedup for “Earthquake Wave Propagation Simulation” on 128 cores of IBM Power 7 against 1 core, a 55 times speedup for “Carbon Ion Radiotherapy Cancer Treatment” on 64 cores of IBM Power7, 1.95 times for “Automobile Engine Control” on Renesas 2 cores using SH4A or V850, and 55 times for “JPEG-XR Encoding for Capsule Inner Cameras” on the Tilera 64-core Tile64 manycore.

The compiler will be available on the market from OSCAR Technology.

In automatic power reduction, the power consumed by real-time multimedia applications such as human face detection, H.264, MPEG2 and optical flow was reduced to 1/2 or 1/3 using 3 cores of ARM Cortex A9 and Intel Haswell, and to 1/4 using 8 Renesas SH4A cores, against ordinary single-core execution.