Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based...

38
fakultät für informatik informatik 12 technische universität dortmund Rechnerarchitektur (RA) Sommersemester 2016 Architecture-Aware Optimizations - Hardware-Software co-Optimizations- Jian-Jia Chen Informatik 12 Jian-jia.chen@tu-.. http://ls12-www.cs.tu-dortmund.de/daes/ Tel.: 0231 755 6078 2016/07/19

Transcript of Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based...

Page 1: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

fakultät für informatik informatik 12

technische universität dortmund

Rechnerarchitektur (RA)

Sommersemester 2016

Architecture-Aware Optimizations - Hardware-Software co-Optimizations-

Jian-Jia Chen Informatik 12 Jian-jia.chen@tu-.. http://ls12-www.cs.tu-dortmund.de/daes/ Tel.: 0231 755 6078

2016/07/19

Page 2: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 2 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Outline

Heterogeneous System Architecture (HSA) §  Integrated CPU/GPU platforms §  Recent movement in chip designs

Architecture-aware software designs §  Energy-efficiency issues §  Darkroom §  Halide

Multicore revolutions §  Impact on the “safety-critical” industry sector

Page 3: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 3 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

What is Heterogeneous Computing?

Use processor cores with various type/computing power to achieve better performance/power efficiency

http://www.browndeertechnology.com/docs/BDT_OpenCL_Tutorial_NBody-rev3.html

Tasks

CPU

GPU

Page 4: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 4 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Advantage of Heterogeneous Computing

CPU is ideal for scalar processing §  Out of order x86 cores

with low latency memory access

§  Optimized for sequential and branching algorithms

§  Runs existing applications very well

Serial/Task-parallel workloads → CPU

GPU is ideal for parallel processing §  GPU shaders optimized

for throughput computing §  Ready for emerging

workloads §  Media processing,

simulation, natural UI, etc.

Graphics/Data-parallel workloads → GPU

Heterogeneous Computing -> Fusion, Norm Rubin, SAAHPC 2010

Page 5: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 5 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

CPU/GPU Integration: CPU’s Advancement Meets GPU’s

Throughput Performance

Pro

gram

mab

ility

Single-Thread Era

Multi-Core Era

Heterogeneous Systems Era

Graphics Driver-based

programs

Vertex/Pixel

Shader

System-Level Progammable

Una

ccep

tabl

e E

xper

ts O

nly

Mai

nstre

am

High Performance Task Parallel Execution

Power-efficient Data Parallel

Execution

CPU/GPU Integration

Heterogeneous Computing

Microprocessor Advancement

GP

U A

dvan

cem

ent

Page 6: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 6 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

User Space

Evolution of Heterogeneous Computing

Dedicated GPU §  GPU kernel is launched through the device driver §  Separate CPU/GPU address space §  Separate system/GPU memory §  Data copy between CPU/GPU via PCIe

Core1 Core2 CoreN …

System memory (coherent)

CU1 CU2 CUN …

GPU memory (Non-coherent)

L1/L2 L1/L2 L1/L2 L1 L1 L1

LLC L2

CPU GPU

PCIe

Address space managed by OS

Address space managed by driver OpenCL

Application OpenCL

Runtime Library

Kernel Space

GPU Device Driver

= kernel launch process

data preparation computation

Page 7: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 7 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Evolution of Heterogeneous Computing

Integrated GPU architecture §  GPU kernel is launched through the device driver §  Separate CPU/GPU address space §  Separate system/GPU memory §  Data copy between CPU/GPU via memory bus

Core1 Core2 CoreN …

System memory (coherent)

CU1 CU2 CUN …

GPU memory (Non-coherent)

L1/L2 L1/L2 L1/L2 L1 L1 L1

LLC L2

CPU GPU

PCIe

Address space managed by OS

Address space managed by driver

System memory (coherent)

GPU memory (Non-coherent)

User Space

OpenCL Application

OpenCL Runtime Library

Kernel Space

GPU Device Driver

= kernel launch process

data preparation Computation

Page 8: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 8 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Evolution of Heterogeneous Computing

Integrated CPU/GPU architecture §  GPU kernel is launched through the device driver §  Unified CPU/GPU address space (managed by OS) §  Unified system/GPU memory §  No data copy - data can be retrieved by pointer passing

Core1 Core2 CoreN … CU1 CU2 CUN … L1/L2 L1/L2 L1/L2 L1 L1 L1

LLC L2

CPU GPU

Address space managed by OS

Address space managed by driver

System memory (coherent)

GPU memory (Non-coherent)

L2

LLC

Coherent system memory

Address space managed by OS

User Space

OpenCL Application

OpenCL Runtime Library

Kernel Space

GPU Device Driver

= kernel launch process

data preparation Computation

Page 9: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 9 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Utopia World of Heterogeneous Computing

Processors are architected to operate cooperatively §  Tasks in an application are executed on different types of core §  Unified coherent memory enables data sharing across all processors

Designed to enable the applications to run on different processors at different time §  Capability to translate from high-level language to target binary at run-

time §  User-level task dispatch §  Decision making module

Core1 Core2 CoreN

Coherent system memory

CU1 CU2 CUN … L1/L2 L1/L2 L1/L2 L1 L1 L1

LLC

HW Coherence

Application Task 1 Task 2 Task 3

L2

Task 1 Task 2 Task 3 Task 1 Task 2 Task 3

Page 10: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 10 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

HSA Foundation

Founded in June 2012 Developing a new platform for heterogeneous systems www.hsafoundation.com Specifications under development in working groups to define the platform Membership consists of 43 companies and 16 universities Adding 1-2 new members each month

Page 11: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 11 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Diverse Partners Driving Future of Heterogeneous Computing

Founders

Promoters

Supporters

Contributors

Needs Updating – Add Toshiba Logo

Page 12: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 12 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Application

HSA Intermediate Language (HSAIL)

HSA Run-time ………

Native machine codes

User-level SW

HW

Agent Scheduler

CPU Finalizer

GPU Finalizer

OpenCL Compiler

Task 1

Task 2

Task 3

HSA (Heterogeneous System Architecture) Hardware-Software Stack

Core 1

Shared virtual address space

Core 2 …… CU 1 CU 2 …… AC 1 AC 2 ……

Task 1 Task 2 Task 3

Page 13: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 13 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Intel Haswell

Page 14: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 14 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

AMD Kaveri APU

Page 15: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 15 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

NVIDIA Tegra K1

Page 16: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 16 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Qualcomm Snapdragon

Page 17: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 17 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Outline

Heterogeneous System Architecture (HSA) §  Integrated CPU/GPU platforms §  Recent movement in chip designs

Architecture-aware software designs §  Energy-efficiency issues §  Darkroom §  Halide

Multicore revolutions §  Impact on the “safety-critical” industry sector

Page 18: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 18 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Energy Efficiency of different

target platforms

© Hugo De Man, IMEC, Philips, 2007

Page 19: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 19 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Signal Processing ASICs

Page 20: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 20 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

How about Memory?

© Horowitz, DAC 2016

Page 21: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 21 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Processor Energy with Corrected Cache Sizes

© Horowitz, DAC 2016

Page 22: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 22 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Processor Energy Breakdown

© Horowitz, DAC 2016

Page 23: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 23 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Data Center Energy Specs

© Malladi, ISCA 2012

Page 24: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 24 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

ITRS Power Consumption Projection -Station Systems -

© ITRS, 2010

Page 25: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 25 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

What Is Going On Here?

Page 26: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 26 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Energy Consumption (Approximate, 45nm)

© Horowitz, DAC 2016

Page 27: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 27 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

True Story

It’s more about the algorithm than the hardware The efficiency cannot be achieved unless the algorithm is right!!

(all) Algorithms

GPU Alg.

Page 28: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 28 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Locality, Locality, and Locality!!!

© Hegarty et al. , SIGGraph 2014

Page 29: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 29 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Darkroom (Stanford/MIT)

© Hegarty et al. , SIGGraph 2014

Page 30: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 30 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Locality versus Parallelism

Halide Programming Language: §  http://halide-lang.org/

Performance needs a lot of tradeoffs §  Locality §  Parallelism §  Redundant recomputation

Page 31: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 31 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Outline

Heterogeneous System Architecture (HSA) §  Integrated CPU/GPU platforms §  Recent movement in chip designs

Architecture-aware software designs §  Energy-efficiency issues §  Darkroom §  Halide

Multicore revolutions §  Impact on the “safety-critical” industry sector

Page 32: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 32 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Automotive Software

© Hamann, Kramer, Ziegenbein, Lukasiewycz (Bosch), 2016

Page 33: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 33 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Assessment of Multi-Core Worst-Case Execution Behavior

© Hamann, Kramer, Ziegenbein, Lukasiewycz (Bosch), 2016

Page 34: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 34 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Multi-Core Memory Access Models

© Hamann, Kramer, Ziegenbein, Lukasiewycz (Bosch), 2016

Page 35: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 35 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

An Industrial Challenge (FMTV 2016)

Precise analysis of worst-case end-to-end latencies §  mainly due to different involved periods and time

domains §  What is the effect on memory layout and interconnect on

the execution times? § Automatic optimized application and data

mapping §  Evaluation of digital (multi-core) execution platforms §  Evaluation of software growth scenarios

© Hamann, Kramer, Ziegenbein, Lukasiewycz (Bosch), 2016

Page 36: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 36 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

MPPA-256 Processor Architecture (Kalray)

Kalray, 2016

Page 37: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 37 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

MPPA-256 NoC

Page 38: Rechnerarchitektur (RA) · Era Multi-Core Era Heterogeneous Systems Era Graphics Driver-based programs Vertex/ Pixel ... Developing a new platform for heterogeneous systems ... MPPA-256

- 38 - technische universität dortmund

fakultät für informatik

© j. chen, informatik 12, 2016

Safety-Critical Systems with Multicore Platforms

Goal: deploy multi-core processors for safety-critical real-time applications (avionics, automotive,…) Problem: concurrent use of shared resources (e.g. interconnect, main memory) §  unknown access latency for a concrete resource access §  complicated timing analysis §  hardware platforms may not be predictable §  Many features are designed by computer architects for

average cases only

Solution? §  Maybe it is up to you. §  Did you see the above challenges?