Evaluating Orthogonality between Application Auto-tuning and Run-Time Resource Management for Adaptive OpenCL Applications

Edoardo Paone, Davide Gadioli, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
Politecnico di Milano

Computer Architecture Evolution

“The number of transistors incorporated in a chip will approximately double every two years” – Gordon Moore, Intel co-founder

[Figure: processor timeline over time: 8086 (3um), 386 (1.5um), Pentium 4 (0.18um), Core2 Duo (65nm), Nehalem (45nm)]

“Moore’s Law” on Performance

[Figure: performance over time (1987, 2003, 2011, 2020), drawn along the processor timeline from the 8086 to Nehalem]

The Golden Era:
- Single-processor
- 1st Power Wall

The Multicore Era:
- 2 to 16 cores
- On-chip shared LL$
- Programmability challenge

The Manycore Era:
- Larger # of cores
- Networks on-Chip
- Programmability challenge + Dynamic Resource Management

Main Idea

In the context of resource consolidation, analyze the orthogonal effects of:
• Resource Management
• Application Auto-Tuning

Approximate computing

Target Platforms: Multicore Platform

Run-Time Resource Management

[Figure: the RTRM sits between the applications (App1, App2, App3) and the Target Platform]

Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many-core systems: survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC). 2013.

RTRM - Overview

[Figure: App1, App2 and App3 on top of the RTRM (Accounting, Mapping), which sits on top of the Target Platform]

The resource accounting phase grants resources to critical workloads while optimizing resource usage by best-effort workloads.

The mapping phase maps virtual resources onto physical resources:
- to achieve optimal platform usage
- to handle run-time variations
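The two RTRM phases can be sketched as follows. This is a minimal illustration, not the resource manager used in the talk: the `Workload` type, the `account` and `map_to_cores` functions, and the policy itself (critical workloads served first, best-effort workloads sharing the remainder round robin) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requested_cores: int
    critical: bool


def account(workloads, total_cores):
    """Accounting phase: decide how many virtual cores each workload gets."""
    grants = {}
    free = total_cores
    # Assumed policy: critical workloads get their full request while cores last.
    for w in (w for w in workloads if w.critical):
        grants[w.name] = min(w.requested_cores, free)
        free -= grants[w.name]
    # Best-effort workloads share what is left, one core at a time.
    best_effort = [w for w in workloads if not w.critical]
    for w in best_effort:
        grants[w.name] = 0
    progress = True
    while free > 0 and progress:
        progress = False
        for w in best_effort:
            if free > 0 and grants[w.name] < w.requested_cores:
                grants[w.name] += 1
                free -= 1
                progress = True
    return grants


def map_to_cores(grants, cores_per_node, num_nodes):
    """Mapping phase: bind granted virtual cores to physical (node, core) pairs."""
    physical = [(n, c) for n in range(num_nodes) for c in range(cores_per_node)]
    mapping, cursor = {}, 0
    for name, count in grants.items():
        # Consecutive slots tend to keep a workload's cores on neighbouring nodes.
        mapping[name] = physical[cursor:cursor + count]
        cursor += count
    return mapping
```

On a 4-node, 4-cores-per-node platform, one critical application asking for 4 cores and two best-effort applications asking for 8 each would receive 4, 6 and 6 cores under this assumed policy.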

Application Auto-Tuning

The key idea is that most applications are configurable through a set of parameters.

Parameters:
• Color
• Shape
• Size

Run-Time Knobs

Why Run-Time Knobs & Auto-Tuning?

Some applications expose internal knobs that can be used to trade off application quality of results (QoR) against performance.

[Figure: QoR vs. Performance]

Example: Autonomous Video-surveillance System
• Video Frame Rate
• Video Resolution
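The QoR/performance trade-off can be made concrete by treating each knob setting as an operating point and keeping only the points that are not beaten on both axes. The sketch below is illustrative: the `OperatingPoint` fields, the `pareto_front` and `select` helpers, and every number in `PROFILE` are made up and do not come from the talk.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    frame_rate: int      # knob: frames per second
    resolution: tuple    # knob: (width, height)
    qor: float           # profiled quality of results, higher is better
    cost: float          # profiled processing cost, lower is better


def pareto_front(points):
    """Keep only points that no other point beats on both QoR and cost."""
    front = []
    for p in points:
        dominated = any(
            q.qor >= p.qor and q.cost <= p.cost
            and (q.qor > p.qor or q.cost < p.cost)
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def select(points, cost_budget):
    """Pick the highest-QoR Pareto point whose cost fits the budget."""
    affordable = [p for p in pareto_front(points) if p.cost <= cost_budget]
    return max(affordable, key=lambda p: p.qor) if affordable else None


# Made-up profile data, for illustration only.
PROFILE = [
    OperatingPoint(5, (320, 240), qor=0.30, cost=1.0),
    OperatingPoint(15, (320, 240), qor=0.45, cost=3.0),
    OperatingPoint(5, (640, 480), qor=0.40, cost=4.0),   # dominated by the previous point
    OperatingPoint(15, (640, 480), qor=0.70, cost=12.0),
    OperatingPoint(30, (640, 480), qor=0.90, cost=24.0),
]
```

With this data, `select(PROFILE, cost_budget=10)` settles on 15 frames per second at 320x240, the best quality that fits the budget.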

Application Auto-Tuning Framework

[Figure: the framework is built up in three steps]
• Execution Loop
• Monitoring
• Re-Configure
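One way to read the three framework stages is as a feedback loop around the application. The following is a minimal sketch under stated assumptions, not the framework presented in the talk: the `AutoTuner` class, the operating-point tuples, and the simulated `load_factors` are all hypothetical.

```python
# Hypothetical design-time profile: (name, expected frame rate, expected error).
OPERATING_POINTS = [
    ("high_quality", 10.0, 0.05),
    ("balanced", 20.0, 0.10),
    ("fast", 40.0, 0.25),
]


class AutoTuner:
    """Monitor the achieved frame rate and re-configure the knobs when needed."""

    def __init__(self, points, goal_fps):
        self.points = sorted(points, key=lambda p: p[2])  # lowest error first
        self.goal_fps = goal_fps
        self.current = self.points[0]
        self.scale = 1.0  # observed / expected throughput ratio

    def monitor(self, observed_fps):
        """Monitoring: compare the observation with the design-time profile."""
        self.scale = observed_fps / self.current[1]

    def reconfigure(self):
        """Re-configure: lowest-error point still predicted to meet the goal."""
        for point in self.points:
            if point[1] * self.scale >= self.goal_fps:
                self.current = point
                return point
        # Nothing meets the goal: fall back to the fastest profiled point.
        self.current = max(self.points, key=lambda p: p[1])
        return self.current


def execution_loop(tuner, load_factors):
    """Execution loop: run, monitor, re-configure; returns the settings used."""
    history = []
    for factor in load_factors:
        history.append(tuner.current[0])
        observed_fps = tuner.current[1] * factor  # simulated execution
        tuner.monitor(observed_fps)
        tuner.reconfigure()
    return history
```

A factor below 1.0 stands for a platform that delivers less throughput than profiled, for example because other applications are running; the tuner then moves to a faster, lower-quality setting and moves back once the factor recovers.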

Orthogonality Concept

[Figure: resource requests and granted resources flow between the applications, the Platform OS and the Target HW Platform; the final build adds a Run-Time Resource Manager to the stack]

Exploitation of OpenCL Device Fission to limit resource requests
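In OpenCL 1.2, device fission is exposed through `clCreateSubDevices`, which takes a partition property list. The sketch below only builds such a list, capped by the number of compute units an application has been granted. The talk does not say which partition scheme was used, so partitioning by counts is an assumption, and `fission_properties` is a hypothetical helper.

```python
# Constants as defined in the OpenCL 1.2 header cl.h.
CL_DEVICE_PARTITION_BY_COUNTS = 0x1087
CL_DEVICE_PARTITION_BY_COUNTS_LIST_END = 0x0


def fission_properties(requested_units, granted_units):
    """Build a partition-by-counts property list capped by the granted units."""
    if granted_units < 1:
        raise ValueError("at least one compute unit must be granted")
    units = max(1, min(requested_units, granted_units))
    # Layout expected by clCreateSubDevices: property name, one count per
    # sub-device, the counts-list terminator, then the 0 ending the array.
    return [
        CL_DEVICE_PARTITION_BY_COUNTS,
        units,
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END,
        0,
    ]
```

An application that would otherwise use all 16 compute units but has been granted 4 would create a single 4-unit sub-device and build its context and command queue on that sub-device instead of on the full device.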

The Multi-View Case Study

2 eyes = 3 dimensions

Implementation [1]

[Figure: stereo geometry with points P and Q, their projections PL, QL on CAM LEFT and PR, QR on CAM RIGHT, and the resulting disparities DP and DQ]

[1] Ke Zhang, Jiangbo Lu, and Gauthier Lafruit, “Cross-Based Local Stereo Matching Using Orthogonal Integral Images”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 7, July 2009

Pixel disparity

[Figure: left camera image, right camera image and the reference disparity map]

QoR: Disparity Error

5 Application Knobs
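The slides name disparity error as the QoR measure without defining it. A common choice for stereo matching, assumed here, is the fraction of pixels whose computed disparity differs from the reference disparity by more than a tolerance. The `disparity_error` function and its `tolerance` parameter are illustrative.

```python
def disparity_error(computed, reference, tolerance=1):
    """Fraction of pixels whose disparity is off by more than `tolerance`."""
    total = bad = 0
    # Both maps are lists of rows; each entry is a disparity in pixels.
    for row_c, row_r in zip(computed, reference):
        for d_c, d_r in zip(row_c, row_r):
            total += 1
            if abs(d_c - d_r) > tolerance:
                bad += 1
    if total == 0:
        raise ValueError("empty disparity map")
    return bad / total
```

A lower value means a better depth estimate, so an auto-tuner would treat this error as the inverse of QoR when comparing knob settings.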

Experimental Setup

Target Platform
• AMD NUMA Architecture: 4 nodes - 4 cores
• OpenCL 1.2 run-time provided by AMD

Workload Definition
• Single application, multiple instances
• Dynamic workload in terms of start time, amount of data to process, frame-rate goal

Evaluation Metrics
• Normalized Actual Penalty (Performance/Quality metric): user satisfaction in terms of application frame rate
• Normalized Application Error (Quality metric): user satisfaction in terms of quality of the resulting image (1/QoR)
• Difference w.r.t. off-line profiling (Predictability metric)

Application Auto-Tuning Effects

[Figure: plots shown over three builds; no text survives in the transcript]

Comparative Analysis

                                      Application Auto-tuning
                                      OFF            ON
Run-Time Resource Management   OFF    PLAIN-LINUX    ADAPTIVE-LINUX
                               ON     PLAIN-RTRM     ADAPTIVE-RTRM

(No Device Fission)

Run-Time Results

[Figure: three panels plotted against the number of Multi-View apps (1 to 6): Missed Deadlines; Design-Time vs Run-Time Profiling; QoR: Disparity Error. Sign annotations added in the later builds: "- - + +", "- + - +", "- - + +"]

Resource-Aware AS-RTM

[Figure: the stack of Target HW Platform and Platform OS with the Requests and Resources flows, extended with a Resource Availability input]
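A resource-aware application manager can correct its design-time profile with the resource availability it is told about before choosing a knob setting. The sketch below is one possible reading, not the RA-AS-RTM of the talk: the linear scaling model, the function names and the sample points are assumptions.

```python
# Made-up design-time profile, assumed to be measured on 16 cores.
POINTS = [
    {"name": "high_quality", "fps": 10.0, "error": 0.05},
    {"name": "balanced", "fps": 20.0, "error": 0.10},
    {"name": "fast", "fps": 40.0, "error": 0.25},
]


def predicted_fps(profiled_fps, profiled_cores, available_cores):
    """Scale a design-time frame rate to the cores currently available.

    Assumes throughput scales linearly with the core count, which is only an
    approximation for real OpenCL kernels.
    """
    usable = min(available_cores, profiled_cores)
    return profiled_fps * usable / profiled_cores


def choose_point(points, goal_fps, available_cores, profiled_cores=16):
    """Lowest-error point whose availability-corrected fps meets the goal."""
    feasible = [
        p for p in points
        if predicted_fps(p["fps"], profiled_cores, available_cores) >= goal_fps
    ]
    return min(feasible, key=lambda p: p["error"]) if feasible else None
```

Unlike a purely reactive tuner, this selection can act before a deadline is missed, because the availability figure arrives from the platform side rather than being inferred from a slow frame.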

Conclusions

We considered the problem of managing multiple OpenCL applications for server consolidation on multi-core platforms.

We implemented an approach exploiting run-time management frameworks operating at both the application level and the OS/resource level.

Analysis of results:
• Auto-tuning is necessary to modulate performance and QoR
• Resource-awareness is needed for predictability, by means of resource isolation (RTRM) or a simple monitor (RA-AS-RTM)