Evaluating Orthogonality between Application Auto‐tuning and Run‐Time Resource Management for Adaptive OpenCL Applications
Edoardo Paone, Davide Gadioli, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
Politecnico di Milano
Computer Architecture Evolution
“The number of transistors incorporated in a chip will approximately double every two years” – Gordon Moore, Intel co-founder
[Figure: process technology over time: 8086 (3 µm), 386 (1.5 µm), Pentium 4 (0.18 µm), Core2 Duo (65 nm), Nehalem (45 nm)]
“Moore’s Law” on Performance
[Figure: performance over time (1987, 2003, 2011, 2020), annotated with 8086 (3 µm), 386 (1.5 µm), Pentium 4, Core2 Duo (65 nm), Nehalem (45 nm)]
The Golden Era:
- Single-processor
- 1st Power Wall
The Multicore Era:
- 2 to 16 cores
- On-chip shared last-level cache (LL$)
- Programmability challenge
The Manycore Era:
- Larger # of cores
- Networks on-Chip
- Programmability challenge + Dynamic Resource Management
Main Idea
In the context of resource consolidation, analyze the orthogonal effects of:
- Resource Management
- Application Auto‐Tuning
- Approximate computing
- …
Target Platforms
Multicore Platform
Run‐Time Resource Management
Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many‐core systems: survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC). 2013.
[Figure: App1, App2 and App3, the RTRM, and the Target Platform]
RTRM ‐ Overview
[Figure: App1, App2 and App3, the RTRM with its Accounting and Mapping phases, and the Target Platform]
The resource accounting phase grants resources to critical workloads while optimizing resource usage by best‐effort workloads.
The mapping phase maps virtual resources onto physical resources to achieve optimal platform usage and to handle run‐time variations.
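The two RTRM phases can be illustrated with a small sketch. This is not the RTRM used in the presentation: the `Request` type, the first-come-first-served policy for best-effort workloads, the node-by-node placement, and all the numbers are assumptions chosen only to make the accounting/mapping split concrete.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: str
    cores: int       # number of virtual cores requested (hypothetical unit)
    critical: bool   # critical workloads are served before best-effort ones


def account(requests, total_cores):
    """Accounting: grant critical requests first, then share what is left.

    Assumed policy: requests are served in order within each class and a
    grant is capped by the cores still free.
    """
    grants = {}
    free = total_cores
    ordered = [r for r in requests if r.critical] + [r for r in requests if not r.critical]
    for r in ordered:
        grants[r.app] = min(r.cores, free)
        free -= grants[r.app]
    return grants


def map_to_nodes(grants, nodes, cores_per_node):
    """Mapping: place granted virtual cores onto physical (node, core) pairs,
    filling one node before moving to the next (an assumed placement rule)."""
    physical = [(n, c) for n in range(nodes) for c in range(cores_per_node)]
    mapping, cursor = {}, 0
    for app, count in grants.items():
        mapping[app] = physical[cursor:cursor + count]
        cursor += count
    return mapping


# Illustrative values only.
requests = [Request("App1", 4, True), Request("App2", 4, False), Request("App3", 12, False)]
grants = account(requests, total_cores=16)
mapping = map_to_nodes(grants, nodes=4, cores_per_node=4)
```

Here App3 asks for more than what remains after the other grants, so accounting trims its share; mapping then decides where each share physically lives.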
Application Auto‐Tuning
The key idea is that most applications are configurable through a set of parameters.
Parameters:
- Color
- Shape
- Size
These parameters are the application's Run-Time Knobs.
Why Run‐Time Knobs & Auto‐Tuning?
Some applications expose internal knobs that can be used to trade off quality of results (QoR) against performance.
[Figure: QoR versus Performance]
Example: Autonomous Video-surveillance System
- Video Frame Rate
- Video Resolution
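One way to reason about such a trade-off is to keep only the knob configurations that are not beaten on both axes at once (a Pareto front). The sketch below assumes each configuration has been profiled for a relative speed and a relative QoR; the knob values and the measurements are made up for illustration and do not come from the presentation.

```python
def pareto_front(points):
    """Keep the configurations that no other configuration dominates.

    Each point is (knobs, speed, qor); higher is better on both axes.
    """
    front = []
    for knobs, speed, qor in points:
        dominated = any(
            (s >= speed and q >= qor) and (s > speed or q > qor)
            for _, s, q in points
        )
        if not dominated:
            front.append((knobs, speed, qor))
    return front


# Hypothetical profiling data for a video-surveillance application.
profile = [
    ({"frame_rate": 30, "resolution": "1080p"}, 1.0, 1.00),
    ({"frame_rate": 30, "resolution": "720p"}, 2.1, 0.85),
    ({"frame_rate": 15, "resolution": "1080p"}, 2.0, 0.80),  # beaten by the row above
    ({"frame_rate": 15, "resolution": "720p"}, 4.0, 0.60),
]
front = pareto_front(profile)
```

An auto-tuner then only needs to choose among the surviving points, moving toward speed or toward QoR depending on the current goal.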
Application Auto‐Tuning Framework
- Execution Loop
- Monitoring
- Re-Configure
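The three stages can be sketched as a simple control loop. This is a minimal illustration, not the framework from the presentation: the operating points, the `scale` knob, the time estimates, and the step-up/step-down rule are all assumptions, and the "work" is simulated by multiplying an estimate by an external slowdown factor.

```python
# Hypothetical operating points, ordered from highest to lowest quality.
# "ms_per_frame" is an assumed design-time estimate; "scale" is a made-up knob.
OPERATING_POINTS = [
    {"scale": 1.00, "ms_per_frame": 50.0},
    {"scale": 0.75, "ms_per_frame": 30.0},
    {"scale": 0.50, "ms_per_frame": 15.0},
]


def reconfigure(current, observed_ms, goal_ms):
    """Re-Configure: step to a cheaper point on a miss, to a better one on slack."""
    if observed_ms > goal_ms and current < len(OPERATING_POINTS) - 1:
        return current + 1
    if current > 0:
        # Go back up only if the better point would still meet the goal,
        # scaling its estimate by the slowdown currently observed.
        slowdown = observed_ms / OPERATING_POINTS[current]["ms_per_frame"]
        if OPERATING_POINTS[current - 1]["ms_per_frame"] * slowdown <= goal_ms:
            return current - 1
    return current


def run(frames, goal_ms, slowdown_per_frame):
    """Execution Loop: process each frame, monitor its time, then re-configure."""
    current, history = 0, []
    for i in range(frames):
        # Monitoring: stand-in for a real measurement of the frame time.
        observed = OPERATING_POINTS[current]["ms_per_frame"] * slowdown_per_frame[i]
        history.append((current, observed))
        current = reconfigure(current, observed, goal_ms)
    return history


history = run(frames=5, goal_ms=40.0, slowdown_per_frame=[1.0, 1.0, 1.0, 2.0, 2.0])
```

In this run the loop leaves the highest-quality point after the first missed goal, holds the middle point while it fits, and drops again once the simulated slowdown doubles.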
Orthogonality Concept
[Figure: layered stack with the Run-Time Resource Manager, the Platform OS and the Target HW Platform; arrows labeled Requests and Resources]
Exploitation of OpenCL Device Fission to limit resource requests
The Multi‐View Case Study
2 eyes = 3 dimensions
Implementation [1]
[Figure: stereo geometry with scene points P and Q, their projections PL, QL on CAM LEFT and PR, QR on CAM RIGHT, and the resulting disparities DP and DQ]
[1] Ke Zhang, Jiangbo Lu, and Gauthier Lafruit, “Cross-Based Local Stereo Matching Using Orthogonal Integral Images”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 7, July 2009
Pixel disparity
[Figure: left camera image, right camera image, and reference disparity]
QoR: Disparity Error
5 Application Knobs
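The slides do not define how the disparity error is computed. As an illustration only, the sketch below assumes a common stereo-matching measure: the fraction of pixels whose computed disparity differs from the reference disparity by more than a tolerance. The function name, the tolerance, and the tiny disparity maps are hypothetical.

```python
def disparity_error(computed, reference, tolerance=1):
    """Fraction of pixels whose disparity is off by more than `tolerance`.

    Assumed definition; both maps are lists of rows of per-pixel disparities.
    """
    total = bad = 0
    for row_c, row_r in zip(computed, reference):
        for d_c, d_r in zip(row_c, row_r):
            total += 1
            if abs(d_c - d_r) > tolerance:
                bad += 1
    return bad / total if total else 0.0


# Tiny made-up disparity maps (2 rows x 3 pixels).
reference = [[10, 10, 12], [8, 8, 9]]
computed = [[10, 11, 15], [8, 4, 9]]
error = disparity_error(computed, reference)
```

Under this definition a lower value means a better QoR, so an auto-tuner would accept a higher error only when it needs more performance.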
Experimental Setup
Target Platform:
- AMD NUMA Architecture: 4 nodes ‐ 4 cores
- OpenCL 1.2 run‐time provided by AMD
Workload Definition:
- Single application, multiple instances
- Dynamic workload in terms of start time, amount of data to process, frame‐rate goal
Evaluation Metrics:
- Normalized Actual Penalty (Performance/Quality metric): user satisfaction in terms of Application Frame‐Rate
- Normalized Application Error (Quality metric): user satisfaction in terms of quality of the resulting image (1/QoR)
- Difference w.r.t. off‐line profiling (Predictability metric)
Application Auto‐Tuning Effects
Comparative Analysis
                                Application Auto-tuning
Run-Time Resource Management    OFF            ON
OFF                             PLAIN-LINUX    ADAPTIVE-LINUX
ON                              PLAIN-RTRM     ADAPTIVE-RTRM
(No Device Fission)
Run‐Time Results
[Figure: three panels (Missed Deadlines; Design-Time Vs Run-Time Profiling; QoR Disparity Error), each plotted against #Multi-View Apps from 1 to 6. The final build overlays three rows of marks: "- - + +", "- + - +", "- - + +"]
Resource‐Aware AS‐RTM
[Figure: APPS, Platform OS and Target HW Platform, with arrows labeled Requests, Resources and Resource Availability]
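A resource-aware application manager can be pictured as a lookup that takes the current resource availability into account when picking a configuration. The sketch below is an assumption about how such a monitor might be used, not the AS-RTM from the presentation: the profile table, the configuration names, and the selection rule are all hypothetical.

```python
# Hypothetical design-time profile: expected frame rate for each
# (configuration, available cores) pair, plus the error of each configuration.
PROFILE = {
    "high":   {"error": 0.05, "fps": {4: 10.0, 8: 19.0, 16: 35.0}},
    "medium": {"error": 0.10, "fps": {4: 18.0, 8: 33.0, 16: 60.0}},
    "low":    {"error": 0.25, "fps": {4: 30.0, 8: 55.0, 16: 95.0}},
}


def select_configuration(available_cores, fps_goal):
    """Return the lowest-error configuration expected to meet the goal with
    the cores currently available, or the fastest one if none does."""
    feasible = [
        (cfg["error"], name)
        for name, cfg in PROFILE.items()
        if cfg["fps"].get(available_cores, 0.0) >= fps_goal
    ]
    if feasible:
        return min(feasible)[1]
    return max(PROFILE, key=lambda name: PROFILE[name]["fps"].get(available_cores, 0.0))


choice_16 = select_configuration(available_cores=16, fps_goal=30.0)
choice_8 = select_configuration(available_cores=8, fps_goal=30.0)
choice_4 = select_configuration(available_cores=4, fps_goal=30.0)
```

With the same frame-rate goal, fewer available cores push the choice toward configurations with a higher error, which is the behaviour a resource-availability monitor is meant to make predictable.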
Conclusions
- We considered the problem of managing multiple OpenCL applications for server consolidation on multi‐core platforms.
- We implemented an approach exploiting run‐time management frameworks operating at both the application level and the OS/resource level.
- Analysis of results:
  - Auto‐tuning is necessary to modulate performance and QoR.
  - Resource‐awareness is needed for predictability, by means of resource isolation (RTRM) or a simple monitor (RA‐AS‐RTM).