AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling...
Transcript of AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling...
AMD ROCm GPU profiling in
Trace Compass
Arnaud Fiorini with Pr. Michel Dagenais
May 8th, 2020
Polytechnique Montreal
DORSAL Laboratory
POLYTECHNIQUE MONTREAL – Arnaud Fiorini 2
I. Introduction
1. GPU Development
2. Optimization Problems
II. Tracing and profiling of CPU-GPU systems
1. ROC Platform
2. Tracing GPUs
3. Profiling GPUs
Agenda
GPU Development - Introduction
33POLYTECHNIQUE MONTREAL – Arnaud Fiorini 3
• A few definitions :
• Kernel : A small piece of code executed on the device.
GPU Development - Introduction
44POLYTECHNIQUE MONTREAL – Arnaud Fiorini 4
• A few definitions :
• Kernel : A small piece of code executed on the device.• Heterogeneous system : system mixing multiple types of processors
CPU GPUFPGACPU DSP…
Unified Memory
GPU Development - Introduction
55POLYTECHNIQUE MONTREAL – Arnaud Fiorini 5
• SIMD Architecture :
Data pool
Instruction pool
APU
APU
APU
APU
GPU Development - Introduction
6
© 2019 AMD Corporation
6POLYTECHNIQUE MONTREAL – Arnaud Fiorini 6
Optimization Problems - Introduction
77POLYTECHNIQUE MONTREAL – Arnaud Fiorini 7
• Communication Overhead :• Memory synchronisation• Interprocessor Communication
• Scheduling and load balancing :• Benchmarking• Load characteristics of kernels
• Shared Cache :• Cache misses, thrashing
Optimization Problems - Introduction
88POLYTECHNIQUE MONTREAL – Arnaud Fiorini 8
• Communication Overhead :• Memory synchronisation• Interprocessor Communication
• Scheduling and load balancing :• Benchmarking• Load characteristics of kernels
• Shared Cache :• Cache misses, thrashing
Tracing
Profiling (Performance Counters)
POLYTECHNIQUE MONTREAL – Arnaud Fiorini 9
I. Introduction
1. GPU Development
2. Optimization Problems
II. Tracing and profiling of CPU-GPU systems
1. ROC Platform
2. Tracing GPUs
3. Profiling GPUs
Agenda
ROC Platform - Tracing and profiling of CPU-GPU systems
1010POLYTECHNIQUE MONTREAL – Arnaud Fiorini 10
© 2019 AMD Corporation https://rocm.github.io/
ROC Platform - Tracing and profiling of CPU-GPU systems
1111POLYTECHNIQUE MONTREAL – Arnaud Fiorini 11
User Application
ROC runtime
HSA Kernel Agent
GPU
ROCm functioning summarized :
User Mode Command
Queue
User context
ROC Platform - Tracing and profiling of CPU-GPU systems
1212POLYTECHNIQUE MONTREAL – Arnaud Fiorini 12
User Application
ROC runtime
HSA Kernel Agent
User Mode Command
QueueGPU
• Open source• Existing mechanism to insert
trace points• Standardized interface (HSA)
ROCm functioning summarized :
User context
ROC Platform - Tracing and profiling of CPU-GPU systems
1313POLYTECHNIQUE MONTREAL – Arnaud Fiorini 13
• This work has already been done by AMD and is open source : https://github.com/ROCm-Developer-Tools/rocprofilerhttps://github.com/ROCm-Developer-Tools/roctracer
• AMD has released a few other libraries and tools thanks to their Radeon Open Compute initiative.
Tracing GPUs - Tracing and profiling of CPU-GPU systems
1414POLYTECHNIQUE MONTREAL – Arnaud Fiorini 14
TraceCompass ROCm plugin
HSA function calls separated by thread
Kernel executions
Tracing GPUs - Tracing and profiling of CPU-GPU systems
1515POLYTECHNIQUE MONTREAL – Arnaud Fiorini 15
HIP function calls separated by thread
Kernel executions
TraceCompass ROCm plugin
Tracing GPUs - Tracing and profiling of CPU-GPU systems
1616POLYTECHNIQUE MONTREAL – Arnaud Fiorini 16
TraceCompass ROCm plugin running on Theia front-end
Tracing GPUs - Tracing and profiling of CPU-GPU systems
1717POLYTECHNIQUE MONTREAL – Arnaud Fiorini 17
Analyzing this tracing data further, future work includes :
• Critical path analysis of CPU-GPU execution
• Determining whether the program performance is limited by the GPU or the CPU
• Extracting statistics to use in profiling analysis
Profiling GPUs - Tracing and profiling of CPU-GPU systems
1818POLYTECHNIQUE MONTREAL – Arnaud Fiorini 18
TraceCompass ROCm plugin
Performance counters
Profiling GPUs - Tracing and profiling of CPU-GPU systems
1919POLYTECHNIQUE MONTREAL – Arnaud Fiorini 19
β : peak bandwidthI : arithmetic intensityπ : peak performance
Using tracing and profiling data, future work includes :
• Roofline model
Profiling GPUs - Tracing and profiling of CPU-GPU systems
2020POLYTECHNIQUE MONTREAL – Arnaud Fiorini 20
β : peak bandwidthI : arithmetic intensityπ : peak performance
Using tracing and profiling data, future work includes :
• Roofline model
Profiling GPUs - Tracing and profiling of CPU-GPU systems
2121POLYTECHNIQUE MONTREAL – Arnaud Fiorini 21
Using tracing and profiling data, future work includes :
• Top-down analysis
GPU Architectural Model (Top-down
analysis)
Derived MetricsPerformance Counters
GPU Architectural Model (Top-down
analysis)
Profiling GPUs - Tracing and profiling of CPU-GPU systems
2222POLYTECHNIQUE MONTREAL – Arnaud Fiorini 22
Using tracing and profiling data, future work includes :
• Top-down analysis
Derived MetricsPerformance Counters
2323POLYTECHNIQUE MONTREAL – Arnaud Fiorini 23
Thank you for listening !
Questions ?
References
2424POLYTECHNIQUE MONTREAL – Arnaud Fiorini 24
• https://github.com/RadeonOpenCompute/ROCm• https://rocm-documentation.readthedocs.io/en/latest/• http://www.hsafoundation.com/• HSA Runtime Programmer’s Reference Manual, Version 1.2• HSA Programmer's Reference Manual, Version 1.2• HSA Platform System Architecture Specification, Version 1.2• https://github.com/ucb-bar/opencl-
kernels/blob/master/saxpy/kernel.cl• https://medium.com/@smallfishbigsea/basic-concepts-in-gpu-
computing-3388710e9239• https://www.techpowerup.com/gpu-specs/docs/amd-gcn1-
architecture.pdf• https://software.intel.com/content/www/us/en/develop/docu
mentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html