Post on 25-Jun-2020

AMD ROCm GPU profiling in Trace Compass

Arnaud Fiorini with Pr. Michel Dagenais

May 8th, 2020

Polytechnique Montreal

DORSAL Laboratory

POLYTECHNIQUE MONTREAL – Arnaud Fiorini 2

Agenda

I. Introduction
   1. GPU Development
   2. Optimization Problems

II. Tracing and profiling of CPU-GPU systems
   1. ROC Platform
   2. Tracing GPUs
   3. Profiling GPUs

GPU Development - Introduction

• A few definitions:

• Kernel: A small piece of code executed on the device.
• Heterogeneous system: A system mixing multiple types of processors.

[Figure: processors (CPU, GPU, FPGA, CPU, DSP, …) and Unified Memory]
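To make the kernel definition concrete, here is a minimal sketch in plain Python that mimics how a device runs a kernel: the same small function is invoked once per work-item, each identified by its global index. The names `saxpy_kernel` and `launch` are illustrative assumptions and not part of ROCm or HIP; a real kernel would be written in HIP or OpenCL and its work-items would run in parallel on the GPU.

```python
def saxpy_kernel(gid, a, x, y, out):
    """Body executed by one work-item: computes a single output element."""
    out[gid] = a * x[gid] + y[gid]


def launch(kernel, global_size, *args):
    """Hypothetical stand-in for a device launch (not a ROCm API).

    A GPU would run the work-items in parallel; here they run one by one.
    """
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# One work-item per element of the input vectors.
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

The host code only prepares the data and requests the launch; all the arithmetic lives in the kernel body, which is why kernels tend to be small.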

GPU Development - Introduction

• SIMD Architecture:

[Figure: SIMD architecture, with a data pool and an instruction pool feeding four units labelled APU]
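The SIMD idea (single instruction, multiple data) can be sketched as follows: each instruction taken from the instruction pool is applied to every lane at once, and each lane holds its own operands from the data pool. This is only a sequential emulation in plain Python; the function name `simd_execute` and the four-lane layout are assumptions chosen to match the four units drawn on the slide.

```python
import operator


def simd_execute(instruction, lanes_a, lanes_b):
    """Apply one instruction to every lane; each lane has its own data."""
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]


# Instruction pool: a single stream of operations shared by all lanes.
instruction_pool = [operator.add, operator.mul]

# Data pool: one pair of operands per lane (four lanes).
lanes_a = [1, 2, 3, 4]
lanes_b = [10, 20, 30, 40]

# Every instruction is issued once and executed by all lanes.
results = [simd_execute(op, lanes_a, lanes_b) for op in instruction_pool]
print(results)
```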

GPU Development - Introduction

[Figure, © 2019 AMD Corporation]

Optimization Problems - Introduction

• Communication Overhead:
  • Memory synchronisation
  • Interprocessor Communication

• Scheduling and load balancing:
  • Benchmarking
  • Load characteristics of kernels

• Shared Cache:
  • Cache misses, thrashing

Tracing

Profiling (Performance Counters)

ROC Platform - Tracing and profiling of CPU-GPU systems

[Figure, © 2019 AMD Corporation, https://rocm.github.io/]

ROC Platform - Tracing and profiling of CPU-GPU systems

ROCm functioning summarized:

[Figure: User Application, ROC runtime, HSA Kernel Agent, User Mode Command Queue, GPU; label: User context]

• Open source
• Existing mechanism to insert trace points
• Standardized interface (HSA)

ROC Platform - Tracing and profiling of CPU-GPU systems

• This work has already been done by AMD and is open source:
  https://github.com/ROCm-Developer-Tools/rocprofiler
  https://github.com/ROCm-Developer-Tools/roctracer

• AMD has released a few other libraries and tools through its Radeon Open Compute initiative.

Tracing GPUs - Tracing and profiling of CPU-GPU systems

[Screenshot: Trace Compass ROCm plugin, showing HSA function calls separated by thread and kernel executions]

Tracing GPUs - Tracing and profiling of CPU-GPU systems

[Screenshot: Trace Compass ROCm plugin, showing HIP function calls separated by thread and kernel executions]

Tracing GPUs - Tracing and profiling of CPU-GPU systems

[Screenshot: Trace Compass ROCm plugin running on Theia front-end]

Tracing GPUs - Tracing and profiling of CPU-GPU systems

Future work to analyze this tracing data further includes:

• Critical path analysis of CPU-GPU execution

• Determining whether the program performance is limited by the GPU or the CPU

• Extracting statistics to use in profiling analysis
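As a rough illustration of the second item, one simple heuristic is to measure how much of the traced time the GPU spends executing kernels. The sketch below merges overlapping kernel intervals and reports the busy fraction. The interval format, the function names, and the 0.9 threshold are all assumptions made for this example; they do not come from the Trace Compass plugin, and a mostly idle GPU may also be waiting on memory transfers rather than on the CPU.

```python
def merged_busy_time(intervals):
    """Total time covered by (start, end) intervals, counting overlaps once."""
    busy = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged interval.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return busy


def gpu_utilization(kernel_intervals, trace_start, trace_end):
    """Fraction of the traced window during which at least one kernel runs."""
    return merged_busy_time(kernel_intervals) / (trace_end - trace_start)


def classify(utilization, threshold=0.9):
    """Hypothetical rule of thumb; the threshold is arbitrary."""
    return "GPU-bound" if utilization >= threshold else "GPU often idle"


# Kernel execution intervals in arbitrary time units (made-up values).
kernels = [(10, 30), (20, 50), (70, 80)]
utilization = gpu_utilization(kernels, 0, 100)
print(utilization, classify(utilization))
```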

Profiling GPUs - Tracing and profiling of CPU-GPU systems

[Screenshot: Trace Compass ROCm plugin, showing performance counters]

Profiling GPUs - Tracing and profiling of CPU-GPU systems

Using tracing and profiling data, future work includes:

• Roofline model

β: peak bandwidth
I: arithmetic intensity
π: peak performance
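In the roofline model, the attainable performance of a kernel is bounded by P = min(π, β × I): below the ridge point I = π / β the kernel is limited by memory bandwidth, above it by compute. The sketch below evaluates this bound in Python. The hardware numbers are made-up placeholders, not measurements of any AMD GPU.

```python
def attainable_performance(peak_performance, peak_bandwidth, arithmetic_intensity):
    """Roofline bound: min(pi, beta * I)."""
    return min(peak_performance, peak_bandwidth * arithmetic_intensity)


def ridge_point(peak_performance, peak_bandwidth):
    """Arithmetic intensity at which the two roofs meet (pi / beta)."""
    return peak_performance / peak_bandwidth


# Hypothetical device: values chosen only to make the arithmetic easy.
PEAK_PERFORMANCE = 10_000.0  # pi, in GFLOP/s
PEAK_BANDWIDTH = 500.0       # beta, in GB/s

ridge = ridge_point(PEAK_PERFORMANCE, PEAK_BANDWIDTH)
for intensity in (0.25, 20.0, 64.0):  # I, in FLOP/byte
    bound = attainable_performance(PEAK_PERFORMANCE, PEAK_BANDWIDTH, intensity)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(intensity, bound, regime)
```

Placing each measured kernel on this plot requires its arithmetic intensity, which is where the profiling counters come in, and its achieved performance, which can be derived from the kernel durations in the trace.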

Profiling GPUs - Tracing and profiling of CPU-GPU systems

Using tracing and profiling data, future work includes:

• Top-down analysis

[Figure: Performance Counters, Derived Metrics, GPU Architectural Model (Top-down analysis)]
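The step from performance counters to derived metrics can be sketched as a small function that combines raw counter values into ratios, which a top-down analysis would then attribute to parts of the GPU architectural model. The counter names below (`busy_cycles`, `total_cycles`, `cache_hits`, `cache_misses`) are hypothetical placeholders, not actual rocprofiler counter names, and the sample values are invented.

```python
def derive_metrics(counters):
    """Compute derived metrics from raw counter values (names are hypothetical)."""
    total_accesses = counters["cache_hits"] + counters["cache_misses"]
    return {
        # Share of cycles during which the GPU was doing work.
        "gpu_busy_percent": 100.0 * counters["busy_cycles"] / counters["total_cycles"],
        # Share of cache accesses served without going to memory.
        "cache_hit_percent": 100.0 * counters["cache_hits"] / total_accesses,
    }


# Made-up counter values for one kernel execution.
sample = {
    "busy_cycles": 750,
    "total_cycles": 1000,
    "cache_hits": 900,
    "cache_misses": 100,
}

metrics = derive_metrics(sample)
print(metrics)
```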

Thank you for listening!

Questions?
