AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling...

24
AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique Montreal DORSAL Laboratory

Transcript of AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling...

Page 1: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

AMD ROCm GPU profiling in

Trace Compass

Arnaud Fiorini with Pr. Michel Dagenais

May 8th, 2020

Polytechnique Montreal

DORSAL Laboratory

Page 2: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

POLYTECHNIQUE MONTREAL – Arnaud Fiorini 2

I. Introduction

1. GPU Development

2. Optimization Problems

II. Tracing and profiling of CPU-GPU systems

1. ROC Platform

2. Tracing GPUs

3. Profiling GPUs

Agenda

Page 3: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

GPU Development - Introduction

33POLYTECHNIQUE MONTREAL – Arnaud Fiorini 3

• A few definitions :

• Kernel : A small piece of code executed on the device.

Page 4: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

GPU Development - Introduction

44POLYTECHNIQUE MONTREAL – Arnaud Fiorini 4

• A few definitions :

• Kernel : A small piece of code executed on the device.• Heterogeneous system : system mixing multiple types of processors

CPU GPUFPGACPU DSP…

Unified Memory

Page 5: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

GPU Development - Introduction

55POLYTECHNIQUE MONTREAL – Arnaud Fiorini 5

• SIMD Architecture :

Data pool

Instruction pool

APU

APU

APU

APU

Page 6: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

GPU Development - Introduction

6

© 2019 AMD Corporation

6POLYTECHNIQUE MONTREAL – Arnaud Fiorini 6

Page 7: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Optimization Problems - Introduction

77POLYTECHNIQUE MONTREAL – Arnaud Fiorini 7

• Communication Overhead :• Memory synchronisation• Interprocessor Communication

• Scheduling and load balancing :• Benchmarking• Load characteristics of kernels

• Shared Cache :• Cache misses, thrashing

Page 8: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Optimization Problems - Introduction

88POLYTECHNIQUE MONTREAL – Arnaud Fiorini 8

• Communication Overhead :• Memory synchronisation• Interprocessor Communication

• Scheduling and load balancing :• Benchmarking• Load characteristics of kernels

• Shared Cache :• Cache misses, thrashing

Tracing

Profiling (Performance Counters)

Page 9: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

POLYTECHNIQUE MONTREAL – Arnaud Fiorini 9

I. Introduction

1. GPU Development

2. Optimization Problems

II. Tracing and profiling of CPU-GPU systems

1. ROC Platform

2. Tracing GPUs

3. Profiling GPUs

Agenda

Page 10: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

ROC Platform - Tracing and profiling of CPU-GPU systems

1010POLYTECHNIQUE MONTREAL – Arnaud Fiorini 10

© 2019 AMD Corporation https://rocm.github.io/

Page 11: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

ROC Platform - Tracing and profiling of CPU-GPU systems

1111POLYTECHNIQUE MONTREAL – Arnaud Fiorini 11

User Application

ROC runtime

HSA Kernel Agent

GPU

ROCm functioning summarized :

User Mode Command

Queue

User context

Page 12: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

ROC Platform - Tracing and profiling of CPU-GPU systems

1212POLYTECHNIQUE MONTREAL – Arnaud Fiorini 12

User Application

ROC runtime

HSA Kernel Agent

User Mode Command

QueueGPU

• Open source• Existing mechanism to insert

trace points• Standardized interface (HSA)

ROCm functioning summarized :

User context

Page 13: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

ROC Platform - Tracing and profiling of CPU-GPU systems

1313POLYTECHNIQUE MONTREAL – Arnaud Fiorini 13

• This work has already been done by AMD and is open source : https://github.com/ROCm-Developer-Tools/rocprofilerhttps://github.com/ROCm-Developer-Tools/roctracer

• AMD has released a few other libraries and tools thanks to their Radeon Open Compute initiative.

Page 14: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Tracing GPUs - Tracing and profiling of CPU-GPU systems

1414POLYTECHNIQUE MONTREAL – Arnaud Fiorini 14

TraceCompass ROCm plugin

HSA function calls separated by thread

Kernel executions

Page 15: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Tracing GPUs - Tracing and profiling of CPU-GPU systems

1515POLYTECHNIQUE MONTREAL – Arnaud Fiorini 15

HIP function calls separated by thread

Kernel executions

TraceCompass ROCm plugin

Page 16: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Tracing GPUs - Tracing and profiling of CPU-GPU systems

1616POLYTECHNIQUE MONTREAL – Arnaud Fiorini 16

TraceCompass ROCm plugin running on Theia front-end

Page 17: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Tracing GPUs - Tracing and profiling of CPU-GPU systems

1717POLYTECHNIQUE MONTREAL – Arnaud Fiorini 17

Analyzing this tracing data further, future work includes :

• Critical path analysis of CPU-GPU execution

• Determining whether the program performance is limited by the GPU or the CPU

• Extracting statistics to use in profiling analysis

Page 18: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Profiling GPUs - Tracing and profiling of CPU-GPU systems

1818POLYTECHNIQUE MONTREAL – Arnaud Fiorini 18

TraceCompass ROCm plugin

Performance counters

Page 19: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Profiling GPUs - Tracing and profiling of CPU-GPU systems

1919POLYTECHNIQUE MONTREAL – Arnaud Fiorini 19

β : peak bandwidthI : arithmetic intensityπ : peak performance

Using tracing and profiling data, future work includes :

• Roofline model

Page 20: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Profiling GPUs - Tracing and profiling of CPU-GPU systems

2020POLYTECHNIQUE MONTREAL – Arnaud Fiorini 20

β : peak bandwidthI : arithmetic intensityπ : peak performance

Using tracing and profiling data, future work includes :

• Roofline model

Page 21: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

Profiling GPUs - Tracing and profiling of CPU-GPU systems

2121POLYTECHNIQUE MONTREAL – Arnaud Fiorini 21

Using tracing and profiling data, future work includes :

• Top-down analysis

GPU Architectural Model (Top-down

analysis)

Derived MetricsPerformance Counters

Page 22: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

GPU Architectural Model (Top-down

analysis)

Profiling GPUs - Tracing and profiling of CPU-GPU systems

2222POLYTECHNIQUE MONTREAL – Arnaud Fiorini 22

Using tracing and profiling data, future work includes :

• Top-down analysis

Derived MetricsPerformance Counters

Page 23: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

2323POLYTECHNIQUE MONTREAL – Arnaud Fiorini 23

Thank you for listening !

Questions ?

Page 24: AMD ROCm GPU profiling in Trace Compass › system › files › Progress...AMD ROCm GPU profiling in Trace Compass Arnaud Fiorini with Pr. Michel Dagenais May 8th, 2020 Polytechnique

References

2424POLYTECHNIQUE MONTREAL – Arnaud Fiorini 24

• https://github.com/RadeonOpenCompute/ROCm• https://rocm-documentation.readthedocs.io/en/latest/• http://www.hsafoundation.com/• HSA Runtime Programmer’s Reference Manual, Version 1.2• HSA Programmer's Reference Manual, Version 1.2• HSA Platform System Architecture Specification, Version 1.2• https://github.com/ucb-bar/opencl-

kernels/blob/master/saxpy/kernel.cl• https://medium.com/@smallfishbigsea/basic-concepts-in-gpu-

computing-3388710e9239• https://www.techpowerup.com/gpu-specs/docs/amd-gcn1-

architecture.pdf• https://software.intel.com/content/www/us/en/develop/docu

mentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html