NVIDIA Visual Profiler - uni-graz.at · NVIDIA Visual Profiler & CUDA-MEMCHECK . Visual Profiler...

NVIDIA Visual Profiler & CUDA-MEMCHECK


Visual Profiler – Overview

• Included in CUDA Toolkit

• Visualize and optimize performance of a CUDA application

• Shows timeline on CPU and GPU

• nvvp (GUI)

• nvprof (Terminal)

• Two session types:

– Executable session

– Imported session (importing data generated by nvprof)

• Generate PDF report
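To make the two session types concrete, here is a minimal sketch of an application that could be profiled. The file names (vecadd.cu, vecadd.nvvp) and function names (vecAdd, runVecAdd) are illustrative assumptions and not part of the slides. Error checking is left out for brevity.

```cuda
// vecadd.cu: a small application to profile (all names are illustrative).
// Build:                 nvcc -lineinfo -o vecadd vecadd.cu
// Profile in terminal:   nvprof ./vecadd
// Export for nvvp:       nvprof -o vecadd.nvvp ./vecadd   (then import in nvvp)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the grid may be larger than n
}

// Computes c = a + b on the GPU and returns the number of wrong elements.
int runVecAdd(int n) {
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    size_t bytes = n * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    // These copies appear as Memcpy (HtoD) entries in the timeline.
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // Blocking copy back to the host (Memcpy DtoH in the timeline).
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    int wrong = 0;
    for (float v : c) {
        if (v != 3.0f) ++wrong;
    }
    return wrong;
}

int main() {
    int wrong = runVecAdd(1 << 20);
    printf("wrong elements: %d\n", wrong);
    cudaDeviceReset();  // lets the profiler flush its data before exit
    return wrong == 0 ? 0 : 1;
}
```

Running the binary directly inside nvvp corresponds to an executable session. Opening the file written by nvprof -o corresponds to an imported session.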


Getting started


Timeline View

• CPU activity

• GPU activity

• Shows start & end of

– Threads

– Kernels

– Memcpy

– …

• Zoom, filter, reorder, …


Analysis View

• Guided or unguided

– For unguided analysis, compile with SET(LOCAL_CUDA_NVCC_FLAGS ${LOCAL_CUDA_NVCC_FLAGS} -lineinfo)

• CUDA Application Analysis

– Application's overall GPU utilization

– Kernel performance (orders kernels according to optimization importance based on execution time and achieved occupancy)

• Performance-Critical Kernels

– Detailed analysis of a selected kernel


• Compute, Bandwidth, or Latency Bound

• Instruction and memory latency

– Examine occupancy: how many warps the kernel has active on the GPU, relative to the maximum number of warps supported by the GPU

– Examine stall reasons: can give insight into why latency is still an issue for the kernel
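The occupancy ratio described above can be estimated from host code. The sketch below uses the CUDA runtime's occupancy query to compute the theoretical occupancy of a kernel for one block size. The kernel scale, the helper occupancyRatio, and the block size of 256 are illustrative assumptions. The achieved occupancy reported by the profiler is measured at run time and may be lower.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel; any __global__ function could be queried instead.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Occupancy = active warps per multiprocessor / maximum warps per multiprocessor.
double occupancyRatio(int activeWarps, int maxWarps) {
    return static_cast<double>(activeWarps) / maxWarps;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device found\n");
        return 1;
    }

    int blockSize = 256;   // assumed launch configuration
    int blocksPerSM = 0;   // resident blocks per multiprocessor for this kernel
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, scale,
                                                  blockSize, 0 /* dynamic smem */);

    int activeWarps = blocksPerSM * blockSize / prop.warpSize;
    int maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;

    printf("theoretical occupancy: %.1f%%\n",
           100.0 * occupancyRatio(activeWarps, maxWarps));
    return 0;
}
```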


• Compute resources

GPU compute resources can limit the performance of a kernel if they are insufficient or poorly utilized


CUDA-MEMCHECK

• Detects memory access errors

• Run-time error detection

• Included in CUDA Toolkit

• Getting started:

– cuda-memcheck [memcheck_options] executable [executable_options]

best case:


Supported error detection

• Memory access error: errors due to out-of-bounds or misaligned access to memory by a global, local, shared, or global atomic access

• Hardware exception: errors reported by the hardware error reporting mechanism

• Malloc/Free errors: errors due to incorrect use of malloc or free

• CUDA API errors: failure of a CUDA API call

• cudaMalloc memory leaks: allocations of device memory that have not been freed

• Device heap memory leaks: allocations of device memory in device code that have not been freed


Example

• __global__: for device global memory

• __shared__: for per-block shared memory

• __local__: for per-thread local memory

• Information about type of access (read / write)

• Size of access in bytes

• Source file and line number

• Thread indices and block indices

• Memory address being accessed and type of access error
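A report with these fields can be provoked with a deliberately buggy program. The following is a minimal sketch, not the example from the slides. The file name oob.cu and the names fill and blocksFor are illustrative assumptions. The kernel omits its bounds check, so the threads beyond the end of the array perform an out-of-bounds write to global memory, which is the kind of access cuda-memcheck is designed to flag.

```cuda
// oob.cu: deliberately buggy, for trying out cuda-memcheck (names are illustrative).
// Build: nvcc -lineinfo -o oob oob.cu     (-lineinfo enables file/line in reports)
// Run:   cuda-memcheck ./oob
#include <cstdio>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements, rounded up.
int blocksFor(int n, int threads) {
    return (n + threads - 1) / threads;
}

__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // BUG (intentional): the guard `if (i < n)` is missing,
    // so every thread with i >= n writes past the end of the allocation.
    data[i] = value;
}

int main() {
    const int n = 1000;
    const int threads = 256;

    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));

    // 4 blocks * 256 threads = 1024 threads for 1000 elements:
    // threads 1000..1023 access memory outside the array.
    fill<<<blocksFor(n, threads), threads>>>(d, n, 7);

    // Without cuda-memcheck this may well report success,
    // because small overruns often go unnoticed by the hardware.
    cudaError_t err = cudaDeviceSynchronize();
    printf("synchronize: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    cudaDeviceReset();
    return 0;
}
```

Restoring the guard `if (i < n)` in the kernel removes the invalid accesses, and the tool should then report no errors for this program.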