Post on 26-May-2015
Rachel Miller, Research Computing Lab
CUDA is a parallel computing platform and programming model that uses the Graphics Processing Unit (GPU)
Allows calculations to be performed in parallel, giving significant speedups
Used as an extension to C programs
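A minimal sketch of what this C extension looks like (the kernel and array names here are illustrative, and error checking is omitted for brevity):

```cuda
#include <cstdio>

// Kernel: runs on the GPU; each thread doubles one array element
__global__ void doubleElements(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

int main(void) {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    doubleElements<<<1, 256>>>(dev, n);  // launch 1 block of 256 threads

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[10] = %f\n", host[10]);  // expect 20.0
    return 0;
}
```

The `<<<blocks, threads>>>` syntax is the CUDA launch configuration; everything else is ordinary C.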
GPUs are designed to perform high-speed parallel calculations for displaying graphics, such as games
Use available resources! Over 100 million GPUs are already deployed
30-100x Speed-up over other microprocessors for some applications
GPUs have lots of small Arithmetic Logic Units (ALUs), compared to a few larger ones on the CPU
This allows for many parallel computations, like calculating a color for each pixel on the screen
Image from NVIDIA CUDA Programming Guide
GPUs run one kernel (a group of work) at a time
Each kernel has blocks, which are independent groups of ALUs
Each block is composed of threads, which perform the actual computation
The threads in each block typically work together to compute a value
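This grid/block hierarchy is specified at launch time. A sketch, with dimensions chosen purely for illustration:

```cuda
// Hypothetical kernel; the launch configuration is what matters here
__global__ void myKernel(float *data);

void launch(float *devData) {
    dim3 grid(3, 2);   // a 3x2 grid of blocks (6 blocks)
    dim3 block(5, 3);  // 5x3 threads per block (15 threads each)
    myKernel<<<grid, block>>>(devData);  // 90 threads in total
}
```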
[Figure: execution hierarchy. The host launches Kernel 1 and then Kernel 2 on the device; each kernel runs as a grid of blocks (Block (0,0) through Block (2,1) in Grid 1), and each block holds a 2-D arrangement of threads (Thread (0,0) through Thread (4,2) inside Block (1,1) of Grid 2). Image from NVIDIA.]
Threads within the same block can share memory
In CUDA, sending information from the CPU to the GPU is often the most expensive part of the calculation
For each thread, registers are fastest, followed by shared memory; local, global, constant, and texture memory all reside in device memory and are slower (constant and texture accesses are cached)
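Because host-to-device transfers dominate, a common pattern is to copy data to the GPU once, run many kernels on it, and copy the result back once. A sketch (the kernel name and variables are illustrative):

```cuda
// Copy once, compute many times, copy back once.
float *dev;
size_t bytes = n * sizeof(float);
cudaMalloc(&dev, bytes);
cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // expensive transfer

for (int step = 0; step < nSteps; ++step) {
    updateState<<<grid, block>>>(dev, n);  // hypothetical kernel; data stays on the GPU
}

cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);  // expensive transfer
cudaFree(dev);
```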
[Figure: memory hierarchy. The device grid has global, constant, and texture memory visible to all blocks; each block (Block (0,0), Block (1,0)) has its own shared memory; each thread (Thread (0,0), Thread (1,0)) has its own registers and local memory; the host reads and writes global, constant, and texture memory. Image from NVIDIA.]
Each thread “knows” the x and y coordinates of the block it is in, and the coordinates of where it is in the block
These positions can be used to compute a unique thread ID for each thread
The computational work done will depend on the value of this thread ID
Example: the thread ID corresponds to a group of matrix elements
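A sketch of this ID computation for a 2-D grid of 2-D blocks, mapping each thread to one element of a row-major matrix (names are illustrative):

```cuda
// Each thread derives its global (row, col) from its block and thread coordinates
__global__ void scaleMatrix(float *m, int width, int height, float s) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width) {
        int id = row * width + col;  // unique thread ID = matrix element index
        m[id] *= s;
    }
}
```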
All threads in a block run in parallel only while they follow the same code path; it's important to minimize logical branches (branch divergence) to keep all threads running at the same time
Threads can access global memory, but only slowly; data that a block reuses should be staged into fast shared memory first
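A sketch of staging data into shared memory, using a block-wise sum as the example (names and the block size are illustrative):

```cuda
#define BLOCK_SIZE 256

// Each block stages its slice of the input into shared memory,
// then cooperatively reduces it to one partial sum.
__global__ void blockSum(const float *in, float *blockSums, int n) {
    __shared__ float tile[BLOCK_SIZE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // wait until the whole tile is loaded

    // Tree reduction within the block, entirely in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) blockSums[blockIdx.x] = tile[0];
}
```

Note the `__syncthreads()` barriers: threads in a block must synchronize before reading values their neighbors wrote.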
CUDA applications should perform parallel operations on lots of data and be computationally intensive
Examples: Molecular Dynamics Simulation; Video/Audio Encoding and Manipulation; 3D Imaging and Visualization; Matrix Operations
These collisions of thousands of tiny balls run in real time on a desktop computer! (And look better there, too.)
Watch a better version at http://www.youtube.com/watch?v=RqduA7myZok
Over 170 premade CUDA tools exist and make useful building blocks for applications; areas include Imaging, Video & Audio Processing, Molecular Dynamics, and Signal Processing
CUDA can also help an existing application meet its need for speed: process huge datasets faster, achieving close to real-time data processing
Nvidia (the makers of CUDA) created a MATLAB plug-in for accelerating standard MATLAB 2D FFTs
CUDA has a graphics toolbox for MATLAB
More MATLAB plug-ins to come!