Killdevil

Running CUDA programs on cluster

Requesting permission

• https://onyen.unc.edu/cgi-bin/unc_id/services

Compiling CUDA programs

• module load cuda
• Run script: compile.sh
  – nvcc -o MatrixMul -I/usr/local/cuda/include/ -L/usr/local/lib64 -L/usr/local/cuda/lib64 MatrixMul.cu

Running CUDA programs

• ssh killdevil.unc.edu
• module load cuda
• Run script: submitjob.sh
  – bsub -q gpu -a gpuexcl_t -n 1 -o MYGPUJOB.o%J <myprogramscript>

CUDA SDK

• https://developer.nvidia.com/cuda-downloads
  – Download the SDK for your OS
    • Windows: requires Visual Studio to compile the samples
    • Linux: requires gcc

CUDA : Threads

Recap

• A kernel program is executed by a grid of threads

Thread Organization

• Organized in a two-level hierarchy
  – A grid is composed of blocks
    • gridDim: number of blocks in the grid
  – A block is composed of threads
    • blockDim: number of threads in the block

• Each block gets a unique ID
  – blockIdx

• Each thread gets an ID that is unique within its block
  – threadIdx

Thread Organization

• Every block has the same number of threads
  – blockDim.x, blockDim.y, blockDim.z

• threadIdx is always local to the block
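The built-in variables above can be inspected directly from a kernel. The following is a minimal sketch, not code from the original slides: the kernel name `whoAmI`, the helper `runWhoAmI`, and the launch size of 2 blocks with 4 threads each are hypothetical choices for illustration. It assumes a device that supports `printf` in kernels (compute capability 2.0 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reports its place in the hierarchy.
__global__ void whoAmI()
{
    printf("block %u of %u, thread %u of %u\n",
           blockIdx.x, gridDim.x, threadIdx.x, blockDim.x);
}

// Launches a small 1D grid; the sizes are chosen only for illustration.
bool runWhoAmI()
{
    whoAmI<<<2, 4>>>();                  // gridDim.x = 2, blockDim.x = 4
    cudaError_t err = cudaGetLastError(); // catches launch-configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();    // wait so the device output is flushed
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    return runWhoAmI() ? 0 : 1;
}
```

Both blocks report thread numbers 0 through 3, which illustrates that threadIdx restarts in every block. The order in which the lines appear is not guaranteed.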

1D Example

• Grid = 128 blocks
• Block = 32 threads
  – blockDim.x in the kernel returns 32

• Total threads = 128 x 32 = 4096
  – Each thread has a unique ID
    • blockIdx.x * blockDim.x + threadIdx.x
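As an illustration of this 1D layout, here is a sketch of a vector sum launched with 128 blocks of 32 threads, so that each of the 4096 threads handles one element. The names `vecAdd` and `runVecAdd` and the input values are assumptions made for this example; they do not come from the slides.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per element: the global ID selects the element to add.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique across the grid
    if (i < n)                                      // guard for partial blocks
        c[i] = a[i] + b[i];
}

// Hypothetical host driver: returns true if every sum matches.
bool runVecAdd()
{
    const int numBlocks = 128;
    const int threadsPerBlock = 32;
    const int n = numBlocks * threadsPerBlock;      // 4096 elements
    const size_t bytes = n * sizeof(float);

    std::vector<float> hA(n), hB(n), hC(n);
    for (int i = 0; i < n; ++i) {
        hA[i] = 1.0f * i;
        hB[i] = 2.0f * i;
    }

    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess
           && cudaMalloc(&dB, bytes) == cudaSuccess
           && cudaMalloc(&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, &hA[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, &hB[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        vecAdd<<<numBlocks, threadsPerBlock>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&hC[0], dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Each element should be i + 2i = 3i (exact in float for these sizes).
    for (int i = 0; ok && i < n; ++i)
        ok = (hC[i] == 3.0f * i);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main()
{
    bool ok = runVecAdd();
    printf("%s\n", ok ? "vector sum verified" : "vector sum failed");
    return ok ? 0 : 1;
}
```

The `if (i < n)` guard is not strictly needed when n is exactly 128 x 32, but it keeps the kernel safe if the element count is later changed to something that does not fill the last block.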

Multi-Dimension Example

Things to Note

• Blocks are arrays of threads with up to three dimensions
  – 1D, 2D, or 3D depending on your problem
  – Vector sum: 1D; matrix multiplication: 2D

• All blocks in a grid have the same dimensions
  – i.e., all blocks have an equal number of threads in each dimension

• The total size of a block is limited to 512 threads (the limit on compute capability 1.x devices; compute capability 2.0 and later allow 1024)
  – blockDim can be (512, 1, 1), (8, 16, 2), (16, 16, 2)
  – But not (32, 32, 1)
    • Total threads: 32 x 32 x 1 = 1024, which exceeds 512
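Because the per-block limit depends on the device, it can be safer to query it than to hard-code it. The sketch below checks the block shapes listed above against the 512-thread figure and against whatever device 0 reports. The helper names `fitsThreadLimit` and `fitsDevice` are hypothetical; `cudaGetDeviceProperties`, `maxThreadsPerBlock`, and `maxThreadsDim` are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if the block shape has at most maxThreads threads in total.
bool fitsThreadLimit(dim3 block, unsigned int maxThreads)
{
    return block.x * block.y * block.z <= maxThreads;
}

// True if the shape respects both the total and the per-dimension
// limits reported by the device.
bool fitsDevice(dim3 block, const cudaDeviceProp &prop)
{
    return fitsThreadLimit(block, (unsigned int)prop.maxThreadsPerBlock)
        && block.x <= (unsigned int)prop.maxThreadsDim[0]
        && block.y <= (unsigned int)prop.maxThreadsDim[1]
        && block.z <= (unsigned int)prop.maxThreadsDim[2];
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("maxThreadsPerBlock on device 0: %d\n", prop.maxThreadsPerBlock);

    const dim3 shapes[] = { dim3(512, 1, 1), dim3(8, 16, 2),
                            dim3(16, 16, 2), dim3(32, 32, 1) };
    for (int i = 0; i < 4; ++i) {
        const dim3 b = shapes[i];
        printf("(%u, %u, %u): %u threads, %s a 512 limit, %s this device\n",
               b.x, b.y, b.z, b.x * b.y * b.z,
               fitsThreadLimit(b, 512) ? "fits" : "exceeds",
               fitsDevice(b, prop) ? "fits" : "exceeds");
    }
    return 0;
}
```

A launch that exceeds the device limit does not run the kernel; the error can be observed by calling `cudaGetLastError()` right after the launch.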

USING blockIdx AND threadIdx

[Figure: a width x width matrix whose elements are labeled (column, row), from (0, 0) at the top left to (width-1, width-1) at the bottom right]
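One way to map blockIdx and threadIdx onto such a matrix is to let each thread compute its own column and row. The sketch below is an assumption-laden illustration rather than the deck's code: the kernel `labelElements`, the driver `runLabelElements`, the 16 x 16 block shape, and the width of 40 are all hypothetical. Each thread writes the row-major linear index of its element so the host can verify the mapping.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread derives its (col, row) and labels that matrix element.
__global__ void labelElements(int *m, int width)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < width)          // skip threads past the edge
        m[row * width + col] = row * width + col;
}

// Hypothetical host driver: returns true if every element was labeled.
bool runLabelElements()
{
    const int width = 40;                    // deliberately not a multiple of 16
    const dim3 block(16, 16);                // 256 threads per block
    const dim3 grid((width + block.x - 1) / block.x,   // round up: 3 x 3 blocks
                    (width + block.y - 1) / block.y);
    const size_t bytes = (size_t)width * width * sizeof(int);

    std::vector<int> h(width * width, -1);
    int *d = NULL;
    if (cudaMalloc(&d, bytes) != cudaSuccess)
        return false;

    labelElements<<<grid, block>>>(d, width);
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaMemcpy(&h[0], d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    for (int i = 0; ok && i < width * width; ++i)
        ok = (h[i] == i);

    cudaFree(d);
    return ok;
}

int main()
{
    bool ok = runLabelElements();
    printf("%s\n", ok ? "2D mapping verified" : "2D mapping failed");
    return ok ? 0 : 1;
}
```

Rounding the grid size up means some threads fall outside the matrix, which is why the kernel checks `col` and `row` against `width` before writing.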

Matrix-Multiplication with larger size

Simple example

Updated kernel code
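The kernel from the original slide is not preserved in this transcript. As a stand-in, here is a minimal sketch of how a matrix-multiplication kernel might use both blockIdx and threadIdx so that the matrix can be larger than a single block. The names `matMul` and `runMatMul`, the 16 x 16 block shape, the width of 64, and the constant input values are all assumptions for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// P = M * N for square width x width matrices in row-major order.
// Each thread computes one element of P.
__global__ void matMul(const float *M, const float *N, float *P, int width)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; ++k)
            sum += M[row * width + k] * N[k * width + col];
        P[row * width + col] = sum;
    }
}

// Hypothetical host driver: returns true if the product matches.
bool runMatMul()
{
    const int width = 64;
    const int count = width * width;
    const size_t bytes = count * sizeof(float);
    const dim3 block(16, 16);                         // 256 threads per block
    const dim3 grid((width + block.x - 1) / block.x,  // 4 x 4 blocks
                    (width + block.y - 1) / block.y);

    // M is all ones and N is all twos, so every element of P is 2 * width.
    std::vector<float> hM(count, 1.0f), hN(count, 2.0f), hP(count, 0.0f);

    float *dM = NULL, *dN = NULL, *dP = NULL;
    bool ok = cudaMalloc(&dM, bytes) == cudaSuccess
           && cudaMalloc(&dN, bytes) == cudaSuccess
           && cudaMalloc(&dP, bytes) == cudaSuccess
           && cudaMemcpy(dM, &hM[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dN, &hN[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        matMul<<<grid, block>>>(dM, dN, dP, width);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&hP[0], dP, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    for (int i = 0; ok && i < count; ++i)
        ok = (hP[i] == 2.0f * width);

    cudaFree(dM);
    cudaFree(dN);
    cudaFree(dP);
    return ok;
}

int main()
{
    bool ok = runMatMul();
    printf("%s\n", ok ? "matrix product verified" : "matrix product failed");
    return ok ? 0 : 1;
}
```

This version reads every operand from global memory, which keeps the indexing easy to follow. It is meant to show the thread-to-element mapping, not to be a tuned implementation.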

Block scheduling on device

Thread Assignment

Thread Assignment

QUESTIONS?