PROGRAMMING GPGPUS USING CUDA
Fan Zhu, 2012-11-20
Source: …research.nesc.ac.uk/files/CUDA_PROGRAMMING.pdf

Page 1: PROGRAMMING GPGPUS USING CUDA

Page 2

WHY GPGPUS

•  GPGPUs - General Purpose Computing on Graphics Processing Units (GPUs)

From NVIDIA: CUDA C Programming Guide

Page 3

GPUS VS. CPUS

•  NVIDIA claims speedups of 10x to 1000x over CPUs for suitable workloads

•  Intel's own comparison found more modest speedups of about 2.5x on average

Page 4

CUDA

•  CUDA - Compute Unified Device Architecture
•  CUDA C (originally "C for CUDA") is the main programming language
•  CUDA Fortran is also available
•  Version 1.0 released in 2007
•  Version 5.0 released in 2012

•  Shared Memory Architecture

Page 5

CUDA CODE PORTABILITY

•  Hardware independent: the same CUDA code runs on any CUDA-capable NVIDIA GPU
•  Tune the launch configuration (grid and block dimensions) for each device to achieve the best performance

Page 6

CUDA WORKFLOW

1.  A CPU thread copies data from main memory to GPU memory.

2.  A CPU thread instructs GPU threads to start processing.

3.  GPU threads execute in parallel on different GPU cores.

3.∗ The CPU thread and any idle GPU threads wait for the running GPU threads to complete. This step overlaps with step 3.

4.  The CPU thread copies the results from GPU memory to main memory.

5.  The CPU thread acts on the results, and may return to step 1 in order to execute another GPU function.
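The workflow above can be sketched as host code (a minimal sketch: the kernel name `scale`, the array size, and the launch configuration are illustrative, and error checking is omitted):

```cuda
#include <stdio.h>

// Step 3: GPU threads execute in parallel, one element per thread.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1024;
    size_t size = n * sizeof(float);
    float h_data[1024];
    for (int i = 0; i < n; i++) h_data[i] = (float)i;

    float *d_data;
    cudaMalloc(&d_data, size);

    // Step 1: copy input from main memory to GPU memory.
    cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice);

    // Step 2: launch the kernel; the call returns immediately.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);

    // Step 3*: the CPU thread waits for the GPU threads to finish.
    cudaDeviceSynchronize();

    // Step 4: copy the results from GPU memory back to main memory.
    cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);

    // Step 5: the CPU thread acts on the results.
    printf("h_data[2] = %f\n", h_data[2]);

    cudaFree(d_data);
    return 0;
}
```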

Page 7

FUNCTION TYPES

•  __host__
   •  Executed on the host (CPU)
   •  Callable from the host only
   •  The default when no qualifier is given

•  __global__
   •  Defines a kernel: executed on the device (GPU)
   •  Callable from the host only, via the <<<grid, block>>> launch syntax
   •  Must return void

•  __device__
   •  Executed on the device
   •  Callable from the device only
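The three qualifiers might be combined as follows (an illustrative sketch; `square`, `squareAll`, and `launch` are invented names):

```cuda
// __device__: runs on the GPU, callable only from GPU code.
__device__ float square(float x) { return x * x; }

// __global__: a kernel - runs on the GPU, launched from the CPU.
__global__ void squareAll(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

// __host__ (the default): runs on the CPU and launches the kernel.
__host__ void launch(float *d_data, int n) {
    squareAll<<<(n + 255) / 256, 256>>>(d_data, n);
}
```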

Page 8

FUNCTIONS: MEMORY COPY

•  Executed on CPU

•  Allocate and free GPU memory •  cudaMalloc() and cudaFree()

•  Copy CPU memory to GPU memory •  cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

•  Copy GPU memory to CPU memory •  cudaMemcpy(h_B, d_B, size, cudaMemcpyDeviceToHost);
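Using the slide's names `h_A`, `d_A`, `h_B`, and `d_B`, a typical allocate/copy/free sequence looks like this (a sketch; the array size is illustrative and error checking is omitted):

```cuda
#include <stdlib.h>

float *h_A, *h_B, *d_A, *d_B;
size_t size = 1024 * sizeof(float);

h_A = (float *)malloc(size);           // CPU memory
h_B = (float *)malloc(size);
cudaMalloc(&d_A, size);                // GPU memory
cudaMalloc(&d_B, size);

cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);  // CPU -> GPU
// ... launch a kernel that reads d_A and writes d_B ...
cudaMemcpy(h_B, d_B, size, cudaMemcpyDeviceToHost);  // GPU -> CPU

cudaFree(d_A);  cudaFree(d_B);         // free GPU memory
free(h_A);      free(h_B);             // free CPU memory
```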

Page 9

FUNCTIONS

•  __syncthreads()
   •  Called from device code
   •  Barrier: waits until every thread in the block has reached it

•  clock(); clock64();
   •  Called from device code
   •  Read a per-multiprocessor cycle counter, e.g. to time code sections
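A common use of __syncthreads() is a per-block reduction (a sketch; `sumBlock` is an invented name, and it assumes blockDim.x is 256, a power of two):

```cuda
__global__ void sumBlock(const float *in, float *out) {
    __shared__ float partial[256];          // shared within the block
    int t = threadIdx.x;
    partial[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                        // all loads done before any reads

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) partial[t] += partial[t + stride];
        __syncthreads();                    // each round must finish first
    }
    if (t == 0) out[blockIdx.x] = partial[0];
}
```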

Page 10

CUDA EXAMPLES: VECTOR ADD

•  The slide compares a vector-add implementation on the GPU with one on the CPU.

•  Memory can be requested inside the kernel, but shared memory is limited to 16 KB per block on early CUDA devices.
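A vector-add kernel along the lines of the slide's example (a sketch based on the canonical example in the CUDA C Programming Guide; `vecAdd` and `vecAddCPU` are illustrative names):

```cuda
// GPU version: each thread adds one pair of elements.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];   // guard: n may not divide evenly
}

// CPU version of the same loop, for comparison.
void vecAddCPU(const float *A, const float *B, float *C, int n) {
    for (int i = 0; i < n; i++) C[i] = A[i] + B[i];
}
```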

Page 11

GRID AND BLOCK

[Figure: a grid of 2x2 blocks — Block(0,0), Block(0,1), Block(1,0), Block(1,1) — where each block is a 4x4 array of threads indexed (0,0) through (3,3).]

•  Grid
   •  Blocks in a grid share (global) memory

•  Block (<= 1024 threads)
   •  Threads in a block share cache (on-chip shared memory)
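A thread recovers its global position from its block and thread coordinates (a sketch; `whereAmI` is an invented name, and the commented launch matches a 2x2 grid of 4x4 blocks):

```cuda
// 2D indexing: compute a thread's global (row, col).
__global__ void whereAmI(int *rows, int *cols, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    rows[row * width + col] = row;
    cols[row * width + col] = col;
}

// Launch with a 2x2 grid of 4x4 blocks, covering an 8x8 domain:
// dim3 block(4, 4);
// dim3 grid(2, 2);
// whereAmI<<<grid, block>>>(d_rows, d_cols, 8);
```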

Page 12

BLOCKS

[Figure: a single block shown as a 4x4 array of threads, indexed (0,0) through (3,3).]

Page 13

CUDA EXAMPLE: MATRIX ADD

•  The slide compares two launch configurations: 1x1 blocks (one thread per block) versus 16x16 blocks (256 threads per block).
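A matrix-add kernel using 16x16 blocks might look like this (a sketch; `matAdd` and the matrix size N are illustrative, and N is assumed divisible by 16):

```cuda
#define N 1024

// Each thread adds one matrix element.
__global__ void matAdd(const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N)
        C[row * N + col] = A[row * N + col] + B[row * N + col];
}

// 16x16 blocks give 256 threads per block, enough to keep the
// GPU cores busy; 1x1 blocks waste almost all of the hardware.
// dim3 block(16, 16);
// dim3 grid(N / 16, N / 16);
// matAdd<<<grid, block>>>(d_A, d_B, d_C);
```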

Page 14

COMPLETE CODE

Host code and device code live in the same .cu file, compiled with nvcc.

Page 15

THANK YOU.

•  CUDA C Programming Guide
•  http://docs.nvidia.com/cuda/index.html