Popular CUDA Packages
Krishnan Suresh (“Suresh”)
suresh@engr.wisc.edu
Associate Professor
Mechanical Engineering
Take-Home Message
• Don’t reinvent the wheel!
• Minimize custom Kernels
Conjugate Gradient
• Solve Ax = b via CG (Matlab)
GPU algorithms:
• Dot-product: Use CUBLAS
• Ax: Use CUSPARSE
• ax+b: Use CUBLAS
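The three building blocks listed above are all that an unpreconditioned CG loop needs. The following is a minimal CPU sketch in C++, not the code from the slides: the helper names (`dot`, `axpy`, `matvec`, `conjugate_gradient`) are hypothetical, and a dense row-major matrix stands in for the sparse product that the slide assigns to CUSPARSE.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product x^T y (the step the slide assigns to CUBLAS).
double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// y <- alpha*x + y (the vector update the slide assigns to CUBLAS).
void axpy(double alpha, const Vec& x, Vec& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Dense row-major A*x; an assumed stand-in for the sparse product.
Vec matvec(const Vec& A, const Vec& x) {
    const std::size_t n = x.size();
    assert(A.size() == n * n);
    Vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) y[i] += A[i * n + j] * x[j];
    return y;
}

// Unpreconditioned CG for symmetric positive definite A, starting from x = 0.
Vec conjugate_gradient(const Vec& A, const Vec& b,
                       double tol = 1e-12, int max_iter = 1000) {
    Vec x(b.size(), 0.0);
    Vec r = b;  // residual r = b - A*x with x = 0
    Vec p = r;  // search direction
    double rr = dot(r, r);
    for (int it = 0; it < max_iter && std::sqrt(rr) > tol; ++it) {
        const Vec Ap = matvec(A, p);
        const double alpha = rr / dot(p, Ap);
        axpy(alpha, p, x);    // x <- x + alpha*p
        axpy(-alpha, Ap, r);  // r <- r - alpha*A*p
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```

Each iteration performs one matrix-vector product, two dot products, and three vector updates, which is why library routines for exactly these operations cover the whole solver.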
CUDA Libraries & Packages
1. CUBLAS: Dense Linear Algebra
2. Thrust: Parallel sort, …
3. CuSparse: Sparse Linear Algebra Package
4. Jacket: Matlab Wrapper
5. CULA: Dense and sparse linear algebra
6. MAGMA: Multicore linear algebra
7. CUFFT: Fast Fourier Transform
8. …
CUBLAS
• CUDA implementation of BLAS (Basic Linear Algebra Subprograms)
– Vector, vector (Level-1)
– Matrix, vector (Level-2)
– Matrix, matrix (Level-3)
• Precisions
– Single: real & complex
– Double: real & complex (not all functions)
• No need to write kernel calls, manage shared memory, etc.
CUBLAS Library Usage
• No additional downloads needed
– cublas.lib (in CUDA SDK)
– Add cublas.lib to linker
– #include <cublas.h>
CUBLAS Code Structure
1. Initialize CUBLAS: cublasInit()
2. Create CPU memory and data
3. Create GPU memory: cublasAlloc(…)
4. Copy from CPU to GPU: cublasSetVector(…)
5. Operate on GPU: cublasSgemm(…)
6. Check for CUBLAS error: cublasGetError()
7. Copy from GPU to CPU: cublasGetVector(…)
8. Verify results
9. Free GPU memory: cublasFree(…)
10. Shut down CUBLAS: cublasShutdown()
CUBLAS BLAS-1 Functions: Vector-vector operations
CU(BLAS) Naming Convention
cublasIsamax: I = index of, s = single precision, a = absolute, max = max
Find the index of the absolute max of a vector of single precision reals
Variants: cublasIdamax (double), cublasIcamax (single complex), cublasIzamax (double complex)
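As a CPU reference for what the "Isamax" operation returns, here is a small C++ sketch. The function name `isamax_ref` is hypothetical; the 1-based result follows the BLAS (Fortran) convention, which is easy to overlook when calling from C.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the entry with the largest absolute value.
// BLAS convention: the result is 1-based, the first maximum wins on ties,
// and 0 is returned for an empty vector.
int isamax_ref(const std::vector<float>& x) {
    if (x.empty()) return 0;
    std::size_t best = 0;
    for (std::size_t i = 1; i < x.size(); ++i)
        if (std::fabs(x[i]) > std::fabs(x[best])) best = i;
    return static_cast<int>(best) + 1;
}
```

For the vector (1, -7, 3) the result is 2, so the C-style index of the entry is the returned value minus one.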
CU(BLAS) Naming Convention
cublasSaxpy: S = single precision, axpy = alpha*x+y
Compute alpha*x+y where x & y are single precision reals & alpha is a scalar
Variant: cublasDaxpy (double precision)
CUBLAS Example-1 (CPU)
$a = x^T y$
CUBLAS Example-1 (GPU)
$a = x^T y$
• No kernel calls
• No memory mgmt.
• Vector increment (stride) of 1
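The slide's code listing did not survive extraction, so the following C++ sketch only illustrates the computation on the CPU, including the "increment" arguments that BLAS-style dot products take. The name `sdot_ref` and its signature are assumptions modeled on the BLAS argument order, not the slide's code.

```cpp
#include <cassert>

// a = x^T y over n elements, reading x with stride incx and y with stride incy.
// This sketch supports positive strides only.
float sdot_ref(int n, const float* x, int incx, const float* y, int incy) {
    assert(incx > 0 && incy > 0);
    float a = 0.0f;
    for (int i = 0; i < n; ++i) a += x[i * incx] * y[i * incy];
    return a;
}
```

An increment of 1 walks a contiguous array; a larger increment picks every incx-th element, which is useful for a row of a column-major matrix.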
CUBLAS Example-2 (CPU)
$z = \alpha x + y$
CUBLAS Example-2 (GPU)
$z = \alpha x + y$
• Output stored in d_y
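The note that the output is stored in d_y reflects the in-place nature of axpy: the result overwrites the second vector. A CPU sketch of that behavior in C++, with the hypothetical name `saxpy_ref` and a BLAS-style argument order assumed for illustration:

```cpp
#include <cassert>

// y <- alpha*x + y, in place: the result overwrites y.
// This sketch supports positive strides only.
void saxpy_ref(int n, float alpha, const float* x, int incx, float* y, int incy) {
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i) y[i * incy] += alpha * x[i * incx];
}
```

If the original y is still needed afterwards, copy it before the call.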
CUBLAS BLAS-2 Functions: Matrix-Vector Operations
$z = \alpha A x + \beta y$, with $A$ symmetric banded
$x = \alpha A^{-1} y$, with $A$ upper or lower triangular
CUBLAS: Caveats
• Solves Ax = b only for upper/lower triangular A
• Limited class of sparse matrices
• Column-major storage & 1-based indexing (Fortran style)
• C uses row-major storage & 0-based indexing; use macros to convert
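The index conversion can be captured in two small helpers. This C++ sketch uses `constexpr` functions in place of macros; the names `idx2c`, `idx2f`, and `to_column_major` are illustrative choices, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of entry (i, j) in a column-major array with leading dimension ld,
// using 0-based (C style) indices.
constexpr std::size_t idx2c(std::size_t i, std::size_t j, std::size_t ld) {
    return j * ld + i;
}

// Same offset for 1-based (Fortran style) indices.
constexpr std::size_t idx2f(std::size_t i, std::size_t j, std::size_t ld) {
    return (j - 1) * ld + (i - 1);
}

// Repack a row-major rows x cols matrix into column-major order.
std::vector<float> to_column_major(const std::vector<float>& a,
                                   std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[idx2c(i, j, rows)] = a[i * cols + j];
    return out;
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the column-major layout is 1, 4, 2, 5, 3, 6.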
CU(BLAS) Naming Convention
cublasSsbmv: S = single precision, sb = symmetric banded, mv = matrix-vector
$z = \alpha A x + \beta y$
(Figure: sparsity pattern of a banded matrix, with nonzeros clustered around the diagonal)
Example
$z = \alpha A x + \beta y$
$$A_{(N,N)} = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{bmatrix}$$
It is sufficient to store
$$\begin{bmatrix} 2 & -1 & & & \\ & 2 & -1 & & \\ & & 2 & \ddots & \\ & & & \ddots & -1 \\ & & & & 2 \end{bmatrix}_{(N,N)}$$
Stored as
$$\mathrm{h\_A}_{(2,N)} = \begin{bmatrix} X & -1 & -1 & \cdots & -1 \\ 2 & 2 & 2 & \cdots & 2 \end{bmatrix}$$
Symmetric-Banded
#Super-Diagonals = 1
CUBLAS Example-3 (CPU)
$z = \alpha A x + \beta y$
$$\mathrm{h\_A}_{(2,N)} = \begin{bmatrix} X & -1 & -1 & \cdots & -1 \\ 2 & 2 & 2 & \cdots & 2 \end{bmatrix}$$
Macro for 0-indexing in C
Column-major layout of h_A in memory: $(X,\ 2,\ -1,\ 2,\ -1,\ 2,\ \ldots)$
CUBLAS Example-3 (CPU)
$$\mathrm{h\_A}_{(2,N)} = \begin{bmatrix} X & -1 & -1 & \cdots & -1 \\ 2 & 2 & 2 & \cdots & 2 \end{bmatrix}$$
$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \vdots \\ z_N \end{bmatrix} = \alpha \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix} + \beta \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{bmatrix}$$
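To make the band storage concrete, the following C++ sketch computes the symmetric banded product on the CPU directly from the upper-band array. The function name `sbmv_upper_ref` is hypothetical; the layout it assumes is the standard BLAS upper-band convention, in which entry A(i, j) with j-k <= i <= j is stored at offset (k + i - j) + j*lda in a column-major array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y <- alpha*A*x + beta*y for a symmetric banded n x n matrix A with k
// super-diagonals. Only the upper band is stored (column-major, leading
// dimension lda >= k+1); unused corner slots such as X are never read.
void sbmv_upper_ref(int n, int k, float alpha, const std::vector<float>& band,
                    int lda, const std::vector<float>& x, float beta,
                    std::vector<float>& y) {
    assert(lda >= k + 1);
    assert(band.size() >= static_cast<std::size_t>(lda) * n);
    assert(x.size() == static_cast<std::size_t>(n) && y.size() == x.size());
    std::vector<float> ax(n, 0.0f);
    for (int j = 0; j < n; ++j) {
        const int i0 = (j - k > 0) ? j - k : 0;
        for (int i = i0; i <= j; ++i) {
            const float a = band[(k + i - j) + j * lda];
            ax[i] += a * x[j];
            if (i != j) ax[j] += a * x[i];  // mirrored lower-triangle entry
        }
    }
    for (int i = 0; i < n; ++i) y[i] = alpha * ax[i] + beta * y[i];
}
```

With N = 3, k = 1, and lda = 2, the array (X, 2, -1, 2, -1, 2) represents the tridiagonal matrix above, and multiplying by x = (1, 1, 1) gives (1, 0, 1).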
CUBLAS Example-3 (GPU)
$z = \alpha A x + \beta y$
$$\mathrm{h\_A}_{(2,N)} = \begin{bmatrix} X & -1 & -1 & \cdots & -1 \\ 2 & 2 & 2 & \cdots & 2 \end{bmatrix}$$
Call annotations: upper diagonal (storage), #rows
CUBLAS Optimal Usage
1. Copy from CPU to GPU: cublasSet…(…)
2. Operate on GPU
• Operation 1
• Operation 2
• …
• Operation n
3. Copy from GPU to CPU: cublasGet…(…)
CUBLAS BLAS-3 Functions: Matrix-Matrix Operations
$C = \alpha A B + \beta C$
$X = \alpha A^{-1} B$, with $A$ upper or lower triangular
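A CPU reference for the first operation helps when verifying GPU results. This C++ sketch assumes column-major storage with no transposes and tightly packed matrices (leading dimension equal to the row count); the name `sgemm_ref` and its simplified signature are illustrative, not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C <- alpha*A*B + beta*C, all matrices column-major and tightly packed:
// A is m x k, B is k x n, C is m x n.
void sgemm_ref(int m, int n, int k, float alpha, const std::vector<float>& A,
               const std::vector<float>& B, float beta, std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) sum += A[p * m + i] * B[j * k + p];
            C[j * m + i] = alpha * sum + beta * C[j * m + i];
        }
    }
}
```

Because the layout is column-major, the matrix with rows (1, 2) and (3, 4) is passed as the array 1, 3, 2, 4.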
CUBLAS Performance
Thrust
• C++ Template Library using CUDA
• Vector containers:
– host_vector & device_vector
– Generalizes std::vector
– Store any type & dynamically resize
• Numerous algorithms
– Sort
– Sum
– Max
Thrust: Getting started
• Download to (CUDA include directory)
– http://code.google.com/p/thrust/
– Requires CUDA 2.3
• Tutorial:
– http://code.google.com/p/thrust/wiki/Tutorial
Thrust: Concept
Thrust Algorithms: Prefix Sum
• Given a sequence: $\{x_1, x_2, x_3, \ldots, x_N\}$
• And an operation $\oplus$
• Output: $\{x_1,\ x_1 \oplus x_2,\ x_1 \oplus x_2 \oplus x_3,\ \ldots,\ x_1 \oplus x_2 \oplus x_3 \oplus \cdots \oplus x_N\}$
Prefix Sum
• Key to numerous algorithms
• Also referred to as the “Scan” algorithm
• Different operations produce different results
Prefix Sum: Example
• Given a sequence: $\{1, 2, 9, 6, \ldots\}$
• And an operation $+$
• Output: $\{x_1,\ x_1 + x_2,\ x_1 + x_2 + x_3,\ \ldots,\ x_1 + x_2 + x_3 + \cdots + x_N\} = \{1, 3, 12, 18, \ldots\}$
Prefix Sum: Example
• Given a sequence: $\{1, 2, 9, 6, \ldots\}$
• And an operation $*$
• Output: $\{x_1,\ x_1 * x_2,\ x_1 * x_2 * x_3,\ \ldots,\ x_1 * x_2 * x_3 * \cdots * x_N\} = \{1, 2, 18, 108, \ldots\}$
Prefix Sum: Example
• Given a sequence: $\{1, 2, 9, 6, \ldots\}$
• And an operation $\max$
• Output: $\{x_1,\ \max(x_1, x_2),\ \max(\max(x_1, x_2), x_3),\ \ldots\} = \{1, 2, 9, 9, \ldots\}$
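The three examples differ only in the binary operation, which a generic scan makes explicit. The sketch below is a serial CPU version in standard C++ built on `std::partial_sum`; the wrapper name `inclusive_scan_ref` is hypothetical. Thrust offers a parallel counterpart, `thrust::inclusive_scan`, which likewise accepts a binary operation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = x[0] op x[1] op ... op x[i].
template <typename T, typename Op>
std::vector<T> inclusive_scan_ref(const std::vector<T>& x, Op op) {
    std::vector<T> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin(), op);
    return out;
}
```

Calling it on {1, 2, 9, 6} with `std::plus<int>()`, `std::multiplies<int>()`, and a lambda returning `std::max(a, b)` yields {1, 3, 12, 18}, {1, 2, 18, 108}, and {1, 2, 9, 9} respectively.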
Thrust: Examples Set-up
Thrust: Examples
Thrust: Examples cont.
$a = \|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_N^2}$
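The norm is a transform (square each entry) followed by a reduction (sum), then a square root. Since the slide's listing is not available, this is only a serial CPU sketch in standard C++ with the hypothetical name `norm2_ref`; in Thrust the same pattern is usually expressed with `thrust::transform_reduce`.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Euclidean norm: square each entry, sum the squares, take the square root.
double norm2_ref(const std::vector<double>& x) {
    const double sum_sq = std::accumulate(
        x.begin(), x.end(), 0.0,
        [](double acc, double v) { return acc + v * v; });
    return std::sqrt(sum_sq);
}
```

Fusing the squaring into the reduction avoids storing an intermediate vector of squares, which is the main reason to prefer a combined transform-reduce on the GPU as well.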
CuSparse
Linear Algebra for sparse matrices using CUDA
CULA Sparse
CUFFT
CUDA Implementation of
Fast Fourier Transform
Fourier Transform
• Extract frequencies from signal
• Given a function $f(t),\ -\infty < t < \infty$
• 1-D Fourier transform: $\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i t \xi}\, dt$
• 2-D, 3-D
Fourier Transform
(Figure: continuous signal and its Fourier transform; source: Wikipedia)
$f(t) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i t \xi}\, d\xi$
Discrete Fourier Transform
• Given a sequence $x_0, x_1, \ldots, x_{N-1}$
• Discrete Fourier transform (DFT): another sequence
$$\hat{x}_k = \sum_{n=0}^{N-1} x_n\, e^{-\frac{2\pi i}{N} k n}$$
DFT Examples
(Figure annotation: highest frequency that can be captured correctly)
Fast Fourier Transform
• DFT: Naïve $O(N^2)$ operation
• FFT: Fast DFT, $O(N \log N)$
• Key to signal processing, PDE, …
$x_0, x_1, \ldots, x_{N-1} \ \rightarrow\ \hat{x}_0, \hat{x}_1, \ldots, \hat{x}_{N-1}$
$$\hat{x}_k = \sum_{n=0}^{N-1} x_n\, e^{-\frac{2\pi i}{N} k n}$$
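The DFT sum translates directly into two nested loops, which is where the $O(N^2)$ cost comes from. The C++ sketch below is a naive CPU reference (the name `dft_naive` is hypothetical), useful mainly for checking the output of an FFT library on small inputs.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive DFT: X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N). Cost is O(N^2).
std::vector<cd> dft_naive(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle = -2.0 * pi * static_cast<double>(k * n)
                                 / static_cast<double>(N);
            sum += x[n] * cd(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}
```

A constant input concentrates all energy in the k = 0 term, while a unit impulse at n = 0 produces a flat spectrum; both make quick sanity checks.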
CUFFT
• Fast CUDA library for FFT
• No additional downloads needed
– cufft.lib (in CUDA SDK)
– Add cufft.lib to linker
– #include <cufft.h>
CUFFT: Features
• 1-D, 2-D, 3-D
• Precisions
– Single: real & complex
– Double: real & complex (not all functions)
• Uses CUDA memory calls & fft data
• Requires a ‘plan’
• Based on FFTW model
CUFFT Example
CUFFT Example (cont.)
Call annotations: complex-to-complex transform; 1 data set (batch)
Acknowledgements
• Graduate Students
• NSF
• UW-Madison
• Kulicke and Soffa
• Luvata
• Trek Bicycles
Publications available at
www.ersl.wisc.edu