Fast Fourier Transform with BrookGPU
![Page 1: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/1.jpg)
Fast Fourier Transform with BrookGPU
CS594 GPU Programming
Julian Yu-Chung Chen
2006-04-25
![Page 2: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/2.jpg)
GPGPU
• Modern GPUs are fast and programmable
• They outperform CPUs in some cases
![Page 3: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/3.jpg)
Fast Fourier Transform
• Frequently used in signal processing, compression, etc.
• Computation intensive
• Moreland & Angel’s GPU implementation
• Fragment programs cannot branch
• Requires switching between multiple fragment programs
![Page 4: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/4.jpg)
![Page 5: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/5.jpg)
Tangle+Untangle
• Perform FFT on h(x) = f(x) + j g(x)
• FFT is linear: H(u) = F(u) + j G(u)
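The tangle/untangle step above can be sketched in a few lines. This is a minimal CPU illustration in Python, not the slides' Brook code: a naive DFT stands in for the FFT, and the helper names `dft`, `tangle`, and `untangle` are made up for this example. It assumes f and g are real, so their spectra are conjugate-symmetric, which is what lets F and G be separated again after a single complex transform.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT here)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def tangle(f, g):
    """Pack two real signals into one complex signal h = f + j*g."""
    return [complex(a, b) for a, b in zip(f, g)]

def untangle(H):
    """Split H = F + j*G back into F and G.

    For real f and g, conj(H(-u)) = F(u) - j*G(u), so the sum and
    difference with H(u) isolate F(u) and G(u).
    """
    n = len(H)
    F, G = [], []
    for u in range(n):
        Hc = H[(-u) % n].conjugate()
        F.append((H[u] + Hc) / 2)
        G.append((H[u] - Hc) / 2j)
    return F, G

# Two arbitrary real test signals (illustrative values only).
f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
g = [0.0, 1.0, 1.0, 2.0, -1.0, 0.0, 3.0, -0.5]

# One complex transform yields both spectra.
F, G = untangle(dft(tangle(f, g)))
```

The payoff is that two real-valued inputs cost one complex transform instead of two.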
![Page 6: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/6.jpg)
FFT Implementation
[Figure: two data-flow diagrams, one with Scale stages and one with Pass stages. Labels: Images; Real F, Imag. F, Real G, Imag. G; Real Tangled, Imaginary Tangled; FFT; Untangle; Real Untangled, Imaginary Untangled; R, F; I, F; R, G; I, G; Frequency Spectra.]
![Page 7: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/7.jpg)
BrookGPU
• From Stanford University
• Compiler and runtime for the Brook stream programming language
• Eases GPGPU programming
• Research project, still in beta
![Page 8: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/8.jpg)
BrookGPU
• stream function: kernel
• stream datatype: float2, float4
• can do something like: streamSwap(s, s_out);
• Kernels are translated to embedded Cg programs
• Renders to an offscreen pbuffer
• Nice to have in cluster environment
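To make the kernel, stream, and swap pattern concrete, here is a hedged Python emulation of a multi-pass radix-2 FFT. It is not Brook code and does not use the Brook API: `butterfly_kernel`, `run_kernel`, and `fft_multipass` are hypothetical names, and the Python tuple swap only plays the role that `streamSwap(s, s_out)` has on the slide. Each pass computes every output element independently from the input stream, with no data-dependent branch, which is the shape a fragment program needs.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def butterfly_kernel(src, i, m):
    """Output element i of one radix-2 pass with butterfly span m.

    Uses w^(j + m/2) = -w^j, so one formula covers both butterfly halves.
    """
    half = m // 2
    pos = i % m
    base = i - pos
    j = pos % half
    w = cmath.exp(-2j * cmath.pi * pos / m)
    return src[base + j] + w * src[base + j + half]

def run_kernel(kernel, src, dst, *args):
    """Apply the kernel once per output element, like a full-screen pass."""
    for i in range(len(dst)):
        dst[i] = kernel(src, i, *args)

def fft_multipass(x):
    """FFT of a power-of-two-length sequence using ping-pong buffers."""
    n = len(x)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "length must be a power of two"
    s = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    s_out = [0j] * n
    m = 2
    while m <= n:
        run_kernel(butterfly_kernel, s, s_out, m)
        s, s_out = s_out, s  # stands in for streamSwap(s, s_out)
        m *= 2
    return s

# The transform of a unit impulse has every entry equal to 1 (up to rounding).
spectrum = fft_multipass([1.0, 0.0, 0.0, 0.0])
```

A length-N transform takes log2(N) passes here, so the number of kernel launches and buffer swaps grows slowly with input size.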
![Page 9: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/9.jpg)
Results & Issues
• export BRT_RUNTIME=[ogl|cpu]
• Brook’s CPU backend is very slow!
• 0.5 sec (ogl) vs. 1 min 3.7 sec (cpu) on a 512x512 input
• Should be compared against a state-of-the-art CPU FFT implementation: http://www.fftw.org/
• Program fails when the input is larger than 1024x1024: cannot allocate pbuffer
![Page 10: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/10.jpg)
GPU vs. CPU
![Page 11: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/11.jpg)
![Page 12: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/12.jpg)
Results
[Chart: one series each for 1, 2, 4, 8, and 16 GPUs; horizontal axis ticks 7 to 12; vertical axis 0 to 15.00; hardware: nVidia Quadro 3000.]
![Page 13: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/13.jpg)
Multiple GPUs?
• MPI + BrookGPU
• Can handle bigger problem sizes
• Significant speedup?
• Is this a meaningful use of GPUs?
![Page 14: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/14.jpg)
Some thoughts
• Use nVidia cards + Cg!
• Beneficial in certain situations
• Graphics clusters
• Graphics resource monitoring
• Available pbuffer size?
![Page 15: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/15.jpg)
Issues
• GPU today: Single-precision floating-point
• Need IEEE-compliant, double, even 64-bit?
• Exceptions: e.g. divide-by-zero
• Size limitation
• Programming model
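The single-precision concern can be illustrated on the CPU. The sketch below is an assumption-laden Python emulation, not a measurement of any GPU: `to_f32` and `accumulate` are hypothetical helpers that round every intermediate result to IEEE single precision using the standard `struct` module, and real GPUs of that era were not necessarily IEEE-compliant, so their behavior could differ.

```python
import struct

def to_f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step, count, rounder=lambda v: v):
    """Add `step` to a running total `count` times, rounding after each add."""
    total = rounder(0.0)
    step = rounder(step)
    for _ in range(count):
        total = rounder(total + step)
    return total

n = 100_000
double_sum = accumulate(0.1, n)          # double precision throughout
single_sum = accumulate(0.1, n, to_f32)  # emulated single precision

# Absolute error of each running sum against the nominal value n * 0.1.
double_err = abs(double_sum - n * 0.1)
single_err = abs(single_sum - n * 0.1)
```

Comparing `single_err` with `double_err` shows how much more rounding error builds up over long accumulations in single precision, which matters for an FFT because each output is a sum over many inputs.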
![Page 16: Fast Fourier Transform with BrookGPU](https://reader031.fdocuments.net/reader031/viewer/2022020702/61fb0ffc2e268c58cd59b60d/html5/thumbnails/16.jpg)
References
• http://www.gpgpu.org/
• http://graphics.stanford.edu/projects/brookgpu/
• http://www.cs.unm.edu/~kmorel/documents/fftgpu/
• http://www.umiacs.umd.edu/research/GPU/
• http://libgpufft.sourceforge.net/
• http://graphics.stanford.edu/projects/gpubench/