FIR filter on GPU
![Page 1: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/1.jpg)
An Implementation of a FIR Filter on a GPU
Alexey Smirnov and Tzi-cker Chiueh
ECSL Research Seminar, 9/13/05
![Page 2: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/2.jpg)
Outline
- Introduction
- GPU Computing Overview
- Related Work
- FIR Filter Definition
- FIR Filter Implementation on GPU
- Performance Evaluation
- Conclusion
![Page 3: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/3.jpg)
Introduction
- Numerical algorithms often perform repeated computations on vectors of elements.
- Parallel computation improves performance.
- x86 provides SIMD extensions: MMX, SSE, SSE2, SSE3.
- Video cards are now programmable.
![Page 4: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/4.jpg)
Computation and Bandwidth Rates
- Video cards have a higher GFLOPS rate and higher memory bandwidth than CPUs.
- However, copying data between main memory and video memory can reduce performance.
![Page 5: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/5.jpg)
GPU Computing Background
Rendering pipeline:
- The user program defines vertex and texture coordinates.
- The vertex processor converts vertex attributes from the world coordinate system into the screen coordinate system.
- Interpolation defines the coordinates and color for each pixel.
- The fragment processor computes the color of each output pixel using textures and color.
- Vertex and fragment processors are programmable, for example in the C-like language Cg.
![Page 6: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/6.jpg)
Rendering APIs
- OpenGL (Linux, Windows, MacOS) and DirectX (Windows).
- OpenGL extensions give access to advanced features of a video card.
- NV_float_buffer supports floating-point textures.
- ARB_render_texture allows rendering to a texture instead of the screen.
![Page 7: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/7.jpg)
GPU Program Architecture
1. Create floating-point textures that contain input data and load them into video memory.
2. Load the fragment program and enable multi-texturing.
3. Define vertex and texture coordinates.
4. Draw the figure to an off-screen buffer.
5. If the results were rendered to an off-screen buffer, copy the image to a texture using glCopyTexSubImage2D().
6. Go to step 3 if more iterations are needed.
7. Use glGetTexImage() to copy data from video memory to main memory.
![Page 8: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/8.jpg)
Input Data Representation
- Matrices are naturally represented as textures.
- Four elements per pixel (R, G, B, A).
- Vectors are wrapped into matrices.
- Textures have maximum dimensions.
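
The wrapping of a vector into a four-channel texture can be sketched on the CPU side as follows. This is a minimal illustration, not the layout used in the talk: the function names, the zero padding, and the `max_width` limit of 2048 texels are all assumptions made for the example.

```python
import math


def pack_vector_rgba(values, max_width=2048):
    """Wrap a flat vector into rows of RGBA texels (4 values per texel).

    max_width is a hypothetical texture-width limit; real limits depend
    on the video card. Unused channels are padded with 0.0.
    """
    if max_width <= 0:
        raise ValueError("max_width must be positive")
    texels = math.ceil(len(values) / 4)
    width = min(max(texels, 1), max_width)
    height = max(math.ceil(texels / width), 1)
    padded = list(values) + [0.0] * (width * height * 4 - len(values))
    rows = []
    for r in range(height):
        row = []
        for c in range(width):
            base = (r * width + c) * 4
            row.append(tuple(padded[base:base + 4]))
        rows.append(row)
    return rows


def unpack_vector_rgba(rows, length):
    """Flatten the texel rows back into the original vector."""
    flat = [ch for row in rows for texel in row for ch in texel]
    return flat[:length]
```

With `max_width=2`, a 10-element vector needs 3 texels and therefore occupies a 2x2 texture whose last texel and a half are padding.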
![Page 9: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/9.jpg)
Related Work
- Four papers describing matrix multiplication
- Linear algebra operations
- Array sorting
- FFT
- Earlier papers concluded that the CPU is more efficient than the GPU.
- Recent video cards, e.g. GeForce 7800 and ATI X800 XT, do better than the CPU.
![Page 10: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/10.jpg)
FIR Filter Definition
A Finite Impulse Response (FIR) filter is used in audio processing.
We modified GNU Radio, an open-source implementation of Software Defined Radio.
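
For reference, the standard textbook definition of an FIR filter is y[n] = sum over k of h[k] * x[n - k], where h holds the filter taps. The sketch below is a plain CPU illustration of that definition, assuming zero history before the first sample; it is not the GNU Radio code or the GPU implementation from the talk, and both function names are hypothetical. The second function computes the same result one tap at a time over the whole output array, which is one possible way to restructure the work for data-parallel hardware.

```python
def fir_filter(taps, samples):
    """Direct form: each output sample is an independent dot product."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start count as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


def fir_filter_by_tap(taps, samples):
    """Same result, accumulated one tap per pass over the output array."""
    out = [0.0] * len(samples)
    for k, h in enumerate(taps):
        for n in range(k, len(samples)):
            out[n] += h * samples[n - k]
    return out
```

For example, `fir_filter([0.5, 0.5], [1.0, 3.0, 5.0])` returns `[0.5, 2.0, 4.0]`, a two-point moving average. Because no output sample depends on another, every output element can in principle be computed in parallel.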
![Page 11: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/11.jpg)
Other Relevant Transformations
Hilbert transformation:
Frequency translation FIR filter:
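
As a rough illustration of the two transformations named above, the sketch below uses common textbook forms, which are assumptions and may differ from the formulas on the original slide. The Hilbert transformer is written as an FIR filter with the ideal, unwindowed taps 2/(pi*m) for odd offsets m from the center; practical designs usually apply a window. The frequency-translating filter mixes the input down by a complex exponential, applies an FIR filter, and then decimates. All function and parameter names are hypothetical.

```python
import cmath
import math


def fir_filter(taps, samples):
    """Direct-form FIR with zero history; works for real or complex data."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def hilbert_taps(num_taps):
    """Ideal truncated Hilbert taps: 2/(pi*m) for odd m, 0 for even m."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    center = num_taps // 2
    taps = []
    for k in range(num_taps):
        m = k - center
        taps.append(0.0 if m % 2 == 0 else 2.0 / (math.pi * m))
    return taps


def freq_xlating_fir(taps, samples, center_freq, sample_rate, decimation=1):
    """Shift center_freq down to 0 Hz, filter, then keep every Nth sample."""
    step = -2.0 * math.pi * center_freq / sample_rate
    mixed = [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]
    return fir_filter(taps, mixed)[::decimation]
```

Feeding a complex tone at `center_freq` through `freq_xlating_fir` with averaging taps yields an output close to a constant once the filter has filled, since the tone has been moved to 0 Hz.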
![Page 12: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/12.jpg)
FIR Filter on a GPU
![Page 13: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/13.jpg)
FIR Filter’s Loop
Initialization:
Loop iteration:
![Page 14: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/14.jpg)
FIR Filter’s Loop
O(j+1)=O(j)+MI
Final output value is computed as
![Page 15: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/15.jpg)
Fragment Program
![Page 16: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/16.jpg)
Optimizations
- Break the loop into two to get rid of the conditional expression.
- Unroll the loop body with and without the conditional expression.
- Process two rows of input and textures.
- Use different texture units in unrolled loops.
- None of the above improved performance.
![Page 17: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/17.jpg)
Performance Evaluation: FIR Filter
![Page 18: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/18.jpg)
Performance of FreqXlating FIR Filter
![Page 19: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/19.jpg)
Performance of Hilbert Transformation
![Page 20: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/20.jpg)
Conclusion
- Not everything benefits from GPU optimization.
- CPU optimization tricks do not work on the GPU.
- Texture upload/download takes up to 60% of total time.
- A GPU computation can take several seconds, compared to the milliseconds needed to render a frame in a game.
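
The cost of texture transfers can be put in perspective with a simple Amdahl-style estimate. The sketch below assumes the transfer share of the run time stays fixed while only the remaining computation is accelerated; the function names are hypothetical, and the 60% figure is the upper value quoted above, not a new measurement.

```python
def overall_speedup(transfer_fraction, compute_speedup):
    """Speedup of the whole run when only the non-transfer part gets faster."""
    if not 0.0 <= transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be in [0, 1]")
    if compute_speedup <= 0.0:
        raise ValueError("compute_speedup must be positive")
    new_time = transfer_fraction + (1.0 - transfer_fraction) / compute_speedup
    return 1.0 / new_time


def max_speedup(transfer_fraction):
    """Limit of overall_speedup as the computation becomes infinitely fast."""
    if not 0.0 < transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be in (0, 1]")
    return 1.0 / transfer_fraction
```

With transfers at 60% of the run time, doubling the speed of the computation gives only a 1.25x overall gain, and no amount of further fragment-program tuning can exceed about 1.67x unless the transfers themselves are reduced.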
![Page 21: FIR filter on GPU](https://reader038.fdocuments.net/reader038/viewer/2022102421/546a0049af7959e8488b504f/html5/thumbnails/21.jpg)
Future Work
- QoS for GPU: can an application specify a maximum latency or a share of GPU resources?
- Work offload from CPU to GPU: is it possible to build a compiler that automatically decides what is worth GPU optimization?
- Debugging support: many tools exist for Windows, none for Linux.