GPU Processing for Distributed Live Video Database
Jun Ye
Data Systems Group
Outline
• Introduction to GPU
• GPU language (OpenCL or CUDA)
• OpenCL programming
• Case Study: Live Video Database Management System (LVDBMS)
Introduction
• Current GPUs are more than graphics cards for rendering video-game images.
• They are also used for general-purpose parallel computing (e.g. mining Bitcoin, training deep neural networks).
• GPGPU: general-purpose computing on GPUs.
nVidia Tesla K20
nVidia GeForce GTX 580
GPU language
• Two main options: CUDA and OpenCL
• CUDA (2007)
• Compute Unified Device Architecture, created and owned by nVidia
• OpenCL (2009)
• Open Computing Language. Initially developed by Apple and standardized by the Khronos Group as a public standard.
CUDA or OpenCL?
CUDA:
• Proprietary
• Only works on nVidia cards
• Usually performs better out of the box, without any tuning
OpenCL:
• Open standard
• Broad hardware support: ATI, Intel, Apple, nVidia, Qualcomm, Xilinx, and more
• Heterogeneous: PCs, mobile devices, FPGAs, DSPs, ...
• Performance is generally not as good as CUDA
• Needs knowledge of the hardware to tune the performance
Tip
One thing is fairly clear:
ATI has better support for OpenCL than nVidia.
OpenCL + ATI seems a better option than OpenCL + nVidia.
Brief intro to OpenCL Programming
• Best suited to data-parallel problems (1D, 2D, or 3D data)
• A large number of simple, independent computations
• E.g. array addition, matrix multiplication, image processing (e.g. Gaussian blur)
• Can speed up computation by orders of magnitude (hardware specific)
• Overhead: resource initialization and data transfer between CPU and GPU memory
OpenCL programming: GPU memory model
• NDrange configuration
• Global work size
• Local work size
• Thread (called a work-item in OpenCL)
http://gpgpu-computing4.blogspot.com/2009/09/matrix-multiplication-2-opencl.html
OpenCL programming: coding
• Host code: runs on the CPU (can be C/C++, Python, MATLAB, JavaScript)
• Initialize resources
• Configure the environment (global and local work sizes)
• Transfer buffers between host and device
• Kernel code: runs on the device (GPU); written in the OpenCL kernel language (.cl files)
• Executes the parallel computation
OpenCL programming: an example (C)
• Matrix multiplication
• A and B are both 1024 by 1024 square matrices
• Compute C = A x B
OpenCL programming
Host code:
• #include <CL/cl.h>
• Initialize device
• clGetPlatformIDs
• clGetDeviceIDs
• clCreateContext
• clCreateCommandQueue
• Create program
• LoadOpenCLKernel("*.cl") (a helper that reads the kernel source file; not part of the OpenCL API)
• clCreateProgramWithSource
• clBuildProgram
• clCreateKernel
OpenCL programming
Host code (OpenCL binding code):
• Create buffers
• clCreateBuffer
• Set kernel arguments
• clSetKernelArg
• Set the local work size (must take the hardware specs into account)
• Set the global work size (the dimensions of your problem)
• Enqueue the kernel
• clEnqueueNDRangeKernel
• Read the result back from the device
• clEnqueueReadBuffer
OpenCL programming
/* kernel.cl
   Matrix multiplication: C = A * B.
   wA is the width of A, wB is the width of B (and of C). */

// OpenCL kernel
__kernel void
matrixMul(__global float* C,
          __global float* A,
          __global float* B,
          int wA, int wB)
{
    int tx = get_global_id(0);  // column index of the output element
    int ty = get_global_id(1);  // row index of the output element

    // value stores the element that is computed by this thread
    float value = 0;
    for (int k = 0; k < wA; ++k)
    {
        float elementA = A[ty * wA + k];
        float elementB = B[k * wB + tx];
        value += elementA * elementB;
    }

    // Write the result to device memory; each thread writes one element.
    // C has wB columns, so its row stride is wB (equal to wA only for square matrices).
    C[ty * wB + tx] = value;
}
Demo
• I will show you the execution of the program
• And compare it against a naive CPU solution
• Source code available at http://www.es.ele.tue.nl/~mwijtvliet/5KK73/?page=mmopencl
Case Study
• 1. Realistic ray tracing rendering
• http://webcl.nokiaresearch.com/
• 2. Real-time 3D spatial-query in live video database
• http://www.eecs.ucf.edu/~jye/demo.html
• Jun Ye and Kien A. Hua, "Octree-based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 11 (2), December 2014.
• Jun Ye and Kien A. Hua, "Exploiting Depth Camera for 3D Spatial Relationship Interpretation," in Proceedings of ACM Multimedia Systems 2013, Oslo, Norway.
Real-time 3D spatial-query in live video database
• Background: a live video database management system
• Technique: distributed live video computing
• Components:
• Distributed 3D cameras (Microsoft Kinect)
• Camera servers
• Query processing servers
Real-time 3D spatial-query in live video database
• 3D spatial operators
• GPU-accelerated computing algorithm
Real-time 3D spatial-query in live video database
• Spatio-temporal event query
• E.g. a person walks out of a room and enters the room next door