GPU Processing for Distributed Live Video Database

Jun Ye

Data Systems Group

Outline

• Introduction to GPU

• GPU language (OpenCL or CUDA)

• OpenCL programming

• Case Study: Live Video Database Management System (LVDBMS)

Introduction

• Current GPUs are more than graphics cards for rendering video game images.

• They are used for general-purpose parallel computing of all kinds (e.g., mining Bitcoin, training deep neural networks).

• GPGPU: general-purpose computing on GPUs.

nVidia Tesla K20

nVidia GeForce GTX 580

GPU language

• Two main options: CUDA and OpenCL

• CUDA (2007)

• Compute Unified Device Architecture, created and owned by nVidia

• OpenCL (2009)

• Open Computing Language: initially developed by Apple and maintained by the Khronos Group as a public standard

CUDA or OpenCL?

CUDA

• Proprietary

• Only works on nVidia cards

• Normally delivers higher performance without any tuning

OpenCL

• Open standard

• Broad hardware support: AMD (formerly ATI), Intel, Apple, nVidia, Qualcomm, Xilinx, and more

• Heterogeneous: PCs, mobile devices, FPGAs, DSPs, and more

• Performance is generally not as good as CUDA

• Requires knowledge of the hardware to tune performance

Tip

One thing is for sure:

ATI (AMD) has better support for OpenCL than nVidia.

OpenCL + ATI seems to be a better option than OpenCL + nVidia.

Brief intro to OpenCL Programming

• Best suited to data-parallel problems (1D, 2D, or 3D data)

• A large number of simple, independent computations

• E.g., array addition, matrix multiplication, image processing (such as Gaussian blur)

• Can speed up computation by orders of magnitude (hardware-specific)

• Overhead: resource initialization and data transfer between CPU and GPU memory
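
The overhead bullet can be made concrete with a little arithmetic. The sketch below is a simplified timing model, not a measurement: it assumes the total GPU time is the sum of one-off setup, host/device transfers, and kernel execution. The function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Simplified timing model (an assumption for illustration, not a benchmark):
 * total GPU time = one-off setup + host<->device transfers + kernel time. */
double gpu_total_ms(double setup_ms, double transfer_ms, double kernel_ms)
{
    return setup_ms + transfer_ms + kernel_ms;
}

/* End-to-end speedup over the CPU once the overhead is included. */
double effective_speedup(double cpu_ms, double setup_ms,
                         double transfer_ms, double kernel_ms)
{
    double total = gpu_total_ms(setup_ms, transfer_ms, kernel_ms);
    assert(total > 0.0);
    return cpu_ms / total;
}
```

With made-up inputs of 1000 ms on the CPU, 50 ms setup, 40 ms transfer, and 10 ms kernel time, the kernel alone is 100 times faster than the CPU, but the end-to-end speedup under this model is only 10 times. Small problems may therefore not benefit from the GPU at all.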

OpenCL programming: GPU memory model

http://de.wikipedia.org/wiki/Datei:OpenCL_Memory_model.svg

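OpenCL programming: GPU memory model

• NDRange configuration

• Global work size: total number of work-items in each dimension

• Local work size: number of work-items per work-group in each dimension

• Thread (work-item)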

http://gpgpu-computing4.blogspot.com/2009/09/matrix-multiplication-2-opencl.html
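
The relationship between global size, local size, and thread index can be sketched in plain C. This is a CPU-side illustration of the index arithmetic only, assuming one dimension and a zero global offset; the helper names are hypothetical and are not OpenCL API calls.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension.
 * OpenCL 1.x requires the global size to be a multiple of the local size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Global id of a work-item computed from its group id and local id
 * (assuming a zero global offset). This mirrors the value that
 * get_global_id() returns inside a kernel. */
size_t global_id(size_t group_id, size_t local_id, size_t local_size)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For a dimension of 1024 work-items and a local size of 16, this gives 64 work-groups, and the work-item with local id 5 in group 3 has global id 53.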

OpenCL programming: coding

• Host code: runs on the CPU (can be written in C/C++, Python, MATLAB, JavaScript)

• Initialize resources

• Configure the environment (global and local work sizes)

• Transfer buffers between host and device

• Kernel code: runs on the device (GPU); written in the OpenCL kernel language (.cl files)

• Executes the parallel computation

OpenCL programming: an example (C)

• Matrix multiplication

• A and B are both 1024 × 1024 square matrices

• Compute C = A × B

OpenCL programming

Host code:

• #include <CL/cl.h>

• Initialize device

• clGetPlatformIDs

• clGetDeviceIDs

• clCreateContext

• clCreateCommandQueue

• Create program

• LoadOpenCLKernel("*.cl") (a helper that reads the kernel source file; not part of the OpenCL API)

• clCreateProgramWithSource

• clBuildProgram

• clCreateKernel

OpenCL programming: host code (OpenCL binding code)

• Create buffer

• clCreateBuffer

• clSetKernelArg

• Set the local work size (must respect the hardware limits)

• Set the global work size (the dimensions of your problem)

• Enqueue the kernel

• clEnqueueNDRangeKernel

• Read the result back from the device

• clEnqueueReadBuffer
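
When the problem dimension is not a multiple of the chosen local work size, a common approach is to pad the global work size upward. The helper below is a minimal sketch of that rounding, assuming one dimension; the function name is hypothetical and this is not taken from the demo source.

```c
#include <assert.h>
#include <stddef.h>

/* Round the problem size up to the next multiple of the local work size,
 * so that the result can be used as a global work size. */
size_t round_up_global(size_t problem_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = problem_size % local_size;
    return remainder == 0 ? problem_size
                          : problem_size + (local_size - remainder);
}
```

For the 1024 × 1024 example with a local size of 16, no padding is needed. For a hypothetical dimension of 1000, the global size becomes 1008, and the kernel would then need a bounds check so that the extra work-items do not read or write outside the matrices.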

OpenCL programming: kernel code

/* kernel.cl
 * Matrix multiplication: C = A * B.
 */

// OpenCL kernel
__kernel void
matrixMul(__global float* C,
          __global float* A,
          __global float* B,
          int wA, int wB)
{
    int tx = get_global_id(0); // column index of the output element
    int ty = get_global_id(1); // row index of the output element

    // value accumulates the element that is computed by this thread
    float value = 0;
    for (int k = 0; k < wA; ++k)
    {
        float elementA = A[ty * wA + k];
        float elementB = B[k * wB + tx];
        value += elementA * elementB;
    }

    // Write the result to device memory; each thread writes one element.
    // C has wB columns (identical to wA for the square matrices used here).
    C[ty * wB + tx] = value;
}

Demo

• I will show you the execution of the program

• And compare it against a naive CPU solution

• Source code available at http://www.es.ele.tue.nl/~mwijtvliet/5KK73/?page=mmopencl
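
A naive CPU baseline of the kind mentioned above might look like the following. This is a sketch under the same row-major layout as the kernel, not the code from the demo; the function name and the height parameter hA are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive CPU matrix multiplication: C = A * B, all matrices row-major.
 * A is hA x wA, B is wA x wB, and C is hA x wB. */
void matmul_cpu(float *C, const float *A, const float *B,
                size_t hA, size_t wA, size_t wB)
{
    assert(C != NULL && A != NULL && B != NULL);
    for (size_t row = 0; row < hA; ++row) {
        for (size_t col = 0; col < wB; ++col) {
            /* Same inner product that one GPU thread computes. */
            float value = 0.0f;
            for (size_t k = 0; k < wA; ++k) {
                value += A[row * wA + k] * B[k * wB + col];
            }
            C[row * wB + col] = value;
        }
    }
}
```

The two outer loops play the role of the NDRange: each (row, col) pair corresponds to one work-item of the kernel, executed sequentially here instead of in parallel.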

Case Study

• 1. Realistic ray tracing rendering

• http://webcl.nokiaresearch.com/

• 2. Real-time 3D spatial-query in live video database

• http://www.eecs.ucf.edu/~jye/demo.html

• Jun Ye and Kien A. Hua, "Octree-based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 11 (2), December 2014.

• Jun Ye and Kien A. Hua, "Exploiting Depth Camera for 3D Spatial Relationship Interpretation," in Proceedings of ACM Multimedia Systems 2013, Oslo, Norway.

Real-time 3D spatial-query in live video database

• Background: A live video database management system

• Technique: distributed live video computing

• Components: distributed 3D cameras (Microsoft Kinect), camera servers, query processing servers

• 3D spatial operators

• GPU-accelerated computing algorithm

• Spatial-temporal event query

• E.g., a person walks out of a room and enters the room next door

Thank you.

• Questions?
