DeepStream SDK: Towards Large Scale Deployment of Intelligent Video Analytics Systems

Transcript of "DeepStream SDK: Towards Large Scale Deployment of Intelligent Video Analytics Systems" (GTC DC 2017)
Source: on-demand.gputechconf.com/gtcdc/2017/presentation/dc7198-jerem…

Page 1

November 1, 2017

DeepStream SDK: Towards Large Scale Deployment of Intelligent Video Analytics Systems

Jeremy Furtek

Page 2

MOTIVATION

Page 3

Proliferation of Video Data

Content Filtering

Retail Analytics

Traffic Engineering

Law Enforcement

Ad Injection

Securing Critical Infrastructure

Parking Management

In-Vehicle Analytics

Page 4

Advances in Deep Learning: Classification and Beyond

Image Classification (AlexNet, 2012)

Object Detection (R-FCN, 2016)

Semantic Image Segmentation (ENet, 2016)

Image Compression (WaveOne, 2017)

Structure From Motion (SfM-Net, 2017)

Pose-Aware Face Recognition (USC, 2017)

Page 5

WHAT IS DEEPSTREAM?

Page 6

HOW IT WORKS

DEEP LEARNING

Page 7

HOW IT WORKS

VIDEO

DEEP LEARNING WITH DEEPSTREAM

DeepStream Enabled App or Service

Page 8

VIDEO ANALYTICS WITH DEEP LEARNING ON NVIDIA GPUS: AN OVERVIEW

Page 9

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream (1 0 1 1 0 …) → Video Decode → decoded video frame (YCbCr image)]

Page 10

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream (1 0 1 1 0 …) → Video Decode → decoded video frame (YCbCr image)]

Video Decode

Fully-accelerated hardware decoding on NVIDIA GPUs (NVDEC) is accessible via the Video Decode API.
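For reference, outside of DeepStream the same NVDEC hardware is reached through the NVDECODE (cuvid) API: the application creates a video parser whose callbacks hand compressed pictures to a decoder. The fragment below is only a structural sketch of that setup (error handling and the decoder-creation callback bodies are omitted); DeepStream performs this work internally when you add a decode task.

#include <nvcuvid.h>   // NVDECODE (cuvid) API from the NVIDIA Video Codec SDK
#include <cstring>

// Parser callbacks: fired when the stream format is known, when a picture is
// ready to be decoded, and when a decoded picture is ready for display.
static int CUDAAPI onSequence(void* user, CUVIDEOFORMAT* fmt)       { /* create the decoder here */ return 1; }
static int CUDAAPI onDecode(void* user, CUVIDPICPARAMS* pic)        { /* cuvidDecodePicture(...) */ return 1; }
static int CUDAAPI onDisplay(void* user, CUVIDPARSERDISPINFO* disp) { /* map and consume the frame */ return 1; }

CUvideoparser createH264Parser()
{
    CUVIDPARSERPARAMS params;
    std::memset(&params, 0, sizeof(params));
    params.CodecType = cudaVideoCodec_H264;   // same enum DeepStream's decode task takes
    params.ulMaxNumDecodeSurfaces = 8;
    params.pfnSequenceCallback = onSequence;
    params.pfnDecodePicture = onDecode;
    params.pfnDisplayPicture = onDisplay;

    CUvideoparser parser = nullptr;
    cuvidCreateVideoParser(&parser, &params);
    return parser;
}

// Compressed bitstream chunks are then fed to the parser:
//   CUVIDSOURCEDATAPACKET pkt = {};
//   pkt.payload = data; pkt.payload_size = num_bytes;
//   cuvidParseVideoData(parser, &pkt);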

Page 11

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image]

Page 12

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image]

Image Conversion

Custom CUDA kernels and/or optimized libraries (e.g. NVIDIA Performance Primitives, or NPP) enable high performance image conversions.

__global__ void resize(uint8_t* dst,
                       const uint8_t* src,
                       int dst_w,
                       int dst_h,
                       int src_w,
                       int src_h)
{
    // Illustrative nearest-neighbor body (the slide leaves this empty); one thread per output pixel, single-channel 8-bit image.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dst_w || y >= dst_h) return;
    int sx = x * src_w / dst_w;
    int sy = y * src_h / dst_h;
    dst[y * dst_w + x] = src[sy * src_w + sx];
}
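The conversion stage also maps the decoded NV12 (YCbCr 4:2:0) frame to the planar BGR float layout that the network expects. A minimal CUDA sketch of that conversion follows; the kernel name and the BT.601 full-range coefficients are illustrative assumptions, not the SDK's implementation, and mean subtraction is omitted.

__global__ void nv12_to_bgr_planar(float* bgr,               // 3 planes of w*h floats (B, G, R)
                                   const uint8_t* y_plane,   // luma plane
                                   const uint8_t* uv_plane,  // interleaved U,V at half resolution
                                   int w, int h, int pitch)  // pitch shared by both planes
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int yI = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || yI >= h) return;
    float Y = (float)y_plane[yI * pitch + x];
    const uint8_t* uv = uv_plane + (yI / 2) * pitch + (x / 2) * 2;
    float U = (float)uv[0] - 128.0f;
    float V = (float)uv[1] - 128.0f;
    // BT.601 full-range conversion (an assumption; video-range input would use 1.164f * (Y - 16))
    float B = Y + 1.772f * U;
    float G = Y - 0.344136f * U - 0.714136f * V;
    float R = Y + 1.402f * V;
    // Planar BGR order, as expected by Caffe-style networks
    bgr[0 * w * h + yI * w + x] = B;
    bgr[1 * w * h + yI * w + x] = G;
    bgr[2 * w * h + yI * w + x] = R;
}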

Page 13

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → inference results (cat 0.9, dog 0.05, tree 0.05)]

Page 14

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → inference results (cat 0.9, dog 0.05, tree 0.05)]

Inference

TensorRT* provides a neural network optimizer and a high performance runtime library for inference deployment on NVIDIA GPUs.

Step 1: Optimize a trained neural network
Step 2: Perform real-time inference with the TensorRT runtime

*The current DeepStream release supports TensorRT 2.1.
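For orientation, the two steps look roughly like the sketch below when using TensorRT's Caffe parser directly (TensorRT 2.x/3.x era API; the output layer name "prob" and the device buffers d_input/d_output are illustrative assumptions). DeepStream performs the equivalent work inside addInferenceTask.

#include <cstdio>
#include "NvInfer.h"
#include "NvCaffeParser.h"
using namespace nvinfer1;
using namespace nvcaffeparser1;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) override { std::printf("%s\n", msg); }
} gLogger;

ICudaEngine* buildEngine()
{
    // Step 1: optimize a trained network (Caffe deploy file + weights).
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobs =
        parser->parse("deploy.prototxt", "trained.caffemodel", *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));   // "prob": assumed softmax output layer
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 26);
    return builder->buildCudaEngine(*network);
}

void infer(ICudaEngine* engine, void* d_input, void* d_output)
{
    // Step 2: run real-time inference with the TensorRT runtime.
    IExecutionContext* context = engine->createExecutionContext();
    void* buffers[] = { d_input, d_output };     // device pointers, bound in binding order
    context->execute(1 /* batch size */, buffers);
}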

Page 15

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → inference results → Display / Storage]

Page 16

From Video to Analytics With Deep Learning…

[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → inference results → Display / Storage]

Display / Storage

Processing of inference output is application dependent.

Some common options are:

• Real-time display with overlay (see the sketch below)
• Alert generation
• Database insertion
• Augmented video encoding
• "Slew-to-Cue" camera control
• Tracker update
• Custom post-processing algorithms (CUDA, OpenCV, …)
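As an illustration of the first option, a custom display module could overlay detector output on the converted frame with OpenCV. The Detection struct below is an assumption for this sketch, not a DeepStream type.

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Hypothetical parsed detection; not part of the DeepStream API.
struct Detection { cv::Rect box; std::string label; float score; };

// Draw bounding boxes and labels onto a BGR frame before display or encoding.
void drawOverlay(cv::Mat& frame, const std::vector<Detection>& detections)
{
    for (const auto& d : detections) {
        cv::rectangle(frame, d.box, cv::Scalar(0, 255, 0), 2);
        std::string text = d.label + " " + std::to_string(d.score);
        cv::putText(frame, text, d.box.tl() + cv::Point(0, -5),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
    }
}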

Page 17

Video to Analytics With the DeepStream SDK

DeepStream runtime library (libdeepstream)

[Diagram: the full pipeline (Video Decode, Image Conversion, Inference, Display / Storage) running inside the DeepStream runtime library]

Page 18

USING THE DEEPSTREAM LIBRARY

Page 19

DeepStream from the Ground Up

Tensors

DeepStream tensors are objects that implement the IStreamTensor interface and represent a 4-dimensional array of values stored in NCHW order.

[Diagram: a tensor drawn as an N x C x H x W volume]

typedef enum {
    FLOAT_TENSOR,   // float32
    NV12_FRAME,     // decoded YCbCr
    OBJ_COORD,      // detection bounding box
    CUSTOMER_TYPE   // user defined
} TENSOR_TYPE;

typedef enum {
    GPU_DATA,       // device memory
    CPU_DATA,       // host memory
} MEMORY_TYPE;

IStreamTensor* createStreamTensor(int maxlen,
                                  size_t elemsize,
                                  TENSOR_TYPE Ttype,
                                  MEMORY_TYPE Mtype,
                                  int deviceID);
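For example, a float tensor backed by device memory on GPU 0 could be created as below (the capacity is an arbitrary illustrative value):

// Float32 tensor in GPU memory on device 0 (capacity chosen arbitrarily).
IStreamTensor* pTensor = createStreamTensor(1 << 20,        // maxlen: maximum element count
                                            sizeof(float),  // elemsize
                                            FLOAT_TENSOR,
                                            GPU_DATA,
                                            0);             // deviceID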

Page 20

DeepStream from the Ground Up

Modules

DeepStream modules are objects that implement the IModule interface and operate on one or more input tensors to produce one or more output tensors.

// Pure virtual method of the IModule interface:
virtual void execute(const ModuleContext& ctx,
                     const std::vector<IStreamTensor*>& in,
                     const std::vector<IStreamTensor*>& out) = 0;

[Diagram: a DeepStream module consumes input tensors input[0] through input[M-1] and produces output tensors output[0] through output[N-1]]

Page 21

DeepStream from the Ground Up

Building the DeepStream dataflow graph

DeepStream modules maintain connectivity information using a std::vector of <IModule*, int> pairs that represent the inputs to the module.

[Diagram: Module C's input[0] connects to Module A's output[1] and its input[1] connects to Module B's output[0]; C's inputs vector therefore holds the pairs (&A, 1) and (&B, 0)]
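Expressed in code, Module C's connectivity from the figure is just such a vector, built the same way as the inf_inputs and display_inputs vectors that appear later in the walkthrough (pA and pB are the IModule* handles for Modules A and B):

std::vector<std::pair<IModule*, int>> c_inputs;
c_inputs.push_back(std::make_pair(pA, 1));  // C.input[0] <- Module A, output index 1
c_inputs.push_back(std::make_pair(pB, 0));  // C.input[1] <- Module B, output index 0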

Page 22

Using the DeepStream Library

// Create a device worker
IDeviceWorker* pDW = createDeviceWorker(1,  // num channels
                                        0); // GPU device ID

Step 1: Create a device worker object


Page 23

Using the DeepStream Library

// Create a device worker
IDeviceWorker* pDW = createDeviceWorker(1, 0);

// Add a decode task, specifying the CODEC type
pDW->addDecodeTask(cudaVideoCodec_H264);

Step 2: Add a decode task


[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image)]

Page 24

Using the DeepStream Library

// Create a device worker
IDeviceWorker* pDW = createDeviceWorker(1, 0);

// Add a decode task, specifying the CODEC type
pDW->addDecodeTask(cudaVideoCodec_H264);

// Add a module for color conversion.
// Color conversion module outputs:
//   0: BGR_PLANAR
//   1: NV12 (YCbCr)
IModule* pCC = pDW->addColorSpaceConvertorTask(BGR_PLANAR);

Step 3: Add an image conversion module


[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image]

Caffe uses BGR ordering (not RGB)

Page 25

Inference in DeepStream: CAFFE and TensorRT

[Diagram: Caffe training takes solver parameters (solver.prototxt) and a training network description (network.prototxt); running "caffe train --solver solver.prototxt" produces trained weights (trained.caffemodel, BINARYPROTO). TensorRT then combines the trained weights with a deploy network description (deploy.prototxt) to build an in-memory (serializable) CUDA engine.]

Page 26

Using the DeepStream Library

// Add a module for color conversion.
// Color conversion module outputs:
//   0: BGR_PLANAR
//   1: NV12 (YCbCr)
IModule* pCC = pDW->addColorSpaceConvertorTask(BGR_PLANAR);

// Add an inference module using output 0 from color conversion
std::vector<std::pair<IModule*,int>> inf_inputs{std::make_pair(pCC, 0)};
IModule* pInf = pDW->addInferenceTask(inf_inputs,
                                      prototxt_file.c_str(),
                                      model_file.c_str(),
                                      input_layer_name,
                                      output_layer_names_vector,
                                      num_channels,
                                      &inference_params);

Step 4: Add an inference module


[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → inference results (cat 0.9, dog 0.05, tree 0.05)]

Page 27

Using the DeepStream Library

// Add a parser module using output 0 from inference
std::vector<std::pair<IModule*,int>> parser_inputs;
parser_inputs.push_back(std::make_pair(pInf, 0));
IModule* pParse = new UserInferenceParser(parser_inputs);
pDW->addCustomerTask(pParse);

// Add a display module using image and parser outputs
std::vector<std::pair<IModule*,int>> display_inputs;
display_inputs.push_back(std::make_pair(pCC, 0));
display_inputs.push_back(std::make_pair(pParse, 0));
IModule* pDisp = new UserDisplayModule(display_inputs);
pDW->addCustomerTask(pDisp);

Step 5: Add custom output processor(s)


[Diagram: compressed video bitstream → Video Decode → decoded video frame (YCbCr image) → Image Conversion → BGR image → Inference → Parser → inference results → Display / Storage]
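UserInferenceParser and UserDisplayModule are user-supplied IModule implementations. For a classification network, the heart of the parser's execute() is just an arg-max over the softmax scores produced by the inference module; a standalone sketch of that step is below, using plain arrays rather than the IStreamTensor accessors.

#include <cstddef>

// Return the index of the highest-scoring class in a softmax output vector.
int argmaxClass(const float* probs, std::size_t numClasses)
{
    std::size_t best = 0;
    for (std::size_t c = 1; c < numClasses; ++c)
        if (probs[c] > probs[best]) best = c;
    return static_cast<int>(best);
}

// Using the example scores from the earlier figure:
//   float probs[3] = {0.9f, 0.05f, 0.05f};   // cat, dog, tree
//   argmaxClass(probs, 3) == 0               // -> "cat"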

Page 28

Using the DeepStream Library

// Add a display module using image conversion and parser outputs
std::vector<std::pair<IModule*,int>> display_inputs;
display_inputs.push_back(std::make_pair(pCC, 0));    // BGR output
display_inputs.push_back(std::make_pair(pParse, 0));
IModule* pDisp = new UserDisplayModule(display_inputs);
pDW->addCustomerTask(pDisp);

// Start the device worker...
pDW->start();

Step 6: Start the device worker


Page 29

Using the DeepStream Library

// Start the device worker...
pDW->start();

// Initiate data input by pushing data to the device worker
while (1)
{
    // Read data from file/socket/database...
    pDW->pushPacket(data, num_bytes, channel_num);
}

Step 7: Initiate data input
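For a file-based source, the read-and-push loop can be as simple as the sketch below (the argument types for pushPacket are assumed here; RTSP or socket sources would fill data the same way, and the decoder's parser reassembles NAL units from arbitrary chunk boundaries).

#include <cstdio>
#include <cstdint>

// Feed an H.264 elementary stream from disk into one DeepStream channel.
void feedFile(IDeviceWorker* pDW, const char* path, int channel_num)
{
    FILE* fp = std::fopen(path, "rb");
    if (!fp) return;
    uint8_t data[64 * 1024];   // arbitrary chunk size
    size_t num_bytes;
    while ((num_bytes = std::fread(data, 1, sizeof(data), fp)) > 0) {
        pDW->pushPacket(data, num_bytes, channel_num);
    }
    std::fclose(fp);
}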


Page 30

SCALING WITH DEEPSTREAM

Page 31

DECODE PERFORMANCE ON NVIDIA GPUS

Page 32

Scaling Video Analytics With DeepStream: Multiple Channels

[Diagram: on GPU 0, the DeepStream runtime library (libdeepstream) runs four parallel Video Decode / Image Conversion / Inference pipelines, one per channel (channels 0 through 3), feeding a common Display / Storage stage]

Page 33

Faster Inference with INT8 Integers

GPUs with Compute Capability 6.1 and higher support dot product instructions with 8-bit operands. (Examples of GPUs with this capability are Tesla P4 and P40.)
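That instruction is exposed in CUDA C++ as the __dp4a intrinsic (TensorRT uses it automatically for INT8 kernels); a toy kernel illustrating the operation on a compute capability 6.1+ GPU:

// Each int packs four signed 8-bit values; __dp4a computes their dot product
// and adds it to a 32-bit accumulator in a single instruction (sm_61+).
__global__ void dp4aDot(const int* a, const int* b, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __dp4a(a[i], b[i], 0);  // a0*b0 + a1*b1 + a2*b2 + a3*b3
}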

Page 34

Inference with INT8: Accuracy

From: “8-bit Inference with TensorRT,” Szymon Migacz, NVIDIA, GTC 2017.

Page 35

Inference with INT8: Performance

From: “8-bit Inference with TensorRT,” Szymon Migacz, NVIDIA, GTC 2017.

Page 36

Using INT8 Inference in DeepStream

Step 1: Run TensorRT offline to generate a calibration table file

[Diagram: TensorRT takes the trained weights (trained.caffemodel, BINARYPROTO) and the deploy network description (deploy.prototxt); the offline calibration run produces a CUDA engine and a calibration table]
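Outside of DeepStream, this offline step amounts to building the engine with INT8 mode enabled and a calibrator attached. A rough sketch against the TensorRT builder API of that era is below; the calibrator object itself, which streams representative input batches and writes out the calibration cache, is elided.

// Build an INT8 engine; 'network' is the parsed Caffe network from the earlier
// sketch and 'calibrator' implements TensorRT's INT8 calibration interface.
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);
ICudaEngine* engine = builder->buildCudaEngine(*network);
// The cache the calibrator writes out (writeCalibrationCache) is the
// calibration table file that DeepStream's inference_params later points at.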

Page 37

Using INT8 Inference in DeepStream

Step 2: Provide the path to the calibration file when creating the inference task

// Add an inference module using output 0 from color conversion
std::vector<std::pair<IModule*,int>> inf_inputs{std::make_pair(pCC, 0)};
inference_params.calibrationTableFile = calibration_file.c_str();
IModule* pInf = pDW->addInferenceTask(inf_inputs,
                                      prototxt_file.c_str(),
                                      model_file.c_str(),
                                      input_layer_name,
                                      output_layer_names_vector,
                                      num_channels,
                                      &inference_params);

Page 38

Scaling Video Analytics With DeepStream: Multiple GPUs

[Diagram: one DeepStream device worker per GPU, device worker (device ID = 0) on GPU 0 and device worker (device ID = 1) on GPU 1, each running its own per-channel Video Decode / Image Conversion / Inference pipelines and feeding Display / Storage]
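Multi-GPU scaling reuses the same entry point: the second argument of createDeviceWorker selects the GPU, so a two-GPU deployment simply creates one worker per device (channel counts here are illustrative):

// One device worker per GPU, each owning its own channels and pipeline.
IDeviceWorker* pDW0 = createDeviceWorker(2, 0);  // two channels on GPU 0
IDeviceWorker* pDW1 = createDeviceWorker(2, 1);  // two channels on GPU 1
// Decode, color conversion, and inference tasks are added to each worker and
// each worker is started independently, exactly as in Steps 2 through 7 above.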

Page 39

Examples (Videos)

Page 40

DeepStream Resources

• DeepStream SDK Download
  https://developer.nvidia.com/deepstream-sdk
• NVIDIA Parallel Forall Blog: DeepStream: Next Generation Video Analytics for Smart Cities
  https://devblogs.nvidia.com/parallelforall/deepstream-video-analytics-smart-cities/
• NVIDIA Developer Forums
  https://devtalk.nvidia.com/
• Caffe Model Zoo
  https://github.com/BVLC/caffe/wiki/Model-Zoo
• NVIDIA Video Encode and Decode GPU Support Matrix
  https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

Page 41

NVIDIA DEEP LEARNING INSTITUTE

Training available as online self-paced labs and instructor-led workshops

Take self-paced labs at www.nvidia.com/dlilabs

Find or request an instructor-led workshop at www.nvidia.com/dli

Educators can download the Teaching Kit at developer.nvidia.com/teaching-kit and contact [email protected] for info on the University Ambassador Program

Topic areas: Fundamentals, Parallel Computing, Game Development & Digital Content, Finance, Intelligent Video Analytics, Healthcare, Robotics, Autonomous Vehicles, Virtual Reality
