POWERING THE DEEP LEARNING ECOSYSTEM
Transcript of POWERING THE DEEP LEARNING ECOSYSTEM
COMPUTER VISION
OBJECT DETECTION IMAGE CLASSIFICATION
SPEECH & AUDIO
VOICE RECOGNITION LANGUAGE TRANSLATION
NATURAL LANGUAGE PROCESSING
RECOMMENDATION ENGINES
SENTIMENT ANALYSIS
DEEP LEARNING FRAMEWORKS
Mocha.jl
NVIDIA DEEP LEARNING SDK
developer.nvidia.com/deep-learning-software
INFERENCE WITH TENSORRT
❑ Introduction of TensorRT
❑ TensorRT 7: What’s New
❑ TensorRT workflow
❑ TensorRT Plugin
❑ Plugin Sample
TENSORRT: GPU INFERENCE ENGINE
A COMPLETE DL PLATFORM
[Diagram: MANAGE / AUGMENT (DIGITS) → TRAIN / TEST → DEPLOY (GPU Inference Engine in datacenter, automotive, and embedded)]
TensorRT works at the deploy stage.
Why TensorRT?
1.6L Engine
ONNX: Added ConstantOfShape, DequantizeLinear, Equal, Erf, Expand, Greater, GRU,
Less, Loop, LRN, LSTM, Not, PRelu, QuantizeLinear, RandomUniform,
RandomUniformLike, Range, RNN, Scan, Sqrt, Tile, and Where
[Diagram: NEURAL NETWORK → OPTIMIZATION ENGINE → PLAN → EXECUTION ENGINE]
Input: pre-trained FP32 model and network
Output: optimized execution engine on GPU for deployment
A serialized PLAN can be reloaded from disk into the TensorRT runtime; there is no need to perform the optimization step again.
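The build-once/reload pattern described above can be sketched in plain Python (a conceptual stand-in: `build_engine`, the plan bytes, and the file name are hypothetical, not TensorRT API):

```python
import os

def build_engine():
    # Stand-in for the expensive TensorRT optimization step.
    return b"optimized-engine-bytes"

def load_or_build(path="model.plan"):
    # If a serialized PLAN already exists on disk, reload it;
    # otherwise run the optimization once and cache the result.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    plan = build_engine()
    with open(path, "wb") as f:
        f.write(plan)
    return plan
```

Subsequent runs then skip the optimization step entirely and pay only the cost of reading the plan file.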
[Diagram: un-optimized Inception module: 1x1, 3x3, and 5x5 convolutions, each followed by separate bias and ReLU layers, plus a 1x1 conv after a max pool, all feeding a concat into the next input]
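The vertical fusion shown here (conv + bias + ReLU collapsed into one CBR node) can be illustrated with a 1x1 convolution at a single pixel in plain Python; the point is that the fused form computes the same values in one pass instead of three separate layers (all names are illustrative, not TensorRT internals):

```python
def relu(v):
    return max(v, 0.0)

def conv1x1(x, w):
    # 1x1 convolution at one pixel: a matrix-vector product over channels.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def add_bias(y, b):
    return [yi + bi for yi, bi in zip(y, b)]

def unfused(x, w, b):
    # Three separate layers (three kernel launches): conv, bias, ReLU.
    return [relu(v) for v in add_bias(conv1x1(x, w), b)]

def cbr(x, w, b):
    # Fused Conv+Bias+ReLU: one pass over the data.
    return [relu(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]
```

On a GPU the benefit is fewer kernel launches and fewer round trips through memory, not a change in the math.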
[Diagram: after vertical layer fusion, each conv + bias + ReLU sequence is collapsed into a single CBR node (1x1 CBR, 3x3 CBR, 5x5 CBR), still feeding the concat]
[Diagram: after horizontal layer fusion, the 1x1 CBRs that read the same input are merged into one wider 1x1 CBR]
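Horizontal fusion can be sketched the same way: two 1x1 convolutions that read the same input are equivalent to a single convolution whose filter bank is the two banks stacked, with the output split afterwards (illustrative Python, not TensorRT internals):

```python
def conv1x1(x, w):
    # 1x1 convolution at one pixel over the channel dimension.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def separate(x, w1, w2):
    # Two kernel launches, each reading the same input.
    return conv1x1(x, w1), conv1x1(x, w2)

def fused(x, w1, w2):
    # One launch with the filter banks stacked; split the result.
    y = conv1x1(x, w1 + w2)
    return y[:len(w1)], y[len(w1):]
```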
Concat elision
[Diagram: the concat layer removed; each branch (1x1 CBRs, 3x3 CBR, 5x5 CBR, max pool) writes directly into the next layer's input buffer]
Concurrency
[Diagram: independent branches (3x3 CBR, 5x5 CBR, 1x1 CBRs, max pool) launched concurrently]
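The concurrency step can be sketched with a thread pool standing in for CUDA streams: branches with no data dependency on each other produce the same results whether run sequentially or in parallel (illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor

def run_sequential(branches, x):
    return [f(x) for f in branches]

def run_concurrent(branches, x):
    # Independent branches have no data dependency, so they can run
    # in parallel (on the GPU: separate CUDA streams; here: threads).
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, x) for f in branches]
        return [fut.result() for fut in futures]
```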
main()
Build engine:
• create parser
• set workspace size
• set data type
• serialize
• free
PluginFactory:
• createPlugin
• deserialize plugin implementation
• isPlugin
• free
MyPlugin:
• MyPlugin() / ~MyPlugin()
• getNbOutputs()
• getOutputDimensions()
• initialize()
• terminate()
• enqueue() → CUDA kernel function
• serialize() / deserialize()
do_inference:
• bind the buffers
• create GPU buffers and a stream
• transfer data
• enqueue
• release the stream and the buffers
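The do_inference steps above can be mocked in plain Python to show the order of operations; the copies and the stream are simulated, and `engine` is just a callable (nothing here is real TensorRT API):

```python
def do_inference(engine, host_input):
    # 1. Create "device" buffers and a stream (mocked as a list and a log).
    stream_log = []
    d_input = list(host_input)          # host-to-device transfer
    stream_log.append("H2D")
    # 2. Enqueue: run the engine on the stream.
    d_output = engine(d_input)
    stream_log.append("enqueue")
    # 3. Copy the result back; buffers and stream are then released.
    host_output = list(d_output)
    stream_log.append("D2H")
    return host_output, stream_log
```

In the real API these steps map to async memcpys and an `enqueue` on a CUDA stream, with a synchronize before the output is read.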
IPluginV2Ext
IPluginV2IOExt
IPluginV2DynamicExt
IPluginCreator
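The plugin lifecycle behind these interfaces can be mocked as a small Python class showing the call order: shape queries at build time, then initialize, then enqueue per batch, then terminate (method names mirror the C++ API loosely; the class, the toy elementwise op, and the call log are illustrative):

```python
class MyPlugin:
    # Conceptual mock of a TensorRT plugin lifecycle, not the real C++ API.
    def __init__(self):
        self.calls = []

    def get_nb_outputs(self):
        self.calls.append("getNbOutputs")       # queried at build time
        return 1

    def get_output_dimensions(self, input_dims):
        self.calls.append("getOutputDimensions")
        return input_dims                        # an elementwise op keeps the shape

    def initialize(self):
        self.calls.append("initialize")          # allocate per-engine resources

    def enqueue(self, inputs):
        self.calls.append("enqueue")             # would launch the CUDA kernel
        return [v * v for v in inputs]           # toy elementwise square

    def terminate(self):
        self.calls.append("terminate")           # free resources
```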
MyPlugin enqueue() → CUDA kernel
https://developer.nvidia-china.com
THANK YOU