POWERING THE DEEP LEARNING ECOSYSTEM
Transcript of POWERING THE DEEP LEARNING ECOSYSTEM
COMPUTER VISION
OBJECT DETECTION IMAGE CLASSIFICATION
SPEECH & AUDIO
VOICE RECOGNITION LANGUAGE TRANSLATION
NATURAL LANGUAGE PROCESSING
RECOMMENDATION ENGINES
SENTIMENT ANALYSIS
DEEP LEARNING FRAMEWORKS
Mocha.jl
NVIDIA DEEP LEARNING SDK
developer.nvidia.com/deep-learning-software
INFERENCE WITH TENSORRT
❑ Introduction of TensorRT
❑ TensorRT 7: What’s New
❑ TensorRT workflow
❑ TensorRT Plugin
❑ Plugin Sample
TENSORRT: GPU INFERENCE ENGINE
A COMPLETE DL PLATFORM
[Diagram: MANAGE / AUGMENT (DIGITS) → TRAIN / TEST → DEPLOY (GPU Inference Engine in datacenter, automotive, and embedded)]
TensorRT works at the deploy stage.
Why TensorRT?
1.6L Engine
ONNX: Added ConstantOfShape, DequantizeLinear, Equal, Erf, Expand, Greater, GRU,
Less, Loop, LRN, LSTM, Not, PRelu, QuantizeLinear, RandomUniform,
RandomUniformLike, Range, RNN, Scan, Sqrt, Tile, and Where
[Diagram: NEURAL NETWORK → OPTIMIZATION ENGINE → PLAN → EXECUTION ENGINE]
Input: pre-trained FP32 model and network
Output: optimized execution engine on GPU for deployment
A serialized PLAN can be reloaded from disk into the TensorRT runtime; there is no need to perform the optimization step again.
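The build-once/reload pattern described above can be sketched in plain Python (a conceptual stand-in: `build_engine`, the plan bytes, and the file name are hypothetical, not TensorRT API):

```python
import os

def build_engine():
    # Stand-in for the expensive TensorRT optimization step.
    return b"optimized-engine-bytes"

def load_or_build(path="model.plan"):
    # If a serialized PLAN already exists on disk, reload it;
    # otherwise run the optimization once and cache the result.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    plan = build_engine()
    with open(path, "wb") as f:
        f.write(plan)
    return plan
```

Subsequent runs then skip the optimization step entirely and pay only the cost of reading the plan file.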
[Diagram: un-optimized Inception module: 1x1, 3x3, and 5x5 convolutions, each followed by separate bias and ReLU layers, plus a 1x1 conv after a max pool, all feeding a concat into the next input]
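The vertical fusion shown here (conv + bias + ReLU collapsed into one CBR node) can be illustrated with a 1x1 convolution at a single pixel in plain Python; the point is that the fused form computes the same values in one pass instead of three separate layers (all names are illustrative, not TensorRT internals):

```python
def relu(v):
    return max(v, 0.0)

def conv1x1(x, w):
    # 1x1 convolution at one pixel: a matrix-vector product over channels.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def add_bias(y, b):
    return [yi + bi for yi, bi in zip(y, b)]

def unfused(x, w, b):
    # Three separate layers (three kernel launches): conv, bias, ReLU.
    return [relu(v) for v in add_bias(conv1x1(x, w), b)]

def cbr(x, w, b):
    # Fused Conv+Bias+ReLU: one pass over the data.
    return [relu(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]
```

On a GPU the benefit is fewer kernel launches and fewer round trips through memory, not a change in the math.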
[Diagram: after vertical layer fusion, each conv + bias + ReLU sequence is collapsed into a single CBR node (1x1 CBR, 3x3 CBR, 5x5 CBR), still feeding the concat]
[Diagram: after horizontal layer fusion, the 1x1 CBRs that read the same input are merged into one wider 1x1 CBR]
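Horizontal fusion can be sketched the same way: two 1x1 convolutions that read the same input are equivalent to a single convolution whose filter bank is the two banks stacked, with the output split afterwards (illustrative Python, not TensorRT internals):

```python
def conv1x1(x, w):
    # 1x1 convolution at one pixel over the channel dimension.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def separate(x, w1, w2):
    # Two kernel launches, each reading the same input.
    return conv1x1(x, w1), conv1x1(x, w2)

def fused(x, w1, w2):
    # One launch with the filter banks stacked; split the result.
    y = conv1x1(x, w1 + w2)
    return y[:len(w1)], y[len(w1):]
```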
Concat elision
[Diagram: the concat layer removed; each branch (1x1 CBRs, 3x3 CBR, 5x5 CBR, max pool) writes directly into the next layer's input buffer]
Concurrency
[Diagram: independent branches (3x3 CBR, 5x5 CBR, 1x1 CBRs, max pool) launched concurrently]
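The concurrency step can be sketched with a thread pool standing in for CUDA streams: branches with no data dependency on each other produce the same results whether run sequentially or in parallel (illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor

def run_sequential(branches, x):
    return [f(x) for f in branches]

def run_concurrent(branches, x):
    # Independent branches have no data dependency, so they can run
    # in parallel (on the GPU: separate CUDA streams; here: threads).
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, x) for f in branches]
        return [fut.result() for fut in futures]
```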
main()
Build engine:
• create parser
• set workspace size
• set data type
• serialize
• free
PluginFactory:
• createPlugin
• deserialize plugin implementation
• isPlugin
• free
MyPlugin:
• MyPlugin() / ~MyPlugin()
• getNbOutputs()
• getOutputDimensions()
• initialize()
• terminate()
• enqueue() → CUDA kernel function
• serialize() / deserialize()
do_inference:
• bind the buffers
• create GPU buffers and a stream
• transfer data
• enqueue
• release the stream and the buffers
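The do_inference steps above can be mocked in plain Python to show the order of operations; the copies and the stream are simulated, and `engine` is just a callable (nothing here is real TensorRT API):

```python
def do_inference(engine, host_input):
    # 1. Create "device" buffers and a stream (mocked as a list and a log).
    stream_log = []
    d_input = list(host_input)          # host-to-device transfer
    stream_log.append("H2D")
    # 2. Enqueue: run the engine on the stream.
    d_output = engine(d_input)
    stream_log.append("enqueue")
    # 3. Copy the result back; buffers and stream are then released.
    host_output = list(d_output)
    stream_log.append("D2H")
    return host_output, stream_log
```

In the real API these steps map to async memcpys and an `enqueue` on a CUDA stream, with a synchronize before the output is read.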
IPluginV2Ext
IPluginV2IOExt
IPluginV2DynamicExt
IPluginCreator
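The plugin lifecycle behind these interfaces can be mocked as a small Python class showing the call order: shape queries at build time, then initialize, then enqueue per batch, then terminate (method names mirror the C++ API loosely; the class, the toy elementwise op, and the call log are illustrative):

```python
class MyPlugin:
    # Conceptual mock of a TensorRT plugin lifecycle, not the real C++ API.
    def __init__(self):
        self.calls = []

    def get_nb_outputs(self):
        self.calls.append("getNbOutputs")       # queried at build time
        return 1

    def get_output_dimensions(self, input_dims):
        self.calls.append("getOutputDimensions")
        return input_dims                        # an elementwise op keeps the shape

    def initialize(self):
        self.calls.append("initialize")          # allocate per-engine resources

    def enqueue(self, inputs):
        self.calls.append("enqueue")             # would launch the CUDA kernel
        return [v * v for v in inputs]           # toy elementwise square

    def terminate(self):
        self.calls.append("terminate")           # free resources
```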
MyPlugin enqueue() → CUDA kernel
https://developer.nvidia-china.com
THANK YOU