April 4-7, 2016 | Silicon Valley
Michael Andersch, 7th April 2016
NVIDIA GIE: HIGH-PERFORMANCE GPU INFERENCE ENGINE
WHAT IS INFERENCE, ANYWAYS?
Building a deep neural network based application
Step 1: Use data to train the neural network - training
Step 2: Use the neural network to process unseen data - inference
INFERENCE VS TRAINING
How is inference different from training?
1. No backpropagation / static weights
enables graph optimizations, simplifies memory management
2. Tendency towards smaller batch sizes
harder to amortize weight loading, achieve high GPU utilization
3. Reduced precision requirements
provides opportunity for BW savings and accelerated arithmetic
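To make point 3 concrete, here is a minimal, illustrative sketch (not NVIDIA's implementation) of reduced-precision inference arithmetic: FP32 values are mapped to 8-bit integers with a per-tensor scale, the dot product is accumulated in a wide integer, and the result is dequantized at the end. The scale values are assumptions chosen for this toy example.

```python
# Illustrative sketch of symmetric linear quantization to int8, with the
# dot product accumulated in a wide (32-bit) integer before dequantizing.

def quantize(values, scale):
    """Map floats to the int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def int8_dot(a_fp32, b_fp32, a_scale, b_scale):
    """Dot product computed on quantized operands, dequantized at the end."""
    a_q = quantize(a_fp32, a_scale)
    b_q = quantize(b_fp32, b_scale)
    acc = sum(x * y for x, y in zip(a_q, b_q))  # fits in a 32-bit accumulator
    return acc * a_scale * b_scale

a = [0.5, -1.0, 0.25, 2.0]
b = [1.0, 0.5, -0.5, 0.125]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b, a_scale=2.0 / 127, b_scale=1.0 / 127)
```

The quantized result closely tracks the FP32 result while moving one quarter of the bytes, which is where the bandwidth savings come from.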
OPTIMIZING SOFTWARE FOR INFERENCE
Extracting every bit of performance
What’s running on the GPU: cuDNN optimizations
Support for standard tensor layouts and major frameworks
Available automatically and “for free”
How you use it: Framework optimizations
Every last bit of performance matters
Challenging due to framework structure
Changes to one framework don’t propagate to others
OPTIMIZING SOFTWARE FOR INFERENCE
Challenge: Efficient small batch convolutions
Optimal convolution algorithm depends on convolution layer dimensions
Meta-parameters (data layouts, texture memory) afford higher performance
Using texture memory for convolutions: 13% inference speedup
(GoogLeNet, batch size 1)
[Figure: Winograd speedup over GEMM-based convolution for VGG-E layers conv 1.1 through conv 5.0 at N=1 — speedups range from 0.73x to 2.26x, with most layers around 1.8–2.3x]
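Since the optimal algorithm depends on the layer dimensions, a library has to pick per layer. Real frameworks query cuDNN for this (e.g. via its algorithm-selection API) or benchmark candidates; the heuristic below is purely hypothetical, just to illustrate the kind of dimension-driven choice being made.

```python
# Hypothetical dimension-driven chooser (for illustration only; real code
# queries cuDNN or benchmarks each candidate algorithm on the actual layer).
def choose_conv_algorithm(filter_size, batch, channels):
    """Pick a convolution algorithm from coarse layer dimensions."""
    if filter_size == 3 and batch <= 4:
        return "winograd"      # 3x3 filters at small batch favor Winograd
    if filter_size >= 7:
        return "fft"           # large filters amortize FFT overhead
    if batch == 1 and channels < 64:
        return "direct"        # tiny problems: avoid im2col overhead
    return "implicit_gemm"     # robust default

algo = choose_conv_algorithm(filter_size=3, batch=1, channels=256)
```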
OPTIMIZING SOFTWARE FOR INFERENCE
Challenge: Graph optimization

[Figure: GoogLeNet-style inception module — the input tensor feeds four parallel branches (a 1x1 conv.; a 1x1 conv. followed by a 3x3 conv.; a 1x1 conv. followed by a 5x5 conv.; a max pool followed by a 1x1 conv.), whose outputs are concatenated into the output tensor]
OPTIMIZING SOFTWARE FOR INFERENCE
Challenge: Graph optimization

[Figure: the same inception module with every convolution expanded into separate convolution, bias, and ReLU operations — a long chain of many small kernels between the input and the next layer's input]
OPTIMIZING SOFTWARE FOR INFERENCE
Graph optimization: Vertical fusion

[Figure: each convolution + bias + ReLU chain is fused into a single "CBR" kernel — the module becomes 1x1 CBR, 3x3 CBR, 5x5 CBR, and 1x1 CBR branches (plus 1x1 CBR reductions and the max pool) feeding the concat]
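Vertical fusion can be sketched on a toy op list (an assumed representation, not GIE's internal graph): consecutive convolution → bias → ReLU triples collapse into one fused "CBR" node, saving two kernel launches and two round trips to memory per triple.

```python
# Sketch of vertical fusion: collapse each conv -> bias -> relu chain
# into a single fused "CBR" node.
def fuse_vertical(ops):
    fused, i = [], 0
    while i < len(ops):
        if (i + 2 < len(ops) and ops[i].endswith("conv")
                and ops[i + 1] == "bias" and ops[i + 2] == "relu"):
            fused.append(ops[i].replace("conv", "CBR"))  # one fused kernel
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused

branch = ["1x1 conv", "bias", "relu", "3x3 conv", "bias", "relu"]
fused = fuse_vertical(branch)  # six ops become two
```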
OPTIMIZING SOFTWARE FOR INFERENCE
Graph optimization: Horizontal fusion

[Figure: the 1x1 CBR layers that read the same input tensor are merged into a single wider 1x1 CBR layer; the remaining branches (3x3 CBR, 5x5 CBR, and the 1x1 CBR after the max pool) are unchanged]
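Horizontal fusion can be sketched the same way (again an assumed toy graph format, not GIE's): layers that share the same source tensor and the same configuration merge into one wider layer, so the shared input is read once instead of once per branch.

```python
# Sketch of horizontal fusion: layers are (name, config, source) tuples.
# Layers with identical (config, source) merge into one wider layer.
def fuse_horizontal(layers):
    groups = {}
    for name, config, src in layers:
        groups.setdefault((config, src), []).append(name)
    return [("+".join(names), config, src)
            for (config, src), names in groups.items()]

layers = [("a", "1x1 CBR", "input"),
          ("b", "3x3 CBR", "input"),
          ("c", "1x1 CBR", "input")]
merged = fuse_horizontal(layers)  # "a" and "c" become one wider layer
```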
OPTIMIZING SOFTWARE FOR INFERENCE
Graph optimization: Concat elision

[Figure: the concat layers are removed — each branch writes its output directly into the correct region of the next layer's input buffer]
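The idea behind concat elision, sketched with an illustrative memory layout (not GIE's): rather than computing each branch into its own buffer and then copying everything into a concatenated tensor, each branch writes directly at its offset into one pre-allocated output buffer, and the copy disappears.

```python
# Sketch of concat elision: branches write in place at fixed offsets of a
# single pre-allocated buffer, so no separate concat copy is needed.
def run_branches_into(output, branches):
    """branches: list of (offset, values) pairs written in place."""
    for offset, values in branches:
        output[offset:offset + len(values)] = values
    return output

out = [0.0] * 6
run_branches_into(out, [(0, [1.0, 2.0]),      # e.g. 1x1 CBR output
                        (2, [3.0]),           # e.g. 3x3 CBR output
                        (3, [4.0, 5.0, 6.0])])  # e.g. 5x5 CBR output
```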
OPTIMIZING SOFTWARE FOR INFERENCE
Graph optimization: Concurrency

[Figure: the remaining branches have no data dependencies on one another and can execute concurrently]
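Because the surviving branches have no data dependencies, they can run concurrently — on the GPU this would use separate CUDA streams; the sketch below uses Python threads purely to illustrate the dependency structure, and the branch function is a stand-in.

```python
# Independent branches of the graph can run concurrently (CUDA streams on
# the GPU; plain threads here, just to show the dependency structure).
from concurrent.futures import ThreadPoolExecutor

def branch(name, x):
    return (name, x * 2)  # stand-in for one CBR branch

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(branch, n, 10) for n in ("1x1", "3x3", "5x5")]
    results = dict(f.result() for f in futures)
```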
OPTIMIZING SOFTWARE FOR INFERENCE
Challenge: Effective use of cuBLAS intrinsics
Run GEMV instead of GEMM
Small batch sizes shrink the N dimension: the B (activation) matrix becomes narrow
Pre-transpose weight matrices
Allows using NN/NT GEMM variants, where performance ranks NT > NN > TN
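Why GEMV applies: at batch size 1 the activation "matrix" has a single column, so a fully connected layer y = W·x degenerates from a matrix-matrix product into a matrix-vector product, and a GEMV kernel avoids the wasted tiling of a full GEMM. A pure-Python sketch showing the two agree in that case:

```python
# At batch size 1 the activation matrix has one column, so the fully
# connected layer y = W @ x is really a matrix-vector product (GEMV).
def gemv(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def gemm(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

W = [[1, 2], [3, 4], [5, 6]]   # 3x2 weight matrix
x = [10, 1]                    # single activation vector (batch = 1)
as_gemm = gemm(W, [[v] for v in x])   # B is a narrow 2x1 matrix
as_gemv = gemv(W, x)
```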
ACCELERATED INFERENCE ON PASCAL
Support for fast mixed precision arithmetic
Inference products will support a new dedicated vector math instruction
Multi-element dot product, 8-bit integer inputs, 32-bit accumulator
4x the rate of equivalent FP32 operations
Full-speed FP32 processing for any layers that require higher precision
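The instruction described above corresponds to Pascal's 4-way 8-bit dot product (DP4A): four int8 × int8 products summed into a 32-bit accumulator in a single operation. Emulated here in Python to show the semantics; the real instruction runs in hardware.

```python
# Emulation of a 4-way 8-bit dot-product instruction: four int8 products
# summed into a 32-bit accumulator in one operation.
def dp4a(a4, b4, acc):
    """acc += dot(a4, b4), with int8 inputs and a wide accumulator."""
    assert all(-128 <= v <= 127 for v in a4 + b4)
    return acc + sum(x * y for x, y in zip(a4, b4))

# A 12-element int8 dot product issues as three dp4a operations.
a = [1, -2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
b = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
acc = 0
for i in range(0, len(a), 4):
    acc = dp4a(a[i:i + 4], b[i:i + 4], acc)
```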
BUT WHO WILL IMPLEMENT IT?
Introducing NVIDIA GIE: GPU Inference Engine
[Diagram: GIE components — OPTIMIZATION ENGINE, STRATEGY, EXECUTION ENGINE]
GPU INFERENCE ENGINE WORKFLOW
[Diagram: a trained network from DIGITS or other training tools enters the OPTIMIZATION ENGINE, which emits a STRATEGY that the EXECUTION ENGINE runs at deployment time]
SUMMARY
Inference on the GPU
GPUs are a great platform for inference
Efficiency: great performance/watt
Scalability: from 3W to 300W
GPU-based inference affords …
… the same performance in a much tighter power envelope
… while freeing up the CPU to do other work
Questions: [email protected], or find me after the talk!
Tesla M4 Hyperscale Accelerator
THANK YOU
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join