E8270 – INSIDE NVIDIA GPU CLOUD CONTAINERS

Source: on-demand.gputechconf.com/gtc-eu/2018/pdf/e8270...


Chris Kawalek, NVIDIA GPU Cloud Product Team, NVIDIA

Michael O’Connor, Optimized Deep Learning Frameworks, NVIDIA


AGENDA

The Difficulty With Complex Software

Running In Different Environments

Why Containers

Diving Into NGC Deep Learning Containers

Q&A


CHALLENGES WITH COMPLEX SOFTWARE

Current DIY GPU-accelerated AI and HPC deployments are complex and time-consuming to build, test, and maintain

Software frameworks developed by the community are moving very fast

Managing driver, library, and framework dependencies requires a high level of expertise

Figure: GPU software stack, from application down to hardware: Applications or Frameworks, NVIDIA Libraries, NVIDIA Container Runtime for Docker, NVIDIA Driver, NVIDIA GPU


NVIDIA GPU CLOUD (NGC)

Over 35 GPU-Accelerated Containers: deep learning, HPC applications, HPC visualization tools, and partner applications

Innovate in Minutes, Not Weeks: pre-configured, ready-to-run

Run Anywhere: NVIDIA GPUs on the top cloud providers, NVIDIA DGX Systems, and PCs and workstations

Simple Access to GPU-Accelerated Software


A CONSISTENT, HYBRID CLOUD EXPERIENCE ACROSS COMPUTE PLATFORMS



WORK AT SCALE ON AI SUPERCOMPUTERS

NGC Containers Run on NVIDIA DGX Systems


DEVELOP ON NVIDIA TITAN & NVIDIA QUADRO

NGC Containers Run on PCs and Workstations with Select NVIDIA GPUs


WHY CONTAINERS?

Benefits of Containers:

Simplify deployment of GPU-accelerated software, eliminating time-consuming software integration work

Isolate individual deep learning frameworks and HPC applications

Share, collaborate, and test applications across different environments


CONTINUAL EXPANSION

CONTAINERS: 10 in October 2017, 36 in October 2018

Deep Learning: caffe, caffe2, cntk, cuda, digits, mxnet, pytorch, tensorflow, tensorrt, tensorrtserver, theano, torch

HPC: bigdft, candle, chroma, gamess, gromacs, lammps, lattice-microbes, MILC, namd, pgi, picongpu, relion, vmd

HPC Visualization: index, paraview-holodeck, paraview-index, paraview-optix

Partners: chainer, h2oai-driverless, kinetica, mapd, matlab, paddlepaddle

NVIDIA/K8s: Kubernetes on NVIDIA GPUs

Callout: 10 NEW CONTAINERS


USING NGC CONTAINERS

Data Scientists and Researchers: eliminate setup time, focus on science and research

Developers: work with the latest software with a known good starting point

Sysadmins: deploy to production immediately


VIRTUAL MACHINES VS. CONTAINERS

Motivation

Containers are a packaging and deployment mechanism for applications:

▶ Consistent and reproducible deployment

▶ Lightweight and lower overhead than VMs

▶ Logical isolation from other applications


EXAMPLE NGC CONTAINER WORKFLOW

NVIDIA builds an application image composed of layers of files

The image is tested and released to an NGC repository hosted at a URL such as nvcr.io/nvidia/tensorflow

A user pulls the image to a machine and runs it

The image is cached locally, and an OS-isolated set of resources (the container) is allocated in which to execute it

Data and results are accessed as a filesystem volume

$ docker run nvcr.io/…


ANATOMY OF AN NGC CONTAINER IMAGE

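Read-only image layers (layer IDs shown on the slide: f2233041f557, 145c1bf7947a, 0c395732af81, fb91e851e672), containing:

ubuntu:16.04 base image

NVIDIA Deep Learning SDK

NVIDIA CUDA SDK

DL Framework & Source

Examples & Scripts

R/W layer: a writable layer added on top of the read-only image layers when a container is started from the image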
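The layering can be illustrated with a toy model of a union (copy-on-write) filesystem. This is a minimal Python sketch under simplified assumptions, not how Docker's storage drivers are implemented: read-only image layers are stacked and shared, each container gets its own writable layer, reads search from the top layer down, and writes and deletions touch only the writable layer. The class name `LayeredFS` and the file paths and contents are hypothetical.

```python
from typing import Dict, List, Optional


class LayeredFS:
    """Toy union filesystem: shared read-only image layers plus one R/W layer."""

    def __init__(self, image_layers: List[Dict[str, str]]) -> None:
        # image_layers[0] is the base (e.g. ubuntu:16.04); later entries stack on top.
        # The list is shared between containers and never mutated here.
        self._ro_layers = image_layers
        # Per-container writable layer; None marks a deleted file (a "whiteout").
        self._rw_layer: Dict[str, Optional[str]] = {}

    def read(self, path: str) -> Optional[str]:
        # The writable layer shadows every image layer beneath it.
        if path in self._rw_layer:
            return self._rw_layer[path]
        # Otherwise search the read-only layers from the topmost down to the base.
        for layer in reversed(self._ro_layers):
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str, content: str) -> None:
        # Copy-on-write: changes go to the R/W layer; image layers stay untouched.
        self._rw_layer[path] = content

    def delete(self, path: str) -> None:
        # Hide the file from lower layers without modifying them.
        self._rw_layer[path] = None


# Hypothetical image contents, loosely named after the slide's layers.
image = [
    {"/etc/os-release": "Ubuntu 16.04"},  # base OS layer
    {"/usr/local/cuda/version.txt": "CUDA 10.0.130",
     "/workspace/README.md": "CUDA samples"},  # SDK layer
    {"/workspace/README.md": "DL examples & scripts"},  # examples layer shadows it
]
```

Two `LayeredFS` objects built from the same `image` share its read-only layers but keep separate writable layers. Likewise, changes made inside one container never alter the image, and they are discarded along with the container's writable layer when the container is removed (for example with `docker run --rm`).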


ALWAYS UP-TO-DATE

Monthly Releases from NVIDIA

Component | 18.09 | 18.08
--- | --- | ---
Supported platform: DGX OS | 4.0.1 and 3.1.2+ | 3.1.2+ and 2.1.1+
NVIDIA Driver | 410 and 384 | 384
Base image: Ubuntu | 16.04 | 16.04
CUDA | 10.0.130 | 9.0.176
cuBLAS | 10.0.130 | 9.0.425 (aka Patch 4)
cuDNN | 7.3.0 | 7.2.1
NCCL | 2.3.4 | 2.2.1
NVIDIA Optimized Frameworks: | |
NVCaffe | 0.17.1 for Python 3.5 | 0.17.1 for Python 2.7
DIGITS | 6.1.1 | 6.1.1
MXNet | 1.3 for Python 3.5 | 1.2.0+ for Python 2.7 and Python 3.5
PyTorch | 0.4.1++ for Python 3.6 | 0.4.1+ for Python 3.6
TensorFlow | 1.10 for Python 2.7 and Python 3.5 (TensorRT 5.0.0) | 1.9.0 for Python 2.7 and Python 3.5 (TensorRT 4.0.1)
TensorRT | 5.0.0 | 4.0.1
TensorRT Server | 0.6 | 0.5
TensorFlow for Jetson | 1.10 on JetPack 4.1 for Xavier | 1.9.0 on JetPack 3.2 for TX2
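Picking a monthly release usually starts from the host's driver. The sketch below is a minimal Python lookup over the "NVIDIA Driver" row of the table, assuming only the two releases shown and matching on the driver's major branch. The helper names are hypothetical, and branches not listed in the table (for example 396) simply match nothing.

```python
from typing import Dict, List

# "NVIDIA Driver" row of the release table: driver branches listed per release.
DRIVER_BRANCHES: Dict[str, List[int]] = {
    "18.09": [410, 384],
    "18.08": [384],
}


def driver_branch(version: str) -> int:
    """Major branch of a driver version string such as '410.48' (-> 410)."""
    return int(version.split(".")[0])


def releases_for_driver(version: str) -> List[str]:
    """Releases whose table entry lists the host driver's branch, newest first."""
    branch = driver_branch(version)
    matches = [rel for rel, branches in DRIVER_BRANCHES.items() if branch in branches]
    # "YY.MM" tags are zero-padded, so string order matches release order.
    return sorted(matches, reverse=True)
```

The host driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. A match on branch 384 for 18.09 is not sufficient on its own, since CUDA 10.0 on an R384 driver depends on the forward compatibility path described on the CUDA compatibility slide.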


CUDA COMPATIBILITY – UPGRADE PATHS

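Figure: two upgrade paths from CUDA 9.0 on the R384 driver to CUDA 10.0. Each stack consists of the GPU kernel-mode driver (nvidia.ko), the CUDA user-mode driver (libcuda.so), and the CUDA Toolkit and Runtime.

Standard upgrade: install the R410 driver (both nvidia.ko and libcuda.so) along with the CUDA 10.0 Toolkit and Runtime.

NEW Forward Compatibility Option: upgrade only the user-mode CUDA components* (CUDA Toolkit, Runtime, and libcuda.so) while keeping the existing R384 kernel-mode driver.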

New compatibility platform upgrade path available

▶ Use newer CUDA toolkits on older driver installs

▶ Compatibility only with specific older driver versions

System requirements

▶ Tesla GPU support only – no Quadro or GeForce

▶ Only available on Linux

Starting with CUDA 10.0

*requires new ‘cuda-compat-10-0’ package
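The conditions above can be encoded as a small decision function. This is a minimal Python sketch that covers only what the slides state. The native branch for each CUDA version is taken from the release table (CUDA 9.0 with 384 in 18.08, CUDA 10.0 with 410 in 18.09), and R384 is the only older branch treated as forward-compatible. `Host`, `cuda_support`, and the result strings are hypothetical names, and the package check is reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    gpu_family: str           # e.g. "Tesla", "Quadro", "GeForce"
    os_name: str              # e.g. "Linux", "Windows"
    driver_branch: int        # e.g. 384 or 410
    has_compat_package: bool  # True if cuda-compat-10-0 is installed


# Driver branch each CUDA version pairs with in the release table.
NATIVE_BRANCH = {"9.0": 384, "10.0": 410}
# Older branches the forward compatibility path covers; only R384 is shown.
COMPAT_BRANCHES = {"10.0": {384}}


def cuda_support(cuda: str, host: Host) -> str:
    """Return 'native', 'forward-compat', 'unsupported', or 'unknown'."""
    native = NATIVE_BRANCH.get(cuda)
    if native is None:
        return "unknown"  # CUDA version not covered by the slides
    # Newer driver branches also run older CUDA versions.
    if host.driver_branch >= native:
        return "native"
    # Forward compatibility: Tesla only, Linux only, specific older branches,
    # with the user-mode components supplied by the cuda-compat package.
    if (
        host.gpu_family == "Tesla"
        and host.os_name == "Linux"
        and host.driver_branch in COMPAT_BRANCHES.get(cuda, set())
        and host.has_compat_package
    ):
        return "forward-compat"
    return "unsupported"
```

For example, a Tesla server on Linux with the R384 driver and the compat package yields "forward-compat" for CUDA 10.0, while the same driver on a GeForce card yields "unsupported".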


BEST NVIDIA PERFORMANCE

Up to 1.8X improvement over 12 months with mixed precision on ResNet-50


BEST NVIDIA PERFORMANCE

2.0X improvement with mixed precision on ResNet-50 from DGX-1 to DGX-2


TARGET SYSTEM SETUP

NGC Virtual Machine Images: NVIDIA Deep Learning for Volta (AWS EC2 AMI)

Pre-installed: up-to-date Ubuntu Server OS, CUDA drivers, NVIDIA Container Runtime

NGC Container Ready BaseOS: on all DGX Systems

Self-Install Setup Guide

NGC Examples and Management Scripts: https://github.com/nvidia/ngc-examples


LOG INTO NGC, PULL AND RUN

1. Create Account / Log In

2. Get API Key

3. Browse For Image

4. Log in on Machine & Run

$ docker login nvcr.io

Username: $oauthtoken

Password: ******* (your NGC API key)

$ docker run -it nvcr.io/nvidia/tensorflow:18.09
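The login step can also be scripted, for example on a build server. This is a minimal Python sketch, assuming the Docker CLI is installed and the caller supplies the NGC API key. It relies on `docker login --password-stdin`, which reads the password from standard input so the key never appears on the command line. The helper names are hypothetical.

```python
import subprocess
from typing import List

NGC_REGISTRY = "nvcr.io"
# NGC expects this literal username; the API key is used as the password.
NGC_USERNAME = "$oauthtoken"


def docker_login_command(registry: str = NGC_REGISTRY) -> List[str]:
    # An argument list bypasses the shell, so "$oauthtoken" is not expanded.
    return ["docker", "login", registry,
            "--username", NGC_USERNAME, "--password-stdin"]


def pull_command(image: str) -> List[str]:
    return ["docker", "pull", image]


def login_and_pull(api_key: str, image: str) -> None:
    # The key is written to docker's stdin rather than passed as an argument,
    # so it does not show up in the process list or shell history.
    subprocess.run(docker_login_command(), input=api_key, text=True, check=True)
    subprocess.run(pull_command(image), check=True)
```

A call such as `login_and_pull(os.environ["NGC_API_KEY"], "nvcr.io/nvidia/tensorflow:18.09")` would log in and pull the image, where `NGC_API_KEY` is a hypothetical environment variable name. When typing the username in an interactive shell instead, quote it as '$oauthtoken' so the shell does not expand it.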


RUNNING CONTAINERS WITH DATA

nvcr.io/nvidia/tensorflow:18.02

Host directory /mnt/ssd/large_dataset is mounted in the container at /workspace/large_dataset

$ docker run --rm -it --volume /mnt/ssd/large_dataset:/workspace/large_dataset nvcr.io/…
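When scripting this, note that `docker run` options must precede the image name, because anything after the image is passed to the container as its command. The minimal Python sketch below builds such an argument list. The helper name is hypothetical, and the default `--runtime=nvidia` assumes the NVIDIA Container Runtime is registered with Docker under the name `nvidia`. Pass `runtime=None` where it is already the default runtime, as on DGX systems.

```python
from typing import Dict, List, Optional


def docker_run_command(
    image: str,
    volumes: Dict[str, str],
    runtime: Optional[str] = "nvidia",
    command: Optional[List[str]] = None,
) -> List[str]:
    """Build a `docker run` argument list with all options before the image."""
    args = ["docker", "run", "--rm", "-it"]
    if runtime is not None:
        # Assumes the NVIDIA Container Runtime is registered as "nvidia".
        args.append(f"--runtime={runtime}")
    for host_path, container_path in volumes.items():
        # --volume HOST:CONTAINER bind-mounts a host directory into the container.
        args += ["--volume", f"{host_path}:{container_path}"]
    args.append(image)
    # Anything after the image name is the command run inside the container.
    args += command or []
    return args
```

With `{"/mnt/ssd/large_dataset": "/workspace/large_dataset"}` as `volumes`, the dataset appears under /workspace/large_dataset inside the container, and results written there persist on the host after `--rm` removes the container. On Docker 19.03 and later, `--gpus all` is the usual alternative to `--runtime=nvidia`.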


NVIDIA GPU CLOUD

GPU-Accelerated Containers for Deep Learning, HPC, and HPC Visualization

Innovate In Minutes, Not Weeks

Run Anywhere

Comprehensive Library of GPU-Accelerated Containers


Q & A
