E8270 – INSIDE NVIDIA GPU CLOUD CONTAINERS (on-demand.gputechconf.com/gtc-eu/2018/pdf/e8270...)

Chris Kawalek, NVIDIA GPU Cloud Product Team, NVIDIA
Michael O’Connor, Optimized Deep Learning Frameworks, NVIDIA


AGENDA

The Difficulty With Complex Software

Running In Different Environments

Why Containers

Diving Into NGC Deep Learning Containers

CHALLENGES WITH COMPLEX SOFTWARE

Current DIY GPU-accelerated AI and HPC deployments are complex and time-consuming to build, test, and maintain

Development of software frameworks by the community is moving very fast

Requires a high level of expertise to manage driver, library, and framework dependencies

[Figure: software stack, top to bottom: Applications or Frameworks, NVIDIA Libraries, NVIDIA Container Runtime for Docker, NVIDIA Driver, NVIDIA GPU]

NVIDIA GPU CLOUD (NGC)

Over 35 GPU-Accelerated Containers: deep learning, HPC applications, HPC visualization tools, and partner applications

Innovate in Minutes, Not Weeks: pre-configured, ready-to-run

Run Anywhere: NVIDIA GPUs on the top cloud providers, NVIDIA DGX Systems, and PCs and workstations

Simple Access to GPU-Accelerated Software

A CONSISTENT, HYBRID CLOUD EXPERIENCE ACROSS COMPUTE PLATFORMS


WORK AT SCALE ON AI SUPERCOMPUTERS: NGC Containers Run on NVIDIA DGX Systems

DEVELOP ON NVIDIA TITAN & NVIDIA QUADRO: NGC Containers Run on PCs and Workstations with Select NVIDIA GPUs

WHY CONTAINERS?

Benefits of Containers:

Simplify deployment of GPU-accelerated software, eliminating time-consuming software integration work

Isolate individual deep learning frameworks and HPC applications

Share, collaborate, and test applications across different environments

CONTINUAL EXPANSION

October 2017: 10 containers

October 2018: 36 containers

Deep Learning: caffe2, digits, pytorch, tensorflow, tensorrt, tensorrtserver, theano

HPC: bigdft, candle, chroma, gamess, gromacs, lammps, lattice-microbes, picongpu, relion

HPC Visualization: paraview-holodeck, paraview-index, paraview-optix

Partners: chainer, h2oai-driverless, kinetica, matlab, paddlepaddle

NVIDIA/K8s: Kubernetes on NVIDIA GPUs

USING NGC CONTAINERS

Data Scientists and Researchers: Eliminate setup time, focus on science and research

Developers: Work with the latest software with a known good starting point

Sysadmins: Deploy to production immediately

VIRTUAL MACHINES VS. CONTAINERS

Containers: a packaging and deployment mechanism for applications

▶ Consistent and reproducible deployment

▶ Lightweight and lower overhead than VMs

▶ Logical isolation from other applications

Motivation


EXAMPLE NGC CONTAINER WORKFLOW

NVIDIA builds application image composed of layers of files

Image(s) tested and released to NGC repository hosted at URLs like nvcr.io/nvidia/tensorflow

User pulls image to a machine and runs it

Image cached, and an OS-isolated set of resources (the container) allocated in which to execute it

Data & results accessed as a filesystem volume

$ docker run nvcr.io/…
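Image URLs such as nvcr.io/nvidia/tensorflow:18.09 have a fixed shape: a registry host (nvcr.io), a repository path (nvidia/tensorflow), and an optional tag (18.09). A minimal Bash sketch of that split follows. `split_image_ref` is a hypothetical helper, and it applies a simplified version of Docker's rule for recognizing a registry host: the first component contains `.` or `:`, or is `localhost`. Digest references and Docker Hub's implicit `library/` namespace are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "registry repository tag" for an image reference.
# Simplified: digests (@sha256:...) are not handled, and the implicit
# "library/" namespace for official Docker Hub images is not added.
split_image_ref() {
  local ref="$1" first rest registry repo tag
  first="${ref%%/*}"                      # text before the first '/'
  # Treat the first component as a registry host if it contains '.' or ':'
  # or is "localhost"; otherwise fall back to the default registry.
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    registry="$first"
    rest="${ref#*/}"
  else
    registry="docker.io"
    rest="$ref"
  fi
  # A tag is a ':' inside the last path component; without one, Docker uses "latest".
  if [[ "${rest##*/}" == *:* ]]; then
    repo="${rest%:*}"
    tag="${rest##*:}"
  else
    repo="$rest"
    tag="latest"
  fi
  printf '%s %s %s\n' "$registry" "$repo" "$tag"
}
```

For example, `split_image_ref nvcr.io/nvidia/tensorflow:18.09` prints `nvcr.io nvidia/tensorflow 18.09`. The last path component is checked for the tag separator so that a registry port, as in `localhost:5000/team/app`, is not mistaken for a tag.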


ANATOMY OF AN NGC CONTAINER IMAGE

Base image: ubuntu:16.04

Image layers (R/O): f2233041f557, 145c1bf7947a, 0c395732af81, fb91e851e672

R/W layer: created on top of the image layers for each running container

Layer contents: NVIDIA Deep Learning SDK, NVIDIA CUDA SDK, DL framework & source, examples & scripts

ALWAYS UP-TO-DATE: Monthly Releases from NVIDIA

Component | 18.09 | 18.08
--- | --- | ---
Supported Platform | DGX OS 4.0.1 and 3.1.2+ | 3.1.2+ and 2.1.1+
NVIDIA Driver | 410 and 384 | 384
Base Image | Ubuntu 16.04 | 16.04
CUDA | 10.0.130 | 9.0.176
cuBLAS | 10.0.130 | 9.0.425 (aka Patch 4)
cuDNN | 7.3.0 | 7.2.1
NCCL | 2.3.4 | 2.2.1
NVIDIA Optimized Frameworks | |
NVCaffe | 0.17.1 for Python 3.5 | 0.17.1 for Python 2.7
DIGITS | 6.1.1 | 6.1.1
MXNet | 1.3 for Python 3.5 | 1.2.0+ for Python 2.7 and Python 3.5
PyTorch | 0.4.1++ for Python 3.6 | 0.4.1+ for Python 3.6
TensorFlow | 1.10 for Python 2.7 and Python 3.5 (TensorRT 5.0.0) | 1.9.0 for Python 2.7 and Python 3.5 (TensorRT 4.0.1)
TensorRT | 5.0.0 | 4.0.1
TensorRT Server | 0.6 | 0.5
TensorFlow for Jetson | 1.10 on JetPack 4.1 for Xavier | 1.9.0 on JetPack 3.2 for TX2
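The driver row is the one that most often blocks a pull-and-run. The following is a minimal Bash sketch that turns the 18.09 and 18.08 columns into a pre-flight check. The function names are hypothetical. The sketch assumes that the native minimum is the R410 branch for 18.09 (R384 works there only through forward compatibility) and R384 for 18.08, and that newer driver branches also run containers built for older CUDA versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built from the 18.09 / 18.08 columns of the release table.

# Print the minimum native driver branch for a release (no forward compatibility).
ngc_required_driver() {
  case "$1" in
    18.09) echo 410 ;;  # CUDA 10.0.130; R384 only via the forward-compatibility package
    18.08) echo 384 ;;  # CUDA 9.0.176
    *) return 1 ;;      # release not covered by the table
  esac
}

# Succeed if a driver version string such as "410.48" meets that minimum.
# Exit status: 0 = supported, 1 = driver too old, 2 = unknown release or bad input.
driver_supports_release() {
  local release="$1" driver_version="$2" required major
  required="$(ngc_required_driver "$release")" || return 2
  major="${driver_version%%.*}"           # "410.48" -> "410"
  [[ "$major" =~ ^[0-9]+$ ]] || return 2  # reject malformed version strings
  (( major >= required ))
}
```

On a real host, the installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. For example: `driver_supports_release 18.09 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`.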

CUDA COMPATIBILITY – UPGRADE PATHS

[Figure: two ways to move from CUDA 9.0 on the R384 driver to CUDA 10.0. The standard upgrade replaces the whole stack with the R410 driver: the CUDA Toolkit and Runtime, the CUDA user-mode driver (libcuda.so), and the GPU kernel-mode driver (nvidia.ko). The NEW forward compatibility option upgrades only the user-mode CUDA components* (the CUDA Toolkit and Runtime and libcuda.so) and keeps the R384 kernel-mode driver (nvidia.ko).]

New compatibility platform upgrade path available

▶ Use newer CUDA toolkits on older driver installs

▶ Compatibility only with specific older driver versions

System requirements

▶ Tesla GPU support only – no Quadro or GeForce

▶ Only available on Linux

Starting with CUDA 10.0

*requires new ‘cuda-compat-10-0’ package
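The requirements listed for the forward-compatibility path can be read as a short decision procedure. The following is a minimal Bash sketch with a hypothetical function name. It assumes that R384 is the one older driver branch in scope, because the slide says only "specific older driver versions" and its diagram shows R384. It also assumes that an R410 or newer driver needs no compatibility package for CUDA 10.0.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for the CUDA 10.0 forward-compatibility path.
# Inputs: OS name (e.g. from `uname -s`), GPU product name, driver major version.
# Assumption: R384 is the supported older branch, as in the upgrade diagram.
compat_path_for_cuda10() {
  local os="$1" gpu_name="$2" driver_major="$3"
  if [[ "$os" != Linux ]]; then
    echo "no: forward compatibility is only available on Linux"; return 1
  fi
  if (( driver_major >= 410 )); then
    echo "not needed: R410+ driver already supports CUDA 10.0"; return 0
  fi
  if [[ "$gpu_name" != *Tesla* ]]; then
    echo "no: Tesla GPUs only (no Quadro or GeForce)"; return 1
  fi
  if (( driver_major != 384 )); then
    echo "no: driver branch $driver_major is not assumed compatible"; return 1
  fi
  echo "yes: install the cuda-compat-10-0 package"
}
```

The GPU name can be taken from `nvidia-smi --query-gpu=name --format=csv,noheader`. The driver check runs before the GPU check because a host that already has R410 or newer does not need the compatibility package at all.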

BEST NVIDIA PERFORMANCE: Over 12 months, up to 1.8X improvement with mixed precision on ResNet-50

BEST NVIDIA PERFORMANCE: 2.0X improvement with mixed precision on ResNet-50 from DGX-1 to DGX-2

TARGET SYSTEM SETUP

NGC Virtual Machine Images: NVIDIA Deep Learning for Volta (AWS EC2 AMI)

Pre-installed: up-to-date Ubuntu Server OS, CUDA drivers, NVIDIA Container Runtime

NGC Container Ready BaseOS: on all DGX Systems

Self-Install Setup Guide

NGC Examples and Management Scripts: https://github.com/nvidia/ngc-examples
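On a self-installed machine, it is worth confirming that Docker can see the NVIDIA Container Runtime before pulling images. The following is a small Bash sketch with a hypothetical helper name. It assumes the runtime was registered with Docker under the name `nvidia`, which is how the nvidia-docker2 package configures it. The `DOCKER` variable is an assumption added so the check can be dry-run against a stub, and it defaults to `docker`.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the Docker daemon lists a runtime named "nvidia".
# DOCKER may name another command or function (e.g. a stub); defaults to docker.
has_nvidia_runtime() {
  "${DOCKER:-docker}" info --format '{{json .Runtimes}}' | grep -q '"nvidia"'
}
```

A check like `has_nvidia_runtime || echo "NVIDIA Container Runtime not registered" >&2` can then gate any later `docker run`.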

LOG INTO NGC, PULL AND RUN

1. Create Account / Log In

2. Get API Key

3. Browse For Image

Log in on Machine & Run

$ docker login nvcr.io

Username: $oauthtoken

Password: *******

$ docker run -it nvcr.io/nvidia/tensorflow:18.09
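The interactive login above can also be scripted. The following is a minimal Bash sketch that assumes the API key from step 2 is exported as `NGC_API_KEY`, a variable name chosen only for this example. It uses Docker's `--password-stdin` option so the key never appears on the command line. As in the interactive version, the user name is the literal string `$oauthtoken`. `ngc_login` and the `DOCKER` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive login to nvcr.io with an NGC API key.
# Assumes the key is exported as NGC_API_KEY (a name chosen for this sketch).
# DOCKER may name another command (e.g. echo for a dry run); defaults to docker.
ngc_login() {
  if [[ -z "${NGC_API_KEY:-}" ]]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # '$oauthtoken' is the literal user name; single quotes stop the shell
  # from expanding it. --password-stdin keeps the key out of the process list.
  printf '%s' "$NGC_API_KEY" |
    "${DOCKER:-docker}" login nvcr.io --username '$oauthtoken' --password-stdin
}
```

After a successful login, `docker run -it nvcr.io/nvidia/tensorflow:18.09` works as shown in the interactive example.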

RUNNING CONTAINERS WITH DATA

[Figure: the nvcr.io/nvidia/tensorflow:18.02 container sees the host directory /mnt/ssd/large_dataset mounted at /workspace/large_dataset]

$ docker run --rm -it --volume /mnt/ssd/large_dataset:/workspace/large_dataset nvcr.io/…
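Two details of the volume mount are easy to get wrong. First, Docker options such as `-v` must come before the image name, because anything after the image is passed to the container as its command. Second, a `-v` bind mount of a host path that does not exist makes Docker create an empty directory there. The following is a minimal Bash sketch of a wrapper that guards against both. It assumes the NVIDIA Container Runtime is registered with Docker as `nvidia`. `run_with_data` and the `DOCKER` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an NGC image with a host dataset directory mounted.
# Assumes the NVIDIA Container Runtime is registered with Docker as "nvidia".
# DOCKER may name another command (e.g. echo for a dry run); defaults to docker.
run_with_data() {
  local image="$1" host_dir="$2" container_dir="$3"
  shift 3 || return 1
  # Fail early: with -v, Docker would otherwise create an empty host directory.
  if [[ ! -d "$host_dir" ]]; then
    echo "host directory not found: $host_dir" >&2
    return 1
  fi
  # Options go before the image name; remaining arguments become the
  # command executed inside the container.
  "${DOCKER:-docker}" run --rm -it --runtime=nvidia \
    -v "$host_dir:$container_dir" "$image" "$@"
}
```

For example, `run_with_data nvcr.io/nvidia/tensorflow:18.09 /mnt/ssd/large_dataset /workspace/large_dataset` opens the container with the dataset mounted, and `--rm` removes the container when the session exits.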

NVIDIA GPU CLOUD: GPU-Accelerated Containers for Deep Learning, HPC, and HPC Visualization

Innovate In Minutes, Not Weeks

Run Anywhere

Comprehensive Library of GPU-Accelerated Containers
