GPU Virtualization: Doing Much More with GPUs

Mazhar Memon, CTO, Bitfusion
SC16, Salt Lake City, UT

Transcript of GPU Virtualization: Doing Much More with GPUs

Page 1: GPU Virtualization: Doing Much More with GPUs

GPU Virtualization: Doing Much More with GPUs
Mazhar Memon, CTO, Bitfusion
SC16, Salt Lake City, UT

Page 2: GPU Virtualization: Doing Much More with GPUs

Quick Poll: Any GPU users?

Page 3: GPU Virtualization: Doing Much More with GPUs

GPU Users: Much variety

• Learners
• Application developers
• HPC Scientist
• CAD/CAE
• Artist + Designers
• Data Analyst

One size doesn’t fit all

Manufacturing

Retail & Finance

Media & Entertainment

Pharma & Healthcare

Oil & Gas

Deep Learning

Page 4: GPU Virtualization: Doing Much More with GPUs

Variety of GPU Sizes

• TX1
• GTX
• Tesla
• Quadro

Page 5: GPU Virtualization: Doing Much More with GPUs

Problem: How to do more with your (static) GPUs?

Page 6: GPU Virtualization: Doing Much More with GPUs

Virtualization 101

Page 7: GPU Virtualization: Doing Much More with GPUs

[Diagram: two servers, each running a hypervisor with four VMs; on the second server each VM is paired with a vGPU]

Virtualization: Support More Users or Applications on a Single Server

Page 8: GPU Virtualization: Doing Much More with GPUs

Many users, small problems

[Figure: grid of GPUs]

Page 9: GPU Virtualization: Doing Much More with GPUs

One user, one big problem

[Figure: large grid of GPUs]

Page 10: GPU Virtualization: Doing Much More with GPUs

Large data, small device memory

[Figure labels: GPU; app requirement; available memory]

App demand >> GPU memory

Page 11: GPU Virtualization: Doing Much More with GPUs

GPU Virtualization on Steroids

Use your favorite GPU applications as-is

Bitfusion Boost Layer

Your existing GPU infrastructure

Page 12: GPU Virtualization: Doing Much More with GPUs

Solve Small Problems Cheaply

• Slice GPUs into arbitrary fractions
• Memory and process isolation
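As a rough illustration of slicing with a memory quota, the sketch below models one GPU as a pool that hands out fractional slices and rejects requests that exceed what is left. `GpuSlicer`, its method names, and the memory size used in the usage note are hypothetical and are not Bitfusion's API; real isolation is enforced by the virtualization layer, not by bookkeeping like this.

```python
from fractions import Fraction


class GpuSlicer:
    """Toy bookkeeping model of one GPU shared as fractional slices (hypothetical)."""

    def __init__(self, memory_mb):
        self.memory_mb = memory_mb
        self.slices = {}  # user -> Fraction of the GPU held by that user

    def free_fraction(self):
        # Portion of the GPU that has not been handed out yet.
        return Fraction(1) - sum(self.slices.values(), Fraction(0))

    def allocate(self, user, fraction):
        fraction = Fraction(fraction)
        if user in self.slices:
            raise ValueError(f"{user} already holds a slice")
        if fraction <= 0 or fraction > self.free_fraction():
            raise ValueError("requested fraction is not available")
        self.slices[user] = fraction
        # Memory quota (in MB) that goes with this slice.
        return int(self.memory_mb * fraction)

    def release(self, user):
        self.slices.pop(user)
```

With an assumed 16384 MB device, `GpuSlicer(16384).allocate("alice", "1/4")` returns a 4096 MB quota, and a later request larger than the remaining three quarters raises `ValueError`.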

Available on Nimbix Today: $0.49 GPU instances

Page 13: GPU Virtualization: Doing Much More with GPUs

Solve Large Problems Dynamically

[Diagram labels: CPU-only node (48 cores, 3 TB memory, 72 TB SSD storage); Boost; massive virtual node; logically attached GPUs; racks with GPUs]

Creating the largest virtual GPU machines on demand

Page 14: GPU Virtualization: Doing Much More with GPUs

Solve Large Data Problems Efficiently

[Figure labels: GPU; host memory; available memory]

• Dynamic paging of GPU memory backed by host memory
• Works for non-Pascal GPUs as well
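To make the paging idea concrete, the sketch below simulates demand paging in plain Python: a small "device" holds a fixed number of pages, the full data set lives in "host" memory, and the least recently used page is written back and evicted when the device is full. `PagedDeviceMemory` and its parameters are hypothetical, and LRU is an assumed policy; the slides do not say how Boost chooses pages to evict.

```python
from collections import OrderedDict


class PagedDeviceMemory:
    """Toy model: device pages backed by host memory, LRU eviction (assumed policy)."""

    def __init__(self, host_pages, device_capacity):
        self.host = list(host_pages)      # backing store in host memory
        self.capacity = device_capacity   # number of pages that fit on the device
        self.device = OrderedDict()       # page index -> data, oldest first
        self.faults = 0

    def _touch(self, index):
        if index in self.device:
            self.device.move_to_end(index)      # mark as most recently used
            return
        self.faults += 1
        if len(self.device) >= self.capacity:
            victim, data = self.device.popitem(last=False)
            self.host[victim] = data            # write back before evicting
        self.device[index] = self.host[index]   # page in from host memory

    def read(self, index):
        self._touch(index)
        return self.device[index]

    def write(self, index, value):
        self._touch(index)
        self.device[index] = value
```

The application only calls `read` and `write`; it never sees that the working set is larger than the device, which is the property the slide describes.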

Page 15: GPU Virtualization: Doing Much More with GPUs

Monitoring and Managing GPUs Easily

• Use your favorite tools: all common tools, e.g. nvidia-smi, work across virtual clusters
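For example, a monitoring script might poll `nvidia-smi` in CSV mode and parse the result. The sketch below assumes `nvidia-smi` is on the PATH; `--query-gpu` and `--format=csv,noheader,nounits` are standard `nvidia-smi` options, while `parse_gpu_csv` and `query_gpus` are hypothetical helper names. Whether every field is reported for virtual GPUs is not stated in the slides.

```python
import csv
import io
import subprocess

FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts of strings."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in rows if row]


def query_gpus():
    """Run nvidia-smi and return one dict per visible GPU."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on a machine with the NVIDIA driver installed returns a list such as one entry per GPU, each keyed by the names in `FIELDS`.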

Page 16: GPU Virtualization: Doing Much More with GPUs

Handling Faults Automatically

[Figure labels: GPUs; App]

Failover to any other available GPU server upon catastrophic, memory, or intermittent faults
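The failover behavior can be pictured as a retry loop over a list of GPU servers. The sketch below is a generic illustration, not Bitfusion's mechanism: `run_with_failover`, `GpuFault`, and the server names in the usage note are hypothetical, and the job is assumed to be safe to restart from the beginning on another server.

```python
class GpuFault(Exception):
    """Raised by a job when its GPU server fails (hypothetical fault signal)."""


def run_with_failover(job, servers):
    """Try job(server) on each server in turn; return (server, result) on first success."""
    failures = []
    for server in servers:
        try:
            return server, job(server)
        except GpuFault as fault:
            # Record the fault and move on to the next available server.
            failures.append((server, str(fault)))
    raise RuntimeError(f"all GPU servers failed: {failures}")
```

A caller would pass a function that runs the workload on a named server, for example `run_with_failover(train, ["gpu-a", "gpu-b", "gpu-c"])`, and receive the name of the server that finally completed it.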

Page 17: GPU Virtualization: Doing Much More with GPUs

Bitfusion Boost: Software Stack

[Diagram labels: System view; application; local server; remote servers; a stack of Hardware, VM Hypervisor, Drivers, Operating system, and SDI (repeated four times); User Space; Open APIs; Custom APIs; Libraries; Application; Core Functions]

Intercepts applications and applies a variety of rules including automatic scale-out, resource pooling, high availability, etc.

Deploy on bare metal, containers, VMs. Secure, Portable, Frictionless
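One way to picture the user-space interception on this slide is a wrapper that sits between the application and a library call and applies a rule before forwarding it. The Python sketch below is only an analogy: real interposition of GPU libraries typically happens at the shared-library level, and the slides do not describe Boost's mechanism. `intercept`, `pick_device`, and `launch_kernel` are hypothetical names.

```python
import functools


def intercept(rule):
    """Wrap a function so `rule` can rewrite its keyword arguments before the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **rule(dict(kwargs)))
        return wrapper
    return decorator


def pick_device(kwargs):
    # Example rule: route work to a pooled device when the caller did not choose one.
    kwargs.setdefault("device", "pooled-gpu-0")
    return kwargs


@intercept(pick_device)
def launch_kernel(name, device=None):
    # Stand-in for a library call the application makes unchanged.
    return f"{name} on {device}"
```

The application code calling `launch_kernel("saxpy")` is untouched; only the wrapper decides where the work goes, which mirrors the "use your applications as-is" claim.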

Page 18: GPU Virtualization: Doing Much More with GPUs

App Specific Instance Configurations as Machine Images

Resource Pooling:
• Consolidate use of compute resources
• Increase utilization
• Lower capital costs

Resource Provisioning:
• Enforce CPU, memory, utilization quotas
• Effect QoS policy and guarantees
• Maximize utilization and reduce costs

High availability:
• Detect failures at app level
• Rollback, failover, error detection
• Events for higher level reporting

Heterogeneous Offload:
• Leverage HPC hardware
• Interpose vendor libraries
• Retarget hot functions to efficient specialized devices

Scale-out:
• Distribute and load balance load across systems
• Scale performance on demand
• Take advantage of runtime optimizations

Advanced Profiling:
• Understand application demands of the datacenter
• Fine-grained data provides unique insight
• Precise recommendations for capacity planning
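The scale-out item above mentions distributing and load balancing work across systems. A minimal sketch of one common approach, greedy least-loaded assignment, is shown below; the policy, the `balance` function, and the task costs are assumptions for illustration and are not taken from Boost.

```python
import heapq


def balance(tasks, servers):
    """Greedy least-loaded assignment.

    tasks: dict of task name -> estimated cost
    servers: list of server names
    Returns a dict of server name -> list of assigned task names.
    """
    heap = [(0, name) for name in servers]  # (current load, server)
    heapq.heapify(heap)
    assignment = {name: [] for name in servers}
    # Place the most expensive tasks first so small ones can fill the gaps.
    for task, cost in sorted(tasks.items(), key=lambda item: -item[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(task)
        heapq.heappush(heap, (load + cost, name))
    return assignment
```

Each task goes to whichever server currently carries the smallest total cost, so adding a server to the list spreads the same work over more machines.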

• Deep Learning Caffe
• Deep Learning Torch
• Deep Learning TensorFlow
• Media Transcoding
• Rendering
• Scientific Computing

Boost: Add broad set of features to your application

http://www.bitfusion.io/boost-machine-images

Page 19: GPU Virtualization: Doing Much More with GPUs

Boost available on Nimbix today

Developer-optimized machine configurations:

Page 20: GPU Virtualization: Doing Much More with GPUs

Learn more about Bitfusion Boost at boost.bitfusion.io