GPU Virtualization: Doing Much More with GPUs
Mazhar Memon, CTO, Bitfusion
SC16, Salt Lake City, UT
Quick Poll: Any GPU users?
GPU Users: Much variety
• Learners
• Application developers
• HPC scientists
• CAD/CAE
• Artists + Designers
• Data analysts
One size doesn’t fit all
Manufacturing
Retail & Finance
Media & Entertainment
Pharma & Healthcare
Oil & Gas
Deep Learning
Variety of GPU Sizes
• TX1
• GTX
• Tesla
• Quadro
Problem: How to do more with your (static) GPUs?
Virtualization 101
[Diagram: one server whose hypervisor hosts four VMs; in the GPU-virtualized version, each VM is paired with its own vGPU.]
Virtualization: Support More Users or Applications on a Single Server
Many users, small problems
One user, one big problem
Large data, small device memory
[Diagram: app requirement compared with available GPU memory]
App demand >> GPU memory
GPU Virtualization on Steroids
Use your favorite GPU applications as-is
Bitfusion Boost Layer
Your existing GPU infrastructure
Solve Small Problems Cheaply
• Slice GPUs into arbitrary fractions
• Memory and process isolation
Available on Nimbix Today: $0.49 GPU instances
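The slide does not say how Boost implements slicing. Purely as an illustration of the bookkeeping that fractional shares with per-tenant memory caps imply, here is a minimal sketch; `GpuSlicer`, its methods, the tenant names, and the 16 GB device size are all hypothetical and are not Bitfusion APIs.

```python
from dataclasses import dataclass, field


@dataclass
class GpuSlicer:
    """Toy bookkeeping for carving one GPU into fractional slices (hypothetical)."""
    total_mem_mb: int
    slices: dict = field(default_factory=dict)  # tenant -> memory cap in MB

    def free_mem_mb(self) -> int:
        return self.total_mem_mb - sum(self.slices.values())

    def allocate(self, tenant: str, fraction: float) -> int:
        """Reserve `fraction` of device memory for `tenant`; return the cap in MB."""
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        if tenant in self.slices:
            raise ValueError(f"{tenant} already holds a slice")
        want = int(self.total_mem_mb * fraction)
        if want > self.free_mem_mb():
            raise MemoryError("not enough unreserved GPU memory")
        self.slices[tenant] = want
        return want

    def check_request(self, tenant: str, in_use_mb: int, request_mb: int) -> bool:
        """Isolation check: a tenant may never grow past its own cap."""
        return in_use_mb + request_mb <= self.slices[tenant]

    def release(self, tenant: str) -> None:
        self.slices.pop(tenant, None)


gpu = GpuSlicer(total_mem_mb=16384)      # assumed 16 GB device
quarter = gpu.allocate("alice", 0.25)    # a quarter of the card
eighth = gpu.allocate("bob", 0.125)      # an eighth of the card
```

The sketch only tracks memory caps; real process isolation would have to be enforced below the application, which this toy does not attempt.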
Solve Large Problems Dynamically
[Diagram: a CPU-only node (48 cores, 3 TB memory, 72 TB SSD storage) uses Boost to become a massive virtual node, with GPUs from racks of GPU servers logically attached.]
Creating the largest virtual GPU machines on demand
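One way to picture "logically attaching" GPUs is as a placement decision over a pool of idle devices. The sketch below is an assumption about what such a planner could look like, not a description of Boost: `build_virtual_node` and the server names are hypothetical, and the policy (drain the fullest servers first so fewer servers are involved) is just one plausible choice.

```python
def build_virtual_node(free_gpus: dict, wanted: int) -> dict:
    """Pick `wanted` GPUs from remote servers, draining the fullest servers first.

    free_gpus maps a (hypothetical) server name to its number of idle GPUs.
    Returns a mapping of server name -> number of GPUs borrowed from it.
    """
    if wanted > sum(free_gpus.values()):
        raise RuntimeError("pool does not have enough idle GPUs")
    plan = {}
    remaining = wanted
    # Sort by idle count (descending), then by name for a deterministic plan.
    for server, idle in sorted(free_gpus.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(idle, remaining)
        if take:
            plan[server] = take
            remaining -= take
    return plan


pool = {"rack1-a": 4, "rack1-b": 2, "rack2-a": 4, "rack2-b": 1}
plan = build_virtual_node(pool, 9)
```

Here a CPU-only node asking for 9 GPUs would borrow from three of the four servers in the pool.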
Solve Large Data Problems Efficiently
[Diagram: available GPU memory extended by host memory]
• Dynamic paging of GPU memory backed by host memory
• Works for non-Pascal GPUs as well
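The paging idea can be illustrated with a toy model in which device memory holds a fixed number of pages and everything else spills to host memory. This is a sketch under assumptions: the class name `PagedGpuMemory`, the page granularity, and the least-recently-used eviction policy are illustrative choices, and the slide does not state which policy Boost uses.

```python
from collections import OrderedDict


class PagedGpuMemory:
    """Toy model: the device holds `device_pages` pages; the rest live in host memory."""

    def __init__(self, device_pages: int):
        self.device_pages = device_pages
        self.on_device = OrderedDict()   # page id -> data, least recently used first
        self.on_host = {}                # page id -> data evicted from the device
        self.page_ins = 0                # count of host -> device transfers

    def write(self, page: int, data) -> None:
        self._make_resident(page)
        self.on_device[page] = data

    def read(self, page: int):
        self._make_resident(page)
        return self.on_device[page]

    def _make_resident(self, page: int) -> None:
        if page in self.on_device:
            self.on_device.move_to_end(page)      # mark as most recently used
            return
        if len(self.on_device) >= self.device_pages:
            victim, data = self.on_device.popitem(last=False)  # evict the LRU page
            self.on_host[victim] = data
        if page in self.on_host:
            self.on_device[page] = self.on_host.pop(page)      # page back in
            self.page_ins += 1
        else:
            self.on_device[page] = None           # fresh, never-written page


mem = PagedGpuMemory(device_pages=2)     # device fits 2 pages, app touches 4
for i in range(4):
    mem.write(i, f"block-{i}")
first = mem.read(0)                      # page 0 was evicted, so it is paged back in
```

The application sees four pages even though only two fit on the device at once; the cost shows up as extra host-to-device transfers.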
Monitoring and Managing GPUs Easily
• Use your favorite tools: all common tools, e.g. nvidia-smi, work across virtual clusters
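If nvidia-smi behaves the same on a virtual cluster as on a local machine, existing monitoring scripts should carry over unchanged. A minimal sketch of such a script follows; it uses nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options, while the helper names and the sample values are made up for illustration.

```python
import csv
import io
import subprocess

QUERY = "index,name,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        index, name, used, total, util = (item.strip() for item in rec)
        rows.append({
            "index": int(index),
            "name": name,
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
            "util_pct": int(util),
        })
    return rows


def query_gpus() -> list:
    """Run nvidia-smi and return one dict per GPU it reports."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)


# Illustrative sample text, not captured from a real system.
sample = "0, Tesla K80, 1024, 11441, 37\n1, Tesla K80, 0, 11441, 0\n"
gpus = parse_gpu_csv(sample)
```

Some devices report fields such as utilization as unsupported rather than as a number, so a production script would need to handle non-numeric values.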
Handling Faults Automatically
[Diagram: an app moving from a failed pair of GPUs to another pair]
Failover to any other available GPU server upon catastrophic, memory, or intermittent faults
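At its simplest, failover of this kind is "try the next server when this one faults". The sketch below shows only that control flow; `GpuFault`, `run_with_failover`, the server names, and the fault messages are hypothetical, and it reruns the job from scratch rather than restoring any state.

```python
class GpuFault(Exception):
    """Raised by a (hypothetical) GPU server when a job cannot complete."""


def run_with_failover(job, servers):
    """Try `job` on each server in order; move on when one raises GpuFault."""
    errors = []
    for server in servers:
        try:
            return server, job(server)
        except GpuFault as exc:
            errors.append((server, str(exc)))   # record the fault and fail over
    raise RuntimeError(f"all GPU servers failed: {errors}")


def demo_job(server):
    # Stand-in for real GPU work: pretend the first two servers are unhealthy.
    if server == "gpu-a":
        raise GpuFault("memory fault")
    if server == "gpu-b":
        raise GpuFault("intermittent fault")
    return f"result from {server}"


used, result = run_with_failover(demo_job, ["gpu-a", "gpu-b", "gpu-c"])
```

The caller gets a result from the third server without handling either fault itself.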
Bitfusion Boost: Software Stack
[Diagram, system view: an application on the local server, connected to remote servers. Every server has the same stack: Hardware, VM Hypervisor, Drivers, Operating system, SDI. The user-space portion is labeled Application, Libraries, Open APIs, Custom APIs, and Core Functions.]
User space: intercepts applications and applies a variety of rules including automatic scale-out, resource pooling, high availability, etc.
Deploy on bare metal, containers, VMs. Secure, Portable, Frictionless
App Specific Instance Configurations as Machine Images
Resource Pooling:
• Consolidate use of compute resources
• Increase utilization
• Lower capital costs
Resource Provisioning:
• Enforce CPU, memory, utilization quotas
• Effect QoS policy and guarantees
• Maximize utilization and reduce costs
High availability:
• Detect failures at app level
• Rollback, failover, error detection
• Events for higher level reporting
Heterogeneous Offload:
• Leverage HPC hardware
• Interpose vendor libraries
• Retarget hot functions to efficient specialized devices
Scale-out:
• Distribute and balance load across systems
• Scale performance on demand
• Take advantage of runtime optimizations
Advanced Profiling:
• Understand application demands of the datacenter
• Fine-grained data provides unique insight
• Precise recommendations for capacity planning
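Interposing a library and retargeting a hot function can be pictured as a thin wrapper that redirects calls without changing the caller. The sketch below is a Python analogy of that idea and not Boost's mechanism; `interpose`, `RETARGETS`, `vector_norm`, and `accelerated_norm` are hypothetical names, and the "accelerated" version is only a stand-in.

```python
import functools
import math

RETARGETS = {}   # original function name -> replacement (hypothetical registry)


def interpose(func):
    """Wrap `func` so each call is routed to a registered replacement, if any."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        target = RETARGETS.get(func.__name__, func)
        return target(*args, **kwargs)
    return wrapper


@interpose
def vector_norm(values):
    """Baseline implementation that the application calls as-is."""
    return math.sqrt(sum(v * v for v in values))


def accelerated_norm(values):
    """Stand-in for an offloaded implementation; here it just calls math.hypot."""
    x, y = values
    return math.hypot(x, y)


before = vector_norm([3.0, 4.0])               # served by the baseline code
RETARGETS["vector_norm"] = accelerated_norm    # retarget without touching callers
after = vector_norm([3.0, 4.0])                # same call, different backend
```

The application code calling `vector_norm` is identical before and after the retarget, which is the property the "use your applications as-is" claim depends on.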
• Deep Learning: Caffe
• Deep Learning: Torch
• Deep Learning: TensorFlow
• Media Transcoding
• Rendering
• Scientific Computing
Boost: Add a broad set of features to your application
http://www.bitfusion.io/boost-machine-images
Boost: Available on Nimbix today
Developer-optimized machine configurations: