S7281: Device Lending: Dynamic Sharing of GPUs in a...

25
S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster Jonas Markussen PhD student Simula Research Laboratory

Transcript of S7281: Device Lending: Dynamic Sharing of GPUs in a...

Page 1: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Jonas MarkussenPhD studentSimula Research Laboratory

Page 2: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

• Motivation

• PCIe Overview

• Non-Transparent Bridges

• Device Lending

Outline

Page 3: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Distributed applications may need to access and use IO resources that are physically located inside remote hosts

Front-end

Interconnect

... ......

...Control+Signaling+

Data

ComputenodeComputenodeComputenode

…… …

Page 4: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

• rCUDA

• CUDA-awareOpenMPI

• CustomGPUDirect RDMAimplementation

• ...

Software abstractions simplify the use and allocation of resources in a cluster and facilitate development of distributed applications

Front-end

Logicalviewofresources

...Control+Signaling+

Data

Handled in software

Page 5: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Local

Remote

Application Application

Localresource Remoteresourceusingmiddleware

CUDAlibrary+driver CUDA– middlewareintegration

PCIe IObus

CUDAdriver

Interconnecttransport(RDMA)

Interconnecttransport(RDMA)

Middlewareservice/daemon

Interconnect

Middlewareservice

PCIe IObus

Page 6: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

In PCIe clusters, the same fabric is used both as local IO bus within a single node and as the interconnect between separate nodes

ExternalPCIecable

PCIe interconnectswitch RAM Memorybus

PCIe bus

PCIe interconnecthostadapter

PCIe IOdevice

CPUandchipset

Interconnectswitch

Page 7: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Local

Remote

Application

Localresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe-basedinterconnect

Remoteresourceovernativefabric

Application

CUDAlibrary+driver

PCIe IObus

Page 8: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

PCIe Overview

Page 9: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

PCIe is the dominant IO bus technology in computers today, and can also be used as a high-bandwidth low-latency interconnect

PCI-SIG.PCIExpress3.1BaseSpecification,2010.http://www.eetimes.com/document.asp?doc_id=1259778

0

5

10

15

20

25

30

35

Gen2 Gen3 Gen4

Gigabytesp

erse

cond

(GB/s)

PCIex4PCIex8PCIex16

Page 10: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Memory reads and writes are handled by PCIe as transactions that are packet-switched through the fabric depending on the address

RAM

PCIe device PCIe device

PCIe device

CPUandchipset

• Upstream• Downstream• Peer-to-peer(shortestpath)

Page 11: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

IO devices and the CPU share the same physical address space, allowing devices to access system memory and other devices

RAM

PCIe device PCIe device

PCIe device

CPUandchipset

Addressspace0x00000…

0xFFFFF…

IOdevice

IOdevice

IOdevice

Interruptvecs

• Memory-mappedIO(MMIO/PIO)• DirectMemoryAccess(DMA)• Message-SignaledInterrupts(MSI-X)

RAM

0xfee00xxx

Page 12: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Non-Transparent Bridges

Page 13: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

RAM

Remote address space can be mapped into local address space by using PCIe Non-Transparent Bridges (NTBs)

PCIe NTBadapter

CPUandchipset

PCIe NTBadapter

CPUandchipset

Addressspace

NTB

LocalhostRemotehost

LocalRAM

Local Remote0xf000 0x9000

. . . . . .

NTBaddr mapping

RAM

Page 14: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Using NTBs, each node in the cluster take part in a shared address space and have their own “window” into the global address space

NTB-basedinterconnect

Globaladdr space

A B C

Addr spaceinB

Addr spaceinC

LocalRAM

LocalIOdevices

Globaladdr space

A’saddr space

LocalRAM

LocalIOdevices

C’saddr space

Exportedaddressrange

Addr spaceinA

Page 15: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Device Lending

Page 16: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

A remote IO device can be “borrowed” by mapping it into local address space, making it appear locally installed in the system

RAM

NTBadapter0xe000

CPUandchipset

NTBadapter0x1000

CPUandchipset

RAM

Physicaldevice0xb000

Inserteddevice0x2000

Devicedriver

Remote Local0xb000 0x2000

. . . . . .

NTBaddr mapping

Owner Borrower

PCIe hot-plug

Page 17: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

By intercepting DMA API calls to set up IOMMU mappings and inject reverse NTB mappings, physical location is completely transparent

RAM

NTBadapter0xe000

CPUandchipset

NTBadapter0x1000

CPUandchipset

RAM

Physicaldevice0xb000

Inserteddevice0x2000

Devicedriver

Local Remote0xf000 0x5000

. . . . . .

NTBaddr mapping

Owner Borrowerdma_addr = dma_map_page(0x9000);

IOMMU

IOV Phys0x5000 0x9000

. . . . . .

Use addr0xf000

Page 18: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Local

Remote

Application

Borrowed remoteresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

Unmodifiedlocaldriver(withhot-plugsupport)

ResourceappearslocaltoOS,driver,andapp

Hardwaremappingsensurefastdatapath

WorkswithanyPCIe device(evenindividualSR-IOVfunctions)

Page 19: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Local

Remote

Application Application

Borrowed remoteresource Remoteresourceusingmiddleware

CUDAlibrary+driver CUDA– middlewareintegration

CUDAdriver

Interconnecttransport(RDMA)

Interconnecttransport(RDMA)

Middlewareservice/daemon

Interconnect

Middlewareservice

PCIe IObus

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

Page 20: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Local

Remote

Application Application

Borrowed remoteresource Localresource

CUDAlibrary+driver

PCIe IObus

PCIe IObus

PCIe NTBinterconnect

CUDAlibrary+driver

PCIe IObus

Page 21: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

0

2

4

6

8

10

12

14

4 KB 8 KB 16 KB 32 KB 64 KB 128 KB 256 KB 512 KB 1 MB 2 MB 4 MB 8 MB 16 MB

Gigabytesp

erse

cond

(GB/s)

Transfersize

bandwidthTest(Local) bandwidthTest(Borrowed) PXH830DMA(GPUDirectRDMA)

1. Nvidia CUDA8.0SamplesbandwidthTest2. GPUDirect RDMAbenchmarkusingDolphinNTBDMA

https://github.com/Dolphinics/cuda-rdma-bench

Device-to-host memory transfer

GPU: Quadro P400 Nvidia driver: Version 375.26(Centos7)

CPU: XeonE5-1630 3.7 GHz Memory: DDR42133MHz

Page 22: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Devicepool

Using Device Lending, nodes in a PCIe cluster can share resources through a process of borrowing and giving back devices

TaskA TaskB TaskC

GPU

SSD SSD SSD

NIC FPGA GPU

GPU GPU GPU

CPU+chipsetRAM

CPU+chipsetRAM

CPU+chipsetRAM

NTB

NTB

NTB

SSDSSDSSD

NIC

FPGA

TaskB

TaskA

GPU GPU GPU SSD

FPGA NIC

GPU SSD

SSD

GPUGPUGPUTaskC

Page 23: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

EIR – Efficient computer aided diagnosis framework for gastrointestinal examination

Examinationroom Examinationroom

Serverroomhttp://mlab.no/blog/2016/12/eir/

Page 24: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Moving forward

• Strategy-based management

• Fail-over mechanisms

• VFIO and other API integration (“SmartIO”)

• Borrowing vGPU functions

Page 25: S7281: Device Lending: Dynamic Sharing of GPUs in a ...on-demand.gputechconf.com/gtc/2017/presentation/s7281-jonas... · S7281: Device Lending: Dynamic Sharing of GPUs in a PCIeCluster

Thank you!“Device Lending in PCI Express Networks”ACM NOSSDAV 2016

“Efficient Processing of Video in a Multi Auditory Environment using Device Lending of GPUs”ACM Multimedia Systems 2016 (MMSys’16)

“PCIe Device Lending”University of Oslo 2015

DeviceLendingdemoandmoreVisit Dolphin in exhibition area (booth 625)

[email protected]

MyemailaddressSelected

publications