rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European...

70
rCUDA: a ready to use remote GPU rCUDA: a ready-to-use remote GPU virtualizacion framework Rafael Mayo Universitat Jaume I de Castelló (Spain) Universitat Jaume I de Castelló (Spain) Joint research effort

Transcript of rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European...

Page 1: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA: a ready to use remote GPUrCUDA: a ready-to-use remote GPU virtualizacion framework

Rafael MayoUniversitat Jaume I de Castelló (Spain)Universitat Jaume I de Castelló (Spain)

Joint research effort

Page 2: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 2/72

Page 3: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 3/72

Page 4: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing GPU computing: defines all the technological issues for

using the GPU computational power for executing l dgeneral purpose code

GPU computing has experienced remarkable growth in th l tthe last years

Top500pNovember 2012

HPC Advisory Council European Conference, Leipzig 2013 4/72

Page 5: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing GPU computing: defines all the technological issues for

using the GPU computational power for executing l dgeneral purpose code

GPU computing has experienced remarkable growth in th l tthe last years

Top500pNovember 2012

HPC Advisory Council European Conference, Leipzig 2013 5/72

Page 6: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

The basic building block is a node with 1 or more GPUsPU em

Main Memory

PU em

CPU

GPU

GP

me

Net

wor

k

GPU

GP

me

PCI-e

U m U m U m U m Main Memory

GPU

GPU

mem

GPU

GPU

mem

GPU

GPU

mem

GPU

GPU

mem Main Memory

CPU

Net

wor

kN

PCI-e

HPC Advisory Council European Conference, Leipzig 2013 6/72

Page 7: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

From the programming point of view: A set of nodes, each one with:

one or more CPUs (with several cores per CPU) one or more GPUs (1-4) disjoint memory spaces for each GPU and CPU

An interconnection network

GPU GPUmem GPU GPU

memGPU GPUmem GPU GPU

mem GPU GPUmem GPU GPU

mem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

PCI-e C

PU

Main

Mem

o

GPU GPUmem

ory

Network

Interconnection Network

ory

Network

ory

Network

ory

Network

ory

Network

ory

Network

HPC Advisory Council European Conference, Leipzig 2013 7/72

Interconnection Network

Page 8: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

Development tools have been introduced in porder to ease the programming of the GPUsTwo main approaches in GPU computing Two main approaches in GPU computing development environments: CUDA NVIDIA proprietary OpenCL open standard OpenCL open standard

HPC Advisory Council European Conference, Leipzig 2013 8/72

Page 9: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

Basically CUDA and OpenCL have the same working scheme:working scheme: Compilation: Separate CPU code from GPU

code (GPU kernel)code (GPU kernel) Execution:

D t t f CPU d GPU Data transfers: CPU and GPU memory spaces1.Before GPU kernel execution: data from CPU

memory space to GPU memory space2.Computation: Kernel execution3.After GPU kernel execution: results from GPU

memory space to CPU memory space

HPC Advisory Council European Conference, Leipzig 2013 9/72

Page 10: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

Time spent on data transfers may not be negligible

90

100Influence of data transfers for SGEMM

Pinned MemoryNon-Pinned Memory

%)

60

70

80y

rans

fers

(%

40

50

60

ed to

dat

a tr

10

20

30

Tim

e de

vote

0 2000 4000 6000 8000 10000 12000 14000 16000 180000

10

Matrix Size

T

HPC Advisory Council European Conference, Leipzig 2013 10/72

Matrix Size

Page 11: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU computing

For the right kind of code the use of GPUs brings huge benefits in terms of performance and energybenefits in terms of performance and energy

There must be data parallelism in the code: this is the l t t k b fit f th h d d fonly way to take benefit from the hundreds of processors

in a GPU Different scenarios from the point of view of the

application: Low level of data parallelism High level of data parallelism Moderate level of data parallelism Applications for multi-GPU computing

HPC Advisory Council European Conference, Leipzig 2013 11/72

Applications for multi GPU computing

Page 12: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 12/72

Page 13: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

rCUDArCUDAA framework enabling that a CUDA-based

application running in one node can access GPUs in other nodesaccess GPUs in other nodes

It is useful when you have:

Moderate level of data parallelism Moderate level of data parallelism

Applications for multi GPU computing

HPC Advisory Council European Conference, Leipzig 2013 13/72

pp p g

Page 14: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Virtualized Remote GPUs

Node GPUNode GPU GPUNode GPU

Node GPUnetw

orkNode GPU

Nodenetw

ork GPU

GPUNode GPU

Node GPU

nnec

tionNode

Node GPU

nnec

tion GPU

GPUGNode GPU

Node GPUGPU

Inte

rcon

Node GPU

NodeInte

rcon GPU

GPUGPU

GPU

VirtualGPUs

GPUs

GPU

HPC Advisory Council European Conference, Leipzig 2013 14/72

GPUs

Page 15: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

CUDA application

Application

CUDAlibraries

HPC Advisory Council European Conference, Leipzig 2013 15/72

Page 16: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

CUDA applicationClient side Server side

ApplicationApplication

CUDAdriver + runtime

CUDAlibraries

HPC Advisory Council European Conference, Leipzig 2013 16/72

Page 17: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

CUDA applicationClient side Server side

Application rCUDA daemon

CUDAlibraries

rCUDA library

Networkdevice

Networkdevicedevice device

HPC Advisory Council European Conference, Leipzig 2013 17/72

Page 18: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

CUDA applicationClient side Server side

Application rCUDA daemon

CUDAdriver + runtime

rCUDA library

Networkdevice

Networkdevicedevice device

HPC Advisory Council European Conference, Leipzig 2013 18/72

Page 19: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

CUDA applicationClient side Server side

Application rCUDA daemon

CUDAdriver + runtime

rCUDA library

Networkdevice

Networkdevicedevice device

HPC Advisory Council European Conference, Leipzig 2013 19/72

Page 20: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

rCUDA uses a proprietary communication protocolcommunication protocol

Example:

1) initialization2) memory allocation on the

remote GPUremote GPU3) CPU to GPU memory

transfer of the input data 4) kernel execution4) kernel execution5) GPU to CPU memory

transfer of the results 6) GPU memory release) y7) communication channel

closing and server process finalization

HPC Advisory Council European Conference, Leipzig 2013 20/72

p

Page 21: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Application

Client side Server side

Interception library Server daemon

pp

Network interface Network interface

CUDA libraryCommunications Communications

TCP IB TCP IBNetwork interface Network interfaceTCP IB TCP IB

HPC Advisory Council European Conference, Leipzig 2013 21/72

Page 22: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Optimized communications

Use GPUDirect to avoid memory copies Pipeline to improve performance Pipeline to improve performance Preallocated pinned memory buffers Optimal pipeline block size

HPC Advisory Council European Conference, Leipzig 2013 22/72

Page 23: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 23/72

MemoryMain Memory Memory

Page 24: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 24/72

MemoryMain Memory Memory

Page 25: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 25/72

MemoryMain Memory Memory

Page 26: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 26/72

MemoryMain Memory Memory

Page 27: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 27/72

MemoryMain Memory Memory

Page 28: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 28/72

MemoryMain Memory Memory

Page 29: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 29/72

MemoryMain Memory Memory

Page 30: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 30/72

MemoryMain Memory Memory

Page 31: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Synchronous transfers Pageable memory Synchronous transfers Pageable memory

ES

T

ES

TNetwork

e

RE

QU

E

RE

QU

E

ent s

ide Server s

SEND / RECV

Clie

side

GPU Main Memory MainPinned Pinned

HPC Advisory Council European Conference, Leipzig 2013 31/72

MemoryMain Memory Memory

Page 32: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain Main

HPC Advisory Council European Conference, Leipzig 2013 32/72

GPU Memory

Main Memory

Main Memory

Page 33: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain Main

HPC Advisory Council European Conference, Leipzig 2013 33/72

GPU Memory

Main Memory

Main Memory

Page 34: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 34/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 35: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 35/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 36: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 36/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 37: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 37/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 38: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 38/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 39: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain MainRDMA READ

HPC Advisory Council European Conference, Leipzig 2013 39/72

GPU Memory

Main Memory

Main Memory

RDMA READ

Page 40: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework Efficient pipeline implementation

Asynchronous transfers Pinned memory Asynchronous transfers Pinned memory

ES

T

ES

TNetworkR

EQ

U

RE

QU

SEND / RECV

een

t sid

e Server sC

lieside

GPUMain Main

HPC Advisory Council European Conference, Leipzig 2013 40/72

GPU Memory

Main Memory

Main Memory

Page 41: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Pipeline block size for Infiniband FDR

• NVIDIA Tesla K20

HPC Advisory Council European Conference, Leipzig 2013 41/72

• Mellanox ConnectX-3 + SX6025 Mellanox switch

Page 42: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Host to device copy with pinned memory

HPC Advisory Council European Conference, Leipzig 2013 42/72

Page 43: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Host to device copy with pageable memory

HPC Advisory Council European Conference, Leipzig 2013 43/72

Page 44: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA framework

Copy a small dataset for latency study (64 bytes)

HPC Advisory Council European Conference, Leipzig 2013 44/72

Page 45: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 45/72

Page 46: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

Several efforts have been done regarding to GPU virtualization in the recent yearsvirtualization in the recent years. rCUDA (CUDA 5.0) GVirtuS (CUDA 3.2) DS-CUDA (CUDA 4.1) vCUDA (CUDA 1.1) GViM (CUDA 1.1) GViM (CUDA 1.1) GridCUDA (CUDA 2.3)

V GPU (CUDA 4 0) V-GPU (CUDA 4.0)

HPC Advisory Council European Conference, Leipzig 2013 46/72

Page 47: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

Comparison of virtualization solutions.L t (t f 64 b t ) Latency (transfer 64 bytes) Intel Xeon E5-2620 (6 cores) 2,0GHz GPU NVIDIA Tesla K20 GPU NVIDIA Tesla K20 Mellanox ConnectX-3 single-port InfiniBand Adapter (FDR) CentOS 6.3 + Mellanox OFED 1.5.3

Pageable Pinned Pageable PinnedPageableH2D

PinnedH2D

PageableD2H

PinnedD2H

CUDA 34,3 4,3 16,2 5,2

rCUDA 94,5 23,1 292,2 6,0

GVirtuS 184,2 200,3 168,4 182,8

DS-CUDA 45 9 - 26 5 -

HPC Advisory Council European Conference, Leipzig 2013 47/72

DS CUDA 45,9 26,5

Page 48: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

• Bandwidth host to device using pageable memoryec

MB

/se

Transfer size (MB)

HPC Advisory Council European Conference, Leipzig 2013 48/72

Transfer size (MB)

Page 49: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

• Bandwidth device to host using pageable memoryec

MB

/se

Transfer size (MB)

HPC Advisory Council European Conference, Leipzig 2013 49/72

Transfer size (MB)

Page 50: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

• Bandwidth host to device using pinned memoryec

MB

/se

Transfer size (MB)

HPC Advisory Council European Conference, Leipzig 2013 50/72

Transfer size (MB)

Page 51: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

GPU virtualization

• Bandwidth device to host using pinned memoryec

MB

/se

Transfer size (MB)

HPC Advisory Council European Conference, Leipzig 2013 51/72

Transfer size (MB)

Page 52: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 52/72

Page 53: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

• SLURM job scheduler.j• Does not understand about virtualized GPUs.

Add GRES ( l ) i d t• Add a new GRES (general resource) in order to manage the virtualized GPUs.

• Where the GPUs are in the system is completely transparent to the user.

• In the job script or in the submision command, the user specifies the number of rGPUS for thethe user specifies the number of rGPUS for the job.

HPC Advisory Council European Conference, Leipzig 2013 53/72

Page 54: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

HPC Advisory Council European Conference, Leipzig 2013 54/72

Page 55: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

slurm.cof

ClusterName=rcu…GresTypes=gpu rgpuGresTypes=gpu,rgpu…NodeName=rcu16 NodeAddr=rcu16 CPUs=8 Sockets=1

CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7990 Gres=rgpu:4,gpu:4

HPC Advisory Council European Conference, Leipzig 2013 55/72

Page 56: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

scontrol show node

NodeName=rcu16 Arch=x86_64 CoresPerSocket=4CPUAlloc 0 CPUErr 0 CPUTot 8 Features (null)CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)Gres=rgpu:4,gpu:4NodeAddr=rcu16 NodeHostName=rcu16OS=Linux RealMemory=7682 Sockets=1OS=Linux RealMemory=7682 Sockets=1State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1BootTime=2013-04-24T18:45:35SlurmdStartTime=2013-04-30T10:02:04

HPC Advisory Council European Conference, Leipzig 2013 56/72

Page 57: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

gres.conf

Name=rgpu File=/dev/nvidia0 Cuda=2.1 Mem=1535m

g

gpName=rgpu File=/dev/nvidia1 Cuda=3.5 Mem=1535mName=rgpu File=/dev/nvidia2 Cuda=1.2 Mem=1535mName=rgpu File=/dev/nvidia3 Cuda=1.2 Mem=1535m

/ /Name=gpu File=/dev/nvidia0Name=gpu File=/dev/nvidia1Name=gpu File=/dev/nvidia2Name=gpu File=/dev/nvidia3Name=gpu File=/dev/nvidia3

HPC Advisory Council European Conference, Leipzig 2013 57/72

Page 58: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

Submit a job$$ srun -N1 --gres=rgpu:4:512M script.sh...$

j

$

• Environment variables are initialized by SLURM and used byrCUDA client (transparently to the user)

RCUDA_DEVICE_COUNT=4

RCUDA_DEVICE_0=rcu16:0 Server name/IP address : GPURCUDA_DEVICE_1=rcu16:1RCUDA_DEVICE_2=rcu16:2RCUDA_DEVICE_3=rcu16:3

Server name/IP address : GPU

HPC Advisory Council European Conference, Leipzig 2013 58/72

Page 59: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integration

HPC Advisory Council European Conference, Leipzig 2013 59/72

Page 60: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

SLURM integrationno

des

ed fr

om

deco

uple

PUs

are

dG

P

HPC Advisory Council European Conference, Leipzig 2013 60/72

Page 61: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

Outline

GPU computing GPU computing rCUDA framework Other GPU virtualization frameworks rCUDA SLURM integration rCUDA SLURM integration SDK and real applications tests

HPC Advisory Council European Conference, Leipzig 2013 61/72

Page 62: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

Test systemI t l X E5 2620 (6 ) 2 0GH Intel Xeon E5-2620 (6 cores) 2,0GHz

GPU NVIDIA Tesla K20M ll C tX 3 i l t I fi iB d Ad t (FDR) Mellanox ConnectX-3 single-port InfiniBand Adapter (FDR)

Mellanox switch SX6025 (FDR)Ci it h SLM2014 (1G b Eth t) Cisco switch SLM2014 (1Gpbs Ethernet)

CentOS 6.3 + Mellanox OFED 1.5.3

HPC Advisory Council European Conference, Leipzig 2013 62/72

Page 63: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

Test CUDA SDK examples

8

9

CUDArCUDA FDR

5

6

7

rCUDA FDRrCUDA Gb

ed ti

me

3

4

5

Nor

mal

iz

1

2

N

BT IE RE AT TA FF SK CB MC LU MB SC FI HE BN IS LC CG TT IN F2 SN FD BF VR RG MI DI0

HPC Advisory Council European Conference, Leipzig 2013 63/72

Page 64: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

CUDASW++Please, don’t ask

me about this

Bioinformatics software for Smith-Waterman protein database searches

45 18

FDR Overhead QDR Overhead GbE Overhead CUDArCUDA FDR rCUDA QDR rCUDA GbE

)

2530354045

1012141618

erhe

ad (%

)

Tim

e (s

)

5101520

2468

rCU

DA

Ove

Exe

cutio

n

144 189 222 375 464 567 657 729 850 1000 1500 2005 2504 3005 3564 4061 4548 4743 5147 54780 0

Sequence Length

r

HPC Advisory Council European Conference, Leipzig 2013 64/72

Page 65: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA testPlease don’t ask

CUDABLASTaccelerated version of the NCBI-BLAST . Basic Local Alignment

Please, don t ask me about this

Search Tool, a widely used bioinformatics tool.

350 60

FDR Overhead QDR Overhead GbE Overhead CUDArCUDA FDR rCUDA QDR rCUDA GbE

%)

150

200

250

300

30

40

50

Ove

rhea

d (%

n Ti

me

(s)

0

50

100

150

0

10

20

rCU

DA

O

Exe

cutio

n

2 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400

Sequence Length

HPC Advisory Council European Conference, Leipzig 2013 65/72

Page 66: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

Test system for MultiGPUF CUDA t t d ith For CUDA tests one node with: 2 Quad-Core Intel Xeon E5440 processors Tesla S2050 (4 Tesla GPUs) Tesla S2050 (4 Tesla GPUs) Each thread (1-4) uses one GPU

For rCUDA tests 8 nodes with: 2 Quad-Core Intel Xeon E5520 processors 1 NVIDIA Tesla C2050 Infiniband QDR Test running in one node and using up to all the GPUs of the others Each thread (1-8) uses one GPU

HPC Advisory Council European Conference, Leipzig 2013 66/72

Page 67: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

MonteCarlo MultiGPU (from NVIDIA SDK)

HPC Advisory Council European Conference, Leipzig 2013 67/72

Page 68: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA test

GEMM MultiGPU using libflame: high performance dense linear algebra library

http://www.cs.utexas.edu/~flame/web/

HPC Advisory Council European Conference, Leipzig 2013 68/72

Page 69: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

rCUDA work in progress

Scheduling in SLURM taking into account theactual GPU utilizationactual GPU utilization

Test in ARM platforms CARMA board as client and rCUDA daemon on Intel Platform

equipped with NVIDIA Tesla Fermi Functional test with NVIDIA SDK and LAMMPS Functional test with NVIDIA SDK and LAMMPS Limited performance due to 1Gbps ethernet network

Test more applications from Test more applications from“Popular GPU-accelerated Applications”

S t t CUDA 5 5 ( l did t ) Support to CUDA 5.5 (release candidate)

HPC Advisory Council European Conference, Leipzig 2013 69/72

Page 70: rCUDA: a readyrCUDA: a ready-to-use remote GPUuse remote ... · HPC Advisory Council European Conference, Leipzig 2013 2/72. Outline ... Top500 November 2012 HPC Advisory Council

http://www.rcuda.netMore than 300 requests

Thanks to Mellanox for its support to this work

Antonio PeñaC l R ñ Adrian Castelló

rCUDA people(1)

Carlos ReañoFederico Silla

José Duato

Adrian CastellóRafael MayoEnrique S. Quintana-Ortí

(1) Now at Argonne National Lab. (USA)