Last time: Runtime infrastructure for hybrid (GPU-based) platforms Task scheduling
Extracting performance models at runtime
Memory management Asymmetric Distributed Shared Memory
StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst. TR-7240, INRIA, March 2010. [link]
An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu, ASPLOS’10 [pdf]
Today: Bridging runtime and language support – 'Virtualizing GPUs'
Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee, PPoPP'11 [pdf]
Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al., HPDC 2011 (best paper!)
Context: clouds shift to support HPC applications
Initially, tightly coupled applications were not well suited to cloud platforms.
Today: Chinese cloud with 40 Gbps InfiniBand; Amazon HPC instances; GPU instances: Amazon, Nimbix
Challenge: make GPUs a shared resource in the cloud.
Why do this? GPUs are costly resources
Multiple VMs on a node with a single GPU: increase utilization
App level: some apps might not use GPUs much; kernel level: some kernels can be collocated
1. The 'How?'
Preamble: Concurrent kernels are supported by today's GPUs. Each kernel can execute a different task. Tasks can be mapped to different streaming multiprocessors (using the thread-block configuration).
Problem: concurrent execution is limited to the set of kernels invoked within a single process context.
Past virtualization solutions: API rerouting / intercept library
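The limitation above can be made concrete with a small CUDA sketch (hypothetical kernels, assuming a device with concurrent-kernel support, i.e. Fermi or later): kernels launched in different streams of the *same* process context may overlap on different SMs, whereas kernels issued from separate processes belong to separate contexts and get serialized by the driver.

```cuda
#include <cstdio>

// Two trivial kernels standing in for independent tasks.
__global__ void taskA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f;
}
__global__ void taskB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = y[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Two streams in ONE process context: the hardware may run
    // taskA and taskB concurrently on different SMs.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    taskA<<<n / 256, 256, 0, s1>>>(x, n);
    taskB<<<n / 256, 256, 0, s2>>>(y, n);
    cudaDeviceSynchronize();

    // The same two kernels issued from two *different* processes would
    // fall into different contexts and be time-sliced, not run
    // concurrently -- the gap a consolidation framework addresses.
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```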
2. Evaluation – The opportunity
Key assumption: under-utilization of GPUs
Sharing:
Space-sharing – kernels occupy different SMs
Time-sharing – kernels time-share the same SMs (benefit from hardware support for context switches). Note: resource conflicts may make this impossible.
Molding – change kernel configuration (different number of thread blocks / threads per block) to improve collocation
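Molding can be sketched as relaunching the same kernel under a different grid/block shape so that two collocated kernels fit on the device together. For this to be safe, the kernel must be written in grid-stride style so that any launch configuration computes the same result (the kernel and configurations below are illustrative assumptions, not the paper's code):

```cuda
#include <cstdio>

// Grid-stride kernel: correct under ANY <<<blocks, threads>>> shape,
// which is what makes molding possible.
__global__ void scale(float *x, int n, float a) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x)
        x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    // "Greedy" configuration: enough thread blocks to cover all elements.
    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);

    // "Molded" configuration: fewer, smaller blocks, freeing SMs and
    // registers so another tenant's kernel can be collocated.
    scale<<<32, 128>>>(x, n, 2.0f);

    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}
```

The trade-off: molding may slow the molded kernel down individually, but can raise aggregate throughput when two kernels share the GPU.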