EUCARD2/WP4:Applications Medium Energy Accelerators/Accelerators for Medicine Introduction
Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer...
-
Upload
damian-hancock -
Category
Documents
-
view
222 -
download
0
Transcript of Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer...
HotPar 20121
Operating Systems Should Manage
Accelerators
Sankaralingam Panneerselvam Michael M. Swift
Computer Sciences DepartmentUniversity of Wisconsin, Madison, WI
HotPar 20122
Accelerators Specialized execution
units for common tasks Data parallel tasks,
encryption, video encoding, XML parsing, network processing etc.
Task offloaded to accelerators High performance,
energy efficient (or both)
GPU
Wire speed processor
APU
DySER
Crypto accelerator
HotPar 20123
Why accelerators? Moore’s law and Dennard’s scaling
Single core to Multi-core Dark Silicon: More transistors but limited
power budget [Esmaeilzadeh et. al - ISCA ‘11]
Accelerators Trade-off area for specialized logic Performance up by 250x and energy
efficiency by 500x [Hameed et. al – ISCA ’10]
Heterogeneous architectures
HotPar 20124
Heterogeneity everywhere Focus on diverse heterogeneous
systems Processing elements with different properties
in the same system
Our work How can OS support this architecture?
1. Abstract heterogeneity2. Enable flexible task execution3. Multiplex shared accelerators among
applications
HotPar 20125
Outline Motivation Classification of accelerators Challenges Our model - Rinnegan Conclusion
HotPar 20127
Classification of accelerators Accelerators classified based on their
accessibilityAcceleration devicesCo-ProcessorAsymmetric cores
HotPar 20128
ClassificationProperties Acceleration
devicesCo-Processor Asymmetric
cores
Systems GPU, APU, Crypto accelerators in SunNiagara chips etc.
C-cores, DySER, IBM wire-speed processor
NVIDIA’s Kal-El processor, Scalable cores like WiDGET
Location Off/On-chip device
On-chip device Regular core that can execute threads
Accessibility
Control Device drivers ISA instructions Thread migration, RPC
Data DMA or zero copy System memory Cache coherency
Resource contention
Multiple applications might want to make use of the device
Sharing of resource across multiple cores could result in contention
Multiple threads might want toaccess the special core
1) Application must deal with different classes of accelerators with different access mechanisms
2) Resources to be shared among multiple applications
HotPar 20129
Challenges Task invocation
Execution on different processing elements
Virtualization Sharing across multiple applications
Scheduling Time multiplexing the accelerators
HotPar 201210
Task invocation Current systems decide
statically on where to execute the task
Programmer problems where system can help Data granularity Power budget limitation Presence/availability of
accelerator
AES
Crypto accelerator
HotPar 201211
Virtualization Data addressing for
accelerators Accelerators need to
understand virtual addresses
OS must provide virtual-address translations
Process preemption after launching a computation System must be aware of
computation completion
0x0
00 0
x1
000
x2
00 0
x5
00 0
x4
000x4
80
00x4
90
0 0x5
00
0
Virtual memor
y
crypto
XML
cryp
t
0x240
Acc
ele
rat
or
unit
s
CPU cores
HotPar 201212
Scheduling Scheduling needed to
prioritize access to shared accelerators
User-mode access to accelerators complicates scheduling decisions OS cannot interpose on
every request
HotPar 201213
Outline Motivation Classification of Accelerators Challenges Our model - Rinnegan Conclusion
HotPar 201214
Rinnegan Goals
Application leverage the heterogeneity OS enforces system wide policies
Rinnegan tries to provide support for different class of accelerators Flexible task execution by Accelerator
stub Accelerator agent for safe multiplexing Enforcing system policy through
Accelerator monitor
HotPar 201215
Accelerator stub Entry point for accelerator
invocation and a dispatcher Abstracts different
processing elements Binding - Selecting the best
implementation for the task
Static: during compilation Early: at process start Late: at call
Process making use
of accelerator
Stub
{Encrypt through
AES instructio
ns}
{Offload
to crypto Accelerat
or}
HotPar 201216
Accelerator agent Manages an
accelerator Roles of an agent
Virtualize accelerator Forwarding requests
to the accelerator Implement
scheduling decisions Expose accounting
information to the OS
Resource allocator
Process trying to offload a parallel task
Stub
User Space
Kernel Space
Asymmetric core
Thread migration
Device driver
Process can communicate directly with accelerator
Stub
Acc
ele
rato
r ag
ents
Hardware
HotPar 201217
Accelerator monitor Monitors information
like utilization of resources
Responsible for system-wide energy and performance goals
Acts as an online modeling tool Helps in deciding where
to schedule the task
Accelerator agent
Process making use of
accelerator
Stub
Accelerator
monitor
User Space
Kernel Space
Accelerator
Hardware
HotPar 201218
Programming model Based on task level
parallelism Task creation and
execution Write: different
implementations Launch: similar to function
invocation Fetching result: supporting
synchronous/asynchronous invocation
Example: Intel TBB/Cilk
application(){ task = cryptStub(inputData, outputData); ... waitFor(task);}
<TaskHandle> cryptStub(input, output){ if(factor1 == true) output = implmn_AES(input) //or// else if(factor2 == true) output = implmn_CACC(input)}
HotPar 201219
Summary
Accelerator agent
Process making use of accelerator
Stub
Accelerator monitor
Other kernel components
Process User Space
Kernel Space
CPU cores Accelerator
Direct communication with accelerator
HotPar 201220
Related work Task invocation based on data
granularityMerge, Harmony
Generating device specific implementation of same task
GPUOcelot Resource scheduling: Sharing
between multiple applicationsPTask, Pegasus
HotPar 201221
Conclusions Accelerators common in future
systems Rinnegan embraces accelerators
through stub mechanism, agent and monitoring components
Other challenges Data movement between devices Power measurement