Work Stealing Scheduler
description
Transcript of Work Stealing Scheduler
Work Stealing Scheduler 1
WORK STEALING SCHEDULER
6/16/2010
Work Stealing Scheduler 2
Announcements• Text books• Assignment 1• Assignment 0 results• Upcoming Guest lectures
6/16/2010
Work Stealing Scheduler 3
Recommended Textbooks
6/16/2010
The Art of Multiprocessor Programming
Maurice Herlihy, Nir Shavit
Patterns for ParallelProgramming
Timothy Mattson, et.al.
Parallel Programming with Microsoft .NET
http://parallelpatterns.codeplex.com/
Work Stealing Scheduler 4
Available on Website• Assignment 1 (due two weeks from now)• Paper for Reading Assignment 1 (due next week)
6/16/2010
Work Stealing Scheduler 5
Assignment 0• Thanks for submitting • Provided important feedback to us
6/16/2010
Work Stealing Scheduler 6
Percentage Students Answering Yes
6/16/2010
Vector C
lock
Empiric
al Measu
rement
Fence
s
Functi
onal Progra
mming
Lock
Ordering
Recurre
nceACID
Race Condition
0
20
40
60
80
100
Work Stealing Scheduler 7
Number of Yes Answers Per Student
6/16/2010
0 1 2 3 4 5 6 7 80
2
4
6
8
10
12
14
Work Stealing Scheduler 8
Upcoming Guest Lectures• Apr 12: Todd Mytkowicz • How to (and not to) measure performance
• Apr 19: Shaz Qadeer• Correctness Specifications and Data Race Detection
6/16/2010
Work Stealing Scheduler 9
Last Lecture Recap
6/16/2010
Work Stealing Scheduler 10
Last Lecture Recap• Parallel computation can be represented as a DAG• Nodes represent sequential computation• Edges represent dependencies
6/16/2010
Split
Sequential Sort
Merge
Sequential Sort
Time
Work Stealing Scheduler 11
Last Lecture Recap• Parallel computation can be represented as a DAG
• = Work = time on a single processor
• = Depth = time assuming infinite processors
• = time on P processor using optimal scheduler
6/16/2010
Work Stealing Scheduler 12
Last Lecture Recap• Work law:
• Depth law:
6/16/2010
Work Stealing Scheduler 13
Last Lecture Recap• Work law:
• Depth law:
• Greedy scheduler is optimal within a factor of 2
6/16/2010
Work Stealing Scheduler 14
This Lecture• Design of a greedy scheduler• Task abstraction• Translating high-level abstractions to tasks
6/16/2010
Work Stealing Scheduler 15
This Lecture• Design of a greedy scheduler• Task abstraction• Translating high-level abstractions to tasks
6/16/2010
.NET.NETProgram
IntelTBB
TBBProgram
… Scheduler
Work Stealing Scheduler 16
(Simple) Tasks• A node in the DAG• Executing a task generates dependent subtasks
6/16/2010
Split(16) Merge(16)
Split(8) Merge(8)Sort(4)
Sort(4)
Split(8) Merge(8)Sort(4)
Sort(4)
Work Stealing Scheduler 17
(Simple) Tasks• A node in the DAG• Executing a task generates dependent subtasks• Note: Task C is generated by A or B, whoever finishes last
6/16/2010
Split(16) Merge(16)
Split(8) Merge(8)Sort(4)
Sort(4)
Split(8) Merge(8)Sort(4)
Sort(4)
CA
B
Work Stealing Scheduler 18
Design Constraints of the Scheduler• The DAG is generated dynamically• Based on inputs and program control flow• The graph is not known ahead of time
• The amount of work done by a task is dynamic• The weight of each node is not know ahead of time
• Number of processors P can change at runtime• Hardware processors are shared with other processes, kernel
6/16/2010
Work Stealing Scheduler 19
Design Requirements of the Scheduler
6/16/2010
Work Stealing Scheduler 20
Design Requirements of the Scheduler• Should be greedy• A processor cannot be idle when tasks are pending
• Should limit communication between processors
• Should schedule related tasks in the same processor• Tasks that are likely to access the same cachelines
6/16/2010
Work Stealing Scheduler 21
Attempt 0: Centralized Scheduler• “Manager distributes tasks to others”
• Manager: assigns tasks to workers, ensures no worker is idle
• Workers: On task completion, submit generated tasks to the manager
6/16/2010
Work Stealing Scheduler 22
Attempt 1: Centralized Work Queue• “All processors share a common work queue”
• Every processor dequeues a task from the work queue
• On task completion, enqueue the generated tasks to the work queue
6/16/2010
Work Stealing Scheduler 23
Attempt 2: Work Sharing• “Loaded workers share”
• Every processor pushes and pops tasks into a local work queue
• When the work queue gets large, send tasks to other processors
6/16/2010
Work Stealing Scheduler 24
Disadvantages of Work Sharing• If all processors are busy, each will spend time trying to
offload• “Perform communication when busy”
• Difficult to know the load on processors• A processor with two large tasks might take longer than a processor
with five small tasks
• Tasks might get shared multiple times before being executed
• Some processors can be idle while others are loaded• Not greedy
6/16/2010
Work Stealing Scheduler 25
Attempt 3: Work Stealing• “Idle workers steal”
• Each processor maintains a local work queue• Pushes generated tasks into the local queue• When local queue is empty, steal a task from another
processor
6/16/2010
Work Stealing Scheduler 26
Nice Properties of Work Stealing• Communication done only when idle• No communication when all processors are in full throttle
• Each task is stolen at most once
• This scheduler is greedy, assuming stealers always succeed
• Limited communication• steals on average for some stealing strategies
6/16/2010
Work Stealing Scheduler 27
Nice Properties of Work Stealing• Communication done only when idle• No communication when all processors are in full throttle
• Each task is stolen at most once
• This scheduler is greedy, assuming stealers always succeed
• Limited communication• steals on average for some stealing strategies
• Assignment 1 explores different stealing strategies
6/16/2010
Work Stealing Scheduler 28
Work Stealing Queue Datastructure• A specialized deque (Double-Ended Queue) with three
operations:• Push : Local processor adds newly created tasks • Pop : Local processor removes task to execute• Steal : Remote processors remove tasks
6/16/2010
Push
Work Stealing Scheduler 29
Work Stealing Queue Datastructure• A specialized deque (Double-Ended Queue) with three
operations:• Push : Local processor adds newly created tasks • Pop : Local processor removes task to execute• Steal : Remote processors remove tasks
6/16/2010
Push Pop?
Steal?
Pop?
Steal?
Work Stealing Scheduler 30
Work Stealing Queue Datastructure• A specialized deque (Double-Ended Queue) with three
operations:• Push : Local processor adds newly created tasks • Pop : Local processor removes task to execute• Steal : Remote processors remove tasks
6/16/2010
Push
Steal
Pop
Work Stealing Scheduler 31
Advantages• Stealers don’t interact with local processor when the
queue has more than one task• Popping recently pushed tasks improves locality• Stealers take the oldest tasks, which are likely to be
the largest (in practice)
6/16/2010
Push
Steal
Pop
Work Stealing Scheduler 32
For Assignment 1• We provide an implementation of Work Stealing
Queue• This implementation is thread-safe• Clients can concurrently call the push, pop, steal operations• You don’t need to use additional locks or other
synchronization• Implement the scheduler logic and stealing strategy• Hint: this implementation is single threaded
6/16/2010