Implementing a Task Decomposition
INTEL CONFIDENTIAL
Introduction to Parallel Programming – Part 9
Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.
Review & Objectives
Previously:
  Described how the OpenMP task pragma is different from the for pragma
  Showed how to code task decomposition solutions for while loop and recursive tasks, with the OpenMP task construct
At the end of this part you should be able to:
  Design and implement a task decomposition solution
Case Study: The N Queens Problem
Is there a way to place N queens on an N-by-N chessboard such that no queen threatens another queen?
A Solution to the 4 Queens Problem
Exhaustive Search
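As a rough illustration of the search these slides build on, here is a minimal serial sketch in C. It assumes a depth-first search that places one queen per row and abandons a branch as soon as a queen is threatened, which is how the search function later in the deck behaves. The names safe, count_from, and count_solutions and the limit MAX_N = 16 are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>

#define MAX_N 16   /* assumed upper bound on board size */

/* Return 1 if a queen at (row, col) is not threatened by the queens
   already placed in rows 0 .. row-1. */
static int safe(const int places[], int row, int col)
{
    for (int r = 0; r < row; r++) {
        int d = places[r] - col;
        if (d == 0 || d == row - r || d == r - row)
            return 0;   /* same column or same diagonal */
    }
    return 1;
}

/* Count the complete boards reachable from a partial placement
   that already fills rows 0 .. row-1. */
static int count_from(int n, int places[], int row)
{
    if (row == n)
        return 1;       /* every row holds a queen */
    int count = 0;
    for (int col = 0; col < n; col++) {
        if (safe(places, row, col)) {
            places[row] = col;
            count += count_from(n, places, row + 1);
        }
    }
    return count;
}

int count_solutions(int n)
{
    assert(n >= 1 && n <= MAX_N);
    int places[MAX_N];
    return count_from(n, places, 0);
}
```

Under these assumptions count_solutions(4) returns 2 and count_solutions(8) returns 92, the known solution counts for those board sizes.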
Design #1 for Parallel Search
Create threads to explore different parts of the search tree simultaneously
If a node has children
  The thread creates child nodes
  The thread explores one child node itself
  The thread creates a new thread for every other child node
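The shape of Design #1 can be sketched with the OpenMP task construct covered in the previous part. This is an approximation under stated assumptions: tasks stand in for the "new threads", and the OpenMP runtime schedules them on an existing team rather than creating a thread per node. The names state, safe, explore, and count_solutions_tasks are hypothetical.

```c
#include <assert.h>

#define MAX_N 16   /* assumed upper bound on board size */

struct state { int places[MAX_N]; };   /* queen column for each row */

static int safe(const int places[], int row, int col)
{
    for (int r = 0; r < row; r++) {
        int d = places[r] - col;
        if (d == 0 || d == row - r || d == r - row)
            return 0;
    }
    return 1;
}

/* Explore the subtree below a partial board that fills rows 0 .. row-1.
   The board is passed by value, so every call works on its own copy. */
static void explore(int n, struct state s, int row, int *count)
{
    if (row == n) {
        #pragma omp critical
        *count += 1;
        return;
    }
    int own_col = -1;   /* the one child this thread keeps for itself */
    for (int col = 0; col < n; col++) {
        if (!safe(s.places, row, col))
            continue;
        if (own_col < 0) {
            own_col = col;
            continue;
        }
        struct state child = s;   /* private copy for the spawned task */
        child.places[row] = col;
        #pragma omp task firstprivate(n, child, row, count)
        explore(n, child, row + 1, count);
    }
    if (own_col >= 0) {
        s.places[row] = own_col;
        explore(n, s, row + 1, count);
    }
}

int count_solutions_tasks(int n)
{
    assert(n >= 1 && n <= MAX_N);
    int count = 0;
    struct state root = { {0} };
    #pragma omp parallel
    #pragma omp single
    explore(n, root, 0, &count);
    /* The barrier that ends the parallel region waits for all tasks. */
    return count;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.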
Design #1 for Parallel Search
[Figure: search tree in which Thread W explores one child node itself and new threads X, Y, and Z are created for the other child nodes.]
Pros and Cons of Design #1
Pros
  Simple design, easy to implement
  Balances work among threads
Cons
  Too many threads created
  Lifetime of threads too short
  Overhead costs too high
Design #2 for Parallel Search
One thread created for each subtree rooted at a particular depth
Each thread sequentially explores its subtree
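One way to sketch Design #2, assuming the subtrees are rooted at depth 1 (one per placement of the first-row queen): a parallel loop with a static schedule of chunk size 1 hands each subtree to one thread, which then explores it sequentially. The num_threads(n) clause only requests n threads; a runtime may supply fewer, in which case subtrees are dealt out round-robin. All function names here are illustrative assumptions.

```c
#include <assert.h>

#define MAX_N 16   /* assumed upper bound on board size */

static int safe(const int places[], int row, int col)
{
    for (int r = 0; r < row; r++) {
        int d = places[r] - col;
        if (d == 0 || d == row - r || d == r - row)
            return 0;
    }
    return 1;
}

/* Sequential exploration of one subtree. */
static int count_from(int n, int places[], int row)
{
    if (row == n)
        return 1;
    int count = 0;
    for (int col = 0; col < n; col++) {
        if (safe(places, row, col)) {
            places[row] = col;
            count += count_from(n, places, row + 1);
        }
    }
    return count;
}

int count_solutions_per_subtree(int n)
{
    assert(n >= 1 && n <= MAX_N);
    int total = 0;
    /* Subtree assignment is fixed up front, so a thread that draws a
       small subtree sits idle while the others finish. */
    #pragma omp parallel for num_threads(n) schedule(static, 1) reduction(+:total)
    for (int col = 0; col < n; col++) {
        int places[MAX_N];   /* private board for this subtree */
        places[0] = col;
        total += count_from(n, places, 1);
    }
    return total;
}
```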
Design #2 in Action
[Figure: search tree with three subtrees, explored by Thread 1, Thread 2, and Thread 3 respectively.]
Pros and Cons of Design #2
Pros
  Thread creation/termination time minimized
Cons
  Subtree sizes may vary dramatically
  Some threads may finish long before others
  Imbalanced workloads lower efficiency
Design #3 for Parallel Search
Main thread creates work pool—list of subtrees to explore
Main thread creates finite number of co-worker threads
Each subtree exploration is done by a single thread
Inactive threads go to pool to get more work
Work Pool Analogy
More rows than workers
Each worker takes an unpicked row and picks the crop
After completing a row, the worker takes another unpicked row
Process continues until all rows have been harvested
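The analogy can be sketched directly in C with OpenMP, assuming the "rows" are array indices and the pool is a shared counter naming the next unpicked row. NUM_ROWS, harvest, and picked_count are hypothetical names chosen for this illustration.

```c
#include <assert.h>

#define NUM_ROWS 12   /* assumed: more rows than workers */

/* Every worker repeatedly claims the next unpicked row until none remain.
   picked_count[r] records how many times row r was picked. */
void harvest(int picked_count[NUM_ROWS])
{
    int next_row = 0;   /* shared by all workers */

    assert(picked_count != 0);
    #pragma omp parallel
    {
        for (;;) {
            int mine;
            /* Claiming a row must be exclusive; otherwise two workers
               could read the same index and pick the same row. */
            #pragma omp critical
            {
                mine = next_row;
                next_row++;
            }
            if (mine >= NUM_ROWS)
                break;              /* nothing left to pick */
            picked_count[mine]++;   /* only one worker ever holds this index */
        }
    }
}
```

After harvest returns, every entry of a zero-initialized picked_count array equals 1: each row was picked exactly once, no matter how many workers took part.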
Design #3 in Action
[Figure: search tree with five subtrees labeled Thread 1, Thread 2, Thread 3, Thread 3, and Thread 1; Threads 3 and 1 each take a second subtree from the work pool.]
Pros and Cons of Strategy #3
Pros
  Thread creation/termination time minimized
  Workload balance better than strategy #2
Cons
  Threads need exclusive access to data structure containing work to be done, a sequential component
  Workload balance worse than strategy #1
Conclusion
  Good compromise between designs 1 and 2
Implementing Strategy #3 for N Queens
Work pool consists of N boards representing N possible placements of queen on first row
Parallel Program Design
One thread creates list of partially filled-in boards
Fork: Create one thread per core
Each thread repeatedly gets a board from the list, searches for solutions, and adds to the solution count, until no more boards remain on the list
Join: Occurs when list is empty
One thread prints number of solutions found
Search Tree Node Structure
/* The 'board' struct contains information about a node in the
   search tree; i.e., a partially filled-in board. The work pool
   is a singly linked list of 'board' structs. */
struct board {
    int pieces;            /* # of queens on board */
    int places[MAX_N];     /* Queen's pos in each row */
    struct board *next;    /* Next search tree node */
};
Key Code in main Function
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
    initial = (struct board *)malloc(sizeof(struct board));
    initial->pieces = 1;
    initial->places[0] = i;
    initial->next = stack;
    stack = initial;
}
num_solutions = 0;
search_for_solutions (n, stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);
Insertion of OpenMP Code
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
    initial = (struct board *)malloc(sizeof(struct board));
    initial->pieces = 1;
    initial->places[0] = i;
    initial->next = stack;
    stack = initial;
}
num_solutions = 0;
#pragma omp parallel
search_for_solutions (n, stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);
Original C Function to Get Work
void search_for_solutions (int n, struct board *stack, int *num_solutions)
{
    struct board *ptr;
    void search (int, struct board *, int *);

    while (stack != NULL) {
        ptr = stack;
        stack = stack->next;
        search (n, ptr, num_solutions);
        free (ptr);
    }
}
C/OpenMP Function to Get Work
void search_for_solutions (int n, struct board *stack, int *num_solutions)
{
    struct board *ptr;
    void search (int, struct board *, int *);

    while (stack != NULL) {
        #pragma omp critical
        {
            ptr = stack;
            stack = stack->next;
        }
        search (n, ptr, num_solutions);
        free (ptr);
    }
}
Original C Search Function
void search (int n, struct board *ptr, int *num_solutions)
{
    int i;
    int no_threats (struct board *);

    if (ptr->pieces == n) {
        (*num_solutions)++;
    } else {
        ptr->pieces++;
        for (i = 0; i < n; i++) {
            ptr->places[ptr->pieces-1] = i;
            if (no_threats(ptr))
                search (n, ptr, num_solutions);
        }
        ptr->pieces--;
    }
}
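The slides declare no_threats but never show its body. A plausible sketch follows, assuming that places[r] holds the column of the queen in row r and that only the most recently placed queen needs checking, since search tests the board after every placement. The value MAX_N = 16 is also an assumption.

```c
#include <assert.h>

#define MAX_N 16   /* assumed; the slides do not give a value */

struct board {
    int pieces;            /* # of queens on board */
    int places[MAX_N];     /* Queen's pos in each row */
    struct board *next;    /* Next search tree node */
};

/* Return 1 if the queen in the last filled row (pieces - 1) is not
   threatened by any queen in an earlier row, 0 otherwise. */
int no_threats(struct board *ptr)
{
    assert(ptr->pieces >= 1 && ptr->pieces <= MAX_N);
    int last = ptr->pieces - 1;
    for (int row = 0; row < last; row++) {
        int dcol = ptr->places[row] - ptr->places[last];
        if (dcol == 0)                                /* same column */
            return 0;
        if (dcol == last - row || dcol == row - last) /* same diagonal */
            return 0;
    }
    return 1;
}
```

Queens never share a row in this representation, so only columns and diagonals are tested.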
C/OpenMP Search Function
void search (int n, struct board *ptr, int *num_solutions)
{
    int i;
    int no_threats (struct board *);

    if (ptr->pieces == n) {
        #pragma omp critical
        (*num_solutions)++;
    } else {
        ptr->pieces++;
        for (i = 0; i < n; i++) {
            ptr->places[ptr->pieces-1] = i;
            if (no_threats(ptr))
                search (n, ptr, num_solutions);
        }
        ptr->pieces--;
    }
}
Only One Problem: It Doesn’t Work!
OpenMP program throws an exception
Culprit: Variable stack
[Figure: the variable stack pointing to the list of boards on the heap.]
Problem Site
int main ()
{
    struct board *stack;
    ...
    #pragma omp parallel
    search_for_solutions(n, stack, &num_solutions);
    ...
}

void search_for_solutions (int n, struct board *stack, int *num_solutions)
{
    ...
    while (stack != NULL) ...
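The root of the problem is that stack is passed by value, so every call receives its own copy of the pointer. That effect can be reproduced without any threads. In the sketch below, two consecutive calls stand in for two threads; the names node, grab_by_value, and same_element_grabbed_twice are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *next;
};

/* Mirrors the problem site: 'stack' is a parameter, so advancing it
   changes only this call's private copy of the pointer. */
static struct node *grab_by_value(struct node *stack)
{
    assert(stack != NULL);
    struct node *ptr = stack;
    stack = stack->next;   /* the caller's pointer is not affected */
    (void)stack;
    return ptr;
}

/* Return 1 if two grabs hand out the same element while the caller's
   pointer still refers to the first element. */
int same_element_grabbed_twice(void)
{
    struct node second = { 2, NULL };
    struct node first  = { 1, &second };
    struct node *stack = &first;

    struct node *a = grab_by_value(stack);   /* stands in for thread 1 */
    struct node *b = grab_by_value(stack);   /* stands in for thread 2 */

    return a == b && stack == &first;
}
```

The function returns 1: both grabs return the first element, which is the situation the following slides walk through.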
1. Both Threads Point to Top
[Figure: Thread 1 and Thread 2 each have their own stack pointer; both point to the top of the list.]
2. Thread 1 Grabs First Element
[Figure: Thread 1 and Thread 2 with their stack and ptr pointers after Thread 1 takes the first element.]
3. Thread 2 Grabs “Next” Element
[Figure: Thread 1 and Thread 2, each with its own stack and ptr pointers.]
Error #1: Thread 2 grabs same element
4. Thread 1 Deletes Element
[Figure: Thread 1 and Thread 2 after Thread 1 deletes the element; Thread 2's pointer is marked "?".]
Error #2: Thread 2’s stack pointer dangles
Demonstrate error #2
Thread 1 hits critical region and reads stack
Demonstrate error #2
Thread 1 copies stack to ptr
Demonstrate error #2
Thread 1 advances stack
Demonstrate error #2
Thread 1 exits critical region
Demonstrate error #2
Thread 1 frees ptr
Thread 2's stack points to undefined value
Remedy 1: Make stack Static
[Figure: Thread 1 and Thread 2 both refer to a single shared stack variable.]
Remedy 1: Make stack Static
[Figure: one shared stack variable; Thread 1 and Thread 2 each have their own ptr.]
Why would this work?
Remedy 1: Make stack Static
Thread 1 enters critical region
Remedy 1: Make stack Static
Thread 1 copies stack to ptr
Remedy 1: Make stack Static
Thread 1 advances stack
Remedy 1: Make stack Static
Thread 1 exits critical region
Remedy 1: Make stack Static
Thread 1 frees ptr – no dangling memory
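Remedy 1 can be sketched on a simplified pool, assuming a list of integers instead of boards so the example stays short. The file-scope static variable gives all threads one shared stack pointer. Unlike the slide code, this sketch tests for an empty list inside the critical region, so the test and the removal happen as one step. The names item, push, pop, and sum_all are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
    int value;
    struct item *next;
};

/* File scope: a single copy, shared by every thread. */
static struct item *stack = NULL;

static void push(int value)
{
    struct item *it = malloc(sizeof *it);
    assert(it != NULL);
    it->value = value;
    it->next = stack;
    stack = it;
}

/* Remove and return the top item, or NULL when the pool is empty. */
static struct item *pop(void)
{
    struct item *ptr;
    #pragma omp critical
    {
        ptr = stack;
        if (ptr != NULL)
            stack = ptr->next;
    }
    return ptr;
}

/* Fill the pool with 1 .. count, then let every thread drain it. */
long sum_all(int count)
{
    long total = 0;
    for (int i = 1; i <= count; i++)
        push(i);

    #pragma omp parallel
    {
        struct item *ptr;
        long local = 0;   /* per-thread partial sum */
        while ((ptr = pop()) != NULL) {
            local += ptr->value;
            free(ptr);
        }
        #pragma omp critical
        total += local;
    }
    return total;
}
```

Because every thread advances the same stack variable, an item that one thread has removed can no longer be reached by another thread. Under these assumptions sum_all(100) returns 5050.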
Remedy 2: Use Indirection (Best choice)
[Figure: Thread 1 and Thread 2 both receive &stack, the address of the single stack variable.]
The data is now encapsulated inside function calls and no longer depends on a “global/static” variable that other code could overwrite
Corrected main Function
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
    initial = (struct board *)malloc(sizeof(struct board));
    initial->pieces = 1;
    initial->places[0] = i;
    initial->next = stack;
    stack = initial;
}
num_solutions = 0;
#pragma omp parallel
search_for_solutions (n, &stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);
Corrected Stack Access Function
void search_for_solutions (int n, struct board **stack, int *num_solutions)
{
    struct board *ptr;
    void search (int, struct board *, int *);

    while (*stack != NULL) {
        #pragma omp critical
        {
            ptr = *stack;
            *stack = (*stack)->next;
        }
        search (n, ptr, num_solutions);
        free (ptr);
    }
}
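In the function above, the test *stack != NULL runs outside the critical region, so two threads could both pass it when a single board remains. The following is a hedged variant, not the deck's code: it moves the emptiness test inside the critical region and wraps the pieces into one self-contained program. The body of no_threats, the value of MAX_N, and the wrapper count_solutions are assumptions added for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_N 16   /* assumed; the slides do not give a value */

struct board {
    int pieces;            /* # of queens on board */
    int places[MAX_N];     /* Queen's pos in each row */
    struct board *next;    /* Next search tree node */
};

/* Assumed implementation: check only the most recently placed queen. */
static int no_threats(struct board *ptr)
{
    int last = ptr->pieces - 1;
    for (int row = 0; row < last; row++) {
        int dcol = ptr->places[row] - ptr->places[last];
        if (dcol == 0 || dcol == last - row || dcol == row - last)
            return 0;
    }
    return 1;
}

static void search(int n, struct board *ptr, int *num_solutions)
{
    if (ptr->pieces == n) {
        #pragma omp critical
        (*num_solutions)++;
    } else {
        ptr->pieces++;
        for (int i = 0; i < n; i++) {
            ptr->places[ptr->pieces - 1] = i;
            if (no_threats(ptr))
                search(n, ptr, num_solutions);
        }
        ptr->pieces--;
    }
}

static void search_for_solutions(int n, struct board **stack,
                                 int *num_solutions)
{
    for (;;) {
        struct board *ptr;
        /* Test and removal form one exclusive step. */
        #pragma omp critical
        {
            ptr = *stack;
            if (ptr != NULL)
                *stack = ptr->next;
        }
        if (ptr == NULL)
            break;          /* work pool is empty */
        search(n, ptr, num_solutions);
        free(ptr);
    }
}

int count_solutions(int n)
{
    assert(n >= 1 && n <= MAX_N);
    struct board *stack = NULL;
    int num_solutions = 0;

    /* Work pool: one board per placement of the first-row queen. */
    for (int i = 0; i < n; i++) {
        struct board *initial = malloc(sizeof *initial);
        assert(initial != NULL);
        initial->pieces = 1;
        initial->places[0] = i;
        initial->next = stack;
        stack = initial;
    }

    #pragma omp parallel
    search_for_solutions(n, &stack, &num_solutions);

    return num_solutions;
}
```

Each board removed from the pool belongs to exactly one thread, so search can modify it without further locking; only the pool pointer and the solution count are shared.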
References
Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon, Parallel Programming in OpenMP, Morgan Kaufmann (2001).
Barbara Chapman, Gabriele Jost, Ruud van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, MIT Press (2008).
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).