Basic Loop Parallelization
description
Transcript of Basic Loop Parallelization
Basic Loop ParallelizationBasic Loop Parallelization
Taewook Eom
RTOS Lab., School of EECS, SNU
2
OutlineOutline
Data dependences
Loop parallelization
How to extract data dependence information
3
Loop Parallelization GoalLoop Parallelization Goal
DoAll loopsLoops whose iterations can execute in parallel
Example
For (i = 0; i < n; i++)
A[i] = B[i] + C[i]
Statically assign each iteration to a processor
4
Data Dependence of ScalarsData Dependence of Scalars
Data DependencesTrue dependence: write read
Anti dependence: read write
Output dependence: write write
Data dependence exists from a dynamic instance i to i’ iff
either i or i’ is a write operation
i and i’ refer to the same variable
i executes before i’
5
Data Dependences for Arrays (1)Data Dependences for Arrays (1)
DoAll loop vs. DoAcross loop
Loop-carried vs. Loop-independent dependenceDependence exists across iterations vs. within one iteration
Recognize DoAll loops (intuitively)Find data dependences in loops
No dependences crossing iteration boundaries ⇒ Parallelizable
for (i = 2; i < 5; i++)A[i] = A[i-2] + 1
for (i = 0; i < 3; i++)A[i] = A[i] + 1
6
Data Dependences for Arrays (2)Data Dependences for Arrays (2)
AssumptionsA loop is in canonical form
• Index runs from 1 to N by increments of 1
Perfectly nested loops• Only the innermost loop has statements other than for
statements
ExampleFor (i1 = 1; i1 <= n1; i1++)
For (i2 = 1; i2 <= n2; i2++)
For (i3 = 1; i3 <= n3; i3++)
A[i1, i2, i3] = A[i1, i2, i3] + 1;
7
Iteration Space (1)Iteration Space (1)
Iteration is represented as coordinates in iteration space
n-dimensional discrete space for n-deep loops
Iteration space diagrams show dependences between all iterations of a loop
Vs. Dependence graph shows all dependences between two statements in a loop
Example
i
j
Iteration Space Traversal
FOR (i = 0; i < 5; i++) FOR (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1
8
Iteration Space (2)Iteration Space (2)
Sequential execution order of iterationLexicographic order
[0,0], [0,1], … , [0,6], [1,1], … , [1,6], [2,2], …
Iteration is lexicographically less than , iff
Examples• (1,6,0) < (2, 0, 0)
• (0,3,1) < (0,5,0)
cccc iiandiiiitsc )..,()..,.(. 1111
i
i
9
Distance Vectors (1)Distance Vectors (1)
A loop has a distance vectorIf there exists data dependence from a node to a later node and
Indicates how many iterations apart are source and sink of dependence
Example
Distance vector = [1,1]
FOR (i = 0; i < 5; i++) FOR (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1
iid
i
i
d
i
j
Iteration Space
10
Distance Vectors (2)Distance Vectors (2)
Since then is lexicographically
greater than or equal to 0
All dependence must point forward in time
Distance vector (infinitely large set)[0,0], [0,1], … , [0,n], [1,-n], … , [1,0], … , [1,n], … , [n,-n], … , [n,0], … , [n,n]
Summary: Direction vectors0 or lexicographically positive
[0,0], [0,+], [+,-], [+,0], [+,+]
ii
d 0d
11
Direction Vectors (1)Direction Vectors (1)
The sign of the distance
There are different notations< or +: dependence from earlier to later iteration
= or 0: dependence within the same iteration
> or -: dependence from later to earlier iteration
ExampleFor (i = 0; i < n; i++) For (j = 0; j < n; j++) A[i, j] = A[i-2, j+3]
i = (1, 4)i’ = (3, 1)d = i’ - i = (2, -3)Direction = (<, >) or (+, -)
12
Direction Vectors (2)Direction Vectors (2)
What can the direction vectors for 3-D loops be like?
[? ? ?]
[0 ? ?]
[0 0 ?]
[0 0 0]
[0 0 - ]
[0 0 +]
[0 + ?]
[+ ? ?]
[0 - ?]
[ - ? ?]
Backwarddirection
…
…
13
Dependence VectorsDependence Vectors
Dependence VectorSet of distance vectors
ExampleFor (I = 0; i < n; i++)
For (j = 0; j < n; j++) {
A[i][j] = A[i][j] + 1;
B[i][j] = B[i][j-1] + 1;
}
Test for parallelization:A loop with data dependence is parallelizable if for all = (0)d
Dd
Distance vectors : [0, 0], [0, 1]Dependence vectors : [0,[0, 1]]
14
n-Dimensional Loopsn-Dimensional Loops
1
0FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i, j-1] + 1
FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i-1, j] + 1
FOR (i = 0; i < n; i++) FOR (j = 0; j < n; j++) A[i, j] = A[i-1, j-1] + 1
0
1
1
1
j
i
j
i
j
i
Loop i isparallelizable
Loop j isparallelizable
Loop j isparallelizable
15
Test for ParallelizationTest for Parallelization
is loop-carried at level iif di is the first non-zero element
⇒ does not affect the parallelizability of inner loop
Ex) (0,0,2,0,1, …) : level 3 dependence
The i th loop of an n-dimensional loop is parallelizable if there does not exist any level i dependences
The i th loop of an n-dimensional loop is parallelizable for all dependences , either
(d1, … ,di-1) > 0 (= lexicographically forward)
(d1, … ,di) = 0
)...,,( 1 nddd
),...,,( 21 ndddd
16
Improving ParallelizabilityImproving Parallelizability
Scalar Privatization
Loop-carried dependence is an anti dependence• No value is really carried across the loop
Data is local to each loop iteration• Eliminating loop carried dependences
Scalar privatization should be applied before any serious parallelization attempt
float t;for (i = 0; i < n; i++) { t = A[i]; b[i] = t * t;}
for (i = 0; i < n; i++) { float t; t = A[i]; b[i] = t * t;}
True dependence(loop-indep.)
Anti dependence(loop-carried)
Not parallelizable!! Parallelizable!!
17
SummarySummary
Data Dependenceread/write
same variable
in direction of sequential ordering
RepresentationIteration space
Dependence vectors• Distance vectors
• Direction vectors
Parallelization Testing