Loop Scheduling and Software Pipelining
-
Upload
dorian-morris -
Category
Documents
-
view
46 -
download
3
description
Transcript of Loop Scheduling and Software Pipelining
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 1
Loop Scheduling and Software Pipelining
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 2
Reading List
• Slides: Topic 7 and 7a
• Other papers as assigned in class
or homework:
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 3
ABET Outcome
• Ability to apply knowledge of basic code generation techniques, e.g. Loop
scheduling e.g. software pipelining techniques to solve code generation
problems.
• An ability to identify, formulate and solve loops scheduling problems using
software pipelining techniques
• Ability to analyze the basic algorithms on the above techniques and
conduct experiments to show their effectiveness.
• Ability to use a modern compiler development platform and tools for the
practice of above.
• A Knowledge on contemporary issues on this topic.
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 4
Outline
• Brief overview
• Problem formulation of the modulo
scheduling problem
• Solution methods
• Summary
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 5
• Good IPO• Good LNO• Good global optimization• Good integration of
IPO/LNO/OPT• Smooth information passing
between FE and CG• Complete and flexible support of
inner-loop scheduling (SWP), instruction scheduling and register allocation
Inter-ProceduralOptimization (IPA)
Loop NestOptimization (LNO)
Global Optimization(OPT)
Source
InnermostLoop
scheduling
Global instscheduling
Reg alloc
Local instscheduling
Executable
ArchModels
BE/CG
ME
General Compiler Framework
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 6
• How to formulate the loop
scheduling problem?
• How to model it?
• How to solve it?
Questions ?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 7
Questions (cont’d)
• Instruction scheduling for code without loops – a review
• Dependence graphs may become cyclic! So, “critical path
length” is less obvious!
• Is it becoming harder!(?)
• What new insights are required to formulate and solve it?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 8
Challenges of Loop Scheduling
Right strategy?
A DDG With Cycles
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 9
Observations
• Execution of “good loops” tend to be “regular”
and “repetitive” - a “pattern” may appear
• This gives “cyclic scheduling” problem a new
twist!
• How to efficiently derive a “pattern”?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 10
Problem Formulation (I)
Given a weighted dependence graph, derive a schedule
which is “time-optimal” under a machine model M.
Def: A schedule S of a loop L is time-optimal if among
all “legal” schedules of L, no other schedule that is faster
than S.
Note: There may be more than one time-optimal
schedule.
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 11
A Short Tour on Data A Short Tour on Data
Dependence Graphs for LoopsDependence Graphs for Loops
A Short Tour on Data A Short Tour on Data
Dependence Graphs for LoopsDependence Graphs for Loops
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 12
Basic Concept and MotivationBasic Concept and MotivationBasic Concept and MotivationBasic Concept and Motivation
• Data dependence between 2 accesses
The same memory location
Exist an execution path between them
One of them is a write
• Three types of data dependence
• Dependence graphs
• Things are not simple when dealing with loops
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 13
Types of Data DependenceTypes of Data DependenceTypes of Data DependenceTypes of Data Dependence
• Flow dependence
• Anti-dependence
• Output dependenceX =
X =
... 0
= X
X =
... -1
X =
= X...
--
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 14
Data DependenceData DependenceData DependenceData Dependence
Example 1:
S1: A = 0
S2: B = A
S3: C = A + D
S4: D = 2
Sx Sy Sy depends on Sx
S1
S2
S3
S4
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 15
Example 2:
S1: A = 0S2: B = AS3: A = B + 1S4: C = A
S1 0 S3 : Output-depS2 -1 S3 : anti-dep
Con’d
S1
S2
S3
S4
Data DependenceData DependenceData DependenceData Dependence
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 16
Should we consider input Should we consider input dependence?dependence?Should we consider input Should we consider input dependence?dependence?
= X
= X
Is the reading of the same X important?
Well, it may be!(if we intend to group the 2 reads together for cache optimization!)
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 17
Subscript VariablesSubscript VariablesSubscript VariablesSubscript Variables
• Extension of def-use chains to employ a more precise treatment of arrays, especially in iterative loops.
DO I = 1, N A(I + 1) = X(I + 1) + B(I) X(I) = A(I) * 5
ENDDO
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 18
Applications- register allocation- instruction scheduling- loop scheduling- vectorization- parallelization - memory hierarchy optimization
Con’d
Dependence GraphDependence GraphDependence GraphDependence Graph
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 19
Data Dependence in LoopsData Dependence in LoopsData Dependence in LoopsData Dependence in Loops
An Example
Find the dependence relations due to the array X in the program below:
(1) for I = 2 to 9 do
(2) X[I] = Y[I] + Z[I]
(3) A[I] = X[I-1] + 1
(4) end for
Solution
I = 2 I = 3 I = 4
(2) X[2]=Y[2]+Z[2] (3) A[2]=X[1]+1
To find the data dependence relations in a simple loop, we can unroll the loop and see which statement instances depend on which others:
X[3] =Y[3]+Z[3] A[3] =X[2]+1
X[4]=Y[4]+Z[4]A[4]=X[3]+1
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 20
In our example, there is a loop-carried, lexically forward flow
dependence relation.
Data Dependence in LoopsData Dependence in LoopsData Dependence in LoopsData Dependence in LoopsCon’d
S2
S3
(1)
Data dependence graph for statements in a loop.
- Loop-carried vs loop-independent- Lexical-forward vs lexical backward
Dependence distance = 1
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 21
An Example
for i = 0 to N - 1 do
a: a[i] = a[i - 1] + R[i];
b: b[i] = a[i] + c[i - 1];
c: c[i] = b[i] + 1;
end;
time i = 0 i = 1 i = 2
0 a
1 b
2 c a
3 b
4 c a
a
b
c
II
So, iteration interval II = 2
i = 3
tim
e
iterations
. . .
Assume each operation takes 1 cycle and thereis only one addition unit!
Note: We use a token here to
represent a flow dependence of distance = 1
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 22
Software Pipeline: Concept
Software Pipeline is a technique that reduces the executiontime of important loops by interweaving operations
from many iterations to optimize the use of resources.
0 1 2 3 4 5 6 7 8 9 10 11 12 16151413 time
ldffadds
stf
sub
cmpbg
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 23
% prologue
a[0] = a[-1] + R[0];
% pattern
for i = 0 to N-2 do
b[i] = a[i] + c[i -1];
a[i + 1] = a[i] + R[i + 1];
c[i] = b[i] + 1;
end;
% epilogue
b[N - 1] = a[N - 1] + c[N - 2];
c[N - 1] = b[N - 1] + 1
prologue
bi ,ci , a i+1
epilogue
The Structrure of the SWP code
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 24
Software Pipeline (Cont’d)
What limits the speed of a loop?• Data dependencies: recurrence initiation interval (rec_mii)• Processor resources: resource initiation interval (res_mii)
0 1 2 3 4 5 6 7 8 9 10 11 12 16151413 time
ldffadds
stf
sub
cmpbg
Initiation interval
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 25
Previous Approaches
• Approach I (Operational):
“Emulate” the loop execution under the machine model and a
“pattern” will eventually occur
[AikenNic88, EbciogluNic89, GaoEtAl91]
• Approach II (Periodic scheduling):
Specify the scheduling problem into a periodical scheduling
problem and find optimal solution
[Lam88,
RauEtAl81,GovindAltmanGao94]
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 26
Periodic Schedule(Modulo Scheduling)
The time (cycle) when the I-th instance of
the operation v is scheduled :
t(i, v) = T * i+ A[v] where T = II
so t(i + 1, v) -t(i, v) = T*(i + 1) – T*(i) = T
For our example:
t(i, v) = 2i + A[v]
where A(a) = 1
A(b) = 0
A(c) = 1
Question: Is this an optimal schedule?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 27
Yes, the schedule
t(i, v) = 2i + A[v]
is time-optimal!
With II = 2
Periodic Schedule (cont’d)
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 28
Given a DDG of a loop L, how to determine the fastest
computation rate of L -- also called minimum initiation
interval (MII) ?
Hint: Consider the “Critical Cycles” as well
as critical resource usage in L
Restate the problem
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 29
Recurrence MII -- RecMII
• An Example
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 30
An Example (Revisit) : RecMII
for i = 0 to N - 1 do
a: a[i] = a[i - 1] + R[i];
b: b[i] = a[i] + c[i - 1];
c: c[i] = b[i] + 1;
end;
time i = 0 i = 1 i = 2
0 a
1 b
2 c a
3 b
4 c a
a
b
c
II
So, RecMII = 2
i = 3
tim
e
iterations
. . .
Assume each operation takes 1 cycle and thereis only one addition unit!
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 31
Theorem: The maximum computation rate of a loop is bounded by the following ratio
ropt = min{ Dc/Wc}
where C is a dependence cycle, Dc is the total dependence distance along C and Wc is total execution time of C. i.e.
(C)
Dc di
c
Wc wi
c
(di is the dependence distance along the edge i in C)
(wi is the edge weight along theedge i in C)
Maximum Computation RateHint: now one must think about deadlines.Hint: now one must think about deadlines.
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 32
And the optimal period
MII = Topt = 1/ropt
Def: A cycle is critical if the period of the cycle
equal to MII
(We should write it as RecMII)
Note: a loop may have multiple critical cycles!
RecMII (Contn’d)
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 33
An Example (Revisit) : RecMII
for i = 0 to N - 1 do
a: a[i] = a[i - 1] + R[i];
b: b[i] = a[i] + c[i - 1];
c: c[i] = b[i] + 1;
end;
time i = 0 i = 1 i = 2
0 a
1 b
2 c a
3 b
4 c a
a
b
c
II
So, RecMII = 2
i = 3
tim
e
iterations
. . .
Assume each operation takes 1 cycle and thereis only one addition unit!
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 34
How About Machine Resource and Register
Constraints?
• Numbers and types of FUs, etc,
(ResMII)
• Number of registers
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 35
Software Pipelining
• Review of previous work [RauFisher93] [Rau94]
• Minimum initiation interval [MII]
MII = max {RecMII, ResMII}
where
RecMII: determined by “critical’
recurrence cycles in DDG
ResMII: determined by resource
constraints (#s and types of FUs)
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 36
The Resource Constraint MII (ResMII)
• Can be calculated by totaling, for each resource, the usage
requirement imposed by one iteration of the loop
• can be done by bin-packing a resource reservation table
(expensive)
• usually derive a lower-bound is enough as a starting point to begin
the search process
• More price with complex hardware pipelines ?
Consider a simple example of n nodes and 1 FU -- what is ResMII ?Consider a simple example of n nodes and 1 FU -- what is ResMII ?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 37
Example 1: Reservation Table - An Example
1
2
3
543210Stage
XXX1
XX2
X3
(a) a Pipeline M (b) The Reservation Table of M
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 38
A Quiz ?
• What is the RT of a fully pipelined adder
with 3 stages ?
• How about an unpipelined adder ?
• How to computer ResMII for each case ?
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 39
Modulo Reservation Table
• The modulo reservation table only has length = II
• Each entry in the table records a sequence of
reservations corresponds a sequence of slots every II
cycles
• Hints: think about your weekly calendar
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 40
Modulo Scheduling
MII
II = II + 1
Choose Next Operation X
Remove from L is L =<>?
succeed
failed try to place x in a time slot i in II
No
A legal scheduleis found
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 41
Heuristic Method for Modulo Scheduling
Why a simple variant list
scheduling may not work?
Hint: consider the deadline constraints of operations ina cycle.
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 42
4 2
2 4
[1]
B D
A C
0123
A B
C
D
0123
A
B
C
D
(a)
(b) MII = RecMII = 4
Note: if simple list scheduling is usedB cannot be scheduled due to thedeadline set by scheduling C: deadlock!
(c)MII = RecMII = 4
Counter Example I: List Scheduling May Fail !
Note: we cannot fire C as early as possible!
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 52
Heuristic (Aiken88, AikenNic88, Ebcioglu89, etc.)OperationalApproach Formal Model (GaoWonNin91)
PLDI'91
SoftwarePipelining
In-Exact Method (Heuristic)(RauGla81, Lam88, RauEtAI92, Huff93, DehnertTow93, Rau94, WanEis93, StouchininEtAI00)
Basic Formulation(DongenGao92)CONPAR'92
PeriodicScheduling
(Modulo Scheduling)
Register Optimal(Ninggao91, NingGao93, Ning93)
POPL'93
Resource ConstrainedExact Method ILP based (GovindAITGao94)
Micro27
Resource & Register"Showdown"(GovindAITGao95 , Altman95,(RuttenbergGao PLDI'95StouchininWoody96)
Exhaustive
EichenbergerDav95)PLDI'96
SearchEuroPar'96
(Altman95)FSA Co-Schedulingformulation(GovindAltmanGao'96)
FSA BasedMethod FSA Construction Method
(Model hardware (GovindAltmanGao98)
pipeline withsharing and hazards)
FSA Heuristic/optimization(ZhangGovindRyanGao99)
Theory of Co-Scheduling(GovindAltmanGao00)
A Taxonomy of Software Pipelining
23年 4月 19日 \course\cpeg421-08s\Topic-7.ppt 53
Advanced Topics
• Consider register constraints
• Realistic pipeline architecture constraints
(with structural hazards)
• Loop body with conditionals
• Multi-Dimensional Loops