Pipelining and Retiming
Prepared by Mark Jarvin
Agenda
Synchronous circuit retiming
Pipelining
Software pipelining
The Retiming Problem: Example
[Figure: example circuit with inputs a, b, c and output y; node labels 1, 2, 1, 1]
D = 4, T = 4, Latency = 4, Throughput = 4
How can this be improved?
Pipelining?
The Retiming Problem: Example
Latency = 6, Throughput = 3
Delay is not balanced
This can still be improved
[Figure: revised example circuit with inputs a, b, c and output y]
The Retiming Problem: Example
Latency = 4, Throughput = 2
Now, delay is balanced
[Figure: revised example circuit with inputs a, b, c and output y]
Observations
Some basic transformations can be used for cycle time reduction
The retiming transformation moves registers across gates
Observations
Levelization doesn’t help: it is only useful for acyclic circuits
Naïve Algorithm
while ( not timed ) {
    pick a candidate gate;
    apply retiming transformation;
    do timing analysis;
}
Questions
Can we apply retiming in batch mode, i.e., simultaneously on all gates?
Can we make sure the retimed circuit is optimal?
Can we achieve this in polynomial time?
Retiming Circuit Model
$G = (V, E, d, w)$
$V$: gates and primary inputs
$E \subseteq V \times V$
$d : V \to \mathbb{R}_{\ge 0}$: delay of gates
$w : E \to \mathbb{Z}_{\ge 0}$: number of registers between two gates
Retiming Circuit Model: Example 1
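[Figure: circuit with inputs a, b, c and gates x (delay 2), y (delay 3), alongside its graph on vertices $v_i$ (delay 0), $v_x$ (delay 2), $v_y$ (delay 3)]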
Retiming Circuit Model: Example 1
[Figure: the same circuit with the register in a different position, alongside its updated graph]
Retiming Circuit Model: Example 2
[Figure: circuit with gates a, b, c, d (delay 3), e, f, g (delay 7) and h (delay 0), alongside its graph on vertices $v_a, \dots, v_h$ with edge weights 0 or 1]
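A minimal sketch of how the Example 2 graph might be written down and inspected. The delays and register counts below are this sketch's reading of the W (D) matrix given for Example 2, not a transcription of the lost figure, and the names `DELAY`, `WEIGHT` and `clock_period` are hypothetical. The function computes the largest total gate delay along any register-free path, which is the clock period the circuit needs before any retiming.

```python
from functools import lru_cache

# Assumed encoding of Example 2: gate delays d(v) and register counts w(e).
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def clock_period(delay, weight):
    """Largest total gate delay along any path that crosses no register.

    Assumes every cycle holds at least one register, so the register-free
    subgraph is acyclic and the recursion below terminates.
    """
    succ = {v: [] for v in delay}
    for (u, v), regs in weight.items():
        if regs == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # Delay of u plus the best register-free continuation, if any.
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(v) for v in delay)


print(clock_period(DELAY, WEIGHT))
```

Under this encoding the result is 24, which agrees with the largest D value among the zero-weight entries of the W (D) matrix.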
Metrics
Path delay: $d(p) = \sum_{v \in p} d(v)$ (all vertices on $p$, endpoints included)
Path weight: $w(p) = \sum_{e \in p} w(e)$ (all edges on $p$)
Metrics
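Define weight and delay metrics for any given vertex pair:
$W(u,v) = \min \{ w(p) : p \text{ is a path from } u \text{ to } v \}$
$D(u,v) = \max \{ d(p) : p \text{ is a path from } u \text{ to } v,\ w(p) = W(u,v) \}$
Both quantities are undefined if there is no path $p$ from $u$ to $v$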
W (D) Matrix for Example 2
Entry in row u, column v: W(u,v) (D(u,v))
W (D)   a        b        c        d        e        f        g        h
a       0 (3)    1 (6)    2 (9)    3 (12)   2 (16)   1 (13)   0 (10)   0 (10)
b       1 (20)   0 (3)    1 (6)    2 (9)    1 (13)   0 (10)   0 (17)   0 (17)
c       1 (27)   2 (30)   0 (3)    1 (6)    0 (10)   0 (17)   0 (24)   0 (24)
d       1 (27)   2 (30)   3 (33)   0 (3)    0 (10)   0 (17)   0 (24)   0 (24)
e       1 (24)   2 (27)   3 (30)   4 (33)   0 (7)    0 (14)   0 (21)   0 (21)
f       1 (17)   2 (20)   3 (23)   4 (26)   3 (30)   0 (7)    0 (14)   0 (14)
g       1 (10)   2 (13)   3 (16)   4 (19)   3 (23)   2 (20)   0 (7)    0 (7)
h       1 (3)    2 (6)    3 (9)    4 (12)   3 (16)   2 (13)   1 (10)   0 (0)
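The matrix above can be reproduced with an all-pairs shortest-path pass. The following is a sketch only: it assumes the Example 2 edge list shown in the code (inferred from the matrix itself), uses Floyd-Warshall on lexicographically ordered pairs (registers, negated delay), and the name `wd_matrices` is hypothetical.

```python
import itertools

# Assumed encoding of Example 2 (inferred from the W (D) matrix).
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def wd_matrices(delay, weight):
    """Return dicts W[u, v] and D[u, v] for all vertex pairs.

    Each edge u -> v costs (w(e), -d(u)); tuples compare lexicographically,
    so minimising the cost minimises registers first and then maximises
    delay. Assumes every cycle carries at least one register.
    """
    inf = float("inf")
    verts = list(delay)
    best = {u: {v: (inf, 0) for v in verts} for u in verts}
    for u in verts:
        best[u][u] = (0, 0)  # the empty path from u to itself
    for (u, v), regs in weight.items():
        best[u][v] = min(best[u][v], (regs, -delay[u]))
    # product() varies its first component slowest, so k is the outer loop.
    for k, u, v in itertools.product(verts, repeat=3):
        (w1, nd1), (w2, nd2) = best[u][k], best[k][v]
        if w1 + w2 < inf:
            best[u][v] = min(best[u][v], (w1 + w2, nd1 + nd2))
    W = {(u, v): best[u][v][0] for u in verts for v in verts}
    # Add back the delay of the final vertex, which no edge cost includes.
    # Pairs with W == inf have no path; their D entry is meaningless.
    D = {(u, v): delay[v] - best[u][v][1] for u in verts for v in verts}
    return W, D


W, D = wd_matrices(DELAY, WEIGHT)
print(W["a", "e"], D["a", "e"])
```

For instance, the pair (a, e) comes out as W = 2 and D = 16, matching row a, column e of the matrix.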
The Retiming Transformation
How do we represent retiming? How does it affect $G$?
Informally:
The transformation is fundamentally moving registers across gates
Represent it as the number of registers to push from a gate’s outputs to its inputs
Define this number for all gates
The Retiming Transformation
Definition: a retiming of a network $G = (V, E, d, w)$ is an integer-valued vertex labelling $r : V \to \mathbb{Z}$ that transforms $G$ into $G' = (V, E, d, w')$, where for each edge $(v_i, v_j) \in E$:
$w'_{ij} = w_{ij} + r_j - r_i$
The Retiming Transformation
[Figure: graph on $v_i$ (delay 0), $v_x$ (delay 2), $v_y$ (delay 3), shown before and after retiming]
Initially: $w_{ix} = 0,\ w_{xy} = 1,\ w_{iy} = 0$
Apply retiming: $r_i = 0,\ r_x = 1,\ r_y = 0$
Finally: $w'_{ix} = 1,\ w'_{xy} = 0,\ w'_{iy} = 0$
Note: retiming will change the number of registers in general, but not the number of registers in a given cycle
Legal and Feasible Retiming
A retiming is legal if the retimed network doesn’t contain negative weights:
$w'_{ij} = w_{ij} + r_j - r_i \ge 0 \iff r_j - r_i \ge -w_{ij}$
For a given cycle time $\phi$, the network is timing feasible if it can correctly operate under $\phi$
This holds if $W(i,j) \ge 1$ for all $i, j$ with $D(i,j) > \phi$
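The transformation and the legality check can be sketched in a few lines. The example data are the small three-vertex graph from the retiming slides (registers 0, 1, 0 on edges i-x, x-y, i-y, retimed with one register pushed across x); the function names `retime` and `is_legal` are hypothetical.

```python
def retime(weight, r):
    """Return the retimed weights w'(i, j) = w(i, j) + r[j] - r[i]."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in weight.items()}


def is_legal(weight):
    """A retimed network is legal when no edge has a negative register count."""
    return all(w >= 0 for w in weight.values())


# Three-vertex example: one register initially sits on edge x -> y.
w = {("i", "x"): 0, ("x", "y"): 1, ("i", "y"): 0}

# Push one register from the output of x to its input.
w_new = retime(w, {"i": 0, "x": 1, "y": 0})
print(w_new, is_legal(w_new))

# Pushing two registers would need more than the single one available.
w_bad = retime(w, {"i": 0, "x": 2, "y": 0})
print(w_bad, is_legal(w_bad))
```

The first call moves the register to edge i-x and stays legal; the second leaves edge x-y with a count of -1, so it is rejected.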
Feasible Retiming
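Furthermore, along the minimum-weight path $i \to k_1 \to k_2 \to \cdots \to k_m \to j$:
$W'(i,j) = w'_{ik_1} + w'_{k_1 k_2} + \cdots + w'_{k_m j}$
$= (w_{ik_1} + r_{k_1} - r_i) + (w_{k_1 k_2} + r_{k_2} - r_{k_1}) + \cdots + (w_{k_m j} + r_j - r_{k_m})$
$= W(i,j) + r_j - r_i$
Finally:
$W'(i,j) \ge 1 \iff r_j - r_i \ge 1 - W(i,j)$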
Feasible Test Algorithm
Any retiming must satisfy the system of difference constraints, where $\phi$ is the target clock period:
$r_j - r_i \ge -w_{ij} \quad \forall (v_i, v_j) \in E$
$r_j - r_i \ge 1 - W(i,j) \quad \forall i, j : D(i,j) > \phi$
General approach: integer linear programming
Special form: single-source longest path problem
Note: we can skip the second inequality wherever $D(i,j) - d(j) > \phi$ or $D(i,j) - d(i) > \phi$
Feasible Test Algorithm
Longest path problem can be solved with Bellman-Ford
Build a constraint graph with an edge from $i$ to $j$ of weight $b_k$ if we have a constraint of the form $r_j - r_i \ge b_k$
[Figure: constraint graph for Example 2 with clock period 13]
Feasible Test Algorithm
The solution is feasible if there are no positive cycles
If feasible, the longest distance of each vertex provides the retiming function
For the previous example, with reference node $v_h$:
$r_a = -1, \quad r_e = -2$
$r_b = -2, \quad r_f = -1$
$r_c = -2, \quad r_g = 0$
$r_d = -3, \quad r_h = 0$
There are no positive cycles; hence, 13 is a feasible clock period
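The feasibility test can be sketched end to end as follows. This is an illustration under assumptions, not the slides' own code: the Example 2 edge list is inferred from the W (D) matrix, the helper names `wd_matrices` and `feasible_retiming` are hypothetical, and every timing constraint is kept (none are skipped as redundant).

```python
import itertools

# Assumed encoding of Example 2 (inferred from the W (D) matrix).
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def wd_matrices(delay, weight):
    """All-pairs W and D via Floyd-Warshall on (registers, -delay) pairs."""
    inf = float("inf")
    verts = list(delay)
    best = {u: {v: (inf, 0) for v in verts} for u in verts}
    for u in verts:
        best[u][u] = (0, 0)
    for (u, v), regs in weight.items():
        best[u][v] = min(best[u][v], (regs, -delay[u]))
    for k, u, v in itertools.product(verts, repeat=3):
        (w1, nd1), (w2, nd2) = best[u][k], best[k][v]
        if w1 + w2 < inf:
            best[u][v] = min(best[u][v], (w1 + w2, nd1 + nd2))
    W = {(u, v): best[u][v][0] for u in verts for v in verts}
    D = {(u, v): delay[v] - best[u][v][1] for u in verts for v in verts}
    return W, D


def feasible_retiming(delay, weight, period, ref="h"):
    """Return a retiming meeting `period`, or None if a positive cycle exists."""
    W, D = wd_matrices(delay, weight)
    # A constraint r[j] - r[i] >= b becomes an edge i -> j of length b.
    edges = [(i, j, -w) for (i, j), w in weight.items()]           # legality
    edges += [(i, j, 1 - W[i, j]) for (i, j) in W if D[i, j] > period]  # timing
    dist = {v: float("-inf") for v in delay}
    dist[ref] = 0
    # Bellman-Ford for longest paths: |V| - 1 rounds of relaxation.
    for _ in range(len(delay) - 1):
        for i, j, b in edges:
            if dist[i] + b > dist[j]:
                dist[j] = dist[i] + b
    # Any edge that can still be relaxed lies on or after a positive cycle.
    if any(dist[i] + b > dist[j] for i, j, b in edges):
        return None
    return dist


print(feasible_retiming(DELAY, WEIGHT, 13))
print(feasible_retiming(DELAY, WEIGHT, 12))
```

With this encoding a period of 13 yields a retiming and a period of 12 yields `None`, in line with the two constraint-graph slides.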
Feasible Test Algorithm
Here, there is a positive cycle: $v_b \to v_e \to v_f \to v_g \to v_b$ (edge weights 0, 1, 1, -1; total 1)
[Figure: constraint graph for Example 2 with clock period 12]
Hence, a clock period of 12 is not feasible
Optimally Retimed Example Circuit
[Figure: the optimally retimed Example 2 circuit and its graph on $v_a, \dots, v_h$ with updated edge weights]
Optimal Retiming
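Binary search of minimum cycle time
optimalRetiming ( G ) {
    min = 0;
    max = MAX;                      // MAX: a clock period known to be feasible
    while ( min < max ) {
        mid = ( min + max ) / 2;    // integer midpoint of the current range
        if ( feasibleTest ( G, mid ) )
            max = mid;              // mid works: the optimum is mid or smaller
        else
            min = mid + 1;          // mid fails: the optimum is larger than mid
    }
    return min;
}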
Optimal Retiming
Do we really need to search all clock periods? No.
Optimal cycle time must be one of the $D(i,j)$ values
So, sort and binary-search the $O(V^2)$ candidate clock periods
Computing all $D(i,j)$ requires $O(VE + V^2 \lg V)$ time
Overall, the complexity is $O(VE \lg V)$
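Searching only the candidate D values might look like the sketch below. The function name `min_feasible_period` is hypothetical, the candidate list is the set of distinct D values from the Example 2 matrix, and the lambda is a stand-in for the real feasibility test (it simply encodes the slides' result that 13 is feasible and 12 is not).

```python
def min_feasible_period(candidates, feasible):
    """Smallest candidate period for which feasible(period) is true.

    Assumes feasibility is monotone (any period above a feasible one is
    also feasible) and that the largest candidate is feasible.
    """
    periods = sorted(set(candidates))
    lo, hi = 0, len(periods) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(periods[mid]):
            hi = mid          # periods[mid] works; keep it in the range
        else:
            lo = mid + 1      # periods[mid] fails; discard it
    return periods[lo]


# Distinct D(i, j) values read from the W (D) matrix for Example 2.
d_values = [0, 3, 6, 7, 9, 10, 12, 13, 14, 16, 17, 19, 20,
            21, 23, 24, 26, 27, 30, 33]

# Stand-in predicate; a real run would call the Bellman-Ford test instead.
print(min_feasible_period(d_values, lambda period: period >= 13))
```

Only about lg of the number of candidates feasibility tests are run, rather than one per possible clock period.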
Optimal Retiming
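Can we do better? Yes.
Look at the delay-to-register ratios of the cycles in the circuit and the maximum node delay, where the delay-to-register ratio $R(C)$ of a cycle $C$ and the maximum node delay $D_{\max}$ are defined as:
$R(C) = \frac{\sum_{v \in C} d(v)}{\sum_{e \in C} w(e)}$
$D_{\max} = \max \{ d(v) : v \in V \}$
Then, the minimum feasible clock period $\phi_{\min}(G)$ lies in the range:
$\max_{C \in G} R(C) \le \phi_{\min}(G) \le \max_{C \in G} R(C) + D_{\max} - 1$
This improves the overall running time to $O(VE \lg D_{\max})$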
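For a circuit as small as Example 2, the range can be checked by brute force. The sketch below assumes the edge list inferred from the W (D) matrix, reads the upper end of the range as the maximum ratio plus the maximum node delay minus one, and enumerates simple cycles exhaustively, which is only practical for tiny graphs. The names `simple_cycles` and `period_bounds` are hypothetical.

```python
from fractions import Fraction

# Assumed encoding of Example 2 (inferred from the W (D) matrix).
DELAY = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
WEIGHT = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
          ("d", "e"): 0, ("c", "e"): 0, ("b", "f"): 0, ("a", "g"): 0,
          ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0}


def simple_cycles(weight):
    """Yield every simple cycle once, as a vertex list (exhaustive DFS)."""
    succ = {}
    for (u, v) in weight:
        succ.setdefault(u, []).append(v)
    order = sorted({x for edge in weight for x in edge})
    rank = {v: k for k, v in enumerate(order)}

    def extend(path):
        for nxt in succ.get(path[-1], []):
            if nxt == path[0]:
                yield list(path)
            # Root each cycle at its lowest-ranked vertex to avoid repeats.
            elif rank[nxt] > rank[path[0]] and nxt not in path:
                yield from extend(path + [nxt])

    for start in order:
        yield from extend([start])


def period_bounds(delay, weight):
    """Return (lower, upper) bounds on the minimum feasible clock period.

    Assumes every cycle carries at least one register; a register-free
    cycle would make the ratio undefined.
    """
    ratios = []
    for cyc in simple_cycles(weight):
        total_delay = sum(delay[v] for v in cyc)
        total_regs = sum(weight[cyc[k], cyc[(k + 1) % len(cyc)]]
                         for k in range(len(cyc)))
        ratios.append(Fraction(total_delay, total_regs))
    r_max = max(ratios)
    return r_max, r_max + max(delay.values()) - 1


print(period_bounds(DELAY, WEIGHT))
```

Under this encoding there are four simple cycles, the largest ratio is 10 and the largest node delay is 7, so the range is 10 to 16, which contains the period 13 found by the feasibility test.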
Pipelining
Can be thought of as a special case of retiming
fetch decode execute writeback
fetch decode execute writeback
fetch decode execute writeback
Software Pipelining
This can also be thought of in terms of retiming
[Figure: software-pipelined loop; labels: Iteration, Loop Boundary]