
Posted on 18-Jan-2016



Pipelining and Retiming

Prepared by Mark Jarvin

Agenda

Synchronous circuit retiming
Pipelining
Software pipelining

The Retiming Problem: Example

[Figure: example circuit with inputs a, b, c and output y; gate delays shown on the gates]

D = 4, T = 4, Latency = 4, Throughput = 4

How can this be improved?

Pipelining?

The Retiming Problem: Example

Latency = 6, Throughput = 3

The delay is not balanced, so this can still be improved.

[Figure: the example circuit after pipelining]

The Retiming Problem: Example

Latency = 4, Throughput = 2

Now the delay is balanced.

[Figure: the example circuit with balanced stage delays]

Observations

Some basic transformations can be used to reduce the cycle time.

The retiming transformation moves registers across gates.

Observations

Levelization doesn’t help: it is only useful for acyclic circuits.

Naïve Algorithm

while ( timing constraint not met ) {
    pick a candidate gate;
    apply the retiming transformation to it;
    do timing analysis;
}

Questions

Can we apply retiming in batch mode, i.e., simultaneously on all gates?

Can we make sure the retimed circuit is optimal? Can we achieve this in polynomial time?

Retiming Circuit Model

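The circuit is modelled as a directed graph $G = (V, E, d, w)$:

$V$: gates and primary inputs

$E \subseteq V \times V$: connections between gates

$d: V \to \mathbb{R}_{\ge 0}$: delay of gates

$w: E \to \mathbb{Z}_{\ge 0}$: number of registers between 2 gates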
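The "do timing analysis" step of the naïve algorithm amounts to finding the clock period of G: the largest total gate delay along any path whose edges carry no registers. Below is a minimal Python sketch, assuming G is stored as a dict of gate delays d(v) plus a list of (u, v, w) edge triples. That encoding, the hypothetical name `clock_period`, and the three-vertex graph in the usage line are assumptions for illustration, not taken from the slides.

```python
from collections import defaultdict, deque


def clock_period(delay, edges):
    """Hypothetical helper: largest delay along a register-free path.

    delay: dict vertex -> d(v); edges: list of (u, v, w) with w = w(u, v).
    (This encoding is an assumption of the sketch.)
    """
    # Keep only register-free edges. In a well-formed synchronous circuit
    # this subgraph is acyclic (every cycle holds at least one register).
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1

    # arrival[v] = largest delay of a register-free path ending at v,
    # filled in topological (Kahn) order.
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(delay):
        raise ValueError("register-free cycle: not a synchronous circuit")
    return max(arrival.values())


# Hypothetical circuit: h (delay 0) -> x (2) -> y (3) -> h, one register on x -> y.
print(clock_period({"h": 0, "x": 2, "y": 3},
                   [("h", "x", 0), ("x", "y", 1), ("y", "h", 0)]))  # 5
```

Here the critical register-free path is y → h → x, with delay 3 + 0 + 2 = 5.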

Retiming Circuit Model: Example 1

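[Figure: a circuit with inputs a, b, c and outputs x, y, and its graph model with vertices v_i, v_x, v_y; gate delays 2 and 3, one register on an edge]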

Retiming Circuit Model: Example 1

[Figure: the same circuit and graph with the register on a different edge]

Retiming Circuit Model: Example 2

[Figure: graph with host vertex v_h (delay 0), vertices v_a, v_b, v_c, v_d (delay 3) and v_e, v_f, v_g (delay 7); edge labels give the number of registers (0 or 1)]

Metrics

Path delay, for a path $p: v_i \rightsquigarrow v_j$ (both endpoints included): $d(p) = \sum_{v_k \in p} d(v_k)$

Path weight: $w(p) = \sum_{e_k \in p} w(e_k)$

Metrics

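Define weight and delay metrics for any given vertex pair:

$W(u,v) = \min \{\, w(p) : p: u \rightsquigarrow v \,\}$, the fewest registers on any path from u to v

$D(u,v) = \max \{\, d(p) : p: u \rightsquigarrow v,\ w(p) = W(u,v) \,\}$, the largest delay among those minimum-register paths

Both quantities are undefined if there is no path p from u to v.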

W (D) Matrix for Example 2

Row u, column v: W(u, v), with D(u, v) in parentheses.

W (D)  a       b       c       d       e       f       g       h
a      0 (3)   1 (6)   2 (9)   3 (12)  2 (16)  1 (13)  0 (10)  0 (10)
b      1 (20)  0 (3)   1 (6)   2 (9)   1 (13)  0 (10)  0 (17)  0 (17)
c      1 (27)  2 (30)  0 (3)   1 (6)   0 (10)  0 (17)  0 (24)  0 (24)
d      1 (27)  2 (30)  3 (33)  0 (3)   0 (10)  0 (17)  0 (24)  0 (24)
e      1 (24)  2 (27)  3 (30)  4 (33)  0 (7)   0 (14)  0 (21)  0 (21)
f      1 (17)  2 (20)  3 (23)  4 (26)  3 (30)  0 (7)   0 (14)  0 (14)
g      1 (10)  2 (13)  3 (16)  4 (19)  3 (23)  2 (20)  0 (7)   0 (7)
h      1 (3)   2 (6)   3 (9)   4 (12)  3 (16)  2 (13)  1 (10)  0 (0)
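The matrix can be reproduced mechanically. The sketch below assumes the Example 2 edge list as reconstructed from the figure labels and the matrix itself (registers on h→a, a→b, b→c, c→d; none on d→e, e→f, f→g, g→h, a→g, b→f, c→e). It runs Floyd-Warshall on lexicographically ordered pairs (w(e), -d(u)), the approach used by Leiserson and Saxe's all-pairs W/D computation; `wd_matrices` is a hypothetical name.

```python
import itertools

INF = float("inf")


def wd_matrices(delay, edges):
    """Hypothetical helper: W[(u, v)] and D[(u, v)] for every reachable pair.

    Each edge u -> v is weighted by the pair (w(e), -d(u)); pairs compare
    lexicographically, so the shortest pair first minimises registers and
    then maximises delay. The shortest pair u -> v is (W(u, v), d(v) - D(u, v)).
    """
    V = list(delay)
    best = {(u, v): (INF, 0) for u in V for v in V}
    for v in V:
        best[v, v] = (0, 0)  # empty path: W(v, v) = 0, D(v, v) = d(v)
    for u, v, w in edges:
        best[u, v] = min(best[u, v], (w, -delay[u]))
    # Floyd-Warshall; product() varies k slowest, as the algorithm requires.
    for k, i, j in itertools.product(V, repeat=3):
        (w1, s1), (w2, s2) = best[i, k], best[k, j]
        if w1 < INF and w2 < INF and (w1 + w2, s1 + s2) < best[i, j]:
            best[i, j] = (w1 + w2, s1 + s2)
    W = {p: w for p, (w, _) in best.items() if w < INF}
    D = {(u, v): delay[v] - best[u, v][1] for (u, v) in W}
    return W, D


# Example 2; this edge list is a reconstruction (assumption), not copied from a slide.
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = [("h", "a", 1), ("a", "b", 1), ("b", "c", 1), ("c", "d", 1),
         ("d", "e", 0), ("e", "f", 0), ("f", "g", 0), ("g", "h", 0),
         ("a", "g", 0), ("b", "f", 0), ("c", "e", 0)]
W, D = wd_matrices(delay, edges)
print(W["a", "e"], D["a", "e"])  # 2 16
print(W["e", "d"], D["e", "d"])  # 4 33
```

The diagonal is seeded with the empty path, which is why D(v, v) equals d(v), matching entries such as D(a, a) = 3 and D(h, h) = 0 in the table.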

The Retiming Transformation

How do we represent retiming, and how does it affect G?

Informally, the transformation moves registers across gates. We represent it as the number of registers to push from a gate’s outputs to its inputs, and define this number for every gate.

The Retiming Transformation

Definition: a retiming of a network $G = (V, E, d, w)$ is an integer-valued vertex labelling $r: V \to \mathbb{Z}$ that transforms $G$ into $G_r = (V, E, d, w_r)$, where for each edge $(v_i, v_j) \in E$:

$w_r(i,j) = w_{ij} + r_j - r_i$
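The definition translates directly into code. A minimal sketch, assuming edge weights are kept in a dict keyed by (i, j) vertex pairs (our own encoding; `retime` is a hypothetical name). It also rejects any result with a negative weight, which anticipates the legality condition; the usage line replays the Example 1 numbers from the slides.

```python
def retime(weights, r):
    """Hypothetical helper: apply w_r(i, j) = w_ij + r_j - r_i to every edge.

    weights: dict (i, j) -> w_ij; r: dict vertex -> integer label
    (vertices missing from r are treated as r = 0, an assumption here).
    """
    retimed = {}
    for (i, j), w in weights.items():
        wr = w + r.get(j, 0) - r.get(i, 0)
        if wr < 0:  # a negative register count cannot be built
            raise ValueError(f"illegal retiming: edge {i}->{j} gets {wr}")
        retimed[i, j] = wr
    return retimed


# Example 1: one register on v_x -> v_y; push it across v_x with r_x = 1.
w = {("i", "x"): 0, ("x", "y"): 1, ("i", "y"): 0}
print(retime(w, {"i": 0, "x": 1, "y": 0}))
# {('i', 'x'): 1, ('x', 'y'): 0, ('i', 'y'): 0}
```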

The Retiming Transformation

[Figure: the Example 1 graph (vertices v_i, v_x, v_y; gate delays 2 and 3) before and after retiming]

Initially: $w_{ix} = 0,\ w_{xy} = 1,\ w_{iy} = 0$

Apply retiming: $r_i = 0,\ r_x = 1,\ r_y = 0$

Finally: $w_{ix} = 1,\ w_{xy} = 0,\ w_{iy} = 0$

Note: retiming changes the total number of registers in general, but not the number of registers on any given cycle.
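The note about cycles follows from the definition: around a cycle C, every vertex is the head of exactly one edge of C and the tail of exactly one edge of C, so the r terms cancel.

```latex
\sum_{(i,j)\in C} w_r(i,j)
  = \sum_{(i,j)\in C} \bigl(w_{ij} + r_j - r_i\bigr)
  = \sum_{(i,j)\in C} w_{ij}
    + \underbrace{\sum_{(i,j)\in C} r_j - \sum_{(i,j)\in C} r_i}_{=\,0}
  = \sum_{(i,j)\in C} w_{ij}
```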

Legal and Feasible Retiming

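A retiming is legal if the retimed network contains no negative weights:

$w_r(i,j) = w_{ij} + r_j - r_i \ge 0 \iff r_j \ge r_i - w_{ij}$

For a given cycle time $c$, the retimed network is timing feasible if it can operate correctly with clock period $c$, i.e., every path with delay greater than $c$ contains at least one register. Since retiming does not change $D(i,j)$, this holds if

$W_r(i,j) \ge 1$ for all $v_i, v_j \in V$ with $D(i,j) > c$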
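For instance, in Example 2 with c = 13 (values from the W(D) matrix), the original circuit (r = 0) violates the condition for the pair (v_d, v_h), while a retiming with r_d = -3 and r_h = 0 satisfies it. Retiming shifts every path weight from v_i to v_j by r_j - r_i, so W_r(i,j) = W(i,j) + r_j - r_i:

```latex
\begin{aligned}
&D(d,h) = 24 > 13, \quad W(d,h) = 0\\
&r = 0: && W_r(d,h) = 0 + 0 - 0 = 0 < 1 \quad \text{(violated)}\\
&r_d = -3,\ r_h = 0: && W_r(d,h) = 0 + 0 - (-3) = 3 \ge 1 \quad \text{(satisfied)}
\end{aligned}
```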

Feasible Retiming

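Furthermore, let $v_i = v_{k_1} \to v_{k_2} \to \dots \to v_{k_m} = v_j$ be a minimum-weight path. Its retimed weight telescopes:

$W_r(i,j) = \sum_{l=1}^{m-1} w_r(k_l, k_{l+1}) = \sum_{l=1}^{m-1} \left( w_{k_l k_{l+1}} + r_{k_{l+1}} - r_{k_l} \right) = W(i,j) + r_j - r_i$

(Every path from $v_i$ to $v_j$ shifts by the same amount $r_j - r_i$, so it stays minimum-weight after retiming.)

Finally:

$W_r(i,j) \ge 1 \iff r_j - r_i \ge 1 - W(i,j) \iff r_j \ge r_i + 1 - W(i,j)$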

Feasible Test Algorithm

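Any retiming must satisfy the system of difference constraints:

$r_j - r_i \ge -w_{ij}$ for all $(v_i, v_j) \in E$

$r_j - r_i \ge 1 - W(i,j)$ for all $v_i, v_j \in V$ with $D(i,j) > c$

General approach: integer linear programming. Special form: since each constraint bounds the difference of two variables, the system is a single-source longest-path problem.

Note: we can skip the second inequality wherever $D(i,j) - d(j) > c$ or $D(i,j) - d(i) > c$.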

Feasible Test Algorithm

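The longest-path problem can be solved with Bellman-Ford. Build a constraint graph with an edge from $v_i$ to $v_j$ of weight $b_k$ for every constraint of the form $r_j - r_i \ge b_k$.

[Figure: constraint graph for Example 2 with c = 13; edges of E carry weight $-w_{ij}$ (0 or $-1$), and the timing constraints add edges of weight $1 - W(i,j)$]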
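A sketch of that Bellman-Ford step, assuming constraints are given as (i, j, b) triples meaning r_j - r_i >= b, i.e. one constraint-graph edge i → j of weight b. It runs Bellman-Ford with "max" in place of "min" from a chosen reference vertex and reports a positive cycle as infeasibility. `longest_paths` is a hypothetical name, vertices unreachable from the reference are simply left at -inf, and the two tiny systems in the usage lines are made up.

```python
def longest_paths(vertices, constraints, source):
    """Hypothetical helper: longest distances from source, or None.

    constraints: list of (i, j, b), each meaning r_j - r_i >= b.
    Returns None if a positive cycle reachable from source makes the
    system infeasible; unreachable vertices stay at -inf.
    """
    dist = {v: float("-inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for i, j, b in constraints:
            if dist[i] + b > dist[j]:  # constraint violated: raise r_j
                dist[j] = dist[i] + b
                changed = True
        if not changed:
            break
    # One more pass: any further improvement means a positive cycle.
    if any(dist[i] + b > dist[j] for i, j, b in constraints):
        return None
    return dist


print(longest_paths("abc", [("a", "b", 1), ("b", "c", -2)], "a"))
# {'a': 0, 'b': 1, 'c': -1}
print(longest_paths("ab", [("a", "b", 1), ("b", "a", 0)], "a"))
# None
```

The early exit when a pass changes nothing is only an optimisation; the final pass is what detects a positive cycle.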

Feasible Test Algorithm

The system is feasible if the constraint graph has no positive cycles. If it is feasible, the longest distance to each vertex provides the retiming function. For the previous example (c = 13), with reference node $v_h$:

$r_a = -1,\ r_b = -2,\ r_c = -2,\ r_d = -3$

$r_e = -2,\ r_f = -1,\ r_g = 0,\ r_h = 0$

There are no positive cycles; hence, $c = 13$ is a feasible clock period.

Feasible Test Algorithm

Here (c = 12), there is a positive cycle:

$v_b \to v_e \to v_f \to v_g \to v_b$

[Figure: constraint graph for Example 2 with c = 12]

Hence, a clock period of 12 is not feasible.
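The cycle's weight can be checked from the W(D) matrix: with c = 12, each pair on the cycle has D(i,j) > 12 and so contributes a timing edge of weight 1 - W(i,j). Adding up the four constraints gives a contradiction:

```latex
\begin{aligned}
r_e - r_b &\ge 1 - W(b,e) = 1 - 1 = 0  && (D(b,e) = 13)\\
r_f - r_e &\ge 1 - W(e,f) = 1 - 0 = 1  && (D(e,f) = 14)\\
r_g - r_f &\ge 1 - W(f,g) = 1 - 0 = 1  && (D(f,g) = 14)\\
r_b - r_g &\ge 1 - W(g,b) = 1 - 2 = -1 && (D(g,b) = 13)\\
\text{sum:}\quad 0 &\ge 0 + 1 + 1 - 1 = 1 && \text{(impossible)}
\end{aligned}
```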

Optimally Retimed Example Circuit

[Figure: Example 2 retimed for a clock period of 13; same vertices and gate delays, with six edges now carrying one register each]
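The retimed circuit can be checked numerically. The sketch below assumes the Example 2 edge list reconstructed from the figure and the W(D) matrix, together with the retiming r_a = -1, r_b = -2, r_c = -2, r_d = -3, r_e = -2, r_f = -1, r_g = 0, r_h = 0 from the c = 13 feasibility test. It applies w_r(i,j) = w_ij + r_j - r_i and recomputes the clock period; `retime` and `clock_period` are hypothetical helper names.

```python
from collections import defaultdict, deque


def clock_period(delay, edges):
    """Hypothetical helper: largest delay along a register-free path.

    Assumes every cycle holds a register (register-free subgraph is a DAG).
    """
    succ, indeg = defaultdict(list), {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(arrival.values())


def retime(edges, r):
    """Hypothetical helper: w_r(i, j) = w_ij + r_j - r_i for every edge."""
    out = [(i, j, w + r[j] - r[i]) for i, j, w in edges]
    if any(w < 0 for _, _, w in out):
        raise ValueError("illegal retiming (negative register count)")
    return out


# Example 2 (edge list reconstructed from the slides: an assumption).
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = [("h", "a", 1), ("a", "b", 1), ("b", "c", 1), ("c", "d", 1),
         ("d", "e", 0), ("e", "f", 0), ("f", "g", 0), ("g", "h", 0),
         ("a", "g", 0), ("b", "f", 0), ("c", "e", 0)]
r = {"a": -1, "b": -2, "c": -2, "d": -3, "e": -2, "f": -1, "g": 0, "h": 0}

retimed = retime(edges, r)
print(clock_period(delay, edges), clock_period(delay, retimed))  # 24 13
print(sum(w for _, _, w in edges), sum(w for _, _, w in retimed))  # 4 6
```

The total register count grows from 4 to 6, but each cycle keeps its count; for example, the cycle v_h → v_a → v_g → v_h holds one register both before (on h→a) and after (on a→g).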

Optimal Retiming

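Binary search for the minimum cycle time (integer clock periods assumed):

optimalRetiming ( G ) {
    min = 0;
    max = MAX;                        // a clock period known to be feasible
    while ( min < max ) {
        mid = min + ( max - min ) / 2;    // integer division
        if ( feasibleTest ( G, mid ) )
            max = mid;                // mid works: the optimum is at most mid
        else
            min = mid + 1;            // mid fails: the optimum is larger
    }
    return min;
}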

Optimal Retiming

Do we really need to search all clock periods? No. The optimal cycle time must be one of the D(i,j) values, so we can sort them and binary-search over the O(V²) candidate clock periods. Computing all D(i,j) requires O(VE + V² lg V) time. Overall, the complexity is O(VE lg V).
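Putting the pieces together, here is a sketch of the search over candidate periods: compute W and D, sort the distinct D(i,j) values, and binary-search them with a Bellman-Ford feasibility test that uses v_h as the reference vertex. It assumes the Example 2 edge list read off the W(D) matrix (registers on h→a, a→b, b→c, c→d; none on d→e, e→f, f→g, g→h, a→g, b→f, c→e), and all function names are hypothetical. This straightforward version spends O(V³) per feasibility test, so it does not reach the O(VE lg V) bound quoted above.

```python
import itertools

INF = float("inf")


def wd_matrices(delay, edges):
    """Hypothetical: W(u, v), D(u, v) via Floyd-Warshall on (w(e), -d(u))."""
    V = list(delay)
    best = {(u, v): (INF, 0) for u in V for v in V}
    for v in V:
        best[v, v] = (0, 0)
    for u, v, w in edges:
        best[u, v] = min(best[u, v], (w, -delay[u]))
    for k, i, j in itertools.product(V, repeat=3):
        (w1, s1), (w2, s2) = best[i, k], best[k, j]
        if w1 < INF and w2 < INF and (w1 + w2, s1 + s2) < best[i, j]:
            best[i, j] = (w1 + w2, s1 + s2)
    W = {p: w for p, (w, _) in best.items() if w < INF}
    D = {(u, v): delay[v] - best[u, v][1] for (u, v) in W}
    return W, D


def feasible(delay, edges, W, D, c, ref):
    """Hypothetical: a retiming r with clock period <= c (r[ref] = 0), or None."""
    cons = [(u, v, -w) for u, v, w in edges]                  # legality
    cons += [(u, v, 1 - W[u, v]) for (u, v), dv in D.items()  # timing
             if dv > c]
    r = {v: -INF for v in delay}
    r[ref] = 0
    for _ in range(len(delay) - 1):                           # Bellman-Ford
        for u, v, b in cons:
            if r[u] + b > r[v]:
                r[v] = r[u] + b
    if any(r[u] + b > r[v] for u, v, b in cons):
        return None                                           # positive cycle
    return r


def optimal_retiming(delay, edges, ref):
    """Hypothetical: binary search over the sorted distinct D(i, j) values."""
    W, D = wd_matrices(delay, edges)
    cands = sorted(set(D.values()))
    lo, hi = 0, len(cands) - 1  # the largest D(i, j) is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(delay, edges, W, D, cands[mid], ref) is not None:
            hi = mid
        else:
            lo = mid + 1
    return cands[lo], feasible(delay, edges, W, D, cands[lo], ref)


delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = [("h", "a", 1), ("a", "b", 1), ("b", "c", 1), ("c", "d", 1),
         ("d", "e", 0), ("e", "f", 0), ("f", "g", 0), ("g", "h", 0),
         ("a", "g", 0), ("b", "f", 0), ("c", "e", 0)]
c, r = optimal_retiming(delay, edges, "h")
print(c, r)
# 13 {'a': -1, 'b': -2, 'c': -2, 'd': -3, 'e': -2, 'f': -1, 'g': 0, 'h': 0}
```

Searching only the D(i,j) values is safe because feasibility is monotone in c: raising c can only remove timing constraints.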

Optimal Retiming

Can we do better? Yes. Look at the delay-to-register ratios of the cycles in the circuit and at the maximum node delay, defined as:

$R(C) = \dfrac{\sum_{v \in C} d(v)}{\sum_{e \in C} w(e)}, \qquad D = \max \{\, d(v) : v \in V \,\}$

Then the minimum feasible clock period lies in the range:

$\max_{C \in G} R(C) \le c_{\min}(G) < \max_{C \in G} R(C) + D$

This improves the overall running time to O(VE lg D).
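As a worked check on Example 2 (assuming the edge list read off the W(D) matrix, under which every cycle passes through v_h and the edge v_h → v_a), the four simple cycles give the following ratios, with D = 7:

```latex
\begin{aligned}
R(v_h v_a v_g v_h) &= \tfrac{0+3+7}{1} = 10\\
R(v_h v_a v_b v_f v_g v_h) &= \tfrac{0+3+3+7+7}{2} = 10\\
R(v_h v_a v_b v_c v_e v_f v_g v_h) &= \tfrac{0+3+3+3+7+7+7}{3} = 10\\
R(v_h v_a v_b v_c v_d v_e v_f v_g v_h) &= \tfrac{0+4\cdot 3+3\cdot 7}{4} = 8.25\\
\Rightarrow\quad 10 \le c_{\min}(G) &< 10 + 7 = 17
\end{aligned}
```

This is consistent with the feasibility tests above, which found 13 feasible and 12 infeasible.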

Pipelining

Can be thought of as a special case of retiming

[Figure: three instructions overlapping in a four-stage pipeline (fetch, decode, execute, writeback)]
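To see pipelining as retiming, take a hypothetical chain of four gates of delay 3 between an input vertex and an output vertex (made-up numbers, not from the slides). Pipelining first adds registers at the output, which only raises latency, and retiming then pushes them back into the chain. The I/O vertices keep r = 0, so the input-to-output register count (the latency) stays at 3. `clock_period` and `retime` are hypothetical helpers.

```python
from collections import defaultdict, deque


def clock_period(delay, edges):
    """Hypothetical helper: largest delay along a register-free path (DAG assumed)."""
    succ, indeg = defaultdict(list), {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return max(arrival.values())


def retime(edges, r):
    """Hypothetical helper: w_r(i, j) = w_ij + r_j - r_i; missing vertices get r = 0."""
    return [(i, j, w + r.get(j, 0) - r.get(i, 0)) for i, j, w in edges]


# Hypothetical chain: in -> g1 -> g2 -> g3 -> g4 -> out, each gate delay 3.
delay = {"in": 0, "g1": 3, "g2": 3, "g3": 3, "g4": 3, "out": 0}
chain = list(delay)
plain = [(u, v, 0) for u, v in zip(chain, chain[1:])]
print(clock_period(delay, plain))    # 12

# Pipelining: add 3 registers at the output, then retime them backwards.
padded = plain[:-1] + [("g4", "out", 3)]
piped = retime(padded, {"g2": 1, "g3": 2, "g4": 3})
print([w for _, _, w in piped])      # [0, 1, 1, 1, 0]
print(clock_period(delay, piped))    # 3
```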

Software Pipelining

This can also be thought of in terms of retiming

[Figure: loop iterations overlapping across the loop boundary]
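A minimal Python illustration of that view, using a made-up loop body (load, compute, store): the load of iteration i + 1 is moved across the loop boundary into iteration i, much as retiming moves a register across a gate. A prologue and an epilogue absorb the first load and the last compute. Both function names are hypothetical. In plain Python this gives no speed-up; it only shows the restructuring that a compiler would exploit to overlap operations.

```python
def plain_loop(xs):
    """Hypothetical reference loop: load x, compute y = 2*x + 1, store y."""
    out = []
    for i in range(len(xs)):
        x = xs[i]              # load
        y = 2 * x + 1          # compute
        out.append(y)          # store
    return out


def pipelined_loop(xs):
    """Hypothetical pipelined version: the load for iteration i + 1 is moved
    across the loop boundary, next to the compute/store of iteration i."""
    out = []
    if not xs:
        return out
    x = xs[0]                  # prologue: load for iteration 0
    for i in range(len(xs) - 1):
        x_next = xs[i + 1]     # load for iteration i + 1
        out.append(2 * x + 1)  # compute + store for iteration i
        x = x_next
    out.append(2 * x + 1)      # epilogue: compute + store for the last iteration
    return out


print(pipelined_loop([1, 2, 3]))  # [3, 5, 7]
```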