Pipelining and Retiming Prepared by Mark Jarvin. Agenda Synchronous circuit retiming Pipelining...


Pipelining and Retiming

Prepared by Mark Jarvin


Agenda

Synchronous circuit retiming

Pipelining

Software pipelining


The Retiming Problem: Example

[Figure: example circuit with inputs a, b, c and output y; gate delays 1, 2, 1, 1]

D = 4, T = 4

Latency = 4, Throughput = 4

How can this be improved?

Pipelining?


The Retiming Problem: Example

Latency = 6, Throughput = 3

Delay is not balanced

This can still be improved

[Figure: pipelined version of the example circuit with inputs a, b, c and output y; gate delays 1, 1, 1, 2]


The Retiming Problem: Example

Latency = 4, Throughput = 2

Now, delay is balanced

[Figure: retimed version of the example circuit with inputs a, b, c and output y; gate delays 1, 1, 2, 1]


Observations

Some basic transformations can be used for cycle time reduction

The retiming transformation moves registers across gates



Observations

Levelization doesn’t help

Only useful for acyclic circuits


Naïve Algorithm

while ( not timed ) {
    pick a candidate gate;
    apply retiming transformation;
    do timing analysis;
}


Questions

Can we apply retiming in batch mode, i.e., simultaneously on all gates?

Can we make sure the retimed circuit is optimal?

Can we achieve this in polynomial time?


Retiming Circuit Model

$G = (V, E, d, w)$

$V$: gates and primary inputs

$E \subseteq V \times V$

$d : V \to \mathbb{R}_{\ge 0}$: delay of gates

$w : E \to \mathbb{Z}_{\ge 0}$: # registers between 2 gates


Retiming Circuit Model: Example 1

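[Figure: Example 1 circuit (inputs a, b, c; gate x with delay 2; gate y with delay 3) and its graph: vertices v_i, v_x, v_y with delays 0, 2, 3; edge weights w_ix = 0, w_xy = 1, w_iy = 0]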


Retiming Circuit Model: Example 1

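[Figure: the Example 1 circuit with the register moved to the input of gate x, and its graph: vertices v_i, v_x, v_y with delays 0, 2, 3; edge weights w_ix = 1, w_xy = 0, w_iy = 0]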


Retiming Circuit Model: Example 2

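[Figure: Example 2 circuit (gates a to h) and its graph: vertices v_a, v_b, v_c, v_d with delay 3; v_e, v_f, v_g with delay 7; v_h with delay 0; eleven edges, four with weight 1 and seven with weight 0]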


Metrics

Path delay: $d(p) = d(v_i, \ldots, v_j) = \sum_{k=i}^{j} d_k$

Path weight: $w(p) = w(v_i, \ldots, v_j) = \sum_{k=i}^{j-1} w_{k,k+1}$


Metrics

Define weight and delay metrics for any given vertex pair:

$W(u,v) = \min_{p : u \leadsto v} w(p)$

$D(u,v) = \max_{p : u \leadsto v,\; w(p) = W(u,v)} d(p)$

Both quantities are undefined if there is no path p from u to v


W (D) Matrix for Example 2

W (D) a b c d e f g h

a 0 (3) 1 (6) 2 (9) 3 (12) 2 (16) 1 (13) 0 (10) 0 (10)

b 1 (20) 0 (3) 1 (6) 2 (9) 1 (13) 0 (10) 0 (17) 0 (17)

c 1 (27) 2 (30) 0 (3) 1 (6) 0 (10) 0 (17) 0 (24) 0 (24)

d 1 (27) 2 (30) 3 (33) 0 (3) 0 (10) 0 (17) 0 (24) 0 (24)

e 1 (24) 2 (27) 3 (30) 4 (33) 0 (7) 0 (14) 0 (21) 0 (21)

f 1 (17) 2 (20) 3 (23) 4 (26) 3 (30) 0 (7) 0 (14) 0 (14)

g 1 (10) 2 (13) 3 (16) 4 (19) 3 (23) 2 (20) 0 (7) 0 (7)

h 1 (3) 2 (6) 3 (9) 4 (12) 3 (16) 2 (13) 1 (10) 0 (0)
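
As a rough cross-check of the table, the W and D entries can be recomputed with an all-pairs pass over lexicographic (weight, negated delay) costs. This is only a sketch: the edge list is not printed in the transcript and is an assumption reconstructed from the matrix itself, and `wd_matrices` is a hypothetical helper name.

```python
from itertools import product

# Example 2 graph. Vertex delays come from the diagonal of the table;
# the edge list is an ASSUMPTION reconstructed from the W (D) entries.
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
         ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
         ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0}

def wd_matrices(delay, edges):
    """Return (W, D) as dicts keyed by (u, v); a missing key means no path."""
    # Each edge costs (registers, -delay of its tail vertex). Minimising this
    # pair lexicographically picks the fewest registers first, then the
    # largest delay among the minimum-weight paths.
    cost = {(u, u): (0, 0) for u in delay}
    for (u, v), w in edges.items():
        cost[(u, v)] = min(cost.get((u, v), (w, -delay[u])), (w, -delay[u]))
    # Floyd-Warshall: product() varies its first element slowest, so k is outermost.
    for k, i, j in product(delay, repeat=3):
        if (i, k) in cost and (k, j) in cost:
            via = (cost[(i, k)][0] + cost[(k, j)][0],
                   cost[(i, k)][1] + cost[(k, j)][1])
            if (i, j) not in cost or via < cost[(i, j)]:
                cost[(i, j)] = via
    W = {pair: c[0] for pair, c in cost.items()}
    # Add the delay of the head vertex, which no edge cost accounts for.
    D = {(u, v): -c[1] + delay[v] for (u, v), c in cost.items()}
    return W, D

W, D = wd_matrices(delay, edges)
print(W[("a", "e")], D[("a", "e")])  # 2 16
```

The lexicographic cost works here because every cycle in a synchronous circuit carries at least one register, so no cycle has non-positive cost.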


The Retiming Transformation

How do we represent retiming?

How does it affect G?

Informally:

The transformation is fundamentally moving registers across gates

Represent it as the number of registers to push from a gate’s outputs to its inputs

Define this number for all gates


The Retiming Transformation

Definition: a retiming of a network $G = (V, E, d, w)$ is an integer-valued vertex labelling $r : V \to \mathbb{Z}$ that transforms $G$ into $\tilde{G} = (V, E, d, \tilde{w})$, where for each edge $(v_i, v_j) \in E$:

$\tilde{w}_{ij} = w_{ij} + r_j - r_i$


The Retiming Transformation

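[Figure: graph of Example 1 before and after retiming; vertices v_i, v_x, v_y with delays 0, 2, 3]

Initially: $w_{ix} = 0,\; w_{xy} = 1,\; w_{iy} = 0$

Apply retiming: $r_i = 0,\; r_x = 1,\; r_y = 0$

Finally: $\tilde{w}_{ix} = 1,\; \tilde{w}_{xy} = 0,\; \tilde{w}_{iy} = 0$

Note: retiming will change the number of registers in general, but not the number of registers in a given cycle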
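
The three-vertex example can be replayed in a few lines. This is a minimal sketch, assuming the initial weights are w_ix = 0, w_xy = 1, w_iy = 0 and the labelling is r_i = 0, r_x = 1, r_y = 0; the function name `retime` is hypothetical.

```python
def retime(edges, r):
    """Return the retimed weights w_ij + r_j - r_i for every edge (i, j)."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}

# Edge weights = number of registers on each connection of Example 1.
edges = {("i", "x"): 0, ("x", "y"): 1, ("i", "y"): 0}
r = {"i": 0, "x": 1, "y": 0}

retimed = retime(edges, r)
print(retimed)  # {('i', 'x'): 1, ('x', 'y'): 0, ('i', 'y'): 0}
```

With r_x = 1, one register leaves the output edge of x and appears on its input edge, which matches the informal "push from outputs to inputs" description.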


Legal and Feasible Retiming

A retiming is legal if the retimed network doesn’t contain negative weights:

$\tilde{w}_{ij} = w_{ij} + r_j - r_i \ge 0$

$r_j - r_i \ge -w_{ij} \;\Rightarrow\; r_j \ge r_i - w_{ij}$

For a given cycle time $\phi$, the network is timing feasible if it can correctly operate under $\phi$

This holds if $W(i,j) \ge 1$ for all $i, j$ with $D(i,j) > \phi$
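
Both conditions are easy to check mechanically. The sketch below assumes the Example 2 graph (its edge list is an assumption reconstructed from the W (D) matrix) and tests timing feasibility through an equivalent formulation: the longest register-free path must not exceed the cycle time. `is_legal` and `clock_period` are hypothetical helper names.

```python
from functools import lru_cache

# Example 2 graph; the edge list is an ASSUMPTION reconstructed from the W (D) matrix.
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
         ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
         ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0}

def is_legal(edges, r):
    """Legal retiming: every retimed weight w_ij + r_j - r_i is non-negative."""
    return all(w + r[j] - r[i] >= 0 for (i, j), w in edges.items())

def clock_period(delay, edges):
    """Largest total delay over paths that use weight-0 edges only.

    Assumes there is no cycle of weight-0 edges (no combinational loop),
    otherwise the recursion below would not terminate.
    """
    succ = {v: [j for (i, j), w in edges.items() if i == v and w == 0]
            for v in delay}

    @lru_cache(maxsize=None)
    def longest_from(v):
        return delay[v] + max((longest_from(s) for s in succ[v]), default=0)

    return max(longest_from(v) for v in delay)

no_retiming = {v: 0 for v in delay}
print(is_legal(edges, no_retiming))               # True
print(is_legal(edges, {**no_retiming, "a": 1}))   # False: edge a->g would get -1
print(clock_period(delay, edges))                 # 24
```

So the unretimed circuit is timing feasible for a cycle time of 24 or more, and a smaller cycle time needs registers to be moved.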


Feasible Retiming

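Furthermore:

$W(i,j) = w_{i k_1} + w_{k_1 k_2} + \cdots + w_{k_m j}$

$\tilde{w}_{i k_1} = w_{i k_1} + r_{k_1} - r_i$

$\tilde{w}_{k_1 k_2} = w_{k_1 k_2} + r_{k_2} - r_{k_1}$

$\tilde{w}_{k_m j} = w_{k_m j} + r_j - r_{k_m}$

$\tilde{W}(i,j) = W(i,j) + r_j - r_i$

Finally:

$\tilde{W}(i,j) \ge 1 \;\Rightarrow\; r_j - r_i \ge 1 - W(i,j) \;\Rightarrow\; r_j \ge r_i + 1 - W(i,j)$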


Feasible Test Algorithm

Any retiming must satisfy the system of difference constraints:

$r_j - r_i \ge -w_{ij} \quad \forall (v_i, v_j) \in E$

$r_j - r_i \ge 1 - W(i,j) \quad \forall i, j : D(i,j) > \phi$

General approach: integer linear programming

Special form: single-source longest path problem

Note: we can skip the second inequality wherever $D(i,j) - d(j) > \phi$ or $D(i,j) - d(i) > \phi$


Feasible Test Algorithm

Longest path problem can be solved with Bellman-Ford

Build a constraint graph with an edge from i to j if we have a constraint of the form $r_j - r_i \ge b_k$

[Figure: constraint graph for Example 2 with clock period 13]


Feasible Test Algorithm

The solution is feasible if there are no positive cycles

If feasible, the longest distance of each vertex provides the retiming function

For the previous example, with reference node $v_h$:

$r_a = -1 \qquad r_e = -2$

$r_b = -2 \qquad r_f = -1$

$r_c = -2 \qquad r_g = 0$

$r_d = -3 \qquad r_h = 0$

There are no positive cycles; hence, 13 is a feasible clock period


Feasible Test Algorithm

Here, there is a positive cycle:

$v_b \to v_e \to v_f \to v_g \to v_b$

Hence, a clock period of 12 is not feasible

[Figure: constraint graph for Example 2 with clock period 12]
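
The feasibility test for both clock periods can be sketched as follows. This is an illustration under assumptions, not the author's implementation: the Example 2 edge list is reconstructed from the W (D) matrix, the constraints are taken to be r_j - r_i >= -w_ij on every edge and r_j - r_i >= 1 - W(i,j) wherever D(i,j) exceeds the clock period, and the helper names are hypothetical.

```python
from itertools import product

# Example 2 graph; the edge list is an ASSUMPTION reconstructed from the W (D) matrix.
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
         ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
         ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0}

def wd_matrices(delay, edges):
    """All-pairs W and D via Floyd-Warshall on (weight, -delay) costs."""
    cost = {(u, u): (0, 0) for u in delay}
    for (u, v), w in edges.items():
        cost[(u, v)] = min(cost.get((u, v), (w, -delay[u])), (w, -delay[u]))
    for k, i, j in product(delay, repeat=3):      # k is the outermost loop
        if (i, k) in cost and (k, j) in cost:
            via = (cost[(i, k)][0] + cost[(k, j)][0],
                   cost[(i, k)][1] + cost[(k, j)][1])
            if (i, j) not in cost or via < cost[(i, j)]:
                cost[(i, j)] = via
    W = {pair: c[0] for pair, c in cost.items()}
    D = {(u, v): -c[1] + delay[v] for (u, v), c in cost.items()}
    return W, D

def feasible_retiming(delay, edges, phi, ref="h"):
    """Return a retiming for clock period phi, or None on a positive cycle."""
    W, D = wd_matrices(delay, edges)
    # A constraint r_j - r_i >= b becomes an arc (i, j, b) of the constraint graph.
    arcs = [(i, j, -w) for (i, j), w in edges.items()]
    arcs += [(i, j, 1 - W[(i, j)]) for (i, j) in D if D[(i, j)] > phi]
    dist = {v: float("-inf") for v in delay}
    dist[ref] = 0
    for _ in range(len(delay) - 1):               # Bellman-Ford, longest paths
        for i, j, b in arcs:
            if dist[i] + b > dist[j]:
                dist[j] = dist[i] + b
    # If any arc can still be relaxed, a positive cycle is reachable from ref.
    if any(dist[i] + b > dist[j] for i, j, b in arcs):
        return None
    return dist

print(feasible_retiming(delay, edges, 13))
print(feasible_retiming(delay, edges, 12))  # None
```

For a clock period of 13 the longest distances from v_h give r_a = -1, r_b = -2, r_c = -2, r_d = -3, r_e = -2, r_f = -1, r_g = 0, r_h = 0. The sketch does not prune redundant timing constraints, so its constraint graph has more arcs than the one drawn on the slide.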


Optimally Retimed Example Circuit

[Figure: optimally retimed Example 2 circuit and its graph; vertex delays unchanged (3 for v_a to v_d, 7 for v_e to v_g, 0 for v_h); every edge now has weight 0 or 1]


Optimal Retiming

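Binary search for the minimum cycle time (integer clock periods assumed):

optimalRetiming ( G ) {
    min = 0;
    max = MAX;                          // MAX must be a feasible clock period
    while ( min < max ) {
        mid = min + ( max - min ) / 2;  // midpoint of the current range
        if ( feasibleTest ( G, mid ) )
            max = mid;                  // mid works: the answer is mid or smaller
        else
            min = mid + 1;              // mid fails: the answer is larger
    }
    return min;
}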


Optimal Retiming

Do we really need to search all clock periods? No…

Optimal cycle time must be one of the D(i,j) values

So, sort and search O(V^2) clock periods

Computing all D(i,j) requires O(VE + V^2 lg V) time

Overall, the complexity is O(VE lg V)
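
Searching only the candidate values can be sketched as a binary search over the sorted, distinct D(i,j) entries. The function name `min_feasible_period` is hypothetical, and the lambda below is a stand-in for a real feasibility test: it simply hard-codes the outcome reported for Example 2 (13 feasible, 12 not).

```python
def min_feasible_period(candidates, feasible):
    """Return the smallest candidate period accepted by `feasible`.

    Assumes `feasible` is monotone (once a period works, every larger one
    works too) and that the largest candidate is feasible.
    """
    periods = sorted(set(candidates))
    lo, hi = 0, len(periods) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(periods[mid]):
            hi = mid            # periods[mid] works: keep it in the range
        else:
            lo = mid + 1        # periods[mid] fails: discard it
    return periods[lo]

# Distinct D(i,j) values read off the Example 2 matrix.
d_values = [0, 3, 6, 7, 9, 10, 12, 13, 14, 16, 17, 19, 20, 21,
            23, 24, 26, 27, 30, 33]

# Stand-in predicate (ASSUMPTION): replaces the Bellman-Ford feasibility test.
print(min_feasible_period(d_values, lambda phi: phi >= 13))  # 13
```

With at most V^2 candidates, the search calls the feasibility test O(lg V) times.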


Optimal Retiming

Can we do better? Yes…

Look at the delay-to-register ratios and maximum node delay of the cycles in the circuit, where delay-to-register ratio and maximum node delay are defined as:

$R(C) = \frac{\sum_{v \in C} d(v)}{\sum_{e \in C} w(e)}$

$D = \max\{ d(v) : v \in V \}$

Then, the minimum feasible clock period lies in the range:

$\max_{C \in G} R(C) \le \phi_{\min}(G) \le \max_{C \in G} R(C) + D - 1$

This improves the overall running time to O(VE lg D)
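
For a graph as small as Example 2, the bound can be evaluated by brute force. The sketch below assumes the range reads max R(C) <= minimum period <= max R(C) + D - 1, uses an edge list reconstructed from the W (D) matrix (an assumption), and enumerates simple cycles naively, which would not scale to real circuits.

```python
from fractions import Fraction

# Example 2 graph; the edge list is an ASSUMPTION reconstructed from the W (D) matrix.
delay = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 7, "f": 7, "g": 7, "h": 0}
edges = {("h", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
         ("d", "e"): 0, ("e", "f"): 0, ("f", "g"): 0, ("g", "h"): 0,
         ("a", "g"): 0, ("b", "f"): 0, ("c", "e"): 0}

def simple_cycles(edges):
    """Yield every simple cycle once, as a list of vertices (brute force)."""
    order = sorted({v for edge in edges for v in edge})
    succ = {v: [j for (i, j) in edges if i == v] for v in order}

    def extend(start, path):
        for nxt in succ[path[-1]]:
            if nxt == start:
                yield list(path)
            # Only visit vertices larger than the start, so each cycle is
            # reported from its smallest vertex and therefore only once.
            elif nxt > start and nxt not in path:
                yield from extend(start, path + [nxt])

    for start in order:
        yield from extend(start, [start])

def ratio(cycle):
    """Delay-to-register ratio R(C) of a cycle given as a vertex list."""
    total_delay = sum(delay[v] for v in cycle)
    registers = sum(edges[(cycle[k], cycle[(k + 1) % len(cycle)])]
                    for k in range(len(cycle)))
    return Fraction(total_delay, registers)  # registers >= 1 on every cycle

max_ratio = max(ratio(c) for c in simple_cycles(edges))
d_max = max(delay.values())
print(max_ratio, max_ratio + d_max - 1)  # 10 16
```

The resulting window of 10 to 16 contains the minimum feasible period of 13 found earlier, and a binary search over it needs only O(lg D) feasibility tests.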


Pipelining

Can be thought of as a special case of retiming

[Figure: three overlapped instructions, each passing through the stages fetch, decode, execute, writeback]
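
One way to read this claim: start with all registers queued at the input of an unpipelined datapath, then retime so that they spread between the stages. The sketch below uses a hypothetical four-stage chain with unit stage delays and a zero-delay host vertex closing the loop; all names and numbers are illustrative assumptions, not taken from the slides.

```python
from functools import lru_cache

# Hypothetical datapath: four unit-delay stages, four registers on the input edge.
delay = {"host": 0, "fetch": 1, "decode": 1, "execute": 1, "writeback": 1}
edges = {("host", "fetch"): 4, ("fetch", "decode"): 0,
         ("decode", "execute"): 0, ("execute", "writeback"): 0,
         ("writeback", "host"): 0}

def retime(edges, r):
    """Return the retimed weights w_ij + r_j - r_i for every edge (i, j)."""
    return {(i, j): w + r[j] - r[i] for (i, j), w in edges.items()}

def clock_period(delay, edges):
    """Largest total delay over paths made of weight-0 edges only.

    Assumes there is no cycle of weight-0 edges.
    """
    succ = {v: [j for (i, j), w in edges.items() if i == v and w == 0]
            for v in delay}

    @lru_cache(maxsize=None)
    def longest_from(v):
        return delay[v] + max((longest_from(s) for s in succ[v]), default=0)

    return max(longest_from(v) for v in delay)

# Pull three of the four input registers forward, one stage boundary at a time.
r = {"host": 0, "fetch": -3, "decode": -2, "execute": -1, "writeback": 0}
pipelined = retime(edges, r)

print(pipelined)
print(clock_period(delay, edges), clock_period(delay, pipelined))  # 4 1
```

The register count around the loop stays at four, but the clock period drops from 4 to 1 because each stage now sits between its own pair of registers.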


Software Pipelining

This can also be thought of in terms of retiming

[Figure: software-pipelined loop; labels: Iteration, Loop Boundary]