Derivation of Efficient FSM from Polyhedral Loop Nests

21
Derivation of Efficient FSM from Polyhedral Loop Nests Tomofumi Yuki, Antoine Morvan, Steven Derrien INRIA/Université de Rennes 1 1

description

Derivation of Efficient FSM from Polyhedral Loop Nests. Tomofumi Yuki , Antoine Morvan , Steven Derrien INRIA/ Université de Rennes 1. High-Level Synthesis. Writing HDL is too costly Led to emergence of HLS tools HLS is sensitive to input source Must be written in “HW-aware” manner - PowerPoint PPT Presentation

Transcript of Derivation of Efficient FSM from Polyhedral Loop Nests

Page 1: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Derivation of Efficient FSM from Polyhedral Loop Nests

Tomofumi Yuki, Antoine Morvan, Steven Derrien

INRIA/Université de Rennes 1

1

Page 2: Derivation of Efficient FSM from  Polyhedral  Loop Nests

High-Level Synthesis Writing HDL is too costly

Led to emergence of HLS tools HLS is sensitive to input source

Must be written in “HW-aware” manner Source-to-Source transformations

Common in optimizing compilers (semi-)automated exploration at HLS

stage Further enhance

productivity/performance2

Page 3: Derivation of Efficient FSM from  Polyhedral  Loop Nests

HLS Specific Transformations Not all optimizing compiler

transformations make sense in embedded context Its converse is also true

Finite State Machines is an example

for loops are preferred ingeneral purpose context

3

for i for j S0 for k S1

while (…) if (…) S0; if (…) S1; if (…) k = k+1; if (…) i=i+1; j=0;

FSMderivation

Page 4: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Contributions Analytical model of Loop Pipelining

Understanding when to use Nested LP w.r.t. Single Loop Pipelining

Derivation of Finite State Machines Handles imperfectly nested loops Based on polyhedral techniques

Pipelining of the control-path Computing n-states ahead Improves performance of the control-

path4

Page 5: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Outline Modeling Loop Pipelining

Single Loop Pipelining Nested Loop Pipelining NLP vs SLP

FSM Derivation Evaluation Conclusion

5

Page 6: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Single Loop Pipelining Overlapped execution of innermost loop

6

for i=1:M for j=1:N S(i,j);

for i=1:M for j=1:N stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);

for i=1:M s0(i,1); s1(i,1); s0(i,2); s2(i,1); s1(i,2); … s3(i,1); s2(i,2); … s0(i,N); s3(i,2); … s1(i,N); … s2(i,N); s3(i,N);

Page 7: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Pipeline flush/fill Overhead Overhead for each iteration of the outer

loop

7

i=1 i=2 i=3

for i=1:M for j=1:N s0(i,j); s1(i,j); s2(i,j); s3(i,j);

under-utilized stages

Page 8: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Nested Loop Pipelining “Compress” by pipelining alltogether

8

i=1i=2

i=3

for i=1:M for j=1:N s0(i,j); s1(i,j); s2(i,j); s3(i,j);

for i=1:M j=j+1; j<N s0(i,j); s1(i,j); s2(i,j); s3(i,j);

while(has_next) i,j=next(i,j) s0(i,j); s1(i,j); s2(i,j); s3(i,j);

Page 9: Derivation of Efficient FSM from  Polyhedral  Loop Nests

NLP Overhead Larger control-path

FSM for loop nest, instead of a single loop

FSM for SLP is a simple check on loop bound

Hinders maximum frequency Complex control-path may take longer

than one data-path stage Savings in flush/fill overhead must be

greater than the loss in frequency

9

Page 10: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Modeling Trade-offs Important parameters:

Frequency Degradation due to NLP Innermost trip count Number of pipeline stages

f*: NLP frequency normalized to SLP f* = 0.9 means 10% degradation in

frequency α= #stages / trip count

larger α means large flush/fill overhead

10

Page 11: Derivation of Efficient FSM from  Polyhedral  Loop Nests

When is NLP Beneficial?

11

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.2 0.12 0.24 0.36 0.48 0.60 0.72 0.84 0.96 1.08 1.20

0.4 0.14 0.28 0.42 0.56 0.70 0.84 0.98 1.12 1.27 1.41

0.6 0.16 0.32 0.48 0.64 0.80 0.96 1.12 1.28 1.45 1.59

0.8 0.18 0.36 0.54 0.72 0.90 1.08 1.27 1.45 1.61 1.79

1 0.20 0.40 0.60 0.80 1.00 1.20 1.41 1.59 1.79 2.00

1.2 0.22 0.44 0.66 0.88 1.10 1.32 1.54 1.75 1.96 2.22

1.4 0.24 0.48 0.72 0.96 1.20 1.45 1.67 1.92 2.17 2.38

1.6 0.26 0.52 0.78 1.04 1.30 1.56 1.82 2.08 2.33 2.63

1.8 0.28 0.56 0.84 1.12 1.41 1.67 1.96 2.22 2.50 2.78

2 0.30 0.60 0.90 1.20 1.49 1.79 2.08 2.38 2.70 3.03

f*: higher = less degradation

α: larger = small trip count (innermost)

Program Characteristic

(cannot change)

Improving control-path is

possible

Model speedup as a function of f* and α

Page 12: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Outline Modeling Loop Pipelining FSM Derivation

Polyhedral Representation Computing Transitions State Look Ahead

Evaluation Conclusion

12

Page 13: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Polyhedral Representation Represent loops as mathematical

objects

13

for i = 0:N for j = 0:M S

S M

N

for i = 0:N for j = 0:i S0 for k = 0:N-i S1

S0S1

Page 14: Derivation of Efficient FSM from  Polyhedral  Loop Nests

FSM Derivation next function

Find a piece-wise function that gives the immediate successor in lexicographic order

Proposed in 1998 for low-level code generation

Direct Application to FSM Each piece = condition of transition Function = transition

Can be composed to obtain nextn

14

Page 15: Derivation of Efficient FSM from  Polyhedral  Loop Nests

State Look Ahead Pipelining the control-flow

When data-path is heavily pipelined, control-path becomes the critical path

Computing n-states ahead Allows n-stage pipelining of the control-

path

15

datapath

i,j i,’j’ i”,j”

next2

datapath

i,j i’,j’

next

Page 16: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Other Optimizations Merging transitions

next computed can have many transitions

Some can be merged by looking at its context

Common Sub-expressions HLS tools sometimes fail to catch

16

next(i,j) = (i,j+1) if i<N

next(i,j) = (N,j+1) if i=Nnext(i,j) = (i,j+1) if i≤N

if (a>b && c>d) A;if (a>b && e>f) B;

x = a>b;if (x && c>d) A;if (x && e>f) B;

Page 17: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Evaluation Methodology Focus on control-path

empty data-path (incrementing arrays) independent iterations loops with different shapes

3 versions: different pipelining SLP : innermost loop NLP: all loops FSM-LA2: while loop of FSM with next2

17

Page 18: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Evaluation: HLS Phase Maximum Target Frequency

18

rect 2d rect 3d triangular 2d

triangular 3d

0

100

200

300

400

500

SLP NLP FSM-LA2

Page 19: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Evaluation: Synthesized Design Achieved Frequency

19

rect 2d rect 3d triangular 2d

triangular 3d

0

100

200

300

400

500

SLP NLP FSM-LA2

Page 20: Derivation of Efficient FSM from  Polyhedral  Loop Nests

Conclusion Improved FSM generation from for

loops Example of HLS specific transformation State look ahead to pipeline control-path HLS tools currently lack compiler

optimizations Applied to Nested Loop Pipelining

Enlarge applicability by reducing its overhead

Future Directions Other uses of next function Other HLS-specific transformations

20

Page 21: Derivation of Efficient FSM from  Polyhedral  Loop Nests

21