Derivation of Efficient FSM from Polyhedral Loop Nests
![Page 1: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/1.jpg)
Derivation of Efficient FSM from Polyhedral Loop Nests
Tomofumi Yuki, Antoine Morvan, Steven Derrien
INRIA/Université de Rennes 1
![Page 2: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/2.jpg)
High-Level Synthesis
- Writing HDL is too costly
  - Led to emergence of HLS tools
- HLS is sensitive to input source
  - Must be written in “HW-aware” manner
- Source-to-Source transformations
  - Common in optimizing compilers
  - (semi-)automated exploration at HLS stage
  - Further enhance productivity/performance
![Page 3: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/3.jpg)
HLS Specific Transformations
- Not all optimizing compiler transformations make sense in embedded context
  - The converse is also true
- Finite State Machines are an example
  - for loops are preferred in general purpose context

    for i
      for j
        S0
      for k
        S1

FSM derivation:

    while (…)
      if (…) S0;
      if (…) S1;
      if (…) k = k+1;
      if (…) i = i+1; j = 0;
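The loop-to-FSM rewrite on this slide can be made concrete with a minimal Python sketch. The slide elides the loop bounds and the guards (`…`), so the bounds `ni`, `nj`, `nk`, the guard conditions, and the function names below are all assumptions chosen for illustration; the sketch only checks that a single `while` loop with guarded updates visits the same statement instances in the same order as the loop nest.

```python
def loop_nest(ni, nj, nk):
    """Reference order: the imperfectly nested for loops."""
    trace = []
    for i in range(ni):
        for j in range(nj):
            trace.append(("S0", i, j))
        for k in range(nk):
            trace.append(("S1", i, k))
    return trace


def fsm(ni, nj, nk):
    """One while loop with guarded statements and guarded counter updates.

    Assumes nk >= 1 so that every outer iteration ends in S1 (hypothetical
    restriction that keeps the guards short).
    """
    assert nk >= 1
    trace = []
    i, j, k = 0, 0, 0
    while i < ni:
        if j < nj:                  # guard for S0
            trace.append(("S0", i, j))
            j += 1
        else:                       # guard for S1
            trace.append(("S1", i, k))
            k += 1
            if k == nk:             # last S1 of this i: advance the outer loop
                i, j, k = i + 1, 0, 0
    return trace
```

Calling `fsm(3, 2, 4)` and `loop_nest(3, 2, 4)` returns the same list, which is the property an FSM derivation has to preserve.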
![Page 4: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/4.jpg)
Contributions
- Analytical model of Loop Pipelining
  - Understanding when to use Nested LP w.r.t. Single Loop Pipelining
- Derivation of Finite State Machines
  - Handles imperfectly nested loops
  - Based on polyhedral techniques
- Pipelining of the control-path
  - Computing n-states ahead
  - Improves performance of the control-path
![Page 5: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/5.jpg)
Outline
- Modeling Loop Pipelining
  - Single Loop Pipelining
  - Nested Loop Pipelining
  - NLP vs SLP
- FSM Derivation
- Evaluation
- Conclusion
![Page 6: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/6.jpg)
Single Loop Pipelining
- Overlapped execution of innermost loop

    for i=1:M
      for j=1:N
        S(i,j);

    for i=1:M
      for j=1:N
        stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);

    for i=1:M
      s0(i,1);
      s1(i,1); s0(i,2);
      s2(i,1); s1(i,2); …
      s3(i,1); s2(i,2); … s0(i,N);
      s3(i,2); … s1(i,N);
      … s2(i,N);
      s3(i,N);
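The overlapped schedule of the inner loop can be generated mechanically. The sketch below is an illustration under assumptions that the slide does not state: one new iteration starts every cycle (initiation interval of 1) and each stage takes one cycle. The function name `slp_schedule` is hypothetical.

```python
def slp_schedule(n, stages):
    """Cycle-by-cycle occupancy of a pipelined inner loop with trip count n.

    Returns one list per cycle; each entry (s, j) means stage s works on
    inner iteration j (1-based, as on the slide).
    """
    cycles = []
    for t in range(n + stages - 1):
        active = [(s, t - s + 1) for s in range(stages) if 1 <= t - s + 1 <= n]
        cycles.append(active)
    return cycles
```

For `n = 3` and four stages this yields six cycles: the first holds only `s0` on iteration 1, the fourth holds `s1`, `s2`, `s3` on iterations 3, 2, 1, and the last holds only `s3` on iteration 3.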
![Page 7: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/7.jpg)
Pipeline flush/fill Overhead
- Overhead for each iteration of the outer loop

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

[Figure: pipeline diagrams for i=1, i=2, i=3, with the under-utilized stages marked]
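A rough way to quantify this overhead, assuming an initiation interval of 1 and one cycle per stage (assumptions not stated on the slide), is to count the cycles in which the pipeline is only partly full. The helper names are hypothetical.

```python
def fill_flush_cycles(m, stages):
    """Extra cycles spent filling and flushing, summed over m outer iterations."""
    return m * (stages - 1)


def stage_utilization(n, stages):
    """Fraction of stage-cycles doing useful work in one outer iteration."""
    return n / (n + stages - 1)
```

With four stages and an inner trip count of 4, each outer iteration takes 7 cycles for 4 iterations of work, so utilization is 4/7; the loss grows as the trip count shrinks relative to the pipeline depth.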
![Page 8: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/8.jpg)
Nested Loop Pipelining
- “Compress” by pipelining all together

[Figure: pipeline diagrams for i=1, i=2, i=3, overlapped across outer iterations]

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    for i=1:M
      j=j+1; j<N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    while (has_next)
      i,j = next(i,j)
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);
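The effect of "compressing" the schedule can be sketched as a cycle count. The formulas below assume an initiation interval of 1, one cycle per stage, and a rectangular M by N nest; none of this is stated on the slide, and the function names are hypothetical.

```python
def slp_cycles(m, n, stages):
    """Single loop pipelining: the pipeline fills and flushes once per outer iteration."""
    return m * (n + stages - 1)


def nlp_cycles(m, n, stages):
    """Nested loop pipelining: the pipeline fills and flushes once for the whole nest."""
    return m * n + stages - 1
```

For `m = 3`, `n = 4` and four stages, the counts are 21 and 15 cycles: NLP saves `(m - 1) * (stages - 1)` cycles, provided the clock frequency is unchanged.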
![Page 9: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/9.jpg)
NLP Overhead
- Larger control-path
  - FSM for loop nest, instead of a single loop
  - FSM for SLP is a simple check on loop bound
- Hinders maximum frequency
  - Complex control-path may take longer than one data-path stage
- Savings in flush/fill overhead must be greater than the loss in frequency
![Page 10: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/10.jpg)
Modeling Trade-offs
- Important parameters:
  - Frequency degradation due to NLP
  - Innermost trip count
  - Number of pipeline stages
- f*: NLP frequency normalized to SLP
  - f* = 0.9 means 10% degradation in frequency
- α = #stages / trip count
  - larger α means larger flush/fill overhead
![Page 11: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/11.jpg)
When is NLP Beneficial?
- Model speedup as a function of f* and α

| α \ f* | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.12 | 0.24 | 0.36 | 0.48 | 0.60 | 0.72 | 0.84 | 0.96 | 1.08 | 1.20 |
| 0.4 | 0.14 | 0.28 | 0.42 | 0.56 | 0.70 | 0.84 | 0.98 | 1.12 | 1.27 | 1.41 |
| 0.6 | 0.16 | 0.32 | 0.48 | 0.64 | 0.80 | 0.96 | 1.12 | 1.28 | 1.45 | 1.59 |
| 0.8 | 0.18 | 0.36 | 0.54 | 0.72 | 0.90 | 1.08 | 1.27 | 1.45 | 1.61 | 1.79 |
| 1 | 0.20 | 0.40 | 0.60 | 0.80 | 1.00 | 1.20 | 1.41 | 1.59 | 1.79 | 2.00 |
| 1.2 | 0.22 | 0.44 | 0.66 | 0.88 | 1.10 | 1.32 | 1.54 | 1.75 | 1.96 | 2.22 |
| 1.4 | 0.24 | 0.48 | 0.72 | 0.96 | 1.20 | 1.45 | 1.67 | 1.92 | 2.17 | 2.38 |
| 1.6 | 0.26 | 0.52 | 0.78 | 1.04 | 1.30 | 1.56 | 1.82 | 2.08 | 2.33 | 2.63 |
| 1.8 | 0.28 | 0.56 | 0.84 | 1.12 | 1.41 | 1.67 | 1.96 | 2.22 | 2.50 | 2.78 |
| 2 | 0.30 | 0.60 | 0.90 | 1.20 | 1.49 | 1.79 | 2.08 | 2.38 | 2.70 | 3.03 |

- f*: higher = less degradation
  - Improving control-path is possible
- α: larger = small trip count (innermost)
  - Program characteristic (cannot change)
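The slide does not spell out the speedup formula. One expression that reproduces the tabulated values to within a few hundredths is speedup ≈ f* · (1 + α), which is what results if SLP spends roughly (1 + α) times as many cycles per inner loop as NLP, and NLP runs at a clock that is f* times the SLP clock. The sketch below encodes that assumed model; the function names are hypothetical.

```python
def nlp_speedup(f_star, alpha):
    """Assumed model: speedup of NLP over SLP.

    f_star: NLP frequency normalized to SLP (1.0 means no degradation).
    alpha:  number of pipeline stages divided by the innermost trip count.
    """
    return f_star * (1.0 + alpha)


def break_even_f_star(alpha):
    """Smallest normalized frequency at which NLP is no slower than SLP."""
    return 1.0 / (1.0 + alpha)
```

Under this model, with α = 1 the break-even point is f* = 0.5, matching the 1.00 entry in the table; with α = 0.2, NLP only pays off if the frequency loss stays below roughly 17%.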
![Page 12: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/12.jpg)
Outline
- Modeling Loop Pipelining
- FSM Derivation
  - Polyhedral Representation
  - Computing Transitions
  - State Look Ahead
- Evaluation
- Conclusion
![Page 13: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/13.jpg)
Polyhedral Representation
- Represent loops as mathematical objects

    for i = 0:N
      for j = 0:M
        S

[Figure: iteration domain of S, with extents M and N]

    for i = 0:N
      for j = 0:i
        S0
      for k = 0:N-i
        S1

[Figure: iteration domains of S0 and S1]
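As an illustration of "loops as mathematical objects", the domain of S0 in the second nest can be written as a set of affine inequalities over (i, j) and the parameter N. The encoding below is a hand-rolled sketch, not the representation used by any particular polyhedral library, and it assumes the bounds `0:N` and `0:i` are inclusive.

```python
# Each constraint (ci, cj, cn, c0) means ci*i + cj*j + cn*n + c0 >= 0.
S0_CONSTRAINTS = [
    (1, 0, 0, 0),    # i >= 0
    (-1, 0, 1, 0),   # i <= n
    (0, 1, 0, 0),    # j >= 0
    (1, -1, 0, 0),   # j <= i
]


def contains(constraints, i, j, n):
    """True if the integer point (i, j) satisfies every inequality."""
    return all(ci * i + cj * j + cn * n + c0 >= 0
               for ci, cj, cn, c0 in constraints)


def points(constraints, n):
    """Integer points of the domain, in lexicographic order.

    Scans the box [0, n] x [0, n], which is enough for this domain.
    """
    return [(i, j)
            for i in range(n + 1)
            for j in range(n + 1)
            if contains(constraints, i, j, n)]
```

`points(S0_CONSTRAINTS, 2)` lists the six instances of S0 for N = 2 in the order the loop nest executes them.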
![Page 14: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/14.jpg)
FSM Derivation
- next function
  - Find a piece-wise function that gives the immediate successor in lexicographic order
  - Proposed in 1998 for low-level code generation
- Direct Application to FSM
  - Each piece = condition of transition
  - Function = transition
- Can be composed to obtain next^n
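To make the next function concrete, here is a hand-written sketch for one domain: the triangular set {(i, j) | 0 ≤ i ≤ N, 0 ≤ j ≤ i}. In the approach described on the slide the pieces are derived automatically with polyhedral techniques; the pieces below were worked out by hand for illustration, and the names `next_state` and `next_k` are hypothetical.

```python
def next_state(i, j, n):
    """Immediate lexicographic successor of (i, j), or None at the last point.

    Each branch is one piece: its condition is the FSM transition guard,
    its result is the transition itself.
    """
    if j < i:                # still inside the current row
        return (i, j + 1)
    if i < n:                # end of row, more rows remain
        return (i + 1, 0)
    return None              # (n, n) is the last point


def next_k(state, n, k):
    """next composed k times (next^k), by repeated application."""
    for _ in range(k):
        if state is None:
            return None
        state = next_state(state[0], state[1], n)
    return state
```

Starting from `(0, 0)` and applying `next_state` repeatedly with `n = 2` visits `(0,0), (1,0), (1,1), (2,0), (2,1), (2,2)`, which is exactly the loop order.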
![Page 15: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/15.jpg)
State Look Ahead
- Pipelining the control-flow
  - When data-path is heavily pipelined, control-path becomes the critical path
- Computing n-states ahead
  - Allows n-stage pipelining of the control-path

[Figure: two control-path diagrams feeding the datapath. With next^2: registers (i,j), (i',j'), (i'',j''). With next: registers (i,j), (i',j').]
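The look-ahead idea can be simulated in software. The sketch below uses a rectangular m by n domain (an assumption; the slide does not fix a domain) and writes next^2 as its own piecewise function, so that the current state and its successor sit in two registers while the state two steps ahead is being computed. All names are hypothetical, and the model ignores timing: it only checks that the visited sequence is unchanged.

```python
def next1(state, m, n):
    """Successor in the domain 0 <= i < m, 0 <= j < n."""
    i, j = state
    if j + 1 < n:
        return (i, j + 1)
    if i + 1 < m:
        return (i + 1, 0)
    return None


def next2(state, m, n):
    """Two steps ahead as a single piecewise function (requires n >= 2)."""
    i, j = state
    if j + 2 < n:
        return (i, j + 2)
    if i + 1 < m:
        return (i + 1, j + 2 - n)   # wraps to column 0 or 1 of the next row
    return None


def run_look_ahead(m, n):
    """Drive the iteration with two state registers fed by next2."""
    assert m >= 1 and n >= 2
    cur = (0, 0)                    # state consumed by the datapath this cycle
    nxt = next1(cur, m, n)          # state for the following cycle
    trace = []
    while cur is not None:
        trace.append(cur)
        ahead = next2(cur, m, n)    # in hardware this could span two stages
        cur, nxt = nxt, ahead
    return trace
```

`run_look_ahead(3, 4)` visits the twelve points in the same order as the plain loop nest, so the extra register buys time for the control-path without changing behavior.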
![Page 16: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/16.jpg)
Other Optimizations
- Merging transitions
  - next computed can have many transitions
  - Some can be merged by looking at its context
- Common Sub-expressions
  - HLS tools sometimes fail to catch

Merging transitions, before:

    next(i,j) = (i,j+1) if i<N
    next(i,j) = (N,j+1) if i=N

after:

    next(i,j) = (i,j+1) if i≤N

Common sub-expressions, before:

    if (a>b && c>d) A;
    if (a>b && e>f) B;

after:

    x = a>b;
    if (x && c>d) A;
    if (x && e>f) B;
![Page 17: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/17.jpg)
Evaluation Methodology
- Focus on control-path
  - empty data-path (incrementing arrays)
  - independent iterations
  - loops with different shapes
- 3 versions: different pipelining
  - SLP: innermost loop
  - NLP: all loops
  - FSM-LA2: while loop of FSM with next^2
![Page 18: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/18.jpg)
Evaluation: HLS Phase
- Maximum Target Frequency

[Figure: bar chart of maximum target frequency for rect 2d, rect 3d, triangular 2d, and triangular 3d; series SLP, NLP, FSM-LA2; axis from 0 to 500]
![Page 19: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/19.jpg)
Evaluation: Synthesized Design
- Achieved Frequency

[Figure: bar chart of achieved frequency for rect 2d, rect 3d, triangular 2d, and triangular 3d; series SLP, NLP, FSM-LA2; axis from 0 to 500]
![Page 20: Derivation of Efficient FSM from Polyhedral Loop Nests](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166cb550346895ddad7ac/html5/thumbnails/20.jpg)
Conclusion
- Improved FSM generation from for loops
  - Example of HLS specific transformation
  - State look ahead to pipeline control-path
  - HLS tools currently lack compiler optimizations
- Applied to Nested Loop Pipelining
  - Enlarge applicability by reducing its overhead
- Future Directions
  - Other uses of next function
  - Other HLS-specific transformations