Posted: 19-Dec-2015
NTHU-CS
1
Performance-Optimal Clustering with Retiming
for Sequential Circuits
Tzu-Chieh Tien and Youn-Long Lin
Department of Computer Science
National Tsing Hua University
Hsin-Chu, Taiwan, R.O.C.
Outline
Introduction
Previous Work
Proposed Approach
Experimental Results
Conclusion and Future Research
Retiming
[Figure: example circuit with gate delays 3, 5, 2, and 1; retiming reduces the critical path delay from 8 to 7.]
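A minimal sketch of how retiming changes the critical path delay. The circuit below is hypothetical (it is not the circuit drawn on the slide); it was chosen only so that the delays 3, 5, 2, 1 give the same 8 and 7. The names `retime` and `clock_period` are illustrative, and the sketch assumes the usual retiming rule, under which an edge from u to v carries w(u, v) + r(v) - r(u) FFs after retiming.

```python
from functools import lru_cache

def retime(edges, r):
    """FF count per edge after retiming: w_r(u, v) = w(u, v) + r(v) - r(u)."""
    return {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in edges.items()}

def clock_period(delay, edges):
    """Largest total gate delay along any path that crosses no FF."""
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w < 0:
            raise ValueError("illegal retiming: negative FF count on an edge")
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # Assumes the FF-free edges form no cycle (no combinational loop).
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in delay)

# Hypothetical circuit: gate delays and FF counts on edges.
delay = {"a": 3, "b": 5, "c": 2, "d": 1}
edges = {("a", "b"): 0, ("b", "c"): 1, ("c", "d"): 1, ("c", "a"): 1}

before = clock_period(delay, edges)                    # path a -> b
after = clock_period(delay, retime(edges, {"b": 1}))   # path b -> c
print(before, after)  # 8 7
```

Setting r(b) = 1 moves the FF from the output of gate b to its input, so the longest FF-free path changes from a, b (3 + 5) to b, c (5 + 2).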
Performance-Driven Clustering
Minimize clock period under cluster-size constraint
[Figure: example circuit with gate delays 3, 5, 2, and 1.]
Combining Clustering and Retiming
inter-cluster delay = 2
[Figure: two clusterings of the example circuit (gate delays 3, 5, 2, 1), one without and one with retiming consideration; the critical path delays shown are 7 and 8.]
Problem Definition
Given a sequential circuit G, a target clock period c, and an area-bound number M
Find a clustered/retimed/node-replicated circuit Gr such that
the clock period is less than or equal to c
each cluster is of size M or less
Previous Work
P. Pan, A. K. Karandikar, and C. L. Liu, “Optimal Clock Period Clustering for Sequential Circuits with Retiming,” IEEE T-CAD, June 1998.
Optimal under the unit gate delay model
Near-optimal for the general gate delay model
J. Cong, H. Li, and C. Wu, “Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization,” DAC’99.
100X more efficient but still near-optimal
This Work
Optimal for the general gate delay model
More (2X) efficient than Pan’s approach
Pan’s Approach
Label each node v with an l-value, l(v)
Find a clustered-retimed circuit such that all POs’ l-values are less than or equal to c
Retiming solution: $r(v) = \lceil l(v)/c \rceil - 1$
Resulting clock period is less than c + max. gate delay
Pan’s l-value of a Node
Total w1 edge weight of the longest path from PI’s to the node
w1 weight of edge e from u to v: w1(e) = - c * w(e) + d(v)
w(e): number of FF’s along e
Example (target c = 6, gate delays 2, 5, 3):
w1(e): 2, -1, 3, 0
l(v): 0, 2, 1, 4, 4 < 6
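The l-value definition can be sketched as a longest-path computation over the w1 weights, repeated until no label changes. This is only an illustration of the labeling rule: it ignores clustering and node replication, and the function name `pan_labels` and the node names are made up here. The chain below is an assumed reading of the slide's example (one FF on the edge into the delay-5 gate, inferred from w1 = -1 = -6 * 1 + 5).

```python
def pan_labels(delay, edges, c, primary_inputs):
    """l(v) = weight of the longest PI-to-v path, with w1(u->v) = -c*w(u,v) + d(v)."""
    neg_inf = float("-inf")
    label = {v: neg_inf for v in delay}
    for pi in primary_inputs:
        label[pi] = 0
    # Sweep all edges until nothing changes; |V| sweeps suffice if labels converge.
    for _ in range(len(delay)):
        changed = False
        for (u, v), w in edges.items():
            if label[u] == neg_inf:
                continue  # u not reached from any PI yet
            candidate = label[u] - c * w + delay[v]
            if candidate > label[v]:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    raise ValueError("labels keep growing: a loop has positive w1 weight for this c")

# Assumed chain: PI -> a(2) -FF-> b(5) -> c(3) -> PO, target clock period 6.
delay = {"PI": 0, "a": 2, "b": 5, "c": 3, "PO": 0}
edges = {("PI", "a"): 0, ("a", "b"): 1, ("b", "c"): 0, ("c", "PO"): 0}

labels = pan_labels(delay, edges, c=6, primary_inputs=["PI"])
print(labels)  # {'PI': 0, 'a': 2, 'b': 1, 'c': 4, 'PO': 4}
```

The PO label is 4, which is below the target of 6, matching the l(v) row of the example.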
Pan’s l-value Labeling
Traverse the whole circuit, updating l-values, until no node's l-value changes
Time complexity VDEVO log3
Our Approach
Modified l-value definition
Optimal for general delay model
Based on W.-J. Chen, “A Study on the Relationship Between Retiming and Loop Folding,” Master thesis, National Tsing-Hua Univ., Taiwan, R.O.C., Aug. 1994.
FIFO to aid circuit traveling during labeling
Improve run time
Time complexity
VVFEVFO log21222
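One way a FIFO can aid the traversal is a worklist: only nodes whose label just changed are revisited, instead of sweeping the whole circuit each round. The sketch below is an assumption about the general idea, not the authors' procedure; it applies the plain w1 labeling rule (no modified l-value, no clustering), and `fifo_labels` and the chain circuit are illustrative names and data.

```python
from collections import deque

def fifo_labels(delay, edges, c, primary_inputs):
    """Worklist labeling: l(v) = max over u->v of l(u) - c*w(u,v) + d(v)."""
    succ = {v: [] for v in delay}
    for (u, v), w in edges.items():
        succ[u].append((v, w))

    label = {v: float("-inf") for v in delay}
    queue, queued = deque(), set()
    for pi in primary_inputs:
        label[pi] = 0
        queue.append(pi)
        queued.add(pi)

    pops = {v: 0 for v in delay}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        pops[u] += 1
        if pops[u] > len(delay):
            # A node processed more than |V| times means labels never settle.
            raise ValueError("labels keep growing: target c is too small for a loop")
        for v, w in succ[u]:
            candidate = label[u] - c * w + delay[v]
            if candidate > label[v]:
                label[v] = candidate
                if v not in queued:      # enqueue each changed node once
                    queue.append(v)
                    queued.add(v)
    return label

# Assumed chain: PI -> a(2) -FF-> b(5) -> c(3) -> PO, target clock period 6.
delay = {"PI": 0, "a": 2, "b": 5, "c": 3, "PO": 0}
edges = {("PI", "a"): 0, ("a", "b"): 1, ("b", "c"): 0, ("c", "PO"): 0}

fifo_result = fifo_labels(delay, edges, c=6, primary_inputs=["PI"])
print(fifo_result)  # {'PI': 0, 'a': 2, 'b': 1, 'c': 4, 'PO': 4}
```

On this chain each node is dequeued exactly once, so the FIFO version does no repeated full sweep.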
Modified l-value Labeling
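$l(v) = \max_{u \to v} \bigl( l(u) - c \cdot w(u,v) \bigr) + d(v)$
If an FF’s position is occupied by a gate v, detected by
$\lceil l(v)/c \rceil \neq \lceil (l(v) - d(v))/c \rceil$
then
$l'(v) = (\lceil l(v)/c \rceil - 1) \cdot c + d(v)$
Example (target c = 6, gate delays 2, 5, 3):
l(v): 0, 2, 1 (modified to 5), 8, 8 > 6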
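A small sketch of the adjustment step on a single node. The operators in the slide's formulas did not survive extraction, so this assumes the reading: a gate v occupies an FF position when ceil(l(v)/c) differs from ceil((l(v) - d(v))/c), and in that case l'(v) = (ceil(l(v)/c) - 1) * c + d(v). The function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0 (also correct for negative a)."""
    return -(-a // b)

def modified_label(l, d, c):
    """Return l'(v) for a gate with label l, delay d, under target clock period c."""
    # The gate runs from time l - d to time l; it occupies an FF position
    # when those two times fall in different clock periods.
    if ceil_div(l, c) != ceil_div(l - d, c):
        # Push the gate so that it starts right at the clock boundary.
        return (ceil_div(l, c) - 1) * c + d
    return l

# Delay-5 gate from the example: Pan's label 1 becomes 5 when c = 6.
print(modified_label(1, 5, 6))  # 5
# Delay-2 gate with label 2 starts exactly at a boundary, so it is unchanged.
print(modified_label(2, 2, 6))  # 2
```

With the label of the delay-5 gate raised from 1 to 5, the next gate (delay 3) is reached at 5 + 3 = 8, which exceeds the target of 6.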
Example (target c = 7, inter-cluster delay = 2)
[Figure: l-value labeling on the example circuit (gate delays 3, 5, 2, 1). l(v) rows shown: 3 1 3 10 12 1 and 3 1 12 7.]
Example (Cont’) (target c = 7, inter-cluster delay = 2)
[Figure: three steps on the example circuit: clustering, connecting & retiming, merging.]
Example (target c = 6, inter-cluster delay = 2)
[Figure: l-value labeling on the example circuit (gate delays 3, 5, 2, 1). l(v) rows shown: 3 1 3 10 11 1 and 3 1 11 7.]
Example of Pan’s Approach (target c = 6, inter-cluster delay = 2)
[Figure: Pan’s l-value labeling on the example circuit (gate delays 3, 5, 2, 1). l(v) rows shown: 3 1 3 10 1 and 3 1 8 6.]
Example of Pan’s (Cont’) (target c = 6, inter-cluster delay = 2)
[Figure: three steps on the example circuit: clustering, connecting & retiming, merging.]
Experimental Results
26 ISCAS-89 benchmark circuits
Pan’s approach produces suboptimal results for 11 circuits
Our approach produces an optimal result for every circuit
Our CPU time consumption is 50% of Pan’s
Conclusion and Future Research
First exact algorithm for performance-optimal clustering with retiming under general gate delay model
Twice as fast as Pan’s near-optimal heuristic
Future research is to improve run time efficiency
Experimental Results
[Chart: clock period (ns), Pan's vs. Ours.]
Experimental Results (Cont’)
[Charts: CPU time (sec), Pan's vs. Ours, in three panels; one panel covers s820, s832, s838, s1196, s1238, s1423, s1488, s1494, s5378, and s9234, and one covers s35932 and s38417.]