1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.
-
Upload
leonard-anthony -
Category
Documents
-
view
223 -
download
0
Transcript of 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.
![Page 1: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/1.jpg)
1
EECS 219BEECS 219BSpring 2001Spring 2001
Timing Optimization
Andreas Kuehlmann
![Page 2: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/2.jpg)
2
Restructuring for Timing OptimizationRestructuring for Timing Optimization
Outline:• Definitions and problem statement• Overview of techniques (motivated by adders)
– Tree height reduction (THR)– Generalized bypass transform (GBX)– Generalized select transform (GST)– Partial collapsing
![Page 3: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/3.jpg)
3
Circuit RestructuringCircuit Restructuring
Approaches:
Local: • Mimic optimization techniques in adders
– Carry lookahead (THR tree height reduction)– Conditional sum (GST transformation)– Carry bypass (GBX transformation)
Global:• Reduce depth of entire circuit
– Partial collapsing– Boolean simplification
![Page 4: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/4.jpg)
4
Tree-Height Reduction (THR)Tree-Height Reduction (THR)
n
l m
i j
h
k
3
6
5 5
1 4
1
0 0 0 0 2 0 0
a b c d e f g
i
1
0 0
a b
m
j
h
k
3
41
0 0 2 0 0
c d e f g
n’Duplicatedlogic
12
00
5Criticalregion
CollapsedCritical region
Singh’88:
![Page 5: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/5.jpg)
5
Tree-Height ReductionTree-Height Reduction
i
1
0 0
a b
m
j
h
k
3
41
0 0 2 0 0
c d e f g
n’Duplicatedlogic
12
00
5
i
1
0 0
a b
m
j
h
k
3
41
0 0 2 0 0
c d e f g
12
0
35
n’
2
1
0
4
CollapsedCritical region
New delay = 5
![Page 6: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/6.jpg)
6
Generalized bypass transform (GBX)Generalized bypass transform (GBX)
• Make critical path false– Speed up the circuit
• Bypass logic of critical path(s)
fm=f fm+1 fn=g…
fm =f fm+1 fn=g… 0
1
g’
dg__df
Booleandifference
s-a-0 redundant
McGeer’91:
![Page 7: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/7.jpg)
7
GBX and KMS transformGBX and KMS transform
GBX gives little area increase, BUT creates an untestable fault
(on control input to multiplexer)
KMS transform: (remove false paths without increasing delay)
• fk is last node on false path that fans out.
• Duplicate false path {f1,…, fk} -> {f’1, … , f’k}
• f’j fans out to every fanout of fj except fj+1, and fj just fans out to fj+1
• Set f0 input to f1 to controlling value and propagate constant (can do because path is false and does not fanout)
KMS results
– Function of every node, except f1, … ,fk is unchanged
– Added k nodes– Area added in linear in size of length of false paths; in practice small area increase.
![Page 8: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/8.jpg)
8
KMSKMS
fm+1 fm+2 fn…fm+k fm+k+1
f’m+1 f’m+2 f’m+k
fm+1 fm+2 fn…fm+k fm+k+10
…Delay is notincreased
Keutzer, Malik, Saldanha’90:
![Page 9: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/9.jpg)
9
Generalized select transform (GST)Generalized select transform (GST)
Berman’90: Late signal feeds multiplexor
c d e f g
a
b
out
c d e f g
b
c d e f g
b
a=0
a=1
out0
1
a
![Page 10: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/10.jpg)
10
GST vs GBXGST vs GBX
… 0
1g’
dh__da
a
a
b
cg
h
c d e f gb
c d e f gb
a=0
a=1
out0
1
a
GST
c d e f gb
c d e f gb
a=0
a=1
… 0
1g’
a cg
b
GBX
a
h
Boolean
diffe
Note:
rence =
a a a
hh h
GBX
![Page 11: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/11.jpg)
11
GST vs GBXGST vs GBX
• Select transform appears to be more area efficient• But Boolean difference generally more efficiently formed in
practice• No delay/speedup advantage for either transform• Can reuse parts of the critical paths for multiple fanouts on GST
c d e f gb
c d e f gb
a=0
a=1
out10
1
a
GST out20
1
a
![Page 12: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/12.jpg)
12
Technology Independent Delay ReductionsTechnology Independent Delay Reductions
Generally THR, GBX, GST (critical path based methods) work OK, – but very greedy and computationally expensive
Why are technology independent delay reductions hard?
Lack of fast and accurate delay models– # levels, fast but crude– # levels + correction term (fanout, wires,… ): a little better,
but still crude (what coefficients to use?)– Technology mapped: reasonable, but very slow– Place and route: better but extremely slow– Silicon: best, but infeasibly slow (except for FPGAs)
better
slower
![Page 13: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/13.jpg)
13
Clustering/partial-collapseClustering/partial-collapse
Traditional critical-path based methods require– Well defined critical path– Good delay/slack information
Problems:– Good delay information comes from mapper and layout– Delay estimates and models are weak
Possible solutions:– Better delay modeling at technology independent level– Make speedup, insensitive to actual critical paths and
mapped delays
![Page 14: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/14.jpg)
14
Clustering/Partial CollapseClustering/Partial Collapse
Two-level circuits are fast– Collapse circuit to 2-level - but
• Huge area penalty• Huge capacitive loading on inputs (can be much slower)
To avoid huge area penalty– Identify clusters of nodes
• Each cluster has some fixed size– Perform collapse of each cluster– Simplify each node
Details– How to choose the clusters?– How to choose cluster size?– How to simplify each node?
![Page 15: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/15.jpg)
15
Lawler’s Clustering AlgorithmLawler’s Clustering Algorithm
• Optimal in delay:– For a given clustering size
• May duplicate nodes (hence possible area penalty)– Not optimal w.r.t duplication– Use a heuristic
• Fast: O(m x k)– m = number of edges in network– k = maximum cluster size
![Page 16: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/16.jpg)
16
Clustering Algorithm - OverviewClustering Algorithm - Overview
• Label phase: (k is cluster size)– If node u is an input, label(u) := L := 0
• Else L := max label of fanin of u– If (# nodes in TFI(u) with (label = L) >= k)
label(u) := L+1• Cluster phase: (outputs to inputs)
– If node u is an output, L := infinity• Else L := max label of fanouts of u
– If (label(u) < L) then create a new cluster with “root” u and with members all the nodes in TFI(u) with label = label(u)
• Collapse phase:– Collapse all nodes in a cluster into a single node– Note: a node may be in several clusters (causes area increase
![Page 17: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/17.jpg)
17
Example of ClusteringExample of Clustering
0
0
0
0 0
0
1
1
1
1
2
01
12
0
0
Result: Lawler’s algorithmgives minimum depth circuit
Typically, 1. we decompose initial circuit
into 2-input NANDs and invertors.
2. then cluster size k reflects # 2-input NANDs to be collapsed together.
k = 3
![Page 18: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/18.jpg)
18
Choosing Choosing kk
• I(k): number of levels, given k
• d(k): duplication ratio
– Number of gates in cluster network divided by number of gates in original network
• Determine k0 where k0/d(k0)~2.0
• For every k from 2 to k0, compute d(k), I(k)
– Use exhaustive enumeration: label and cluster (without collapse) for each k.
– Each iteration is O(|E|k)
• Choose k such that
– I(k) is minimized
• Break ties using d(k)
– Minimize d(k)
d(k)
I(k)
1 2 k0
![Page 19: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/19.jpg)
19
Area RecoveryArea Recovery
Area increase is due to node duplication – this occurs when node is in multiple clusters
Two solutions:– Break clusters into smaller pieces off critical path– After cluster and collapse, recover area
![Page 20: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/20.jpg)
20
Relabeling Procedure:Relabeling Procedure:
Attempt to increase node labels without exceeding cluster size
In reverse topological order
Start : assign
Increase label(u) if• new-label(u) <= label(v) for each fanout v and• new-label(u) = new-label(v) for each fanout v only if
label(u) = label(v) before relabeling, and• no cluster size is violated
- ( ) max ( )i jj PO
new label O label O
![Page 21: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/21.jpg)
21
Relabeling ExampleRelabeling Example
00
00
00
00 00
00
11
11
11
22
22
00
00
00
00 00
00
11
11
11
11
22
before
after
![Page 22: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/22.jpg)
22
Post-Collapse Area RecoveryPost-Collapse Area Recovery
• Algebraic factorization, but– Undo factorization if depth increases
• Optimization using DC’s• Redundancy removal
![Page 23: 1 EECS 219B Spring 2001 Timing Optimization Andreas Kuehlmann.](https://reader035.fdocuments.net/reader035/viewer/2022062517/56649ee75503460f94bf800d/html5/thumbnails/23.jpg)
23
Conclusions Conclusions
• Variety of methods for delay optimization– No single technique dominates – When applied to ripple-carry adder get– Carry-lookahead adder (THR)– Carry-bypass adder (GBX)– Carry-select adder (GST)– Clustering/Partial collapse
• All techniques ignore false paths when assessing the delay and critical regions– Can use KMS transform to eliminate false paths without
increasing delay (area increase however).