Outline - University of California, Los Angeles (cadlab.cs.ucla.edu/~cong/CS258F/chap3_old.pdf)
Prof. J. Cong 1
Jason Cong 1
Outline
• Circuit Partitioning Formulation
• Importance of Circuit Partitioning
• Partitioning Algorithms
• Circuit Clustering Formulation
• Clustering Algorithms
Circuit Partitioning Formulation
Bi-partitioning formulation: minimize interconnections between partitions
• Minimum cut: min c(X, X')
• Minimum bisection: min c(X, X') with |X| = |X'|
• Minimum ratio-cut: min c(X, X') / (|X|·|X'|)
(Figure: a circuit split into partitions X and X' with cut c(X, X').)
A Bi-Partitioning Example
(Figure: six nodes a-f; the edges inside {a, b, c} and inside {d, e, f} have weight 100, and the min-cut, min-bisection, and min-ratio-cut lines are marked.)
Min-cut size = 13; min-bisection size = 300; min-ratio-cut size = 19
Ratio-cut helps to identify natural clusters
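The three objectives can be compared by brute force on a toy graph. The sketch below (function name and example graph are invented for illustration, not from the chapter) enumerates every bipartition of a small weighted graph and reports the three minima:

```python
from itertools import combinations

def cut_metrics(nodes, edges):
    """Enumerate all bipartitions (X, X') of a small weighted graph and
    return the minimum cut, minimum bisection, and minimum ratio-cut."""
    best = {"cut": None, "bisection": None, "ratio": None}
    n = len(nodes)
    for k in range(1, n):                       # size of X
        for X in combinations(nodes, k):
            Xs = set(X)
            # total weight of edges crossing the partition
            c = sum(w for u, v, w in edges if (u in Xs) != (v in Xs))
            if best["cut"] is None or c < best["cut"]:
                best["cut"] = c
            if 2 * k == n and (best["bisection"] is None or c < best["bisection"]):
                best["bisection"] = c
            r = c / (k * (n - k))               # c(X, X') / (|X| * |X'|)
            if best["ratio"] is None or r < best["ratio"]:
                best["ratio"] = r
    return best

# Two weight-100 triangles joined by a weight-1 bridge: the ratio-cut
# (and here also the min-cut) separates the two natural clusters.
edges = [("a", "b", 100), ("a", "c", 100), ("b", "c", 100),
         ("d", "e", 100), ("d", "f", 100), ("e", "f", 100),
         ("c", "d", 1)]
print(cut_metrics(list("abcdef"), edges))
```

On this toy graph all three objectives agree; the slide's example is constructed so that they disagree, which is the point of the ratio-cut metric.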
Circuit Partitioning Formulation (Cont'd)
General multi-way partitioning formulation:
Partitioning a network N into N1, N2, …, Nk such that
• Each partition has an area constraint: Σ_{v∈Ni} a(v) ≤ Ai
• Each partition has an I/O constraint: c(Ni, N−Ni) ≤ Ii
Minimize the total interconnection: Σ_i c(Ni, N−Ni)
Importance of Circuit Partitioning
• Divide-and-conquer methodology
  The most effective way to solve problems of high complexity
  E.g.: min-cut based placement, partitioning-based test generation, …
• System-level partitioning for multi-chip designs
  Inter-chip interconnection delay dominates system performance
• Circuit emulation/parallel simulation
  Partition a large circuit into multiple FPGAs (e.g. Quickturn) or multiple special-purpose processors (e.g. Zycad)
• Parallel CAD development
  Task decomposition and load balancing
• In deep-submicron designs, partitioning defines local and global interconnect, and has significant impact on circuit performance
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Multi-level partitioning (to be discussed after clustering)
• Further study in partitioning techniques
Iterative Partitioning Algorithms
• Greedy iterative improvement method
  [Kernighan-Lin 1970]
  [Fiduccia-Mattheyses 1982]
  [Krishnamurthy 1984]
• Simulated Annealing
  [Kirkpatrick-Gelatt-Vecchi 1983]
  [Greene-Supowit 1984]
  (See Appendix; SA will be formally introduced in the Floorplan chapter)
Kernighan-Lin's Algorithm
• Pair-wise exchange of nodes to reduce the cut size
• Allow the cut size to increase temporarily within a pass
  Compute the gain of a swap
  Repeat
    Perform a feasible swap of max gain
    Mark the swapped nodes "locked"
    Update the swap gains
  Until no feasible swap
  Find the max prefix partial sum in the gain sequence g1, g2, …, gm
  Make the corresponding swaps permanent
• Start another pass if the current pass reduces the cut size (usually converges after a few passes)
(Figure: nodes u and v exchanged across the cut; swapped nodes are locked.)
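One KL pass can be sketched compactly under the slide's conventions (symmetric edge weights, tentative swaps with locking, best-prefix rollback); `kl_pass` and its data layout are illustrative choices, not the original implementation:

```python
def kl_pass(w, A, B):
    """One Kernighan-Lin pass.  w maps frozenset({u, v}) -> edge weight.
    Tentatively swap max-gain unlocked pairs until none remain, then keep
    only the prefix of swaps with the best cumulative gain."""
    A, B = list(A), list(B)
    wt = lambda u, v: w.get(frozenset((u, v)), 0)
    def D(v, same, other):            # external minus internal cost of v
        return (sum(wt(v, u) for u in other)
                - sum(wt(v, u) for u in same if u != v))
    locked, gains, swaps = set(), [], []
    for _ in range(min(len(A), len(B))):
        best = None
        for a in A:
            if a in locked:
                continue
            for b in B:
                if b in locked:
                    continue
                g = D(a, A, B) + D(b, B, A) - 2 * wt(a, b)  # swap gain
                if best is None or g > best[0]:
                    best = (g, a, b)
        g, a, b = best
        A.remove(a); B.remove(b)
        A.append(b); B.append(a)      # tentative swap, may increase the cut
        locked.update((a, b))
        gains.append(g); swaps.append((a, b))
    # keep the best prefix g1 + ... + gk; undo the remaining swaps
    prefix, total = [0], 0
    for g in gains:
        total += g
        prefix.append(total)
    k = max(range(len(prefix)), key=lambda i: prefix[i])
    for a, b in reversed(swaps[k:]):
        A.remove(b); B.remove(a)
        A.append(a); B.append(b)
    return A, B
```

On a 4-node example whose optimal partition requires one swap ({1,3} vs {2,4} with heavy edges 1-3 and 2-4), a single pass recovers it.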
Fiduccia-Mattheyses' Improvement
• Each pass in the KL algorithm takes O(n^3) or O(n^2 log n) time (n: #modules)
  Choosing the swap with max gain and updating the swap gains take O(n^2) time
• The FM algorithm takes O(p) time per pass (p: #pins)
• Key ideas in the FM algorithm:
  Each move affects only a few moves
  ⇒ constant time gain updating per move (amortized)
  Maintain a list of gain buckets
  ⇒ constant time selection of the move with max gain
• Further improvement by Krishnamurthy: look-ahead in gain computation
(Figure: cells u1 and u2 moved between blocks V1 and V2; gain buckets indexed from gmax down to −gmax.)
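The gain-bucket idea can be sketched as a small data structure. This is an illustrative approximation of FM's bucket list (a lazily-decremented `max_gain` pointer gives amortized constant-time selection), not the authors' implementation:

```python
class GainBuckets:
    """Bucket array indexed by gain in [-pmax, pmax]: one bucket (list of
    free cells) per possible gain value, plus a max-gain pointer."""
    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = [[] for _ in range(2 * pmax + 1)]
        self.max_gain = -pmax - 1          # sentinel: structure is empty
        self.gain = {}                     # cell -> current gain
    def insert(self, cell, gain):
        self.gain[cell] = gain
        self.buckets[gain + self.pmax].append(cell)
        self.max_gain = max(self.max_gain, gain)
    def update(self, cell, new_gain):
        # constant-ish time per gain update (list.remove is O(bucket size);
        # FM uses doubly linked buckets to make this truly O(1))
        self.buckets[self.gain[cell] + self.pmax].remove(cell)
        self.insert(cell, new_gain)
    def pop_max(self):
        # walk max_gain down past emptied buckets (amortized O(1))
        while self.max_gain >= -self.pmax:
            bucket = self.buckets[self.max_gain + self.pmax]
            if bucket:
                cell = bucket.pop()
                del self.gain[cell]
                return cell, self.max_gain
            self.max_gain -= 1
        return None
```

Since a cell's gain is bounded by its pin count, the index range [-pmax, pmax] is enough, which is what makes the flat array work.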
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Further study in partitioning techniques
Spectral Based Partitioning Algorithms
(Figure: a 4-node weighted example graph a, b, c, d with its adjacency matrix A and diagonal degree matrix D; the matrix entries are not recoverable from the transcript.)
D: degree matrix; A: adjacency matrix; D−A: Laplacian matrix
Eigenvectors of D−A form the Laplacian spectrum of G
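A minimal sketch of building Q = D − A and evaluating the quadratic form x^T Q x; the toy graph and function names are arbitrary illustrations:

```python
def laplacian(n, edges):
    """Q = D - A for a weighted undirected graph on nodes 0..n-1;
    edges are (i, j, w) triples."""
    Q = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        Q[i][i] += w          # degree matrix D on the diagonal
        Q[j][j] += w
        Q[i][j] -= w          # minus adjacency matrix A off the diagonal
        Q[j][i] -= w
    return Q

def quad_form(Q, x):
    """x^T Q x.  For a +1/-1 partition vector this equals
    4 * (weight of the cut), the identity used on the next slides."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

Q = laplacian(4, [(0, 1, 4), (1, 2, 3), (2, 3, 3)])
x = [1, 1, -1, -1]            # partition {0,1} vs {2,3}: cut weight 3
print(quad_form(Q, x))        # 4 * 3
```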
Some Applications of Laplacian Spectrum
• Placement and floorplan [Hall 1970] [Otten 1982] [Frankle-Karp 1986] [Tsay-Kuh 1986]
• Bisection lower bound and computation [Donath-Hoffman 1973] [Barnes 1982] [Boppana 1987]
• Ratio-cut lower bound and computation [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]
Eigenvalues and Eigenvectors
If Ax = λx, then λ is an eigenvalue of A and x is an eigenvector of A w.r.t. λ
(note that Kx is also an eigenvector, for any constant K ≠ 0).
Row i of the product Ax is (Ax)_i = a_i1·x1 + a_i2·x2 + … + a_in·xn
A Basic Property
For any vector x:
x^T A x = Σ_{i=1..n} Σ_{j=1..n} a_ij · x_i · x_j
Basic Idea of Laplacian Spectrum Based Graph Partitioning
• Given a bisection (X, X'), define a partitioning vector x = (x1, x2, …, xn) with
  x_i = 1 if i ∈ X, x_i = −1 if i ∈ X'
  Clearly x ⊥ 1 and x ≠ 0, where 1 = (1, 1, …, 1) and 0 = (0, 0, …, 0)
• For a partition vector x: Σ_{(i,j)∈E} (x_i − x_j)^2 = 4 · C(X, X')
• Let S = {x | x ⊥ 1 and x ≠ 0}. Finding the best partition vector x minimizing the total edge cut C(X, X') is relaxed to finding x in S such that Σ_{(i,j)∈E} (x_i − x_j)^2 is minimum
• Linear placement interpretation: place module i at position x_i (positions x1, x2, …, xn on a line) and minimize the total squared wirelength
Property of Laplacian Matrix
(1) For Q = D − A and any vector x:
    x^T Q x = x^T D x − x^T A x
            = Σ_i d_i x_i^2 − Σ_{i,j} a_ij x_i x_j
            = Σ_{(i,j)∈E} a_ij (x_i − x_j)^2
If x is a partition vector, this is the squared wirelength of the linear placement x1, x2, …, xn, which equals 4 · C(X, X')
Therefore, we want to minimize x^T Q x
Property of Laplacian Matrix (Cont'd)
(2) Q is symmetric and positive semi-definite, i.e.
    (i) x^T Q x = Σ_{(i,j)∈E} a_ij (x_i − x_j)^2 ≥ 0 for all x
    (ii) all eigenvalues of Q are ≥ 0
(3) The smallest eigenvalue of Q is 0; the corresponding eigenvector is x0 = (1, 1, …, 1)
    (not interesting: all modules overlap, and x0 ∉ S)
(4) According to the Courant-Fischer minimax principle, the 2nd smallest eigenvalue satisfies:
    λ = min_{x∈S} (x^T Q x) / |x|^2
Results on Spectral Based Graph Partitioning
• Min bisection cost: c(X, X') ≥ n·λ/4
• Min ratio-cut cost: c(X, X') / (|X|·|X'|) ≥ λ/n (homework)
• The eigenvector of the second smallest eigenvalue gives the best (relaxed) linear placement
• Compute the best bisection or ratio-cut based on the second smallest eigenvector
Computing the Eigenvector
• Only need one eigenvector (the one for the second smallest eigenvalue)
• Q is symmetric and sparse
• Use the block Lanczos algorithm
Interpreting the Eigenvector
• Linear ordering of the modules (sort modules by their eigenvector values)
• Try all splitting points; choose the best one using the bisection or ratio-cut metric
(Figure: a linear ordering of modules, e.g. 23, 10, 22, 34, 6, 21, 8, with candidate split points between them.)
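Putting the spectral steps together: the sketch below approximates the eigenvector of the second-smallest eigenvalue by power iteration on cI − Q with the all-ones direction projected out (an illustrative stand-in for the block Lanczos algorithm the slide names), then tries every split point of the resulting ordering under the ratio-cut metric:

```python
def fiedler_vector(Q, iters=2000):
    """Approximate the second-smallest eigenvector of the Laplacian Q by
    power iteration on (c*I - Q), projecting out the all-ones eigenvector."""
    n = len(Q)
    c = 1 + 2 * max(Q[i][i] for i in range(n))       # c > lambda_max
    x = [((-1) ** i) * (i + 1) for i in range(n)]    # arbitrary start
    for _ in range(iters):
        m = sum(x) / n
        x = [xi - m for xi in x]                     # keep x perpendicular to 1
        y = [c * x[i] - sum(Q[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

def spectral_ratio_cut(n, edges):
    """Order nodes by the Fiedler vector, then pick the best split point
    under the ratio-cut metric (as on this slide)."""
    Q = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        Q[i][i] += w; Q[j][j] += w
        Q[i][j] -= w; Q[j][i] -= w
    x = fiedler_vector(Q)
    order = sorted(range(n), key=lambda i: x[i])
    best = None
    for k in range(1, n):                            # try every split point
        X = set(order[:k])
        c = sum(w for i, j, w in edges if (i in X) != (j in X))
        r = c / (k * (n - k))
        if best is None or r < best[0]:
            best = (r, X)
    return best[1]
```

On two weight-100 triangles joined by a weight-1 bridge, the ordering cleanly separates the two natural clusters.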
Experiments on Special Graphs
• GBUI(2n, d, b) [Bui-Chaudhuri-Leighton 1987]
  2n nodes, d-regular, min-bisection ≈ b
• GGAR(m, n, Pint, Pext) [Garbe-Promel-Steger 1990]
  m clusters, n nodes in each cluster
  Pint: intra-cluster edge probability
  Pext: inter-cluster edge probability
(Figure: m clusters of n nodes each.)
GBUI Results
(Figure: eigenvector value x_i plotted against module i; the modules separate cleanly into Cluster 1 and Cluster 2.)
GGAR Results
(Figure: eigenvector value x_i plotted against module i; the modules separate into Clusters 1 through 4.)
Partitioning Algorithms
• Iterative partitioning algorithms
• Spectral based partitioning algorithms
• Net partitioning vs. module partitioning
• Multi-way partitioning
• Further study in partitioning techniques
Transform Netlists to Graph
• Transform each r-pin net into an r-clique
• Weight each edge in the r-clique by:
  1/(r−1) (simple scaling), or
  2/r (edge probability), or
  4/r^2 (bounds each net's contribution to the cut size)
• Transforming multi-pin nets to cliques destroys the sparsity of D−A
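The three weighting schemes translate directly into code; the scheme labels below are invented names for the slide's three formulas:

```python
from itertools import combinations

def hyperedges_to_clique_edges(nets, scheme="simple"):
    """Expand each r-pin net into an r-clique, weighting every clique edge
    by 1/(r-1), 2/r, or 4/r^2; overlapping cliques accumulate weight."""
    weight = {
        "simple": lambda r: 1.0 / (r - 1),       # simple scaling
        "probability": lambda r: 2.0 / r,        # edge probability
        "bounded": lambda r: 4.0 / (r * r),      # bounded cut contribution
    }[scheme]
    edges = {}
    for net in nets:
        r = len(net)
        if r < 2:
            continue
        w = weight(r)
        for u, v in combinations(sorted(net), 2):
            edges[(u, v)] = edges.get((u, v), 0.0) + w
    return edges

print(hyperedges_to_clique_edges([["a", "b", "c"]], "simple"))
```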
Net Partitioning vs. Module Partitioning
• Dual netlist representation -- the intersection graph
  Node: a net
  Edge: two nets share a common gate
• The intersection graph also captures circuit connectivity, but does not use hyperedges
(Figure: a netlist N with gates G1-G7 and nets a-j, and its intersection graph I(N).)
Transform Net Partition to Module Partition
(Figure: netlist N with a net partition (Y, Y') of the intersection graph I(N), the induced module partition (X, X'), and the bipartite conflict graph B(Y, Y').)
Every edge in B(Y, Y') indicates a conflict:
• G3 is wanted by both c and g
• G4 is wanted by both d and e
• G5 is wanted by both d and f
Some Definitions
• Maximum Independent Set (MIS): a maximum subset of vertices with no two connected
• Minimum Vertex Cover (MVC): a minimum subset of vertices which "covers" all edges
(Figure: an example MIS and an example MVC.)
Resolving Net Conflicts When Transforming a Net Partition to a Module Partition
• Some nets in B(Y, Y') will lose
  Each losing net corresponds to a cut net in the module partition
• Objective: maximize the # of winning nets, or equivalently, minimize the # of losing nets
• Theorem: the set of winning nets forms an independent set in B(Y, Y'), and the set of losers forms a vertex cover
  ⇒ Find a maximum independent set in B(Y, Y')
  ⇒ Solved optimally in polynomial time for bipartite graphs
Bipartite Graph Properties
• |MIS| + |MVC| = |V|
• The size of any vertex cover is no smaller than that of a maximum matching
• |MVC| = |MM| (MM: maximum matching; König's theorem)
⇒ Goal: find an MVC of size |MM|, or equivalently, an MIS of size |V| − |MM|
MIS = winners; MVC = losers
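A self-contained sketch of the MIS-from-matching step: augmenting-path bipartite matching followed by König's alternating-path construction. This is a generic bipartite routine for illustration, not the incremental IG-Match implementation:

```python
def max_independent_set(L, R, edges):
    """Bipartite MIS via Koenig's theorem: Z = vertices reachable by
    alternating paths from the unmatched left vertices, then
    MVC = (L - Z) | (R & Z) and MIS = V - MVC, so |MIS| = |V| - |MM|."""
    adj = {u: [] for u in L}
    for u, v in edges:
        adj[u].append(v)
    match_l, match_r = {}, {}
    def augment(u, seen):              # find an augmenting path from u
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or augment(match_r[v], seen):
                match_l[u] = v
                match_r[v] = u
                return True
        return False
    for u in L:
        augment(u, set())
    # alternating BFS: left -> right on any edge, right -> left on matching
    Z = set(u for u in L if u not in match_l)
    frontier = set(Z)
    while frontier:
        nxt = set()
        for u in frontier & set(L):
            for v in adj[u]:
                if v not in Z:
                    Z.add(v); nxt.add(v)
        for v in frontier & set(R):
            if v in match_r and match_r[v] not in Z:
                Z.add(match_r[v]); nxt.add(match_r[v])
        frontier = nxt
    mvc = (set(L) - Z) | (set(R) & Z)
    return (set(L) | set(R)) - mvc
```

In the net-conflict setting, L and R would be the nets on the two sides of B(Y, Y'); the returned MIS is the set of winning nets.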
Finding an MIS
(Figure: a bipartite graph with sides L and R, a maximum matching MM, and the unmatched vertices UL and UR.)
Finding an MIS (Cont'd)
(Figure: alternating paths grown from the unmatched vertices UL and UR.)
Finding an MIS (Cont'd)
(Figure: the vertices reached by alternating paths partitioned into the sets Odd(L), Odd(R), Even(L), and Even(R).)
Improving the Module Partition
Net1 = {a, b}   Net2 = {b, c}   Net3 = {a, e}
Net4 = {b, d}   Net5 = {e, f}   Net6 = {g, e}
Even(L) = {Net1, Net2}, ModSL = {a, b, c}
Even(R) = {Net5, Net6}, ModSR = {e, f, g}
ModSL + {d} cuts 1 net
ModSR + {d} cuts 2 nets
Algorithm IG-Match
• Get the spectral linear ordering
• Construct the bipartite graph B
• Find an MIS of B
• Convert the MIS into a module partition
Theorem: IG-Match computes all |V|−1 module partitions in O(|V|(|V|+|E|)) time
Key idea: incremental updating of the bipartite graph and the alternating paths
Experimental Results

Name     Wei-Cheng Areas/Cutsize   IG-Match Areas/Cutsize   Improvement
bm1      9:873 / 1                 21:861 / 1               57%
19ks     1011:1833 / 109           650:2194 / 85            -1%
Prim1    152:681 / 14              154:679 / 14             1%
Prim2    1132:1882 / 123           740:2274 / 77            21%
Test02   372:1291 / 95             211:1452 / 38            37%
Test03   147:1460 / 31             803:804 / 58             38%
Test04   401:1114 / 51             73:1442 / 6              50%
Test05   1204:1391 / 110           105:1442 / 6             53%
Test06   145:1607 / 18             141:1611 / 17            3%
• 28.8% average improvement
Further Study in Partitioning Techniques
• Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD'95; Enos et al., TCAD'99]
• Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC'94] [Cong-Lim, ASPDAC'2000]
• Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]
• Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]
Multi-Way Partitioning
• Recursive bi-partitioning [Kernighan-Lin 1970]
• Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms [Sanchis 1989] [Cong-Lim, ICCAD'98]
• Generalization of the ratio-cut and spectral methods to multi-way partitioning [Chan-Schlag-Zien 1993]
  Generalized ratio-cut value = sum of the flux of each partition
  Generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
Circuit Clustering Formulation
Motivation:
• Reduce the size of flat netlists
• Identify natural circuit hierarchy
Objectives:
• Maximize the connectivity of each cluster
• Minimize the size, delay (or simply depth), and density of the clustered circuits
Measuring Cluster Connectivity
• Simple density measurements
  m/n -- include a new node when it connects to more than m/n edges
  2m/(n(n−1)) -- include a new node when it connects to more than 2m/(n−1) edges; biased toward small clusters
• (k, l)-connectivity [Garber-Promel-Steger 1990]
  Every two nodes in the cluster are connected by at least k edge-disjoint paths of length no more than l
  Proper choice of k and l may not be easy
• D/S-metric [Cong-Hagen-Kahng 1991]
  D/S = Degree/Separation
  Degree = average degree in the cluster
  Separation = average shortest-path length between a pair of nodes in the cluster
  Robust but expensive to compute
(Figure: k edge-disjoint paths P1, …, Pk of length ≤ l between nodes u and v.)
Clustering Algorithms
• (k, l)-connectivity algorithms [Garber-Promel-Steger 1990]
• Random-walk based algorithm [Cong-Hagen-Kahng 1991; Hagen-Kahng 1992]
• Multicommodity-flow based algorithm [Yeh-Cheng-Lin 1992]
• Clique based algorithm [Bui 1989; Cong-Smith 1993]
• Clustering for delay minimization (combinational circuits) [Lawler-Levitt-Turner 1969; Murgai-Brayton-Sangiovanni 1991; Rajaraman-Wong 1993]
• Clustering for delay minimization (with consideration of retiming for sequential circuits) [Pan et al., TCAD'99; Cong et al., DAC'99]
• Signal flow based clustering [Cong-Ding, DAC'93; Cong et al., ICCAD'97]
• Multi-level clustering [Karypis-Kumar, TR95/DAC97; Cong-Sung, ASPDAC'2000]
Lawler's Labeling Algorithm [Lawler-Levitt-Turner 1969]
• Assumption: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
• Objective: find a clustering of minimum delay
• Algorithm:
  Phase 1: label all nodes in topological order
    For each PI node v, L(v) = 0
    For each non-PI node v:
      p = maximum label of the predecessors of v
      Xp = set of predecessors of v with label p
      if |Xp| < K then L(v) = p, else L(v) = p + 1
  Phase 2: form the clusters by grouping each node with its label-p predecessors, working backwards from the POs
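Phase 1 can be sketched as follows. The transcript cuts off mid-condition, so the `if |Xp| < K` completion rule used here is the standard one, and `preds` plus the per-node cone computation are illustrative simplifications (the real algorithm is more careful about complexity):

```python
def lawler_labels(preds, K):
    """Phase 1 of Lawler-Levitt-Turner labeling.  preds maps each node to
    its list of predecessors (empty for PIs); K bounds the cluster size.
    The label of a node is the number of inter-cluster edges on its
    critical path, i.e. its delay under the unit inter-cluster delay model."""
    order, seen = [], set()
    def visit(v):                       # topological order by DFS
        if v in seen:
            return
        seen.add(v)
        for u in preds[v]:
            visit(u)
        order.append(v)
    for v in preds:
        visit(v)

    def cone(v):                        # v and all transitive predecessors
        c, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in c:
                c.add(u)
                stack.extend(preds[u])
        return c

    label = {}
    for v in order:
        if not preds[v]:
            label[v] = 0                # PI
            continue
        p = max(label[u] for u in preds[v])
        Xp = [u for u in cone(v) if u != v and label[u] == p]
        # cluster {v} | Xp must fit in K nodes, else start a new level
        label[v] = p if len(Xp) < K else p + 1
    return label

print(lawler_labels({"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}, 2))
```

On the chain a → b → c → d with K = 2, the cluster {a, b} fills up, so c starts a new level.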
Signal Flow and Logic Dependency
• Beyond connectivity
  – identifies logically dependent cells
  – useful in guiding clustering, partitioning [CoLB94, CoLL+97], and placement [CoXu95]
Maximum Fanout Free Cone (MFFC)
• Definition: for a node v in a combinational circuit,
  – cone of v (Cv): v and all of its predecessors such that any path connecting a node in Cv and v lies entirely in Cv
  – fanout free cone at v (FFCv): a cone of v such that for any node u ≠ v in FFCv, output(u) ⊆ FFCv
  – maximum FFC at v (MFFCv): the FFC of v such that for any non-PI node w, if output(w) ⊆ MFFCv then w ∈ MFFCv
Properties of MFFCs
• If w ∈ MFFCv, then MFFCw ⊆ MFFCv [CoDi93]
• Two MFFCs are either disjoint or one contains the other [CoDi93]
• Area-optimal duplication-free LUT mapping can map each MFFC independently to get an optimal solution [CoDi93]
• For acyclic bipartitioning without an area constraint, there exists an optimal cut which does not cut through any MFFC [CoLB94]
Maximum Fanout Free Subgraph (MFFS)
• Definition: for a node v in a sequential circuit,
  FFSv = {u | every path from u to some PO passes through v}
  MFFSv = the maximal such fanout free subgraph at v
• Illustration
(Figure: MFFCs vs. the MFFS on a small sequential circuit.)
MFFS Construction Algorithm
• For a single MFFS at node v:
  – select root node v and cut all its fanout edges
  – mark all nodes reachable backwards from all POs
  – MFFSv = {unmarked nodes}
  – complexity: O(|N| + |E|)
(Figure: the construction illustrated step by step around node v.)
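The single-MFFS construction translates almost directly into code. This sketch represents the circuit as a predecessor map and treats "cut v's fanout edges" as skipping v during the backward marking (and excluding v from the PO seeds):

```python
def mffs(preds, pos, nodes, v):
    """MFFS at node v, per the slide: cut v's fanout edges, mark every
    node reachable backwards from the POs, return the unmarked nodes.
    preds: node -> list of fanins; pos: primary outputs."""
    marked = set()
    stack = [p for p in pos if p != v]     # v's own output is cut too
    while stack:
        u = stack.pop()
        if u in marked:
            continue
        marked.add(u)
        for p in preds.get(u, []):
            if p != v:                     # fanout edges of v are cut
                stack.append(p)
    return set(nodes) - marked             # nodes that only reach POs via v

# a -> v (PO), b -> v, b -> c -> o2 (PO): only a is captured by v
preds = {"v": ["a", "b"], "c": ["b"], "o2": ["c"], "a": [], "b": []}
print(mffs(preds, ["v", "o2"], ["a", "b", "c", "v", "o2"], "v"))
```

Node b escapes the MFFS because it also feeds c, matching the "every path passes through v" definition; the marking is a single backward traversal, hence O(|N| + |E|).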
MFFS Clustering Algorithm
• Clusters the entire netlist:
  – construct the MFFS at a PO and remove it from the netlist
  – include its inputs as new POs
  – repeat until all nodes are clustered
  – complexity: O(|N| · (|N| + |E|))
(Figure: successive MFFS clusters peeled off the netlist.)
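The clustering loop can be sketched on top of the same backward-marking step; it assumes every remaining node reaches some PO (data layout and names are illustrative):

```python
def mffs_clustering(preds, pos, nodes):
    """Cluster a whole netlist, per the slide: build the MFFS at a PO,
    remove it, promote the cluster's fanins to new POs, repeat."""
    remaining = set(nodes)
    pos = list(pos)
    clusters = []
    while pos:
        v = pos.pop(0)
        if v not in remaining:
            continue
        # mark nodes still reaching some *other* PO (v's fanout edges cut)
        marked, stack = set(), [p for p in pos if p in remaining and p != v]
        while stack:
            u = stack.pop()
            if u in marked:
                continue
            marked.add(u)
            stack.extend(p for p in preds.get(u, [])
                         if p != v and p in remaining)
        cluster = remaining - marked            # the MFFS at v
        clusters.append(cluster)
        # the cluster's fanins become new POs
        pos.extend({p for u in cluster for p in preds.get(u, [])
                    if p in remaining and p not in cluster})
        remaining -= cluster
    return clusters

preds = {"v": ["a", "b"], "c": ["b"], "o2": ["c"], "a": [], "b": []}
print(mffs_clustering(preds, ["v", "o2"], ["a", "b", "c", "v", "o2"]))
```

Each iteration runs one O(|N| + |E|) marking pass, giving the O(|N| · (|N| + |E|)) bound on the slide.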
Speedup of MFFS Clustering
• Approximation of the MFFS
  – a cluster size limit is imposed in practice
  – perform the MFFS construction on a subcircuit SC(v, h)
    • defined by a search depth h for a given node v
    • internal nodes: all nodes visited by a BFS of depth h from v
    • pseudo PIs/POs: all I/Os from/to nodes not in SC(v, h)
(Figure: the subcircuit SC(v, h) of depth h around v within the full circuit, with pseudo PIs and pseudo POs on its boundary.)
LSR/MFFS Partitioning Algorithm
• Basic approach: clustering based partitioning
  – MFFS clustering: exploit logic dependency
  – LSR partitioning: focus on loose and stable net removal
• Overview of the algorithm
  – cluster the netlist using the MFFS algorithm
  – partition the clustered netlist with the LSR algorithm
  – completely decompose the netlist
  – refine the flat netlist with the LSR algorithm
Loose Net Removal (LR)
• Net status transition: free net → loose net → locked net
• Loose net removal
(Figure: blocks 0, 1, 2 with the free cells of a loose net; uncut and locked nets are marked.)
LR Implementation
• Basic approach
  – increase the gain of the free cells of loose nets outside the anchor block
  – focus on smaller nets
• Cell gain increase function:
  incr(n) = (max_net_size / size(n)) · (Σ_{c∈Ln} deg(c)) / (Σ_{c∈Fn} deg(c))
  – accumulate: until complete removal
  – set threshold: an array-based bucket can be used
  – expanded gain range: no tie-breaking scheme (LIFO, LA3) needed
Stable Net Transition (SNT)
• Stable net [Shibuya et al. 1995]
  – remains cut throughout an entire run
  – more than 80% of cut nets are stable
• Implementation
  – detect stable nets by comparing cutsets
  – move all unprocessed cells of a stable net into a single block
  – serve as the initial partition for the next run
    • fast convergence
    • efficiently explores more local minima
• Loose and Stable net Removal (LSR)
  – combination of the two effective net removal schemes
    • LR: small loose nets
    • SNT: large stable nets
Summary of the LSR/MFFS Algorithm
(Flowchart: START → cluster the netlist with MFFS → partition the netlist with LSR → if gain, repeat; otherwise decluster the netlist → refine the netlist with LSR → if gain, repeat; otherwise END.)
Clustering Results
• MCNC benchmarks with signal flow information (AVR = max_area / min_area)

NAME     SIZE    AVR    Optimal #CLST / TIME    Heuristic #CLST / TIME
s1423     619    1.0      193 /    0.9            168 /   1.9
sioo      664    4.6      442 /    2.3            442 /   2.6
s1488     686    1.0      187 /    0.9            187 /   1.4
balu      801   12.9      287 /    2.0            284 /   3.8
prim1     833    6.7      394 /    3.2            379 /   5.9
struct   1952    2.5     1183 /   17.9           1183 /  12.1
prim2    3014    6.7     1404 /   34.8           1243 /  22.1
s9234    5866    1.0     3203 /   36.7           3177 /   9.7
biomed   6514    4.5     2142 /   87.8           1853 /  30.5
s13207   8772    1.0     2132 /   73.9           1994 /  14.1
s15850  10470    1.0     2279 /   84.9           2137 /  17.6
s35932  18148    1.0     5562 /  420.8           2943 /  32.1
s38584  20995    1.0     5139 /  565.1           4242 /  44.5
avq.sm  21918    4.5     8477 / 1287.3           8309 / 116.1
s38417  23949    1.0     5906 /  452.2           5295 /  45.1
avq.lg  25178    4.5     9103 / 1473.2           8658 /  90.2
TOTAL                   48033 / 4543.9          42494 / 449.7
Cutsize Reduction by MFFS Clustering
• Bipartitioning under a 45-55% balance criterion
(Bar chart: total cutsize of FM, SNT, LR, and LSR on the flat netlist vs. after MFFS clustering; improvements over flat FM are 0.0%, 8.0%, 45.6%, 46.9% (flat) and 25.8%, 31.7%, 48.4%, 49.9% (MFFS).)
Cutsize Reduction by MFFS Clustering (Details)

          No Clustering               MFFS Clustering
NAME     FM    SNT   LR    LSR       FM    SNT   LR    LSR
s1423    12    14    13    13        12    12    12    12
sioo     25    25    25    25        25    25    25    25
s1488    48    45    44    42        42    42    43    43
balu     27    27    27    27        27    27    27    27
prim1    43    42    42    42        43    42    42    43
struct   42    45    33    33        39    52    33    33
prim2    174   167   121   122       133   122   131   119
s9234    49    54    41    43        42    43    41    40
biomed   83    83    83    83        100   84    84    84
s13207   90    74    75    70        91    65    73    61
s15850   113   73    73    65        72    70    44    43
s35932   100   128   45    44        48    76    45    44
s38584   52    52    49    47        52    52    47    47
avq.sm   321   378   135   139       273   172   127   127
s38417   176   112   53    55        119   147   51    50
avq.lg   491   380   145   131       251   168   127   127
TOTAL    1846  1699  1004  981       1369  1261  952   925
%IMPR    0.0   8.0   45.6  46.9      25.8  31.7  48.4  49.9
TIME     121.4 59.4  162.3 138.9     84.2  61.3  113.6 89.4
Cutsize Comparison with Other Methods
(Bar chart: total cutsize, roughly 800-1200, for CDIP, PropF, Straw, hMetis, MLc, L/M, Panza, Parab, and FBB, grouped into iterative improvement partitioners (IIPs) and non-IIPs.)
Details on Cutsize Comparison

NAME     CDIP   PROPf  Straw  hMetis MLc   L/M   PANZA Parab FBB
s1423    12     15     16     13
sioo     25     25     45
s1488    43     44     50
balu     27     27     27     27     27    27    27    41
prim1    52     51     49     50     47    43
struct   36     33     33     33     33    33    33    40
prim2    152    152    143    145    139   119
s9234    44     42     42     40     40    40    40    74    70
biomed   83     84     83     83     83    84    83    135
s13207   70     71     57     55     55    61    66    91    74
s15850   67     56     44     42     44    43    44    91    67
s35932   73     42     47     42     41    44    43    62    49
s38584   47     51     49     47     47    47    47    55    47
avq.sm   148    144    131    130    128   127
s38417   79     65     53     51     49    50    49    49    58
avq.lg   145    143    140    127    128   127
TOTAL1   1023   961    898    872    861   845
TOTAL2                                           509   516   749
%IMPR    17.4   12.1   5.9    3.1    1.9   1.4   32.0  21.4
TIME     5817   5611   12577  3455   1338  24619

Iterative improvement partitioners (IIPs) vs. non-IIPs
Summary
• Partitioning is very important for interconnect optimization
• Increasing emphasis on interconnect design has introduced many new partitioning formulations
• Clustering is effective in reducing circuit size and identifying natural circuit hierarchy
Appendix: Simulated Annealing Based Partitioning
• Basic approach
  – Solution space: all partitions
  – Neighboring solution: move a module from one side to the other
  – Cost function: cost = c(X, X') + λ(|X| − |X'|)^2
  – Temperature schedule: standard cooling schedule
• Simulated annealing without rejection
  – Generate a move with probability proportional to its gain
  – All gains can be updated efficiently after a move (unique to partitioning!)
  – Substantial run-time reduction at low temperature
  – Use conventional SA at high temperatures and rejectionless SA at low temperatures
Simulated Annealing
• Local search
(Figure: cost function over the solution space; pure local search can get trapped in a local minimum, which uphill moves allow SA to escape.)
Statistical Mechanics vs. Combinatorial Optimization
State {r}: configuration -- a set of atomic positions
Weight: e^(−E({r})/kB·T) -- the Boltzmann distribution
E({r}): energy of the configuration
kB: Boltzmann constant; T: temperature
Low temperature limit: only the lowest-energy configurations carry significant weight
Analogy

Physical System          Optimization Problem
State (configuration)    Solution
Energy                   Cost function
Ground state             Optimal solution
Rapid quenching          Iterative improvement
Careful annealing        Simulated annealing
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet "frozen", do the following:
   3.1 For 1 ≤ i ≤ L, do the following:
       3.1.1 Pick a random neighbor S' of S
       3.1.2 Let ∆ = cost(S') − cost(S)
       3.1.3 If ∆ ≤ 0 (downhill move), set S = S'
       3.1.4 If ∆ > 0 (uphill move), set S = S' with probability e^(−∆/T)
   3.2 Set T = rT (reduce the temperature)
4. Return S
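The generic loop above, as a runnable sketch; the parameter defaults and the best-so-far tracking are arbitrary choices for the example, not from the slide:

```python
import math
import random

def simulated_annealing(cost, neighbor, s0, T0=10.0, r=0.95, L=100, T_min=1e-3):
    """Generic SA: accept downhill moves always, uphill moves with
    probability exp(-delta/T); cool by T <- r*T until frozen (T < T_min)."""
    random.seed(0)                       # deterministic for the demo
    s, T = s0, T0
    best = s
    while T > T_min:
        for _ in range(L):
            s2 = neighbor(s)
            delta = cost(s2) - cost(s)
            if delta <= 0 or random.random() < math.exp(-delta / T):
                s = s2                   # downhill, or lucky uphill move
                if cost(s) < cost(best):
                    best = s
        T *= r
    return best

# toy usage: minimize (x - 3)^2 over the integers with +/-1 moves
result = simulated_annealing(lambda x: (x - 3) ** 2,
                             lambda x: x + random.choice([-1, 1]),
                             50)
print(result)
```

For partitioning, `neighbor` would move one module across the cut and `cost` would be c(X, X') + λ(|X| − |X'|)^2, as in the appendix slide.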
Basic Ingredients for SA
• Solution space
• Neighborhood structure
• Cost function
• Annealing schedule
SA Partitioning
"Optimization by Simulated Annealing" -- Kirkpatrick, Gelatt, Vecchi
• Solution space = the set of all partitions
• Neighborhood structure: randomly move one cell to the other side
(Figure: several example partitions of cells a-f; a move transfers one cell across the cut.)
SA Partitioning (Cont'd)
• Cost function: f = C + λB
  – C is the partitioning cost as used before
  – B is a measure of how balanced the partitioning is
  – λ is a constant
Example of B: for sides S1 and S2, B = (|S1| − |S2|)^2
SA Partitioning (Cont'd)
• Annealing schedule:
  – Tn = (T1/T0)^n · T0, with ratio T1/T0 = 0.9
  – At each temperature, continue until either
    1. there are 10 accepted moves on average, or
    2. the # of attempts ≥ 100 × the total # of cells
  – The system is "frozen" if there are very low acceptance rates at 3 consecutive temperatures
Graph Partitioning Using Simulated Annealing Without Rejections
• Greene and Supowit, ICCD-88, pp. 658-663
• Motivation: at low temperature, most moves are rejected!
  e.g. a 1/100 acceptance rate for 1,000 vertices
Prof. J. Cong 43
Jason Cong 85
a Key Idea(I) Biased selection
If a move i has probability ω i to be accepted, generate move i with probability
N: size of neighborhood
In general,
In conventional model, each move has probability 1/N to be generated.
(II) If a move is generated, it is always be accepted
Graph Partition Using Simulated Annealing Without Rejections (Cont’d)
∑=
N
Jj
i
1
ω
ω
}.,1min{ /Ti ie∆ −=ω
Graph Partitioning Using Simulated Annealing Without Rejections (Cont'd)
• Main difficulties
  (1) ωi is dynamic (since ∆i is dynamic); it is too expensive to update the ωi's (∆i's) after every move
  (2) Weighted selection problem: how to select move i with probability ωi / Σ_{j=1..N} ωj?
Solution to the Weighted Selection Problem (a general solution)
(Figure: a binary tree over the leaf weights ω1, …, ω7; each internal node stores the sum of the weights below it, e.g. ω1+ω2, ω1+…+ω4, ω5+ω6, ω5+ω6+ω7, and the root holds ω1+…+ω7.)
Solution to the Weighted Selection Problem (Cont'd)
Let W = ω1 + ω2 + … + ωn; how do we select i with probability ωi/W?
Equivalent to choosing x such that ω1 + … + ωi−1 < x ≤ ω1 + … + ωi:

v ← root
x ← random(0, 1) · ω(v)
while v is not a leaf do
  if x < ω(left(v)) then v ← left(v)
  else x ← x − ω(left(v)); v ← right(v)
end

The probability of ending up at leaf i is ωi/W.
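The tree of partial sums in the figure is essentially a segment tree. The sketch below (an illustrative implementation, not Greene-Supowit's) stores subtree weight sums so that both sampling and a weight update take O(log n):

```python
import random

class WeightTree:
    """Binary tree over leaf weights: internal node v stores the total
    weight of its subtree, so a random descent selects leaf i with
    probability w_i / W, and updating one weight is O(log n)."""
    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (2 * self.n)
        for i, w in enumerate(weights):
            self.tree[self.n + i] = w          # leaves at n .. 2n-1
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
    def update(self, i, w):
        i += self.n
        self.tree[i] = w
        while i > 1:                           # fix the sums up to the root
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
    def sample(self):
        x = random.random() * self.tree[1]     # x in [0, W)
        v = 1
        while v < self.n:                      # descend: left if x < left sum
            if x < self.tree[2 * v]:
                v = 2 * v
            else:
                x -= self.tree[2 * v]
                v = 2 * v + 1
        return v - self.n
```

The O(log n) `update` is what makes this usable inside the annealing loop, where a few weights change after every move.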
Application to Partitioning (a special solution to the first problem)
Given a partition (A, B):
Cost F(A, B) = Fc(A, B) + FI(A, B)
  Fc(A, B) = the net-cut between A and B
  FI(A, B) = C·(|A|^2 + |B|^2)   (minimized when |A| = |B| = n/2)
For move i, ∆i = F(A', B') − F(A, B), which splits into
  ∆iC = Fc(A', B') − Fc(A, B)    -- after a move, only a few ∆iC change
  ∆iI = FI(A', B') − FI(A, B)    -- after a move, all ∆iI change
Application to Partitioning (Cont'd)
Solution: two-step biased selection, with Pi = α(∆iI, T) · α(∆iC, T):
  (i) choose side A or B based on the ∆I factor
  (ii) choose move i within A or B based on the ∆C factor
Note: the ∆iI values are the same for every cell in A (and likewise in B), so we keep one copy of the ∆I factor for A and one copy for B, and choose the move within A or B using the tree algorithm.