Outline - University of California, Los Angeles (cadlab.cs.ucla.edu/~cong/CS258F/chap3_old.pdf)


  • Prof. J. Cong 1

    Jason Cong 1

    Jason Cong 2

    Outline

    • Circuit Partitioning Formulation

    • Importance of Circuit Partitioning

    • Partitioning Algorithms

    • Circuit Clustering Formulation

    • Clustering Algorithms

  • Prof. J. Cong 2

    Jason Cong 3

    Circuit Partitioning Formulation

    Bi-partitioning formulation: minimize interconnections between partitions

    • Minimum cut: min c(X, X')

    • Minimum bisection: min c(X, X') with |X| = |X'|

    • Minimum ratio-cut: min c(X, X') / (|X| · |X'|)

    [Figure: a network divided into two parts X and X', with cut cost c(X, X')]

    Jason Cong 4

    A Bi-Partitioning Example

    Min-cut size = 13; min-bisection size = 300; min-ratio-cut size = 19

    [Figure: six nodes a, b, c, d, e, f; intra-cluster edges of weight 100, inter-cluster edges of weights 9, 10, and 4; the min-cut, min-bisection, and min-ratio-cut lines are shown]

    Ratio-cut helps to identify natural clusters

  • Prof. J. Cong 3

    Jason Cong 5

    Circuit Partitioning Formulation (Cont’d)

    General multi-way partitioning formulation:

    Partition a network N into N1, N2, …, Nk such that

    • each partition satisfies an area constraint:  Σ_{v ∈ Ni} a(v) ≤ Ai

    • each partition satisfies an I/O constraint:  c(Ni, N − Ni) ≤ Ii

    and minimize the total interconnection:  Σ_i c(Ni, N − Ni)

    Jason Cong 6

    Importance of Circuit Partitioning

    • Divide-and-conquer methodology

    The most effective way to solve problems of high complexity

    E.g.: min-cut based placement, partitioning-based test generation, …

    • System-level partitioning for multi-chip designs

    inter-chip interconnection delay dominates system performance

    • Circuit emulation / parallel simulation

    partition large circuits into multiple FPGAs (e.g. Quickturn) or multiple special-purpose processors (e.g. Zycad)

    • Parallel CAD development

    Task decomposition and load balancing

    • In deep-submicron designs, partitioning defines local and global interconnect, and has a significant impact on circuit performance

  • Prof. J. Cong 4

    Jason Cong 7

    Partitioning Algorithms

    • Iterative partitioning algorithms

    • Spectral based partitioning algorithms

    • Net partitioning vs. module partitioning

    • Multi-way partitioning

    • Multi-level partitioning (to be discussed after clustering)

    • Further study in partitioning techniques

    Jason Cong 8

    Iterative Partitioning Algorithms

    • Greedy iterative improvement method

    [Kernighan-Lin 1970]

    [Fiduccia-Mattheyses 1982]

    [Krishnamurthy 1984]

    • Simulated Annealing

    [Kirkpatrick-Gelatt-Vecchi 1983]

    [Greene-Supowit 1984]

    (See Appendix; SA will be formally introduced in the Floorplan chapter)

  • Prof. J. Cong 5

    Jason Cong 9

    Kernighan-Lin's Algorithm

    • Pair-wise exchange of nodes to reduce the cut size

    • Allows the cut size to increase temporarily within a pass

      Compute the gain of each swap
      Repeat
          Perform a feasible swap of max gain
          Mark the swapped nodes "locked"
          Update the swap gains
      Until no feasible swap
      Find the max prefix partial sum in the gain sequence g1, g2, …, gm
      Make the corresponding swaps permanent

    • Start another pass if the current pass reduced the cut size (usually converges after a few passes)

    [Figure: a swap of nodes u and v across the cut; swapped nodes become locked]
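A minimal sketch of one KL pass on a weighted undirected graph, for illustration only: it recomputes D-values from scratch each round, whereas a practical implementation updates gains incrementally. The adjacency-dict representation is an assumption, not from the slides.

```python
def cut_size(adj, A, B):
    """Total weight of edges crossing the (A, B) partition."""
    return sum(w for u in A for v, w in adj[u].items() if v in B)

def kl_pass(adj, A0, B0):
    """One KL pass: tentatively swap max-gain unlocked pairs, then keep only
    the prefix of swaps whose partial gain sum is maximum."""
    A, B = set(A0), set(B0)
    locked, swaps, gains = set(), [], []

    def d(u, own, other):                 # D(u) = external cost - internal cost
        return sum(w for v, w in adj[u].items() if v in other) - \
               sum(w for v, w in adj[u].items() if v in own)

    for _ in range(min(len(A), len(B))):
        best = None
        for a in A - locked:
            for b in B - locked:
                gain = d(a, A, B) + d(b, B, A) - 2 * adj[a].get(b, 0)
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        g, a, b = best
        A.remove(a); B.remove(b); A.add(b); B.add(a)    # tentative swap
        locked |= {a, b}
        swaps.append((a, b)); gains.append(g)           # g may be negative

    # max prefix partial sum of g1, g2, ..., gm
    prefix = best_sum = best_k = 0
    for k, g in enumerate(gains, 1):
        prefix += g
        if prefix > best_sum:
            best_sum, best_k = prefix, k

    # make only the first best_k swaps permanent
    A, B = set(A0), set(B0)
    for a, b in swaps[:best_k]:
        A.remove(a); B.remove(b); A.add(b); B.add(a)
    return A, B, best_sum
```

Allowing negative intermediate gains and then rolling back to the best prefix is what lets KL climb out of some local minima within a pass.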

    Jason Cong 10

    Fiduccia-Mattheyses' Improvement

    • Each pass of the KL algorithm takes O(n³) or O(n² log n) time (n: # modules)

      Choosing the max-gain swap and updating the swap gains take O(n²) time

    • The FM algorithm takes O(p) time per pass (p: # pins)

    • Key ideas in the FM algorithm

      – Each move affects only a few gains ⇒ constant-time gain updating per move (amortized)

      – Maintain a list of gain buckets ⇒ constant-time selection of the max-gain move

    • Further improvement by Krishnamurthy: look-ahead in gain computation

    [Figure: cells u1 and u2 moving between blocks V1 and V2; gain buckets are indexed from -gmax to +gmax]
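A sketch of the gain-bucket idea behind FM's constant-time move selection: an array of buckets indexed from -gmax to +gmax plus a pointer to the highest non-empty bucket. The class and method names are illustrative; FM's actual structure uses doubly linked lists so that an arbitrary cell can also be removed in O(1) when its gain changes.

```python
class GainBuckets:
    def __init__(self, gmax):
        self.gmax = gmax
        self.buckets = [[] for _ in range(2 * gmax + 1)]   # index = gain + gmax
        self.max_idx = -1                                  # highest non-empty bucket

    def insert(self, cell, gain):
        i = gain + self.gmax
        self.buckets[i].append(cell)
        self.max_idx = max(self.max_idx, i)

    def pop_max(self):
        """Return (cell, gain) with maximum gain, or None if empty."""
        while self.max_idx >= 0 and not self.buckets[self.max_idx]:
            self.max_idx -= 1              # amortized: pointer only moves down
        if self.max_idx < 0:
            return None
        cell = self.buckets[self.max_idx].pop()            # LIFO within a bucket
        return cell, self.max_idx - self.gmax
```

Since a cell's gain is bounded by its degree, gmax is bounded by the maximum cell degree, which keeps the bucket array small.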

  • Prof. J. Cong 6

    Jason Cong 11

    Partitioning Algorithms

    • Iterative partitioning algorithms

    • Spectral based partitioning algorithms

    • Net partitioning vs. module partitioning

    • Multi-way partitioning

    • Further study in partitioning techniques

    Jason Cong 12

    Spectral Based Partitioning Algorithms

    [Example: a four-node graph a, b, c, d with weighted edges, together with its adjacency matrix A (a_ij = weight of the edge between i and j) and degree matrix D (d_ii = row sum of A)]

    D: degree matrix; A: adjacency matrix; D-A: Laplacian matrix

    Eigenvectors of D-A form the Laplacian spectrum of G
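A small plain-Python illustration of the D, A, and D − A construction above; the 4-node example graph and its edge weights are made up for this sketch.

```python
def laplacian(adj, nodes):
    """adj: {node: {neighbor: weight}} (symmetric). Returns L = D - A
    as a nested list, with rows/columns ordered by `nodes`."""
    idx = {v: i for i, v in enumerate(nodes)}
    L = [[0.0] * len(nodes) for _ in nodes]
    for u in nodes:
        for v, w in adj[u].items():
            L[idx[u]][idx[v]] -= w       # off-diagonal entry: -a_uv
            L[idx[u]][idx[u]] += w       # diagonal entry accumulates degree d_u
    return L

# hypothetical 4-node weighted graph (weights are made up)
adj = {'a': {'b': 3, 'c': 1},
       'b': {'a': 3, 'c': 2},
       'c': {'a': 1, 'b': 2, 'd': 4},
       'd': {'c': 4}}
L = laplacian(adj, ['a', 'b', 'c', 'd'])
```

Every row of a Laplacian sums to zero, which is exactly the statement that the all-ones vector is an eigenvector with eigenvalue 0.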

  • Prof. J. Cong 7

    Jason Cong 13

    Some Applications of Laplacian Spectrum

    • Placement and floorplan [Hall 1970] [Otten 1982] [Frankle-Karp 1986] [Tsay-Kuh 1986]

    • Bisection lower bound and computation [Donath-Hoffman 1973] [Barnes 1982] [Boppana 1987]

    • Ratio-cut lower bound and computation [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]

    Jason Cong 14

    Eigenvalues and Eigenvectors

    If Ax = λx

    then λ is an eigenvalue of A

    and x is an eigenvector of A w.r.t. λ

    (note that Kx is also an eigenvector, for any constant K ≠ 0).

    Written out componentwise, Ax = λx reads:

        a_i1·x1 + a_i2·x2 + … + a_in·xn = λ·xi,   i = 1, …, n

  • Prof. J. Cong 8

    Jason Cong 15

    A Basic Property

    For an n × n matrix A and a vector x = (x1, …, xn):

        x^T A x = Σ_{i,j = 1}^{n} a_ij · xi · xj

    Jason Cong 16

    Basic Idea of Laplacian Spectrum Based Graph Partitioning

    • Given a bisection (X, X'), define a partition vector x = (x1, x2, …, xn) with xi = 1 if i ∈ X and xi = −1 if i ∈ X'

      Clearly x ⊥ 1 and x ≠ 0, where 1 = (1, 1, …, 1) and 0 = (0, 0, …, 0)

    • For a partition vector x:

          Σ_{(i,j) ∈ E} (xi − xj)² = 4 · C(X, X')

    • Let S = {x | x ⊥ 1 and x ≠ 0}. Finding the best partition vector x, such that the total edge cut C(X, X') is minimum, is relaxed to finding x ∈ S such that Σ_{(i,j) ∈ E} (xi − xj)² is minimum

    • Linear placement interpretation: place module i at coordinate xi and minimize the total squared wirelength Σ_{(i,j) ∈ E} (xi − xj)²

    [Figure: modules placed on a line at coordinates x1, x2, x3, …, xn]

  • Prof. J. Cong 9

    Jason Cong 17

    Property of Laplacian Matrix

    With Q = D − A:

        x^T Q x = x^T D x − x^T A x
                = Σ_i d_i·xi² − Σ_{i,j} a_ij·xi·xj
                = Σ_{(i,j) ∈ E} a_ij·(xi − xj)²          ( 1 )

    If x is a partition vector, x^T Q x is the total squared wirelength of the linear placement x1, x2, …, xn, which equals 4 · C(X, X').

    Therefore, we want to minimize x^T Q x.

    Jason Cong 18

    Property of Laplacian Matrix (Cont'd)

    ( 2 ) Q is symmetric and positive semi-definite, i.e.

    (i) x^T Q x = Σ_{(i,j) ∈ E} a_ij·(xi − xj)² ≥ 0 for all x

    (ii) all eigenvalues of Q are ≥ 0

    ( 3 ) The smallest eigenvalue of Q is 0; the corresponding eigenvector is x0 = (1, 1, …, 1)

    (not interesting: all modules overlap at one point, and x0 ∉ S)

    ( 4 ) According to the Courant-Fischer minimax principle, the 2nd smallest eigenvalue satisfies:

        λ = min_{x ∈ S} (x^T Q x) / |x|²

  • Prof. J. Cong 10

    Jason Cong 19

    Results on Spectral Based Graph Partitioning

    • Min bisection cost: c(X, X') ≥ n·λ/4

    • Min ratio-cut cost: c(X, X') / (|X|·|X'|) ≥ λ/n   (homework)

    • The eigenvector of the second smallest eigenvalue gives the best linear placement

    • Compute the best bisection or ratio-cut based on the second smallest eigenvector

    Jason Cong 20

    Computing the Eigenvector

    • Only need one eigenvector (the second smallest one)

    • Q is symmetric and sparse

    • Use the block Lanczos algorithm
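A sketch (using NumPy's dense symmetric solver rather than the block Lanczos method named on the slide) of extracting the second-smallest eigenvector; for a large sparse Q one would use a Lanczos-type routine such as scipy.sparse.linalg.eigsh instead. The 6-node example graph is an assumption for illustration.

```python
import numpy as np

def fiedler_vector(Q):
    """Q: Laplacian matrix as an ndarray. Returns (lambda_2, its eigenvector)."""
    vals, vecs = np.linalg.eigh(Q)      # eigh: ascending eigenvalues for symmetric Q
    return vals[1], vecs[:, 1]

# two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Q = np.diag(A.sum(axis=1)) - A

lam2, x = fiedler_vector(Q)
side = x > 0        # the sign pattern of the eigenvector separates the clusters
```

On this graph the eigenvector's signs split the two triangles exactly, which is the behavior the next slide exploits by sorting modules along the eigenvector.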

  • Prof. J. Cong 11

    Jason Cong 21

    Interpreting the Eigenvector

    • Sort modules by their eigenvector values to get a linear ordering

    • Try all splitting points; choose the best one using the bisection or ratio-cut metric

    [Example: module ordering 23, 10, 22, 34, 6, 21, 8, with a splitting point tried between each adjacent pair]
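The splitting-point sweep can be sketched as below, assuming unweighted edges and the ratio-cut metric; the example ordering and edges are made up. This version recomputes the cut at every split (O(n·|E|)); an incremental version would update the cut as each module crosses the split point.

```python
def best_ratio_cut_split(order, edges):
    """order: modules sorted by eigenvector value; edges: (u, v) pairs.
    Tries every split point and returns (best prefix size k, best ratio-cut)."""
    pos = {v: i for i, v in enumerate(order)}
    n = len(order)
    best = None
    for k in range(1, n):                       # split after the k-th module
        cut = sum(1 for u, v in edges if (pos[u] < k) != (pos[v] < k))
        ratio = cut / (k * (n - k))             # c(X, X') / (|X| · |X'|)
        if best is None or ratio < best[1]:
            best = (k, ratio)
    return best

order = [0, 1, 2, 3, 4, 5]                      # e.g. sorted by eigenvector value
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
k, ratio = best_ratio_cut_split(order, edges)
```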

    Jason Cong 22

    Experiments on Special Graphs

    • GBUI(2n, d, b) [Bui-Chaudhuri-Leighton 1987]

      2n nodes, d-regular, min-bisection ≈ b

    • GGAR(m, n, Pint, Pext) [Garbe-Promel-Steger 1990]

      m clusters

      n nodes in each cluster

      Pint: intra-cluster edge probability

      Pext: inter-cluster edge probability

    [Figure: m clusters of n nodes each, with dense intra-cluster edges and sparse inter-cluster edges]

  • Prof. J. Cong 12

    Jason Cong 23

    GBUI Results

    [Plot: eigenvector value Xi vs. module i; the values separate into Cluster 1 and Cluster 2]

    Jason Cong 24

    GGAR Results

    [Plot: eigenvector value Xi vs. module i; the values separate into Clusters 1 through 4]

  • Prof. J. Cong 13

    Jason Cong 25

    Partitioning Algorithms

    • Iterative partitioning algorithms

    • Spectral based partitioning algorithms

    • Net partitioning vs. module partitioning

    • Multi-way partitioning

    • Further study in partitioning techniques

    Jason Cong 26

    Transform Netlists to Graph

    • Transform each r-pin net into an r-clique

    • Weight each edge in the r-clique by

      1/(r−1) (simple scaling), or

      2/r (edge probability), or

      4/r² (bounds each net's contribution to the cut size)

    • Transforming multi-pin nets to cliques destroys the sparsity of D−A
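A sketch of the clique transform, using the 1/(r−1) scaling from the slide; the function name and net representation are illustrative. Edge weights from different nets sharing the same pin pair accumulate.

```python
from itertools import combinations
from collections import defaultdict

def nets_to_graph(nets):
    """nets: iterable of pin lists. Returns {(u, v): weight} with u < v."""
    w = defaultdict(float)
    for net in nets:
        r = len(net)
        if r < 2:
            continue
        for u, v in combinations(sorted(net), 2):
            w[(u, v)] += 1.0 / (r - 1)   # each r-pin net adds a (1/(r-1))-clique
    return dict(w)

g = nets_to_graph([['a', 'b', 'c'], ['b', 'c']])
```

The 3-pin net contributes 1/2 to each of its three edges; the 2-pin net adds a full unit to (b, c). Note how a single r-pin net produces r(r−1)/2 edges, which is exactly the sparsity loss the slide warns about.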

  • Prof. J. Cong 14

    Jason Cong 27

    Net Partitioning vs. Module Partitioning

    • Dual netlist representation: the intersection graph

      Node: net

      Edge: two nets share a common gate

    • The intersection graph also captures circuit connectivity, but does not use hyperedges

    [Figure: a netlist N with gates G1 through G7 and nets a through j, and its intersection graph I(N)]

    Jason Cong 28

    Transform Net Partition to Module Partition

    [Figure: a net bipartition (X, X') of the intersection graph I(N) induces a module partition of netlist N; the boundary nets form a bipartite conflict graph B(Y, Y')]

    Every edge in B(Y, Y') indicates a conflict:

      G3 is wanted by both c and g

      G4 is wanted by both d and e

      G5 is wanted by both d and f

  • Prof. J. Cong 15

    Jason Cong 29

    Some Definitions

    • Maximum Independent Set (MIS)

      Maximum subset of vertices, no two of which are connected by an edge

    • Minimum Vertex Cover (MVC)

      Minimum subset of vertices which "covers" all edges

    [Figure: a small graph with its MIS and MVC highlighted]

    Jason Cong 30

    Resolving Net Conflict When Transforming A Net Partition to A Module Partition

    • Some nets in B(Y, Y') will lose

      Each losing net becomes a cut net in the module partition

    • Objective: maximize the # of winning nets, or equivalently, minimize the # of losing nets

    • Theorem: the set of winning nets forms an independent set in B(Y, Y'), and the set of losers forms a vertex cover

      ⇒ Find a maximum independent set in B(Y, Y')

      ⇒ Solved optimally in polynomial time for bipartite graphs

  • Prof. J. Cong 16

    Jason Cong 31

    Bipartite Graph Properties

    • |MIS| + |MVC| = |V|

    • The size of any vertex cover is no smaller than the size of a maximum matching

    • |MVC| = |MM| in bipartite graphs (König's theorem; MM: maximum matching)

      ⇒ Goal: find an MVC of size |MM|, or equivalently, an MIS of size |V| − |MM|

      MIS = winners, MVC = losers

    Jason Cong 32

    Finding an MIS

    [Figure: a bipartite graph with sides L and R, a maximum matching (MM), and the unmatched vertices UL ⊆ L and UR ⊆ R]

  • Prof. J. Cong 17

    Jason Cong 33

    Finding an MIS

    [Figure: alternating paths grown from the unmatched vertices in UL and UR]

    Jason Cong 34

    Finding an MIS

    [Figure: vertices reached at odd and even distances along the alternating paths form the sets Odd(L), Even(L), Odd(R), Even(R)]
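The alternating-path construction on the last three slides can be sketched as below: find a maximum matching, grow alternating paths from the unmatched left vertices, and read off the MIS from König's theorem. Simple augmenting-path matching is used for brevity (Hopcroft-Karp would be faster); all names are illustrative.

```python
def max_matching(adj, left):
    """Augmenting-path bipartite matching. adj: {l: [r, ...]}.
    Returns the matching as a dict r -> l."""
    match = {}
    def augment(l, seen):
        for r in adj.get(l, []):
            if r in seen:
                continue
            seen.add(r)
            if r not in match or augment(match[r], seen):
                match[r] = l
                return True
        return False
    for l in left:
        augment(l, set())
    return match

def max_independent_set(adj, left, right):
    """Maximum independent set of a bipartite graph via Konig's theorem."""
    match = max_matching(adj, left)
    matched_left = set(match.values())
    # alternating search from unmatched left vertices:
    # left -> right along any edge, right -> left along matching edges
    reach_l = {l for l in left if l not in matched_left}
    reach_r = set()
    frontier = list(reach_l)
    while frontier:
        nxt = []
        for l in frontier:
            for r in adj.get(l, []):
                if r not in reach_r:
                    reach_r.add(r)
                    m = match.get(r)
                    if m is not None and m not in reach_l:
                        reach_l.add(m)
                        nxt.append(m)
        frontier = nxt
    # Konig: MVC = (left - reach_l) | reach_r; the MIS is its complement
    return reach_l | (set(right) - reach_r)
```

The returned set has size |V| − |MM|, matching the goal stated on slide 31.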

  • Prof. J. Cong 18

    Jason Cong 35

    Improving the Module Partition

    Net1 = {a, b}   Net2 = {b, c}   Net3 = {a, e}

    Net4 = {b, d}   Net5 = {e, f}   Net6 = {g, e}

    Even(L) = {Net1, Net2}, ModSL = {a, b, c}

    Even(R) = {Net5, Net6}, ModSR = {e, f, g}

    ModSL + {d} cuts 1 net

    ModSR + {d} cuts 2 nets

    Jason Cong 36

    Algorithm IG-Match

    • Get the spectral linear ordering

    • Construct the bipartite graph B

    • Find an MIS of B

    • Convert the MIS into a module partition

    Theorem: IG-Match computes all |V| − 1 module partitions in O(|V|(|V| + |E|)) time

    Key idea: incremental updating of the bipartite graph and the alternating paths

  • Prof. J. Cong 19

    Jason Cong 37

    Experimental Results

             Wei-Cheng            IG-Match
    NAME     Areas / Cutsize      Areas / Cutsize    Improvement
    bm1         9:873 / 1           21:861 / 1          57%
    19ks     1011:1833 / 109       650:2194 / 85        -1%
    Prim1     152:681 / 14         154:679 / 14          1%
    Prim2    1132:1882 / 123       740:2274 / 77        21%
    Test02    372:1291 / 95        211:1452 / 38        37%
    Test03    147:1460 / 31        803:804 / 58         38%
    Test04    401:1114 / 51         73:1442 / 6         50%
    Test05   1204:1391 / 110       105:1442 / 6         53%
    Test06    145:1607 / 18        141:1611 / 17         3%

    • 28.8% average improvement

    Jason Cong 38

    Further Study in Partitioning Techniques

    • Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD'95; Enos et al., TCAD'99]

    • Generating uni-directional partitions [Iman-Pedram-Fabian-Cong 1993] or acyclic partitions [Cong-Li-Bagrodia, DAC'94] [Cong-Lim, ASPDAC'2000]

    • Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]

    • Communication-based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]

  • Prof. J. Cong 20

    Jason Cong 39

    Multi-Way Partitioning

    • Recursive bi-partitioning [Kernighan-Lin 1970]

    • Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms [Sanchis 1989] [Cong-Lim, ICCAD'98]

    • Generalization of ratio-cut and the spectral method to multi-way partitioning [Chan-Schlag-Zien 1993]

      generalized ratio-cut value = sum of the flux of each partition

      generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix

    Jason Cong 40

    Circuit Clustering Formulation

    Motivation:

    • Reduce the size of flat netlists

    • Identify natural circuit hierarchy

    Objectives:

    • Maximize the connectivity of each cluster

    • Minimize the size, delay (or simply depth), and density of the clustered circuit

  • Prof. J. Cong 21

    Jason Cong 41

    Measuring Cluster Connectivity

    • Simple density measures

      m/n: include a new node when it connects to more than m/n edges

      2m/(n(n−1)): include a new node when it connects to more than 2m/(n−1) edges (biased toward small clusters)

    • (k, l)-connectivity [Garber-Promel-Steger 1990]

      Every two nodes in the cluster are connected by at least k edge-disjoint paths of length no more than l

      A proper choice of k and l may not be easy

    • D/S-metric [Cong-Hagen-Kahng 1991]

      D/S = Degree / Separation

      Degree = average degree in the cluster

      Separation = average shortest-path length between a pair of nodes in the cluster

      Robust, but expensive to compute

    [Figure: two nodes u and v joined by k edge-disjoint paths P1, …, Pk, each of length ≤ l]

    Jason Cong 42

    Clustering Algorithms

    • (k, l)-connectivity algorithms [Garber-Promel-Steger 1990]

    • Random-walk based algorithm [Cong-Hagen-Kahng 1991; Hagen-Kahng 1992]

    • Multicommodity-flow based algorithm [Yeh-Cheng-Lin 1992]

    • Clique based algorithm [Bui 1989; Cong-Smith 1993]

    • Clustering for delay minimization (combinational circuits) [Lawler-Levitt-Turner 1969; Murgai-Brayton-Sangiovanni 1991; Rajaraman-Wong 1993]

    • Clustering for delay minimization (with consideration of retiming for sequential circuits) [Pan et al., TCAD'99; Cong et al., DAC'99]

    • Signal flow based clustering [Cong-Ding, DAC'93; Cong et al., ICCAD'97]

    • Multi-level clustering [Karypis-Kumar, TR95/DAC97; Cong-Sung, ASPDAC'2000]

  • Prof. J. Cong 22

    Jason Cong 43

    Lawler's Labeling Algorithm [Lawler-Levitt-Turner 1969]

    • Assumptions: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1

    • Objective: find a clustering of minimum delay

    • Algorithm

      Phase 1: label all nodes in topological order

        For each PI node v: L(v) = 0

        For each non-PI node v:

          p = maximum label of the predecessors of v

          Xp = set of predecessors of v with label p

          if |Xp| < K then L(v) = p, else L(v) = p + 1
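Phase 1 of the labeling can be sketched as follows. The slide's condition is cut off at "if |Xp|", so the comparison against the size bound K below follows the standard statement of the Lawler-Levitt-Turner algorithm; function and variable names are illustrative.

```python
def lawler_labels(preds, topo_order, K):
    """preds: {node: [fanin nodes]}; topo_order: nodes in topological order.
    Returns the delay label L(v) of every node, with cluster size bound K."""
    label = {}
    reach = {}          # reach[v]: set of all (transitive) predecessors of v
    for v in topo_order:
        ps = preds.get(v, [])
        if not ps:                       # primary input
            label[v], reach[v] = 0, set()
            continue
        reach[v] = set(ps).union(*(reach[p] for p in ps))
        p_max = max(label[u] for u in reach[v])
        Xp = [u for u in reach[v] if label[u] == p_max]
        # bump the label when the size-K bound would be exceeded
        label[v] = p_max if len(Xp) < K else p_max + 1
    return label
```

On a 4-node chain with K = 2, the label increments once the third node's predecessor set of label 0 reaches size 2, so the chain splits into two clusters of delay 0 and 1.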

  • Prof. J. Cong 23

    Jason Cong 45

    Signal Flow and Logic Dependency

    • Beyond Connectivity

      – identifies logically dependent cells

      – useful in guiding

        • clustering

        • partitioning [CoLB94, CoLL+97]

        • placement [CoXu95]

    Jason Cong 46

    Maximum Fanout Free Cone (MFFC)

    • Definition: for a node v in a combinational circuit,

      – cone of v (Cv): v and a set of its predecessors, such that any path connecting a node in Cv and v lies entirely in Cv

      – fanout-free cone at v (FFCv): a cone of v such that for any node u ≠ v in FFCv, output(u) ⊆ FFCv

      – maximum FFC at v (MFFCv): the FFC of v such that for any non-PI node w, if output(w) ⊆ MFFCv then w ∈ MFFCv

  • Prof. J. Cong 24

    Jason Cong 47

    Properties of MFFCs

    • If w ∈ MFFCv, then MFFCw ⊆ MFFCv [CoDi93]

    • Two MFFCs are either disjoint or one contains the other [CoDi93]

    • Area-optimal duplication-free LUT mapping can map each MFFC independently and obtain an optimal solution [CoDi93]

    • For acyclic bipartitioning without area constraint, there exists an optimal cut which does not cut through any MFFC [CoLB94]

    Jason Cong 48

    Maximum Fanout Free Subgraph (MFFS)

    • Definition: for a node v in a sequential circuit,

      FFSv = {u | every path from u to some PO passes through v}

      MFFSv = the maximum such FFSv

    • Illustration

    [Figure: MFFCs in a combinational circuit vs. an MFFS in a sequential circuit]

  • Prof. J. Cong 25

    Jason Cong 49

    MFFS Construction Algorithm

    • For a single MFFS at node v

      – select root node v and cut all its fanout edges

      – mark all nodes reachable backwards from all POs

      – MFFSv = {unmarked nodes}

      – complexity: O(|N| + |E|)

    [Animation over slides 49-52: the backward marking spreads from the POs; the nodes left unmarked form MFFSv]
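The single-MFFS construction above can be sketched as a backward reachability walk; the DAG representation (fanin lists) and names are assumptions for illustration.

```python
def mffs(fanins, pos, v):
    """MFFS of node v. fanins: {node: [fanin nodes]} for a DAG;
    pos: list of primary outputs. Returns the set of unmarked nodes."""
    nodes = set(fanins) | {u for f in fanins.values() for u in f}
    marked = set()
    stack = [o for o in pos if o != v]
    while stack:
        u = stack.pop()
        if u in marked:
            continue
        marked.add(u)
        # v's fanout edges are cut, so the backward walk never steps into v
        stack.extend(w for w in fanins.get(u, []) if w != v)
    return nodes - marked          # the unmarked nodes, including v itself
```

In the small example below, g1 fans out to both g3 and g2, so it escapes MFFS(g2); but g2 lies on every path from itself to a PO only through o1, so MFFS(o1) pulls g2 in.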

  • Prof. J. Cong 27

    Jason Cong 53

    MFFS Clustering Algorithm

    • Clusters the entire netlist

      – construct the MFFS at a PO and remove it from the netlist

      – include its inputs as new POs

      – repeat until all nodes are clustered

      – complexity: O(|N| · (|N| + |E|))

    [Animation over slides 53-57: MFFSs are peeled off one PO at a time until the netlist is fully clustered]

    Jason Cong 58

    Speedup of MFFS Clustering

    • Approximation of MFFS

      – a cluster size limit is imposed in practice

      – perform MFFS construction on a subcircuit SC(v, h)

        • defined by a search depth h for a given node v

        • internal nodes: all nodes visited by a BFS of depth h from v

        • pseudo PIs/POs: all I/Os from/to nodes not in SC(v, h)

    [Figure: the subcircuit SC(v, h) of depth h around node v, with pseudo PIs and pseudo POs at its boundary]

    Jason Cong 60

    LSR/MFFS Partitioning Algorithm

    • Basic approach: clustering-based partitioning

      – MFFS clustering: exploit logic dependency

      – LSR partitioning: focus on loose and stable net removal

    • Overview of the algorithm

      – cluster the netlist using the MFFS algorithm

      – partition the clustered netlist with the LSR algorithm

      – completely decompose the netlist

      – refine the flat netlist with the LSR algorithm

  • Prof. J. Cong 31

    Jason Cong 61

    Loose net Removal (LR)

    • Net status transition: free net → loose net → locked net

    • Loose net removal

    [Animation over slides 61-65: blocks 0, 1, 2 with an uncut net; as its cells lock into one block the net becomes loose, and the remaining free cells of the loose net are pulled into that block so the net stays uncut]

    Jason Cong 66

    LR Implementation

    • Basic approach

      – increase the gain of the free cells of loose nets outside the anchor block

      – focus on smaller nets

    • Cell gain increase function

          incr(n) = (max_net_size / size(n)) · (Σ_{c ∈ Ln} deg(c)) / (Σ_{c ∈ Fn} deg(c))

      – accumulate: until complete removal

      – set threshold: an array-based bucket can be used

      – expanded gain range: no tie-breaking scheme (LIFO, LA3) needed

  • Prof. J. Cong 34

    Jason Cong 67

    Stable Net Transition (SNT)

    • Stable net [Shibuya et al. 1995]

      – remains cut throughout an entire run

      – more than 80% of cut nets are stable

    • Implementation

      – detect stable nets by comparing cutsets

      – move all unprocessed cells of a stable net into a single block

      – serve as the initial partition for the next run

        • fast convergence

        • efficiently explores more local minima

    • Loose and Stable net Removal (LSR)

      – combination of two effective net-removal schemes

        • LR: small loose nets

        • SNT: large stable nets

    Jason Cong 68

    Summary of LSR/MFFS Algorithm

    [Flowchart: START → cluster netlist with MFFS → partition netlist with LSR → (gain? yes: partition again) → decluster netlist → refine netlist with LSR → (gain? yes: refine again) → END]

  • Prof. J. Cong 35

    Jason Cong 69

    Clustering Result

    • MCNC benchmarks with signal flow information (AVR = max_area / min_area)

                              Optimal            Heuristic
      NAME      SIZE   AVR    # CLST    TIME     # CLST    TIME
      s1423      619    1.0      193     0.9        168     1.9
      sioo       664    4.6      442     2.3        442     2.6
      s1488      686    1.0      187     0.9        187     1.4
      balu       801   12.9      287     2.0        284     3.8
      prim1      833    6.7      394     3.2        379     5.9
      struct    1952    2.5     1183    17.9       1183    12.1
      prim2     3014    6.7     1404    34.8       1243    22.1
      s9234     5866    1.0     3203    36.7       3177     9.7
      biomed    6514    4.5     2142    87.8       1853    30.5
      s13207    8772    1.0     2132    73.9       1994    14.1
      s15850   10470    1.0     2279    84.9       2137    17.6
      s35932   18148    1.0     5562   420.8       2943    32.1
      s38584   20995    1.0     5139   565.1       4242    44.5
      avq.sm   21918    4.5     8477  1287.3       8309   116.1
      s38417   23949    1.0     5906   452.2       5295    45.1
      avq.lg   25178    4.5     9103  1473.2       8658    90.2
      TOTAL                    48033  4543.9      42494   449.7

    Jason Cong 70

    Cutsize Reduction by MFFS Clustering

    • Bipartitioning under a 45-55% balance criterion

    [Bar chart: total cutsize (roughly 500-1900) for FM, SNT, LR, and LSR, each run on the flat netlist (FLAT) and with MFFS clustering (MFFS); improvement relative to flat FM: FM 0.0% / -25.8%, SNT -8.0% / -31.7%, LR -45.6% / -48.4%, LSR -46.9% / -49.9%]

  • Prof. J. Cong 36

    Jason Cong 71

    Cutsize Reduction by MFFS Clustering

                  No Clustering                MFFS Clustering
      NAME      FM   SNT    LR   LSR        FM   SNT    LR   LSR
      s1423     12    14    13    13        12    12    12    12
      sioo      25    25    25    25        25    25    25    25
      s1488     48    45    44    42        42    42    43    43
      balu      27    27    27    27        27    27    27    27
      prim1     43    42    42    42        43    42    42    43
      struct    42    45    33    33        39    52    33    33
      prim2    174   167   121   122       133   122   131   119
      s9234     49    54    41    43        42    43    41    40
      biomed    83    83    83    83       100    84    84    84
      s13207    90    74    75    70        91    65    73    61
      s15850   113    73    73    65        72    70    44    43
      s35932   100   128    45    44        48    76    45    44
      s38584    52    52    49    47        52    52    47    47
      avq.sm   321   378   135   139       273   172   127   127
      s38417   176   112    53    55       119   147    51    50
      avq.lg   491   380   145   131       251   168   127   127
      TOTAL   1846  1699  1004   981      1369  1261   952   925
      % IMPR   0.0   8.0  45.6  46.9      25.8  31.7  48.4  49.9
      TIME   121.4  59.4 162.3 138.9      84.2  61.3 113.6  89.4

    Jason Cong 72

    Cutsize Comparison with Other Methods

    [Bar chart: total cutsize (roughly 800-1200) for CDIP, PropF, Straw, hMetis, MLc, and L/M (iterative improvement partitioners, IIPs) and for Panza, Parab, and FBB (non-IIPs)]

  • Prof. J. Cong 37

    Jason Cong 73

    Details on Cutsize Comparison

    Columns: CDIP, PROPf, Straw, hMetis, MLc, L/M, PANZA, Parab, FBB (some entries are blank in the source)

      s1423:   12, 15, 16, 13
      sioo:    25, 25, 45
      s1488:   43, 44, 50
      balu:    27, 27, 27, 27, 27, 27, 27, 41
      prim1:   52, 51, 49, 50, 47, 43
      struct:  36, 33, 33, 33, 33, 33, 33, 40
      prim2:   152, 152, 143, 145, 139, 119
      s9234:   44, 42, 42, 40, 40, 40, 40, 74, 70
      biomed:  83, 84, 83, 83, 83, 84, 83, 135
      s13207:  70, 71, 57, 55, 55, 61, 66, 91, 74
      s15850:  67, 56, 44, 42, 44, 43, 44, 91, 67
      s35932:  73, 42, 47, 42, 41, 44, 43, 62, 49
      s38584:  47, 51, 49, 47, 47, 47, 47, 55, 47
      avq.sm:  148, 144, 131, 130, 128, 127
      s38417:  79, 65, 53, 51, 49, 50, 49, 49, 58
      avq.lg:  145, 143, 140, 127, 128, 127

      TOTAL1:  1023, 961, 898, 872, 861, 845
      TOTAL2:  509, 516, 749
      % IMPR:  17.4, 12.1, 5.9, 3.1, 1.9, 1.4, 32.0, 21.4
      TIME:    5817, 5611, 12577, 3455, 1338, 24619

    Iterative Improvement Partitioners (IIPs) vs. non-IIPs

    Jason Cong 74

    Summary

    • Partitioning is very important for interconnect optimization

    • The increasing emphasis on interconnect design has introduced many new partitioning formulations

    • Clustering is effective in reducing circuit size and identifying the natural circuit hierarchy

  • Prof. J. Cong 38

    Jason Cong 75

    Appendix: Simulated Annealing Based Partitioning

    • Basic approach

      – Solution space: all partitions

      – Neighboring solution: move a module from one side to the other

      – Cost function: cost = c(X, X') + λ(|X| − |X'|)²

      – Temperature schedule: standard cooling schedule

    • Simulated annealing without rejection

      – Generate a move with probability proportional to its gain

      – All gains can be updated efficiently after a move (unique to partitioning!)

      – Substantial run-time reduction at low temperature

      – Use conventional SA at high temperature and rejectionless SA at low temperature

    Jason Cong 76

    Simulated Annealing

    • Local search

    [Figure: a cost function over the solution space with several local minima; pure local search can get trapped in one, hence the need for occasional uphill moves]

  • Prof. J. Cong 39

    Jason Cong 77

    Statistical Mechanics vs. Combinatorial Optimization

    State {ri} (configuration: a set of atomic positions)

    Weight e^(−E({ri}) / KB·T)   (Boltzmann distribution)

    E({ri}): energy of the configuration

    KB: Boltzmann constant; T: temperature

    Low temperature limit: the Boltzmann distribution concentrates on the lowest-energy configurations

    Jason Cong 78

    Analogy

      Physical System           Optimization Problem
      State (configuration)     Solution
      Energy                    Cost function
      Ground state              Optimal solution
      Rapid quenching           Iterative improvement
      Careful annealing         Simulated annealing

  • Prof. J. Cong 40

    Jason Cong 79

    Generic Simulated Annealing Algorithm

    1. Get an initial solution S

    2. Get an initial temperature T > 0

    3. While not yet "frozen" do the following:

       3.1 For 1 ≤ i ≤ L, do the following:

           3.1.1 Pick a random neighbor S' of S

           3.1.2 Let ∆ = cost(S') − cost(S)

           3.1.3 If ∆ ≤ 0 (downhill move), set S = S'

           3.1.4 If ∆ > 0 (uphill move), set S = S' with probability e^(−∆/T)

       3.2 Set T = rT (reduce temperature)

    4. Return S
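The generic SA loop can be sketched as below, specialized to graph bisection with the cost c(X, X') + λ(|X| − |X'|)² from this appendix. The parameter values, seed, "frozen" test (a simple temperature floor), and example graph are all illustrative assumptions, not from the slides.

```python
import math
import random

def anneal(adj, nodes, T0=10.0, r=0.95, L=50, T_min=0.01, lam=1.0, seed=0):
    """adj: {node: {neighbor: weight}}. Returns (block assignment, final cost)."""
    rng = random.Random(seed)
    side = {v: rng.randint(0, 1) for v in nodes}    # random initial solution

    def cost():
        cut = sum(w for u in nodes for v, w in adj[u].items()
                  if u < v and side[u] != side[v])
        bal = sum(1 if side[v] == 0 else -1 for v in nodes)
        return cut + lam * bal * bal                # c(X, X') + lam*(|X|-|X'|)^2

    S, T = cost(), T0
    while T > T_min:                    # "frozen" test: a temperature floor
        for _ in range(L):
            v = rng.choice(nodes)       # neighbor: move one module across
            side[v] ^= 1
            d = cost() - S
            if d <= 0 or rng.random() < math.exp(-d / T):
                S += d                  # accept downhill, uphill w.p. e^(-d/T)
            else:
                side[v] ^= 1            # reject: undo the move
        T *= r
    return side, S

# hypothetical example: two triangles joined by a bridge
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
final_side, final_cost = anneal(adj, list(adj))
```

Recomputing the full cost per move is the expensive step; the slides' point is that for partitioning all gains can instead be maintained incrementally.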

    Jason Cong 80

    Basic Ingredients for S.A.

    • Solution space

    • Neighborhood structure

    • Cost function

    • Annealing schedule

  • Prof. J. Cong 41

    Jason Cong 81

    SA Partitioning

    "Optimization by Simulated Annealing" - Kirkpatrick, Gelatt, Vecchi

    • Solution space = set of all partitions

    • Neighborhood structure: randomly move one cell to the other side

    [Figure: a sequence of solutions, e.g. {a,b,c | d,e,f} → {a,b | c,d,e,f}; each move transfers one cell across the partition]

    Jason Cong 82

    SA Partitioning

    • Cost function: f = C + λB

      – C is the partitioning cost (cut size) as used before

      – B is a measure of how balanced the partition is

      – λ is a constant

      Example of B: for blocks S1 and S2, B = (|S1| − |S2|)²

  • Prof. J. Cong 42

    Jason Cong 83

    SA Partitioning

    • Annealing schedule: Tn = (T1/T0)ⁿ · T0, with ratio T1/T0 = 0.9

    • Move to the next temperature when either

      1. there are 10 accepted moves on average, or

      2. the # of attempts ≥ 100 × the total # of cells

    • The system is "frozen" if the acceptance rate is very low at 3 consecutive temperatures

    Jason Cong 84

    Graph Partition Using Simulated Annealing Without Rejections

    • Greene and Supowit, ICCD-88, pp. 658-663

    • Motivation: at low temperature, most moves are rejected!

      e.g. 1/100 acceptance rate for 1,000 vertices

  • Prof. J. Cong 43

    Jason Cong 85

    Graph Partition Using Simulated Annealing Without Rejections (Cont'd)

    • Key ideas

      (I) Biased selection: if move i has probability ωi of being accepted, generate move i with probability

              ωi / (ω1 + ω2 + … + ωN)      (N: size of the neighborhood)

          In general, ωi = min{1, e^(−∆i/T)}

          (In the conventional model, each move has probability 1/N of being generated.)

      (II) If a move is generated, it is always accepted

    Jason Cong 86

    Graph Partition Using Simulated Annealing Without Rejections (Cont'd)

    • Main difficulties

      ( 1 ) ωi is dynamic (since ∆i is dynamic); it is too expensive to update the ωi's (∆i's) after every move

      ( 2 ) Weighted selection problem: how do we select move i with probability ωi / (ω1 + … + ωN)?

  • Prof. J. Cong 44

    Jason Cong 87

    Solution to the Weighted Selection Problem (a general solution to selection problems)

    [Figure: a binary tree over the weights ω1, …, ω7; each leaf holds one ωi and each internal node holds the sum of the weights below it, e.g. ω1+ω2, ω5+ω6, ω1+…+ω4, ω5+ω6+ω7, and the root ω1+…+ω7]

    Jason Cong 88

    Solution to the Weighted Selection Problem (Cont'd)

    Let W = ω1 + ω2 + … + ωn. How do we select i with probability ωi / W?

    Equivalent to choosing x such that ω1 + … + ω(i−1) < x ≤ ω1 + … + ωi:

        v ← root
        x ← random(0, 1) · ω(v)
        while v is not a leaf do
            if x < ω(left(v)) then v ← left(v)
            else x ← x − ω(left(v)); v ← right(v)
        end

    Probability of ending up at leaf i: ωi / W
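The tree walk above can be sketched with a flat array-based sum tree (leaves hold the move weights, internal nodes hold subtree sums), giving O(log n) selection and O(log n) weight update; the class name and layout are illustrative.

```python
import random

class WeightTree:
    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (2 * self.n)
        for i, w in enumerate(weights):
            self.tree[self.n + i] = w            # leaves at indices n .. 2n-1
        for i in range(self.n - 1, 0, -1):       # internal node = sum of children
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, w):
        """Change the weight of move i, refreshing sums along the root path."""
        i += self.n
        self.tree[i] = w
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def select(self, rng=random):
        """Return index i with probability w_i / W."""
        x = rng.random() * self.tree[1]          # x <- random(0, 1) * W
        v = 1
        while v < self.n:                        # walk down to a leaf
            if x < self.tree[2 * v]:
                v = 2 * v                        # descend left
            else:
                x -= self.tree[2 * v]
                v = 2 * v + 1                    # descend right
        return v - self.n
```

The O(log n) update is what makes the structure usable here: after each accepted move, only the affected move weights and their root paths need refreshing.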

  • Prof. J. Cong 45

    Jason Cong 89

    Application to Partitioning: Special Solution to the First Problem

    Given a partition (A, B):

        Cost F(A, B) = Fc(A, B) + FI(A, B)

        Fc(A, B) = net-cut between A and B

        FI(A, B) = C·(|A|² + |B|²)   (minimized when |A| = |B| = n/2)

    For move i: ∆i = F(A', B') − F(A, B), which splits into

        ∆i^C = Fc(A', B') − Fc(A, B)   (only a few of the ∆i^C change after a move)

        ∆i^I = FI(A', B') − FI(A, B)   (all of the ∆i^I change after a move)

    Jason Cong 90

    Application to Partitioning: Special Solution to the First Problem (Cont'd)

    Solution: two-step biased selection

    (i) choose side A or B based on ∆i^I(T)

    (ii) choose move i within A or B based on ∆i^C(T)

        Pi = α(∆i^I, T) · α(∆i^C, T)

    Note: ∆i^I is the same for every move within A (or within B), so we keep one copy of ∆i^I for A and one copy for B, and choose the moves within A or B using the tree algorithm.