Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu


Page 1: Partitioning and Clustering Professor Lei He lhe@ee.ucla.edu

Partitioning and Clustering

Professor Lei He

lhe@ee.ucla.edu

http://eda.ee.ucla.edu/


Outline

Circuit Partitioning Formulation

Importance of Circuit Partitioning

Partitioning Algorithms

Circuit Clustering Formulation

Clustering Algorithms


Partitioning Formulation

Bi-partitioning formulation:

Minimize interconnections between partitions

Minimum cut: min c(X, X’)

Minimum bisection: min c(X, X’) with |X| = |X’|

Minimum ratio-cut: min c(X, X’) / (|X| |X’|)

[Figure: node set split into X and X’, with c(X, X’) the weight of the edges crossing the cut.]
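The three objectives can be compared on a small graph. The following is a minimal sketch, assuming the circuit is modeled as a weighted undirected graph given as an edge list; the function names and the example graph are hypothetical, not taken from the lecture.

```python
def cut_size(edges, X):
    """c(X, X'): total weight of edges with exactly one endpoint in X."""
    X = set(X)
    return sum(w for u, v, w in edges if (u in X) != (v in X))


def is_bisection(nodes, X):
    """True when |X| == |X'|."""
    X = set(X)
    return len(X) == len(set(nodes) - X)


def ratio_cut(edges, nodes, X):
    """c(X, X') / (|X| * |X'|); both sides must be non-empty."""
    X = set(X)
    comp = set(nodes) - X
    return cut_size(edges, X) / (len(X) * len(comp))


# Hypothetical 4-node example: a ring with alternating heavy and light edges.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 3), ("a", "d", 1)]
```

Here `cut_size(edges, {"a", "b"})` cuts only the two light edges, while isolating a single node cuts one heavy and one light edge; dividing by |X||X’| is what lets ratio-cut trade cut weight against balance without forcing an exact bisection.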


A Bi-Partitioning Example

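Min-cut size = 13

Min-bisection size = 300

Min-ratio-cut size = 19

[Figure: weighted graph on nodes a, b, c, d, e, f with the min-cut, min-bisection, and min-ratio-cut lines marked; edge weights shown: 9, 10, 4, and several edges of weight 100.]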

Ratio-cut helps to identify natural clusters


Circuit Partitioning Formulation (Cont’d)

General multi-way partitioning formulation:

Partition a network N into N1, N2, …, Nk such that

Each partition has an area constraint: $\sum_{v \in N_i} a(v) \le A_i$

Each partition has an I/O constraint: $c(N_i, N - N_i) \le I_i$

Minimize the total interconnection: $\sum_{i} c(N_i, N - N_i)$
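A candidate multi-way partition can be checked against this formulation directly. The sketch below assumes a weighted graph model (two-pin edges rather than multi-pin nets) and per-partition limits passed as lists; `evaluate_multiway` and its parameter names are hypothetical.

```python
def evaluate_multiway(edges, area, parts, area_limit, io_limit):
    """Return (feasible, total) for a k-way partition.

    edges:      list of (u, v, weight)
    area:       dict node -> a(v)
    parts:      list of node sets N_1..N_k
    area_limit: list of A_i, io_limit: list of I_i
    total is sum_i c(N_i, N - N_i), so each cut edge is counted twice.
    """
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    external = [0] * len(parts)          # c(N_i, N - N_i) per partition
    for u, v, w in edges:
        if part_of[u] != part_of[v]:
            external[part_of[u]] += w
            external[part_of[v]] += w
    areas = [sum(area[v] for v in p) for p in parts]
    feasible = all(a <= lim for a, lim in zip(areas, area_limit)) and \
               all(c <= lim for c, lim in zip(external, io_limit))
    return feasible, sum(external)
```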


Importance of Circuit Partitioning

Divide-and-conquer methodology

The most effective way to solve problems of high complexity

E.g.: min-cut based placement, partitioning-based test generation,…

System-level partitioning for multi-chip designs or 3D

inter-chip interconnection delay dominates system performance

inter-layer wire pitch is much larger

Circuit emulation/parallel simulation

partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad).

Parallel CAD development

Task decomposition and load


Partitioning Algorithms

Iterative partitioning algorithms

Multi-way partitioning

Multi-level partitioning (to be discussed after clustering)


Iterative Partitioning Algorithms

Greedy Iterative improvement method

[Kernighan-Lin 1970]

[Fiduccia-Mattheyses 1982]

[Krishnamurthy 1984]

Simulated Annealing

[Kirkpatrick-Gelatt-Vecchi 1983]

[Greene-Supowit 1984]

(SA will be formally introduced in the Floorplan chapter)


Kernighan-Lin’s Algorithm

Pair-wise exchange of nodes to reduce cut size

Allow cut size to increase temporarily within a pass

Compute the gain of each swap

Repeat

  Perform a feasible swap of max gain;

  Mark swapped nodes “locked”;

  Update swap gains;

Until no feasible swap remains;

Find the max prefix partial sum in the gain sequence g1, g2, …, gm

Make the corresponding swaps permanent.

Start another pass if the current pass reduces the cut size (usually converges after a few passes)

[Figure: nodes u and v exchanged across the cut and then marked “locked”.]
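One pass of the procedure above can be sketched as follows. This is a minimal, unoptimized illustration on a weighted graph (not the authors' implementation): it scans all unlocked pairs for the best swap, so it matches the cubic per-pass cost rather than any faster variant. The names `build_weights` and `kl_pass` are hypothetical.

```python
def build_weights(edges):
    """Symmetric adjacency map w[u][v] = edge weight."""
    w = {}
    for u, v, c in edges:
        w.setdefault(u, {})[v] = c
        w.setdefault(v, {})[u] = c
    return w


def kl_pass(w, A, B):
    """One Kernighan-Lin pass; returns (A, B, total gain kept)."""
    A, B = set(A), set(B)

    def wt(u, v):
        return w.get(u, {}).get(v, 0)

    # D[v] = external cost - internal cost of v
    D = {}
    for v in A | B:
        own, other = (A, B) if v in A else (B, A)
        D[v] = sum(wt(v, u) for u in other) - sum(wt(v, u) for u in own if u != v)

    free_a, free_b = set(A), set(B)
    swaps, gains = [], []
    while free_a and free_b:
        # Swap of max gain among unlocked pairs: g = D[a] + D[b] - 2 w(a, b)
        g, a, b = max(((D[a] + D[b] - 2 * wt(a, b), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        swaps.append((a, b))
        gains.append(g)
        free_a.remove(a)          # lock both nodes
        free_b.remove(b)
        for x in free_a:          # update gains as if the swap were made
            D[x] += 2 * wt(x, a) - 2 * wt(x, b)
        for y in free_b:
            D[y] += 2 * wt(y, b) - 2 * wt(y, a)

    # Keep the prefix of swaps with the largest partial gain sum.
    best_k, best_sum, running = 0, 0, 0
    for k, g in enumerate(gains, 1):
        running += g
        if running > best_sum:
            best_k, best_sum = k, running
    for a, b in swaps[:best_k]:
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)
    return A, B, best_sum


# Hypothetical example: two triangles {1,2,3} and {4,5,6} joined by edge 3-4.
example_w = build_weights([(1, 2, 1), (1, 3, 1), (2, 3, 1),
                           (4, 5, 1), (4, 6, 1), (5, 6, 1), (3, 4, 1)])
```

Starting from A = {1, 2, 4}, B = {3, 5, 6}, the first swap exchanges 4 and 3; later swaps in the same pass have non-positive gain, so only the first is made permanent.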


Fiduccia-Mattheyses’ Improvement

Each pass in KL-algorithm takes O(n^3) or O(n^2 log n) time (n: #modules)

Choosing the swap with max gain and updating swap gains take O(n^2) time

FM-algorithm takes O(p) time per pass (p: #pins)

Key ideas in FM-algorithm

Each move affects only a few moves -> constant time gain updating per move (amortized)

Maintain a list of gain buckets -> constant time selection of the move with max gain

Further improvement by Krishnamurthy

Look-ahead in gain computation

[Figure: gain bucket array indexed from -gmax to gmax; each bucket holds the cells (e.g. u1, u2) with that gain for moves between V1 and V2.]
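The gain-bucket idea can be illustrated with a small data structure. This is a sketch only: it uses Python sets as buckets in place of the linked lists usually described for FM, and the class and method names are hypothetical. Gains are assumed to be integers in the range -gmax..gmax.

```python
class GainBuckets:
    """Array of buckets indexed by gain, with a pointer to the best bucket."""

    def __init__(self, gmax):
        self.gmax = gmax
        self.buckets = [set() for _ in range(2 * gmax + 1)]
        self.gain = {}        # cell -> current gain
        self.max_idx = -1     # index of highest non-empty bucket, -1 if empty

    def insert(self, cell, g):
        idx = g + self.gmax
        self.buckets[idx].add(cell)
        self.gain[cell] = g
        self.max_idx = max(self.max_idx, idx)

    def remove(self, cell):
        idx = self.gain.pop(cell) + self.gmax
        self.buckets[idx].discard(cell)
        # Slide the max pointer down past empty buckets.
        while self.max_idx >= 0 and not self.buckets[self.max_idx]:
            self.max_idx -= 1

    def update(self, cell, delta):
        """Move a cell to the bucket for its new gain."""
        g = self.gain[cell] + delta
        self.remove(cell)
        self.insert(cell, g)

    def pop_max(self):
        """Remove and return (cell, gain) with the largest gain, or None."""
        if self.max_idx < 0:
            return None
        cell = next(iter(self.buckets[self.max_idx]))
        g = self.gain[cell]
        self.remove(cell)
        return cell, g
```

In a full FM pass, `pop_max` would pick the next cell to move and `update` would be called only for the cells on nets touched by that move.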


Simulated Annealing

Local Search

[Figure: cost function plotted over the solution space, with candidate solutions marked “o” and the next move marked “?”.]


Statistical Mechanics vs. Combinatorial Optimization

State {r_i} (configuration - a set of atomic positions)

Weight $e^{-E(\{r_i\}) / k_B T}$ - Boltzmann distribution

E({r_i}): energy of the configuration

k_B: Boltzmann constant; T: temperature.

Low temperature limit??


Analogy

Physical System          Optimization Problem
-----------------------  -----------------------
State (configuration)    Solution
Energy                   Cost function
Ground state             Optimal solution
Rapid quenching          Iterative improvement
Careful annealing        Simulated annealing


Generic Simulated Annealing Algorithm

1. Get an initial solution S

2. Get an initial temperature T>0

3. While not yet “frozen” do the following:

3.1 For 1 ≤ i ≤ L, do the following:

3.1.1 Pick a random neighbor S’ of S.

3.1.2 Let Δ = cost(S’) - cost(S)

3.1.3 If Δ ≤ 0 (downhill move), set S = S’

3.1.4 If Δ > 0 (uphill move), set S = S’ with probability $e^{-\Delta / T}$

3.2 Set T = rT (reduce temperature)

4. Return S
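The generic loop translates almost line for line into code. This is a minimal sketch under stated assumptions: the "frozen" test is simplified to a temperature floor `T_min`, and the default parameter values, the `neighbor(S, rng)` callback signature, and the toy cost function are all hypothetical.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost,
                        T0=10.0, r=0.9, L=100, T_min=1e-3, rng=None):
    """Generic SA: returns the solution held when the system is 'frozen'."""
    rng = rng or random.Random(0)
    S, T = initial, T0
    while T > T_min:                      # simplified "not yet frozen" test
        for _ in range(L):
            S2 = neighbor(S, rng)         # 3.1.1 random neighbor
            delta = cost(S2) - cost(S)    # 3.1.2
            # 3.1.3 downhill: always accept; 3.1.4 uphill: accept w.p. e^(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                S = S2
        T *= r                            # 3.2 reduce temperature
    return S


# Toy usage: minimize (x - 7)^2 over the integers, moving by +/-1.
def toy_cost(x):
    return (x - 7) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))
```

At high T almost every uphill move is accepted, so the walk explores freely; as T shrinks, the acceptance probability for uphill moves vanishes and the loop behaves like greedy iterative improvement.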


Basic Ingredients for S.A.

Solution space

Neighborhood Structure

Cost Function

Annealing Schedule


SA Partitioning

“Optimization by Simulated Annealing” - Kirkpatrick, Gelatt, Vecchi.

Solution space=set of all partitions

Neighborhood Structure

[Figure: three example solutions, each a two-way partition of the cells a-f (e.g. abc | def); an arrow labeled “a move” connects neighboring solutions.]

Randomly move one cell to the other side


SA Partitioning

Cost function: f = C + λB

C is the partitioning cost as used before

B is a measure of how balanced the partitioning is

λ is a constant.

Example of B:

B = (|S1| - |S2|)^2

[Figure: partition with cells a, b, ... in S1 and cells c, d, ... in S2.]
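As a small illustration of this cost function, the sketch below evaluates f = C + λB for a two-way partition, taking C to be the weighted cut size and B = (|S1| - |S2|)^2. The edge-list graph model, the function name, and the default value of `lam` (standing for λ) are assumptions made for the example.

```python
def sa_partition_cost(edges, S1, S2, lam=0.5):
    """f = C + lam * B with C = cut weight and B = (|S1| - |S2|)^2."""
    S1 = set(S1)
    C = sum(w for u, v, w in edges if (u in S1) != (v in S1))
    B = (len(S1) - len(S2)) ** 2
    return C + lam * B
```

Because imbalance is penalized rather than forbidden, a single-cell move (which always changes |S1| - |S2|) remains a legal neighbor, and λ controls how strongly the search is pulled back toward balance.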


SA Partitioning

Annealing schedule:

T_n = (T_1/T_0)^n T_0, ratio T_1/T_0 = 0.9

At each temperature, either

1. There are 10 accepted moves on the average;

or

2. # of attempts ≥ 100 × total # of cells

The system is “frozen” if there are very few acceptances at 3 consecutive temperatures.
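The schedule can be written as three small helpers. This is a sketch with two labeled assumptions: "10 accepted moves on the average" is read here as 10 accepted moves per cell, and the acceptance-rate `threshold` that counts as "very few" is a hypothetical value, since the slide gives no number.

```python
def temperature(n, T0, ratio=0.9):
    """T_n = ratio^n * T_0, where ratio = T_1 / T_0."""
    return (ratio ** n) * T0


def done_at_temperature(accepted, attempts, num_cells):
    """Stop at this temperature after enough acceptances or enough attempts.

    Assumption: the acceptance target is 10 accepted moves per cell.
    """
    return accepted >= 10 * num_cells or attempts >= 100 * num_cells


def is_frozen(acceptance_rates, threshold=0.01):
    """Frozen if the last 3 temperatures all had acceptance rate < threshold.

    threshold is a hypothetical stand-in for "very few acceptances".
    """
    last = acceptance_rates[-3:]
    return len(last) == 3 and all(rate < threshold for rate in last)
```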


Graph Partition Using Simulated Annealing Without Rejections

Greene and Supowit, ICCD-84, pp. 658-663

Motivation:

At low temperature, most moves are rejected!

e.g. 1/100 acceptance rate for 1,000 vertices


Graph Partition Using Simulated Annealing Without Rejections (Cont’d)

Key Idea

(I) Biased selection

If move i has probability w_i of being accepted, generate move i with probability

$w_i / \sum_{j=1}^{N} w_j$

N: size of the neighborhood

In general, $w_i = \min\{1, e^{-\Delta_i / T}\}$

In the conventional model, each move is generated with probability 1/N.

(II) If a move is generated, it is always accepted


Graph Partition Using Simulated Annealing Without Rejections (Cont’d)

Main Difficulty

(1) w_i is dynamic (since Δ_i is dynamic)

It is too expensive to update the w_i’s (Δ_i’s) after every move

(2) Weighted selection problem

How to select move i with probability $w_i / \sum_{j=1}^{N} w_j$?


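Solution to the Weighted Selection Problem (general solution to the second problem)

[Figure: binary tree of partial sums over the weights w_1, ..., w_7.
  Root:    w_1 + ••• + w_7
  Level 1: w_1 + ••• + w_4 | w_5 + w_6 + w_7
  Level 2: w_1 + w_2 | w_3 + w_4 | w_5 + w_6 | w_7
  Leaves:  w_1, w_2, w_3, w_4, w_5, w_6, w_7, 0]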


Solution to the Weighted Selection Problem (Cont’d)

Let W = w_1 + w_2 + ••• + w_N. How to select i with probability w_i / W?

Equivalent to choosing x uniformly in (0, W) and taking the i such that w_1 + ••• + w_{i-1} < x ≤ w_1 + ••• + w_i

  v ← root
  x ← random(0, 1) * w(v)
  while v is not a leaf do
    if x < w(left(v)) then v ← left(v)
    else x ← x - w(left(v)), v ← right(v)
  end

Probability of ending up at leaf i:

$\mathrm{Prob}\left(\sum_{j=1}^{i-1} w_j < x \le \sum_{j=1}^{i} w_j\right) = \frac{w_i}{\sum_{j=1}^{N} w_j} = \frac{w_i}{W}$
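The tree walk above can be sketched with an array-based binary tree in which every internal node stores the sum of the weights below it. The class name `SumTree` and its methods are hypothetical; unused leaves are padded with weight 0, and both selecting a move and changing one weight take O(log N) steps.

```python
import random


class SumTree:
    """Complete binary tree of partial sums over N leaf weights."""

    def __init__(self, weights):
        self.size = 1
        while self.size < len(weights):
            self.size *= 2
        # tree[1] is the root; leaves live at tree[size .. 2*size - 1].
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]

    def total(self):
        return self.tree[1]

    def update(self, i, w):
        """Set weight i to w and repair the sums on the path to the root."""
        v = self.size + i
        self.tree[v] = w
        v //= 2
        while v >= 1:
            self.tree[v] = self.tree[2 * v] + self.tree[2 * v + 1]
            v //= 2

    def select_at(self, x):
        """Walk down from the root to the leaf whose interval contains x."""
        v = 1
        while v < self.size:                 # v is not a leaf
            left = 2 * v
            if x < self.tree[left]:
                v = left
            else:
                x -= self.tree[left]
                v = left + 1
        return v - self.size

    def sample(self, rng=random):
        """Pick index i with probability w_i / W."""
        return self.select_at(rng.random() * self.total())
```

For weights [1, 2, 3, 4], any x in [0, 1) lands on index 0 and any x in [3, 6) lands on index 2, so each leaf is reached with probability proportional to its weight.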


Application to Partitioning: Special solution to the first problem

Given a partition (A, B)

Cost F(A,B) = F_C(A,B) + F_I(A,B)

F_C(A,B) = net-cut between A and B

F_I(A,B) = C(|A|^2 + |B|^2) (min when |A| = |B| = n/2)

For move i, Δ_i = F(A’,B’) - F(A,B) = Δ_i^C + Δ_i^I, where

$\Delta_i^C = F_C(A', B') - F_C(A, B)$

$\Delta_i^I = F_I(A', B') - F_I(A, B)$

After a move:

All Δ_i^I change.

Only a few Δ_i^C change.


Application to Partitioning: Special solution to the first problem (Cont’d)

Solution:

Two-step biased selection:

(i) choose A or B based on $w(\Delta_i^I, T)$

(ii) choose move i within A or B based on $w(\Delta_i^C, T)$

$P_i = w(\Delta_i^I, T) \cdot w(\Delta_i^C, T)$

Here w(Δ, T) denotes the selection weight of a move with cost change Δ at temperature T.

Note, the Δ_i^I’s are the same for each i in A or B.

So we keep one copy of Δ_i^I for A

and one copy of Δ_i^I for B,

and choose the moves within A or B using the tree algorithm


More Partitioning Techniques

Spectral based partitioning algorithms [Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]

Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al., TCAD’95; Enos et al., TCAD’99]

Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC’94] [Cong-Lim, ASPDAC 2000]

Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]

Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]


Multi-Way Partitioning

Recursive bi-partitioning [Kernighan-Lin 1970]

Generalization of Fiduccia-Mattheyses’ and Krishnamurthy’s algorithms [Sanchis 1989] [Cong-Lim, ICCAD’98]

Generalization of ratio-cut and spectral method to multi-way partitioning [Chan-Schlag-Zien 1993]

Generalized ratio-cut value = sum of flux of each partition

Generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix


Circuit Clustering Formulation

Motivation:

Reduce the size of flat netlists

Identify natural circuit hierarchy

Objectives:

Maximize the connectivity of each cluster

Minimize the size, delay (or simply depth), and density of clustered circuits


Lawler’s Labeling Algorithm [Lawler-Levitt-Turner 1969]

Assumption: Cluster size ≤ K; Intra-cluster delay = 0; Inter-cluster delay = 1

Objective: Find a clustering of minimum delay

Algorithm:

Phase 1: Label all nodes in topological order

  For each PI node v, L(v) = 0;

  For each non-PI node v

    p = maximum label of predecessors of v

    Xp = set of predecessors of v with label p

    if |Xp| < K then L(v) = p else L(v) = p+1

Phase 2: Form clusters

  Start from PO to generate necessary clusters

  Nodes with the same label form a cluster

[Figure: node v with predecessors labeled p-1 and p; the predecessors labeled p form the set Xp.]
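Both phases can be sketched on a DAG stored as a map from each node to its immediate fanins. The sketch assumes that "predecessors" means all transitive predecessors and that the cluster rooted at v is v plus its predecessors carrying the same label; the function names are hypothetical, and the transitive-predecessor sets are stored explicitly for clarity rather than speed.

```python
from graphlib import TopologicalSorter


def lawler_labels(fanins, K):
    """Phase 1: label nodes in topological order. Returns (label, preds)."""
    label, preds = {}, {}
    for v in TopologicalSorter(fanins).static_order():
        preds[v] = set()
        for u in fanins[v]:
            preds[v] |= preds[u] | {u}       # all transitive predecessors
        if not fanins[v]:                    # primary input
            label[v] = 0
        else:
            p = max(label[u] for u in preds[v])
            Xp = [u for u in preds[v] if label[u] == p]
            label[v] = p if len(Xp) < K else p + 1
    return label, preds


def lawler_clusters(fanins, label, preds, outputs):
    """Phase 2: grow clusters backward from the POs. Returns root -> members."""
    clusters = {}
    work = list(outputs)
    while work:
        v = work.pop()
        if v in clusters:
            continue
        members = {v} | {u for u in preds[v] if label[u] == label[v]}
        clusters[v] = members
        for m in members:                    # cluster inputs root new clusters
            for u in fanins[m]:
                if u not in members:
                    work.append(u)
    return clusters


# Hypothetical example: a chain a -> b -> c -> d with primary input a.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
```

With K = 2 the chain gets labels 0, 0, 1, 1, so the path crosses exactly one cluster boundary. Because clusters are generated per root, a node may appear in more than one cluster, which is the node duplication the algorithm allows.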


Lawler’s Labeling Algorithm(Cont’d)

Performance of the algorithm

  Efficient run-time

  Minimum delay clustering solution

  Allow node duplication

  No attempt to minimize the number of clusters

Extension to allow arbitrary gate delays

  Heuristic solution [Murgai-Brayton-Sangiovanni 1991]

  Optimal solution [Rajaraman-Wong 1993]


Maximum Fanout Free Cone (MFFC)

Definition: for a node v in a combinational circuit,

cone of v ($C_v$): v and all of its predecessors such that any path connecting a node in $C_v$ and v lies entirely in $C_v$

fanout free cone at v ($FFC_v$): cone of v such that for any node $u \ne v$ in $FFC_v$, $output(u) \subseteq FFC_v$

maximum FFC at v ($MFFC_v$): FFC of v such that for any non-PI node w, if $output(w) \subseteq MFFC_v$ then $w \in MFFC_v$
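The MFFC definition suggests a simple fixed-point construction: start from v and keep absorbing any non-PI fanin whose outputs all lie inside the cone built so far. The sketch below assumes the netlist is a map from each node to its fanin list and treats nodes with no fanins as primary inputs; the function name and the example netlist are hypothetical.

```python
def mffc(fanins, v):
    """Maximum fanout-free cone rooted at v (primary inputs excluded)."""
    fanouts = {n: [] for n in fanins}
    for n, ins in fanins.items():
        for u in ins:
            fanouts[u].append(n)

    cone = {v}
    work = [v]
    while work:
        n = work.pop()
        for u in fanins[n]:
            if u in cone or not fanins[u]:       # already in, or a PI
                continue
            # u joins only if every output of u is already in the cone.
            if all(o in cone for o in fanouts[u]):
                cone.add(u)
                work.append(u)
    return cone


# Hypothetical netlist: g2 fans out to both g3 and g4, g1 only to g3.
netlist = {"a": [], "b": [], "c": [],
           "g1": ["a", "b"], "g2": ["b", "c"],
           "g3": ["g1", "g2"], "g4": ["g2"]}
```

A fanin rejected early is re-examined each time another of its fanouts enters the cone, so the result does not depend on the order in which nodes are visited.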


Properties of MFFCs

If $w \in MFFC_v$, then $MFFC_w \subseteq MFFC_v$ [CoDi93]

Two MFFCs are either disjoint or one contains another [CoDi93]


Maximum Fanout Free Subgraph (MFFS)

Definition: for a node v in a sequential circuit,

$FFS_v = \{u \mid \text{every path from } u \text{ to some PO passes through } v\}$

$MFFS_v = \{u \mid u \in FFS_v, \text{ for all } FFS_v\}$

Illustration

[Figure: illustration relating MFFCs to the MFFS.]


MFFS Construction Algorithm For Single MFFS at Node v

select root node v and cut all its fanout edges

mark all nodes reachable backwards from all POs

MFFSv = {unmarked nodes}

complexity: O(|N| + |E|)

[Figure: example netlist with root node v.]
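The marking procedure can be sketched as a backward traversal. Two assumptions are made here that the slide leaves open: when v is itself a PO it is not used as a starting point, and cutting v's fanout edges is modeled by refusing to step into v during the traversal. The function name and the example netlist are hypothetical.

```python
def mffs(fanins, v, primary_outputs):
    """MFFS at v: nodes not reachable backwards from any PO once v is cut off."""
    marked = set()
    stack = [po for po in primary_outputs if po != v]
    while stack:
        n = stack.pop()
        # Reaching v would mean crossing one of its (cut) fanout edges.
        if n in marked or n == v:
            continue
        marked.add(n)
        stack.extend(fanins[n])
    return set(fanins) - marked


# Hypothetical netlist with primary outputs g3 and g4.
ffs_netlist = {"a": [], "b": [], "c": [],
               "g1": ["a", "b"], "g2": ["b", "c"],
               "g3": ["g1", "g2"], "g4": ["g2"]}
```

Each node and edge is visited at most once, which matches the O(|N| + |E|) bound, and the `marked` set also makes the traversal safe on the feedback loops of a sequential circuit.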


MFFS Clustering Algorithm Clusters Entire Netlist

construct MFFS at a PO and remove it from netlist

include its inputs as new POs

repeat until all nodes are clustered

complexity: O(|N| · (|N| + |E|))

[Figure: example netlist with root node v.]
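The clustering loop can be sketched by repeating a single-MFFS construction on a shrinking netlist. The sketch assumes a fanin-map netlist, processes POs in first-in first-out order, and adds new POs in sorted order purely to make the result reproducible; these ordering choices and all function names are hypothetical.

```python
def mffs_at(fanins, v, other_pos):
    """Nodes not reachable backwards from other_pos without passing v."""
    marked, stack = set(), list(other_pos)
    while stack:
        n = stack.pop()
        if n in marked or n == v:
            continue
        marked.add(n)
        stack.extend(fanins[n])
    return set(fanins) - marked


def mffs_clustering(fanins, primary_outputs):
    """Return a list of (root, cluster) pairs covering the netlist."""
    remaining = {n: list(ins) for n, ins in fanins.items()}
    pos = list(primary_outputs)
    clusters = []
    while pos:
        v = pos.pop(0)
        if v not in remaining:
            continue
        cluster = mffs_at(remaining, v, pos)
        clusters.append((v, cluster))
        # Nodes outside the cluster that feed it become new POs.
        inputs = {u for n in cluster for u in remaining[n] if u not in cluster}
        for n in cluster:
            del remaining[n]
        for n in remaining:                  # drop edges from removed nodes
            remaining[n] = [u for u in remaining[n] if u in remaining]
        pos.extend(u for u in sorted(inputs) if u not in pos)
    return clusters


# Hypothetical netlist with primary outputs g3 and g4.
cluster_netlist = {"a": [], "b": [], "c": [],
                   "g1": ["a", "b"], "g2": ["b", "c"],
                   "g3": ["g1", "g2"], "g4": ["g2"]}
```

Each iteration costs one traversal of the remaining netlist and removes at least its root, which gives the O(|N| · (|N| + |E|)) bound. The clusters depend on the order in which POs are taken.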


Summary

Partitioning is key for applying divide-and-conquer methodology (for complexity management)

Partitioning also defines global/local interconnects and greatly impacts circuit performance

Growing importance of interconnect design has introduced many new partitioning formulations

Clustering is effective in reducing circuit size and identifying natural circuit hierarchy

Multi-level circuit clustering + iterative improvement based methods produce the best partitioning results