Analytical Minimization of Signal Delay in VLSI Placement

Post on 04-Feb-2016


Analytical Minimization of Signal Delay in VLSI Placement

Andrew B. Kahng and Igor L. Markov

UCSD, Univ. of Michigan http://www.eecs.umich.edu/~imarkov

IBM technical contact: Paul Villarrubia

Outline

• Background: Global Placement for VLSI

– wirelength minimization

– delay minimization

• Contribution

– minimization objective

– “generic” minimization algorithm: outer loop and inner loop

– empirical results

• Futures

VLSI Global Placement

• Find locations for standard cells

• Standard cells placed in rows, without overlap

• Minimize wirelength, “routing congestion”

• Minimize clock cycle

• Key abstractions:

– standard cells → rectangular outlines

– netlist → weighted hypergraph (signal nets → hyperedges)

– signal delay → function of cell locations (interconnect dominates)

A VLSI Global Placement Example

[Figure: bad placement vs. good placement]

Netlist Hypergraph and Timing Graph

• Two signal nets: one with 3 pins (light blue) and one with 4 pins (light green)

• Ovals: hyperedges

• Red edges: timing graph edges

Top-Down Global Placement

• Placement blocks represent cells and layout area

– single block at the start, driven by recursive (min-cut) bipartitioning

– each pass: number of blocks doubles, size of blocks halves

– end case: several cells in a tiny region

• Intuition: many cells can operate in parallel. Partitioning finds “independent” groups of cells
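The recursive bisection above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the placement region is one-dimensional, an exhaustive search over balanced splits stands in for a real min-cut heuristic (such as Fiduccia–Mattheyses), so it only works on a handful of cells, and nets leaving a block are simply ignored (no terminal propagation). The names `cut_size`, `min_cut_bisect` and `place_top_down` are hypothetical.

```python
from itertools import combinations


def cut_size(nets, part_a):
    """Number of nets with pins on both sides of the split."""
    return sum(
        1 for net in nets
        if any(c in part_a for c in net) and any(c not in part_a for c in net)
    )


def min_cut_bisect(cells, nets):
    """Exhaustive balanced bisection (hypothetical helper; tiny inputs only)."""
    cells = sorted(cells)
    best_cut, best_side = None, None
    for side in combinations(cells, len(cells) // 2):
        cut = cut_size(nets, set(side))
        if best_cut is None or cut < best_cut:
            best_cut, best_side = cut, set(side)
    left = [c for c in cells if c in best_side]
    right = [c for c in cells if c not in best_side]
    return left, right


def place_top_down(cells, nets, region, max_cells=1):
    """Recursively bisect a block and its 1-D region until blocks are small.

    Each level of recursion doubles the number of blocks and halves their size.
    Returns {cell: (x0, x1)}, the interval each cell ends up in.
    """
    x0, x1 = region
    if len(cells) <= max_cells:
        return {c: region for c in cells}
    # Keep only the part of each net that lies inside this block.
    local = [set(net) & set(cells) for net in nets]
    local = [net for net in local if len(net) > 1]
    left, right = min_cut_bisect(cells, local)
    mid = (x0 + x1) / 2
    result = place_top_down(left, nets, (x0, mid), max_cells)
    result.update(place_top_down(right, nets, (mid, x1), max_cells))
    return result


if __name__ == "__main__":
    cells = ["a", "b", "c", "d"]
    nets = [("a", "b"), ("c", "d"), ("b", "c")]
    print(place_top_down(cells, nets, (0.0, 4.0)))
```

On this example the first cut separates {a, b} from {c, d} (only net b–c is cut), and each half is then split again into single-cell intervals.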

Analytical Global Placement

• Find a continuous placement (locations are real numbers)

• Efficient optimizations when nonconvex constraints are relaxed (e.g., cells are allowed to overlap)

• Represent multi-pin hyperedges by sets of edges

– minimize total weighted “wirelength” of all edges

Popular objectives for an edge between points P1 = (x1, y1) and P2 = (x2, y2):

• Linear (Manhattan) WL = $w_{12}\,(|x_1 - x_2| + |y_1 - y_2|)$

• Quadratic (“squared”) WL = $w_{12}\,((x_1 - x_2)^2 + (y_1 - y_2)^2)$

Constraints: fixed vertices and/or “region constraints”
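For the quadratic objective, setting the gradient to zero turns placement into a linear system in the movable coordinates. The sketch below is one-dimensional (the quadratic objective is a sum of an x term and a y term, so the two axes can be solved separately); it assumes NumPy, and the helper names `clique_edges` and `quadratic_place_1d` are hypothetical. The clique model with weight 1/(k-1) for a k-pin net is one common convention for representing a hyperedge by edges, not necessarily the one used in the talk. Every group of connected movable cells must be tied to at least one fixed vertex, otherwise the system is singular.

```python
import numpy as np


def clique_edges(pins, weight=1.0):
    """Clique model: one edge per pin pair, scaled by 1/(k-1) for a k-pin net."""
    k = len(pins)
    return [(p, q, weight / (k - 1))
            for a, p in enumerate(pins) for q in pins[a + 1:]]


def quadratic_place_1d(n, edges, fixed):
    """Minimize sum w * (x_i - x_j)**2 with some coordinates held fixed.

    n: number of vertices; edges: list of (i, j, w); fixed: {vertex: coordinate}.
    """
    lap = np.zeros((n, n))  # weighted graph Laplacian
    for i, j, w in edges:
        lap[i, i] += w
        lap[j, j] += w
        lap[i, j] -= w
        lap[j, i] -= w
    movable = [v for v in range(n) if v not in fixed]
    fixed_ids = list(fixed)
    x_fixed = np.array([fixed[v] for v in fixed_ids], dtype=float)
    # Zero gradient for the movable cells: L_mm x_m = -L_mf x_f
    a = lap[np.ix_(movable, movable)]
    b = -lap[np.ix_(movable, fixed_ids)] @ x_fixed
    x = np.zeros(n)
    x[fixed_ids] = x_fixed
    x[movable] = np.linalg.solve(a, b)
    return x
```

For a chain pad(0) – cell – cell – pad(3) with unit weights, the two movable cells land at 1 and 2, spaced evenly between the pads.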

Analytical Placement Alone is Not Enough

• Many cells overlap

• Must “spread” the placement

• IBM CPlace and XQ

– Remove overlap (computational geometry)

– CPlace combines min-cut with analytical techniques

Timing-Driven Placement

• Cycle time is set by the maximum path delay, not the total path delay (!)

– max(x, y, ...) is not differentiable

– framework: pin-based timing graph

• Analytical approaches allow cell overlaps

– cell overlaps are resolved later

• Main difficulty: cannot enumerate signal paths (their number can grow exponentially with circuit size)

• Signal paths are implicitly defined by device types

– signal path sources, sinks == I/O pins and storage elements

• Timing constraints are also implicitly defined

– “actual arrival times” (AATs) at sources

– “required arrival times” (RATs) at sinks

– source-sink path constraint: path delay ≤ RAT@sink - AAT@source

Implicit Analysis of Path Constraints

• Static Timing Analysis (STA) methodology

– forward topological traversal in timing graph → AAT@every_pin

– similar backward traversal → RAT@every_pin

– slack@pin is given by RAT@pin - AAT@pin

– negative slacks → violated timing constraints
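The forward and backward traversals can be sketched as follows. This is a minimal sketch, assuming the timing graph is a DAG given as an edge list with known edge delays, AATs given at sources and RATs at sinks; `static_timing` is a hypothetical name, not the API of any industry timing analyzer.

```python
from collections import defaultdict, deque


def static_timing(n, edges, aat_src, rat_snk):
    """Forward/backward STA on a DAG with vertices 0..n-1.

    edges: list of (u, v, delay); aat_src: {source: AAT}; rat_snk: {sink: RAT}.
    Returns (aat, rat, slack), each a list indexed by vertex.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = [0] * n
    for u, v, d in edges:
        succ[u].append((v, d))
        pred[v].append((u, d))
        indeg[v] += 1

    # Kahn's algorithm gives a topological order.
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("timing graph has a cycle")

    inf = float("inf")
    aat = [aat_src.get(v, -inf) for v in range(n)]
    for u in order:                      # forward: latest arrival wins
        for v, d in succ[u]:
            aat[v] = max(aat[v], aat[u] + d)

    rat = [rat_snk.get(v, inf) for v in range(n)]
    for v in reversed(order):            # backward: earliest requirement wins
        for u, d in pred[v]:
            rat[u] = min(rat[u], rat[v] - d)

    slack = [r - a for r, a in zip(rat, aat)]
    return aat, rat, slack
```

With edges 0→1 (2), 0→2 (1), 1→3 (3), 2→3 (1), AAT 0 at vertex 0 and RAT 4 at vertex 3, vertices 0, 1 and 3 get slack -1 (the path 0-1-3 of delay 5 violates its constraint of 4), while vertex 2 has slack 2.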

• STA-based and STA-inspired placement methods

– slacks → net weights for HPWL minimization

• top-down placement to maximize negative slack (Marek-Sadowska/Lin 86)

– note: STA requires edge delays (e.g., from placement)

– delay budgets

• zero-slack (Hauge, Nair and Yoffa 86)

• iterative min-max (Shragowitz et al. 90/92)

• limit-bumping (Frankle 92)

Motivations For Novelty

• Many promising techniques available

– net reweighting

– delay budgeting

– others

• Existing frameworks have weaknesses

– speed/scalability

– loss or ignorance of input information

• delay budgeting algorithms tend to ignore fixed locations and obstacles

– optimization of “wrong” global objectives (e.g., average wirelength)

The Dimensionless Path-Timing Objective

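• For a path $\pi$, consider each edge $e \in \pi$

• Dimensionless Path-Timing Objective (DPO)

$\Delta = \max_\pi \{t_\pi / c_\pi\} = \max_\pi \{(\sum_{e \in \pi} d_e) / c_\pi\}$

• Where

– $c_\pi$ is the path constraint

– $t_\pi$ is the path delay

– $d_e = d_{ij}(x_i, y_i, x_j, y_j)$ is the edge delay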
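Directly from the definition, the DPO can be evaluated by enumerating paths. The sketch below does exactly that, assuming per-path constraints c_π = RAT@sink - AAT@source (as on the timing-driven placement slide) and positive constraints. It is exponential in general, which is why paths cannot be enumerated in practice, so it is meant only as a reference on tiny graphs. Function names are hypothetical.

```python
def enumerate_paths(succ, sources, sinks):
    """Yield every source-to-sink path as a list of vertices."""
    def dfs(v, path):
        if v in sinks:
            yield list(path)
        for w in succ.get(v, []):
            path.append(w)
            yield from dfs(w, path)
            path.pop()

    for s in sources:
        yield from dfs(s, [s])


def dpo_bruteforce(edge_delay, aat, rat):
    """DPO = max over paths of t_pi / c_pi, with c_pi = RAT(sink) - AAT(source).

    edge_delay: {(u, v): delay}; aat: {source: AAT}; rat: {sink: RAT}.
    """
    succ = {}
    for u, v in edge_delay:
        succ.setdefault(u, []).append(v)
    dpo = 0.0
    for path in enumerate_paths(succ, list(aat), set(rat)):
        t = sum(edge_delay[e] for e in zip(path, path[1:]))  # path delay t_pi
        c = rat[path[-1]] - aat[path[0]]                     # constraint c_pi
        dpo = max(dpo, t / c)
    return dpo
```

For edges (0,1), (0,2), (1,3), (2,3) with delays 2, 1, 3, 1, AAT 0 at vertex 0 and RAT 4 at vertex 3, `dpo_bruteforce` returns 1.25: the longest path (delay 5) exceeds its constraint (4) by 25%.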

DPO: Properties

$\Delta = \max_\pi \{t_\pi / c_\pi\} = \max_\pi \{(\sum_{e \in \pi} d_e) / c_\pi\}$

• $\Delta \le 1 \iff$ all timing constraints are satisfied

• Convex when edge delay models are convex

• Minimizing DPO $\iff$ maximizing slack when all $c_\pi$ are equal

• Max slack can be reduced to min DPO

– add two new vertices: the source and the sink

– connect the source to former sources

– connect the sink to former sinks

– use constant edge delay models
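One way to make this reduction concrete is shown below; the specific constant delays are an assumption (one valid choice among several). Give the new source S an edge of delay AAT(s) to each former source s, give each former sink k an edge of delay T - RAT(k) to the new sink Z, and use the single constraint T for every S-Z path, assuming AATs are nonnegative and T is positive and at least every RAT so that all new delays are nonnegative. For an original path π from s to k, the extended path π' then satisfies:

```latex
\begin{aligned}
t_{\pi'} &= \mathrm{AAT}(s) + t_\pi + T - \mathrm{RAT}(k) = T - \mathrm{slack}_\pi,
\qquad \mathrm{slack}_\pi = \mathrm{RAT}(k) - \mathrm{AAT}(s) - t_\pi,\\
\Delta' &= \max_{\pi'} \frac{t_{\pi'}}{T} = 1 - \frac{\min_\pi \mathrm{slack}_\pi}{T}.
\end{aligned}
```

Since T is a constant, minimizing Δ' is the same as maximizing the worst slack, and Δ' ≤ 1 exactly when every slack is nonnegative.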

Criticalities: “Multiplicative Slacks”

• By analogy with slack, define criticalities

$\Delta_i = \max_{\pi \ni v} \{t_\pi / c_\pi\}$ for vertex $v = v_i$

$\Delta_{ij} = \max_{\pi \ni e} \{t_\pi / c_\pi\}$ for edge $e = e_{ij}$

• Criticalities are multiplicative versions of slack

• DPO and criticalities are quickly computable

– STA + postprocessing

• Vertex criticalities → cells on critical paths

– can be used by the proposed top-down timing-driven placement flow
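When all paths share one constraint c (which the max-slack reduction above arranges), criticalities follow from one forward and one backward longest-path pass: the longest path through edge (i, j) is the longest path ending at i, plus d_ij, plus the longest path starting at j. The sketch assumes a DAG with nonnegative edge delays given as a dict; `criticalities` is a hypothetical name, and this "postprocessing" is an illustration rather than necessarily the authors' procedure. It uses `graphlib.TopologicalSorter` (Python 3.9+).

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def criticalities(edges, c):
    """Return (DPO, edge criticalities, vertex criticalities) for a common constraint c.

    edges: {(u, v): delay} describing a DAG with nonnegative delays.
    """
    nodes = {v for e in edges for v in e}
    preds = {v: set() for v in nodes}
    succ = defaultdict(list)
    for (u, v), d in edges.items():
        preds[v].add(u)
        succ[u].append((v, d))
    order = list(TopologicalSorter(preds).static_order())

    fwd = {v: 0.0 for v in nodes}  # longest path ending at v (forward pass)
    for u in order:
        for v, d in succ[u]:
            fwd[v] = max(fwd[v], fwd[u] + d)
    bwd = {v: 0.0 for v in nodes}  # longest path starting at v (backward pass)
    for u in reversed(order):
        for v, d in succ[u]:
            bwd[u] = max(bwd[u], d + bwd[v])

    edge_crit = {(u, v): (fwd[u] + d + bwd[v]) / c for (u, v), d in edges.items()}
    vertex_crit = {v: (fwd[v] + bwd[v]) / c for v in nodes}
    return max(vertex_crit.values()), edge_crit, vertex_crit
```

On the four-edge example with delays 2, 1, 3, 1 and c = 4, edges (0,1) and (1,3) get criticality 1.25 (they lie on the worst path), edges (0,2) and (2,3) get 0.5, and the DPO is 1.25, matching path enumeration.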

Generic Minimization of DPO

• Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$

– maximal weighted edge delay

– use “reweighting iterations”

• One reweighting iteration

– assume a placement

– compute edge criticalities

– compute new edge weights $w_{ij}$

– minimize $\max_{ij} w_{ij} d_{ij}$

• (New weights: $w'_{ij} = \Delta_{ij} / d_{ij}$, so that at the current placement $\max_{ij} w'_{ij} d_{ij} = \max_{ij} \Delta_{ij} = \Delta$)
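The weight update can be illustrated numerically. This sketch assumes new weights w'_ij = Δ_ij / d_ij (criticality divided by the current delay) and positive edge delays; the delays and criticalities in the usage example are made up for illustration, and the helper names are hypothetical. At the current placement each edge's weighted delay equals its criticality, and keeping max w_ij d_ij ≤ M bounds each edge delay by M / w_ij.

```python
def reweight(delays, edge_crit):
    """New weights w'_ij = crit_ij / d_ij (delays must be positive)."""
    return {e: edge_crit[e] / d for e, d in delays.items()}


def max_weighted_delay(weights, delays):
    """The min-max objective max_ij w_ij * d_ij."""
    return max(weights[e] * d for e, d in delays.items())


def delay_budgets(weights, m):
    """Keeping max_ij w_ij d_ij <= m bounds every edge delay by m / w_ij."""
    return {e: m / w for e, w in weights.items()}


if __name__ == "__main__":
    # Illustrative delays and criticalities for a 4-edge timing graph.
    delays = {(0, 1): 2.0, (0, 2): 1.0, (1, 3): 3.0, (2, 3): 1.0}
    crit = {(0, 1): 1.25, (0, 2): 0.5, (1, 3): 1.25, (2, 3): 0.5}
    w = reweight(delays, crit)
    m = max_weighted_delay(w, delays)  # equals the largest criticality
    print(m, delay_budgets(w, m))
```

Here the critical edges (0,1) and (1,3) receive budgets equal to their current delays, while the non-critical edges receive budgets of 2.5 against current delays of 1.0, which is one way to read reweighting as delay rebudgeting.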

Properties of Reweighting

• Theorem 1. If $M = \max_{ij} w_{ij} d_{ij}$ does not increase at a particular iteration, all timing constraints must be satisfied.

• Theorem 2. A reweighting iteration either decreases DPO, or leaves it unchanged.

• Reweighting upper-bounds $d_{ij}$: $w_{ij} d_{ij} \le M$ implies $d_{ij} \le M / w_{ij}$, so reweighting can be interpreted as delay rebudgeting

• Youssef and Shragowitz used $w_{ij} = \Delta_{ij}$ in 1990/92

– [interpretation of their iterative MiniMax]

– no iterations with placement: ignore fixed pad locations
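One way to see why Theorem 2 can hold is sketched below, under assumptions not stated on the slide: the new weights are w'_e = Δ_e / d_e with positive current edge delays d_e and path delays t_π, criticalities Δ_e are as defined earlier, and the inner min-max problem is solved exactly. The current placement is a feasible point for that problem, so the new edge delays d'_e satisfy:

```latex
\begin{aligned}
&\max_e w'_e d_e = \max_e \Delta_e = \Delta
  \;\Rightarrow\; w'_e d'_e \le \Delta
  \;\Rightarrow\; d'_e \le \frac{\Delta\, d_e}{\Delta_e},\\
&\Delta_e \ge \frac{t_\pi}{c_\pi}\ \ (e \in \pi)
  \;\Rightarrow\;
  t'_\pi = \sum_{e \in \pi} d'_e \le \Delta \sum_{e \in \pi} \frac{d_e}{\Delta_e}
  \le \Delta \sum_{e \in \pi} \frac{d_e\, c_\pi}{t_\pi} = \Delta\, c_\pi .
\end{aligned}
```

So every path satisfies t'_π / c_π ≤ Δ, i.e., the new DPO is at most the old one.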

Optimization of Maximal Edge Delay

• Must consider particular edge delay models

– popular choices: linear and quadratic

• Theorem 3. 2-dim max edge delay can be reduced to the 1-dim case with twice as many vertices

• [“Inlined” implementation: no new graph]

$\max a_{km} |t_k - t_m|$

$\max b_{km} (t_k - t_m)^2$

• Theorem 4. Let $b_{km} = a_{km}^2$; then the minimizers coincide

For min-max objectives, linear and quadratic WL are numerically equivalent!
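The 1-D problem min max a_km |t_k - t_m| can be solved in several ways; the sketch below (an illustration, not necessarily the authors' algorithm) binary-searches the objective value M and checks feasibility of the difference constraints |t_k - t_m| ≤ M / a_km, together with fixed pad positions, using Bellman-Ford. It also evaluates the quadratic objective with b_km = a_km² to illustrate Theorem 4: b_km (t_k - t_m)² = (a_km |t_k - t_m|)², and squaring is monotone on nonnegative numbers, so both objectives rank placements identically. All names are hypothetical, and the weights a_km must be positive.

```python
def linear_obj(t, edges):
    """max a_km * |t_k - t_m| over edges (k, m, a_km)."""
    return max(a * abs(t[k] - t[m]) for k, m, a in edges)


def quadratic_obj(t, edges):
    """max b_km * (t_k - t_m)**2 with b_km = a_km**2, as in Theorem 4."""
    return max(a * a * (t[k] - t[m]) ** 2 for k, m, a in edges)


def feasible(n, edges, fixed, m, eps=1e-12):
    """Return coordinates with a * |t_k - t_m| <= m on every edge, or None.

    A difference constraint t_v - t_u <= w is an arc u -> v of weight w;
    vertex n is a reference pinned at 0 so that fixed pads keep their positions.
    """
    ref = n
    arcs = []
    for k, j, a in edges:
        arcs += [(k, j, m / a), (j, k, m / a)]
    for v, p in fixed.items():
        arcs += [(ref, v, p), (v, ref, -p)]
    dist = [0.0] * (n + 1)  # implicit super-source with 0-weight arcs to all
    for _ in range(n + 2):
        changed = False
        for u, v, w in arcs:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return [d - dist[ref] for d in dist[:n]]
    return None  # still relaxing: negative cycle, so m is infeasible


def min_max_place(n, edges, fixed, iters=60):
    """Binary search on M = max a_km |t_k - t_m| (linear edge delay, 1-D)."""
    start = [fixed.get(v, 0.0) for v in range(n)]  # any placement bounds M
    lo, hi, best = 0.0, linear_obj(start, edges), start
    for _ in range(iters):
        mid = (lo + hi) / 2
        t = feasible(n, edges, fixed, mid)
        if t is None:
            lo = mid
        else:
            hi, best = mid, t
    return hi, best
```

For pads fixed at 0 and 3 and a chain of edges with weights 1, 2, 1, the optimum is M = 1.2 with the movable cells at 1.2 and 1.8; the quadratic objective at that placement is exactly the square of the linear one.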

Top-Down Placement Framework

• Top-down placement done in passes

• In one pass

– split every previously existing block

• Cell-to-block assignments

– viewed as region constraints

– gradually refine, converge to cell locations

• Assume we analytically minimized signal delay → have cell locations → can compute edge delays → can perform Static Timing Analysis → know which cells lie on critical paths

• Use delay-minimizing cell locations when splitting blocks
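A very simple way to use delay-minimizing cell locations when splitting a block is to sort the block's cells by their analytically computed coordinate and give the lower half to the left child block. This median split is only an assumption for illustration; the exact way the locations guide min-cut partitioning is not spelled out on the slide. `split_block` is a hypothetical helper.

```python
def split_block(cells, x, region):
    """Split one placement block in two using analytical x-coordinates.

    cells: cell ids in the block; x: {cell: coordinate from delay minimization};
    region: (x0, x1) extent of the block. Returns [(cells, region), (cells, region)].
    """
    ordered = sorted(cells, key=lambda cell: (x[cell], cell))  # ties broken by id
    half = len(ordered) // 2
    x0, x1 = region
    mid = (x0 + x1) / 2
    return [(ordered[:half], (x0, mid)), (ordered[half:], (mid, x1))]
```

For cells a, b, c, d at x = 3.0, 0.5, 2.0, 1.0 in the region (0, 4), cells b and d go to the left half (0, 2) and c and a to the right half (2, 4).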

Empirical Validation

• We combined min-max placement with recursive min-cut bisection (Capo → CapoT)

• Implemented minimization of edge delay objectives:

– Length as delay

– Squared length as delay

– Quadratic RC delay

– MST-based Elmore delay (using

• Evaluated

– Internal evaluators (after placement): sanity check

– Industry timing analyzer

• Compared to an industry placer on 4 test cases

– Won on three test cases (by slack computed with industry STA)
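Several of the edge delay objectives listed above can be sketched as functions of pin locations. Length and squared length follow the linear and quadratic wirelength formulas from the analytical-placement slide. For the MST-based Elmore delay, the sketch assumes a Prim minimum spanning tree over Manhattan distances rooted at the driver (pins[0]), uniform per-unit wire resistance r and capacitance c, a π-model for each wire segment, and a load c_sink at every sink; these parameters and all function names are hypothetical and are not the settings used in the reported experiments. For a two-pin net the result, r·L·(c·L/2 + c_sink), is quadratic in the wire length L.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def length_delay(p, q):
    """'Length as delay': Manhattan edge length."""
    return manhattan(p, q)


def squared_length_delay(p, q):
    """'Squared length as delay': (x1 - x2)^2 + (y1 - y2)^2."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mst_parents(pins):
    """Prim's MST over Manhattan distances, rooted at pins[0] (the driver)."""
    n = len(pins)
    parent = [None] * n
    best = [float("inf")] * n
    best[0] = 0.0
    done = [False] * n
    for _ in range(n):
        u = min((v for v in range(n) if not done[v]), key=lambda v: best[v])
        done[u] = True
        for v in range(n):
            if not done[v] and manhattan(pins[u], pins[v]) < best[v]:
                best[v], parent[v] = manhattan(pins[u], pins[v]), u
    return parent


def elmore_delays(pins, r=1.0, c=1.0, c_sink=0.0):
    """Elmore delay from the driver pins[0] to every pin along the MST."""
    n = len(pins)
    parent = mst_parents(pins)
    length = [0.0] + [manhattan(pins[v], pins[parent[v]]) for v in range(1, n)]
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    order, i = [0], 0            # breadth-first: parents before children
    while i < len(order):
        order.extend(children[order[i]])
        i += 1
    load = [0.0] * n             # capacitance downstream of each pin
    for v in reversed(order):
        own = c_sink if v != 0 else 0.0
        load[v] = own + sum(c * length[w] + load[w] for w in children[v])
    delay = [0.0] * n
    for v in order[1:]:
        # pi-model: the segment's resistance sees half its own capacitance
        delay[v] = delay[parent[v]] + r * length[v] * (c * length[v] / 2 + load[v])
    return delay
```

For pins at (0,0), (2,0), (5,0) with r = c = c_sink = 1, the MST is the chain 0-1-2 and the Elmore delays are 12.0 and 19.5 at the two sinks.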

Results of Quadratic, Linear and Min-Max Placement


Conclusions and Ongoing Work

• New timing-driven placement framework

– can potentially be combined with budgeting or reweighting

– expected to be successful enough on its own

– leverages min-cut placement

– relies on a novel analytical delay minimization

• Dimensionless Path-timing Objective (DPO)

– novel global timing objective; generalizes slack optimization

• New minimization algorithms

– reweighting iteration: reduction to simpler MAX-based objective

– MAX-based objective can be minimized very quickly

• Ongoing work in the context of timing-driven flows

Future Work

• Observation (how the proposed method works)

– a classic placement approach is split into stages

– a new timing optimization is performed between those stages

– most critical wires/gates are found first (traditionally: placement is found first)

• Try other types of optimizations during placement

– routing of timing-critical nets

• better delay estimation

• early cross-talk detection?

– sizing of timing-critical drivers

– buffer insertion for timing-critical nets

– early detection of dangerous cross-talk

Faster and cheaper ICs