ASP-DAC'01 - Patrick Groeneveld 1
Part III
Gain-based synthesis
enabler for ‘correct-by-construction’ design
ASP-DAC’01 Tutorial
Patrick Groeneveld ([email protected])
Magma Design Automation, Cupertino, CA
Summary
• Iterative flow vs. stepwise refinement
• Derivation of a simple delay model
• Gain-based delay optimization
• Building a tool flow around this model
• Standard cell library issues
• Getting timing during routing
• Recommendations
Preliminaries
• Timing closure: obtaining a feasible layout of a circuit that meets the given timing specification.
• Objective: obtain closure as quickly and effortlessly as possible.
• Assumptions:
  ◦ ASIC design style.
  ◦ Standard cell abstraction.
  ◦ Static CMOS.
• Neglect other design issues.
Interconnect parasitics (C and R)
• Speed is entirely determined by parasitics
• Parasitics are tiny
• Parasitics depend on the exact layout
• Therefore they are hard or impossible to estimate, especially before placement.
Timing Uncertainty
Gate-to-gate delay depends on:
• Wire length (unknown during synthesis)
• The layer of the wire (determined during routing)
• The configuration of the neighboring wires: distance, near/far (unknown before detailed routing)
• Timing window and slope of the neighboring wires.
Meeting timing gets harder
[Figure: flip-flops connected by paths with delay d; 5 ns max]
Timing is a result of the placement
• The bad news: the worst timing sets the clock speed!
[Figure labels: slack, Cdream, Creal]
Prediction vs reality
[Figure: number of nets versus (real delay - predicted delay), from -100% (fastest/best) to +100% (slowest/worst). The average is the wireload model, i.e. what you designed for; on the slow side the circuit does not work.]
The end of the wire load model
• Model is used in conventional synthesis tools
• It guesses load based on the number of pins of the net
• The average is correct but...
You must iterate!
Today’s Conventional Flow
[Flow diagram boxes: RTL, Logic Synthesis, Placement, Extraction, Routing, Timing Analysis, GDSII; annotations: ‘Met timing? NO’, ‘Multiple iterations’ back to Logic Synthesis]
• Synthesis does not accurately model interconnect
• Cell sizes fixed before placement.
• Place & route unable to meet timing goal
The trial and error iteration
[Figure: loop between logic synthesis and place & route]
Methodology Problems
• To avoid endless iterations, the design must be ‘on the safe side’
• Iterations are very slow and may not converge
• You’re never sure if you’ll make it
• Only a painful trial and error process reports design feasibility.
Ways to attack timing closure
• Iterate through SPEF or internally
• Post-placement optimization (ECO)
• Partition the design into smaller pieces
  ◦ Variation in wire length will decrease
  ◦ Better timing closure on each block if #gates < 50,000
• Gain-based synthesis
Hierarchy: a solution?
• Make problems smaller
• Structure makes the problem more manageable
• Solve sub-problems independently
• Enables efficient re-use
• Enables consistent verification
Physical hierarchy and timing closure
• Wires need to slalom around blocks or traverse through or over blocks
• How to set pin locations?
• Where to put the buffers?
• Automatic floor planning problem is unsolved
• Large hidden inefficiency
Physical hierarchy is a necessary evil
[Figure: three floorplans with macros: 2,000,000 standard cells flat? 4 blocks of 500,000 standard cells? 20 x approx. 100,000 standard cells]
• If you can do it, do it as flat as possible!
• Also do datapath flat
Conventional layout synthesis
[Figure labels: slack, Cdream, Creal]
Gain-based synthesis:
[Figure labels: slack, Cdream, Creal]
Focus for timing closure
• Combine logical and physical worlds.
• Crisp: focus on the main effect, skip irrelevant details
• Enable blazingly fast optimization
• Compact: memory efficient for tomorrow’s 50M gate chip
Good practices, bad practices
Good practices:
• Use a simple model, and adapt reality to it.
• At each step, freeze a single constraint, postpone decisions on others.
• Allow sufficient freedom in future steps to fulfill all remaining constraints.
• Bail out early if there’s no use continuing
Bad practices:
• Fix multiple objectives at once.
• Iterate.
• Indulge in ‘accurate’ models
• Attempt to be optimal
Is ‘Optimal’ optimal? An example
Contacts in layout have parasitic resistance and affect reliability
[Figure: region with 8 contacts]
Optimal with contact minimization: 0 contacts
But... there are still 8 contacts! (They were just pushed into the neighboring regions.)
Compromises
• Flexible-die design is better:
  ◦ Guarantee routing completion
  ◦ Until the last moment we can trade off delay for area.
• Fixed-die design instead
  ◦ Need to guess initial utilization.
• This could result in an iteration.
Simple delay model of a gate
• Model transistor by a resistor and a switch
• We can assume that the rise delay and the fall delay are similar.
• Therefore pull-up Rui and pull-down Rdi become Ri
• The transistor impedance depends on the transistor size (W/L)
[Figure: gate between in and out, modeled by Cin at the input, a switched Rgate (Rui, Rdi), and Cgate and Cload at the output]
• Cin: input capacitance of the gate
• Cgate: the internal parasitic capacitance (mostly diffusion)
• Cload: the external load that the gate is driving
• Rgate: effective output impedance
Gate delay and load
$$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate}$$
[Figure: delay versus Cload, a straight line whose offset at zero load is the parasitic delay]
Delay dependency on load is often given as a table.
Let’s double the gate size
[Figure: two copies of the unit gate (Rgate,0, Cin,0, Cgate,0) in parallel between in and out, driving Cload]
$$C_{in} = 2 \cdot C_{in,0}, \quad R_{gate} = \frac{1}{2} R_{gate,0}, \quad C_{gate} = 2 \cdot C_{gate,0}$$
$$d_{abs} = \frac{R_{gate,0}}{2} C_{load} + \frac{R_{gate,0}}{2} \cdot 2 C_{gate,0} \;\Leftrightarrow\; d_{abs} = \frac{R_{gate,0}}{2} C_{load} + R_{gate,0} C_{gate,0}$$
Gate delay and size
• Assume a gate sizing factor α (= scaling relative to the smallest size)
$$C_{in} = \alpha \, C_{in,0}, \quad R_{gate} = \frac{R_{gate,0}}{\alpha}, \quad C_{gate} = \alpha \, C_{gate,0}$$
$$d_{abs} = \frac{R_{gate,0}}{\alpha} C_{load} + R_{gate,0} C_{gate,0}$$
So keeping Cload constant results in:
[Figure: delay versus gate size at constant Cload; the parasitic delay is marked]
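As a rough numerical illustration of the scaling relations for a gate sized by α, the following Python sketch evaluates d_abs for a few sizing factors at a fixed load. The values of `R0`, `C_GATE0` and `C_LOAD` are made-up numbers chosen only for illustration; they do not come from the slides.

```python
# Hypothetical unit-size gate (alpha = 1); all values are illustrative only.
R0 = 10e3         # R_gate,0 in ohm
C_GATE0 = 2e-15   # C_gate,0 in farad
C_LOAD = 40e-15   # fixed external load in farad

def abs_delay(alpha, c_load=C_LOAD):
    """d_abs = R_gate * C_load + R_gate * C_gate for a gate scaled by alpha."""
    r_gate = R0 / alpha          # output resistance shrinks with size
    c_gate = alpha * C_GATE0     # internal parasitic capacitance grows with size
    return r_gate * c_load + r_gate * c_gate

for alpha in (1, 2, 4, 8):
    print(f"alpha={alpha}: d_abs = {abs_delay(alpha) * 1e12:.1f} ps")
```

The load-dependent term halves each time α doubles, while the parasitic term R_gate,0 * C_gate,0 does not change, so upsizing gives diminishing returns.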
Delay and gain
• The gain is the ratio of the load capacitance to the input capacitance:
$$gain = h = \frac{C_{load}}{C_{in}}$$
• Now we can rewrite the previous equations in terms of gain:
$$C_{in} = \alpha \, C_{in,0} \;\Leftrightarrow\; \alpha = \frac{C_{in}}{C_{in,0}}$$
$$R_{gate} = \frac{R_{gate,0}}{\alpha} = \frac{R_{gate,0} \, C_{in,0}}{C_{in}}$$
$$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate} \;\Rightarrow\; d_{abs} = R_{gate,0} C_{in,0} \frac{C_{load}}{C_{in}} + R_{gate,0} C_{gate,0}$$
$$\Leftrightarrow\; d = g \cdot \frac{C_{load}}{C_{in}} + p = g \cdot h + p$$
Here g stands for the factor R_gate,0 * C_in,0 and p for R_gate,0 * C_gate,0.
[Figure: gate model with Rgate, Cin, Cgate and Cload]
Making delay independent of load
• If the gain is constant, delay is constant over a range!!
[Figure: delay versus Cload, with gate size (= Cin) scaled along with Cload so that the delay stays flat]
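A minimal sketch of this idea, assuming the normalized delay d = g*h + p with h = Cload / Cin: if the gate is resized so that Cin tracks Cload, the delay does not move. The gain `H` and the inverter-like values `G` and `P` are assumptions picked for illustration.

```python
def delay(g, h, p):
    """Normalized gate delay d = g*h + p."""
    return g * h + p

def size_for_gain(c_load, h):
    """Input capacitance (i.e. gate size) that keeps the gain h = C_load / C_in."""
    return c_load / h

H = 3.0            # chosen gain (hypothetical)
G, P = 1.0, 1.0    # inverter-like logical effort and parasitic delay (assumed)

for c_load in (10.0, 20.0, 40.0, 80.0):
    c_in = size_for_gain(c_load, H)
    print(f"C_load={c_load}: C_in={c_in:.2f}, d={delay(G, c_load / c_in, P):.2f}")
```

Every row reports the same delay; only the size changes with the load.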
Fixed Timing Methodology
[Figure: 3-D plot with axes Delay, Load (Cload) and Size (Cin), showing the fixed timing plane set at timing sign-off]
Fixed Timing in a nutshell
• Goal:
  ◦ Correct by construction (eliminate iterations)
  ◦ Emphasis on timing, not on size.
• Map to size-independent supercells
• Pick optimized delay up-front = pick a gain
  ◦ If no feasible gain can be found: change your RTL
• Fix this delay throughout placement and routing
• Keep delay constant primarily by cell sizing.
“Fast circuit design on a napkin”
Ivan Sutherland (1991):
Delay = (g * h) + p
• Delay: delay of the gate + its load
• g: logical effort, depends on function of gate
• h: electrical effort, proportional to output load, Cload / Cin
• p: fixed part, parasitic delay
For details, see the book ‘Logical Effort’ by Sutherland, Sproull, Harris, Morgan Kaufmann publishers, ISBN 1-55860-557-6
Logical effort: g
• To keep the same output drive strength, the 2 n-transistors in series must double their size.
• As a result, the input capacitance of the nand is larger.
• For the same output drive strength, an inverter needs less input capacitance: the inverter has a higher gain.
• More complex gates have less gain
Inverter: Cin = 1
Nand 2: Cin = 4/3
Logical effort: g
• Assuming that in static CMOS gates the mobility of the p-transistor is half of the n-mobility:

Number of inputs   1     2     3     n
----------------   ---   ---   ---   --------
Inverter           1     -     -     -
Nand               -     4/3   5/3   (n+2)/3
Nor                -     5/3   7/3   (2n+1)/3

[Figure: 3-input nor]
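The general-n column of the logical effort table can be written as two small functions. This is only a sketch of those closed forms under the same mobility assumption (p-mobility half of n-mobility); the function names are made up for the example.

```python
from fractions import Fraction

def g_nand(n):
    """Logical effort of an n-input NAND: (n + 2) / 3."""
    return Fraction(n + 2, 3)

def g_nor(n):
    """Logical effort of an n-input NOR: (2n + 1) / 3."""
    return Fraction(2 * n + 1, 3)

for n in (2, 3, 4):
    print(f"{n} inputs: nand g = {g_nand(n)}, nor g = {g_nor(n)}")
```

Both formulas reduce to 1 for n = 1, the inverter, and the NOR grows faster than the NAND because its series transistors are the slower p-type devices.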
p: The parasitic delay
• Independent of size and load
• Dependent on process and logic function
• Can be ignored during optimization
$$p = R_{gate,0} C_{gate,0}$$
[Figure: inverter and 2-input nand with the same input cap Cin; then p_2nand = 2 * p_inverter]

Gate            Relative parasitic delay
-------------   ------------------------
Inverter        1
n-input nand    n
n-input nor     n
Putting it together
$$d = g \cdot h + p$$
[Figure: d (normalized to inverter, 1 to 7) versus h (electrical effort = gain = Cload / Cin, 1 to 3). Each gate is a straight line with offset p (parasitic delay) and slope g (effort delay g*h):
  inverter: g = 1, p = 1
  2-input nand: g = 4/3, p = 2
  2-input nor: g = 5/3, p = 2
  4-input nor: g = 9/3, p = 4]
Optimizing speed
• Goal: drive load as fast as possible
  ◦ What is the optimal number of stages n?
  ◦ What is the size ratio of the gates?
[Figure: chain of gates from Cin to Cload]
Tune for maximum speed
• Mead and Conway (1980), ignoring parasitic delay:
$$H = \frac{C_{load}}{C_{in}} = \frac{C_{load,n}}{C_{in,1}} = \text{total gain}$$
$$n = \ln(H) = \text{number of stages}$$
$$h_i = \sqrt[n]{H} = \frac{C_{load,i}}{C_{in,i}} = \frac{C_{in,i+1}}{C_{in,i}} = \frac{size(i+1)}{size(i)} = e \approx 2.72 = \text{stage gain}$$
[Figure: chain of n stages with input capacitances Cin,1, Cin,2, Cin,3, ..., Cin,n driving Cload]
• With the parasitic delay p, the optimum ratio is 3.59
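The two results on this slide can be checked numerically. The sketch below is an illustration, not the tutorial's own code: `mead_conway` applies n = ln(H) with rounding to a whole number of stages, and `best_stage_effort` solves p_inv + ρ(1 − ln ρ) = 0 by bisection. That condition is the standard result from the ‘Logical Effort’ book for a chain of identical stages; it is not derived on the slides, and the function names and the example load are assumptions.

```python
import math

def mead_conway(c_load, c_in):
    """Stage count and per-stage gain when parasitic delay is ignored."""
    total_gain = c_load / c_in
    n = max(1, round(math.log(total_gain)))   # n = ln(H), as a whole number
    return n, total_gain ** (1.0 / n)         # stage gain, close to e

def best_stage_effort(p_inv=1.0, lo=math.e, hi=10.0):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""
    def f(rho):
        return p_inv + rho * (1.0 - math.log(rho))
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, h = mead_conway(c_load=1000.0, c_in=1.0)
print(f"{n} stages, stage gain {h:.2f}")
print(f"optimum ratio with p = 1: {best_stage_effort(1.0):.2f}")
```

With p_inv = 0 the same solver returns e, which recovers the Mead and Conway value; with p_inv = 1 it lands on the 3.59 quoted above.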
Maximum speed….
Tune a path for maximum speed
• Maximum speed is obtained if effort delay f = (g*h) is the same for each stage.
• The optimal effort delay is f = 3.59
• The more complex the gate, the more capacitance will be propagated backwards.
[Figure: path of three gates a, b, c; gate c drives the final load]
$$\frac{5}{4} \cdot \frac{C_{load,a}}{C_{in,a}} = 3.59, \quad \frac{6}{3} \cdot \frac{C_{load,b}}{C_{in,b}} = 3.59, \quad \frac{7}{3} \cdot \frac{C_{load,c}}{C_{in,c}} = 3.59$$
$$C_{load,c} = 20$$
$$C_{in,c} = \frac{g_c \cdot C_{load,c}}{f} = \frac{\frac{7}{3} \cdot 20}{3.59} = 13$$
$$C_{in,b} = \frac{\frac{6}{3} \cdot 13}{3.59} = 7.2$$
$$C_{in,a} = \frac{\frac{5}{4} \cdot 7.2}{3.59} = 2.5$$
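The backward propagation of capacitance in this example can be sketched in a few lines of Python. The logical efforts 5/4, 6/3, 7/3 and the final load of 20 are taken from the example; the function name `input_caps` is made up for the sketch.

```python
F = 3.59  # target effort delay per stage, f = g * h

def input_caps(logical_efforts, c_load, f=F):
    """Walk from the last gate to the first; return C_in per gate in path order."""
    caps = []
    for g in reversed(logical_efforts):
        c_in = g * c_load / f    # from g * C_load / C_in = f
        caps.append(c_in)
        c_load = c_in            # this gate is the load of the previous one
    return caps[::-1]

c_a, c_b, c_c = input_caps([5 / 4, 6 / 3, 7 / 3], c_load=20.0)
print(round(c_a, 1), round(c_b, 1), round(c_c, 1))   # prints: 2.5 7.2 13.0
```

Each stage divides the capacitance it sees by f / g, so a gate with a large logical effort passes more of its load back towards the path input.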
Choosing the right number of stages (logical depth)
• During layout: adding inverters for long-wire delay minimization.
• The optimum depth depends on the path effort and process parameters.
• Not very critical: being 50% off results in less than 10% delay penalty
• Logic depth is determined by synthesis
• Pre-layout: adding buffers to high-fanout nets generally improves speed due to the high inverter gain.
Assigning delays
• Timing constraints determine the delay budget:
  ◦ e.g. d_abcd < 2.0ns, d_ed < 2.0ns, d_fcd < 2.0ns
• Spread delay budgets evenly over all paths
  ◦ If paths collide, take the smallest delay budget
  ◦ Relax others
• Translate delay budgets into gain.
[Figure: gates a to f between flip-flops (ff), forming paths abcd, ed and fcd. Initial budgets: 0.5 for each gate on abcd, 1.0 on ed, 0.66 on fcd. After resolving collisions: f is relaxed from 0.66 -> 1.0 and e from 1.0 -> 1.5]
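The budgeting rules on this slide can be sketched as a two-step procedure. This is a simplified heuristic written for illustration, not the algorithm of any particular tool: it only relaxes gates that lie on a single path, and the function name `assign_budgets` is made up. The paths and the 2.0 ns limit come from the slide's example.

```python
def assign_budgets(paths, limit):
    """paths: name -> list of gates; limit: max path delay. Returns gate -> delay."""
    # Step 1: spread each path's budget evenly; where paths collide keep the smallest.
    budget = {}
    for gates in paths.values():
        share = limit / len(gates)
        for gate in gates:
            budget[gate] = min(share, budget.get(gate, share))

    # Step 2: relax gates that sit on a single path, using up that path's leftover.
    use = {}
    for gates in paths.values():
        for gate in gates:
            use[gate] = use.get(gate, 0) + 1
    for gates in paths.values():
        own = [g for g in gates if use[g] == 1]
        shared = sum(budget[g] for g in gates if use[g] > 1)
        for gate in own:
            budget[gate] = (limit - shared) / len(own)
    return budget

paths = {"abcd": list("abcd"), "ed": list("ed"), "fcd": list("fcd")}
print(assign_budgets(paths, limit=2.0))
```

For the example this gives 0.5 ns for a, b, c and d, 1.0 ns for f and 1.5 ns for e, matching the relaxed values in the figure.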
Pre-layout sign-off
[Figure: path between flip-flops (FF) with four stages of 0.5 ns each]
• If there is no feasible gain assignment, the sizes literally ‘explode’.
Keeping delay constant during layout
• The gain ratio (= Cload/Cin) is maintained during placement
• Sizes change during placement.
• As a result, delay is (almost) constant
• Sizes cannot ‘explode’
Cload/Cin = fixed
Sizing driven placement
• Gate sizes change gradually during placement to keep delay constant.
• Placer must be able to cope with the netlist changes due to buffering, cloning, restructuring, clock insertion, etc.
• ... while producing a routable result.
Automatic Congestion Handling
• During placement
[Figure: routing congestion and utilization maps]
What happened ... at the logical-physical boundary?
Gain-based flow:
• Delay fixed
• Cell area unknown
• Sum of areas determines chip size. (Additive)
• No iterations required
• Each gate has exactly the right drive strength:
  ◦ Not too little (fanout violation, timing fails)
  ◦ Not too much (waste of area)
Conventional flow:
• Cell area fixed
• Delay is a gamble
• Worst case delay determines timing (max)
• Iterate to make ends meet.
• After timing finally closes, many gates will be too big:
  ◦ waste of area
  ◦ waste of power
Conventional way: Worst case delay sets timing
• 99% of paths meet timing, 1% do not
• Cell sizes do not change during place and route
• Design conservatively to avoid excessive iterations. The WLM is also tuned conservatively.
• This oversizes all cells
  ◦ because cells on non-critical paths are also sized up.
• Chip significantly bigger than necessary (10-30%)
What about in-place optimization?
• Do a post-placement ECO,
• Change only the cells on the critical paths.
• Conservatism is still required because of limited ECO capacity.
• All non-critical cells are still oversized
• Chip still bigger than necessary.
Gain based synthesis: area is additive
• Timing is fixed,
• As a result, cell sizes change.
• But large cells and small cells cancel out: some get bigger, others smaller
• All cells have exactly the right drive strength: many paths are almost critical.
• Chip size remains small (10-30% smaller than conventional way)
Logic (wireload) Synthesis
• For a simple function ((A’ + B) * C)’
• Various logic structures are possible with one size
• Conventional logic synthesis tool attempts to optimize the delay by:
  ◦ Logic restructuring
  ◦ Picking the proper sizes
• This is driven by a vague idea of the wire load
Many sizing combinations
Heuristic trade-offs: significantly slower than equation-based constant delay
Gain-based synthesis: supercells
• Need a single ‘super’ cell representing all sizes of a logic function.
[Figure: ‘Super!’ cell]
• Contains:
  ◦ g, h, p
  ◦ size range
Gain-based mapping
• In timing-critical parts, the mapper picks super cells that have low parasitic delay and highest maximum drive strength.
• In non-critical parts, ‘weaker’ super cells can be used.
  ◦ Pick cells that have potentially the smallest size.
• Insert buffers on high-fanout nets
Putting it together
• Map onto generic ‘super cells’ with flexible area.
• Optimize gains for all super cells such that maximum speed is achieved. This fixes all delays in the circuit!
• Give up if the (optimally conditioned) circuit does not meet the given timing criteria.
• Perform ‘sizing driven placement’: keep delay constant by adapting cell size to parasitic capacitance of the wires. Parasitic wire delay is based on coarse routing of the wires.
• Fix remaining timing problems through buffering, cloning, restructuring.
• Update floor plan if the timing is still not met.
• For each supercell, pick the one standard cell that matches the required drive strength.
• Legalize the placement (a.k.a. detailed placement)
• Perform final routing under delay constraints.
That’s very nice in theory, but….
• Library only has a few drive strengths: is there a discretization error?
• How to account for differences in fall and rise time?
• Do I need a special library?
• What if a very large drive strength is needed?
• When are buffers inserted?
• Isn’t the model too simplistic?
• What about the parasitic wire resistance?
Library Analysis
/cmos18/NAND2 (A -> Z) inverting

model        hide  typ load  gain   input cap  area  rise delay  fall delay  slew  max slew
-----------  ----  --------  -----  ---------  ----  ----------  ----------  ----  --------
NAND2d1            25        2.51   10         1     161         102         66    2000
NAND2d2            54        2.71   20         1     153         100         67    2000
NAND2d3            110       2.69   41         2     153         100         67    2000
NAND2d4            186       2.66   70         5     153         99          67    2000
NAND2d5      D     370       18.52  20         9     254         293         57    2000
-----------  ----  --------  -----  ---------  ----  ----------  ----------  ----  --------
NAND2_SUPER        370       2.74                    148         108         67    2000

• Gain is averaged
• Toss out ‘weird cells’
• Typical load is the load the gate drives when optimized for maximum speed: g*h = 3.59
[Figure: Cload versus Cin for cells d1 to d5]
Fixing cell sizes & keeping timing
[Figure: a SuperCell is mapped to a Standard Cell of size 1x, 2x or 4x; plot of Cload versus Cin showing the permissible range of each size and the load violation region]
The discretization error...
[Figure labels: Gain=0.3, Gain=0.9, Gain=0.7, Gain=0.9; sizes 1x, 2x, 2x, 2.2x, 2.9x, 4x, 1.3x, 1.2x]
.. is generally not a big problem
• Delay versus size curve is flat, because the size is optimized for maximum speed
• Rounding error is absorbed by appropriate up- and downsizing of surrounding cells.
• On critical paths, buffer insertion and logic restructuring minimize the effect.
[Figure: path delay versus size (1x, 2x, 4x); optimum delay at 3.2x, but that size is not available]
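To make the rounding step concrete, here is a small sketch of mapping a continuous ‘ideal’ size onto a discrete library. Choosing the closest size on a logarithmic scale and flagging anything above the largest cell as a load violation are both assumptions made for this example, as are the 1x/2x/4x library and the name `pick_cell`; an actual tool may round differently.

```python
import math

AVAILABLE = (1, 2, 4)   # drive strengths of a hypothetical library

def pick_cell(ideal_size, available=AVAILABLE):
    """Map a continuous size to a library cell; flag sizes beyond the largest cell."""
    # Closest size in ratio terms: 2.9x is nearer to 4x than to 2x on a log scale.
    chosen = min(available, key=lambda s: abs(math.log(s / ideal_size)))
    load_violation = ideal_size > max(available)
    return chosen, load_violation

for ideal in (1.3, 2.2, 2.9, 9.0):
    print(ideal, pick_cell(ideal))
```

A flagged size cannot be fixed by rounding and would instead have to be handled by buffering, cloning or restructuring.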
Load violations
• Maximum drive strength in the library might be too small
• Drive information is stored in super cell, and managed pre-placement.
• Buffering, cloning and restructuring are used to maintain delay during placement
[Figure: Cload versus Cin for the 1x, 2x and 4x cells, showing the permissible range and the load violation region]
Buffered wire: smallest delay
• Delay per stage (Elmore):
$$d = \frac{R_0}{w}\left(C_w L + C_0 w\right) + \frac{R_w C_w L^2}{2} + R_w L C_0 w$$
• Optimum buffer distance:
$$L_{opt} = \sqrt{\frac{2 \, \tau_{buffer} \, (1 + p)}{R_w C_w}}$$
• Optimum buffer size:
$$w_{opt} = \sqrt{\frac{\tau_{buffer} \, C_w}{R_w C_0^2}}$$
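The closed forms for the optimum buffer distance and size can be checked numerically. The sketch below makes two assumptions that the slide does not spell out: the buffer time constant is taken as R_0 * C_0, and the buffer's parasitic delay enters the stage delay as an extra term p * R_0 * C_0. All parameter values are hypothetical (resistance in ohm, capacitance in farad, wire values per micron).

```python
import math

def stage_delay(L, w, R0, C0, Rw, Cw, p=0.0):
    """Elmore delay of a size-w buffer driving a wire of length L into an identical buffer."""
    driver = (R0 / w) * (Cw * L + w * C0)
    wire = Rw * Cw * L**2 / 2 + Rw * L * w * C0
    return driver + wire + p * R0 * C0   # parasitic term: assumed form

def optimum(R0, C0, Rw, Cw, p=0.0):
    """Closed-form optimum buffer distance and size, with tau = R0 * C0 (assumed)."""
    tau = R0 * C0
    L_opt = math.sqrt(2 * tau * (1 + p) / (Rw * Cw))
    w_opt = math.sqrt(tau * Cw / (Rw * C0**2))
    return L_opt, w_opt

# Hypothetical process values, not taken from the slides.
PARAMS = dict(R0=8e3, C0=2.5e-15, Rw=0.1, Cw=0.2e-15, p=1.0)
L_opt, w_opt = optimum(**PARAMS)
print(f"L_opt = {L_opt:.0f} um, w_opt = {w_opt:.0f}x")
```

Dividing `stage_delay` by L gives the delay per micron; both closed forms follow from setting its derivatives with respect to L and w to zero.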
Buffering in a typical 0.25 µm process
• Optimum buffer distance tends to be around 2000 µm.
• This works out to an area of 4 mm², or about 10-20K cells.
• But w_opt is much larger than what most libraries have available:
[Figure: delay per micron versus W (buffer size) at 25x, 50x, 75x, 100x; optimal at 80x; the range of drive strengths available in the library is marked]
Library constrains performance
• Limited drive strength in standard cell libraries results in significantly longer delays at the chip level.
• This is true for ANY methodology, and not exclusive to gain-based synthesis.
• Reason for limited drive strength:
  ◦ Concerns about signal electromigration.
  ◦ Router doesn’t handle wide wires.
  ◦ Huge cells (20x a ‘normal’ cell) frustrate the placer.
  ◦ Folklore.
Parallel cells
• A simple way to test whether a better library would improve results:
• Issues:
  ◦ testability
  ◦ signal-EM
  ◦ congestion: detailed placer
Electromigration: wires wear out
Electrons move atoms
[Figure labels: contact (tungsten), ‘reservoir’, ‘end-of-line’ overhang, ‘cavities’ in wire]

Dealing with Electromigration
- A statistical effect, resulting in a gradual increase of the wire resistance, followed by failure.
- The time at which 50% of the wires have failed is given by Black's equation:
  $t_f = A \cdot \dfrac{1}{J^2} \cdot e^{E_a/(kT)}$
- Depends on the current density J
  - Wider wires would help
- Exponential dependency on temperature makes it hard to predict.
- Wires self-heat due to resistance
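To get a feel for the two dependencies, the sketch below evaluates the median-lifetime formula in the usual form of Black's equation, where the lifetime falls as temperature rises. The activation energy of 0.7 eV and the temperatures are assumed values for illustration only; the prefactor A cancels in the ratios.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def median_time_to_failure(J, T_kelvin, A=1.0, Ea_eV=0.7):
    """t_f = A / J^2 * exp(Ea / (k T)); Ea_eV = 0.7 is an assumed value."""
    return A / J ** 2 * math.exp(Ea_eV / (K_BOLTZMANN_EV * T_kelvin))

def lifetime_ratio(J_old, T_old, J_new, T_new):
    """Lifetime in the new condition relative to the old one (A cancels out)."""
    return median_time_to_failure(J_new, T_new) / median_time_to_failure(J_old, T_old)

# Doubling the current density at constant temperature: lifetime drops to 1/4.
ratio_current = lifetime_ratio(J_old=1.0, T_old=358.0, J_new=2.0, T_new=358.0)

# Heating the wire by 20 K at constant current density (hypothetical temperatures).
ratio_heat = lifetime_ratio(J_old=1.0, T_old=358.0, J_new=1.0, T_new=378.0)

print(f"2x current density -> lifetime x {ratio_current:.2f}")
print(f"+20 K              -> lifetime x {ratio_heat:.2f}")
```

Because self-heating raises T as J rises, the two effects compound, which is one reason wider wires help twice over.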

What makes a good DSM library?
- Many drive strengths per function
  - No functions with few drive strengths
  - No holes or missing drive strengths
  - Also have drive strengths for flip-flops and latches
- High drive strengths
- Linear scaling of load and area
  - avoid multi-stage cells
- Avoid multi-output cells
- Avoid single-stage gates with more than 4 inputs
- Not many different functions are needed.

Buffering & wire sizing
- To tame the quadratic nature of wire delay
- To avoid load violations
- A static timer is run concurrently during (incremental) placement
- Wire delay is estimated based on the most accurate information available at the time:
  - Elmore I (based on Steiner tree)
  - Elmore II (based on global routing)
  - 2nd order AWE (post routing)
- Buffers are inserted where needed
  - After buffer insertion the gains need to be re-distributed
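The quadratic growth of wire delay, and the way buffering tames it, can be illustrated with a small Elmore delay calculation. This is a minimal sketch for a single two-pin wire modeled as an RC ladder, not the Steiner-tree or global-routing based estimators named above; all parameter values are hypothetical.

```python
def elmore_wire_delay(length_um, r_per_um, c_per_um, r_driver, c_load, segments=100):
    """Elmore delay of a driver, a uniform RC wire (N-segment ladder) and a load.

    Each capacitor contributes (resistance between source and capacitor) * C.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    upstream_r = r_driver
    delay = 0.0
    for _ in range(segments):
        upstream_r += r_seg
        delay += upstream_r * c_seg
    delay += upstream_r * c_load          # load sees the driver plus the whole wire
    return delay

def buffered_delay(length_um, n_stages, **kwargs):
    """Same wire cut into n_stages equal pieces, each driven by an identical buffer."""
    return n_stages * elmore_wire_delay(length_um / n_stages, **kwargs)

# Hypothetical values: 80 ohm driver, 200 fF load, 0.08 ohm/um, 0.2 fF/um
wire = dict(r_per_um=0.08, c_per_um=0.2e-15, r_driver=80.0, c_load=200e-15)

unbuffered = elmore_wire_delay(8000.0, **wire)
buffered = buffered_delay(8000.0, 4, **wire)
print(f"8000 um unbuffered: {unbuffered * 1e12:.0f} ps")
print(f"8000 um, 4 stages : {buffered * 1e12:.0f} ps")
```

The buffer's own parasitic delay is left out here, so the sketch somewhat overstates the benefit of adding stages.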

Wire delay optimization
- Delay after optimization:
  - buffering,
  - cell sizing,
  - wire sizing.
- 0.18 micron technology
- Δ wire length 64x results in Δ delay < 3x

[Figure: log-log plot of delay (ps, 10 to 1000) versus wire length (µm, 100 to 10000). Data courtesy of Prof. Jason Cong, UCLA.]

Logic cloning and restructuring
- To keep timing fixed by adapting the reality to the model
- Restructuring and rewiring of the critical path improves timing.

Gain based synthesis flow
- Timing analysis tool runs concurrently during all steps
- Strong infrastructure is necessary
- Backend (routing) must make this come true

[Flow diagram: RTL -> library analysis, build supercells -> logic mapping, gain assignment -> OK? -> sizing-driven placement, buffering, cloning, restructuring, clock insertion -> OK? -> scan insertion, detailed placement, track routing, detailed routing -> GDSII. Annotations: "Delays fixed, sizes floating" and "Delays fixed, sizes fixed".]

Objectives
- Implement a wire pattern that is:
  - LVS-correct: no shorts nor unconnects
  - DRC-correct: includes electromigration and antenna rules
  - Timing-correct: adapt model to reality
  - Able to deal with special requirements for power and clock routing

Correct by Construction or Construct by Correction??
- Traditional tools are primarily focused on completion:
  - Correct by construction for LVS and DRC, but not for timing!
  - Timing violations are addressed by rip-up-and-reroute, i.e. 'construct by correction'.
- Modern EDA flows should target 'correct by construction' for timing:
  - careful planning of the timing budget, and
  - variable spacing and width detailed routing.

Global routing
Finds a coarse path and layer assignment for each net, such that:
- wire density is spread evenly

[Figure: routing area divided into buckets.]

Interconnect speed
[Figure: top view of parallel wires over a ground plane, with wire length l, width w, height h, oxide thickness d_ox and lateral spacing d_lat.]

Consider the middle wire:
$C_{wire} = C_0 \left( \dfrac{l\,w}{d_{ox}} + \dfrac{2\,l\,h}{d_{lat}} \right) = C_{wire,gnd} + C_{wire,lat}$ (ground term plus lateral term)
$R_{wire} = R_0 \dfrac{l}{w\,h}$
$\tau_{wire} = R_{wire} \cdot C_{wire}$, which is quadratic in the length l

Applying Moore's law
- Double the density by a lateral shrink:
  - l, w and d_lat shrink by a factor sqrt(2)

[Figure: the same wires over the ground plane, with d_lat, h, d_ox and w marked.]

$C_{wire} = C_0 \left( \dfrac{l\,w}{d_{ox}} + \dfrac{2\,l\,h}{d_{lat}} \right)$: the ground term halves, the lateral term stays constant
$R_{wire} = R_0 \dfrac{l}{w\,h}$ = constant
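The scaling claims can be verified by plugging a lateral shrink into the wire formulas. The sketch below uses normalized, dimensionless values (C0 = R0 = 1 and unit geometry are assumptions made only for illustration); only the ratios matter.

```python
import math

def wire_rc(l, w, h, d_ox, d_lat, C0=1.0, R0=1.0):
    """Return (ground capacitance, lateral capacitance, resistance) of a wire."""
    c_gnd = C0 * l * w / d_ox          # parallel plate to the ground plane
    c_lat = C0 * 2 * l * h / d_lat     # two neighbors at distance d_lat
    r = R0 * l / (w * h)
    return c_gnd, c_lat, r

# Normalized geometry before the shrink (hypothetical units)
l, w, h, d_ox, d_lat = 1000.0, 1.0, 1.0, 1.0, 1.0
before = wire_rc(l, w, h, d_ox, d_lat)

# Lateral shrink: l, w and d_lat scale by 1/sqrt(2); h and d_ox are untouched
s = math.sqrt(2)
after = wire_rc(l / s, w / s, h, d_ox, d_lat / s)

ratios = tuple(a / b for a, b in zip(after, before))
print("C_gnd ratio = %.3f, C_lat ratio = %.3f, R ratio = %.3f" % ratios)
```

Since the lateral term does not shrink and the resistance stays constant, the wire's RC product improves only through the ground term.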

Speedup due to shrink
[Figure: RC model with R_gate, C_gate, R_wire and C_wire, annotated 'unchanged', 'half' and 'hardly smaller'.]

With lateral capacitance, the speedup drops to about 1, instead of the factor 2 obtained without it.

Lateral capacitance is worse!
- Effectively 2 x C_lat
- This is the Miller effect

Crosstalk noise on wires
Crosstalk causes noise, which depends on:
- The size of the crosstalk capacitor
- Slope of the aggressor
- Threshold voltage
- Ratio between victim and aggressor output resistances

[Figure: RC model with R_gate, C_gate and C_wire,lat.]

Track Routing: maintaining timing
- Refines the global routing by fixing track positions
- Timing is a given constraint: satisfy crosstalk by spacing apart 'unfriendly' wires. 'Friendliness' data is given by the timer.
- Use shielding for clocks, spacing or shielding for signal wires.

[Figure: spacing between unfriendly nets is enlarged to meet the load budget.]

"Common Database" Architecture
- Each tool has its own data representation. Design data is shared by:
  - reading/writing (huge) files.
  - a data management layer that controls access to files and converts formats.
- Great for "integrating" many separate tools.
- Makes real-time sharing of data slow and inefficient.

[Figure: Tool 1 (timing algorithm), Tool 2 (placement algorithm), Tool 3 (routing algorithm), ... Tool n (extraction algorithm), each with its own data model, connected through a database and translators (on hard disk).]

Infrastructure is key
- Tools share a common data structure. They run directly on it.
- Let all design data live "in core" during the flow, attached to the data structure.
- Use only one format: the data structure

[Figure: an in-core data model shared by the placement, routing, timing, verification and other (tool n) algorithms, with TCL access and GUI access; Volcano on disk; external formats.]
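The difference from the file-based architecture can be shown with a toy example: two "tools" operate on the same in-memory object, so a change made by one is visible to the other without any export or translation step. Everything below (the `Design` and `Cell` classes, the `place` function, and the use of half-perimeter wirelength as a stand-in for a timing estimate) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    x: float = 0.0
    y: float = 0.0

@dataclass
class Design:
    """Hypothetical in-core data model shared by all tools."""
    cells: dict = field(default_factory=dict)   # cell name -> Cell
    nets: dict = field(default_factory=dict)    # net name -> list of cell names

def place(design, positions):
    """'Placement tool': writes coordinates directly into the shared model."""
    for name, (x, y) in positions.items():
        design.cells[name].x = x
        design.cells[name].y = y

def half_perimeter_wirelength(design, net):
    """'Timing tool' stand-in: reads the same model, no file round trip."""
    xs = [design.cells[c].x for c in design.nets[net]]
    ys = [design.cells[c].y for c in design.nets[net]]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

design = Design(cells={n: Cell(n) for n in ("u1", "u2", "u3")},
                nets={"n1": ["u1", "u2", "u3"]})

place(design, {"u1": (0, 0), "u2": (30, 10), "u3": (10, 40)})
length_first = half_perimeter_wirelength(design, "n1")

place(design, {"u3": (10, 20)})                  # incremental move of one cell
length_second = half_perimeter_wirelength(design, "n1")
print(length_first, length_second)
```

An incremental change by the placer is picked up by the next query immediately, which is what allows a timer to run concurrently with the other steps.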

Track Re-ordering
- Crosstalk-aware wire ordering during routing
- Based on timing windows

[Figure: timing windows (ET to LT) for three nets on adjacent tracks, in the order NET B, NET A, NET C before re-ordering and NET A, NET C, NET B after.]
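One way to picture timing-window based re-ordering: nets whose windows overlap may switch at the same time, so they should not sit on adjacent tracks. The sketch below reads ET and LT as the earliest and latest switching time of a net (an assumption) and picks, by brute force, the track order with the least window overlap between neighbors. The window values are made up, and the exhaustive search is for illustration only; it is not the method used by any particular router.

```python
from itertools import permutations

def overlap(win_a, win_b):
    """Length of the time interval in which both nets may switch."""
    return max(0.0, min(win_a[1], win_b[1]) - max(win_a[0], win_b[0]))

def adjacent_overlap(order, windows):
    """Total window overlap between nets on neighboring tracks."""
    return sum(overlap(windows[a], windows[b]) for a, b in zip(order, order[1:]))

def best_track_order(windows):
    """Exhaustive search; only feasible for a handful of nets."""
    return min(permutations(sorted(windows)),
               key=lambda order: adjacent_overlap(order, windows))

# Hypothetical (ET, LT) windows: A and B overlap, C switches later
windows = {"A": (0.0, 2.0), "B": (1.0, 3.0), "C": (4.0, 6.0)}

initial = ("B", "A", "C")
reordered = best_track_order(windows)
print(initial, adjacent_overlap(initial, windows))
print(reordered, adjacent_overlap(reordered, windows))
```

Placing the late-switching net C between A and B removes the overlap between neighbors without adding spacing or shields.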

How to get timing closure?
- Good placements and floor plans
  - Floorplanning is a hard and unsolved problem
- Let the computer do the work for you
  - If you have no clue about the floor plan: flatten it!
- EDA tool needs to:
  - Have massive capacity
  - Have a transparent data model
- Relaxing some parameters could help dramatically.

[Figure: 3-D labs design, 0.18u, 266 MHz, with blocks Kdomain (3.2M gates), Odomain (2.5M gates), T1 (812K gates) and T2 (2.1M gates).]

Summary
- The gain based synthesis model proves excellent for the logic to layout conversion.
- Timing is more important than actual gate size: therefore delay is fixed before size.
- The simplicity of the model allows scaling to larger chips (millions of placeable objects).