The Scaling Challenge: Can Correct-by-Construction Design Help?

Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick
Intel Labs (CAD Research), Hillsboro, OR
International Symposium on Physical Design, Monterey, CA, Apr 16, 2003

Page 2

Repeaters, which are already a full-chip headache, will also become critical at the block level

Page 3: Outline

Some scaling experiments
– SPICE simulations
Implications for post-RTL design
Correct-by-Construction (CbC) design
– What’s the promise? What’s missing?

Page 4: A Scaling Primer

Process scaling:
– Devices shrink 0.7x, delay 0.7x
– Wires shrink 0.7x
– R/μ increases 2x, C/μ unchanged
– So, delay per scaled μ² increases 1.4x
Block area often stays the same
– # cells, # nets double
– Wiring histogram shape invariant
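The primer's arithmetic can be checked in a few lines of Python. One reading of the 1.4x figure, assumed here rather than stated on the slide, is that it measures wire delay relative to device delay: unrepeated RC delay goes as (R/μ)·(C/μ)·L², so an ideally shrunk wire (0.7x as long, 2x R/μ) keeps roughly the same absolute delay while devices get 0.7x faster, a relative slowdown of 1/0.7 ≈ 1.4x.

```python
# Primer arithmetic for one process generation (illustrative sketch).
# Assumptions: distributed-RC wire delay ~ (R/um) * (C/um) * L**2, and the
# 1.4x figure is wire delay measured relative to device delay.
S = 0.7                          # linear shrink per generation

device_delay = S                 # devices: delay 0.7x
r_per_um = 1 / S**2              # width and thickness both 0.7x -> ~2x
c_per_um = 1.0                   # C/um roughly unchanged
length = S                       # an "ideally shrunk" wire is 0.7x as long

wire_delay = r_per_um * c_per_um * length**2   # ~1.0x in absolute terms
relative = wire_delay / device_delay           # ~1.43x relative to devices

print(f"R/um: {r_per_um:.2f}x  wire delay: {wire_delay:.2f}x  "
      f"relative to devices: {relative:.2f}x")
```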

Page 5: Critical Repeater Lengths

Repeaters optimally sized, uniformly, for minimum delay
– Critical length: minimum distance at which inserting a repeater speeds up the line
“Ideally shrunk” circuit requires additional repeaters (0.7x vs 0.57x)

[Figure: Relative critical repeater length (0 to 1) vs. process (90nm, 65nm, 45nm, 32nm) for M3 and M6; annotated 0.57x]

In line with scaling theory: 0.586x
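Why the critical length shrinks faster than 0.7x can be sketched with a simple Elmore-delay model. This is a textbook first-order model assumed here, not the SPICE/PIE flow behind the plot: each repeater has drive resistance rd, output capacitance cp and input capacitance cg, and the wire has r and c per unit length. Adding one repeater breaks even when r·c·L²/4 = rd·(cp + cg), so L_crit = 2·sqrt(rd·(cp + cg)/(r·c)). With rd·(cp + cg) scaling like device delay (0.7x) and r·c scaling about 2x, L_crit scales by sqrt(0.7 × 0.49) = 0.7^1.5 ≈ 0.586x per generation. The parameter values below are hypothetical normalized units.

```python
import math

def line_delay(length, n_seg, rd, cp, cg, r, c):
    """Elmore delay of a wire split into n_seg equal segments, each driven by an
    identical repeater (drive resistance rd, output cap cp) and loaded by the
    next repeater's input cap cg; r and c are per unit length."""
    l = length / n_seg
    stage = rd * (cp + c * l + cg) + r * l * (c * l / 2 + cg)
    return n_seg * stage

def critical_length(rd, cp, cg, r, c):
    """Length at which splitting the line with one extra repeater breaks even
    (r*c*L**2/4 == rd*(cp + cg) under the model above)."""
    return 2.0 * math.sqrt(rd * (cp + cg) / (r * c))

# Hypothetical normalized 90nm values (arbitrary units).
rd, cp, cg, r, c = 1.0, 0.5, 1.0, 1.0, 1.0
L90 = critical_length(rd, cp, cg, r, c)

# One generation later: device caps 0.7x (so rd*(cp+cg) is 0.7x), r ~2x.
S = 0.7
L65 = critical_length(rd, cp * S, cg * S, r / S**2, c)
print(f"L_crit ratio per generation: {L65 / L90:.3f}")   # equals 0.7**1.5
```

Because rd falls and cp, cg grow in proportion to repeater size, their product, and hence L_crit, does not depend on how large the uniformly sized repeaters are.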

Page 6: Critical Sequential Lengths

Repeaters optimized for maximum distance covered in one clock period
Assumes:
– 2x frequency scaling, 5GHz on 90nm
– Ignores setup, hold, skew
“Ideally shrunk” circuit:
– Requires much new wire pipelining (0.7x vs 0.43x)
– Ratio of regular to clocked repeaters decreasing

[Figure: Left, relative critical sequential length vs. process (90nm, 65nm, 45nm, 32nm) for M3 and M6; annotated 0.43x. Right, number of repeaters between flip-flops (FFs) vs. process; annotated 0.75x.]
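The 0.43x and 0.75x trends can be compared with a first-order closed-form estimate. This is an assumption-laden sketch, not the SPICE methodology behind the plots: it uses the standard result that an optimally repeated line has delay per unit length proportional to sqrt(τ·r·c), where τ is the repeater's intrinsic RC, together with the slide's 2x-per-generation frequency scaling. Ignoring setup, hold and skew as the slide does, it gives about 0.42x for the sequential length and 0.5/0.7 ≈ 0.71x for repeaters between flip-flops.

```python
import math

# Every quantity is a per-generation ratio (previous node normalized to 1),
# so proportionality constants cancel.
S = 0.7        # linear shrink per generation
F = 2.0        # frequency scaling per generation (the slide's assumption)

tau = S                    # repeater intrinsic RC scales like device delay
rc = 1 / S**2              # r per unit length ~2x, c per unit length unchanged

delay_per_len = math.sqrt(tau * rc)   # optimally repeated wire, up to a constant
period = 1 / F

seq_len = period / delay_per_len      # distance covered in one clock period
crit_len = math.sqrt(tau / rc)        # critical repeater length, same model
reps_between_ffs = seq_len / crit_len

print(f"critical sequential length: {seq_len:.2f}x per generation")
print(f"repeaters between flip-flops: {reps_between_ffs:.2f}x per generation")
```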

Page 7: Block Wiring Histogram and Critical Repeater Lengths

[Figure: Number of wires in a 90nm block (log scale, 1 to 100,000) vs. normalized wirelength (0.25 to 90), with critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

Critical lengths migrating rapidly to the left… (zoomed view coming up)

Page 8: Block Wiring Histogram: Zoomed View

[Figure: Number of wires in a 90nm block (log scale, 1 to 100,000) vs. normalized wirelength (0 to 19), with critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

Increasingly steep slope of curve (log scale) => # impacted nets exploding!
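The "explosion" follows from sliding a threshold left along a steeply falling histogram. The sketch below uses a made-up histogram (not the block data in the figure), a hypothetical 20-unit critical length at 90nm shrinking 0.57x per generation, and the simplifying assumption that a net of length L needs floor(L / L_crit) repeaters.

```python
import math

# Hypothetical block wiring histogram: normalized wirelength -> number of wires.
# Illustrative numbers only (roughly exponential decay), NOT the plotted data.
HISTOGRAM = {1: 50000, 2: 20000, 4: 6000, 8: 1500, 16: 300, 32: 40}

def repeater_impact(histogram, l_crit):
    """Return (# nets longer than l_crit, # repeaters), assuming a net of
    length L needs floor(L / l_crit) repeaters."""
    nets = sum(n for length, n in histogram.items() if length > l_crit)
    reps = sum(n * math.floor(length / l_crit)
               for length, n in histogram.items() if length > l_crit)
    return nets, reps

# Hypothetical 20-unit critical length at 90nm, shrinking ~0.57x per generation.
for gen, node in enumerate(["90nm", "65nm", "45nm", "32nm"]):
    l_crit = 20 * 0.57 ** gen
    nets, reps = repeater_impact(HISTOGRAM, l_crit)
    print(f"{node}: L_crit={l_crit:5.2f}  impacted nets={nets:5d}  repeaters={reps:6d}")
```

Each step left crosses a histogram bin that holds several times more wires than the one before, so the impacted-net count grows much faster than the critical length shrinks.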

Page 9: Block Wiring Histogram and Critical Sequential Lengths

[Figure: PSC/bus1p wiring histogram, number of wires in a 90nm block (log scale, 1 to 100,000) vs. normalized wirelength (0.25 to 90), with critical sequential distances marked for M3 and M6 at 90nm, 65nm, 45nm and 32nm]

# pipelined nets growing from negligible (90nm) to substantial (32nm)
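The same threshold argument applies to pipelining, with the critical sequential length in place of the critical repeater length. In this sketch the histogram and the 60-unit starting sequential length are hypothetical, the 0.43x per-generation shrink is taken from the critical-sequential-length slide, and a net of length L is assumed to need ceil(L / L_seq) - 1 flip-flops.

```python
import math

# Hypothetical histogram: normalized wirelength -> number of wires.
# Illustrative only; not the PSC/bus1p data plotted above.
HISTOGRAM = {1: 50000, 2: 20000, 4: 6000, 8: 1500, 16: 300, 32: 40}

def pipelining_impact(histogram, l_seq):
    """Return (# nets needing pipelining, # flip-flops inserted), assuming a
    net of length L needs ceil(L / l_seq) - 1 flip-flops."""
    nets = ffs = 0
    for length, count in histogram.items():
        stages = math.ceil(length / l_seq) - 1
        if stages > 0:
            nets += count
            ffs += count * stages
    return nets, ffs

for gen, node in enumerate(["90nm", "65nm", "45nm", "32nm"]):
    l_seq = 60 * 0.43 ** gen          # hypothetical 60 units at 90nm
    nets, ffs = pipelining_impact(HISTOGRAM, l_seq)
    print(f"{node}: L_seq={l_seq:5.2f}  pipelined nets={nets:5d}  flip-flops={ffs:5d}")
```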

Page 10: Repeated Block-level Nets

[Figure: Percentage of repeated nets (0 to 35%) vs. process (90nm, 65nm, 45nm, 32nm) for M3 and M6]

Ever-increasing percentage of block-level nets requires repeaters
Even the rate of growth is accelerating!
…especially for clocked repeaters

[Figure: Percentage of nets with clocked repeaters (0 to 14%) vs. process (90nm, 65nm, 45nm, 32nm) for M3 and M6]

Page 11: Total Repeater Count

Ever-increasing fractions of total cell count will be repeaters
– 70% in 32nm (and this omits full-chip (FC) repeaters within the block!)

[Figure: Percentage of cells used to repeat block-level nets (0 to 80%) vs. process (90nm, 65nm, 45nm, 32nm); series: clocked repeaters (clk-rep), regular repeaters (rep), total (tot-rep)]

Total repeater count is independent of frequency scaling assumptions
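One way to see why the total would not depend on frequency: wire physics fixes how many repeater sites a net needs, and the clock rate only decides how many of those sites become flip-flops. This sketch assumes (hypothetically) floor(L / L_crit) repeater sites per net, ceil(L / L_seq) - 1 flip-flops for pipelining, that each flip-flop takes the place of one repeater, and that L_seq ≥ L_crit. The 32-unit net and 3.7-unit critical length are made-up values.

```python
import math

def split_repeaters(length, l_crit, l_seq):
    """Return (regular, clocked) repeaters for one net under the assumptions
    above. The total, floor(length / l_crit), does not depend on l_seq."""
    if l_seq < l_crit:
        raise ValueError("sketch assumes l_seq >= l_crit")
    total = math.floor(length / l_crit)
    clocked = math.ceil(length / l_seq) - 1   # never exceeds total when l_seq >= l_crit
    return total - clocked, clocked

# Same net at three clock rates (shorter l_seq = faster clock).
for l_seq in (40, 20, 10):
    regular, clocked = split_repeaters(32, 3.7, l_seq)
    print(f"l_seq={l_seq}: regular={regular} clocked={clocked} total={regular + clocked}")
```

A faster clock moves repeaters from the regular to the clocked column while the total stays at 8. For scale, a 70% repeater share means about 0.7/0.3 ≈ 2.3 repeaters per non-repeater cell.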

Page 12

Interconnects scaling worse than devices
– …in spite of optimal (re-)buffering
# repeaters increasing exponentially

So, what’s changing?
Interconnect repeaters will comprise a significant fraction of cells in a block
Even block-level nets will need to be pipelined

Page 13: Implications for Synthesis

Literal/gate count and fanout metrics misleading
– Major delay contribution from communication
– Fanouts often isolated by repeaters
– Area often wire-limited
Sizing often determined by (predictable) repeater load
– Pre-layout sizing wasted

Page 14: Implications for Synthesis

Less logic per pipeline stage
– Combinational synthesis: maximum benefit shrinking
– Synthesis across sequential boundaries
– Methodological support for retiming

Page 15: Implications for Synthesis

Bandwidth ceiling
– Hard to move data around for computation
Logic replication
– Encourage low fanouts
Dense encodings
Distribution of computation across channel

Page 16: Implications for Layout

Routing
– Must understand repeater insertion
– Fine power grid => templated routing?
Placement with repeaters
– Intra-block nets: # repeaters depends on routing
– OTH routes: fixed obstructions
– Add buffering into placement core … as opposed to ECO postprocessing

Page 17: Implications for Layout

Latency-constrained placement
– Architectural sub-optimality
– Hard constraint per stage (unlike delay)
OR
Post-RTL latency optimization
– Methodological nightmare
– Delay-insensitive design?

[Figure: 90nm vs. 32nm]
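Unlike delay, which can be traded off along a path, a cycle budget is a hard constraint on each stage: a connection either fits in its cycles or it does not. A minimal check, assuming (hypothetically) Manhattan wire lengths and that one pipeline stage can cover at most L_seq of wire, could look like this; the block names, coordinates and L_seq values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    src: str
    dst: str
    cycles: int      # latency budget fixed by the microarchitecture

def latency_violations(placement, connections, l_seq):
    """Return (connection, length) pairs whose Manhattan length exceeds
    cycles * l_seq, i.e. that cannot meet their cycle budget as placed."""
    bad = []
    for conn in connections:
        (x1, y1), (x2, y2) = placement[conn.src], placement[conn.dst]
        length = abs(x1 - x2) + abs(y1 - y2)
        if length > conn.cycles * l_seq:
            bad.append((conn, length))
    return bad

placement = {"A": (0, 0), "B": (3, 4), "C": (10, 0)}
connections = [Connection("A", "B", 1), Connection("A", "C", 2)]

# Same block dimensions, then one generation later with L_seq shrunk 0.43x.
for l_seq in (6.0, 6.0 * 0.43):
    print(l_seq, [c.dst for c, _ in latency_violations(placement, connections, l_seq)])
```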

Page 18: Implications for FC Assembly

What if we reduce block area to avoid wire effects?
Many of the new physical synthesis problems go away
BUT
# blocks triples! (and block assembly is the hardest part of chip design!)
Flat assembly (fragmentation of paths across blocks)
OR
Increased hierarchy (lack of visibility across hierarchy levels)
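The "triples" figure is consistent with a simple back-of-envelope reading, which is an assumption here rather than a derivation given on the slide: if each block's edge must shrink in step with the critical repeater length (about 0.57x per generation) while the total area stays fixed, as in the scaling primer's "block area often stays the same", the block count grows by 1/0.57² ≈ 3.1x.

```python
# Back-of-envelope sketch (assumed reading, not the authors' derivation).
CRIT_LEN_SHRINK = 0.57                      # per generation, from the critical-length slide

block_area_scale = CRIT_LEN_SHRINK ** 2     # block edge tracks L_crit
block_count_scale = 1 / block_area_scale    # same total area split into smaller blocks
print(f"block area: {block_area_scale:.3f}x  # blocks: {block_count_scale:.2f}x")
```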

Page 19: The CbC Link

Process scaling => worsening predictability
Predictability => CbC design
But current CbC approaches too rigid
Can we still apply them?

Page 20: Principles of CbC Design

More predictability
– Reduced estimation error improves high-level optimizations
Break the design-verification loop
– Sequence of small, guaranteed-correct transformations
– No unexpected deterioration of secondary metrics
Avoid micro-engineering
– Design productivity gap

Page 21: Abstract Fabrics

Structural fabrics: too resource-intensive
– e.g. DWF: 50% routing tracks
Use algorithmic fabrics instead
– Prune to subspace with desirable CbC properties
   e.g. Non-uniform power grid using “min power pitch” (ISPD’02)
   Guaranteed throughput bus design (ICCAD’02)
– CbC rules-of-thumb
   e.g. Bound on max adjacent runs of signals
Performance with predictability
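As an illustration of a CbC rule of thumb, a checker for a bound on adjacent parallel runs might look like the following. The segment data model, track numbering and bound value are hypothetical; the idea is only that limiting how far two different signals run side by side on neighboring tracks limits their coupling without reserving whole tracks for shielding.

```python
from collections import defaultdict
from typing import NamedTuple

class Segment(NamedTuple):
    net: str
    track: int
    start: float
    end: float

def adjacent_run_violations(segments, max_run):
    """Flag pairs of different nets on neighboring tracks whose parallel
    (overlapping) run exceeds max_run. Each segment lies along one track."""
    by_track = defaultdict(list)
    for seg in segments:
        by_track[seg.track].append(seg)
    violations = []
    for track, segs in by_track.items():
        for a in segs:
            # .get() avoids inserting new keys while iterating over by_track
            for b in by_track.get(track + 1, []):
                overlap = min(a.end, b.end) - max(a.start, b.start)
                if a.net != b.net and overlap > max_run:
                    violations.append((a.net, b.net, overlap))
    return violations

segs = [Segment("a", 0, 0, 100), Segment("b", 1, 20, 90),
        Segment("c", 1, 95, 120), Segment("d", 2, 0, 30)]
print(adjacent_run_violations(segs, max_run=50))
```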

Page 22: CbC Block Construction

“Vertical” partitioning and successive refinement
– Coarse layout of unsynthesized design
– Successive refinement of “vertical” partitions
– Critical partitions first
– Different partitions exist at different levels of refinement
– Hierarchical engines
– Enables early repeater prediction

[Figure: Refinement levels: RTL → synthesized/mapped netlist → placed/buffered netlist → GR/track-assigned layout]
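The "critical partitions first" policy can be pictured as a priority queue over partitions, where each step advances one partition by one refinement level. The slack values, partition names and one-level-per-step policy are hypothetical; only the level names come from the slide's flow.

```python
import heapq
from dataclasses import dataclass

LEVELS = ["RTL", "synth/mapped netlist", "placed/buffered netlist",
          "GR/track-assigned layout"]

@dataclass
class Partition:
    name: str
    slack: float     # hypothetical criticality: lower slack = more critical
    level: int = 0   # index into LEVELS; every partition starts at RTL

def refine(partitions, steps):
    """Advance partitions one level per step, most critical first, and
    return the sequence of (partition, new level) refinements performed."""
    heap = [(p.slack, i, p) for i, p in enumerate(partitions)]
    heapq.heapify(heap)
    log = []
    for _ in range(steps):
        if not heap:
            break
        slack, i, p = heapq.heappop(heap)
        p.level += 1
        log.append((p.name, LEVELS[p.level]))
        if p.level < len(LEVELS) - 1:        # not fully refined yet
            heapq.heappush(heap, (slack, i, p))
    return log

parts = [Partition("alu", -5.0), Partition("decode", 10.0), Partition("ctrl", 2.0)]
for name, level in refine(parts, 4):
    print(f"{name} -> {level}")
print({p.name: LEVELS[p.level] for p in parts})
```

After four steps the partitions sit at three different levels, which is the mixed-refinement state the slide describes.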

Page 23: CbC Full Chip Assembly

Latency prediction for full-chip interconnects
– Preferential routing for performance-critical nets
– Flip-flop staging on non-critical nets
– Performance prediction with cycle latency ranges
Block area mis-prediction tolerance
– Move blocks without re-implementation
– Global communication grids
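Predicting latency as a range rather than a single number can be sketched as follows. The ±20% length uncertainty, the one-cycle-per-L_seq rule and modelling preferential routing as a longer L_seq (a faster layer) are all assumptions for illustration.

```python
import math

def latency_range(est_length, uncertainty, l_seq):
    """Cycle-latency range (min, max) for a full-chip net whose routed length
    is known only to within +/- `uncertainty` (a fraction) at planning time.
    Assumes one cycle per l_seq of routed wire, with a minimum of one cycle."""
    lo = est_length * (1 - uncertainty)
    hi = est_length * (1 + uncertainty)
    return max(1, math.ceil(lo / l_seq)), max(1, math.ceil(hi / l_seq))

print(latency_range(1000, 0.2, 350))   # non-critical net on a slower layer
print(latency_range(1000, 0.2, 700))   # preferentially routed on a faster layer
```

The first call returns (3, 4), a latency the architecture must tolerate as a range; the second returns (2, 2), showing how preferential routing can pin a critical net to a single latency despite the same length uncertainty.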

Page 24: Summing Up

Repeaters becoming critical at the block level
Most post-RTL design problems changing fundamentally
Combination of algorithmic and methodological advances required
CbC approaches viable, but at the abstract level
– Current structural fabrics too resource-intensive
– Achieve predictability through algorithmic fabrics

Page 25: Backup Slides

Page 26: PIE (Process Independent Exploration) Models

To provide an easier way to study interconnect structures and their trends in future CMOS processes
To be used in place of fudged process files
Analytical models directly correlating to device and interconnect physics
– Device models based on BSIM3 equations, including major 2nd-order effects
– Accurate mobility and velocity saturation models; DIBL and channel-length modulation approximations
– Continuous from weak to strong inversion
– Interconnect models with 2D fringe capacitance approximation
– Scattering not accounted for
Entire process expressed by a small set of physically meaningful process parameters (e.g. T_ox, V_th, k_ild, etc.) in PEF (Process Exploration File) files
– 16 for devices
– 6 for each metal layer
Test cases simulated as SPICE netlists
PIE models implemented as behavioral sources
Calibrated against existing process files