December 13th 2011 UT ICS/IEEE Seminar, UT Austin
Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor
Rupesh S. Shelar
Low Power IA Group
Intel Corporation, Austin, TX
Agenda
•Introduction
•Impact on Timing
•Impact on Power
•Conclusions
Why Look at Interconnects Closely
•Unlike transistors, interconnects
– do not perform any computation
– merely transfer information
•Paying power/timing cost for wires yields nothing
Motivation: Interconnect Delay & Power
• Global interconnects known to contribute significantly to path delays
• For local interconnects in intra-block paths, exact numbers are probably not known, as they vary with block size and design style
• Relatively less attention paid to interconnect power dissipation
• Many academic studies exist, but most are based on small data sets
About Data
•Delay/power data from blocks in a high performance microprocessor core in 45 nm technology
•Blocks implemented using RTL-to-Layout Synthesis (RLS) design style
– Mostly automated (using vendor/in-house tools); write RTL, partition, and run tools/flows
– Design quality determined by algorithms, tools, flows, parameters; supposedly poor utilization, or sparse layouts
•Local interconnects: implemented mostly in min-width M2 to M5 layers
•Delay/power impact due to interconnects inside standard cells is considered as cell-delay/-power contribution in this study
Agenda
•Introduction
•Impact on Timing
•Impact on Power
•Conclusions
Impact of Interconnects on Timing
•For max timing, interconnects contribute in terms of
– Wire delay
– Slope degradation (slows down receivers)
– Cell-delay degradation (slows down driver)
– Cumulative effect of above 3 on path delays
– Delays due to repeaters (inserted for timing/slope/noise)
•Chose 3 metrics on the worst internal paths:
– Wire delay
– Interconnect impact (obtained by setting R=C=0)
– Repeater delay
•Why internal paths: to exclude from the results the effect of timing constraints on primary I/Os, which arises from synthesis flows (RLS)
•Why worst paths: determine operating frequency
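The three metrics above can be sketched in a few lines. This is only an illustration: the `PathTiming` record, its field names, and the example numbers are hypothetical, and expressing each metric as a percentage of the path delay is an assumption, since the slides do not state the denominator.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Hypothetical per-path numbers, all in the same time unit (e.g. ps)."""
    total_delay: float         # path delay with extracted wire R and C
    wire_delay: float          # sum of wire (net) delays along the path
    delay_with_rc_zero: float  # same path re-timed with wire R = C = 0
    repeater_delay: float      # sum of delays of inverters/buffers acting as repeaters


def metrics_percent(p: PathTiming) -> dict:
    """Return the three metrics as percentages of the path delay (assumed denominator)."""
    def pct(x: float) -> float:
        return 100.0 * x / p.total_delay

    return {
        "wire_delay": pct(p.wire_delay),
        # Interconnect impact: how much faster the path gets when wires are ideal.
        "interconnect_impact": pct(p.total_delay - p.delay_with_rc_zero),
        "repeater_delay": pct(p.repeater_delay),
    }


# Illustrative values only; they are not measurements from the talk.
example = PathTiming(total_delay=300.0, wire_delay=42.0,
                     delay_with_rc_zero=216.0, repeater_delay=99.0)
print(metrics_percent(example))
```

Note that the interconnect impact includes the wire delay itself plus the slope and cell-delay degradation, which is why it can be much larger than the wire delay alone.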
A Close Look at One Block: Wire Delay
•Wire delay increases as slack decreases
•Timing wall due to sizing/ll-insertion, because power is also emphasized
•Interconnect delay impact won’t change without power optimization
[Figure: Mean wire delay % vs. slack for worst internal paths between unique pairs of sequentials in a ~40 K cell block with ~4 K sequentials; callout: 14%]
A Close Look: Slope-/Cell-delay Degradation
•Slope-/cell-delay degradation contributes as much as wire delay
•Secondary effect, but not second order
[Figure: Mean wire delay and interconnect delay impact (%) vs. slack for worst internal paths between unique pairs of sequentials; callout: 28%]
A Close Look: Repeater Delay
•Repeater = inverter or buffer
•On critical path, most inverters/buffers are repeaters
– Cell library is granular
•Repeater delay same as interconnect delay impact
[Figure: Mean wire delay, interconnect impact, and repeater delay (%) vs. slack for worst internal paths; callout: 33%]
A Close Look: Adding all 3
•Average overall impact: 30%
•Similar behavior for smaller block sizes also
– Same quality: repeaters are indicators of synthesis quality
•One had hoped for better!
[Figure: Overall interconnect delay impact, including repeater delay (%), vs. slack for worst internal paths; callout: 59%]
Repeater Count in RLS blocks
•Varies almost linearly with block-size
•Tools/flows used in the linear region
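A near-linear trend like this can be summarized by a single slope, repeaters per cell. The sketch below fits a least-squares line through the origin; the data points are made up for illustration and are not the talk's data, and `slope_through_origin` is a hypothetical helper.

```python
def slope_through_origin(cells, repeaters):
    """Least-squares slope k for the model repeaters ~= k * cells."""
    num = sum(c * r for c, r in zip(cells, repeaters))
    den = sum(c * c for c in cells)
    return num / den


# Made-up (cells, repeaters) pairs for illustration only; NOT data from the talk.
cells = [5_000, 10_000, 20_000, 40_000]
repeaters = [2_100, 4_500, 8_700, 17_800]

k = slope_through_origin(cells, repeaters)
print(f"repeaters per cell ~ {k:.2f}")
```

Forcing the line through the origin encodes the assumption that a block with no cells has no repeaters; a fit with an intercept would be the alternative if small blocks deviate from the trend.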
[Figure: # of Repeaters vs. # of Cells]
Summary of Observations so far
•Interconnect delay dominance regardless of design style
•Secondary effects as big as primary effect, the wire delay
•Repeater count more than 40% of cells and linear in the size of blocks
•Repeater delays contribute as much as wires
Agenda
•Introduction
•Impact on Timing
•Impact on Power
•Conclusions
Power Dissipation in RLS blocks
•Typical power dissipation distribution in high speed microprocessors: 60% dynamic, 10% short-circuit, 30% leakage
•Leakage contained by
– High-k metal gate transistors with strain
– High percentage of low-leakage/high-vt devices
– Power gates
•High use of clock gating reduces the dynamic power in combinational logic
•Synthesized logic blocks consume nearly 30%
Clock Interconnect Power in RLS blocks
•Interconnects contribute 18% of dynamic/glitch power in clocks
•Clock tree (including sequentials) contributes 71% of dynamic power
– Sequentials make up roughly 1/5th of cell count in RLS
•Out of total dynamic/glitch power in RLS blocks
– Clock cells contribute 16%
– Clock interconnects contribute 13%
– Sequentials contribute 42% of dynamic power in RLS
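The percentages on this slide are mutually consistent, which a short check makes explicit. The variable names are illustrative; the numbers are the ones quoted above.

```python
# Percentages of total dynamic/glitch power in RLS blocks, as quoted on the slide.
clock_cells = 16
clock_interconnect = 13
sequentials = 42

# The clock tree (including sequentials) is the sum of the three parts.
clock_tree_total = clock_cells + clock_interconnect + sequentials

# Interconnect share within the clock tree itself.
interconnect_share_of_clock = 100 * clock_interconnect / clock_tree_total

print(clock_tree_total)                    # 71
print(round(interconnect_share_of_clock))  # 18
```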
[Figure: Dynamic/glitch power breakdown: clock cells, sequentials, clock interconnect]
Interconnect Power in Combinational Logic in RLS blocks
•Interconnects account for 32% of dynamic/glitch power in combinational logic; 8% of dynamic/glitch power in RLS
[Figure: Dynamic power distribution in combinational logic: comb. logic cells, interconnect]
Repeater Power in RLS blocks
•Dynamic power in combinational logic: 27% of dynamic power in RLS
– Inv./buf. contribute 30% of that; somewhat low, given 44% of cell count, since activity factors for combinational logic are lower than those in clock tree
•Short-circuit (SC) power in combinational logic: 50% of SC power in RLS
– Inv./buf. contribute 65% of that; high, since inverters/buffers have no stacked transistors
•Leakage (Lkg) power in combinational logic: 71% of leakage in RLS
– Inv./buf. contribute 46% of that; can be explained by 44% repeater count
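Multiplying the two percentages in each bullet pair gives the inverter/buffer share of each power type across the whole RLS block. This is a back-of-the-envelope sketch using only the quoted figures; the dictionary layout and names are illustrative.

```python
# For each power type:
# (combinational logic's share of the RLS total, inv./buf. share of that combinational part)
quoted = {
    "dynamic":       (0.27, 0.30),
    "short_circuit": (0.50, 0.65),
    "leakage":       (0.71, 0.46),
}

# Inv./buf. share of the RLS total for each power type.
inv_buf_share = {kind: comb * rep for kind, (comb, rep) in quoted.items()}

for kind, share in inv_buf_share.items():
    print(f"{kind}: inv./buf. ~ {100 * share:.1f}% of RLS {kind} power")
```

The dynamic figure comes out near 8%, in line with the repeater share quoted in the conclusions.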
[Figures: dynamic power in combinational logic, short-circuit power, and leakage power, each split into inverters, buffers, and other cells/interconnect]
Agenda
•Introduction
•Impact on Timing
•Impact on Power
•Conclusions
Impact of Interconnects on Timing/Power
•Avg. impact of interconnect on timing: 30% of cycle time
•Dynamic Power dissipated by interconnects: ~30%
– ~21% by wires and ~8% by repeaters
•Thus, impact on speed and power: nearly 1/3rd
•Avg. repeater count: 44%
– Makes layout/timing convergence difficult
•Overall, interconnects pose severe challenges to high-speed design
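The ~21% and ~30% figures can be traced back to numbers on the earlier power slides. The check below assumes the wire share is the sum of the clock-interconnect and combinational-interconnect shares, which is an inference from the quoted numbers rather than something the slides state.

```python
clock_interconnect = 13  # % of RLS dynamic/glitch power (clock interconnect slide)
comb_interconnect = 8    # % of RLS dynamic/glitch power (combinational logic slide)
repeaters = 8            # % quoted in the conclusions

wires = clock_interconnect + comb_interconnect  # assumed composition of "wires"
total = wires + repeaters

print(wires, total)  # 21 29
```

The total of 29% is what the conclusions round to "~30%" and "nearly 1/3rd".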
Implications
• [Bohr 95] “Interconnect Scaling – The Real Limiter to High Performance ULSI”
•Would have been true, had we kept doubling the frequency and not moved to Cu
•Pushing speed
– Microprocessors? Cores already run at 3.2 GHz
– Processors in netbooks/smartphones
– Graphics processors
•Technology scaling:
– Transistors improve; wire R/µm increases; wire C/µm stays the same
– RC stays the same, assuming ideal length scaling
– Interconnect impact component likely to continue to increase
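The "RC stays the same" claim follows from a simple scaling model, sketched below. The model is an assumption, not taken from the talk: wire width and thickness both shrink by a factor s, so resistance per µm grows by 1/s²; capacitance per µm is unchanged; and wire length shrinks by s. The function names and numeric values are illustrative.

```python
def scale_wire(r_per_um, c_per_um, length_um, s):
    """Ideal scaling by a factor s < 1 (assumed model).

    Cross-section shrinks by s**2, so R per um grows by 1/s**2;
    C per um is taken as unchanged; length shrinks by s.
    """
    return r_per_um / s**2, c_per_um, length_um * s


def rc_product(r_per_um, c_per_um, length_um):
    """Total wire R times total wire C (constant factors omitted)."""
    return (r_per_um * length_um) * (c_per_um * length_um)


before = (1.0, 0.2, 100.0)  # illustrative: ohm/um, fF/um, um
after = scale_wire(*before, s=0.7)

print(rc_product(*before), rc_product(*after))
```

Because gate delays shrink from one generation to the next while the wire RC product does not, the wire's fraction of the cycle grows, which is the point of the last bullet.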
Possible Solutions
•From technology side:
– 3D?
– Al Cu ? Low k?
– Not in sight for next few years?
•From CAD:
– Placement, routing, physical synthesis running out of steam
• “don’t know what the opportunities are” – ISPD 2010
– Logic synthesis/tech. mapping doesn’t help, where it is used: serves the purpose of creating a netlist from RTL
• “The Death of Logic Synthesis” – ISPD 2005
•How about incremental logic re-synthesis after global routing?
Logic Re-Synthesis After Global Routing
•Why?
– Routing picture known after placement/CTS/global route
– Only then we know the real impact of interconnects on delay
• Dependence on topology, layers, vias, repeaters, detours, congestion
– Logic synthesis/technology mapping powerful transformations, but…
•Challenges:
– Using placement/routing information
– Requires more memory/computation: faster/better/multi-core CPUs
– Polynomial time algorithms performing simultaneous optimizations
• An example: simultaneous mapping/placement
Acknowledgments
•Marek Patyra, Intel
•Noel Menezes, Intel
•Xinning Wang, Intel
•Wei-kai Shih, Intel
•Andy Carle, Intel
•… many from EMG/TMG, Intel
Q&A
Low Frequency (high 100s of MHz)/Low Power Designs
•Processor running at 5x slower frequency consumes 5x lower dynamic power
– Interconnect delay impact as percentage of cycle time reduces by same factor
•Additional quadratic power savings due to supply voltage reduction
– Slower gates, but interconnect component stays roughly the same
– Overall interconnect impact on delay goes down further
– Doesn’t require as many repeaters
– Critical paths gate-delay dominated
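The scaling argument in these bullets can be illustrated with the standard switching-power estimate P = activity × C × Vdd² × f. The activity factor, capacitance, and supply voltages below are hypothetical placeholders, and dividing the 59% callout by 5 is an assumption about how the projection could be made, not the author's stated method.

```python
def dynamic_power(activity, capacitance, vdd, freq):
    """Standard switching-power estimate: P = activity * C * Vdd^2 * f."""
    return activity * capacitance * vdd**2 * freq


# Hypothetical activity, capacitance, and voltages; only the 5x frequency ratio
# comes from the slides.
base = dynamic_power(0.1, 1e-9, 1.0, 3.0e9)
slow = dynamic_power(0.1, 1e-9, 1.0, 3.0e9 / 5)       # frequency reduced only
slow_lowv = dynamic_power(0.1, 1e-9, 0.8, 3.0e9 / 5)  # plus an assumed Vdd reduction

print(base / slow)       # saving from frequency alone
print(base / slow_lowv)  # larger saving, because of the Vdd^2 term

# Same absolute interconnect delay over a 5x longer cycle: the percentage shrinks 5x.
impact_fast_pct = 59.0  # callout from the "Adding all 3" slide
impact_slow_pct = impact_fast_pct / 5
print(impact_slow_pct)
```

The result is close to the 12% callout on the projected plot. Lowering the supply slows the gates while the wire component stays roughly the same, so the real percentage could be lower still, as the slide notes.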
[Figure: Interconnect impact (%) at 5x slower frequency vs. slack. Projected* interconnect delay impact for 5x slower design (could be much lower); callout: 12%]
Low Frequency (high 100s of MHz)/Low Power Designs
•Effect of re-pipelining on delay
– Fewer sequentials → fewer clock buffers/nets → more routing resources for signals → better routing → lower interconnect impact
•Problems for low power/high speed not the same!
•1 Million cell placement for 600 MHz != 200 K cell placement for 3 GHz
•What if we want to run a processor in both modes?
[Figure: Interconnect impact (%) at 5x slower frequency vs. slack. Projected* interconnect delay impact for 5x slower design (could be much lower); callout: 12%]