1

L23: Clock Issues in Deep Submicron Design (2)

1999. 10

Jun Dong Cho

Sungkyunkwan Univ. Dept. ECE

Mail : [email protected]

Homepage : vada.skku.ac.kr

2

Buffer Pre-Placement

3

Iso-Radius Buffer Insertion

[Figure: iso-radius buffer insertion, panels (a) and (b)]

4

Width Control

5

Width Tapering

6

Buffer/Load Distribution

7

H-Tree

The H-Tree is a special case of CRP in which all the clock terminals are arranged in a symmetric fashion, as is the case in gate arrays.

The H-Tree algorithm connects pairs of terminals in a particular order, then connects the middle points of the resulting vertical segments. The connected middle points are called the tapping points.

The H-Tree gives every terminal the same path length from the clock source, hence the skew between terminals is zero.

X-Tree: If the routing is not restricted to being rectilinear, the H shape can be replaced with an X shape. However, this is undesirable because it may cause crosstalk due to the close proximity of wires.
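The equal-path-length property of the H-Tree can be checked with a small sketch. It builds the tree top-down by recursion (the slide describes the construction bottom-up, so this is only an equivalent illustration), and the function name `h_tree` and its parameters are assumptions made for this example.

```python
def h_tree(cx, cy, half, depth):
    """Return (segments, leaves) for an H-tree centred at (cx, cy).

    Each leaf is (x, y, path_length), where path_length is the wire
    length from the root tapping point to that leaf.
    """
    segments, leaves = [], []

    def build(x, y, h, level, dist):
        if level == 0:
            leaves.append((x, y, dist))
            return
        # Horizontal bar of the H through the tapping point (x, y).
        segments.append(((x - h, y), (x + h, y)))
        for sx in (-1, 1):
            ex = x + sx * h
            # Vertical bar at each end of the horizontal bar.
            segments.append(((ex, y - h), (ex, y + h)))
            for sy in (-1, 1):
                # Each corner of the H is the tapping point of a smaller H.
                build(ex, y + sy * h, h / 2, level - 1, dist + 2 * h)

    build(cx, cy, half, depth, 0.0)
    return segments, leaves


# Two levels of H give 16 terminals.
segments, leaves = h_tree(0.0, 0.0, 4.0, 2)
lengths = {round(d, 9) for _, _, d in leaves}
print(len(leaves), lengths)
```

Every leaf ends up with the same path length, which is why a pure wire-length delay model gives zero skew for this topology.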

8

Hierarchical Matching Tree : MMM & GMA

MMM (Method of Means and Medians): The MMM algorithm recursively partitions a circuit into two equal parts, and then connects the center of mass of the whole circuit to the centers of mass of the two sub-circuits.
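A minimal sketch of the top-down MMM idea, assuming clock terminals are given as (x, y) points, that the cut is taken at the median, and that the cut direction alternates at each level. The function name `mmm` is hypothetical.

```python
def mmm(points):
    """Top-down MMM-style sketch: returns a list of wire segments (from, to)."""

    def center(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def split(pts, axis, edges):
        if len(pts) <= 1:
            return
        # Cut at the median along the current axis to get two equal halves.
        ordered = sorted(pts, key=lambda p: p[axis])
        mid = len(ordered) // 2
        root = center(pts)
        for half in (ordered[:mid], ordered[mid:]):
            # Connect the centre of mass of the whole set to that of each half.
            edges.append((root, center(half)))
            # Alternate between the two cut directions (assumption).
            split(half, 1 - axis, edges)

    edges = []
    split(list(points), 0, edges)
    return edges


sinks = [(0, 0), (0, 4), (4, 0), (4, 4)]
edges = mmm(sinks)
```

For the four symmetric sinks above, the first edge runs from the overall center of mass (2, 2) to the center of mass of the left half at (0, 2).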

GMA (Geometric Matching Algorithm): Unlike the MMM algorithm, which is a top-down algorithm, GMA works bottom-up.

[Figure: MMM and GMA examples. Cuts are labeled Cut 1, Cut 2, Cut 3; colors progress Black -> Blue -> Red -> Green.]
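The bottom-up direction of GMA can be sketched as follows. Two simplifications are assumptions of this example, not part of the slide: a greedy closest-pair matching stands in for a minimum-cost geometric matching, and the midpoint of each matched pair stands in for a properly computed tapping point. The name `gma_greedy` is hypothetical.

```python
def gma_greedy(points):
    """Bottom-up sketch: pair up nodes level by level, merging at midpoints."""

    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # rectilinear distance

    level = [tuple(map(float, p)) for p in points]
    edges = []
    while len(level) > 1:
        remaining = list(level)
        merged = []
        while len(remaining) > 1:
            # Greedy stand-in for matching: pick the closest remaining pair.
            a, b = min(
                ((p, q) for i, p in enumerate(remaining) for q in remaining[i + 1:]),
                key=lambda pq: dist(*pq),
            )
            remaining.remove(a)
            remaining.remove(b)
            tap = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
            edges.extend([(tap, a), (tap, b)])
            merged.append(tap)
        # An unmatched node (odd count) moves up to the next level unchanged.
        level = merged + remaining
    return level[0], edges


root, edges = gma_greedy([(0, 0), (0, 2), (6, 0), (6, 2)])
```

Each pass halves the number of nodes, so the terminals are merged level by level until a single root remains.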

9

Zero Skew Algorithm

The Zero Skew Algorithm is recursive and bottom-up in nature. This algorithm:

Assumes that the pairing of points has already been done.

Concerns itself with finding the tapping point very accurately, based on the capacitive loading of the clock terminals as well as the delay in the sub-trees.

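[Figure: sub-trees T1 and T2 merged at the tapping point. Each connecting wire carries half of its capacitance (c1/2, c2/2) at either end; C1 and C2 are the sub-tree load capacitances, t1 and t2 the sub-tree delays, and x marks the position of the tapping point along the wire.]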

If zero skew cannot be achieved with the above method, one path must be elongated so that both paths have equal delay. This is called “snaking”.

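Zero-skew condition at the tapping point, where r_i and c_i are the resistance and capacitance of the wire from the tapping point to sub-tree T_i:

$$ r_1\left(\frac{c_1}{2} + C_1\right) + t_1 = r_2\left(\frac{c_2}{2} + C_2\right) + t_2 $$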
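The tapping point and the snaking case can be worked through numerically. The sketch below assumes a uniform wire with resistance `alpha` and capacitance `beta` per unit length, so that a branch of length L has r = alpha * L and c = beta * L. These parameter names, and the function names, are assumptions of this example; solving the zero-skew condition for the fraction x of the wire on the T1 side gives the closed form used in the code.

```python
import math


def zero_skew_merge(length, alpha, beta, cap1, cap2, t1, t2):
    """Merge two sub-trees joined by a wire of the given length.

    alpha, beta: wire resistance and capacitance per unit length.
    cap1, cap2:  load capacitances C1, C2 of the sub-trees.
    t1, t2:      delays inside the sub-trees.
    Returns (len1, len2): wire lengths from the tapping point to T1 and T2.
    """
    x = (t2 - t1 + alpha * length * (cap2 + beta * length / 2)) / (
        alpha * length * (beta * length + cap1 + cap2)
    )
    if 0.0 <= x <= 1.0:
        return x * length, (1.0 - x) * length

    def snake(cap, dt):
        # Solve alpha*L*(beta*L/2 + cap) = dt for the elongated length L.
        return (math.sqrt((alpha * cap) ** 2 + 2 * alpha * beta * dt) - alpha * cap) / (
            alpha * beta
        )

    if x < 0.0:
        # T1 is already slower: tap at T1's root and snake the wire to T2.
        return 0.0, snake(cap2, t1 - t2)
    # T2 is already slower: tap at T2's root and snake the wire to T1.
    return snake(cap1, t2 - t1), 0.0


def branch_delay(wire_len, alpha, beta, cap, t):
    """One side of the zero-skew condition: r*(c/2 + C) + t."""
    return alpha * wire_len * (beta * wire_len / 2 + cap) + t


# Hypothetical values: the tapping point lands on the wire.
l1, l2 = zero_skew_merge(10, 0.1, 0.2, 1, 3, 0.5, 0.0)
# Hypothetical values: T1 is much slower, so the wire to T2 must be snaked.
s1, s2 = zero_skew_merge(10, 0.1, 0.2, 1, 3, 10.0, 0.0)
```

In the second call the returned length on the T2 side exceeds the original wire length, which is the elongation the slide calls snaking.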

10

A Worst Case Tree

11

RHMT

12

Interconnect Topology

Resistance ratio = driver resistance / unit wire resistance

When the resistance ratio is small, interconnect topology optimization is important.

Important metrics: total wire length, radius (longest source-sink path length), diameter (for multi-source nets)

Optimal tree construction algorithms

BRBC (Bounded-radius bounded-cost) algorithm

A-tree algorithm: start with a forest of n single-node A-trees, then repeatedly combine two A-trees into a new one.
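The two single-source metrics named above can be computed directly from a routing tree. This sketch assumes the tree is stored as a child-to-parent map of (x, y) nodes and that wire length is rectilinear; the function name `tree_metrics` and the example coordinates are hypothetical.

```python
def tree_metrics(parent, source):
    """parent maps each node (x, y) to its parent node; the source has no entry.

    Returns (total wire length, radius), where radius is the longest
    source-to-node path length in the tree.
    """

    def seg(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # rectilinear length

    def path_length(node):
        total = 0
        while node != source:
            total += seg(node, parent[node])
            node = parent[node]
        return total

    wirelength = sum(seg(n, p) for n, p in parent.items())
    radius = max(path_length(n) for n in parent)
    return wirelength, radius


# Source at (0, 0); (3, 4) is reached through (3, 0).
parent = {(3, 0): (0, 0), (3, 4): (3, 0), (-2, 0): (0, 0)}
wirelength, radius = tree_metrics(parent, (0, 0))
```

Bounded-radius constructions trade these two numbers against each other: a tree with minimum total wire length can have a large radius, and a shortest-path tree can have a large total wire length.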

13

Recent Approaches in Clock Tree Synthesis

Research in Clock Tree Synthesis Algorithm

Wire-sizing & Parallel Algorithm for zero skew

Reducing Clock Power using Multiple Voltage

Clock Tree Scheduling with Storage Retiming

Research in System Level Design Feature

GALS Clock Scheme

Considering System Level Clock Tree

14

Wire-sizing & Parallel Algorithm for zero skew (1)

[Figure: clock tree showing the resized wire W and its propagation path to the root]

An iterative approach is used. One wire segment is selected and an alternate wire size is tried. To keep the skew of the tree at zero, the sub-tree rooted at the current wire must be re-merged with its sibling.

Assumption: the sibling wire uses the same wire size.

This propagation continues until all the wire segments on the path from the current wire to the root wire are re-merged. When the size of a wire is locally optimized, the effect of the wire-size change is propagated by zero-skew merging to the root of the clock tree.

The lengths of all the wires along the propagation path and of their siblings may change, but their wire sizes remain unchanged.
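One step of this iteration can be sketched for a single pair of sibling wires. The wire model is an assumption of this example (resistance per unit length `r0 / width`, capacitance per unit length `c0 * width`), as are the function names and the numeric values. The sketch covers only one merge; the propagation toward the root, where the extra capacitance of a wider wire would also be felt, is not modeled.

```python
def merge_for_width(width, length, r0, c0, cap1, cap2, t1, t2):
    """Zero-skew re-merge of two sibling wires that share one width.

    Returns (x, delay), where x is the fraction of the wire on the T1 side,
    or None when the tapping point falls off the wire.
    """
    alpha, beta = r0 / width, c0 * width
    x = (t2 - t1 + alpha * length * (cap2 + beta * length / 2)) / (
        alpha * length * (beta * length + cap1 + cap2)
    )
    if not 0.0 <= x <= 1.0:
        return None  # would need snaking; skipped in this sketch
    l1 = x * length
    delay = alpha * l1 * (beta * l1 / 2 + cap1) + t1
    return x, delay


def best_width(candidates, **merge_args):
    """Try each alternate width and keep the one with the smallest delay."""
    results = {}
    for w in candidates:
        merged = merge_for_width(w, **merge_args)
        if merged is not None:
            results[w] = merged
    width = min(results, key=lambda w: results[w][1])
    return width, results[width]


width, (x, delay) = best_width(
    [1.0, 2.0, 4.0],
    length=10, r0=0.1, c0=0.2, cap1=1, cap2=3, t1=0.5, t2=0.0,
)
```

Note how the tapping point moves as the width changes while the total wire length stays fixed, which matches the statement that lengths change but wire sizes along the path do not.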

15

Wire-sizing & Parallel Algorithm for zero skew (2)

[Figure: clock tree with sub-trees of 6, 8, 2, and 16 nodes assigned to processors p0 and p1]

The tree is partitioned into a top part and a bottom part.

Only the nodes in the bottom part are distributed among the processors.

The nodes in the top part are shared among the processors.

Iteration method:

First, each processor does the wire-sizing for the top part (except the root).

The processors then do the wire-sizing for all the wires in the bottom part of the tree in a distributed manner and synchronize the results.

Sub-tree Partition: Assume there are two processors. The sub-trees are not assigned at depth 1, since that would give an unbalanced assignment of 16 nodes and 2 nodes. At depth 2, however, there are 4 sub-trees to distribute.
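A sketch of how the depth-2 sub-trees might be balanced across processors. The greedy rule used here (largest sub-tree first, onto the least-loaded processor) is an assumption of this example rather than the method of the slide, and the sizes 6, 8, 2, and 16 are read from the figure labels on this slide, so their interpretation is also an assumption.

```python
import heapq


def assign_subtrees(sizes, processors):
    """Greedy assignment: largest sub-tree first, to the least-loaded processor."""
    heap = [(0, p) for p in range(processors)]  # (load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, p = heapq.heappop(heap)
        assignment[idx] = p
        heapq.heappush(heap, (load + sizes[idx], p))
    loads = [0] * processors
    for idx, p in assignment.items():
        loads[p] += sizes[idx]
    return assignment, loads


# Four depth-2 sub-trees shared between two processors.
assignment, loads = assign_subtrees([6, 8, 2, 16], processors=2)
```

With these sizes one processor takes the 16-node sub-tree and the other takes the remaining three, so both carry 16 nodes.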

16

Reducing Clock Power using Multiple Voltage

HL Converter: converts the clock signal entering the chip from a high voltage swing to a lower voltage swing.

LL Converter: regenerates the signal and maintains a sharp slew rate as the signal passes through the network.

LH Converter: converts the signal back to the higher voltage swing used by the logic network at the sink FFs.

Alternative: instead of using multiple voltages, use only a reduced-swing clock scheme.

$$ P = f\,C_L\,V_{dd}\,V_s \quad \text{versus} \quad f\,C_L\,V_{dd}^{2} $$
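A quick numeric check of the power expression, reading V_s as the clock voltage swing and f * C_L * V_dd^2 as the full-swing case (V_s = V_dd). The function name and all numeric values are hypothetical.

```python
def clock_power(freq_hz, load_cap_f, vdd, swing=None):
    """Dynamic power f * C_L * V_dd * V_s; full swing when swing is None."""
    v_s = vdd if swing is None else swing
    return freq_hz * load_cap_f * vdd * v_s


# Hypothetical values: 100 MHz clock, 50 pF clock load, 3.3 V supply.
full = clock_power(100e6, 50e-12, 3.3)
reduced = clock_power(100e6, 50e-12, 3.3, swing=1.65)
print(full, reduced, reduced / full)
```

Under this expression the saving is linear in the swing: halving V_s halves the clock power.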

17

Clock Tree Scheduling with Storage Retiming

Retiming improves the speed of a digital circuit by relocating its storage elements while preserving the functionality of the original design.

Clock scheduling achieves the same effect as retiming by introducing skew between the clock signals that control the timing of the storage elements within a circuit.

When the clock skew is zero, the minimum clock period is the longest delay of all the combinational paths in the circuit. So the goal is to balance the longest delays of all the data paths by relocating the registers.

When nonzero clock skew is introduced, the circuit can successfully operate at a clock period equal to the largest difference between the delays of the slowest path and the fastest path between any pair of registers.

[Figure: Left: original circuit. Middle: fastest retimed circuit with zero skew. Right: fastest retimed circuit with nonzero skew.]
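The two clock-period statements above can be compared on a small example. The sketch takes them as stated on the slide and ignores setup and hold margins and any constraints from feedback loops; the register names, delay values, and function names are hypothetical.

```python
def min_period_zero_skew(paths):
    """paths maps (src_reg, dst_reg) to (d_min, d_max) combinational delays."""
    return max(d_max for _, d_max in paths.values())


def min_period_with_skew(paths):
    """Largest (slowest - fastest) path delay over all register pairs."""
    return max(d_max - d_min for d_min, d_max in paths.values())


# Hypothetical two-stage pipeline, delays in ns.
paths = {
    ("R1", "R2"): (2.0, 9.0),
    ("R2", "R3"): (3.0, 5.0),
}
t_zero = min_period_zero_skew(paths)
t_skew = min_period_with_skew(paths)
print(t_zero, t_skew)
```

The gap between the two values is the slack that clock scheduling, or equivalently retiming, tries to recover.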

18

GALS Clock Scheme(1)

Power consumption in the clock tree is now about 50 percent of total power consumption.

From a system design point of view, we must reduce the power consumption of the clock.

Power consumption in the clock of large high-performance VLSIs can be reduced by adopting the GALS (Globally Asynchronous, Locally Synchronous) design style.

A GALS architecture is composed of large synchronous blocks (SBs) which communicate with each other on an asynchronous basis.

By eliminating the global clock, we eliminate a major source of power consumption and a design bottleneck.

The GALS approach is skew tolerant at the global level because it does not depend on a global clock reference for communication. However, gated clocks will cause clock skew when clock frequencies go beyond the GHz range.

19

GALS Clock Scheme(2)

In a GALS architecture, local clocks are required for the SBs.

Using a global reference and PLL

The signal swing can be a fraction of Vdd.

The signal is distributed at a much lower frequency than the highest clock frequency.

No effort is made to carefully design the geometry of the signal to minimize skew.

Restriction: an analog PLL in a noisy digital environment is difficult to design, and a PLL is sensitive to process variations.

Local clock generators based on ring oscillators

The basic ring oscillator consists of an odd number of inverters in a circular chain.

The frequency of the ring oscillator is determined by the propagation time through the chain of inverters.

To change the oscillation frequency, a delay line with controllable capacitors is used.
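The relation between the inverter chain and the oscillation frequency can be sketched with the usual first-order estimate f = 1 / (2 * N * t_p) for N stages of delay t_p each. Modeling the controllable capacitor as an extra RC delay per stage is an assumption of this example, as are the function name, the parameter names, and the numeric values.

```python
def ring_oscillator_frequency(stages, stage_delay_s, r_drive_ohm=0.0, c_ctrl_f=0.0):
    """First-order estimate f = 1 / (2 * N * t_p).

    The optional r_drive_ohm * c_ctrl_f term is an assumed model of the
    extra delay added to each stage by a controllable capacitor.
    """
    if stages < 1 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    t_p = stage_delay_s + r_drive_ohm * c_ctrl_f
    return 1.0 / (2 * stages * t_p)


# Hypothetical values: 5 inverters with 100 ps delay each.
f_base = ring_oscillator_frequency(5, 100e-12)
# Adding 20 fF of control capacitance on a 5 kOhm driver slows each stage.
f_tuned = ring_oscillator_frequency(5, 100e-12, r_drive_ohm=5e3, c_ctrl_f=20e-15)
```

Increasing the control capacitance lengthens the per-stage delay and lowers the local clock frequency, which is how the delay line tunes the oscillator.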

20

Conclusion - Low Power Issues (1)

Power optimization allows logic optimization to simultaneously optimize for timing, area and power. So all the inputs to optimization are the same with the addition of two new power constraints: max dynamic power and max leakage power. A power-optimizing logic optimization system takes as input a gate-level netlist or database, technology library, optional constraints for timing and area, and parasitic information (initially in the form of estimated wireloads, but if backannotation has been done that information will be used). All that's needed in addition for power optimization is to set a power constraint and supply switching activity - the same switching activity used with power analysis.

What you get out of power optimization is a gate-level netlist, optimized to meet all of your constraints. A natural question to ask is: "If optimization at the RTL and Behavioral levels can have a great impact on final power dissipation, why offer a gate-level power optimization capability first?" Over a decade of experience in synthesis and optimization has shown that results at the RTL level depend on the quality of optimization at the gate level. The first commercially successful synthesis products were gate-level timing optimizers, and these paved the way for RTL and Behavioral synthesis systems. In a similar way, gate-level power optimization will pave the way for RTL and Behavioral synthesis for low power.

21

Conclusion - Low Power Issues (2)

Earlier we made the point that analysis precedes optimization. Here we make another general point: just as analysis precedes optimization, optimization precedes synthesis. Or, to put it another way: successful synthesis at higher levels requires successful optimization at lower levels.

22

The Key Terms in Clock Tree Synthesis

Clock buffer: circuit element to isolate and amplify the incoming clock signal.

Clock tree: design technique to achieve balanced delays and loads in the clock buffers.

Gated clock: clock line that can control clock transmission to the operating circuits.

Ground bounce: the change in ground (vss) reference levels due to current in the ground line.

Ground loop: the noise caused in the ground line(s) due to unbalanced IR drops in the ground line.

Insertion delay: the time from the clock pad to the individual flip-flops.

IR drop: the voltage drop caused by the current I through the resistor R.

Jitter: the change in period-to-period timing in a clock signal.

Latency: the time for a clock to become available in the circuit.

Multiphase clock: clocking system with more than one phase; the phases may be overlapping or non-overlapping. Biphase: a clock and its complement. Quadrature: clocks separated by a phase angle of 90 degrees.

PLL: Phase-Locked Loop, a variable frequency generator locked to a source signal.

Skew: the maximum difference in clock arrival time between any two flip-flops.

Slew rate: also called rise time or fall time. The time for a signal to go from one level to the other level.

