A New Characterization Method for Delay and Power...

A New Characterization Method for Delay and PowerDissipation of Standard Library Cells

JOS B. SULISTYO* and DONG S. HA†

Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA

(Received 15 March 2001; Revised 30 January 2002)

A simplified method for characterization of standard library cells based on the linear delay model ispresented in this paper. The linear model is chosen as it allows rapid characterization with a modestnumber of simulations, while achieving acceptable accuracy. All the parameters of cell delays aredefined as 50%-to-50% delays, as distinguished from 50%-to-threshold or threshold-to-50% often usedin commercial tools. We found that the 50%-to-50% definition of delays is more consistent and leads toclosed-form formula. A subset of library cells in a 0.25mm technology was characterized using theproposed technique. A test circuit was subsequently generated and simulated to determine the accuracyof the proposed characterization method. SPICE simulations on the test circuit show that the timingestimations obtained through the proposed method is accurate to within 5.6%, and the power estimationwas accurate to 4.2%, ignoring parasitics on interconnections.

Keywords: Standard cell; Characterization; Timing models; Linear model; Power estimation

INTRODUCTION

Automatic synthesis tools employ standard cell library

based design. Facilitating logic optimization as well as

speed and power estimation for a synthesis tool

necessitates that cell libraries contain accurate infor-

mation on the timing and power dissipation parameters. At

the same time, the modeling of standard cells should be

simple to avoid long processing time. In this paper, we

present a method to characterize library cells in terms of

delay and power dissipation.

Patel proposed a method to characterize cell delay and

capacitance parameters based on a nonlinear lookup table

method, and described a system implementing the method

[7]. Patel’s technique measures the delay as the switching-

to-switching voltage. While the proposed definition may

result in more accurate delay estimation for one kind of

cells, it may lead to an inconsistent definition if a cell

drives another cell of a different type (and hence, different

switching voltage). Cirit proposed a slightly different

method, one which slightly differs in that it assumes the

cell being characterized as a black box (i.e. making no

assumption about its internal structure) [3]. The black-box

approach is largely adopted in our method due to its

simplicity.

A nonlinear delay model based on a lookup table is

generally more accurate, but its complexity slows down

the logic simulation. Moreover, it needs intensive

simulation of cells to obtain lookup table values. By

contrast, the linear delay model, which is simple at the

cost of lower accuracy, alleviates the problem and is often

used for analysis of the synthesized circuit at the logic

level. Since the focus of our work is to devise a simple, yet

effective, delay and power characterization methods, we

employed the linear timing model due to its simplicity and

reasonable accuracy. Our linear delay model is largely

based on Penfield–Rubenstein-slope model described in

Eshraghian and Weste’s book [4]. Readers may refer to

Refs. [4,8] for the basics for cell delay and power

dissipation models employed in our method. Recently,

Auvergne proposed a method to improve the accuracy of

the linear modeling at the cost of higher complexity [1].

All existing power dissipation models including our

model are based on lookup tables. Cells were viewed

mostly as black box entities for our method, except in

the cases of simple combinational cells for which some

input transitions are known to dissipate little internal

energy. This means that, except for such simple cells,

the characterization process will be more or less

exhaustive. This approach is adopted because the

ISSN 1065-514X print/ISSN 1563-5171 online q 2002 Taylor & Francis Ltd

DOI: 10.1080/1065514021000012273

*E-mail: E-mail: [email protected]†Corresponding author E-mail: [email protected]

VLSI Design, 2002 Vol. 15 (3), pp. 667–678

characterization task may not be performed by the

designer of the cells. Additionally, the use of exhaustive

input patterns should improve accuracy. However,

knowledge about the internal structure of a cell allows

the elimination of unnecessary test patterns. For more

complicated cells and macrocells, some techniques to

simplify characterization tables, such as those proposed

by Jou et al. may be necessary [5,6]. Their techniques

are particularly useful for cases where the internal

structures of the cells are known, which is typically the

case with most arithmetic macrocells.

The paper is organized as follows. The second section

presents basics of the linear delay model and power

estimation. Third and fourth sections describe proposed

characterization techniques for timing and power dissipa-

tion, respectively. Fifth section shows experimental results

on the proposed techniques and describes observations

made from the experiments. Sixth section concludes the

paper.

PRELIMINARIES

As previously mentioned, our work focuses on a three-

term linear model, which provides a reasonable tradeoff

between simplicity and accuracy. In a linear model, the

delay of a cell is modeled as a one- to three-term linear

function of the load capacitance and the input ramp time.

The simplest one-term linear model is

Delay ¼ kcCL ð1Þ

where CL is the load driven by the cell. This model is

satisfactory for cells under heavy loading (i.e. the load is

much greater than the input and output capacitance of the

cell). When the cell is not heavily loaded, formula (1) is

inaccurate. Note that when a cell is not driving any loads,

it still exhibits a nonzero delay. Therefore, a two-term

linear model is often used to improve the accuracy of the

delay estimate.

Delay ¼ d0 þ kcCL ð2Þ

where d0 is the delay of the cell under the load-free

condition. The kcCL term is called the transition delay [11]

of the cell. Both of the one- and two-term linear models,

however, ignore the effect of the nonzero input ramp

times. Hence, a three-term linear model is expressed as:

Delay ¼ d0 þ kcCL þ kttr ð3Þ

where tr is the ramp time of the input. The krtr term is

called the slope delay [10].

The kcCL term could be viewed as a RC delay exhibited

by the cell. However, in two- and three-term models, the

RC-delay corresponds to output ramp times rather than

a 50%-to-50% delay of the cell. The d0 term corresponds

to the delay from input reaching 50% to output beginning

to switch. It is proposed in this work that the terms are

redefined to allow all the three delay components be

measured as 50%-to-50% delays.

We equate the 50% point as the cell’s switching point in

our work, although this is not strictly true. Even ignoring

process variations, the switching point depends on the cell

type, and a closed-form formula to calculate switching

points is not always possible. Hence, we employ 50%-to-

50% delays, whose reference points are 50% of the full

swing voltage. While the physical importance of the 50%

value is questionable, this leads to a more consistent

definition of delays. Hereafter, we use “logical delay” and

“50%-to-50% delay” interchangeably. The three import-

ant different delays are defined as follows. We refer to

Fig. 1 in the description of the delays, and the four cells,

Cell1 through Cell4, are identical in the figure.

Intrinsic delay (di): The intrinsic delay is defined as a

logical delay exhibited by a cell. The cell under

consideration does not drive any load; instead, it is driven

by other cells which themselves do not drive any load.

Since the statement is contradictory, it is hypothetical. It is

defined only to guarantee that its value is constant. Figure 1

shows the intrinsic delay of Cell2, which does not drive

any load, but is driven by an identical load-free Cell1

indirectly through the voltage source.

Transition delay (dt): The transition delay is defined as

an additional 50%-to-50% delay occurring when the cell

under consideration drives a capacitive load. Cell3 in

Fig. 1 exhibits the transition delay. For the linear model,

the sensitivity of the delay of a cell to any capacitive load

is characterized by its output resistance, as given in

Eq. (4).

Transition delay ¼ Output resistance

£ Load capacitance ð4Þ

Slope delay (ds): When the less steep output voltage Vo3

in Fig. 1 drives Cell4, it incurs additional delay in addition

to the intrinsic delay and the transition delay; this delay is

defined as the slope delay.

The energy dissipated by a cell is the sum of the static

energy and the dynamic energy. Static energy is dissipated

by a cell when there is no logical transition on any input

port. It is attributed to the leakage current for CMOS

circuits. Dynamic energy, on the other hand, is dissipated

by a cell that is undergoing one or more input transitions.

The dynamic energy could be further divided into

switching energy and internal energy. The switching

energy is attributed directly with the rail-to-rail switching

of the capacitive load at the output of the cell, which is

obtained as (1/2)CLV 2 for each low-to-high or high-to-

low transition. In the expression, CL is the load

capacitance, and V is the supply voltage. The internal

energy is the one dissipated internally, which is the

difference between the dynamic energy and the switching

energy. Note that the output resistance of a cell or

the routing resistance is not considered explicitly in the

J.B. SULISTYO AND D.S. HA668

energy dissipation of a cell. Therefore,

Total energy ¼ Static energy þ Switching energy

þ Internal energy ð5Þ

Since the internal energy dissipation depends on both the

input slope (represented by the input signal transition

delay) and the load capacitance [12], the energy

estimation process should be performed for various values

of the input transition delay and the load capacitance. The

results may be stored in a two-dimensional lookup table.

However, for simplicity, the internal energy is often

lumped and associated at input pins called input energy

and/or at the output pin called the output energy. In such a

case, only one-dimensional lookup table is required for

each pin. Some CAD tools model output energy as a

function of the load capacitance, and we adopt the

approach in our model as well.

Power estimation typically calculates the energy first,

and then divides the energy by the simulation time to

obtain the average power. Hence, the energy dissipation of

a cell is characterized for three parameters: static energy,

switching energy, and internal energy. The static energy

dissipation is different for high and low logic values at the

output, while the internal energy dissipation is also

dependent on rising and falling output transitions.

PROPOSED TIMING CHARACTERIZATION

TECHNIQUE

In this section, we describe the estimation of resistance

and capacitance of pins necessary to model transition and

slope delays. We also present characterization of setup

time and of tristate buffers. Finally, we discuss limitations

of the proposed techniques.

Resistance Estimation

The transition delay dt of a cell is modeled as follows in

the linear model.

dt ¼ Output resistance £ Load capacitance

The linear model gives one single value of the

transition delay. However, the transition delay may be

different for rising and falling transitions. Even for the

same rising or falling transitions, it may be different

depending on input pins. To address the problem, proper

resistance values must be associated with each transition

delay. There needs to be one resistance value for each

pair of an output transition (rising or falling) and an

input pin.

FIGURE 1 Intrinsic, transition, and slope delay.

STANDARD LIBRARY CELLS 669

The physical definition of output resistance, which is

unique and constant for the output pin, is obtained as

Rout ¼ dðItestÞ=dðV testÞjV in¼constant ð6Þ

In contrast, for our model, the resistance is estimated as

Rout ¼ DðDelayÞ=DðCapacitanceÞ ð7Þ

The setup to estimate output resistance is shown in

Fig. 2. It should be noted that several commercial tools

employ separate pairs of rising and falling resistance

values for each combination of input and output pins.

Slope Sensitivity Estimation

A cell exhibits the slope delay ds when it is driven by

another cell with a transition delay. Once a transition

delay is estimated, the output slope delay and input

transition delay are related as given in Eq. (8).

ds ¼ Input signal transition delay £ Slope sensitivity ð8Þ

The slope sensitivity is distinct for each input–output

pair delay, and is obtained from a test input transition and

the resultant output slope. It should be noted that a cell

exhibits both transition and slope delays. The separation is

made only for the mathematical convenience of the

estimation during simulation.

For sequential cells and complex combinational cells,

the requirements that the driving cell must be of the same

type may not hold well. For instance, a flip-flop may be

driven by an inverter or a clock driver.

Capacitance Estimation

Other parameters that need to be characterized are

dynamic input capacitance for all cells and output

capacitance for only tristate cells. The output capacitance

of tristate cells need to be characterized, since tristate

buffers are usually tied together, with several tristate cells

driving the same output node, but with only one, or none at

all, being enabled at any given time. In that case, the

output capacitance of the disabled tristate cells poses a

capacitive load to the enabled tristate cell and hence

increases the transition delay of the enabled cell. Even if

nontristate cells are tied together (such as parallel buffers

driving heavy loads), they are designed to have the same

logic state. The effect of their output capacitance on delays

is absorbed into the intrinsic delay term in Eq. (4).

The capacitance of a port is expressed as:

C ¼ Q=V ¼1

V

ðt

0

iðtÞ dt ð9Þ

where V is the rail-to-rail voltage and i(t ) is the current

flowing into the node. It is necessary to set reasonable

rise/fall time of the applied voltage V. It is set to 0.4 ns for

the 0.5mm technology and 0.2 ns for 0.25mm technology

for our method for both rise and fall times. Most SPICE-

like simulators have measurement statements that perform

an integral operation. Such a measurement statement can

be used to perform the necessary integral operation of

current i(t ) in Eq. (9).

Since the value of the dynamic capacitance may vary

depending on the particular transition, we propose the

measurement be performed for all input transitions

causing output transitions for combinational cells. This

may result in a reasonably pessimistic estimate, i.e. a

larger capacitance value. When an input change causes the

output change, the effective input capacitance increases

slightly due to the Miller effect. For sequential cells,

however, this requirement does not apply, since some

inputs may not affect the outputs immediately (e.g. the

change in the D input of a D flip-flop does not cause the

output change). Further, the dynamic capacitance not only

depends on an input transition, but also the state of the

clock. Therefore, it is prudent that all cases be tested and

the results be averaged.

For instance, we explain the capacitance estimation of a

rising-edge triggered D flip-flop. Four different cases as

given in Table I should be considered for the capacitance

estimation of D input.

The results of Case 1 and Case 2 are averaged to

estimate the input capacitance of the D input under

clock ¼ 0; while the results of Case 3 and Case 4 are

averaged to obtain the input capacitance of the D input for

clock ¼ 1: The final estimate is the larger of those two

cases. Similarly, the capacitance of all the inputs other

than the clock can be estimated. The capacitance for the

clock input is measured for rising clocks for four different

cases of the Q value as described below, and the average

value is taken as the estimate.

i) Q changes from 0 to 1.

ii) Q remains high.

iii) Q changes from 1 to 0.

iv) Q remains low.

FIGURE 2 Estimation of output resistance.

TABLE I Capacitance estimation for D input

Case D input Clock

1 Rising 02 Falling 03 Rising 14 Falling 1


Setup and Hold Time Estimation

A typical definition of the setup time, i.e. the time required

to guarantee the correct operation of a flip-flop, is too

loose to be used for characterization. Instead, we use a

stricter definition of the setup time based on earlier works

[9,11]. The setup time of a flip-flop is the time clearance

between the D input and the clock edge such that any later

arrival of the input signal would incur at least 30% of

additional delay to the reference clock-to-output delay.

The reference clock-to-output delay is the one in which

the input signal is applied sufficiently earlier than the

clock edge. It is possible that the setup time can be a

negative value. Since various synthesis or high-level

timing estimation tools cannot properly deal with negative

setup times, negative setup times are replaced with zero.

The hold time is defined in a similar manner as setup

time, i.e. the clearance after the clock edge for the input

such that if the input comes any earlier, the clock-to-

output delay increases by 30 % or higher above the

reference case. In general, hold time needs to be

characterized only if it is at least as long as the clock-to-

output delay. For master–slave structures including most

of the popular static CMOS implementations, the case

seldom occurs. Hence, the hold time was not characterized

in our work.

The procedure to measure the setup time is as follows.

First, apply the input well ahead of the clock edge and

measure the reference clock-to-output delay. Next, vary

the transaction time of the input until the clock-to-output

delay is 30% longer than the reference delay. The

optimization feature of HSPICE, for instance, could be

used for this purpose. The clock rise/fall time should be

made reasonably slow to guarantee a reasonably

pessimistic estimate. We used rise/fall times of 0.5 ns for

0.5mm technology and 0.4 ns for 0.25mm technology,

respectively, both for the clock and for the D input.

Tristate Buffers and Inverters

Tristate cells are basically characterized in the same way

as other combinational cells except for three consider-

ations, namely enable/disable delays, charge injection,

and output capacitance. The three considerations are

described in the following three sections.

Enable and Disable Delays

Enable and disable delays between the enable signal and

the output are defined as follows:

. Enable-rise delay: Enable signal to the Z-to-0

transition at the output.

. Enable-fall delay: Enable signal to the Z-to-1 transition

at the output.

. Disable-rise delay: Enable signal to the 0-to-Z


. Disable-fall delay: Enable signal to the 1-to-Z


It should be noted that several synthesis tools may not

define disable delays under the assumption that disable

delays are always shorter than enable times. Hence, they

are not observed nor have any effects during the normal

circuit operation.

Charge Injection

The charge injection phenomenon makes a tristate

buffer/inverter difficult to characterize in isolation; rather,

two or more cells need to be characterized while driving

the same node. Consider the tristate buffer and waveforms

shown in Fig. 3. The circuit enclosed by the dashed line in

Fig. 3(b) denotes the tristate output stage. When Ven is

deasserted at 2.5 ns, Vout is left floating, and the charge

accumulated at the channels of QP2 and QN1 are dumped

into the node Vout. Therefore, the voltage Vout exceeds the

rail voltage at 3 ns and cannot be used to characterize a

propagation delay of logic values. Note that voltage levels

of signals should stay within the rails to be used for a logic

delay characterization.

In order to address the problem, we propose a parallel

connection of two buffers which are driven by opposite

logic values as shown in Fig. 4(a). All the five buffers are

identical. One buffer in the circuit is always active, so that

the node Vop is limited to the rail voltage. The remaining

problem is how to obtain the intrinsic delay di of the

buffers. Since a buffer is a load of the other buffer for the

circuit in the figure, the propagation delay D1 measured

from the circuit is the case with one buffer. Consider the

FIGURE 3 Tristate cell exhibiting charge injection condition.


circuit shown in Fig. 4(b), in which the circuit inside the

dashed line is a true input block. When A2 and A3 buffers

switch simultaneously, the effective load is a half of B2

buffer. Let us denote D2 as the propagation delay of the

circuit. The difference D1 2 D2 is the transition delay due

to a half load of B2 buffer assuming the transition delay is

proportional to the load. Under the assumption, the

intrinsic delay (which is the delay without any load) is

obtained as D2 2 ðD1 2 D2Þ or 2D2 2 D1:The above parallel circuits in Fig. 4 can also be used for

resistance estimation. Consider the estimation of the

output resistance Rout, normal on the normal input (i.e. Vip

node in Fig. 4) of an enabled buffer. Suppose that a

capacitive load CL is attached at the output of the circuit in

Fig. 4. Let us denote DL as the propagation delay with the

capacitive load and D1 the propagation delay without

the capacitive load (i.e. the original circuit). Then, Rout

on the normal input is obtained as ðDL 2 D1Þ=CL: We

defer the estimation of output resistance on the enable

input to the next section.

The estimation of input capacitances of tristate buffers

also uses the parallel circuit in Fig. 4. It is important to note

that the input capacitance is estimated only for the enabled

buffer of the parallel circuit, since the disabled one does not

make the output change. The input capacitance estimation

for the normal input Vi of the enabled buffer is the same as

other cells described in the section on “Capacitance

Estimation”. There are four possible cases for the enable

input, namely, enable-rise, enable-fall, disable-rise, and

disable-fall (refer to “Enable and Disable Delays”

Section). The estimated value is the average of the four

capacitances obtained from the four cases.

Output Capacitance

The outputs of tristate buffers/inputs may be tied together,

and inactive ones exhibit output capacitances to the active

one. Therefore, it is necessary to consider the output

capacitance of an inactive buffer/inverter, which is a

unique feature of tristate cells. The estimation procedure is

the same except the voltage source is applied to the output

of an inactive cell under the normal input Vi ¼ 0 and 1.

The output resistances Rout, enable on the enable input

can use the output capacitance obtained above. The

propagation delay for the parallel circuit in Fig. 4(a) is

estimated by applying an enabling input under the normal

input V ip ¼ 0 and V ip ¼ 1: By subtracting intrinsic delay

di (obtained earlier) from the propagation delay, the

transition delay dt is obtained. Further, the output

resistance is computed as Rout; enable ¼ di=Cout; where

Cout is the output capacitance. The disable delay was not

measured in our work, as the disabling process is not

directly observable in terms of logic values. We assume

that the disabling process is sufficiently fast to avoid

conflicts on logic values.

Limitations and Problems

The main problem of the proposed method lies in the slope

sensitivity. We assumed that the slope delay is linear to an

input slope. However, the assumption fails for extreme

cases. For example, if the input waveform is very slow, the

total delay—including slope delay—is negative [8]. In

fact, some cells exhibit negative slope dependency for the

input range in our works.

Another problem arises if the cells being characterized

display a considerable variation in output slope even under

the loadless condition. In such a case, the driving cell

should be ideally an “average” cell instead of the identical

cell. Although it is surmised that the error resulting from

our approach should average out to zero for a large cell

library, it may pose some problem for significantly

different types of cells (such as an inverter drives a six-

input NOR gate).

Finally, the proposed approach becomes complex for

functionally complicated cells with large numbers of

input-to-output paths. In such a case, one may select

some typical transitions. Finding what would constitute

typical transitions, however, is beyond the scope of our

work.

PROPOSED POWER CHARACTERIZATION

TECHNIQUE

There are three parameters relevant to power estimation,

namely input/output capacitance, internal energy, and

static power. Since determination of capacitances is already

performed during timing characterization, we discuss

characterization techniques for the latter two parameters in

this section.

Static Power Estimation

We adopted the following static power estimation method.

The measurement of static power should be performed

long after all transients have settled down. For most cells,

a wait time of 20 ns (after the last input transition) should

be adequate.

FIGURE 4 Delay determination of tristate circuits.


. For a relatively simple combinational cell, the power is

estimated for all possible input combinations, and

averages are calculated for both output high and low. It

is assumed that all inputs are equally likely in this

approach.

. For sequential cells, the estimation is made for both

output high and low, using only “normal mode”

operations—for instance, only the D input and the

clock input for a D flip-flop. The set/reset inputs and

the scan operation should not be used, as they are

inactive during the normal operation.

It should be noted that the static power dissipation is

insignificant unless the circuit is likely to stay dormant for

a long time. Hence a characterization may not be needed at

all for most cells and operating conditions.

Internal Energy Measurement

The internal energy is measured by effecting a logic

transition on a cell’s input and by measuring the resulting

energy dissipation by the cell. Only one input should

change at any given time. For modeling purposes, the

internal energy is lumped into each of the cell’s input and

output pins, instead of the entire cell. For ease of

mathematical analysis, internal energy is divided into two

kinds:

i) input energy, representing the internal energy

dissipated by a cell due to input transitions which

do not cause output changes, and;

ii) output energy, representing the additional internal

energy caused by output transitions.

The input energy is lumped into the input pins, while

the output energy is lumped into the output pins. Hence, if

an input change causes an output change, it will dissipate

not only the input energy (due to the input change), but

also the output energy (because of the output change); and,

if a load capacitance is present, also the switching energy,

(because the output change also causes the charging or

discharging of the load capacitance). However, if the input

change causes no output change, only the input energy is

dissipated.

Internal Energy Characterization for SimpleCombinational Cells

Typically, input energy is measured first, and then the

output energy is obtained by subtracting the appropriate

input energy from the total internal energy. Note that the

total internal energy is obtained from subtracting the

switching energy from total energy dissipation over a

certain period. Also, it may be sufficient to consider only

one input change at any given moment; it may be

unnecessary to consider simultaneous input changes, as

the increase in accuracy is limited. It also complicates the

mathematical model for parameter extraction.

Internal Energy Characterization for Combinationaland Tristate Cells

For simple combinational cells, input energy is negligible;

the cells dissipate substantial dynamic energy only if their

outputs actually change. The internal energy of a cell is a

function of the input signal transition delay and of the load

capacitance. Generally, the dependence is stronger

towards the input slope; however, with some certain

tools, it is more natural or even required to associate the

output energy with the load capacitance. For such cases,

one needs to choose a typical input slope, and then vary the

load capacitance.

For instance, the circuit and waveform in Fig. 5 was

used in our characterization of a three-input NAND gate.

The NAND gates inside the dashed lines degrade the input

transition to represent a typical case, and the shaded

NAND gate is another load of the circuit. We assumed that

a typical load is two. The load capacitance of the cell

being evaluated, meanwhile, was varied between 0 and 15

times the input capacitance. This is an adequately wide

range.

For combinational cells with internal loads (such as a

complementary CMOS multiplexer or an AND-OR-

INVERT gate) or sequential cells, the internal energy

may be dissipated even if no output transition actually

occurs. For such a cell, it is prudent to estimate the average

energy for all possibilities. Therefore, to compute the

internal input energy, we apply all possible input

combinations, or a subset of them that does not change

the output. We then integrate the power dissipation.

To compute the internal output energy, we apply

input combinations which change the output and

measure the energy dissipated until all transients have

settled out, say E. The output energy is obtained by

subtracting the input energy and the switching energy

at the load (1/2)CLV 2 from E.

Tristate cells are characterized in the same way as

complex combinational cells, except we should consider

enable/disable energy and the tristate cells are tied

together. As an example, the circuit and waveforms in

Fig. 6 shows the characterization of noninverting tristate

buffers. The same values of input transition delays and

load capacitances as used in the D flip-flop characteri-

zation was used again here. The total capacitive loads of

the buffers also include those buffers’ own output

capacitances.

Internal Energy Characterization for Sequential Cells

We describe the power estimation of sequential cells. We

consider only one input change at any given time, and treat

the clock as an input.

1. Estimate the clock energy, i.e. the internal energy

associated with the clock change without any

associated output change. The clock energy may

vary for rising and falling transitions and also depends


on the logic values at the outputs. One value of the

clock energy is usually sufficient, using typical

rise/fall-time and typical loads on the flip-flop’s

outputs.

2. Estimate the input energy of synchronous inputs, i.e.

the internal energy caused by input transitions which

do not cause any output change, such as D input for a

D flip-flop. The input energy may have different values

depending on whether the clock is high or low.

3. Estimate the output energy for cases where the output

transition is caused by a clock transition.

4. Estimate the output energy due to asynchronous inputs

such as asynchronous reset or set inputs.

The above sequence was chosen deliberately to avoid

any important energy, such as the clock and the

synchronous input/output, from having negative values.

As the possible orders in the sequence are not unique, the

energy values are also not unique either.

The power characterization of a rising-edge triggered D

flip-flop with asynchronous reset is shown in Fig. 7. The

clock and input slope variations were generated by varying

the load capacitance of inverters at (0–20) £ typical pin

capacitance range, while the load capacitance of the flip-

flop was varied in the (0–10) £ typical pin capacitance

range. The different ranges were chosen since it was

surmised that the quality of the clock signal is likely to be

rather poor in a cell-based design. It is assumed that the

clock period is sufficiently long that the transients decay

within (1/4) £ clock period T from the time that the

stimuli are first effected. Therefore, the toggle energies are

obtained by integrating the flip-flop’s power dissipation

within an integration period of (1/4)T. The sequence used

for our method is as follows:

1. Measurement of the clock energy. The power

dissipation is integrated between nT and ðn þ 1=4ÞT:Four cases of clock fall energy are considered at n ¼ 1,

2, 3, and 4, for DQ ¼ 00, 10, 11, and 01, respectively.

The clock fall energy is the average of the four cases.

Two cases are considered for the clock rise energy,

namely the Q ¼ 0 and Q ¼ 1 at 1.5 T and at 3.5 T, and

the average of the two cases is taken. If one chooses to

simply use the average clock energy, the value used

would be the average of the clock rise and the clock fall

energy. The four time intervals for power measure-

ments are shaded on the D input in the figure.

2. Measurement of the D energy. Four cases, rise/fall

transition on D, under the clock ¼ 0 or 1, should be

considered.

3. Measurement of the Q energy. Both Q rise and fall

energy should be measured. P(xff) is integrated to

obtain E(xff). For instance, the rise energy of Q is

EðQriseÞ2 ð1=2ÞCloadV 2 2 ðrising clock energyÞ;where E(Qrise) is the measured energy dissipation of

the flip-flop.

4. Measurement of the set/reset energy. It should be

measured separately for rising and falling cases. For

the reset operation, it should measure four cases under

Q ¼ 0; 1 and clock ¼ 0; 1. The four durations are gray

in the waveform in Fig. 7. This is also true for the set

operation.

FIGURE 5 Power characterization of a 3-input NAND.


Depending on the structure of the flip-flop, some of the

above measurements may be unnecessary; for instance,

for a rising-edge triggered master–slave D flip-flop, the

master is inactive when the clock is high, and hence input

(D) energy would be 0 when clock is high and need not be

measured. Further, if a lower accuracy is deemed

acceptable, some of those measurements, such as the

reset energy, could be omitted as well.

Limitations of the Power Measurement Technique

The main problem for the proposed approach is the

possibility that some of the energy will be found negative.

For example, the output energy of the flip-flop is

calculated by subtracting the clock energy and load

switching energy from the total energy dissipation of the

flip-flop. While this is conceptually correct, it could result

in negative output energy. Most simulation-based power

estimation tools require that all energy be nonnegative. In

that case, the energy should be replaced with zero, and this

increases the inaccuracy. Another limitation of the

proposed approach is that the accuracy may suffer for

functionally complicated cells due to the lack of a

systematic selection mechanism of test patterns. In that

case, one may use the structure information of a cell to

obtain a reduced set of characterization pattern [2,5,6].

FIGURE 6 Circuit and waveforms for power characterization of tristate buffers.


EXPERIMENTAL RESULTS AND DISCUSSIONS

As previously mentioned in “Limitations and Problems”

Section, a problem with the proposed model is slope sen-

sitivity. The dependence of the slope delay on the input

transition delays is modeled as linear in our method, but it

may not hold well for a wide range of input transition

delay.

The circuit in Fig. 8(a) was used to experiment the

accuracy of the proposed method for slope delays.

The delay of the gray 2-input NAND gate was measured

and compared with the prediction of our model. The input

transition delay was varied by varying CL2 in the 0–50 fF

range, where the typical load of an input pin is about

5 fF. Two different values of CL3, 5 fF for a light load and

50 fF (< fan-out of 10) for a very heavy load were used.

The error is defined as:

Error ¼ ½1 2 ðPredicted delay=Measured delayÞ�

£ 100% ð10Þ

Figure 8(b) and (c) shows the experimental results, the

worst inaccuracy attained for the rise delay is about 7.3%

for the light-load and about 11.6% for the heavy-load. For

fall delay, the accuracy is 6.8% for the light load and 8.9%

for the heavy load. In both cases, the worst-case error was

negative, signifying that the prediction of our model is

somewhat optimistic. However, despite the simplified

assumption of linearity, our method attains a reasonably

good accuracy.

To evaluate the accuracy of our model on real circuits,

we experimented with a circuit of two identical 4-bit

counter in a 0.25mm technology. The circuit was

described in VHDL, synthesized using a library

characterized with our method and then laid out using a

placement and routing tool. Although the two counters are

identical in the VHDL description, the two counters are

slightly different in delay and power dissipation due to

differences in wire loads. Hence, the slower one dictates

the critical path delay. We experimented with three

different versions of the netlists of the circuit. The first one

is the gate level netlist, in which the delay and power

FIGURE 7 Circuit and waveform for power measurement for a rising-edge triggered D flip-flop with high-active reset/set.


dissipation parameters of gates were obtained using the

proposed model. The gate level netlist does not contain

parasitic wire delays. The second one is the transistor

netlist extracted from the layout. The netlist contains

parasitic capacitance on wires. It should be noted that our

tool does not extract parasitic resistance. The third netlist

is the same as the second one except the parasitic.

The critical path delay for the gate level circuit was

given by a commercial synthesis tool. Further, HSPICE

simulations were performed to obtain the critical delays

and power estimation for the two transistor netlists. All

simulations for power estimation of the circuits were

performed for the clock running at 100 MHz. Rise and fall

times of 0.4 ns were used for HSPICE simulation, while

gate level simulations used zero-delay model, as it is the

typical setting for large circuits.

The simulation results on critical path delays and power

estimation is shown in Table II. The critical path delay

based on our model is 1.52 ns, while it is 1.74 ns for SPICE

simulation with capacitive parasitics and 1.44 ns without

capacitive parasitics. Since the gate level netlist based on

our model does not consider the wire load, it may be fair to

compare our model with the HSPICE simulation without

the parasitic capacitances of the wire load. In such a

case, the error of our model is 5.6%, which is fairly

accurate. The accuracy of the proposed model for power

estimation is 17.9% compared with SPICE simulation

with parasitics and is reduced to 4.2% without parasitics.

TABLE II Critical path delay and power estimates

Gate-level netlist Transistor-level netlist with capacitive parasitics Transistor-level netlist without capacitive parasitics

Critical path delay 1.52 ns 1.74 ns (212.6%) 1.44 (5.6%)Power dissipation 543mW 661mW (217.9%) 567mW (24.2%)

FIGURE 8 Prediction error for the proposed delay model.


Again, our model for power estimation is fairly accurate

compared with SPICE simulation without parasitics.

Based on the two experiments, we conclude the following

ones. (i) Our model is fairly accurate for both delay and

power estimation. (ii) Characterization of the wire load

through back annotation is essential to increase accuracy

on both delay and power estimation.

SUMMARY

In this work, we proposed a method to characterize delay

and power dissipation of cells based on the linear timing

model. Our experimental results performed on 4-bit

counters show that the error on the critical delay for the

proposed method is 5.6% compared with SPICE

simulation without considering parasitic capacitance on

wires and the error on power estimation is 4.2%. The

accuracy of our model is sufficiently good to be practical.

The experiment reveals that consideration of parasitic

capacitances through back annotation is sensitive to delay

and power estimation at the gate level. Further research is

necessary in this area.

Acknowledgements

The authors would like to thank the current and former

members of the Virginia Tech VLSI for Telecommunica-

tions (VTVT) Group at the Bradley Department of

Electrical and Computer Engineering, Virginia Polytech-

nic Institute and State University, especially, Han Bin

Kim, Suk Won Kim, Carrie Aust, Meenatchi Jagasiva-

mani, Jia Fei, Nate August, Steve Richmond, and Andrew

Gouldey, for all their help and suggestions along the

course of this work.

References

[1] Auvergne, D., Daga, J.M. and Rezzoug, M. (2000) “Signaltransition time effect on CMOS delay evaluation”, IEEETransactions on Circuits and Systems—I: Fundamental Theoryand Applications 47(9), 1362–1369.

[2] Cheng, M., Irwin, M., Li, K. and Ye, W. (1999) “Powercharacterization of functional units”, Conference Record ofthe Thirty-Third Asilomar Conference on Signals, Systems, andComputers, IEEE 1, 775–779.

[3] Cirit, M.A. (1991) “Characterizing a VLSI standard cell library”,Proceedings of Custom Integrated Circuit Conference, IEEE,25.7.1–25.7.4.

[4] Eshraghian, K. and Weste, N.H.E. (1994) Principles of CMOSVLSI Design: A System Perspective (Addison-Wesley, Reading,MA).

[5] Jou, J.Y., Lin, J.Y. and Shen, W.Z. (1999) “A structure-orientedpower modeling technique for macrocells”, IEEE Transactions onVLSI 7(3), 380–391.

[6] Jou, J.Y., Lin, J.Y. and Shen, W.Z. (1990) “A power modeling andcharacterization method for the CMOS standard cell library”,

Digest of Technical Papers, International Conference on ComputerAided Design, IEEE, 400–404.

[7] Patel, D. (1990) “CHARMS: characterization and modeling systemfor accurate delay prediction of ASIC designs”, Proceedings ofCustom Integrated Circuit Conference, IEEE, 9.5.1–9.5.6.

[8] Shoji, M. (1988) CMOS Digital Circuit Technology (Prentice-Hall,Englewood Cliffs, NJ).

[9] Stojanovic, V. and Oklobdzija, V.G. (1999) “Comparative analysisof master-slave latches and flip-flops for high-performance and low-power systems”, IEEE Journal of Solid-State Circuits 34(4),536–548.

[10] Synopsys Inc., Library Compiler User Guide, Vol. II, ChaptersI–III, 1999.

[11] Unger, S.H. and Tan, C.J. (1986) “Clocking schemes for high-speeddigital systems”, IEEE Transactions on Computers C-35(10),880–895.

[12] Yeap, G.K. (1998) Practical Low Power VLSI design (KluwerAcademic Publishers, Dordrecht).

Dong Sam Ha received the B.S. degree in electrical

engineering from Seoul National University, Korea, in

1974, and the M.S. and Ph.D. degrees in electrical

engineering from the University of Iowa, Iowa City, in

1984 and 1986, respectively.Since Fall 1986, he has been a

faculty member of the Bradley Department of Electrical

Engineering, Virginia Polytechnic Institute and State

University, Blacksburg. Currently, he is Professor with the

department. Prior to his graduate study, he was a Research

Engineer for the Agency for Defense Development in

Korea from 1975 to 1979. While on leave from May to

December of 1996, he was with the Semiconductor

Research Center, Seoul National University, Seoul, Korea,

where he investigated built-in self-test synthesis. Along

with his students, he developed four computer-aided

design tools for digital circuit testing. The source code for

these four tools has been distributed to over 150

universities and research institutions worldwide, and the

tools have been used for various research and teaching

purposes at numerous universities.His research interests

include low-power VLSI design for wireless communi-

cation, low-power analog and mixed-signal VLSI design,

and low-power VLSI system design for wireless video. He

can be reached at E-mail: [email protected].

Jos Budi Sulistyo was born in Jember, Indonesia. He

obtained the B.S. degree in electrical engineering from

Texas A&M University 1993. Following his graduation,

he joined the Indonesian National Atomic Energy Agency,

at which he is currently a staff member. In August 1998,

Jos joined Virginia Tech VLSI for Telecommunications

(VTVT) group at the Bradley Department of Electrical

and Computer Engineering, Virginia Polytechnic Institute

and State University. Currently, he is enrolled in the Ph.D.

program there. His current research interest is develop-

ment of low power library cells including arithmetic units.

He can be reached at E-mail: [email protected].


International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks


A New Characterization Method for Delay and Power...

Documents

Transcript of A New Characterization Method for Delay and Power...