Efficient Adaptive Hold Logic Aging-Aware Reliable NR4SD...

Efficient Adaptive Hold Logic Aging-Aware Reliable NR4SD

Multiplier Design

P.Vishnuteja and Dr. CH. Ravi kumar

M.Tech (VLSI & ES),Department of ECE, Prakasam Engineering college

Professor, Department of ECE, Prakasam Engineering college

[email protected]

Abstaract

Digital multipliers area unit among the most essential arithmetic purposeful devices.

the common performance of those systems depends upon at the turnout of the multiplier

factor. within the meantime, the negative bias temperature instability impact happens

whereas a pMOS junction transistor is beneath negative bias (Vgs = −Vdd), increasing the

edge voltage of the pMOS junction transistor, and reducing multiplier factor pace. an

identical development, positive bias temperature instability, happens once associate

degree nMOS junction transistor is beneath positive bias. every impact degrades junction

transistor pace and within the future, the device may additionally fail because of temporal

arrangement violations. Therefore, it's essential to style dependable high-overall

performance multipliers. during this paper, counsel associate degree aging-aware

multiplier factor model with a novel adaptive hold logic (AHL) circuit. The multiplier

factor is ready to supply higher turnout through the variable latency and should modify the

AHL circuit to mitigate overall performance degradation this is often thanks to the aging

impact. what is more, the planned structure is often applied to a Pre-Encoded NR4SD

multiplier factor.

Keywords: Adaptive hold logic (AHL), Positive bias temperature instability (PBTI),

Negative bias temperature instability (NBTI), Reliable multiplier.

1. Introduction Digital multipliers area unit amongst the most essential arithmetic purposeful units in

several applications, beside the separate cos transforms, digital filtering and Fourier

transform. The turnout of these applications depends on multipliers, and if the multipliers

area unit too gradual, the performance of complete circuits are reduced. Moreover,

negative bias temperature instability (NBTI) happens whereas a pMOS junction transistor

is to a lower place negative bias (Vgs = −Vdd). On this case, the interaction between

inversion layer holes and hydrogen-passivated Si atoms breaks the Si–H bond generated

at some stage within the oxidation method, generating H or H2 molecules. once those

molecules diffuse away, interface traps area unit left. The increased interface traps

between the gate compound interface and also silicon material end in improved threshold

voltage (Vth), decreasing the circuit change speed. once the biased voltage is eliminated,

the alternative response happens, reducing the NBTI impact. However, the reverse

response doesn't get obviate all the interface traps generated for the period of the strain

phase, and Vth is extended within the future. Therefore, it's necessary to style a

dependable superior multiplier factor. The corresponding impact on associate degree

nMOS junction transistor is positive bias temperature instability (PBTI), that happens

while associate degree nMOS junction transistor is to a lower place positive bias. As

compared with the NBTI impact, the PBTI impact is way smaller on oxide/polygate

transistors, and consequently is generally unseen.

International Journal of Research

Volume VIII, Issue V, MAY/2019

ISSN NO:2236-6124

Page No:2371

However, for high-k/metal-gate nMOS transistors with huge rate saddlery, the PBTI

impact cannot be neglected. In reality, it's been shown that the PBTI impact is further

monumental than the NBTI impact on 32-nm high-k/metal-gate processes [1] – [4].

A traditional methodology to mitigate the aging impact is overdesign [5], [6], together

with such matters as guard-banding and gate oversizing; but this approach is often terribly

pessimistic and space and power inefficient. to stay off from this drawback, several NBTI-

aware methodologies were planned. associate degree NBTI-aware generation mapping

technique become planned in [7] to ensure the performance of the circuit throughout its

period.

In [8], associate degree NBTI-aware sleep junction transistor became designed to

reduce the ageing results on pMOS sleep-transistors, and also the period balance of the

power-gated circuits into consideration become improved. Wu and Marculescu [9] planned

a joint logic restructuring and pin rearrangement approach, that is based mostly all on

police investigation useful symmetries and junction transistor stacking effects. in addition,

they planned associate degree NBTI optimization technique that taken into thought path

sensitization [12]. In [10] and [11], dynamic voltage scaling and body-basing techniques

were planned to scale back power or extend circuit life. Those techniques, however, need

circuit amendment or don't provide optimization of distinctive circuits.

Traditional circuits use necessary path delay because the overall circuit clock cycle

with the intention to perform effectively. But the likelihood that the vital ways area unit

activated is low. In most cases, the trail delay is shorter than the vital route. For these

noncritical ways, the employment of the crucial path delay because the general cycle length

can bring on smart sized temporal arrangement waste. Therefore, the variable-latency style

becomes planned to scale back the temporal arrangement waste of ancient circuits. The

variable latency style divides the circuit into 2 parts: 1) shorter ways and 2) longer ways.

Shorter ways will execute effectively in one cycle, whereas longer ways want 2 cycles to

execute. while shorter ways area unit activated typically, the common latency of variable-

latency styles is more than that of ancient styles. As associate degree instance, various

variable-latency adders were projected victimization the hypothesis approach with

blunders detection and healing [13] – [15]. a brief path activation operates set of rules

become projected in [16] to boost the accuracy of the hold logic and to optimize the

performance of the variable-latency circuit. A coaching planning set of rules turned into

projected in [17] to schedule the operations on non-uniform latency sensible devices and

enhance the performance of terribly long instruction word processors. In [18], variable

latency pipelined multiplier factor design with a booth algorithmic rule became projected.

In [19], process-variant tolerant structure for arithmetic units was projected, wherever the

impact of method variation is taken into account to growth the circuit yield. Similarly, the

crucial ways area unit divided into 2 shorter ways that might be unequal and also the clock

cycle is close to the delay of the longer one. Those analysis styles are capable of reduce the

temporal arrangement waste of standard circuits to enhance performance, however they

failed to take into consideration the aging impact and could not alter themselves for the

period of the runtime. A variable-latency adder layout that considers the aging impact

become projected in [20] and [21]. However, no variable-latency multiplier factor style

that considers the aging impact and might alter dynamically has been accomplished.

1.1. Paper Contribution

In this paper, advise associate degree aging aware reliable multiplier factor style with

novel adaptive hold logic (AHL) circuit. The multiplier factor relies at the variable-latency



ISSN NO:2236-6124

Page No:2372

technique and would possibly alter the AHL circuit to realize reliable operation below the

have a control on of NBTI and PBTI results. To be precise, the contributions of this paper

area unit summarized as follows:

1) Novel variable-latency multiplier factor structure with associate degree AHL circuit.

The AHL circuit will verify whether or not or not the enter designs need one or 2 cycles

and might regulate the deciding criteria to form sure that there's minimum according

performance degradation when sizable aging occurs;

2) Complete analysis and distinction of the multiplier’s overall performance underneath

totally different skip numbers to reveal the effectiveness of our projected structure;

3) associate degree aging-aware reliable multiplier factor technique that's acceptable for

large multipliers. Despite the very fact that the take a look at is accomplished in 4-, 8-, 16-

and 32-bit multipliers, our projected design will be effortlessly prolonged to massive

designs;

4) The experimental outcomes show that our projected structure with the sixteen × sixteen

and 32 × 32 Non-Redundant radix-4 Signed Digit (NR4SD) multipliers will acquire

exceptional overall performance development compared with the sixteen ×16 and thirty

two × thirty two Non-Redundant radix-four Signed-Digit (NR4SD) multipliers.

The paper is ready as follows. phase a pair of introduces the overture of the Non-Redundant

radix-four Signed-Digit (NR4SD) multiplier factor and NBTI/PBTI models. phase three

details the aging-aware reliable multiplier factor based totally at the Non-Redundant radix-

four Signed-Digit (NR4SD) multiplier factor. The experimental results and comparisons

area unit equipped in phase four. phase five concludes this paper.

2. Overture

2.1. Modified Booth Algorithm

Modified Booth (MB) may be a redundant radix-4 secret writing technique [22], [23].

Considering the multiplication of the 2’s complement numbers A, B, all consisting of n=2k

bits, B is painted in MB sort as:

B = < bn-1 . . . b0 >2’s= -b2k-122k-1+

22

0

k

i

bi2i

= <bk-1MB…b0

MB>MB=

1

0

k

j

bjMB22j (1)

Digits bjMB ∈ {- 2,-1, 0, +1, +2}, 0 ≤ j ≤ k-1, are formed as follows:

bjMB = -2b2j+1+b2j+b2j-1 (2)

In which b-1 = 0. each MB digit is delineated by victimization the bits s, one and two

(table 1). The bit s suggests if the digit is negative (s=1) or positive (s=0). One indicates if

absolutely the fee of a digit equals 1 (one=1) or not (one=0). two suggests if fully the fee

of a digit equals 2 (two=1) or not (two=0). the employment of those bits, to calculate the

MB digits bjMB as follows:

bjMB= (-1)sj.(onej + 2twoj). (3)

Equations (4) form the MB encoding signals.

sj = b2j+1; onej = b2j-1 ⊕ b2j;

twoj = (b2j+1 ⊕ b2j) ^ ~(onej):



ISSN NO:2236-6124

Page No:2373

Table 1. Modified Booth Encoding

b2j+1 b2j

b2j-1

bjMB sj onej Twoj

0 0 0 0 0 0 0 0 0 1 +1 0 1 0 0 1 0 +1 0 1 0 0 1 1 +2 0 0 1 1 0 0 -2 1 0 1 1 0 1 -1 1 1 0 1 1 0 -1 1 1 0 1 1 1 0 1 0 0

2.2. Non-Redundant Radix-4 Signed Digit Algorithm

In this section have a bent to gift the Non-Redundant radix-4 Signed-Digit (NR4SD)

technique. As in MB form, the vary of partial merchandise is reduced to half. whereas

secret writing the 2’s complement amount B, digits bjNR - take thought of one in all four

values: ∈{-2, -1, 0, +1} or bjNR+ take one in all four values: ∈ {-1, 0, +1, +2} at the

NR4SD- or NR4SD+ formula, severally. solely four specific values ar used and now not 5

as in MB set of rules, which ends up in 0 ≤ j ≤ k-2. As have to be compelled to cowl the

dynamic vary of the 2’s complement kind, the most Brobdingnagian digit is MB encoded

(i.e.,bk-1MB∈{-2,-1, 0, +1, +2).The NR4SD- and NR4SD+ algorithms ar illustrated well

in Figure 1 and 2, severally

(a)

(b)

Figure 1. Block Diagram of the NR4SD-

Encoding Scheme at the (a) Digit and

(b) Word Level.

(a)



ISSN NO:2236-6124

Page No:2374

(b)

Figure 2. Block Diagram of the NR4SD+

Encoding Scheme at the Word Level.

2.2.1 NR4SD- Algorithm

Step 1: ponder the initial values j = 0 and c0=0.

Step 2: Calculate the convey c2j+1 and also the add n+2j of a half of Adder (HA) with inputs

b2j and c2j (Figure 1a).

c2j+1 = b2j ^ c2j; n+2j = b2j ⊕ c2j (4)

Step3: Calculate the completely signed carry c2j+2 (+) and so the negatively signed add n-

2j+1 (-) of a Half Adder* (HA*) with inputs b2j+1 (+) and c2j+1 (+) (Figure 1a). The

outputs c2j+2 and n-2j+1 of the HA* relate to its inputs as follows:

2c2j+2 - n-2j+1 = b2j+1 + c2j+1:

The following Boolean equations summarize the HA*

operation:

c2j+2 = b2j+1 ^ c-2j+1, n-

2j+1 = b2j+1 ⊕ c2j+1.

Step 4: Calculate the value of the bjNR - digit.

bjNR- = -2n-

2j+1 + n+2j. (5)

Equation (5) shows results from the n-2j+1 is negatively signed and n+

2j is completely signed.

Step 5: j: = j + 1.

Step 6: If (j < k-1), then forward to Step two. If (j = k-1), inscribe that the foremost

important worth supported the MB technique and considering the 3 consecutive bits to be

b2k-1, b2k-2 and c2k-2 (Figure 1b). If (j = k), stop.

Table two shows however the NR4SD- digits area unit fashioned. Equations (6) show

however the NR4SD- encryption signals one+j, one-j and two-j of Table two area unit

generated.

one+j = ~ (n-

2j+1) ^ n+2j, one-

j = n-2j+1 ^ n+

2j,

two-j = n-

2j+1 ^ n+2j (6)

The minimum and most limits of the dynamic direct the NR4SD- kind area unit -2n-1 - 2n-

3 - 2n-5 -…. - 2 < -2n-1 and 2n-1 + 2n-4 + 2n-6 +…… + 1 >2n-1-1. The NR4SD- kind has

larger dynamic vary than the 2’s complement kind.

2.2.2 NR4SD+ Algorithm

Step 1: ponder the initial values j = 0 and c0=0.

Step 2: Calculate the carry completely signed worth c2j+1 (+) and also the negatively

signed worth add n-2j (-) of a HA* with inputs b2j (+) and c2j (+) (Figure 2a). The carry

c2j+1 and also the add n-2j of the HA* relate to its inputs as follows:

2c2j+1 - n-2j = b2j + c2j ,

The outputs of the HA* area unit will calculate at gate level within the following equations

as:

c2j+1 = b2j ˅ c2j , n-2j = b2j ⊕ c2j ,



ISSN NO:2236-6124

Page No:2375

Step 3: Calculate the carry c2j+2 and also the add n+2j+1 of a HA with inputs b2j+1 and

c2j+1

c2j+2 = b2j+1 ^ c2j+1 , n+2j+1 = b2j+1 ˅ c2j+1 ,

Step 4: Calculate the worth of the bjNR+ digit.

bjNR+ = 2n+

2j+1 - n-2j (7)

Equation (7) results from the n+2j+1 is completely signed and n-

2j is negatively signed.

Step 5: j := j + 1.

Step 6: If (j < k-1), move to Step two. If (j = k-1), inscribe the foremost important worth

per MB technique and considering the 3 consecutive bits to be b2k-1, b2k-2 and c2k-2

(Figure 2b). If (j = k), stop.

Table three shows however the NR4SD+ digits area unit fashioned. Equations (8) show

however the NR4SD+ encryption signals one+j, one-j and two+j of Table four area unit

generated.

one+j = n+

2j+1 ^ n-2j;

one-j = n+

2j+1 ^ n-2j;

two+j = n+

2j+1 ^ n-2j: (8)

The minimum and most values of the dynamic direct the NR4SD+ kind area unit -2n-1 - 2n-

4 - 2n-6 -……. -1 < -2n-1 and 2n-1+2n-3+2n-5+…. +2 > 2n-1-1.

Table 2. Numerical Examples of the Encoding Techniques

2’s

Complement

10000000 10011010 01011001 01111111

Integer -128 -102 +89 +127

NR4SD− 2̄000 1̄1 2̄¯̄11 ̄2 2 2̄ 2̄1 200 1̄1

NR4SD+ 2̄000 2̄122 1121 200 1̄1

As determined within the NR4SD- encryption technique, the NR4SD+ kind has giant

dynamic selection than the two’s complement kind. pondering the eight-bit 2’s

complement selection N, Table two shows the restriction values -28 = -128, 28 - 1 =127,

and 2 normal values of N, and presents the MB, NR4SD- and NR4SD+ digits that end

product once creating use of the corresponding encryption ways to each value of N taken

into thought. A bar on top of the negatively signed digits so as to tell apart them from the

completely signed ones.

2.3. Pre-Encoded NR4SD Multipliers Design

The device style for the pre-encoded NR4SD multipliers is conferred in Figure 6. 2

bits at the instant area unit keep in ROM: n-2j+1, n+2j (Table 3) for the NR4SD- or n+2j+1,

n-2j (Table 4) for the NR4SD+ kind. On this way, will scale back the storage demand to

n+1 bits according to constant whilst the corresponding memory needed for the pre-

encoded MB theme is 3n/2 bits per constant. Consequently, the number of saved bits is the

image of that of the standard MB style, besides for the utmost widespread digit that desires

an additional bit as its so much MB encoded. Compared to the pre-encoded MB number,

within which the MB encryption blocks area unit unheeded, the pre-encoded NR4SD

multipliers would like extra hardware to get the values of (6) and (8) for the NR4SD- and

NR4SD+ kind, severally. The NR4SD encryption blocks of Figure four places into impact

the electronic equipment of Figure 5.

Partial product is currently given by the relation:



ISSN NO:2236-6124

Page No:2376

P =A.B =COR+

1

1

PPk

j

J22J (9)

COR =

1

0

k

j

Cin,j22J + 2n(1 +

1

0

k

j

22j+1) (10)

Every partial product of the pre-encoded NR4SD- and NR4SD+ multipliers is

dispensed based totally on Figure 3b and 3c, severally, apart from the PPk-1 that

corresponds to the vastest digit. As this digit is in MB type, will use the PPG of Figure 3a

creating use of the sj bit. The partial merchandise, well weighted, and also the correction

period (COR) of (10) square measure fed right into a CSA tree. The input carry cinj of (10)

is calculated as cinj = two-j ^ one-j and cinj = one-j for the NR4SD- and NR4SD+ pre-

encoded multipliers, severally, based totally on Tables two and three. The carry save output

of the CSA tree is eventually summed the usage of a fast CLA adder.

(a)

(b)

(c)

(d)

(e)

Figure 3. Generation of the ith Bit pj,i of PPj for a) Pre-Encoded MB Multipliers, b) NR4SD−,c) NR4SD+Pre-

Encoded Multipliers, and d ) NR4SD−, e) NR4SD+Pre-Encoded Multipliers after reconstruction.

Figure 4. System Architecture of the NR4SD Multipliers.



ISSN NO:2236-6124

Page No:2377

(a)

(b)

Figure 5 Extra Circuit Needed in the NR4SD Multipliers to

Complete the (a) NR4SD−and (b) NR4SD+ Encoding.

Table 3. NR4SD− Encoding

2’s

complement NR4SD−

form

Digit NR4SD-Encoding

b2j+1 b2j c2j c2j+2 n-2j+1

n+ 2j

bjNR- one+

j

one-

j

two- j

0 0 0 0 0 0 0 0 0 0

0 0 1 0 0 1 +1 1 0 0

0 1 0 0 0 1 +1 1 0 0

0 1 1 1 1 0 -2 0 0 1

1 0 0 1 1 0 -2 0 0 1

1 0 1 1 1 1 -1 0 1 0

1 1 0 1 1 1 -1 0 1 0

1 1 1 1 0 0 0 0 0 0

Table 4. NR4SD+ Encoding

2’s complement NR4SD+ form Digit NR4SD+ Encoding

b2j+1 b2j c2j c2j+2 n+2j+1 n-

2j bjNR+ one+

j one- j Two+

j

0 0 0 0 0 0 0 0 0 0

0 0 1 0 1 1 +1 1 0 0

0 1 0 0 1 1 +1 1 0 0

0 1 1 0 1 0 +2 0 0 1

1 0 0 0 1 0 +2 0 0 1

1 0 1 1 0 1 -1 0 1 0

1 1 0 1 0 1 -1 0 1 0

1 1 1 1 0 0 0 0 0 0

3. Proposed Aging-Aware Multiplier Here advocates an aging-aware reliable multiplier factor layout with a unique adaptive

hold logic (AHL) circuit. The multiplier factor relies wholly at the variable-latency

approach and should regulate the AHL circuit to reap reliable operation beneath the have

an effect on of NBTI and PBTI effects.

To be distinctive, the contributions of this enterprise square measure summarized as

follows:

1) Novel variable-latency multiplier factor design with an AHL circuit. The AHL circuit



ISSN NO:2236-6124

Page No:2378

will confirm whether or not the input designs need one or 2 cycles and may modify the

judgement criteria to confirm that there is minimum performance degradation when nice

obtaining aging happens;

2) Comprehensive analysis and comparison of the multiplier’s performance beneath one-

of-a-kind cycle intervals to reveal the effectiveness of our planned structure;

3)AN aging-aware dependable multiplier factor style methodology this can be applicable

for large multipliers. Despite the very fact that the experiment is completed in 16- and 32-

bit multipliers, our planned structure is effortlessly extended to giant designs;

3.1. Proposed Architecture

Figure 6 indicates our planned aging-aware multiplier factor design, which includes 2

m-bit inputs (m may be a positive number), one 2m-bit output, one NR4SD- or NR4SD+

multiplier factor, 2m 1-bit Razor flip-flops [21], and an AHL circuit.

Inside the planned design, the NR4SD multiplier factors could also be tested by the vary

of zero’s in either the multiplicand or multiplicator to expect whether or not the operation

needs one cycle or 2 cycles to complete. whereas enter patterns square measure random,

the quantity of zero’s and one’s within the multiplicator and multiplicand follows a

traditional distribution. Consequently, the employment of the amount of zero’s or one’s

because the judgement criteria leads to similar results. later on, the 2 aging-aware

multipliers could also be applied the employment of comparable structure, and also the

distinction between the two multipliers lies inside the enter signals of the AHL. Razor flip-

flops is wont to find whether or not or not temporal arrangement violations arise before

subsequent input pattern arrives.

Figure 6. planned design (md suggests that

multiplicand; mr suggests that multiplicator)

Figure 7 suggests the data of Razor flip-flops. A 1-bit Razor flip-flop includes a main

flip-flop, shadow latch, XOR gate, and mux. the most flip-flop catches the execution final

result for the mix circuit employing a standard clock signal, and also the shadow latch

catches the execution result the usage of a delayed clock signal, that's slower than the

regular clock signal. If the barred little of the shadow latch is not the same as that of the

most flip-flop, this means the course delay of the present operation exceeds the cycle

amount, and also the main flip-flop catches an incorrect result. If errors arise, the Razor

flip-flop can set the error signal to a minimum of one to give notice the system to re-execute

the operation and give notice the AHL circuit that a mistake has befell. Here use Razor

flip-flops to find whether or not an operation this is thought of to be a one-cycle sample



ISSN NO:2236-6124

Page No:2379

can sure enough finish during a cycle. If now not, the operation is re-executed with a pair

of cycles. Despite the very fact that the re-execution could appear steeply-priced, the

general worth is low as a result of the re-execution frequency is low. additional data for the

Razor flip-flop could also be discovered in [21].

Figure 7. Razor flip flops

The AHL circuit is that the key facet within the ageing-aware variable-latency

multiplier factor. Figure 8 shows the main points of the AHL circuit. The AHL circuit

incorporates an aging indicator, judgement blocks, one mux, and one D turn-flop. The

aging indicator suggests whether or not or not the circuit has suffered tremendous overall

performance degradation thanks to the aging impact. The aging indicator is distributed

during a simple counter that counts quantity of errors over a particular amount of operations

and is reset to zero at the surrender of these operations. If the cycle amount is just too low,

the NR4SD multiplier factor is not able to end those operations effectively, inflicting

temporal arrangement violations. These temporal arrangement violations may well be

stuck by the Razor turn-flops, that generate error alerts. If errors happen frequently and

exceed a predefined threshold, it method the circuit has suffered vital temporal arrangement

degradation because of the aging impact, and also the aging indicator can output sign 1; in

the other case, it's getting to output zero to point the aging impact remains now not

intensive, and no moves are wanted.

Figure 8 Diagram of AHL (md means multiplicand,

mr means multiplicator)

The primary judgement block within the AHL circuit can output one if the quantity of

zeros within the multiplicand (multiplicator) is larger than n, and also the second judgement

block within the AHL circuit can output one if the number of zeros within the number

(multiplicator) is larger than n + 1. They’re every employed to make a decision whether or

not an enter sample needs one or 2 cycles, however handiest one in every of them is chosen

at a time. within the starting, the ageing impact is not vital, and also the aging indicator

produces zero, therefore the first judgement block is employed. when a time, frame



ISSN NO:2236-6124

Page No:2380

whereas the aging impact can become vital, the second judgement block is chosen.

compared with the first judgement block, the second judgement block permits a smaller

variety of patterns to show bent on be one-cycle designs as a result of it needs a lot of zero’s

inside the multiplicand (multiplicator) the information of the operation of the AHL circuit

square measure as follows: whereas an input pattern arrives, each judgement blocks can

confirm whether or not or not the sample needs for one cycle or 2 cycles to finish and pass

each outcomes to the multiplexer.

The multiplexer device selects thought of one among each result supported the output

of the obtaining older indicator. Then an OR operation is accomplished between the top

results of the multiplexer device, and also the Q¯ signal is employed to make a decision

the input of the D flip-flop. once the pattern needs one cycle, the output of the multiplexer

device is one. The !(gating) sign turns into one, and also the input flip flops can latch new

information inside subsequent cycle. on the opposite hand, whereas the output of the

multiplexer device is zero, which suggests that the input sample needs for two cycles to

complete, the logic gate can output zero to the D flip-flop. Consequently, the !(gating)

signal is zero to disable the clock signal of the input flip-flops within the following cycle.

bear in mind that best a cycle of the input flip-flop could also be disabled as a result of the

D flip-flop can latch one within the future cycle.

The overall flow of our planned design is as follows: once input patterns arrive, the

NR4SD multiplier, and also the AHL circuit execute at the same time. In line with the

quantity of zero’s inside the number (multiplicator), the AHL circuit involves a call if the

enter patterns need one or 2 cycles. If the input pattern needs 2 cycles to complete, the AHL

can output zero to disable the clock signal of the flip-flops. Otherwise, the AHL can output

one for normal operations. once the NR4SD multiplier factor finishes the operation, the

result could also be passed to the Razor flip-flops. The Razor flip-flops check whether or

not or not there could also be the course shelve temporal arrangement violation. If temporal

arrangement violations occur, it approaches the cycle amount is not prolonged enough for

the present operation to complete which the execution final result of the multiplier factor

is wrong. consequently, the Razor flip-flops can output a blunder to inform the device that

the trendy operation needs to here execute the employment of cycles to create sure the

operation is correct. during this case, the additional re-execution cycles as a results of

temporal arrangement violation incurs a penalty to universal average latency.

But, our planned AHL circuit will accurately expect whether or not the input patterns

need one or 2 cycles in most instances. solely a couple of input designs could in addition

purpose temporal arrangement variations once the AHL circuit judges incorrectly. during

this scenario, a lot of re-execution cycles did now not turn out sensible sized temporal

arrangement degradation.

In précis, our planned multiplier factor style has three key capabilities. First, its miles

a variable-latency style that minimize the temporal arrangement waste of the noncritical

methods. 2nd, it's able to provide dependable operations even when the aging impact

happens. The Razor flip-flops hit upon the temporal arrangement violations and re-execute

the operations exploitation 2 cycles. Ultimately, our design will modify the share of 1-cycle

patterns to cut back performance degradation because of the aging impact. whereas the

circuit is aged, and lots of errors occur, the AHL circuit uses the second judgement block

to make a decision if an input is one cycle or a pair of cycles.



ISSN NO:2236-6124

Page No:2381

4. Results A simulation conclusion for NR4SD multiplier factor is simulated in a very Xilinx ISE

14.1. These tools can facilitate to analysis its performance and calculate the ability, delay

and space. Snapshot is nothing however every and each moment of the appliance whereas

running shown in Figs. 9 to 10. These snapshots offer the clear read of application

developed. it'll be most helpful to the new peoples to grasp for the longer-term steps.

In the below Table 5-6 4*4,8*8,16*16,32*32 NR4SD multipliers are compared for space

of NR4SD multiplier factor. By observant the Table-5-6 it shows the clear read relating to

range of elements needed to develop explicit NR4SD multiplier factor supported input vary

with and while not AHL circuit.

Table 5. Device Utilization (area) Summary of NR4SD Multiplier with

AHL circuit

Logic Utilization 32-bit 16-bit 8-bit 4-bit

Number of Slice

Flip Flops

226 130 87 58

Number of 4

input LUTs

4608 1,050 272 106

Number of

occupied Slices

2,304 607 180 72

Average Fanout

of Non-Clock

Nets

3.97 3.41 3.15 2.86

Table 6. Device Utilization(area) Summary of NR4SD Multiplier

without AHL circuit

Logic Utilization 32-bit 16-bit 8-bit 4-bit

Number of 4 input

LUTs

2094 553 155 36

Number of

occupied Slices

1246 315 84 19

Number of bonded

IOBs

128 64 32 16

Average Fan-out of

Non-Clock Nets

4.44 4.04 3.60 3.26

Here in Table 7 it shows the time delay intense of NR4SD multiplier with and without

AHL circuit. By perceptive this it shows the vary of delay increasing because of increasing

variety of input ranges. Here the time intense is within the ranges of Nano seconds.

Table 7. Time Delay consuming of NR4SD Multiplier

Input

range

Delay with

AHL circuit

Delay without

AHL circuit

4-bit 8.068ns 12.063ns

8-bit 13.379ns 22.326ns

16-bit 26.755ns 43.790ns

32-bit 40.537ns 77.345ns



ISSN NO:2236-6124

Page No:2382

Figure 9 shows the simulation result of NR4SD multiplier with AHL circuit. Here

Figure 9 (a) to (d) shows 4*4, 8*8, 16*16, 32*32 bit NR4SD multiplier response and Figs

10 shows the simulation result of NR4SD multiplier without AHL circuit. Here Figs 10 (a)

to (d) shows 4*4, 8*8, 16*16, 32*32 bit NR4SD multiplier response. In these figures shows

the input and output responses. Table 8 shows the 4*4, 8*8, 16*16, 32*32 NR4SD

multipliers power analysis with and without AHL circuit.

Table 8. Power analysis of NR4SD Multiplier

Input

range

Power(W)

without AHL

circuit

Power(W) with

AHL circuit

4-bit 0.271 0.209

8-bit 0.343 0.219

16-bit 0.508 0.233

32-bit 0.883 0.306

In the below Table 9-12 the 4*4 ,8*8,16*16,32*32 NR4SD multipliers square measure

compared for Error count supported no.of i/p and no.of Zero’s in number. Here applying

the various variety of inputs that square measure every which way generated and shows

{the variety|the amount|the quantity} of error counts supported number of zero’s gift within

every which way generated input bits.

Table 9. Error count based on no.of i/p and no.of Zero’s in

multiplicand of NR4SD 4 bit multiplier

No of i/p No of 0’s-2 No of 0’s-1

13000 5237 2891

11000 4429 2440

9000 3625 2007

5000 2000 1131


multiplicand NR4SD 8 bit multiplier

No.of i/p 0’s-5 0’s-3 0’s-7

5000 2337 1333 3461

9000 4185 2374 4436

11000 5113 2896 5421

13000 6047 3436 6406


multiplicand NR4SD 16 bit multiplier

No.of i/p 0’s-3 0’s-5 0’s-7 0’s-9 0’s-11 0’s-13

5000 40 467 1432 2162 2416 2422

9000 78 835 2569 3999 4379 4420

11000 102 1030 3151 4761 5359 5420

13000 119 1220 3723 5631 6339 6419



ISSN NO:2236-6124

Page No:2383


multiplicand of NR4SD 32 bit multiplier

No.of i/p 0’s-10 0’s-15 0’s-20 0’s-25

5000 189 1490 2402 2382

9000 334 2730 4322 4381

11000 413 3339 5284 5381

13000 494 3952 6246 6380

5. Conclusions This paper projected associate degree aging-aware reliable multiplier style with the

AHL. The multiplier is in a position to switch the AHL to mitigate overall performance

degradation because of increased delay. remember that additionally to the BTI impact that

will increase semiconductor delay, interconnect to boot has its aging issue, that's remarked

as electromigration. Electromigration happens while this density is high enough to cause

the drift of metal ions on the direction of electron flow. The metal atoms is frequently

displaced when a fundamental measure, and also the geometry of the wires can

modification. If a wire becomes narrower, the resistance and delay of the wire are going to

be dilated, and within the finish, electromigration may additionally cause open circuits.

This drawback is additionally a lot of severe in advanced manner technology as a result of

metal wires square measure narrower, and changes within the wire breadth can cause larger

resistance variations. If the aging results as a results of the BTI effect and electromigration

square measure thought-about put together, the delay and overall performance degradation

are going to be a lot of important. luckily, our projected variable latency multipliers is used

underneath the influence of each the BTI result and electromigration. Similarly, our

projected variable latency multipliers have a lot of less performance degradation as a result

of variable latency multipliers have a lot of less temporal order waste, however typical

multipliers ought to think about the degradation ensuing from each the BTI result and

electromigration and use the worst case delay because the cycle amount.

6. REFERENCES

[1] Ing-Chao Lin, Member, IEEE, Yu-Hung Cho, and YiMing Yang, ”Aging-Aware

Reliable Multiplier Design With Adaptive Hold Logic” IEEE Transactions On Very Large

Scale Integration (VLSI) Systems.

[2] S. Zafar et al., “A comparative study of NBTI and PBTI (charge trapping) in SiO2/HfO2

stacks with FUSI, TiN, Re gates,” in Proc.IEEE Symp. VLSI Technol. Dig. Tech. Papers,

2006, pp. 23–25.

[3] S. Zafar, A. Kumar, E. Gusev, and E. Cartier, “Threshold voltage instabilities in high-k

gate dielectric stacks,” IEEE Trans. Device Mater.Rel., vol5, no.1, pp.45–64, Mar.2005.

[4] H.-I. Yang, S.-C. Yang, W. Hwang, and C.-T. Chuang, “Impacts of NBTI/PBTI on

timing control circuits and degradation tolerant design in nanoscale CMOS SRAM,” IEEE

Trans. Circuit Syst., vol. 58, no. 6, pp. 1239–1251, Jun. 2011.

[5] R.Vattikonda, W.Wang, and Y. Cao, “Modeling and minimization of pMOS NBTI

effect for robust naometer design,” in Proc. ACM/IEEE DAC, Jun. 2004, pp. 1047–1052.

[6] H. Abrishami, S. Hatami, B. Amelifard, and M. Pedram, “NBTI-aware flip-flop

characterization and design,” in Proc. 44th ACM GLSVLSI, 2008, pp. 29–34.

[7] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, “NBTI-aware synthesis of digital

circuits,” in Proc. ACM/IEEE DAC, Jun. 2007, pp. 370–375.

[8] A. Calimera, E. Macii, and M. Poncino, “Design techniqures for NBTItolerant power-



ISSN NO:2236-6124

Page No:2384

gating architecture,” IEEE Trans. Circuits Syst., Exp. Briefs, vol. 59, no. 4, pp. 249–253,

Apr. 2012.

[9] K.-C. Wu and D. Marculescu, “Joint logic restructuring and pin reordering against

NBTI-induced performance degradation,” in Proc. DATE, 2009, pp. 75–80.

[10] M. Basoglu, M. Orshansky, and M. Erez, “NBTI-aware DVFS: A new approach to

saving energy and increasing processor lifetime,” in Proc.ACM/IEEE ISLPED, Aug. 2010,

pp. 253–258.

[11] Y. Lee and T. Kim, “A fine-grained technique of NBTI-aware voltage scaling and body

biasing for standard cell based designs,” in Proc. ASPDAC, 2011, pp. 603–608.

[12] K.-C. Wu and D. Marculescu, “Aging-aware timing analysis and optimization

considering path sensitization,” in Proc. DATE, 2011, pp. 1–6.

[13] K. Du, P. Varman, and K. Mohanram, “High performance reliable variable latency

carry select addition,” in Proc. DATE, 2012, pp. 1257–1262.

[14] A. K. Verma, P. Brisk, and P. Ienne, “Variable latency speculative addition: A new

paradigm for arithmetic circuit design,” in Proc. DATE, 2008, pp. 1250–1255.

[15] D. Baneres, J. Cortadella, and M. Kishinevsky, “Variable-latency design by function

speculation,” in Proc. DATE, 2009, pp. 1704–1709.

[16] Y.-S. Su, D.-C. Wang, S.-C. Chang, and M. Marek-Sadowska, “Performance”

optimization using variable-latency design style,” IEEE Trans. Very Large Scale Integr.

(VLSI) Syst., vol. 19, no. 10, pp. 1874–1883, Oct. 2011.

[17] N. V. Mujadiya, “Instruction scheduling on variable latency functional units of VLIW

processors,” in Proc. ACM/IEEE ISED, Dec. 2011, pp. 307–312.

[18] M. Olivieri, “Design of synchronous and asynchronous variable-latency pipelined

multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 4, pp. 365–376,

Aug. 2001.

[19] D. Mohapatra, G. Karakonstantis, and K. Roy, “Low-power processvariation tolerant

arithmetic units using input-based elastic clocking,” in Proc. ACM/IEEE ISLPED, Aug.

2007, pp. 74–79.

[20] Y. Chen, H. Li, J. Li, and C.-K. Koh, “Variable-latency adder (VL-Adder): New

arithmetic circuit design practice to overcome NBTI,” in Proc. ACM/IEEE ISLPED, Aug.

2007, pp. 195–200.

[21] Y. Chen et al., “Variable-latency adder (VL-Adder) designs for low power and NBTI

tolerance,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 11, pp. 1621–

1624, Nov. 2010.

[22] Z. Huang, “High-level optimization techniques for low-power multiplier design,”

Ph.D. dissertation, Department of Computer Science, University of California, Los

Angeles, CA, 2003.

[23] Z. Huang and M. Ercegovac, “High-performance low-power left-to-right array

multiplier design,” IEEE Trans. Comput., vol. 54, no. 3, pp. 272–283, Mar. 2005.



ISSN NO:2236-6124

Page No:2385

Efficient Adaptive Hold Logic Aging-Aware Reliable NR4SD...

Documents

Transcript of Efficient Adaptive Hold Logic Aging-Aware Reliable NR4SD...