Low voltage logic circuits exploiting gate level dynamic ... · VB Fig. 1. The gate level body bias...

8
Low voltage logic circuits exploiting gate level dynamic body biasing in 28 nm UTBB FD-SOI Ramiro Taco a , Itamar Levi b , Marco Lanuzza a,, Alexander Fish b a Dept. of Computer Science, Modeling, Electronics and System Engineering, University of Calabria, Rende, Italy b Dept. of Electrical Engineering, Bar-Ilan University, Ramat-Gan, Israel article info Article history: Available online 2 December 2015 Keywords: Low-voltage logic design Dynamic body biasing UTBB FD-SOI abstract In this paper, the recently proposed gate level body bias (GLBB) technique is evaluated for low voltage logic design in state-of-the-art 28 nm ultra-thin body and box (UTBB) fully-depleted silicon-on- insulator (FD-SOI) technology. The inherent benefits of the low-granularity body-bias control, provided by the GLBB approach, are emphasized by the efficiency of forward body bias (FBB) in the FD-SOI tech- nology. In addition, the possibility to integrate PMOS and NMOS devices into a single common well con- figuration allows significant area reduction, as compared to an equivalent triple well implementation. Some arithmetic circuits were designed using GLBB approach and compared to their conventional CMOS and DTMOS counterparts under different running conditions at low voltage regime. Simulation results shows that, for 300 mV of supply voltage, a 4 4-bit GLBB Baugh Wooley multiplier allows performance improvement of about 30% and area reduction of about 35%, while maintaining low energy consumption as compared to the conventional CMOSn DTMOS solutions. Performance and energy benefits are maintained over a wide range of process-voltage-temperature (PVT) variations. Ó 2015 Elsevier Ltd. All rights reserved. 1. Introduction With the rapid expansion of power constrained electronic systems, such as battery-powered portable devices [1], energy scavenging sensor networks [2], and devices for biomedical or environmental monitoring [3], the reduction of energy consump- tion has become a main topic in digital circuit design. One of the widely used techniques to achieve reduced energy is to supply cir- cuits with a voltage (V DD ) near or below the transistors’ threshold (V TH ). As the dynamic energy of digital circuits decreases with V DD 2 , such an approach results in a very effective way of saving energy per operation. However, as V DD approaches V TH , performance is significantly sacrificed [4–7]. Furthermore, in nearnsub-threshold operating regions circuits become largely sensitive to process variations and temperature fluctuations [8,9]. Both of these issues need to be addressed to guarantee a wider range of applications for ultra-low voltage (ULV) circuits [8,10] The UTBB FD-SOI technology is emerging as a valid platform to cope with ULV design bottlenecks in the more scaled technology nodes [11–13]. The fully depleted channel of devices in this technology avoids the issue of the random dopant fluctuation, thus reducing the impact of process variability. Moreover, the ultra-thin (<30 nm) buried oxide (BOX) guarantees a good electrostatic con- trol of the channel and allows more effective body biasing with respect to bulk CMOS to be applied [13]. The latter is a key feature of the UTBB FD-SOI technology which improves the benefits of the FBB technique for ULV designs in advanced technologies [12,13]. The FBB technique can be used at different levels of granularity ranging from macro-block level to the transistor level [8–10, 13–18]. At the macro-block level body bias is usually managed by an external component (e.g. a body bias generator/controller as in [13]) thus limiting the area overhead and the control com- plexity. As a drawback, when V TH is reduced (i.e. through FBB) at the block level to compensate for variations and/or to provide a temporary speed boost, leakage power is increased for all the gates in the block, while speed-up would be needed only on timing critical gates. Better energy-delay tradeoffs can be obtained by reducing the body-bias control granularity, at the cost of higher design complexity [10,15,16]. The dynamic threshold voltage MOSFET (DTMOS) logic [9] pro- vides a solution to exploit the FBB technique at the transistor level. Such logic uses transistors with gates directly connected to their bodies. In this way, devices are forward body biased when they are turned ON. On the contrary, they are conventionally body biased in the OFF state. A major limitation for the use of DTMOS devices in triple-well bulk CMOS technologies is that large silicon http://dx.doi.org/10.1016/j.sse.2015.11.013 0038-1101/Ó 2015 Elsevier Ltd. All rights reserved. Corresponding author at: DIMES, University of Calabria, Via P. Bucci 42C, 87036 Arcavacata Di Rende, CS, Italy. E-mail address: [email protected] (M. Lanuzza). Solid-State Electronics 117 (2016) 185–192 Contents lists available at ScienceDirect Solid-State Electronics journal homepage: www.elsevier.com/locate/sse

Transcript of Low voltage logic circuits exploiting gate level dynamic ... · VB Fig. 1. The gate level body bias...

Solid-State Electronics 117 (2016) 185–192

Contents lists available at ScienceDirect

Solid-State Electronics

journal homepage: www.elsevier .com/locate /sse

Low voltage logic circuits exploiting gate level dynamic body biasing in28 nm UTBB FD-SOI

http://dx.doi.org/10.1016/j.sse.2015.11.0130038-1101/� 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding author at: DIMES, University of Calabria, Via P. Bucci 42C, 87036Arcavacata Di Rende, CS, Italy.

E-mail address: [email protected] (M. Lanuzza).

Ramiro Taco a, Itamar Levi b, Marco Lanuzza a,⇑, Alexander Fish b

aDept. of Computer Science, Modeling, Electronics and System Engineering, University of Calabria, Rende, ItalybDept. of Electrical Engineering, Bar-Ilan University, Ramat-Gan, Israel

a r t i c l e i n f o

Article history:Available online 2 December 2015

Keywords:Low-voltage logic designDynamic body biasingUTBB FD-SOI

a b s t r a c t

In this paper, the recently proposed gate level body bias (GLBB) technique is evaluated for low voltagelogic design in state-of-the-art 28 nm ultra-thin body and box (UTBB) fully-depleted silicon-on-insulator (FD-SOI) technology. The inherent benefits of the low-granularity body-bias control, providedby the GLBB approach, are emphasized by the efficiency of forward body bias (FBB) in the FD-SOI tech-nology. In addition, the possibility to integrate PMOS and NMOS devices into a single common well con-figuration allows significant area reduction, as compared to an equivalent triple well implementation.Some arithmetic circuits were designed using GLBB approach and compared to their conventionalCMOS and DTMOS counterparts under different running conditions at low voltage regime. Simulationresults shows that, for 300 mV of supply voltage, a 4 � 4-bit GLBB Baugh Wooley multiplier allowsperformance improvement of about 30% and area reduction of about 35%, while maintaining low energyconsumption as compared to the conventional CMOSn DTMOS solutions. Performance and energybenefits are maintained over a wide range of process-voltage-temperature (PVT) variations.

� 2015 Elsevier Ltd. All rights reserved.

1. Introduction

With the rapid expansion of power constrained electronicsystems, such as battery-powered portable devices [1], energyscavenging sensor networks [2], and devices for biomedical orenvironmental monitoring [3], the reduction of energy consump-tion has become a main topic in digital circuit design. One of thewidely used techniques to achieve reduced energy is to supply cir-cuits with a voltage (VDD) near or below the transistors’ threshold(VTH). As the dynamic energy of digital circuits decreases withVDD

2, such an approach results in a very effective way of savingenergy per operation. However, as VDD approaches VTH,performance is significantly sacrificed [4–7]. Furthermore, innearnsub-threshold operating regions circuits become largelysensitive to process variations and temperature fluctuations [8,9].Both of these issues need to be addressed to guarantee a widerrange of applications for ultra-low voltage (ULV) circuits [8,10]

The UTBB FD-SOI technology is emerging as a valid platform tocope with ULV design bottlenecks in the more scaled technologynodes [11–13]. The fully depleted channel of devices in thistechnology avoids the issue of the random dopant fluctuation, thus

reducing the impact of process variability. Moreover, the ultra-thin(<30 nm) buried oxide (BOX) guarantees a good electrostatic con-trol of the channel and allows more effective body biasing withrespect to bulk CMOS to be applied [13]. The latter is a key featureof the UTBB FD-SOI technology which improves the benefits of theFBB technique for ULV designs in advanced technologies [12,13].

The FBB technique can be used at different levels of granularityranging from macro-block level to the transistor level [8–10,13–18]. At the macro-block level body bias is usually managedby an external component (e.g. a body bias generator/controlleras in [13]) thus limiting the area overhead and the control com-plexity. As a drawback, when VTH is reduced (i.e. through FBB) atthe block level to compensate for variations and/or to provide atemporary speed boost, leakage power is increased for all the gatesin the block, while speed-up would be needed only on timingcritical gates. Better energy-delay tradeoffs can be obtained byreducing the body-bias control granularity, at the cost of higherdesign complexity [10,15,16].

The dynamic threshold voltage MOSFET (DTMOS) logic [9] pro-vides a solution to exploit the FBB technique at the transistor level.Such logic uses transistors with gates directly connected to theirbodies. In this way, devices are forward body biased when theyare turned ON. On the contrary, they are conventionally bodybiased in the OFF state. A major limitation for the use of DTMOSdevices in triple-well bulk CMOS technologies is that large silicon

PUN

VDD

PDN

VOUT

INPUTs

Body BiasGenerator

VBVB

VB

Fig. 1. The gate level body bias (GLBB) technique [14].

0

0.1

0.2

0.3

0.4

0

30n

60n

90n

120nVo

ltage

(V)

Vin Vout Vb

(b)

(a)

Cur

rent

(A)

Time

Ilogic IBBG

Ilogic,leak=92 pAIBBG,leak=11 pA

Fig. 2. Transient behavior for BBG falling output.

0.0

0.1

0.2

0.3

200µ

Volta

ge (V

)

Vin Vout Vb

Ilogic IBBG

(a)

186 R. Taco et al. / Solid-State Electronics 117 (2016) 185–192

area is wasted in order to ensure correct electrical isolationbetween differently body-biased devices [19]. As explained in[20], this issue is alleviated in advanced UTBB FD-SOI technologies.As a further drawback, the input capacitance of a DTMOS logic gateis increased in comparison to that of its static CMOS counterpartwith an adverse impact on both performance and dynamic energyconsumption [14].

Recently, the GLBB technique was proposed [14,17] to over-come the speed and energy limits of ULV DTMOS logic gates intriple-well bulk CMOS designs. Exploiting this technique, thecharge/discharge of the body capacitances does not affect the per-formance of logic gates. Additionally, when input signals switchwithout changing the logic gate state, the body capacitances arenot needlessly charged/discharged. Despite the above benefits,silicon area occupancy of GLBB circuits remains larger thanconventional CMOS solutions albeit essentially reduced comparedto the equivalent DTMOS implementations.

In this paper, for the first time, the GLBB technique is imple-mented and evaluated in 28 nm STM UTBB FD-SOI technology forULV logic design. We exploited the unique feature offered by thetechnology to integrate PMOS and NMOS devices into a commonwell configuration [13,21,22], achieving improvements in termsof both performance and area, for the same energy budget, ascompared to standard CMOS solution.

In order to demonstrate the potential of the suggestedapproach, the GLBB technique was compared to the standardCMOS and DTMOS solutions. For this purpose basic logic gateswere initially laid-out and comparatively characterized. After thatmore complex arithmetic circuits were evaluated under differentrunning conditions at ULV regime. For a 4 � 4-bit Baugh Wooleymultiplier evaluated at VDD = 0.3 V, the suggested approach leadsto a delay reduction of about 30% with respect to a conventionalstatic CMOS design. Such results were obtained while maintainingsimilar energy consumption and at the only expense of about 13%larger area. Significantly better energy (�39%) and area (�34%)results were recorded in comparison to the DTMOS solution whileachieving reduced delay. The above delay and energy benefits aremaintained over a wide range of PVT variations.

The rest of the paper is organized as follows. Section 2 discussesoptimization and main characteristics of ULV GLBB logic gatesdesigned in advanced UTBB FD-SOI technologies. To better evalu-ate our proposal, more complex arithmetic benchmark circuitsare considered and comparatively characterized in Section 3.Finally, Section 4 concludes the paper.

0

50µ

100µ

150µ

Cur

rent

(A)

Time

Ilogic,leak=190 pAIBBG,leak=10 pA

(b)

2. Gate level body bias logic in UTBB FD-SOI technology

This section briefly discusses the inherent properties of theGLBB technique. After that, optimization strategies specific forthe chosen implementation technology are presented and finallythe energy-delay-area properties offered by the suggestedapproach are evaluated in the case of commonly used logic gates.

Fig. 3. Transient behavior for BBG rising output.

2.1. GLBB logic overview

As shown in Fig. 1, the generic GLBB logic gate consists of twocircuit blocks: the logic sub-circuit which is responsible for thelogical functionality and the embedded body bias generator(BBG) which manages the body voltage. The BBG acts as a voltagefollower for the output voltage, while decoupling the bodycapacitances from the output node.

In Figs. 2 and 3 the transient behavior for the input voltage (VIN),the output voltage (VOUT) and the body voltage (VB) is shown for thefalling and rising output transitions, respectively. When VOUT isequal to VDD/0 V, the BBG transfers a high/low voltage on the VB

net, thus preparing the pull-down (pull-up) network for a fasterlogic gate switching. Since the MOSFETs of the switching network(either pull-up or pull-down) are already forward body biasedbefore gate inputs’ arrival, the output transition is largely favoredby a switching current significantly higher in comparison to thecase of conventional CMOS scheme. Speed benefits are alsoexpected in comparison to the DTMOS configuration. Differentlyfrom DTMOS logic gates, the transition of the input signals is notslowed down from the body-induced delay, whereas the capacitiveload seen by the BBG does not constitute a speed bottleneck, sinceVB voltage is always established well before the inputs’ transition.

R. Taco et al. / Solid-State Electronics 117 (2016) 185–192 187

Indeed, by inspecting the transient behavior reported in Figs. 2 and3, it is easy to understand that the RC delay at the output of theBBG represents a benefit since it allows a slower transition forthe body voltage and consequently a faster transition on the logicgate output.

Despite of the above advantages, when equally sized the GLBBlogic gates show somewhat increased leakage current with respectto their conventional CMOS and DTMOS counterparts [14].However, as demonstrated in the following, this issue can be easilymanaged since the boosting action of the BBG allows significantlyhigher performance to be achieved with gate size reduction.

2.2. Design optimization

As clearly shown in Fig. 4a, the layout of a GLBB circuit,implemented in the conventional triple well process, leads to alarge silicon area. A deep n-well layer is needed to shieldn-channel devices from the p-type substrate, thus obtainingp-well regions isolated from the underlying substrate. Each ofthese areas is vertically surrounded by an n-well region to also pro-vide lateral isolation. Note that significant distance constraints

n-wellVout

LOGIC SUBCIRCUIT

p-substra

PUNBBG

PDNp-well isolated

GND

p-well

VDD

n-well Vout

c

LOGIC SUBCIRCUIT

p-substrat

RVT

PDN

PB = 10nm

PUNBBG

p-well

RVT

LVT

GNDLVT

(a)

(b)Fig. 4. Layout strategy of GLBB circuits for conventional t

need to be satisfied to assure the correct electrical isolationbetween differently body-biased devices. This is even worse inthe case of a DTMOS implementation which increases the numberof isolated p/n-well regions (one for each logic gate input). Theunique property of the FD-SOI technology of integrating bothNMOS and PMOS devices into a common well configuration (eithern-well or p-well) [13,21–23] allows the silicon area overhead to besignificantly reduced. As illustrated in Fig. 4b, such a distinctivefeature was utilized in this work to design area-efficient GLBBcircuits. To allow abutment of different GLBB gates, the logic sub-circuits, including regular VTH (RVT) PMOS and low VTH (LVT) NMOSdevices, are implemented on single n-well (SNW) areas. On thecontrary, the BBG sub-circuits, each formed by a LVT PMOS and aRVT NMOS, exploit the single p-well (SPW) option, with thep-well and the p-substrate jointly biased at ground. It is worthemphasizing that the proposed layout strategy allows better areautilization with respect to the opposite configuration (i.e. withthe logic sub-circuit, implemented on the SPW and the BBGembedded in a SNW region). This is mainly because the distanceneeded to isolate different SNW areas is considerably reduced(i.e. more than three times) in comparison to that is required to

Vb

distanceconstraint

te

BBG LOGIC SUBCIRCUIT

n-well

PDN

PUN

p-well isolateddeep n-well

V DD

p-wellGND

VDD

Vb

distanceonstrainte

BBG

p-well

RVT

LVT

LOGIC SUBCIRCUIT

n-well

RVT

PDN

PB = 10nm

PUN

GNDLVT

riple (a) and single (b) well options in UTBB FD-SOI.

188 R. Taco et al. / Solid-State Electronics 117 (2016) 185–192

isolate different SPW regions. Moreover, the adopted solutionallows avoiding the use of deep n-well layer, thus further reducingthe complexity of the physical design.

In our physical design, all the parasitic diodes (i.e. verticalp-sub/n-well and horizontal p-well/n-well junctions) are alwaysmaintained in reverse mode during the circuit operation. Thiscan be easily observed in Fig. 5, which shows the cross section(Fig. 5a) and the adopted physical design strategy (Fig. 5b) in thecase of a GLBB inverter. Since the BBG output voltage is alwayspositive (i.e. GND < Vb < VDD), no significant current can flow intothe substrate.

For a fair comparison, all the DTMOS circuits, discussed in thispaper (except where explicitly stated otherwise), were laid outexploiting the SNW approach (i.e. a single SNW region forcommonly body-biased devices). This way, we have surpassedthe less area efficient solution, including deep n-well layer, to iso-late the differently biased p-well areas from the p-type substrate,commonly biased at ground. To facilitate N/P-MOS balancing inSNW regions for both DTMOS and GLBB designs, the gate lengthof LVT NMOS devices was extended by 10 nm (i.e. poly biasing(PB) [13] of 10 nm was used). The minimum channel length (i.e.Lmin = 30 nm) was used for all the other devices belonging to thecompared circuits.

2.3. Operating characteristics

Three basic logic gates (i.e. NAND2, NOR2 and XOR2) wereinitially considered and laid-out for conventional, triple and singlewell options of the 28 nm STM UTBB FD-SOI technology. For a fair

Table 1Comparison results for basic logic gates (TT process corner, VDD = 0.3 V and T = 27 �C).

Delay (ns) Avg. energyinput signals

NAND2 CMOS (RVT) 5.91 0.43DTMOS (triple well) 9.63 0.91GLBB (triple well) 8.11 0.52DTMOS (SNW) 5.90 0.51GLBB (SNW + SPW) 4.92 0.49

NOR2 CMOS (RVT) 9.59 0.69DTMOS (triple well) 9.52 1.19GLBB (triple well) 9.37 0.72DTMOS (SNW) 8.27 0.81GLBB (SNW + SPW) 7.85 0.61

XOR2 CMOS (RVT) 12.69 0.73DTMOS (triple well) 15.92 1.17GLBB (triple well) 12.35 0.70DTMOS (SNW) 11.45 0.94GLBB (SNW + SPW) 9.17 0.68

Fig. 5. GLBB inverter architecture in UTBB FD-SO

comparison, all the logic cells were sized for similar leakagecurrent. Moreover, RVT CMOS and single well logic gates werephysically designed as 12 tracks (height = 1.2 um) standard cells.Instead, a higher cell height is needed to accommodate the specifictechnology rules for the triple well DTMOS and GLBB logic gates.

Table 1 summarizes the comparative simulation results. It iseasy to observe that triple well implementations lead to large sili-con area occupancy. However, triple well GLBB gates reduce areaoccupancy from 45% (NAND2) to 51% (XOR2) in comparison tothe equivalent DTMOS solutions, while also resulting faster andlower energy hungry. As expected, the single well implementationallows area to be significantly reduced for both DTMOS and GLBBsolutions (about 77% on average). In addition, also better delayand energy results are recorded in comparison to their triple wellcounterparts. For all the considered cells, the single well GLBBgates always demonstrate the best delay results, while maintainingthe lowest levels of energy consumption and competitive siliconarea occupancy. More precisely, a delay reduction ranging from17% to 28% and from 5% to 20% is obtained in comparison to CMOSand single well DTMOS circuits, respectively.

3. Benchmarks and results

In this section we deeply evaluate the efficiency of the GLBBtechnique for ULV design in UTBB FD-SOI by considering twoarithmetic benchmarks in ascending order of complexity. As a firstbenchmark, the GLBB mirror full adder (FA), presented in [24], wasdesigned and post-layout characterized in comparison to thecorrespondent CMOS and DTMOS implementations.

for 1 MHz(fJ)

Avg. leakage current (nA) Height – Width (lm)

0.10 1.20–1.100.11 4.7–2.550.11 4.7–1.400.11 1.20–1.920.11 1.20–1.40

0.10 1.20–1.700.10 4.70–3.620.10 4.70–1.770.09 1.20–2.980.10 1.20–1.77

0.24 1.20–2.850.25 4.7–7.630.25 4.7–2.830.25 1.2–6.270.25 1.20–2.83

I: (a) cross section and (b) design strategy.

Table 2Pull up/pull down width ratio of transistors stacks.

CMOS(W = 240 nm)

DTMOS(W = 380 nm)

GLBB(W = 220 nm)

Tr. Stack Wp/Wn Wp/Wn Wp/Wn

1 4 W/W 4W/W 4.3 W/W2 11W/2.5 W 9W/2W 10W/2W3 16.5 W/4W 13.5 W/3.3 W 15W/3.3 W

0.20 0.25 0.30 0.35 0.40 0.45 0.500.3

0.6

0.9

1.2

1.5

1.8

+54%

-15%

Leak

age

Cur

rent

[nA

]

VDD[V]

GLBB CMOS DTMOS

similar leakage point

Fig. 7. Leakage comparison for FA designs.

0.20 0.25 0.30 0.35 0.40 0.45 0.505x10-1

100

101

102

103

0.25 0.30 0.35 0.400.7

0.8

0.9

1.0

Del

ay (l

og s

cale

) [ns

]

VDD[V]

GLBB CMOS DTMOS

27%22%

4%

Del

ay n

orm

aliz

ed to

CM

OS

data

VDD[V]

11%

Fig. 8. Delay comparison for FA designs.

R. Taco et al. / Solid-State Electronics 117 (2016) 185–192 189

To minimize the capacitive effects on the output node of a givenlogic gate, transistors belonging to the BBGs should be close to theminimum sized. Taking into account such a remark, the minimumsize (W = 80 nm and L = 30 nm) allowed by the chosen design kitwas used for nMOS devices belonging to the BBGs, whereas theBBG pMOS transistors were sized with W = 100 nm and L = 30 nmto assure symmetrical (with respect to VDD/2) low and high BBGoutput voltage transitions. The compared FA circuits were sizedto obtain similar leakage current @ VDD = 0.3 V and T = 27 �C. Forthis purpose, the pull-up/pull-down channel-width ratio of theanalyzed solutions was chosen to obtain comparable strength,while imposing equal width for series-connected transistors.Table 2 presents the width ratio between pull-up and pull-downnetworks for different stacking configurations. The sizing factorW was chosen by iterative simulations to achieve the above men-tioned optimization goal.

Layouts of the FA circuits are shown in Fig. 6(a)–(c) for CMOS,DTMOS and GLBB implementations, respectively. It is worth noth-ing that the sizing strategy and the adopted layout technique allowarea occupancy of the GLBB circuit to be only slightly increased(+11%) to that of the conventional CMOS design. On the other hand,the area saving in comparison to the DTMOS FA is about 35%.

Fig. 7 shows the leakage current (Ileak) versus VDD for the com-pared circuit topologies. Due to the adopted sizing criterion, allthe circuits have similar Ileak for VDD = 0.3 V (about 0.7 nA). Asexpected, both DTMOS and conventional CMOS designs show verysimilar leakage trends with VDD varying, whereas the GLBB circuitappears to be more sensitive to changes in VDD. As VDD drops lowerthan 0.3 V, the reduced sizes of the GLBB circuit begin to show theirbenefits in terms of Ileak.

Fig. 8 illustrates comparative post-layout delay results,obtained by the simulation setup discussed in [24]. By observingthe insert of Fig. 8, it is easy to note that the GLBB techniquereaches the maximum performance advantage for VDD = 0.35 V.

(a)

(b)

(c)

9.8 µm

16.8 µm

10.9 µm

n-w

ell

p-w

ell

Fig. 6. FA layouts for CMOS (a), DTMO

However, speed improvements are demonstrated for the wholeconsidered VDD range.

Fig. 9(a) and (b) compares the average energy per operation(EOP) under two different running scenarios. In the first operating

1.2

µm

1.2

µm

1.2

µm

S (b) and GLBB (c), respectively.

0123456789

0.20 0.25 0.30 0.35 0.40 0.45 0.500

0.51.01.52.02.53.0

GLBB CMOS DTMOS

Ener

gy p

er O

pera

tion

[fJ]

VDD[V]

GLBB CMOS DTMOS

MEP

Fig. 9. Energy comparison for FA designs.

Table 34 � 4-bit Baugh Wooley multiplier characteristics (TT process corner, VDD = 0.3 V andT = 27 �C).

Delay (ns) Energy (worst case operation) (fJ) Area (lm2)

CMOS 403 15.1 408.6DTMOS 306 25.2 703.9GLBB 285 15.3 461.7

190 R. Taco et al. / Solid-State Electronics 117 (2016) 185–192

condition (Fig. 9a), an activity factor (a) of 0.4 and clock cycle time(Tclk) of 40 FO4 (FO4 being the delay of a CMOS inverter drivingfour identical inverters) were considered. In this scenario, whichis typical of ultra-energy efficient microprocessor core [25], theGLBB FA always shows the lowest energy consumption mainlydue to the reduced transistors’ sizes (see Table 2). Fig. 9b illustratesa second scenario with a = 0.1 and Tclk = 100 FO4, which is moretypical for low power VLSI circuits [26]. Under such running condi-tions the impact of the leakage energy considerably increases as itis evident by looking at the minimum energy point (MEP) whichmoves toward higher VDDs [26]. However, even in this disadvanta-geous scenario the GLBB circuit tracks very well (it is even betterfor lower VDDs) the energy consumption of a conventional CMOSdesign.

As a second and more complex benchmark, the 4 � 4-bit BaughWooley multiplier, shown in Fig. 10, was laid out and compara-tively evaluated with equivalent CMOS and DTMOS designs.

4 bit RCA

b[0] b[1] b[2]

a[0]

a[1]

a[2]

a[3]

p[5]p[6]p[7]

Fig. 10. 4 � 4-bit Baugh

The main characteristics of the compared multiplier circuits areprovided in Table 3, for VDD = 0.3 V. The GLBB multiplier reducesdelay of about 29% and 7% with respect to its CMOS and DTMOScounterparts, respectively. Moreover, the GLBB implementationexhibits energy consumption for the worst case delayoperation (Ew.c.o.) very similar to that of the conventional CMOSsolution and reduced of about 40% in comparison to the DTMOSmultiplier. Such results were achieved at the expense of only 13%larger area with respect to the CMOS design, whereas about 34%area is saved in comparison to the DTMOS circuit.

Fig. 11(a)–(c), plots Ew.c.o. and maximum frequency versus VDD,considering three different process-temperature (PT) corners. Thetypical case PT corner involves typical N/PMOS transistors and anoperating temperature of 27 �C. To cover a wide range of possibleoperating conditions, the second and the third PT corners involveslow N/PMOS @ T = 0 �C and fast N/PMOS @ T = 100 �C, respec-tively. In all the simulated conditions, the GLBB approach assuresthe highest operation frequency while showing very competitiveEw.c.o. results.

ULV circuits are usually very sensitive to random process vari-ability [26]. For this reason, the tolerance to intra-die variationswas analyzed for different PT corners. Fig. 12(a)–(c) illustratesthe Ew.c.o. versus delay spreads obtained from a 1 K-point Monte-Carlo simulation performed for VDD = 0.3 V. Mean (l) and standarddeviation (r) values are also reported in Fig. 12. As expected, theGLBB approach, leads to the best mean delay values for all theevaluated PT corners, while it exhibits larger delay variability(evaluated in terms of r/l) mainly due to the use of smallerdevices. The mean energy values of the GLBB multiplier are closely

ci

co so

si

a

b+

FA

ci

co so

si

a

b

b[3]

p[0]

p[1]

p[2]

p[3]

p[4]

NAND + FA

AND + FA

+FA

Wooley multiplier.

1.1 1.5 2.0 2.5 3.0 3.5

6

8

10

12

14

16

18

20

22

0.25 0.30 0.35 0.40 0.45 0.50

16

18

20

22

24

26

28

30

25 30 35 40 45 50 55550

600

650

700

750

800

850

Ener

gyw

.c.o (@

Tcl

k = 8

0 FO

4) [f

J]En

ergy

w.c

.o (@

Tcl

k = 8

0 FO

4) [f

J]En

ergy

w.c

.o (@

Tcl

k = 8

0 FO

4) [f

J]

Delay [ns]

Fig. 12. Energy-delay Monte Carlo comparison results for VDD = 0.3 V: SS, 0 �C (a);TT, 27 �C (b) and FF, 100 �C (c).

0.20 0.25 0.30 0.35 0.40 0.45 0.5010k

100k

1M

10M

100M200M

SS ; T=0°C CMOSDTMOSGLBB

VDD[V]

Freq

uenc

y (lo

g sc

ale)

(Hz)

10

20

30

40

50

Ene

rgy w

.c.o

(@ T

clk

= 80

FO

4) [f

J]

0.20 0.25 0.30 0.35 0.40 0.45 0.50180k

1M

10M

100M

300M CMOSDTMOS

GLBB

VDD[V]

Freq

uenc

y (lo

g sc

ale)

(Hz)

10

20

30

40

50

60TT ; T=27°C

Ene

rgy w

.c.o

(@ T

clk

= 80

FO

4) [f

J]

0.20 0.25 0.30 0.35 0.40 0.45 0.501M

10M

100M

600M

VDD[V]

Freq

uenc

y (lo

g sc

ale)

(Hz)

0.0

500.0

1.0k

1.5k

2.0k

2.5k

3.0k

3.5k

4.0k

(c)

(b)

CMOSDTMOSGLBB

FF; 100°C

Ene

rgy w

.c.o

(@ T

clk

= 80

FO

4) [f

J]

(a)

Fig. 11. Energy-Frequency Multiplier Comparison Results: SS, 0 �C (a); TT, 27 �C (b)and FF, 100 �C (c).

R. Taco et al. / Solid-State Electronics 117 (2016) 185–192 191

similar to that of the equivalent CMOS design with reduced energyvariability in almost all the cases.

The impact of process variability on FDSOI CMOS circuits atlow voltage can be effectively mitigated by exploiting backbiasing. Usually, this approach requires an additional circuitryto adjust the levels of body bias voltages for pMOS and nMOSdevices to the optimum values [13]. To limit the overall designcomplexity, the proposed technique avoids the use of acentralized body bias generator. As a counter effect, the possibleoptions to deal with the effect of variations are restricted to thegeneration of only positive body voltages (i.e. FBB) whose valuescould be modulated by varying the supply voltage of the BBGsub-circuits.

4. Conclusions

In this paper the GLBB technique was evaluated in ULV regimeexploiting an advanced UTBB FD-SOI technology. The single wellflavors allowed by the technology permit to significantly reducethe area penalty of low-granularity body-biasing voltage control.Additionally, the higher efficiency of FBB techniques in UTBB FD-SOI technologies emphasizes the inherent performance and energycharacteristics of the GLBB approach. As a result, the GLBB tech-nique was shown to be superior for ULV designs in advanced UTBBFD-SOI technology nodes. This was demonstrated by comparingGLBB logic gates and arithmetic circuits to their conventionalCMOS and DTMOS counterparts. Post layout results have shown

192 R. Taco et al. / Solid-State Electronics 117 (2016) 185–192

high energy efficiency, smallerncomparable silicon area occupancyand higher performance with respect to the DTMOSnCMOSsolutions. Such benefits are maintained over a wide range of PVTvariations.

Acknowledgement

The authors would like to thank Dr. Philippe Flatresse fromSTMicroelectronics for valuable discussions.

References

[1] Chong K-S, Gwee B-H, Chang JS. Energy-efficient synchronous-logic andasynchronous-logic FFT/IFFT processors. IEEE J Solid-State Circuits Sep.2007;42(9):2034–45.

[2] Nazhandali L et al. Energy optimization of subthreshold-voltage sensornetwork processors. In: Proc ISCA; 4–8 June 2005. p. 197–207.

[3] Hanson S et al. Energy optimality and variability in subthreshold design. In:Proc IEEE ISLPED; Oct. 2006. p. 363–65.

[4] Wang A, Chandrakasan A. A 180-mV subthreshold FFT processor using aminimum energy design methodology. IEEE J Solid-State Circuits Jan. 2005;40(1):310–9.

[5] Zhai B et al. A 2.60 pJ/inst subthreshold sensor processor for optimal energyefficiency. In: IEEE Proc of symposium on VLSI; June 2006. p. 154–55.

[6] Rabaey J et al. Ultra-low-power design—The Roadmap to disappearingelectronics and ambient intelligence. IEEE Circuits Dev Mag 2006;22(4):23–9.

[7] Hanson S et al. A low-voltage processor for sensing applications with picowattstandby mode. IEEE J Solid-State Circuits Apr. 2009;44(4):1145–55.

[8] Hanson S et al. Exploring variability and performance in a sub-200 mVprocessor. IEEE J Solid-State Circuits Apr. 2008;43(4):881–91.

[9] Soeleman H, Roy K, Paul B. Robust subthreshold logic for ultra-low poweroperation. IEEE Trans Very Large Scale Integr (VLSI) Syst Aug. 2001;9(1):90–9.

[10] ZhaoW, Ha Y, Alioto M. Novel self-body-biasing and statistical design for near-threshold circuits with ultra energy-efficient AES as case study. IEEE TransVery Large Scale Integr (VLSI) Syst Aug. 2015;23(8):1390–401.

[11] Abouzeid F et al. 28 nm CMOS, energy efficient and variability tolerant,350 mV-to-1.0 V, 10 MHz/700 MHz, 252 bits Frame Error-Decoder. In: ProcESSCIRC; Sept. 2012.

[12] Reyserhove H, Reynders N Dehaene W. Ultra-low voltage datapath blocks in28 nm UTBB FD-SOI. In: IEEE Asian solid-state circuits conference; Nov. 2014.p. 49–52.

[13] Jacquet D et al. A 3 GHz Dual Core Processor ARM CortexTM-A9 in 28 nm UTBBFD-SOI CMOS With Ultra-WideVoltage Range and Energy EfficiencyOptimization, IEEE. J Solid-State Circuits Apr. 2014;49(4).

[14] Corsonello P, Lanuzza M, Perri S. Gate-level body biasing technique for highspeed sub-threshold CMOS logic gates. Int J Circ Theor Appl Jan. 2014;42(1):65–70.

[15] Meijer M, de Gyvez JP. Body-bias-driven design strategy for area- andperformance-efficient CMOS circuits. IEEE Trans Very Large Scale Integr(VLSI) Syst Jan. 2012;20(1):42–51.

[16] Kakoee MR, Benini L. Fine-grained power and body-bias control for near-threshold deep sub-micron CMOS circuits. Emerg Sel Topics Circuits Syst June2011;1(2):131–40.

[17] Albano D, Lanuzza M, Taco R, Crupi F. Gate-level body biasing for subthresholdlogic circuits: analytical modeling and design guidelines. Int J Circ Theor ApplNov. 2015;43(11):1523–40.

[18] Mostafa H, Anis M, Elmasry M. A novel low area overhead direct adaptive bodybias (D-ABB) circuit for die-to-die and within-die variations compensation.IEEE Trans Very Large Scale Integr (VLSI) Syst Oct. 2011;19(10):1848–60.

[19] Hwang M-E Roy K. A 135 mV 0.13 lW process tolerant 6T subthresholdDTMOS SRAM in 90nm technology. In: Proc IEEE CICC; Sept. 2008. p. 419–422.

[20] Le Coz J et al. DTMOS power switch in 28 nm UTBB FD-SOI technology. SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S); Oct.2013. p. 1–2.

[21] Jean-Philippe N et al. Multi-Vt UTBB FDSOI device architectures for low-powerCMOS circuit. IEEE Trans Electron Dev Aug. 2011;58(8):2473–82.

[22] Taco R, Levi I, Fish A, Lanuzza M. Exploring back biasing back biasingopportunities in 28 nm UTBB FD-SOI technology for subthreshold digitaldesign. IEEE 28th Convention of Electrical & Electronics Engineers in Israel(IEEEI); 2014.

[23] Pelloux-Prayer B et al. Fine grain multi-VT co-integration methodology inUTBB FD-SOI technology. IFIP/IEEE 21st International Conference on VeryLarge Scale Integration (VLSI-SoC); Oct. 2013. p. 168–173.

[24] Lanuzza M, Taco R Albano D. Dynamic gate-level body biasing for subthresholddigital design. Circuits and Systems IEEE LASCAS; Feb. 2014. p. 1–4.

[25] Jeon D, Seok M, Chakrabarti C, Blaauw D, Sylvester D. A super-pipelined energyefficient subthreshold 240 MS/s FFT core in 65 nm CMOS. IEEE J Solid-StateCircuits Jan. 2012;47(1):23–34.

[26] Alioto M. Ultra-low power VLSI circuit design demystified and explained: Atutorial. IEEE Trans Circuits Syst I (TCAS I). 2012;59(1):3–29.