Design of High Performance Arithmetic

Circuits using Novel Two Transistor

(2T) XOR Gates

Thesis submitted in partial fulfilment

of the requirements for the degree of

Master of Science by Research

In

Electronics and Communication Engineering

by

Himani Upadhyay

201232697

International Institute of Information Technology, Hyderabad

(Deemed to be University)

Hyderabad-500032, INDIA

October 2015

Copyright© Himani Upadhyay, 2015

All Rights Reserved

Dedicated to my Guide, Family and Friends

INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY

HYDERABAD, INDIA

CERTIFICATE

This is to certify that the work presented in this thesis, titled “Design of High Performance

Arithmetic Circuits using Novel Two Transistor (2T) XOR Gate” by Himani Upadhyay

(201232697) submitted in partial fulfilment for the award of the degree of Master of Science

(by Research) in VLSI & Embedded Systems, has been carried out under my supervision and

it is not submitted elsewhere for a degree.

Date Advisor: Prof. S R Chowdhury

Assistant Professor

IIIT, Hyderabad

Acknowledgements

The journey through the completion of this dissertation has been an amazing one. I would like

to take this opportunity to acknowledge and appreciate the efforts of the people who have

helped me during my research and documenting this thesis. I extend my deepest gratitude to

my advisor, Prof. Shubhajit Roy Chowdhury for his motivation, guidance, support and

immense knowledge which played a great role during development of ideas in the thesis. I

could have never imagined getting a better advisor and mentor for my Master’s study. His

valuable feedback and flexible nature led to improvements in different aspects of the work and in its approach.

I would also like to thank my colleagues in the CVEST lab for extending their support and for being wonderful company. I am deeply grateful to my M. Tech friends Pankaj, Rutanshu, Rachit, Aswathy, Anamika, Rajeev and many more who have always been there to encourage and support me throughout my work.

I would like to acknowledge the love and support of my family, who have always been there through the thick and thin of my degree. I thank my mother Vindvasini Upadhyay and father Narendra Prakash Upadhyay for their selfless love, infinite trust and true confidence in me. I thank my sisters Aekta and Shivangi for their encouragement during my study. I would also like to thank my brother Himanshu for the strong will he builds in me. With that, I hope to work well and with pride in my future life.

I offer my thanks from the deepest part of my heart to GOD for filling my heart and soul with

inner enthusiasm, spirit of hard work and patience to complete my research.

Abstract

Since the transistor was invented in 1947, low area, low power and high speed have been fundamental concerns for researchers in transistor-based technology. Over the last few years, minimization of power consumption has emerged as a key design constraint in very large scale integrated (VLSI) circuit design, owing to the growing demand for portable consumer electronic products. Mobile phones, smart cards, PDAs and assistive listening devices such as hearing aids are examples of such products. Many design styles addressing low power consumption exist in the literature, including complementary MOS, pass transistor logic and transmission gate based logic. Low power designs can be developed at the system, technology, architectural and circuit levels. For a combinational circuit, power is saved by a proper choice of logic style, because the important parameters governing power dissipation (switching capacitance, transition activity and short circuit currents) are strongly influenced by the logic style of the circuit. Improving the logic style brings advantages in terms of power, delay and layout implementation.

XOR gates and Full adders are the basic building blocks of various circuits like Central

Processing Unit (CPU) and Digital Signal Processors (DSP). So, optimization of XOR and full

adder in terms of power consumption will let us achieve low power circuits. This thesis presents

a novel design of two transistor (2T) XOR gate and its application to design an 8 bit x 8 bit

multiplier. The design explores the essence of suitably biasing the two PMOS pass transistors

and engineering the threshold voltage of the PMOS transistors. Using the 2T XOR gates, a six

transistor (6T) full adder has been realised. Detailed simulations have been carried out to

compare the proposed 2T XOR gate and 6T full adder against the existing XOR gates and full

adders available in literature with respect to power delay product (PDP), noise margin and area.

Significant improvements in PDP have been achieved with the 2T XOR gate with respect to the existing XOR gates. The area of the 6T adder has been found to be lower than that of the 8T adder, confirming that the 2T XOR gate occupies less silicon area than the 3T XOR gate.

The thesis also presents the architectures of 5:3 compressor designs for low power

multiplication purposes. The architecture utilizes the novel two transistor XOR gates and two

transistor multiplexer design for logic level implementation. Compared with the conventional designs, the modified and proposed compressor designs reduce the stage delay, transistor count, power delay product (PDP), energy delay product (EDP) and area by combining XOR-XNOR gates, MUX circuits and transistor level implementation.

An 8 bit x 8 bit array multiplier has also been implemented using the design of 6T adder and

its performance has been analysed and compared with similar multipliers designed with peer

adder designs available in literature. The power delay product (PDP) of the proposed multiplier

has been found to be as low as 1.854 pJ using the UMC 65-nm CMOS process. The design of the 8 bit x 8 bit multiplier has been extended to an 8 bit multiply-accumulate (MAC) unit, which has been simulated using the 65-nm CMOS process. A delay of 3.977 ns and a power dissipation of 1.107 mW have been obtained with the MAC unit.

All the circuit simulations in this thesis have been carried out systematically. To validate the applicability and accuracy of the transistor level models, process, voltage and temperature variation analysis has been performed. The proposed designs are better suited to low frequency (≤ 50 MHz) applications. The Cadence Spectre simulation tool has been used from the schematic design of the structures through to layout. ASSURA, a design verification suite of tools within the Virtuoso custom design platform, has been used for layout purposes. Simulation studies have been carried out in UMC 65-nm, 90-nm and 130-nm technologies to confirm that the proposed model holds across technology nodes.

Contents

Contents
List of Tables
List of Figures
List of Symbols and Equations

1. Introduction
    1.1. The Historical Perspective
    1.2. Prior Works
        1.2.1. XOR Gates
        1.2.2. Adders
        1.2.3. Compressors
        1.2.4. Multipliers
    1.3. Motivation
    1.4. Problem Statement
    1.5. Contribution of the Thesis
    1.6. Thesis Organization
2. Design of 2T XOR Gate
    2.1. What is a XOR Gate?
    2.2. Design of Two Transistor XOR Gate
        2.2.1. Working of 2T XOR Gate
        2.2.2. Simulation and Performance Analysis of Proposed 2T XOR Gate
        2.2.3. Results and Discussions
3. Design of 6T Adder using Novel 2T XOR Gates
    3.1. What is an adder?
        3.1.1. Half Adders
        3.1.2. Full Adders
    3.2. Design of Proposed Six Transistor Full Adder
        3.2.1. Simulation and Performance Analysis of Proposed 6T Full Adder
        3.2.2. Layout Design of Proposed Six Transistor Adder
        3.2.3. Results and Discussions
4. Design of 5:3 Compressor using Novel 2T XOR Gates
    4.1. What are Compressors in VLSI Design?
    4.2. Architecture of Proposed 5:3 Compressors
        4.2.1. Circuit Design of Proposed 5:3 Compressors
        4.2.2. Simulation and Performance Analysis of Proposed 5:3 Compressor Architectures
        4.2.3. Layout Design of Proposed 5:3 Compressor Architectures
        4.2.4. Results and Discussions
5. Design of 8 Bit x 8 Bit Multiplier using Novel 2T XOR Gates
    5.1. What is a Multiplier?
        5.1.1. Multiplication Algorithm
    5.2. Design of Proposed 8 Bit x 8 Bit Multiplier
        5.2.1. Array Multiplier
        5.2.2. Simulation and Performance Analysis of Proposed 8 Bit x 8 Bit Multiplier
        5.2.3. Layout Design of Proposed 8 Bit x 8 Bit Multiplier
        5.2.4. Overview of Multiply and Accumulate (MAC) Unit
        5.2.5. True Single Phased Clocked Register (TSPCR)
        5.2.6. Results and Discussions
6. Conclusions
    6.1. Summary of present work
    6.2. Limitations of thesis work
    6.3. Future work

List of Publications
Bibliography

List of Tables

1. Time and Area Requirements of Different Types of Adders
2. Simulation Logic Levels of 2T XOR Gate at Reverse Bias of 320 mV using 65-nm Technology
3. Comparison of Performance Analysis of Different XOR Gates
4. Comparison Result of Noise Margin of Different XOR Gates
5. Truth Table for Half Adder
6. Truth Table for Full Adder
7. Comparison of Performance Analysis of Different Adders
8. Comparative Study of Area of Different Adders
9. Counter Property of 5:3 Compressors
10. Comparative Analysis of Performance of Different 5:3 Compressors
11. Comparative Study of Area of Different 5:3 Compressors
12. Performance Analysis of 8 Bit x 8 Bit Multiplier using Different Adders
13. Comparative Study of Area of 8 Bit x 8 Bit Multiplier using Different Adders

List of Figures

1.1 Earlier Inventions
    1.1(a). Diagrammatic Representation of IGFET
    1.1(b). The 4004 Microprocessor [2]
    1.1(c). The 8080 Microprocessor [3]
1.2 Graphical Depiction of Moore’s Law [4]
1.3 Static CMOS XOR Gates
1.4 Eight Transistor XOR Gate with CMOS Transmission Gate
1.5 Six Transistor XOR Gates
1.6 Previous Works on Design of 4 Transistor XOR Gates
1.7 Design of 3T XOR Gate
1.8 Improved Design of 3T XOR Gate
1.9 Adder Cell with Three Modules (Module 1: Generate XOR and XNOR functions; Module 2: Sum function; Module 3: Carry function)
1.10 Topologies of Different Full Adder Designs with Reduced Number of Transistors over the Years
    1.10(a). 28 Transistor Full Adder
    1.10(b). 20 Transistor Full Adder
    1.10(c). 16 Transistor Full Adder
    1.10(d). 14 Transistor Full Adder
    1.10(e). 10 Transistor Full Adder
    1.10(f). 8 Transistor Full Adder
1.11 Different Implementations of Compressor Designs
    1.11(a). Conventional Design of 3:2 Compressors
    1.11(b). Conventional Design of 4:2 Compressors
    1.11(c). Conventional Design of 5:3 Compressors
    1.11(d). Existing Implementation of 5:3 Compressors
1.12 CMOS Implementations of
    1.12(a). MUX
    1.12(b). XOR-XNOR
1.13 Algorithm for 8 bits X 8 bits Wallace Tree Multiplier [55]
1.14 Serial-Parallel Multiplier
2.1 Logic Symbol of XOR Gate
2.2 The wiring diagram depicting the control of single light source with two switches. The light is on when either both switches are switched up or both down
2.3 Proposed Design of 2T XOR Gate
2.4 Diode Connected NMOS
2.5 Input and Output Waveforms
    2.5(a). XOR Gate at Reverse Bias of 320 mV
    2.5(b). XOR Gate at Reverse Bias of 270 mV
2.6 Calculation of Propagation Delay
2.7 PDP (vs) Technology for XOR Gate Architectures
3.1 Circuit Diagram of Half Adder
3.2 Logic Circuit of Full Adder
3.3 Schematic Diagram of Proposed 6T Adder
3.4 Post Layout Simulation of 6 Transistor Adder at 65-nm Technology
3.5 PDP (vs) Technology for Adder Architectures
3.6 Layout View of Proposed 6T Full Adder
3.7 Area (vs) Technology for Adder Architectures
4.1 Block Diagram of 5:3 Compressors
4.2 Architecture of Proposed 5:3 Compressor
4.3 Two Transistor 2x1 Multiplexer Design
4.4 Schematic View of 5:3 Compressors
    4.4(a). 3T XOR and 2T 2x1 MUX Compressor
    4.4(b). 2T XOR and 2T 2x1 MUX Compressor
4.5 EDP (vs) Type of Compressor Circuit in Different Technology
4.6 Area (vs) Type of Compressor Circuit in Different Technology
4.7 Layout View of Proposed 5:3 Compressors in 90-nm Technology
    4.7(a). 3T XOR and 2T 2x1 MUX
    4.7(b). 2T XOR and 2T 2x1 MUX
5.1 Basic Multiplication
5.2 Signed Multiplication Algorithm
5.3 Product Matrix
5.4 Example: Multiplication of 8 bit x 8 bit Binary Numbers
5.5 Array Multiplier Architecture
    5.5(a). An 8 bit x 8 bit Array Multiplier
    5.5(b). Basic Building Block
5.6 PDP (vs) Technology for Different Multiplier Architectures
5.7 Layout Design of 8 Bit x 8 Bit Multiplier
5.8 Area (vs) Technology for Different Multiplier Architectures
5.9 Basic Multiply and Accumulate (MAC) Unit
5.10 True Single Phased Clocked Register (TSPC)
5.11 Positive and Negative Latches

List of Symbols and Equations

Symbols

$\alpha_{0 \to 1}$ = Switching activity factor
$\gamma$ = Bulk threshold coefficient
$\varphi_{0}$ = Fermi Potential
$\alpha_{v}$, $\alpha_{w}$ = Process dependent parameters

Equations

1. Equation for Total Power Consumption of VLSI Circuit
2. Equation for Representation for Static CMOS XOR Gate
3. Equation for XOR gates
4. Equation for the relation exhibited between channel length (L), width (W), substrate to bulk voltage ($V_{SB}$) of transistor
5. Equation of Flicker Noise
6. Equation for Propagation Delay
7. Equation for subthreshold current
8. Equation for sum and carry of half adders
9. Equation for sum and carry of full adders
10. Equation for design of 5:3 compressors
11. Equation for addition of partial products in multipliers
12. Equation for working of array multiplier

Chapter 1

Introduction

1.1 The Historical Perspective

Digital electronic systems are undergoing revolutionary growth driven by great improvements in technology. Early digital electronic systems were based on magnetically controlled relays (or switches) and were used mainly to implement very simple logic networks. Train safety systems, which are still in use today, are an example of this kind of network. Vacuum tubes were the dominant electronic device technology until the 1950s. The change in technology began in 1947 at Bell Telephone Laboratories with the invention of the transistor, followed by Shockley’s work on the bipolar transistor in 1949. The first bipolar logic gate, introduced by Harris, appeared in 1956, and it took even longer to translate it into a set of commercial integrated-circuit logic gates, the Fairchild Micro-logic family. The first truly successful IC logic family was Transistor-Transistor Logic (TTL), pioneered in 1962. The issues with bipolar junction transistors, particularly with respect to power dissipation, scaling and noise immunity, became more and more serious over time and ultimately gave way to Metal Oxide Semiconductor Field Effect Transistors (MOSFETs).

The basic principle behind the MOSFET (originally called IGFET), shown in Figure 1.1(a), was proposed in a patent by J. Lilienfeld (Canada) as early as 1925 and, independently, by O. Heil in England in 1935 [1]. MOS digital integrated circuits started to take off in full swing in the early 1970s. Remarkably, the first MOS logic gates introduced were of the CMOS (Complementary MOS) variety, and this trend continued till the late 1960s. The first practical MOS integrated circuits were implemented in PMOS-only logic and were used in applications such as calculators. The second age of the digital integrated circuit revolution was inaugurated with the introduction of the first microprocessors by Intel in 1971 (the 4004 microprocessor [2], Figure 1.1(b)) and 1974 (the 8080 microprocessor [3], Figure 1.1(c)). These processors were implemented in NMOS-only logic, which has the advantage of higher speed over PMOS-only logic because the mobility of electrons in NMOS devices is higher than that of holes in PMOS devices. Simultaneously, MOS technology also enabled the realization of the first high density semiconductor memories.

Figure 1.1(a). Diagrammatic Representation of IGFET

Figure 1.1(b). The 4004 Microprocessor [2]

Figure 1.1(c). 8080 Microprocessor [3]

Figure 1.1. Earlier Inventions

The driving force of integrated electronics is to minimize the silicon area required by an electronic circuit, in addition to reducing power consumption and delay. Reducing the number of transistors needed per function has allowed more and more applications to be integrated, and has also reduced the overhead in terms of silicon area and power. The demand for transistors in VLSI design is well described by Moore’s Law [4], as shown in Figure 1.2. Moore’s Law states that “The number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented”. Thus, reducing the transistor count of circuits has been a main focus of researchers for many years and continues to be so [5].

The challenging requirements of emerging low power, high speed communication and digital signal processing chips can be addressed by exploiting well-engineered deep submicron MOSFET technologies. The performance of the basic arithmetic circuits used to implement complex algorithms such as convolution, correlation and digital filtering determines the performance of many larger modules of Digital Signal Processors (DSPs). Over the last decade, the semiconductor industry has witnessed explosive growth in the integration of sophisticated multimedia-based applications into mobile electronic gadgets. However, power consumption is the critical concern in this arena and has to be reduced for a given operating frequency. Moreover, the explosive growth in demand for and popularity of portable electronic products drives designers to strive for smaller silicon area, higher speed, longer battery life and enhanced reliability. XOR-XNOR circuits are basic building blocks in various circuits, especially arithmetic circuits (adders and multipliers), compressors, comparators, parity checkers, code converters, error-detecting or error-correcting codes and phase detectors. Adders and multipliers, being the fast arithmetic computation cells widely used in many VLSI circuits, are frequent subjects of research.

Figure 1.2. Graphical Depiction of Moore’s Law [4]

The rise in chip density and the increase in power consumption of VLSI systems have raised further reliability and packaging problems. The packaging and cooling costs of VLSI systems also go up with high power dissipation. Nowadays, low power consumption, along with minimum delay and area, is one of the most important design considerations for IC designers. There are three major sources of power consumption in CMOS VLSI circuits:

1) Dynamic switching power due to charging and discharging of parasitic capacitances,

2) Short circuit power due to direct current flow from the power supply to ground when the p-network and n-network conduct simultaneously,

3) Leakage power due to leakage currents, which includes both subthreshold leakage and reverse bias leakage.

Different logic styles, each with its own advantages in terms of power, delay and layout implementation, have been proposed for high speed and low power circuits [6].

There are four design levels at which the increasing demand for low power Very Large Scale Integration (VLSI) can be addressed: the architectural, circuit, layout and process technology levels [7]. At the circuit level, considerable potential for power savings exists through the proper choice of a logic style for implementing combinational circuits. This is because all the important parameters governing power dissipation (switching capacitance, transition activity and short-circuit currents) are strongly influenced by the chosen logic style. At the technology level, power consumption scales down in step with the shrinking channel length. Thus, power savings can be achieved through improvements in the fabrication process such as smaller feature sizes, very low voltages, and interconnects and insulators with low dielectric constants. The performance aspects depend on the application, the kind of circuit to be implemented and the design technique used. Investigations of low-power logic styles reported in the literature so far, however, have mainly focused on particular logic cells, namely full adders, used in some arithmetic circuits. In this thesis, these observations and surveys have been kept in mind, starting with a basic logic gate and extending the idea to a much broader set of combinational arithmetic circuits. The power dissipation characteristics of various existing logic styles are compared qualitatively and quantitatively through actual logic gate implementations and simulations under experimental circuit arrangements and operating conditions [8]. As power is reduced at the different design levels, the number of transistors is also reduced. Reducing the number of transistors in a circuit reduces the silicon area needed for fabrication, giving way to compact digital logic design. Similar investigations of sequential elements, such as latches and flip-flops, are not included in this work, but can be found elsewhere in the literature [7].

1.2 Literature Surveys

In the past decade a lot of work has been done and various architectural designs have been proposed in the areas covered by this thesis. From the basic building block, i.e. the XOR gate, up to the DSP (digital signal processor) level, novel designs have been implemented to achieve minimum silicon area, minimum power and high speed. Working with nanometre technologies and reducing area have also been a prime focus in the preceding decades. Microprocessors and digital signal processors rely on the efficient implementation of generic arithmetic logic units and floating point units to execute dedicated algorithms.

1.2.1 XOR Gates

With the ever increasing demand for high speed processing and longer battery life, the demand for low power VLSI systems has increased steadily over the past decade. In this regard, the full adder receives a lot of attention since it forms a basic element in any processor design. From the gate level design point of view, it is well known that a full adder can be efficiently implemented using XOR gates. The ‘sum’ can be implemented using two cascaded XOR gates and the ‘carry’ as a multiplexed operation on transistors. The XOR gate thus forms the primary block used to design the full adder in Chapter 3.

Due to the increasing number of transistors on a digital chip, reducing power dissipation is an important criterion in designing XOR gates. The total power consumption of VLSI circuits is given by the following general equation [1]:

$$P = f \cdot C_{load} \cdot V_{DD}^{2} \cdot \alpha_{0 \to 1} + V_{DD} \cdot t_{SC} \cdot I_{SC} \cdot f + V_{DD} \cdot I_{l} \tag{1}$$

where

$\alpha_{0 \to 1}$ = Switching activity factor for transitions,

$V_{DD}$ = Supply voltage,

$C_{load}$ = Output load capacitance,

$f$ = System clock frequency during transitions,

$I_{SC}$ = Short circuit current flowing from power supply to ground,

$t_{SC}$ = Time duration for flow of short circuit current,

$I_{l}$ = Leakage current.

The first term on the right hand side of Equation (1) represents the dynamic component of power, the second term denotes the short circuit power and the third term defines the leakage power. As the first term of Equation (1) shows, power consumption can be minimised by reducing the supply voltage or the load capacitance, or by lowering the operating frequency of the circuit. The switching activity is primarily addressed at the architectural and Register Transfer Level (RTL) during synthesis. At the circuit level, the other factors in dynamic power play a dominant role [8]. As the second term indicates, avoiding a direct path from $V_{DD}$ to ground by balancing the rise and fall times of the transistor inputs helps to reduce power consumption. Reducing the supply voltage leads to poor performance if the threshold voltage is not scaled accordingly. To accomplish low-voltage, low-power digital designs, both supply and threshold voltage scaling have to be considered, as explained in the literature [9–11]. Due to circuit topology, the optimal operating point may vary significantly between sub-circuits, depending on the activity and logic depth. The application of different supply voltages can impose severe area penalties for fixed and inherent threshold voltages for the general selected processes. The inherent variation in threshold voltages and supply will normally further reduce the advantage of operation at ultra-low supplies [10].
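To make the relative weight of the three terms in Equation (1) concrete, the following minimal Python sketch evaluates each term separately. The function names and the operating point are hypothetical and chosen only for illustration; they are not taken from the simulations reported in this thesis.

```python
def power_components(f, c_load, v_dd, alpha, t_sc, i_sc, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts, term by term
    as in Equation (1). All arguments are in SI units."""
    dynamic = f * c_load * v_dd ** 2 * alpha   # f * C_load * V_DD^2 * alpha_{0->1}
    short_circuit = v_dd * t_sc * i_sc * f     # V_DD * t_SC * I_SC * f
    leakage = v_dd * i_leak                    # V_DD * I_l
    return dynamic, short_circuit, leakage


def total_power(f, c_load, v_dd, alpha, t_sc, i_sc, i_leak):
    """Total power P of Equation (1): the sum of the three components."""
    return sum(power_components(f, c_load, v_dd, alpha, t_sc, i_sc, i_leak))


# Hypothetical operating point (illustrative values only, not thesis data).
example = dict(f=50e6, c_load=10e-15, v_dd=1.0, alpha=0.5,
               t_sc=20e-12, i_sc=10e-6, i_leak=1e-9)

if __name__ == "__main__":
    dyn, sc, leak = power_components(**example)
    print(f"dynamic={dyn:.3e} W, short-circuit={sc:.3e} W, leakage={leak:.3e} W")
    # Halving V_DD quarters the dynamic term because of the V_DD^2 dependence.
    dyn_half, _, _ = power_components(**dict(example, v_dd=0.5))
    print(f"dynamic at half supply={dyn_half:.3e} W")
```

The sketch only restates the arithmetic of Equation (1); it does not model the loss of speed that accompanies a lower supply voltage, which is why threshold voltage scaling has to be considered alongside it.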

Circuit realization for low power and low area has become an important issue as integrated circuits move towards very high integration density and high operating frequencies. Because XOR and XNOR gates play an important role in various circuits, especially arithmetic circuits, optimized XOR and XNOR designs that achieve low power, small size and low delay are needed. The primary concern in designing an XOR-XNOR gate is to obtain low power consumption and low delay in the critical path, together with correct output voltage swing, using the least number of transistors. The XOR gate is an elementary building block of digital circuits and there is ongoing research to enhance its performance.
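As noted at the start of this section, the sum of a full adder can be formed with two cascaded XOR gates and the carry as a multiplexed operation. The following Python sketch is a purely behavioural, logic-level illustration of that decomposition. The function names are hypothetical, and the sketch makes no claim about the transistor-level 6T adder developed in Chapter 3.

```python
def xor(a, b):
    """Logic-level XOR of two bits (0 or 1)."""
    return a ^ b


def mux(select, d0, d1):
    """2-to-1 multiplexer: returns d1 when select is 1, otherwise d0."""
    return d1 if select else d0


def full_adder(a, b, c_in):
    """Full adder built from two cascaded XORs (sum) and one MUX (carry)."""
    p = xor(a, b)             # first XOR: propagate signal
    s = xor(p, c_in)          # second XOR: sum output
    # When a != b (p = 1) the carry-in propagates; when a == b the carry equals a.
    c_out = mux(p, a, c_in)
    return s, c_out


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                s, c_out = full_adder(a, b, c_in)
                print(a, b, c_in, "->", "sum", s, "carry", c_out)
```

The structure shows why the quality of the XOR gate matters so much: the same XOR output drives both the sum path and the select input of the carry multiplexer.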

Ever since its inception, the XOR gate, which forms a basic building block of digital VLSI circuits, has undergone considerable improvement, motivated by three basic design goals, viz. minimizing the transistor count, reducing the power consumption and increasing the speed [4-25]. Hosseinzadeh, Jassbi and Navi emphasized that circuit performance can be improved through transistor count minimization [5]. XOR gates play an important role in digital systems including arithmetic circuits, encryption circuits, comparators, parity checkers and so on. Enhancing the performance of the XOR gate can significantly improve the performance of these circuits. Many design architectures and techniques have been developed to design XOR gates with reduced power consumption [14]. The literature survey reveals a wide spectrum of XOR gates that have been realized over the years. The dominant concern in designing an XOR gate is to obtain correct output voltage swing with the least number of transistors and, additionally, low power consumption and low delay in the critical path.

XOR gates can be designed in many logic styles, such as Pass Transistor Logic (PTL), Double Pass Transistor Logic (DPL), inverter based logic, transmission gate based XOR gates and XOR gates with feedback transistors [26]. These techniques design XOR gates with different transistor counts, the minimum being three [24, 25]. Complementary MOS uses dual networks to implement a given function [6, 7, 14]: the first part consists solely of a pull-up PMOS network, while the second part consists of a pull-down NMOS network. This technique is popular and widely accepted, but it requires a larger number of transistors. A static CMOS XOR gate is shown in Figure 1.3(a). The circuit operates with full output voltage swing. Different realizations of the XOR function are given in Equations (2) to (5) below, where A and B are the inputs and Z is the corresponding output.

$$Z = A \oplus B = (A + B) \cdot (A' + B') \tag{2}$$

$$Z' = (A \oplus B)' = \{(A + B) \cdot (A' + B')\}' \tag{3}$$

$$Z' = AB + A'B' \tag{4}$$

$$Z = (AB + A'B')' = A \oplus B \tag{5}$$

An alternative realization of the static XOR circuit with complementary CMOS transistors, based on the above input-output relations, is shown in Figure 1.3(b).
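The equivalence of the forms in Equations (2) to (5) can be checked exhaustively over the four input combinations. The short Python sketch below does so at the Boolean level; the helper names are hypothetical and the check says nothing about voltage levels or transistor sizing.

```python
from itertools import product


def inv(x):
    """Logical complement of a single bit."""
    return 1 - x


def xor_forms(a, b):
    """Evaluate the right-hand sides of Equations (2) to (5) for bits a, b."""
    z_eq2 = (a | b) & (inv(a) | inv(b))            # Z  = (A + B)(A' + B')
    z_bar_eq3 = inv((a | b) & (inv(a) | inv(b)))   # Z' = {(A + B)(A' + B')}'
    z_bar_eq4 = (a & b) | (inv(a) & inv(b))        # Z' = AB + A'B'
    z_eq5 = inv((a & b) | (inv(a) & inv(b)))       # Z  = (AB + A'B')'
    return z_eq2, z_bar_eq3, z_bar_eq4, z_eq5


def identities_hold():
    """True if (2) and (5) equal XOR and (3) and (4) equal XNOR for all inputs."""
    for a, b in product((0, 1), repeat=2):
        z2, zb3, zb4, z5 = xor_forms(a, b)
        if not (z2 == z5 == (a ^ b) and zb3 == zb4 == inv(a ^ b)):
            return False
    return True


if __name__ == "__main__":
    print("Equations (2)-(5) consistent:", identities_hold())
```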

Figure 1.3(a)

Figure 1.3(b)

Figure 1.3. Static CMOS XOR gates

Early designs were also based on conventional XOR gates with eight transistors [14, 16] (Figure 1.4) and six transistors [14, 16] (Figure 1.5), which were used in many applications. The drawbacks of the eight transistor XOR gate were its need for complementary inputs and its lack of driving capability, owing to the transmission gates used. Figure 1.5(a) shows a six transistor XOR gate in which, when A = “High”, the output is the complement of input B and the transmission gate plays no role. When A = “Low”, the transmission gate passes the signal B to the output directly and fully. Both A’B and AB’ are therefore produced with a good signal level, and the function is complete for all input cases. In Figure 1.5(b), an additional tailing inverter improves the poor signal that comes from the output of the 4-transistor XNOR structure and outputs a good signal level. In both cases, complementary input signals are not required, and the driving capability is better than that of Figure 1.4. However, these structures still have some shortcomings, such as lacking full driving capability at the output or having a longer delay.

Figure 1.4. Eight Transistor XOR Gate with CMOS Transmission Gate

Figure 1.5(a) Figure 1.5(b)

Figure 1.5. Six Transistor XOR Gates

Over time, designs employing four transistors emerged [15, 16, 17, 18, 19, 20, 21, 22, 23]. D. Radhakrishnan proposed ad-hoc design techniques for implementation, employing formal design procedures using K-maps and the pass network theorem [27]. The concept realized XOR-XNOR circuits using pass transistors. The CMOS transmission-gate XOR gate [14, 16] was replaced by the four-transistor XOR gates of Wang, Fang and Feng [16], shown in Figure 1.6(a) and Figure 1.6(b). In the structure of Figure 1.6(a), when A = “High”, A’ must be “Low”. The A and A’ signals are connected to the 𝑉𝐷𝐷 end of the PMOS and the 𝑉𝑆𝑆 end of the NMOS in the second inverter, respectively. The second inverter then behaves like a standard inverter and outputs B’, so the output is a perfect AB’ signal. On the other hand, when A = “Low”, A’ must be “High”. The output of the second inverter is then a poor copy of B, because a “High” is transmitted through the NMOS and a “Low” through the PMOS. That is, if only four transistors are used to implement an XOR function based on the inverter configuration, the output is complete for AB’ but poor for A’B. An additional transmission gate can correct this defect, as shown in Figure 1.5.
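The behaviour described for Figure 1.6(a) can be illustrated with a small switch-level sketch. This is an idealized model written only for illustration: the function name and the "strong"/"weak" labels are hypothetical, and the model assumes the usual pass-transistor rule that an NMOS passes a degraded high and a PMOS passes a degraded low.

```python
def xor_4t_inverter_based(a: int, b: int):
    """Idealized model of the inverter-based 4T XOR described for Figure 1.6(a).

    The PMOS source of the second inverter is tied to A and the NMOS source
    to A'; B drives both gates. Returns (logic_value, signal_quality).
    """
    a_bar = 1 - a
    if b == 0:
        # B = 0 turns the PMOS on, so the output follows A.
        value = a
        quality = "strong" if value == 1 else "weak"  # a PMOS passes a poor low
    else:
        # B = 1 turns the NMOS on, so the output follows A'.
        value = a_bar
        quality = "strong" if value == 0 else "weak"  # an NMOS passes a poor high
    return value, quality


for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b} ->", xor_4t_inverter_based(a, b))
```

In this model both outputs are strong when A = 1 and both are weak when A = 0, which matches the statement that the output is complete for AB’ but poor for A’B.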

These XOR gate architectures operate without complementary inputs, the need for which was a major drawback of the earlier conventional XOR gates based on complementary transmission gates. Later, power consumption and delay were reduced with a reformed XOR gate layout without 𝑉𝐷𝐷, as shown in Figure 1.6(c) [17]. The XOR gates demonstrated there have no power supply and are called power-supply-less XOR or P-XOR; similarly, the XNOR gates have no ground and are called groundless XNOR or G-XNOR. The output is complete for AB = 01, 10 and 11, but for AB = 00 it differs from the logic low level by the threshold voltage of the PMOS. So, the defect of this four-transistor XOR gate is that the output level can be higher or lower than the normal level by a threshold voltage (𝑉𝑇). The study of diverse XOR gates by Bui, Wang, Jiang and Al-Sheraidah, shown in Figure 1.6(d), led to XOR gates with some improvement in PDP, although the silicon area remained the same [18, 19].

Figure 1.6(a) [16] Figure 1.6(b) [16]

Figure 1.6(c) [17] Figure 1.6(d) [18, 19]

Figure 1.6. Previous Works on Design of 4 Transistor XOR Gates

Shams, Darwish and Bayoumi further studied the various XOR gate designs given by Bui, Wang and Jiang, offering a further optimization of performance [20]. A striking advance came with the three-transistor XOR gate design by Roy Chowdhury et al. [24], shown in Figure 1.7, which combines a CMOS inverter with a pass transistor. The design suffers from two drawbacks: first, voltage degradation due to threshold drop; and second, current feedback through the transistor with aspect ratio 2/1 when the inputs are A=1 and B=0. The latter can be overcome by decreasing the W/L ratio of that transistor, but this greatly affects its current carrying capability, thereby reducing the steady state power dissipation. A more recent three-transistor XOR gate, given by Tripti Sharma, K.G. Sharma, B.P. Singh and Neha Arora [25], is shown in Figure 1.8. The simulation result comparison showed it to have the lowest power, delay and PDP among the 3T XOR gates in the literature.

Figure 1.7. Design of 3T XOR Gate [24]

Figure 1.8. Improved Design of 3T XOR gate [25]

Reducing transistor count, area and power-delay product have remained the three basic goals in refining XOR gate designs over the years [4-25]. With the objective of further reducing the transistor count, a novel two-transistor XOR gate is proposed in this thesis. The XOR gate occupies less silicon area and shows a large improvement in power-delay product.

1.2.2 Adders

Adders are indispensable in VLSI circuits, and how efficiently they are implemented affects the performance of the entire system [8]. There is a high demand for portable devices such as PDAs and cell phones that combine high-speed processing with low power consumption. Addition is one of the basic and most commonly used arithmetic operations in signal processors, digital filters, application-specific Digital Signal Processors (DSPs), microprocessors and many other diverse applications. Designers face several basic constraints, such as high throughput, low power consumption, high speed and small silicon area. Adders are used in the Arithmetic Logic Unit (ALU), in the floating-point unit, in subtraction, multiplication and division, and for address generation in cache memory access.

The purpose of integrated electronics is to fit complex electronic circuits into minimum area with reduced power dissipation and delay. With technological advancement, reducing the number of transistors and ultra-low-power design have become the driving forces for integrating more and more applications without incurring any overhead in silicon area. The performance of a design is substantially governed by three factors, viz. area complexity, delay performance and regularity of interconnection. Regularity of interconnection refers to the way transistors are laid out, routing the interconnects in the best possible way and complying with the layout rules. The area of a circuit also depends on the interconnecting wires, which take up most of the area of a VLSI circuit. Different logic styles have been proposed over the years, each improving one performance aspect at the expense of another. The circuit delay is affected by the number of transistors in series, the wiring capacitances of the interconnections, the transistor sizes and the number of inversion levels. A full adder can be implemented using either one logic style or more than one logic style. Power, on the other hand, is a vital resource and a prime concern for designers. Power dissipation depends on the supply voltage, switching activity, frequency, load capacitances (made up of gate, diffusion and wire capacitances) and control circuit size. Equation 1 expresses the dependence of power on these factors and the issues related to it. An important means of reducing power consumption is to reduce the supply voltage 𝑉𝐷𝐷 and to choose the threshold voltage suitably at the device level. However, this increases circuit delay, degrades the drivability of the adder cells and introduces the threshold loss problem. These issues can be overcome by selecting a proper W/L ratio. Over the years, significant research has gone into high-performance adder units for low-power applications, and it is still continuing.
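The effect of supply voltage scaling can be illustrated numerically. The sketch below assumes the standard switching-power expression P = α·C_L·V_DD²·f, which uses the factors listed above; Equation 1 is not reproduced in this section, so its exact form is an assumption here, and all numerical values are hypothetical.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Switching power in watts, assuming P = alpha * C_L * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point, chosen only for illustration.
p_nominal = dynamic_power(alpha=0.5, c_load=10e-15, vdd=1.2, freq=100e6)
p_scaled = dynamic_power(alpha=0.5, c_load=10e-15, vdd=0.6, freq=100e6)

print(f"Power at 1.2 V: {p_nominal:.3e} W")
print(f"Power at 0.6 V: {p_scaled:.3e} W")
print(f"Reduction factor: {p_nominal / p_scaled:.2f}")
```

Because the supply voltage enters quadratically, halving it reduces the switching power by about a factor of four in this model, which is why supply scaling is attractive despite its delay and drivability penalties.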

A wide range of contemporary adder architectures has been surveyed in the literature over the past few decades [28-39]. Adder architectures can be classified into two broad domains, static and dynamic. Dynamic full adders are advantageous with respect to switching speed, transistor count, full dynamic range and ratioed logic. Ratioed logic is an attempt to reduce the number of transistors required to implement a logic function, often at the cost of reduced robustness and extra power dissipation. For an N-input logic function, static logic requires 2N transistors whereas dynamic logic styles require N+2. Regular structure, fast logic evaluation and compact circuit layout have been the three pursuits of the different logic styles [39]. The concepts of static and dynamic adder architectures are prominent and can be utilized efficiently for designing full adders with a large number of transistors.
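The 2N versus N+2 transistor counts quoted above can be tabulated for a few fan-in values. This is a small illustrative sketch; the function names are hypothetical.

```python
def static_transistor_count(n_inputs: int) -> int:
    """Static complementary logic: 2N transistors for an N-input function."""
    return 2 * n_inputs


def dynamic_transistor_count(n_inputs: int) -> int:
    """Dynamic logic: N + 2 transistors for an N-input function."""
    return n_inputs + 2


for n in (2, 3, 4, 8):
    print(n, static_transistor_count(n), dynamic_transistor_count(n))
```

The two counts coincide at N = 2, and the saving of the dynamic style grows linearly with the number of inputs.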

The time and area requirements of various important adders are shown in Table 1.1, where CRA stands for Carry Ripple Adder, CLA for Carry Look Ahead Adder, Parallel-Prefix CLA for parallel-prefix carry look ahead adder and CSA for Carry Save Adder. The time and area complexities of the different types of adders are given for n stages in the table below:

TABLE 1.1

TIME AND AREA REQUIREMENTS OF DIFFERENT TYPES OF ADDERS [39]

ADDER                   TIME          AREA
CRA                     O(n)          O(n)
CLA                     O(log n)      O(n log n)
Parallel-Prefix CLA     O(2 log n)    O(2n log n)
CSA                     O(√n)         O(n)

Adders can be designed in many logic styles, such as standard CMOS, Differential Cascode Voltage Switch (DCVS), Complementary Pass-Transistor Logic (CPL), Double Pass-Transistor Logic (DPL), Swing Restored CPL (SR-CPL) and hybrid styles, to build up the general adder module shown in Figure 1.9. There are multiple ways to design a full adder; this thesis presents some of the conventional adders in the literature, with different transistor counts, in order to compare the performance of the proposed design with existing designs. The compared adders are described briefly in this chapter.
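The three-module decomposition of the general adder cell (XOR/XNOR generation, sum and carry) can be sketched at the logic level as follows. This is a behavioural illustration under the assumption that the sum and carry modules act as selectors controlled by the XOR/XNOR signals; it is not a transistor-level description of any of the adders compared here, and the function name is hypothetical.

```python
from itertools import product


def full_adder_three_modules(a: int, b: int, cin: int):
    """Behavioural full adder split into three modules; returns (sum, carry)."""
    # Module 1: generate the XOR and XNOR of the operand bits.
    h = a ^ b
    h_bar = 1 - h

    # Module 2: Sum = H xor Cin, written as a selection controlled by XNOR.
    total = cin if h_bar else 1 - cin

    # Module 3: Carry = Cin when A and B differ, otherwise A (which equals B).
    carry = cin if h else a
    return total, carry


for a, b, cin in product((0, 1), repeat=3):
    print(a, b, cin, full_adder_three_modules(a, b, cin))
```

Writing the sum and carry as selections is what allows multiplexer-style or pass-transistor modules to replace full logic gates in low transistor count adders.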

A traditional low power 28-transistor CMOS full adder uses a pull-up PMOS network and a pull-down NMOS network [29, 30], as shown in Figure 1.10(a), but requires a large chip area. It is advantageous with regard to robustness, reliable operation, and easy placement and routing, and is also efficient owing to its complementary transistor pairs. Because of the high number of transistors, its power consumption is high. The large PMOS transistors in the pull-up network result in high input capacitances, which cause high delay and dynamic power. One of the most significant advantages of this full adder is its high noise margin and thus reliable operation at low voltages. Its disadvantages are the high input load due to the dual network and the weak output driving capability. Figure 1.10(b) shows a 20-transistor adder based on transmission gates and CMOS inverters, operating with full output voltage swing [30]. It has better critical delay, power dissipation and PDP than Conventional CMOS (CCMOS) and CPL, is faster than static CMOS and CPL, and requires fewer transistors. Because of the high number of internal nodes, parasitic capacitance increases [5]. In large arithmetic circuits it performs poorly, because its weak driving capability requires additional buffers at each output, increasing power consumption and area. The 16-transistor full adder of [31], shown in Figure 1.10(c), has the same operating conditions as the 20-transistor full adder [30]. Although it consumes more power than the 20-transistor full adder, it works at a higher speed. It also has less short-circuit power dissipation than the 14-transistor full adder [32], shown in Figure 1.10(d), which uses pass transistors with XOR and XNOR gates and in which 𝐴 ⊕ 𝐵 is generated by an inverter [32]. This adder has better output than a single-logic adder, and fewer transistors and power-dissipating nodes, but it has lower driving capability and noise immunity [36]. With ongoing research to reduce transistor count, many versions of the 10-transistor adder were proposed [18, 19]. Initially, a PTL-based 10-transistor static energy recovery full adder was proposed, which suffered from low speed and severe threshold loss [33, 34, 35]. Later, a systematic study led to an improved 10-transistor full adder by Bui, Wang and Al-Sheraidah [17], comprising XOR, XNOR, sum and 𝐶𝑜𝑢𝑡 modules. Fayed and Bayoumi then proposed another, more efficient 10-transistor full adder using a 2-to-1 MUX and two pass-transistor-based XOR gates [35]. Still, the threshold loss problem persisted in these designs; it was later minimized in the 10-transistor full adder reported in [36] by Lin, Hwang, Sheu and Ho, known as the Complementary and Level Restoring Carry Logic (CLRCL) adder and shown in Figure 1.10(e). However, the CLRCL adder requires a complemented 𝐶𝑖𝑛, which increases the number of transistors, and it has large stage delays for 𝐶𝑜𝑢𝑡 and Sum. To overcome the problems of the 10-transistor full adders and to meet the demand for fewer transistors, an 8-transistor adder was implemented by combining CMOS logic with pass transistor logic [24], as shown in Figure 1.10(f). The 8-transistor full adder produces its outputs with a maximum delay of two stages. The noise margin is substantially increased by proper sizing of the 3-transistor XOR gate. Its PDP and area have been found to be better than those of existing 10-transistor and 14-transistor full adders, but the design suffers from higher power consumption due to short-circuit current.

Figure 1.9. Adder cell with Three Modules

Module 1: Generate XOR and XNOR functions

Module 2: Sum function

Module 3: Carry function

Figure 1.10(a). 28 Transistor Full Adder [29, 30]

Figure 1.10(b). 20 Transistor Full Adder [30]

Figure 1.10(c). 16 Transistor Full Adder [31]

Figure 1.10(d). 14 Transistor Full Adder [32]

Figure 1.10(e). 10 Transistor Full Adder [34]

Figure 1.10(f). 8 Transistor Full Adder [24]

Figure 1.10. Topologies of Different Full Adder Designs with Reduced Number of

Transistors over the years

This thesis describes a novel implementation of a 2T XOR gate with reduced transistor count and thus smaller silicon area. The design of the 2T XOR gate is based on two PMOS transistors. The 2T XOR gate is used to design a 6T full adder.

1.2.3 Compressors

Much work has been done on the implementation of fast and efficient adders and multipliers. The choice of implementation technique and the choice of technology are two important criteria in the VLSI industry. As illustrated in the previous sections, the integration of circuit components within limited silicon area has grown steadily [4-25]. The continuous urge to integrate more and more components on a minimum area of silicon has galvanized scientists and researchers to employ new trends and techniques.

Multipliers are a central arithmetic block, and multiplication is essential for many DSPs, general purpose processors and digital filters [40-44]. Multiplication is a complex operation that involves three principal stages [45, 46]: (1) partial product generation, (2) partial product reduction and (3) final carry-propagating addition. Since the second stage is critical for the overall performance of processors, reducing its critical path and minimizing its time and power deserve the utmost attention in a power-efficient design. Compressors serve as intermediate Processing Elements (PEs) for accumulating the partial products in multiplication. They dictate the overall critical path of the circuit, and improvements to them have led to higher speed and reduced power over the decades [47, 48, 49]. Compressors play an important role in implementing partial product addition in multiplier algorithms. Compressors have been studied extensively in order to minimize the computational complexity of multiplication and, thus, of higher-level arithmetic blocks.
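The three stages of multiplication can be illustrated with a short behavioural sketch for unsigned operands. It is only an illustration: the reduction stage uses bitwise 3:2 compression (carry-save addition) as one representative technique, and the function name and default width are assumptions rather than details of the designs discussed in this thesis.

```python
def multiply_three_stage(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply organised as the three stages named in the text."""
    # Stage 1: partial product generation, one shifted row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

    # Stage 2: partial product reduction. Each pass compresses three rows
    # into a sum row and a carry row without propagating carries.
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        sum_row = x ^ y ^ z
        carry_row = ((x & y) | (y & z) | (x & z)) << 1
        rows.extend([sum_row, carry_row])

    # Stage 3: final carry-propagating addition of the remaining rows.
    return sum(rows)


print(multiply_three_stage(13, 11))
print(multiply_three_stage(255, 255))
```

Only the last stage needs a carry-propagating adder; the reduction stage works on independent bit columns, which is why the compressor delay dominates the critical path.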

The simplest and most widely used compressors are the 3:2 and 4:2 compressors, which have been refined over the decades for improved results. Their conventional designs are illustrated in Figure 1.11(a) and Figure 1.11(b). Conventional adders are chains of full adders that generate a carry and a sum at each level, so the most significant bits of the result are delayed. During partial product addition, conventional adders cannot meet the timing constraints: the carry travels from one adder to the next, which produces a long carry propagation delay and ultimately lowers the efficiency of the whole circuit. Compressors are used to minimize delay and area, which increases circuit performance. Compressors dictate the overall critical path of the circuit [47-49]. This work presents a novel compressor architecture that replaces the XOR gates in the critical path with MUXes to improve overall performance [50-52]. A contemporary design of a 5:3 compressor using full adders and a half adder is shown in Figure 1.11(c). A further optimization of the 5:3 compressor is the topology of S. Chowdhury, A. Banerjee and H. Saha, shown in Figure 1.11(d) [50].
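A 5:3 compressor reduces five input bits of equal weight to a three-bit count of the ones among them. The sketch below shows one common arrangement built from two full adders and one half adder; it is an assumption made for illustration and is not claimed to match the exact wiring of Figure 1.11(c). All function names are hypothetical.

```python
from itertools import product


def half_adder(a: int, b: int):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_5_3(x1: int, x2: int, x3: int, x4: int, x5: int):
    """Compress five equal-weight bits into (out2, out1, out0), MSB first."""
    s1, c1 = full_adder(x1, x2, x3)    # first three inputs
    out0, c2 = full_adder(s1, x4, x5)  # remaining two inputs plus first sum
    out1, out2 = half_adder(c1, c2)    # combine the two weight-2 carries
    return out2, out1, out0


for bits in product((0, 1), repeat=5):
    print(bits, compressor_5_3(*bits))
```

The path from the first inputs to the outputs passes through two full adders and the half adder in series, which is the kind of XOR-dominated critical path that a MUX-based redesign aims to shorten.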

Figure 1.11(a). Conventional Design of 3:2 Compressors

Figure 1.11(b). Conventional Design of 4:2 Compressors

Figure 1.11(c). Conventional Design of 5:3 Compressors

Figure 1.11(d). Existing Implementation of 5:3 Compressors

Figure 1.11. Different Implementations of Compressor Designs

This thesis presents two architectures of the 5:3 compressor based on an XOR/MUX implementation. The idea has two characteristics. First, two-transistor 2x1 multiplexers are employed in lieu of XOR gates, reducing the critical path delay. Second, the proposed two-transistor XOR gates are used to minimize silicon area, giving an overall improvement in performance. The design also reduces the stage delays compared with previous designs.

1.2.3.1 MUX vs XOR-XNOR

The most basic and common topologies of MUX and XOR-XNOR circuits are shown in Figure 1.12(a) and Figure 1.12(b) [29]. The inputs are A and B, with outputs O and O’, where O’ is the complement of O. S is the select line. In the XOR-XNOR circuit, O is obtained as 𝐴 ⊕ 𝐵, and its complement is the XNOR function, given by the logic equation A’.B’ + A.B.

Figure 1.12(a). MUX Design

Figure 1.12(b). XOR-XNOR Design

Figure 1.12. CMOS Implementations of (a) MUX (b) XOR-XNOR

In Figure 1.12(a), it is evident that the transistors are already switched if both the select bit and its complement are available before the inputs arrive, which leads to an overall reduction of delay [29]. Thus, eliminating the additional inverter stage leads to lower power consumption and area [7].
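The reason a 2x1 MUX can stand in for an XOR gate is that XOR is a selection between B and its complement, controlled by A. The following logic-level sketch illustrates this; the function names are hypothetical and the model says nothing about transistor-level timing.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """2x1 multiplexer: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def xor_via_mux(a: int, b: int) -> int:
    """A xor B: pass B when A = 0, pass B' when A = 1."""
    return mux2(a, b, 1 - b)


def xnor_via_mux(a: int, b: int) -> int:
    """A xnor B: pass B' when A = 0, pass B when A = 1."""
    return mux2(a, 1 - b, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_via_mux(a, b), xnor_via_mux(a, b))
```

If B and its complement are already available, the MUX only has to steer one of them to the output, so no extra inversion sits on the path from the select input.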

1.2.4 Multipliers

Multipliers play a vital role in electronic hardware, whether in digital signal processors (DSPs), digital filters or general purpose processors [40-44]. Digital signal processors perform common operations such as video processing, filtering and the Fast Fourier Transform (FFT). Such modules perform extensive sequences of multiply and accumulate computations. A large number of transistors with high switching activity is used to perform the various multiplication operations. For example, in a 64-point radix-4 pipelined FFT processor, the multiplier consumes 30% of the power and occupies 46% of the chip area. Therefore, as technology has advanced, researchers over the decades have focused on the prime issues in multiplier design. The desired targets are high speed, low power consumption, a compact and balanced layout, regular interconnection and minimum silicon area. Power consumption is the most important of these parameters, and much research has therefore been devoted to reducing it in the basic units. Reducing the power consumption of a basic unit lowers the power consumption of the whole system and minimizes energy wastage in upcoming technologies. CMOS and pass transistor technologies are the dominant technologies for high-speed, low-power and compact VLSI implementation, each with its own advantages and disadvantages.

Addition and multiplication of two binary numbers are the two fundamental arithmetic operations used in high performance DSP systems. Chapter 3 proposes a unique and efficient design of a six-transistor adder. According to historical statistics on the algorithms executed by large systems, more than 70% of instructions are dominated by addition and multiplication [53]. So, the critical delay of the whole system depends on these operations, and a high-speed multiplier is needed. As the VLSI industry expands in computer and signal processing applications, the demand for high-speed processing is increasing. Designers therefore concentrate on high-speed, low-power multipliers in order to manufacture high-quality DSP chips. The types of multipliers available in the literature include the parallel multiplier, Booth multiplier, sequential multiplier, combinational multiplier and Wallace tree multiplier.

1.2.4.1 Different Types of Multipliers

An efficient multiplier has the following characteristics:

1. Power: The multiplier should consume little power for high performance.

2. Area: It should occupy little area on silicon.

3. Speed: There should be few stage delays along the critical path for high-speed operation.

4. Accuracy: A good multiplier should give the correct result.

The multiplication operation is based primarily on the ‘add’ and ‘shift’ algorithm. Many variants of multipliers have been proposed in the literature for efficient and reliable computation. The number of partial products to be added determines the performance of parallel multipliers [43]. The Booth algorithm was proposed to design a Booth multiplier with fewer partial product additions, as partial product addition is the most critical stage in multiplier design. Many modifications of Booth multiplication have since appeared in the literature [54]. To gain speed, the Wallace tree algorithm can be used to reduce the number of sequential adding stages; it is explained extensively in [42] and illustrated for an 8 bit x 8 bit multiplier in Figure 1.13 [55]. Later, the modified Booth algorithm and Wallace tree techniques were combined to obtain the advantages of both in one multiplier. However, increasing parallelism may bring major disadvantages: low speed, increased silicon area due to irregularity of structure, and increased power consumption because of complex routing. Hence, a serial-parallel multiplier can be designed for better area and power at the expense of speed. The design of a serial-parallel multiplier is presented in Figure 1.14. From the above metrics, it is clear that the type of multiplier employed depends on the nature of the application. In this work, an array-based multiplier has been used, as it consumes little power and has relatively good performance compared with Wallace tree multipliers. Other multipliers require additional hardware to improve performance, at the cost of increased layout area and parasitics. The array multiplier, on the other hand, has a smaller and regular layout. Therefore, the array multiplier is the better choice owing to its lower power consumption, smaller layout and relatively good performance [56-58].
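The ‘add’ and ‘shift’ algorithm on which these multipliers are based can be written in a few lines for unsigned operands. This is a behavioural sketch for illustration; the function name and the default 8-bit width are assumptions.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Unsigned shift-and-add multiplication over `width` multiplier bits."""
    product = 0
    for i in range(width):
        # Examine multiplier bit i; if it is set, add the multiplicand
        # shifted left by i positions to the running product.
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product


print(shift_and_add_multiply(12, 10))
print(shift_and_add_multiply(200, 3))
```

Each loop iteration corresponds to one row of partial products. An array multiplier lays these rows out as a regular grid of adder cells, whereas Booth and Wallace tree schemes reduce the number of rows or the depth of their addition.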

Figure 1.13. Algorithm for 8 bit x 8 bit Wallace Tree Multiplier [55]

Figure 1.14. Serial-Parallel Multiplier

The main computational kernel of DSP architectures is the Multiply-Accumulate (MAC) unit [59, 60]. It computes the product of two numbers and adds the product to an accumulator. The energy consumed at each level affects the overall power of the MAC unit. An 8-bit MAC unit has been designed in 65-nm technology, extending the use of the 2T XOR gates, 6T adders and 8 bit x 8 bit multipliers to DSP architectural operations and demonstrating their use in Application-Specific Integrated Circuits (ASICs).
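The multiply-accumulate operation can be sketched behaviourally as below. The 24-bit accumulator width is a hypothetical choice made only so that the example shows wrap-around; the accumulator width of the MAC unit designed in this thesis is not stated in this section, and the function names are likewise illustrative.

```python
def mac(accumulator: int, a: int, b: int, acc_bits: int = 24) -> int:
    """One multiply-accumulate step: (accumulator + a * b) modulo 2**acc_bits."""
    mask = (1 << acc_bits) - 1
    return (accumulator + a * b) & mask


def dot_product(xs, ys) -> int:
    """Accumulate the products of paired 8-bit samples with repeated MAC steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))
```

Filters and transforms reduce to long chains of such steps, so the power and delay of the multiplier and adder inside the MAC unit are paid on every sample.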

1.3 Motivation

IC technology emerged in the 1960s, when Gordon Moore, then with Fairchild Corporation and later a co-founder of Intel, predicted that the number of transistors that can be integrated on a single die would grow exponentially with time (a prediction later called Moore’s Law). The progression from the integration of a few transistors (referred to as Small Scale Integration (SSI)) to the integration of millions of transistors in the Very Large Scale Integration (VLSI) chips currently in use [4] is shown in Figure 1.2. Early ICs were simple and employed only a few logic gates and flip-flops. Some ICs were simply a single transistor, along with a resistor network, performing a logic function. There have been four generations of ICs, with the number of transistors on a single chip growing from a few to over millions in a period of four decades.

The growing market for complex mobile systems observed in recent years has led designers to take into account a new objective in the design of complex digital circuits, namely the minimization of power consumption. The wide spread of systems such as laptop and palmtop computers, cellular phones, wireless modems and portable multimedia applications is one of the most important reasons for the critical importance of low power design. The need to minimize power dissipation is also driven by thermal considerations: a large part of the energy a device draws from the power supply is converted into heat. Heat dissipation systems and cooling mechanisms thus become indispensable for the proper and safe operation of the device and for its reliability.

Over the years, researchers across the globe have made continuous efforts to devise new designs and techniques that reduce the number of transistors, the power consumption and the delay of small circuits. These in turn can be utilized in more complex circuits at higher architectural levels. XOR gates, adders, multipliers and compressors form the basic blocks of many arithmetic circuits. Reducing power dissipation in digital circuits becomes more and more important as the number of transistors on digital chips increases.

1.4 Problem Definition

The main aim of this thesis is to design an efficient, low-power and high-performance basic XOR gate with the lowest possible transistor count. The thesis also presents its application to larger arithmetic blocks such as the full adder, compressor, multiplier and MAC unit, making them efficient too. With this objective in mind, a unique two-transistor XOR gate is proposed in this thesis as a basic building block for VLSI circuit design. Further, a novel 6-transistor full adder, a 5:3 compressor and an 8 bit x 8 bit multiplier are proposed in the upcoming chapters.

1.5 Contribution of the Thesis

The current thesis contributes:

A novel approach to designing the primary building block of arithmetic circuits, i.e. the XOR gate, with two transistors.

The design of a six-transistor adder, which is the voltage-mode adder with the smallest transistor count reported so far.

A high-performance 5:3 compressor, designed from the novel two-transistor XOR gate and compared with other designs in the literature.

An extension of the architecture to multipliers, whose fundamental block is the adder. Adders and multipliers combine to form the MAC (Multiply and Accumulate) unit, which is used in DSPs and microcontrollers at the industrial level.

A MAC unit design implemented in 65-nm technology to demonstrate its application at the higher level of microprocessors.

The basic two-transistor XOR gate reduces power consumption, area, PDP (Power Delay Product) and EDP (Energy Delay Product), which serves its purpose at the higher level of designing digital signal processors, FIR filters, general purpose processors etc.
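The two figures of merit named above follow directly from measured power and delay: PDP is power multiplied by delay, and EDP is PDP multiplied by delay once more. The sketch below uses hypothetical numbers that are not results from this thesis; the function names are illustrative.

```python
def power_delay_product(power_w: float, delay_s: float) -> float:
    """PDP in joules: average power multiplied by propagation delay."""
    return power_w * delay_s


def energy_delay_product(power_w: float, delay_s: float) -> float:
    """EDP in joule-seconds: PDP multiplied by the delay again."""
    return power_delay_product(power_w, delay_s) * delay_s


# Hypothetical gate: 2 microwatts average power, 50 picoseconds delay.
power, delay = 2e-6, 50e-12
print(f"PDP = {power_delay_product(power, delay):.3e} J")
print(f"EDP = {energy_delay_product(power, delay):.3e} J*s")
```

EDP weights delay more heavily than PDP does, so a design that saves power only by becoming much slower scores worse on EDP than on PDP.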

1.6 Thesis Organization

The primary goal of this thesis is to demonstrate the circuit-level approach to designs that demand high speed and low power dissipation.

The thesis is organized as follows:

Chapter 1 introduces the background details and outlines the research work.

Chapter 2 proposes the design of the novel two-transistor XOR gate. The simulation and performance analysis of the proposed XOR gate against existing XOR gates in the literature is also tabulated.

Chapter 3 employs the 2T XOR gate to build a six-transistor adder with only two stage delays, reducing logic depth. It also presents simulation results comparing the proposed adder with conventional adders in the literature in terms of power, delay and area.

Chapter 4 applies the novel two-transistor XOR gate and the six-transistor adder to the implementation of 5:3 compressors and compares them with conventional designs.

Chapter 5 applies the proposed 6T adder to an 8 bit x 8 bit multiplier and analyses the performance, with respect to power, delay, PDP (Power Delay Product) and area, of 8 bit x 8 bit multipliers designed with different adders from the literature. The work is continued with the design of an 8-bit MAC unit in 65-nm technology.

To the best of my knowledge, the proposed adder is the voltage-mode full adder with the lowest transistor count designed so far. To demonstrate the uniqueness of the proposed model, simulations are carried out in three different technologies (65-nm, 90-nm and 130-nm). Power and delay simulations of the XOR gates and adders have been carried out, along with an area comparison of the adders. All simulations have been carried out using Cadence Spectre, with ASSURA used to verify the area of the proposed and existing designs.

Further chapters describe Conclusion, Related Publications and Bibliography.

Chapter 2

Design of 2T XOR Gate

This chapter proposes the design of the novel two-transistor XOR gate and compares its performance with existing XOR gates found in the literature. The comparison is made in terms of power, delay and Power Delay Product (PDP) in three technologies, i.e. 65-nm, 90-nm and 130-nm, with Process, Voltage and Temperature (PVT) variation analysis.

The chapter is organized as follows: Section 2.1 lays down the basic idea of XOR gates. Section 2.2 proposes the design of the novel 2T XOR gate, with Section 2.2.1 explaining its operation in detail. Section 2.2.2 presents the simulation results of the performance analysis in the form of tables and figures, and Section 2.2.3 discusses the results obtained.

2.1 What is an XOR Gate?

An XOR gate is a logic gate, formally named exclusive OR, that performs the operation “either A or B, but not both”, where A and B are the inputs of a two-input XOR gate and Y is the output, as shown in Figure 2.1. An XOR gate can be thought of as a switch for turning something on and off, and XOR gates are highly useful in circuits that compare values for equality, compute checksums or perform arithmetic. Exclusive OR is also found in everyday life; a simple example of a switching system is illustrated in Figure 2.2. The light glows only when both switches are in the same position. A glowing light represents binary value 0 and darkness represents binary value 1. The light can be turned on or off with either switch, independent of the position of the other. So, if both switches are in the same position there is light, and for differing positions there is darkness. Darkness for differing switch positions corresponds to the XOR outcome.

Figure 2.1. Logic Symbol of XOR Gate


Figure 2.2. The wiring diagram depicting the control of single light source with two

switches. The light is on when either both switches are switched up or both down

For the two input XOR gate, the gate has two inputs (A, B) and one output (Y) with four

different combinations of input values 00, 01, 10 and 11. XOR element has logic value 1 when

the inputs are at different logic levels. This means, it produces 0 for input combinations 00 and

11, and 1 for combinations 10 and 01 [14]. This operation is also called exclusive disjunction

and can be written in Equation 6 as follows,

𝐴 ⊕ 𝐵= A’.B + A.B’ (6)
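Equation 6 can be checked exhaustively over the four input combinations. The following Python sketch is illustrative only; the helper name `xor_sop` is hypothetical and not part of the thesis.

```python
from itertools import product

def xor_sop(a: int, b: int) -> int:
    """Sum-of-products form of Equation 6: A'.B + A.B'."""
    return ((1 - a) & b) | (a & (1 - b))

# Build the truth table for all four input combinations.
truth_table = {(a, b): xor_sop(a, b) for a, b in product((0, 1), repeat=2)}
for (a, b), y in truth_table.items():
    print(a, b, y)
```

The output is 1 only for the input combinations 01 and 10, as stated above.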

2.2 Design of the Proposed Two Transistor XOR Gate

The literature survey in Chapter 1 traced the evolution of XOR gates down to three transistors. Thus, with a view to further minimizing the transistor count, Figure 2.3 shows the novel proposed two transistor (2T) XOR gate. The design of the 2T XOR gate is based on two PMOS pass transistors and a negative reverse biased bulk voltage. The central idea is to obtain the correct logic values of the XOR function by changing 𝑉𝑇 (the threshold voltage of the PMOS transistors) through the voltage applied at the bulk terminal, i.e. 𝑉𝑆𝐵 [61]. The PMOS transistors act as pass transistors, which are much more efficient in terms of speed than CMOS. In pass transistor logic, the source side is connected to an input signal rather than to the supply voltage, as shown in Figure 2.3. This reduces the number of transistors, runs faster and requires less power than CMOS logic. The disadvantage is that the number of devices that can be cascaded in subsequent stages is reduced, because the voltage difference between logic levels shrinks at each cascaded stage. Therefore, logic devices connected in series need the signal voltage to be restored, as each transistor in series is less saturated at its output than at its input [29].

The rationale behind the implementation of the two transistor XOR gate is, firstly, to further reduce the transistor count along with power and silicon area and, secondly, to apply it in the implementation of bigger digital modules. The main aim was the transition from the 3T XOR gate (CMOS + PMOS pass transistor logic) to the 2T XOR gate (two PMOS pass transistors), keeping the following design considerations in mind:

1. The 3T XOR gate consists of CMOS logic and a pass transistor gate. CMOS logic is more complex, more expensive and slower to fabricate than PMOS only or NMOS only logic.

2. The reduction in the number of transistors is meant to compete with the lower power dissipation of CMOS logic w.r.t. PMOS or NMOS logic and to produce a design with minimum power dissipation.

3. Compared with CMOS logic, PMOS only logic is faster to fabricate, less complex, symmetric and less expensive, because the wafers used in fabrication generally have an n-type substrate and, to create NMOS, n-wells of PMOS transistors are used as substrate [1].

4. Compared with NMOS only logic, PMOS only logic has less flicker noise, since the mobility of PMOS is less than that of NMOS [62]. Flicker noise is directly proportional to mobility, as can be inferred from [63]. This helps in reducing the noise in bigger circuits derived from the smaller module.

5. PMOS only logic is used for the implementation because NMOS only logic cannot exploit substrate biasing to give the correct logic levels for an XOR gate with the minimum transistor count of two.

The efficient implementation of the design at the smaller block level will lead to optimization of VLSI circuits and can be used in many electronic devices and signal processing applications.

2.2.1 Working of the 2T XOR gate

When A=1 and B=0, transistor M2 turns ON and M1 is normally OFF, but due to the Gate Induced Drain Leakage (GIDL) effect [1, 6] in transistor M1, the output is pulled to logic high. The situation is similar for A=0 and B=1. However, for A=1 and B=1 both the PMOS transistors are OFF, and the substrate bias sets up an appropriate bulk to drain reverse bias leakage current to give logic 0 at the output. With A=0 and B=0, both the PMOS transistors are ON and hence it is natural to get a logic 0 at the output. Because of the bias applied at the substrate, a small voltage appears at the output, which has been observed to be well below the switching threshold voltage. The relation between the threshold voltage and the channel length (L), width (W) and source to bulk voltage (𝑉𝑆𝐵) of the transistor is given in Equation 7, as cited in [61].

(7)

Where,

𝑉𝑡0 = Zero bias threshold voltage,

𝛾 = Bulk threshold coefficient,

𝑉𝑆𝐵 = Bulk Potential,

𝜑0 = Fermi Potential,

𝑡𝑜𝑥 = The thickness of the oxide layer,

𝑉𝐷𝑆 = Drain to source voltage,

𝛼𝑣, 𝛼𝑤 = Process dependent parameters.


Figure 2.3. Proposed Design of 2T XOR Gate

The effect of 𝑉𝑆𝐵 on the channel can be most conveniently seen as a change in the threshold voltage 𝑉𝑇. Specifically, if the PMOS substrate bias is increased, the threshold voltage decreases. A PMOS transistor has an n-type substrate and p+ type drain and source regions. When a negative reverse bias voltage is applied, the electrons of the n-type substrate are repelled. Due to this, current flows in the direction opposite to the flow of electrons, with the p+ type drain or source regions and the n-type substrate acting like a forward biased diode. Thus, a voltage drop occurs across the junction, pulling the output down to a lower logic value. A reverse bias voltage source of 320 mV has been used in 65-nm technology. Table 2.1 shows the output values obtained from the simulation.

TABLE 2.1

SIMULATION LOGIC LEVELS OF 2T XOR GATE AT REVERSE BIAS OF 320 mV

USING 65-nm TECHNOLOGY

INPUT INPUT OUTPUT

A(V) B(V) Y (mV) approx.

0 0 0.0

0 1 700.0

1 0 700.0

1 1 200.0
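The approximate output levels in Table 2.1 can be mapped back to logic values by comparing them against a switching threshold. The sketch below assumes a threshold of 0.5 V (half of the 1 V input swing); this threshold and the helper names are assumptions made for illustration, not values stated for the XOR gate.

```python
# Approximate simulated outputs from Table 2.1 (inputs in V, output in mV).
TABLE_2_1 = {(0, 0): 0.0, (0, 1): 700.0, (1, 0): 700.0, (1, 1): 200.0}

# Assumed switching threshold: half of the 1 V input swing.
THRESHOLD_MV = 500.0

def to_logic(y_mv: float, threshold_mv: float = THRESHOLD_MV) -> int:
    """Interpret an analog output level as logic 0 or 1."""
    return 1 if y_mv > threshold_mv else 0

decoded = {inputs: to_logic(y) for inputs, y in TABLE_2_1.items()}
# Smallest distance of any output level from the assumed threshold.
margin_mv = min(abs(y - THRESHOLD_MV) for y in TABLE_2_1.values())
print(decoded, margin_mv)
```

Under this assumption all four rows decode to the XOR truth table, with the 01 and 10 rows lying closest to the threshold.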

The body bias value (𝑉𝑆𝐵 = 𝑉𝑟𝑒𝑓) in the circuit is derived from the supply voltage (𝑉𝐷𝐷) through a voltage divider circuit, as shown in Figure 2.4. A diode connected NMOS circuit has been implemented with consideration of noise sensitivity, area and PVT variations. The body biasing circuit adds 1.114 𝜇𝑚² of area each time it is included, but the area of the 2T XOR gate, the 6T adder and the 8 bit x 8 bit multiplier still remains less than that of their peer designs, as shown in Table 2.2, Table 3.5 and Table 5.2 later in the thesis. The noise, including both flicker noise and thermal noise, is within the range of design consideration at all frequencies. The circuit passes all the process corners. Flicker noise is a type of electronic noise proportional to 1/f (i.e. inversely related to the operating frequency of the circuit) that also depends on the area of the circuit [64]. It is due to traps near the Si/SiO2 interface that randomly capture and release carriers. The bias circuit is affected less by noise as the frequency increases, and the simulated flicker noise is found to be -131 dB at a frequency of 1 MHz. The theoretical value of flicker noise from Equation 8 is found to be -149 dB. The difference between the two values is attributed to the non-ideal width and length of practical NMOS transistors and to material defects. Both values are, however, too small to affect the bias circuit. Thermal noise and resistor noise are very low for the circuit at all frequencies. So, the circuit behaves well in terms of noise. Since the effect of flicker noise on the bias circuit is negligible, it works well with small PMOS and NMOS transistor areas. The diode connected NMOS requires less area and offers higher mobility than a diode connected PMOS. The mobility and area have been taken as an advantage over the disadvantage of flicker noise in the NMOS only bias circuit. Also, the effect of noise is very small even for NMOS only logic at high frequencies. A trade-off in terms of power and area exists between the conventional diode connected NMOS circuit and supply independent reference voltage circuits (another method of implementing the bias circuit).

$\text{Flicker noise} = \dfrac{K}{C_{ox} \, W \, L \, f}$ (8)

Where,

K = process dependent constant of the order of 10^-25 V²F,

𝐶𝑜𝑥 = capacitance per unit gate area = 3.9𝜀𝑜/𝑡𝑜𝑥 = 17.7 x 10^-3 F/m²,

W = width of NMOS transistor,

L = Length of NMOS transistor,

f = frequency of operation.
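The 1/f behaviour of Equation 8 can be illustrated numerically. The width and length of the bias transistor are not given in this section, so the geometry below is a placeholder, and expressing the result with a 10·log10 convention is also an assumption; the sketch is meant to show the trend with frequency, not to reproduce the quoted values.

```python
import math

K = 1e-25        # process dependent constant, order of magnitude from the text
C_OX = 17.7e-3   # oxide capacitance per unit gate area, value from the text

def flicker_noise(w_m: float, l_m: float, f_hz: float) -> float:
    """Equation 8: K / (Cox * W * L * f)."""
    return K / (C_OX * w_m * l_m * f_hz)

def to_db(value: float) -> float:
    """Express a power-like quantity in dB (assumed 10*log10 convention)."""
    return 10.0 * math.log10(value)

# Placeholder geometry, NOT taken from the thesis.
W, L = 120e-9, 60e-9
for f in (1e5, 1e6, 1e7):
    print(f"{f:.0e} Hz: {to_db(flicker_noise(W, L, f)):.1f} dB")
```

Each decade increase in frequency lowers the result by 10 dB, which matches the observation that the bias circuit is affected less by flicker noise at higher frequencies.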


Figure 2.4. Diode Connected NMOS

2.2.2 Simulation and Performance Analysis of Proposed 2T XOR Gate

Extensive simulation study of XOR gates has been carried out to compare the proposed design

of the 2T XOR gate with existing designs of XOR gates available in literature. The circuits are

simulated in similar testing environment. In order to cover all different input combinations, the

XOR gates have been studied for four different input patterns. All the simulations and

extraction of net-lists have been done in Cadence Spectre at 65-nm, 90-nm and 130-nm

technologies. The designs are simulated at 50 MHz frequency with 50 ps of rise and fall times.

The proposed and existing XOR gates have been simulated and examined thoroughly.

Comparisons have been made with the existing peer designs in terms of power, delay and

Power Delay Product (PDP).

Figure 2.5 shows the simulation results of the 2T XOR gate in 65-nm technology in Cadence with 270 mV and 320 mV reverse bias bulk voltages, for analysing the effect of changing the threshold voltage according to Equation 7. The waveforms for the different input combinations show how the output logic low level is altered by the change in 𝑉𝑇, where A and B are the input voltage waveforms in volts and Y is the output voltage waveform in millivolts. It is evident from the waveform in Figure 2.5(b) that, at 270 mV, the output low level is farther from the logic 0 value than it is at 320 mV. This is because the threshold voltage at 320 mV is lower than at 270 mV. Thus, the voltage drop at 320 mV is larger than at 270 mV, as explained in Section 2.2.1. Therefore, increasing the reverse bias voltage brings the output closer to logic 0. But a trade-off has to be maintained between the logic 0 and logic 1 values to achieve correct digital logic levels for XOR gates, and an appropriate reverse bias voltage is chosen for each technology.


Figure 2.5(a) XOR Gate Simulation at Reverse Bias of 320 mV

Figure 2.5(b) XOR Gate Simulation at Reverse Bias of 270 mV

Figure 2.5. Input and Output Waveforms of XOR Gate


Figure 2.6. Calculation of Propagation Delay

Table 2.2 lists the power, delay and PDP (product of power and delay) of the proposed novel XOR implementation as well as of the existing XOR gate designs in the literature. The propagation delay is measured as the time difference between the input and output transitions at 50% of their logic levels, as shown in Figure 2.6. For this, the input voltage is changed so as to produce a change in the output voltage. The rise and fall delays for all the input combinations 00, 01, 10 and 11 have been averaged to evaluate the delay of the XOR gates. So, the delay is measured as the average over all the signal transitions of the circuit. Equations (9), (10) and (11) describe the propagation delay in MOS transistors:

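$t_p = 0.69 \, R_{on} \, C_L$ (9)

$R_{on} = \dfrac{3}{4} \cdot \dfrac{V_{DD}}{I_{Dsat}} \left(1 - \dfrac{7}{9} \lambda V_{DD}\right)$ (10)

$I_{Dsat} = \mu_p \, C_{ox} \, \dfrac{W}{L} \, (V_{GS} - V_T)^2$ (11)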

Where,

𝑡𝑝= Propagation delay of the circuit,

𝑅𝑜𝑛= Resistance of PMOS which is ON,

𝐶𝐿= Load capacitance = 0.05 fF,

𝐼𝐷𝑠𝑎𝑡= Current when PMOS is in saturation,


𝜆 = Channel length modulation of PMOS,

𝜇𝑝. 𝐶𝑜𝑥 = constant of technology = 23.21 𝜇𝐴/𝑉²

TABLE 2.2

COMPARISON OF PERFORMANCE ANALYSIS OF DIFFERENT XOR GATES

Type of XOR gate   Technology (nm)   Avg. power (µW)   Avg. delay (ps)   PDP (10^-18 J)

6T (Fig(1.5)) [15] 65 2.1880 15.600 34.132

4T (Fig(1.6(a))[16] 65 0.8660 8.125 7.036

4T (Fig(1.6(b))[16] 65 0.0650 13.250 0.861

4T (Fig(1.6(c))[17] 65 0.0340 14.625 0.497

4T (Fig(1.6(d))[18, 19] 65 0.0470 9.125 0.428

3T (Fig(1.7))[24] 65 0.0320 4.253 0.136

2T 65 0.0035 9.375 0.033

6T (Fig(1.5)) [15] 90 6.6300 18.625 123.483

4T (Fig(1.6(a)) [16] 90 0.1790 11.375 2.036

4T (Fig(1.6(b))[16] 90 0.1430 11.500 1.644

4T (Fig(1.6(c))[17] 90 0.0960 12.125 1.164

4T (Fig(1.6(d))[18, 19] 90 0.1020 11.125 1.134

3T (Fig(1.7))[24] 90 0.0410 9.750 0.399

2T 90 0.0039 19.375 0.075

6T (Fig(1.5))[15] 130 19.884 35.750 710.853

4T (Fig(1.6(a))[16] 130 4.4520 14.250 63.441

4T (Fig(1.6(b))[16] 130 0.4896 26.750 13.096

4T (Fig(1.6(c))[17] 130 0.2788 27.194 7.582

4T (fig(1.6(d))[18, 19] 130 0.3147 23.740 7.471

3T (Fig(1.7))[24] 130 0.1179 18.750 2.210

2T 130 0.0090 35.370 0.318

The mathematical evaluation of the propagation delay with 𝜆 = 0 (no channel length modulation) gives 19.79 ps in 65-nm technology for the critical path. The deviation from the simulated delay is due to other secondary effects of the PMOS transistor, which become prominent as the technology scales down, and also to the assumption of ideal channel length modulation.
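Equations (9) and (10) can be inverted, with 𝜆 = 0, to see what on-resistance and saturation current the quoted delay of 19.79 ps implies. The sketch below assumes a supply voltage of 1 V for 65-nm technology; the variable names are illustrative only.

```python
T_P = 19.79e-12   # propagation delay quoted in the text (s)
C_L = 0.05e-15    # load capacitance from the text (F)
V_DD = 1.0        # assumed supply voltage for 65-nm (V)

# Equation (9) solved for R_on: t_p = 0.69 * R_on * C_L.
r_on = T_P / (0.69 * C_L)

# Equation (10) with lambda = 0 solved for I_Dsat: R_on = (3/4) * V_DD / I_Dsat.
i_dsat = 0.75 * V_DD / r_on

print(f"R_on   ~ {r_on / 1e3:.1f} kOhm")
print(f"I_Dsat ~ {i_dsat * 1e6:.2f} uA")
```

With these inputs the implied on-resistance is several hundred kilo-ohms, which is consistent with a minimum-size PMOS pass transistor driving a very small load.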


The same input setting is followed for the measurement of power. There is a gradual decline in power from the 6T XOR gate to the 2T XOR gate, together with a moderate fall in PDP. The power delay product is calculated by multiplying the average power by the average delay. A contemporary idea of trading off power against delay, eventually giving minimal PDP, can be studied from previous designs [65]. The dominant power dissipation of the proposed circuit is dynamic power dissipation, which depends on the switching transitions. Compared with 3T XOR gates, 2T XOR gates have less dynamic power dissipation because the smaller number of transistors leads to fewer switching transitions. The dynamic power dissipation for a frequency of 50 MHz and a load capacitance of 0.05 fF is theoretically equal to 2.5 nW. The difference from the simulated value is due to additional static and leakage power consumption and other prevailing secondary effects. Since pass transistor logic is used, which is driven by variable input signals rather than constant power lines, only one signal path is active at a time to avoid a short between the inputs. This gives lower power dissipation than the CMOS logic of the 3T XOR gate. Simulation of the design confirms this: it is evident from Table 2.2 that there is a large drop in power consumption for the 2T XOR gates. The comparison of the performance of the different XOR gates is also represented pictorially through histograms in Figure 2.7. The histograms clearly indicate the difference in PDP levels from the 6 transistor XOR gate to the proposed 2 transistor gate in 65-nm, 90-nm and 130-nm technologies. The bars labelled 4T A, 4T B, 4T C and 4T D are for the four transistor (4T) XOR gates in Figures 1.6(a), 1.6(b), 1.6(c) and 1.6(d) respectively.
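The theoretical dynamic power quoted above follows from the usual switching-power estimate. The sketch below assumes a supply voltage of 1 V and a switching activity factor of 1, neither of which is stated explicitly in this section; the function name is illustrative.

```python
C_L = 0.05e-15   # load capacitance from the text (F)
F_CLK = 50e6     # operating frequency from the text (Hz)
V_DD = 1.0       # assumed supply voltage for 65-nm (V)
ALPHA = 1.0      # assumed switching activity factor

def dynamic_power(c_load, v_dd, freq, alpha=ALPHA):
    """Classic switching-power estimate: alpha * C * V^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq

p_dyn = dynamic_power(C_L, V_DD, F_CLK)
p_sim = 0.0035e-6  # simulated average power of the 2T gate at 65-nm, Table 2.2 (W)
print(f"theoretical: {p_dyn * 1e9:.2f} nW, simulated: {p_sim * 1e9:.2f} nW")
```

Under these assumptions the estimate reproduces the 2.5 nW figure, and the gap to the simulated 3.5 nW corresponds to the static, leakage and secondary contributions mentioned in the text.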

Figure 2.7(a) Comparative Analysis of PDP of XOR Gates at 65-nm Technology


Figure 2.7(b) Comparative Analysis of PDP of XOR Gates at 90-nm Technology

Figure 2.7(c) Comparative Analysis of PDP of XOR Gates at 130-nm Technology

Figure 2.7. PDP (vs) Technology for XOR Gate Architectures

Subthreshold leakage, which contributes to the total power dissipation, occurs when devices are in the off state, i.e. 𝑉𝐺𝑆 = 0. In order to sustain the improvement in gate delay for digital circuits with technology scaling, the threshold voltages of MOSFET devices must be scaled aggressively. However, the reduction in device threshold voltage leads to an exponential increase in subthreshold leakage. The expression for the drain current of a PMOS transistor in the subthreshold region is given by Equations (12) and (13) below, which explain this effect.


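$I_D = I_0 \left(\dfrac{W}{L}\right) \exp\left(\dfrac{k \, V_G}{U_T}\right) \left[\exp\left(-\dfrac{V_S}{U_T}\right) - \exp\left(-\dfrac{V_D}{U_T}\right)\right]$ (12)

$U_T = \dfrac{KT}{q}$ (13)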

Where,

𝐼𝐷 = Drain current of PMOS,

𝐼0 = Process dependent constant,

𝑊 = Width of PMOS,

𝐿 = Length of PMOS,

𝑘 = Gate coupling coefficient,

𝑉𝐺 = Gate voltage,

𝑉𝐷 = Drain voltage,

𝑉𝑆 = Source voltage,

𝑈𝑇 = Thermal voltage = 26 mV.
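The exponential dependence in Equation (12) can be made concrete with a short numerical sketch. The values of 𝐼0, W/L and the gate coupling coefficient k below are placeholders (they are not given in the text), and the expression is evaluated exactly as printed, without applying PMOS sign conventions.

```python
import math

U_T = 0.026  # thermal voltage from the text (V)

def subthreshold_current(v_g, v_s, v_d, i0=1.0, w_over_l=1.0, k=0.7):
    """Equation (12) as written; i0, w_over_l and k are placeholder values."""
    return i0 * w_over_l * math.exp(k * v_g / U_T) * (
        math.exp(-v_s / U_T) - math.exp(-v_d / U_T)
    )

# Gate-voltage step that changes the current by one decade for the assumed k.
decade_step = U_T * math.log(10) / 0.7
i_a = subthreshold_current(0.10, 0.0, 0.5)
i_b = subthreshold_current(0.10 + decade_step, 0.0, 0.5)
print(f"decade step ~ {decade_step * 1e3:.1f} mV, ratio = {i_b / i_a:.3f}")
```

For the assumed k, a gate voltage change of a few tens of millivolts changes the current by a factor of ten, which is why small shifts in threshold voltage have a large effect on leakage.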

The above equation shows how the drain current in the subthreshold region depends on the device voltages. Due to the topology of the proposed 2T XOR gate and appropriate biasing, the gate leakage power has been found to be in the range of femto-watts (fW) for the proposed XOR circuit in all three technologies, viz. 65-nm, 90-nm and 130-nm. The leakage power obtained is too small to affect the overall power consumption. The leakage power is calculated by computing the gate leakage current of each individual transistor in the circuit when the transistors are switched off. The total leakage current is then simply the sum of the individual leakage currents of all the gates.

Leakage currents are mainly a concern in analog circuits. In digital circuits, the focus is more on obtaining correct logic levels and on how to enhance the circuit to get better logic levels. The body biasing method increases the sub-threshold leakage of the 2T XOR gate but has a negligible effect on total power [66]. The gate leakage power is in the femto-watt (fW) range, equal to 263.9 fW, obtained by summing the leakage power of the individual gates of the PMOS transistors, and is least for 65-nm due to technology scaling [67].

The XOR gates have also been examined with respect to noise margins in 65-nm, 90-nm and 130-nm technologies. Table 2.3 compares the noise margins of the 2T XOR gate design with those of the XOR gate designs available in the literature. Noise margin is defined as the amount of noise a circuit can withstand without compromising the output logic level, and it is input pattern dependent [7]. The noise margins are found to be comparable. 𝑁𝑀𝐻 (High Noise Margin) and 𝑁𝑀𝐿 (Low Noise Margin) are obtained by performing a DC analysis of the circuit in Cadence to find the Voltage Transfer Characteristics (VTC), balancing the switching probabilities of the two PMOS transistors at GND (logic ‘0’) and 𝑉𝐷𝐷 (logic ‘1’).

The XOR gate is further analysed for the impact of Process, Voltage and Temperature (PVT) variation in 65-nm, 90-nm and 130-nm technologies. The worst-case/best-case analysis has been performed by analysing the process corners of the circuits. The aim of PVT analysis is to find the worst-case and best-case performance values across all PVT corners. In PVT-aware design, the aim is to maximize performance while meeting specifications across all PVT corners. The variation analysis incorporates a bias voltage change of (+/-) 100 mV (from the nominal value of 320 mV in 65-nm technology), a temperature variation from -20℃ to 70℃ (from a nominal room temperature of 27℃) and the slow-slow, fast-fast, slow-fast and fast-slow process corners. So, the lowest and highest temperatures at which the proposed XOR gate works correctly are -20℃ and 70℃ respectively, and the maximum variation of bias voltage for correct logic values is (+/-) 100 mV. Statistically, a circuit tolerating (+/-) 10% variation is considered an appropriate design.
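The set of simulation points implied by these ranges can be enumerated as a Cartesian product. The sketch below is only an illustration of how the corners combine: it samples each range at its limits and nominal value, and it includes the typical process corner alongside the four named corners, both of which are assumptions rather than the exact simulation plan used.

```python
from itertools import product

PROCESS = ("typical", "slow-slow", "fast-fast", "slow-fast", "fast-slow")
BIAS_MV = (220, 320, 420)       # nominal 320 mV with +/- 100 mV
TEMP_C = (-20, 27, 70)          # range limits and nominal room temperature

# Every combination of process corner, bias voltage and temperature.
corners = list(product(PROCESS, BIAS_MV, TEMP_C))
print(len(corners), "PVT corners")
for process, bias, temp in corners[:3]:
    print(process, bias, temp)
```

Each tuple would correspond to one simulation run in which the logic levels of the gate are checked.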

TABLE 2.3

COMPARISON RESULT OF NOISE MARGIN OF DIFFERENT XOR GATES

Types of XOR gate   Technology (nm)   𝑉𝑂𝐻 (V)   𝑉𝑂𝐿 (V)   𝑉𝐼𝐻 (V)   𝑉𝐼𝐿 (V)   𝑁𝑀𝐻 (V)   𝑁𝑀𝐿 (V)

6T (Fig(2.5)) [9] 65 1.000 0.000 0.690 0.318 0.310 0.318

4T (Fig(2.6(a))[10] 65 1.000 0.000 0.667 0.357 0.333 0.357

4T (Fig(2.6(b))[10] 65 1.000 0.000 0.690 0.460 0.310 0.460

4T (Fig(2.6(c))[11] 65 1.000 0.000 0.600 0.420 0.400 0.420

4T (Fig(2.6(d))[12, 13] 65 1.000 0.000 0.630 0.450 0.370 0.450

3T (Fig(2.7))[18] 65 1.000 0.000 0.520 0.240 0.480 0.240

2T 65 1.000 0.000 0.650 0.280 0.350 0.280

6T (Fig(2.5)) [9] 90 1.000 0.000 0.680 0.480 0.320 0.480

4T (Fig(2.6(a)) [10] 90 1.000 0.000 0.699 0.372 0.301 0.372

4T (Fig(2.6(b))[10] 90 1.000 0.000 0.640 0.480 0.360 0.480

4T (Fig(2.6(c))[11] 90 1.000 0.000 0.600 0.360 0.400 0.360

4T (Fig(2.6(d))[12, 13] 90 1.000 0.000 0.600 0.400 0.400 0.400

3T (Fig(2.7))[18] 90 1.000 0.000 0.680 0.440 0.320 0.440

2T 90 1.000 0.000 0.640 0.280 0.360 0.280

6T (Fig(2.5))[9] 130 1.200 0.000 0.960 0.320 0.320 0.240

4T (Fig(2.6(a))[10] 130 1.200 0.000 0.920 0.360 0.301 0.280

4T (Fig(2.6(b))[10] 130 1.200 0.000 0.840 0.360 0.360 0.360

4T (Fig(2.6(c))[11] 130 1.200 0.000 0.960 0.264 0.400 0.240

4T (fig(2.6(d))[12, 13] 130 1.200 0.000 0.984 0.312 0.400 0.216

3T (Fig(2.7))[18] 130 1.200 0.000 0.910 0.320 0.320 0.290

2T 130 1.200 0.000 0.970 0.280 0.360 0.230


𝑉𝑂𝐻 = output high voltage

𝑉𝑂𝐿 = output low voltage

𝑉𝐼𝐻 = input high voltage

𝑉𝐼𝐿 = input low voltage

𝑁𝑀𝐻 = high noise margin = 𝑉𝑂𝐻-𝑉𝐼𝐻

𝑁𝑀𝐿 = low noise margin = 𝑉𝐼𝐿-𝑉𝑂𝐿
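As a worked example, the noise margins of the 2T XOR gate at 65-nm can be recomputed from the voltage levels in Table 2.3. The sketch uses the conventional definitions NMH = VOH - VIH and NML = VIL - VOL, which reproduce the tabulated 65-nm values; the function name is illustrative.

```python
def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Return (NMH, NML) using NMH = VOH - VIH and NML = VIL - VOL."""
    return v_oh - v_ih, v_il - v_ol

# 2T XOR gate at 65-nm, values from Table 2.3 (V).
nm_h, nm_l = noise_margins(v_oh=1.000, v_ol=0.000, v_ih=0.650, v_il=0.280)
print(f"NMH = {nm_h:.3f} V, NML = {nm_l:.3f} V")
```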

2.2.3 Results and Discussions

The waveforms in Figure 2.5 clearly demonstrate the output of a XOR gate. There is change in

the values of logic levels as the threshold voltage is varied with the variation of reverse bias

bulk voltage. Appropriate values are adjusted maintaining a trade-off between logic levels for

different technologies.

Only PMOS transistors have been used in order to keep to the idea of implementing two transistor XOR gates, because CMOS logic cannot generate the required logic with the least number of transistors. Though CMOS has lower power dissipation than PMOS logic, it is more complex and expensive. PMOS transistors are faster to fabricate, highly controllable and reliable. Moreover, compared with the existing XOR gates available in the literature, the two transistor PMOS only circuit still has the least power dissipation. Secondly, PMOS is chosen over NMOS devices (though NMOS has higher mobility) because applying a reverse bias voltage to an NMOS only circuit cannot produce the desired logic levels for an XOR gate, owing to its different behaviour from PMOS devices; this choice compromises area (since the area occupied by NMOS is less than that of PMOS). It can be seen from Table 2.2 that the average delay of the 2T XOR gate is more than that of the 3T XOR gate, due to the use of PMOS logic, which has lower mobility than the NMOS or CMOS logic used for the 3T XOR gate. But this overhead is compensated by reduced power consumption. Other ways of implementing 2T XOR gates using NMOS and CMOS transistors can be explored as part of future research work.

The calculation and comparison details of power, delay and PDP are compiled in Table 2.2. The power decreases as the number of transistors required to design XOR gates has been minimized over the years. The power delay product of the proposed 2T XOR gate is found to be as low as 0.033 aJ, compared with 0.136 aJ for the 3T XOR gate, in 65-nm technology. The PDP is thus lowered by approximately 75.73% from the 3T XOR gate to the 2T XOR gate. The highest PDP in 65-nm technology, 34.132 aJ, is that of the 6T XOR gate. The power and delay of the different XOR gate implementations show a regular trend in 65-nm, 90-nm and 130-nm technologies, with the least power consumption for the proposed 2T XOR gate. The noise margins are examined and listed in Table 2.3. They show the efficiency of the proposed and existing XOR gate designs and their effective employment over the decade for utilization in bigger units. The leakage power in terms of gate leakage and sub-threshold leakage has also been determined. Process, Voltage and Temperature (PVT) variations are taken into consideration by varying P, V and T over their allowable ranges and analysing the resultant combinations, the so-called PVT corners.
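The PDP comparison at 65-nm can be reproduced from the power and delay columns of Table 2.2. The sketch below is a simple arithmetic check with illustrative names; it rounds each PDP to three decimals before computing the relative reduction.

```python
# (average power in uW, average delay in ps) at 65-nm, from Table 2.2.
XOR_65NM = {
    "6T": (2.1880, 15.600),
    "3T": (0.0320, 4.253),
    "2T": (0.0035, 9.375),
}

def pdp_aj(power_uw: float, delay_ps: float) -> float:
    """Power delay product in aJ: 1 uW * 1 ps = 1e-18 J."""
    return power_uw * delay_ps

pdp = {name: round(pdp_aj(p, d), 3) for name, (p, d) in XOR_65NM.items()}
reduction = 100.0 * (pdp["3T"] - pdp["2T"]) / pdp["3T"]
print(pdp, f"{reduction:.1f}% lower PDP for 2T vs 3T")
```

The reduction comes out at roughly three quarters, in line with the figure quoted in the discussion.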


Chapter 3

Design of 6T Adder using Novel 2T XOR Gates

This chapter explains the design of the 6T adder, which utilizes two of the 2T XOR gates described in Chapter 2. Using XOR gates is a general way to construct adders with efficient and correct operation. So, Chapter 3 is the next step towards the design of adders using XOR gates, which can be utilized further at higher levels of design.

The chapter is organized as follows: Section 3.1 explains the basic operating principle of adders and their general applications in VLSI design. Section 3.2 proposes the novel architecture of the 6T adder employing the 2T XOR gates from the previous chapter, followed by Section 3.2.1 discussing the simulation and performance analysis of adders, Section 3.2.2 presenting the layout view of the proposed 6T adder and Section 3.2.3 outlining the analysis of the results obtained in the chapter.

3.1 What is an Adder?

An adder, or summer, is a digital circuit that performs addition of numbers. It adds two n-bit binary numbers, where n is the number of bits required. Digital adders add two or more binary numbers to generate two outputs, sum and carry. Adders can be classified as half adders and full adders according to the way they combine binary numbers.

3.1.1 Half Adders

A half adder is a combinational circuit that takes two inputs A and B and produces two outputs, Sum (S) and Carry (C), as shown in Figure 3.1. It is built using two logic gates, an XOR gate for the sum and an AND gate for the carry. The input variables A and B are called the augend and the addend. The truth table describing the operation is given in Table 3.1, and the logic equations governing its operation are given by Equations (14) and (15) as follows:

𝑆 = 𝐴 ⊕ 𝐵 (14)

𝐶 = 𝐴. 𝐵 (15)

Figure 3.1. Circuit Diagram of Half Adder


TABLE 3.1

TRUTH TABLE FOR HALF ADDERS

INPUT INPUT OUTPUT OUTPUT

A B SUM(S) CARRY(C)

0 0 0 0

0 1 1 0

1 0 1 0

1 1 0 1

3.1.2 Full Adders

The functionality of a 1-bit full adder circuit can be summarized by Equations (16) and (17): given three 1-bit inputs A, B and Cin, it produces the outputs Sum and carry (Cout), as shown in Figure 3.2. The logic circuit of the full adder uses two half adders and one OR gate. It is usually used in a cascade of adders, such as a ripple carry adder, in which the Cout of one adder is the Cin of the next. The critical path runs through the two XOR gates to the sum bit output. Using only two types of gates is convenient if the circuit is implemented using simple IC chips which contain only one gate type per chip. The truth table is given in Table 3.2, and Equations (16) and (17) are the governing Boolean equations:

𝑆𝑢𝑚 = 𝐴⨁𝐵⨁Cin (16)

Cout = A’.B. Cin + A.B’. Cin + A.B.Cin’ + A.B.Cin = Cin (A’B +AB’) + AB (Cin + Cin’)

Cout = Cin (𝐴⨁𝐵) + AB (17)

Figure 3.2. Logic Circuit of Full Adder


TABLE 3.2

TRUTH TABLE FOR FULL ADDER

A B 𝑪𝒊𝒏 SUM 𝑪𝒐𝒖𝒕

0 0 0 0 0

0 0 1 1 0

0 1 0 1 0

0 1 1 0 1

1 0 0 1 0

1 0 1 0 1

1 1 0 0 1

1 1 1 1 1
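The construction of a full adder from two half adders and one OR gate, as described above, can be checked against Equation (17) and Table 3.2 with a short behavioural sketch. The function names are illustrative and model only the logic, not the transistor-level circuit.

```python
from itertools import product

def half_adder(a: int, b: int):
    """Equations (14) and (15): S = A xor B, C = A.B."""
    return a ^ b, a & b

def full_adder(a: int, b: int, c_in: int):
    """Two half adders and one OR gate, as described in the text."""
    s1, c1 = half_adder(a, b)
    total, c2 = half_adder(s1, c_in)
    return total, c1 | c2

for a, b, c_in in product((0, 1), repeat=3):
    total, c_out = full_adder(a, b, c_in)
    # Equation (17): Cout = Cin.(A xor B) + A.B
    assert c_out == (c_in & (a ^ b)) | (a & b)
    # Sum and carry together encode the arithmetic sum of the inputs.
    assert 2 * c_out + total == a + b + c_in
    print(a, b, c_in, total, c_out)
```

The printed rows follow the same input order as Table 3.2.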

3.2 Design of Six Transistor Full Adder

The proposed 2T XOR gate has been used to design a 6T full adder. The two outputs SUM and CARRY (𝐶𝑜𝑢𝑡) are generated based on the Boolean Equations (16) and (17) of the full adder. The approach used in this thesis implements the full adder with two XOR gates for the SUM output and a 2 x 1 multiplexer for the carry output. The inputs to the circuit are A, B and 𝐶𝑖𝑛. The critical three input XOR function of the full adder, required for the sum bit calculation, is well suited to implementation in pass transistor logic due to its multiplexer structure.

The exclusive OR operation of Equation (16) is realized using wired logic [34] of the 2T XOR gates to produce the sum output, and the final carry output given by Equation (17) is implemented using the M5 and M6 pass transistors. The dimensions of the M5 and M6 transistors are W=300 nm and L=60 nm in 65-nm technology. The W and L of transistors M1 to M4 are the same as defined for the 2T XOR gate. The schematic of the proposed six transistor full adder is shown in Figure 3.3. A reverse bias voltage of 320 mV is applied in order to obtain appropriate logic high and logic low levels at the output of the simulated circuit in 65-nm technology. Evidently, for the three input combination there is a two stage delay for the sum and carry outputs. The delay of the carry output, which is the critical delay of the circuit used for finding the PDP [65], is less than that of the previously designed eight transistor adder [24] (as explained later in this chapter). Minimum width and length are used in order to minimize the power consumption of the circuit [61]. The design has been simulated in three technologies, viz. 65-nm, 90-nm and 130-nm, and suitable reverse bias voltages have been applied in each technology to achieve the desired output.
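At the logic level, generating the carry with a 2 x 1 multiplexer can be sketched as follows. The text does not state which signals drive the select and data inputs of the multiplexer, so the wiring below (select = A xor B, passing Cin when selected and A otherwise) is an assumption based on a common pass transistor arrangement; the function names are hypothetical.

```python
from itertools import product

def mux2(select: int, d0: int, d1: int) -> int:
    """2 x 1 multiplexer: returns d1 when select is 1, otherwise d0."""
    return d1 if select else d0

def adder_mux_carry(a: int, b: int, c_in: int):
    """Behavioural full adder: two XORs for SUM, a 2 x 1 mux for CARRY.

    Assumed wiring (not confirmed by the text): the mux is selected by
    A xor B and passes Cin when selected, A otherwise.
    """
    p = a ^ b
    return p ^ c_in, mux2(p, a, c_in)

results = {abc: adder_mux_carry(*abc) for abc in product((0, 1), repeat=3)}
for (a, b, c_in), (total, c_out) in results.items():
    print(a, b, c_in, total, c_out)
```

With this wiring the multiplexer output agrees with Equation (17) for all eight input combinations, since A.B equals A whenever A and B are equal.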


Figure 3.3. Schematic Diagram of Proposed 6T Adder in 65-nm Technology

3.2.1 Simulation and Performance Analysis of Proposed 6T Full Adder

The proposed 6T adder is simulated in the Cadence environment at 65-nm, 90-nm and 130-nm technologies. The input and output voltage waveforms for the simulated schematic of the adder in Figure 3.3 are shown in Figure 3.4. The output waveform is given for all combinations of the three inputs, as the circuit responds differently to different input patterns. The post layout simulation of the adder is performed using the proposed 2T XOR gate. The circuits are simulated at 50 MHz with rise and fall times of 50 ps.

The voltage difference between logic levels is very small in the circuit, but it shows the correct output logic levels for the sum and carry. As the voltage swing of the inputs is 1V in 65-nm technology, an output value above 0.5V is considered logic ‘1’ and a value below that is considered logic ‘0’. The voltage degradation in the waveform of the sum output is the result of cascading XOR gates implemented in pass transistor logic. It means that the proposed XOR gate has the strength to drive only one further XOR gate with correct logic levels without any extra circuitry. Moreover, the circuits here have been analysed with the minimum transistor width. The voltage level difference can be increased by increasing the width of the transistors, but that will increase the silicon area. Also, when the adder circuit is used in bigger modules, buffers, inverters or comparator circuits can be employed, if required, to achieve a higher voltage difference. Level restorer circuits can be used at different points of the circuit to reduce the effect of voltage degradation and noise [68]. Consequently, a suitable reverse bias value is used for proper operation of the adder. Typically, the transistors used for implementing the actual circuits have the minimum width for the XOR gate, and with such widths the difference in voltage levels between logic high and logic low is as high as 138 mV for the sum output. The voltage swing is found to be much higher, around 400 mV, for the carry output because of the 5X width of transistors M5 and M6.

Figure 3.4. Post Layout Simulation of 6 Transistor Adder at 65-nm Technology

The comparative performance analysis of different adders in terms of power, delay and PDP is shown in Table 3.4 and also pictorially through histograms in Figure 3.5, comparing the 28T, 20T, 16T, 14T, 10T and 8T adders available in the literature with the proposed 6T adder. The results indicate that the power delay product of the 6T full adder is lower than that of the other adders available in the literature. The 8T and 6T adders have been designed using the 3T XOR gate available in the literature [24] and the 2T XOR gate proposed in this thesis, respectively.


TABLE 3.4

COMPARISON OF PERFORMANCE ANALYSIS OF DIFFERENT ADDERS

Types of adder   Technology (nm)   Avg. power (µW)   Avg. delay (ps)   PDP (10^-18 J)

28T (Fig 1.10(a))[29,30] 65 0.481 11.875 5.711

20T (Fig 1.10(b)) [30] 65 0.317 7.812 2.476

16T (Fig 1.10(c)) [31] 65 0.393 4.625 1.817

14T (Fig 1.10(d)) [32] 65 0.511 3.187 1.628

10T (Fig 1.10(e)) [34] 65 0.129 11.625 1.499

8T (Fig 1.10(f)) [24] 65 0.127 8.625 1.095

6T 65 0.439 1.935 0.849

28T (Fig 1.10(a))[29,30] 90 0.806 21.750 17.530

20T (Fig 1.10(b)) [30] 90 0.281 9.812 2.757

16T (Fig 1.10(c)) [31] 90 0.318 8.500 2.703

14T (Fig 1.10(d)) [32] 90 0.610 4.320 2.635

10T (Fig 1.10(e)) [34] 90 0.665 3.750 2.493

8T (Fig 1.10(f)) [24] 90 0.232 9.500 2.204

6T 90 0.685 2.625 1.798

28T (Fig 1.10(a))[29,30] 130 7.107 20.680 146.972

20T (Fig 1.10(b)) [30] 130 3.768 14.750 55.578

16T (Fig 1.10(c)) [31] 130 5.547 8.500 47.149

14T (Fig 1.10(d)) [32] 130 6.572 3.375 22.180

10T (Fig 1.10(e)) [34] 130 1.510 14.218 21.469

8T (Fig 1.10(f)) [24] 130 1.590 10.437 16.594

6T 130 3.962 3.875 15.352


Figure 3.5(a) Comparative Analysis of PDP of Different Adders at 65-nm Technology

Figure 3.5(b) Comparative Analysis of PDP of Different Adders at 90-nm Technology

Figure 3.5(c) Comparative Analysis of PDP of Different Adders at 130-nm Technology

Figure 3.5. PDP (vs) Technology for Adder Architectures


The 6T adder is found to behave correctly for all five process corners, namely typical,
slow-slow, fast-fast, slow-fast and fast-slow, with a bias voltage variation of ±20 mV and a
temperature variation from -10℃ to 40℃ in 65-nm, 90-nm and 130-nm technologies. The dominating
MOSFET parameters, i.e. threshold voltage and (W/L) ratio, are randomly varied over different
values to conclude the analysis. Conceptually and practically, due to the reduced voltage swing,
the PVT ranges tolerated by the 6T adder are narrower than those of the 2T XOR gate. Still, the
circuit behaves correctly with a standard deviation of ±10% in the design parameters, within the
tolerance limits expected for VLSI circuits. The sub-threshold leakage is reduced by virtue
of reverse biasing and thus the gate leakage power is limited to the femto-watt (fW) range,
equal to 701.14 fW, which is almost negligible for the 6T adder.

3.2.2 Layout Design of Proposed Six Transistor Adder

Figure 3.6 shows the layout of the full adder in 65-nm technology in the Cadence Virtuoso Layout
Editor. It is evident that the interconnect density is lower than that of the 8 transistor full
adder [24], leading to a low power delay product [69]. The layout is symmetric, with the large
PMOS transistors placed on two p-wells and the p-wells on an n-type substrate.

Figure 3.6. Layout View of Proposed 6T Full Adder

A prime motivation for recent research is to reduce the chip area [4]. The silicon space used
defines the area of any circuit in VLSI design. The circuit interconnections also consume a
comparable amount of area. Adders are designed with an effort to find the optimal area
complexity, making the circuit least expensive. Table 3.5 and Figure 3.7 show a comparative
study of area for different adders in three distinct technologies. The silicon area is
determined approximately by generating the layout of the adder modules with proper Design Rule
Check (DRC) and Layout Versus Schematic (LVS) checks. Theoretically and experimentally, the area
of the proposed design is the minimum. The trade-off in the silicon area lies in how the blocks
are placed and how efficiently the routing is done. Based on Table 3.5, the proposed 6T adder
has the smallest chip area, even with the inclusion of the bias circuit area shown in Figure 2.4.

TABLE 3.5

COMPARATIVE STUDY OF AREA OF DIFFERENT ADDERS

Type of adder               Technology (nm)   Area (µm²)
28T (Fig 1.10(a)) [29,30]   65                114.519
20T (Fig 1.10(b)) [30]      65                83.723
16T (Fig 1.10(c)) [31]      65                78.723
14T (Fig 1.10(d)) [32]      65                60.083
10T (Fig 1.10(e)) [34]      65                44.208
8T (Fig 1.10(f)) [24]       65                39.214
6T                          65                14.517
28T (Fig 1.10(a)) [29,30]   90                259.364
20T (Fig 1.10(b)) [30]      90                155.703
16T (Fig 1.10(c)) [31]      90                146.577
14T (Fig 1.10(d)) [32]      90                116.741
10T (Fig 1.10(e)) [34]      90                81.624
8T (Fig 1.10(f)) [24]       90                75.247
6T                          90                37.953
28T (Fig 1.10(a)) [29,30]   130               290.565
20T (Fig 1.10(b)) [30]      130               195.048
16T (Fig 1.10(c)) [31]      130               179.626
14T (Fig 1.10(d)) [32]      130               127.110
10T (Fig 1.10(e)) [34]      130               93.427
8T (Fig 1.10(f)) [24]       130               85.140
6T                          130               45.283


Figure 3.7(a) Comparative Analysis of Area of Different Adders at 65-nm Technology

Figure 3.7(b) Comparative Analysis of Area of Different Adders at 90-nm Technology


Figure 3.7(c) Comparative Analysis of Area of Different Adders at 130-nm Technology

Figure 3.7. Area (vs) Technology for Different Adder Architectures


The waveforms of Figure 3.4 depict the output of the full adder. There is voltage degradation
in the waveform for the sum output as a result of the cascaded XOR gates. Consequently, a
suitable reverse bias value is used for proper operation of the adder. Also, level restorers
can be used at different points of the circuit to reduce the effect of voltage degradation and
noise [68].

The calculation and comparison details of power, delay and PDP are compiled in Table 3.4. The
power delay product diminishes from the 28 transistor full adder design to the 6 transistor
full adder design. The PDP of the six transistor adder is found to be as low as 0.849 aJ, as
compared with 1.095 aJ for the 8T full adder in 65-nm technology. The reduction in PDP is
approximately 22.46% from the 8T adder to the 6T adder. This percentage reduction is smaller
than that obtained for the 2T XOR gate, because the larger number of transistors increases the
complexity and the 6T adder has a lower voltage swing than the 8T full adder. The highest value
of PDP is 5.711 aJ for the 28 transistor adder in 65-nm technology. The power and delay of the
different full adder architectures follow a similar trend in 65-nm, 90-nm and 130-nm
technologies, with the lowest PDP for the proposed 6T full adder design as compared to the
other adder designs in literature. Process, voltage and temperature variation analysis improves
confidence in the accuracy of the circuit and is valuable for best and worst case analysis. The
thesis also evaluates the leakage power of the circuit, which is negligible due to the reverse
back biasing technique.
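The PDP figures quoted above can be reproduced from the power and delay columns of Table 3.4. The following is a minimal Python sketch, not part of the original simulation flow; the function and variable names are illustrative assumptions, and only three 65-nm rows are transcribed.

```python
# (avg. power in µW, avg. delay in ps), transcribed from the 65-nm rows of Table 3.4
ADDERS_65NM = {
    "28T": (0.481, 11.875),
    "8T": (0.127, 8.625),
    "6T": (0.439, 1.935),
}


def pdp_attojoules(power_uw, delay_ps):
    """Power delay product; 1 µW x 1 ps = 1e-18 J = 1 aJ."""
    return power_uw * delay_ps


def percent_reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


pdp = {name: pdp_attojoules(p, d) for name, (p, d) in ADDERS_65NM.items()}
for name, value in pdp.items():
    print(f"{name}: PDP = {value:.3f} aJ")
print(f"8T -> 6T reduction: {percent_reduction(pdp['8T'], pdp['6T']):.2f} %")
```

Because the table values are rounded, the computed reduction may differ from the quoted 22.46% in the last digit.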

Based on Table 3.5, in 65-nm technology, the proposed 6T adder has the smallest chip area of
14.517 µm², even with the insertion of the bias circuit (1.114 µm²). The area is found to be
lowest, equal to 16.745 µm² for the 6T adder, as compared with 39.214 µm² for the 8T adder. A
similar trend is obtained for all three technologies, reducing silicon area by approximately
58.57% from the 8T adder to the 6T adder. This novel adder with minimum area allows more
applications to be implemented per unit area, thus increasing the VLSI integration density and
reducing the die area.


Chapter 4

Design of 5:3 Compressor using Novel 2T XOR

Gates

This chapter proposes another arithmetic circuit, the 5:3 compressor, for low power
multiplication. The architecture uses a two transistor multiplexer design and the novel two
transistor XOR gates, giving the least number of transistors for logic level implementation.
The modified and proposed compressor designs reduce the stage delays, transistor count, PDP,
EDP (Energy Delay Product) and silicon area by using combinations of XOR-XNOR gates, MUX
circuits and transistor level implementation when compared with the conventional designs.
Simulation studies have been carried out in 65-nm, 90-nm and 130-nm technologies in Cadence
Spectre. The XOR gate, full adder and multiplier can be further used for many other
applications. Other compressors, such as 6:3, 7:3 and 8:3 compressors, can also be designed
using the technique shown in this chapter and used for multiplication. The design discussed in
this chapter is an application of the proposed 2T XOR gates with additional optimization.

The chapter is organized as follows: Section 4.1 describes the basic operation of compressors.
Section 4.2 explains the proposed model of 5:3 compressors, with Section 4.2.1 presenting the
schematic of the proposed compressor designs. Section 4.2.2 presents the simulation and
performance analysis of the proposed 5:3 compressors, compared with the two other designs from
literature to show the efficiency of the proposed design. Section 4.2.3 gives the layout view
of the proposed 5:3 compressors for area comparison and, finally, Section 4.2.4 presents
results and discussion of the values obtained.

4.1 What are compressors in VLSI design?

A compressor is a combinational device based upon the counter logic of a full adder.
Generally, it is used in multipliers to reduce the number of operands while adding the partial
product terms. A typical m:n compressor takes m equally weighted input bits and produces an
n-bit binary number [70]. In other words, it counts the number of 1s in the input and outputs
the binary count value. The block diagram of the 5:3 compressor is shown in Figure 4.1.

The counter property of this compressor is shown in Table 4.1. The counting range of this
compressor is zero to five. The block diagrams of the 6:3 and 7:3 compressors are similar to
that of the 5:3 compressor, with one more input added for the 6:3 compressor and two more
inputs for the 7:3 compressor. I1 to I5 are the inputs and X1 to X3 are the outputs of the 5:3
compressor. Note that the outputs of the compressor have different power-of-2 weights. The
weight of the LSB (X1) of the compressor output is the same as the weight of each of the
inputs, and the remaining bits have increasingly higher weights.


Figure 4.1. Block Diagram of 5:3 Compressor

TABLE 4.1

COUNTER PROPERTY OF 5:3 COMPRESSOR

Input conditions            X3   X2   X1   Decimal value
All inputs are zero         0    0    0    0
Any one input is one        0    0    1    1
Any two inputs are one      0    1    0    2
Any three inputs are one    0    1    1    3
Any four inputs are one     1    0    0    4
Any five inputs are one     1    0    1    5
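The counter property of Table 4.1 can be expressed as a short behavioural model. The sketch below is illustrative only; the function name `compressor_5_3` is an assumption and the model says nothing about the transistor level circuit.

```python
def compressor_5_3(inputs):
    """Behavioural 5:3 compressor: count the 1s among five equally weighted bits.

    Returns (X3, X2, X1), where X1 is the LSB and carries the same weight as the inputs.
    """
    if len(inputs) != 5 or any(bit not in (0, 1) for bit in inputs):
        raise ValueError("expected exactly five bits, each 0 or 1")
    count = sum(inputs)
    return (count >> 2) & 1, (count >> 1) & 1, count & 1


# Reproduce the rows of Table 4.1: k inputs at 1, the rest at 0.
for ones in range(6):
    pattern = (1,) * ones + (0,) * (5 - ones)
    x3, x2, x1 = compressor_5_3(pattern)
    print(f"{ones} input(s) high -> X3 X2 X1 = {x3} {x2} {x1}")
```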

4.2 Architecture of Proposed 5:3 Compressors

A combinational logic circuit of a 5:3 compressor is a topology accepting five inputs and
generating three outputs. The five input bits are summed to produce a three bit output. The
conventional design of the 5:3 compressor is an enhanced version of the 4:2 compressor [71, 72,
73] and has a maximum output value of 101 when all five input bits are 1. The conventional
designs of 5:3 compressors are shown in Figure 1.11. Figure 1.11(a) is a straightforward
approach which leads to five stage delays, and the improved version in Figure 1.11(b) [52]
entails three stage delays. The current work uses 2x1 multiplexers in place of XOR gates at the
second and third stages, producing the output with decreased critical path delay. Moreover, the
architecture also plays a significant role in decreasing the PDP, EDP and area. The design of
the 5:3 compressor has been derived by suitably altering the Boolean equations as follows:

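\begin{align}
O_0 &= x_0 \oplus x_1 \oplus x_2 \oplus x_3 \oplus x_4 \notag \\
    &= (x_0 \oplus x_1 \oplus x_2 \oplus x_3) \cdot \overline{x_4}
     + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)} \cdot x_4 \notag \\
    &= \left[ (x_0 \oplus x_1) \cdot \overline{(x_2 \oplus x_3)}
     + \overline{(x_0 \oplus x_1)} \cdot (x_2 \oplus x_3) \right] \cdot \overline{x_4} \notag \\
    &\quad + \left[ (x_0 \oplus x_1) \cdot (x_2 \oplus x_3)
     + \overline{(x_0 \oplus x_1)} \cdot \overline{(x_2 \oplus x_3)} \right] \cdot x_4 \tag{18} \\
O_1 &= \left[ (x_0 \oplus x_1 \oplus x_2 \oplus x_3) \cdot x_4
     + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)} \cdot x_3 \right]
     \oplus \left[ (x_0 \oplus x_1) \cdot x_2 + \overline{(x_0 \oplus x_1)} \cdot x_0 \right] \tag{19} \\
O_2 &= \left[ (x_0 \oplus x_1 \oplus x_2 \oplus x_3) \cdot x_4
     + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)} \cdot x_3 \right]
     \cdot \left[ (x_0 \oplus x_1) \cdot x_2 + \overline{(x_0 \oplus x_1)} \cdot x_0 \right] \tag{20}
\end{align}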

The proposed architectures are based on Equations (18), (19) and (20).
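As a sanity check, the XOR and multiplexer structure behind Equations (18) to (20) can be modelled at gate level and compared against the count of Table 4.1. This is a hedged sketch: the helper `mux2` and the intermediate names are illustrative, and it assumes that the complemented terms select the second multiplexer input and that the most significant output is the AND of the two multiplexer terms, which is what the counter property requires.

```python
from itertools import product


def mux2(select, when_one, when_zero):
    """2x1 multiplexer: pass `when_one` if select is 1, otherwise `when_zero`."""
    return when_one if select else when_zero


def compressor_5_3_gates(x0, x1, x2, x3, x4):
    """Gate-level 5:3 compressor built from XORs and 2x1 MUXes; returns (O2, O1, O0)."""
    a = x0 ^ x1                      # first-stage XOR
    b = x2 ^ x3                      # first-stage XOR
    s4 = a ^ b                       # x0 ^ x1 ^ x2 ^ x3
    o0 = mux2(x4, 1 - s4, s4)        # Eq. (18): x4 selects s4 or its complement
    carry_low = mux2(a, x2, x0)      # (x0^x1).x2 + not(x0^x1).x0
    carry_high = mux2(s4, x4, x3)    # s4.x4 + not(s4).x3
    o1 = carry_low ^ carry_high      # Eq. (19)
    o2 = carry_low & carry_high      # Eq. (20), assumed to be the AND of both terms
    return o2, o1, o0


# Exhaustive check over all 32 input patterns against the number of 1s.
for bits in product((0, 1), repeat=5):
    o2, o1, o0 = compressor_5_3_gates(*bits)
    assert 4 * o2 + 2 * o1 + o0 == sum(bits)
print("all 32 input patterns match the counter property")
```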

Figure 4.2 is the modified version of the 5:3 compressor of Figure 1.11(b), with reduced
critical path delay. As explained above, the changes result in a more efficient design compared
with the earlier designs of 5:3 compressors, and the simulations reported later confirm that
the design is optimized for high speed and low power. The W/L ratios are minimum for the XOR
gates and 5/1 for the multiplexer. The reverse-bias voltages are adjusted according to the
desired logic levels in the design in the different technologies. MUX* is the block
incorporating the inverter stage. An alternative topology for the 2x1 multiplexer has been used
in the thesis, as shown in Figure 4.3, with A and B as inputs and S as the select line; it is
faster and consumes less power than other CMOS multiplexer designs [34]. The design exploits
the advantages of pass transistor logic over CMOS logic.


Figure 4.2. Architecture of Proposed 5:3 Compressor Design

Figure 4.3. Two Transistor 2x1 Multiplexer Design

The XOR gate used in the proposed 5:3 compressor architectures is the design proposed in
Chapter 2 of the thesis and shown in Figure 2.3. To demonstrate that the design is independent
of technology, the compressors have been implemented in three technologies, viz. 65-nm, 90-nm
and 130-nm. Power delay product and energy delay product simulations have also been carried
out, and both are found to be lower than those of peer designs.

4.2.1 Circuit Design of Proposed 5:3 Compressors

The architectures have been designed and simulated in Cadence Spectre in 65-nm, 90-nm and
130-nm technologies. The schematic diagrams of the modified and proposed designs in 90-nm
technology are shown in Figure 4.4(a) and Figure 4.4(b), respectively.


Figure 4.4(a). 3T XOR and 2T 2x1 MUX Compressor (Modified Design)

Figure 4.4(b). 2T XOR and 2T 2x1 MUX Compressor (Proposed Design)

Figure 4.4. Schematic View of 5:3 Compressors


4.2.2 Simulation and Performance Analysis of Proposed 5:3 Compressor Architectures

With the aim of evaluating and comparing the performance of the proposed designs with the
previously reported 5:3 compressor in Figure 1.11(b) (3 transistor XOR and 6 transistor MUX
compressor) [52], exhaustive simulation studies have been carried out with respect to number
of transistors, delay and power dissipation. The circuits are simulated under the same test
conditions and the response of the outputs to four different input patterns is studied. All
assessments have been carried out in 65-nm, 90-nm and 130-nm technologies. The simulations are
carried out at 2.5 MHz frequency with rise and fall times of 50 ps. The energy delay product is
computed by multiplying the PDP by the average delay [1]. The simulated results are given in
Table 4.2. They are also depicted through histograms in Figure 4.5 to give a clearer view of
the differences in EDP values across technologies for the different compressors in Table 4.2.
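The relation between power, delay, PDP and EDP can be sketched in a few lines. The function name below is an illustrative assumption; the sample values are the 130-nm conventional design row of Table 4.2 (2070 nW, 1.371 ns).

```python
def pdp_and_edp(power_nw, delay_ns):
    """Return (PDP in 1e-18 J, EDP in 1e-27 Js) from power in nW and delay in ns."""
    pdp = power_nw * delay_ns   # nW x ns = 1e-9 W x 1e-9 s = 1e-18 J
    edp = pdp * delay_ns        # 1e-18 J x 1e-9 s = 1e-27 Js
    return pdp, edp


pdp, edp = pdp_and_edp(power_nw=2070.0, delay_ns=1.371)
print(f"PDP = {pdp:.2f} x 1e-18 J, EDP = {edp:.3f} x 1e-27 Js")
```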

TABLE 4.2

COMPARATIVE ANALYSIS OF PERFORMANCE OF DIFFERENT 5:3 COMPRESSORS

Technology (nm)   Type of compressor circuit         No. of transistors   Delay (ns)   Power (nW)   PDP (10^-18 J)   EDP (10^-27 Js)
130               Conventional (Fig 4.2(b)) [38]     28                   1.371        2070         2837.97          3890.857
130               3T XOR-2T 2x1 MUX (Fig 4.6(a))     24                   0.811        463.710      376.0688         304.991
130               2T XOR-2T 2x1 MUX (Fig 4.6(b))     21                   1.207        154.460      186.433          225.024
90                Conventional (Fig.4.2(b)) [38]     28                   0.870        1305         1135.350         987.754
90                3T XOR-2T 2x1 MUX (Fig 4.6(a))     24                   0.578        227.800      131.668          76.104
90                2T XOR-2T 2x1 MUX (Fig 4.6(b))     21                   0.721        128.170      92.414           66.630
65                Conventional (Fig.27(b)) [36]      28                   0.604        1220         736.880          445.075
65                3T XOR-2T 2x1 MUX (Fig 4.6(a))     24                   0.477        155.706      74.271           35.427
65                2T XOR-2T 2x1 MUX (Fig 4.6(b))     21                   0.553        92.840       51.340           26.984


Figure 4.5. EDP (vs) Type of Compressor Circuit in Different Technology

4.2.3 Layout of Proposed 5:3 Compressor Architectures

A comparative study of silicon area is done for the proposed designs and the conventional
design. The results obtained are grouped in Table 4.3 and shown diagrammatically in Figure 4.6.
The proposed design has the lowest transistor count and should therefore occupy the minimum
silicon area during fabrication. This is confirmed by the layout designs presented in Figure
4.7(a) and Figure 4.7(b) for the modified and proposed versions respectively. The layouts of
the 5:3 compressor architectures are drawn in 90-nm technology in Cadence Virtuoso. The layout
is designed with the lowest interconnect density (i.e. routing is kept approximately to a
minimum), leading to low power consumption [69]. The layout is built symmetric by placing the
large PMOS transistors with proper orientation of substrates to cover minimum space without any
errors while following the DRC constraints. Cadence has the flexibility of changing the
orientation of a transistor to give symmetry as required.


TABLE 4.3

COMPARATIVE STUDY OF THE AREA OF DIFFERENT 5:3 COMPRESSORS

Type of design                     Technology (nm)   Area (µm²)
Conventional (Fig 4.2(b)) [38]     130               278.141
3T XOR-2T 2x1 MUX (Fig 4.6(a))     130               204.491
2T XOR-2T 2x1 MUX (Fig 4.6(b))     130               181.249
Conventional (Fig 4.2(b)) [38]     90                220.065
3T XOR-2T 2x1 MUX (Fig 4.6(a))     90                153.960
2T XOR-2T 2x1 MUX (Fig 4.6(b))     90                131.953
Conventional (Fig 4.2(b)) [38]     65                76.497
3T XOR-2T 2x1 MUX (Fig 4.6(a))     65                61.425
2T XOR-2T 2x1 MUX (Fig 4.6(b))     65                45.578

Figure 4.6. Area (vs) Type of Compressor Circuit in Different Technology


Figure 4.7(a). 3T XOR and 2T 2x1 MUX

Figure 4.7(b). 2T XOR and 2T 2x1 MUX

Figure 4.7. Layout View of Proposed 5:3 Compressors in 90-nm Technology



5:3 compressors using 3T XOR gates and 2T XOR gates in combination with 2x1 MUXes have been
implemented. The designs are simulated and analysed in terms of power, delay, PDP and EDP, with
layouts drawn for area estimation.

Table 4.2 indicates that the delay of the 3T XOR and 2T 2x1 MUX 5:3 compressor is lower than
that of the 2T XOR and 2T 2x1 MUX 5:3 compressor, but the power dissipation of the latter is
approximately half, leading to reduced PDP and EDP. The reduction in PDP from the modified to
the proposed 5:3 compressor is approximately 30% or more in all three technologies. A trade-off
between power and delay has also been found in other peer designs in literature [50]. The
proposed models show improvement in delay, power, PDP, EDP and area over the conventional
design.

Table 4.3 gives the silicon area evaluation: the area is least for the 2T XOR and 2x1 MUX
design, but the 3T XOR and 2x1 MUX design also has less area than the conventional 5:3
compressor designs in literature. Tables 4.2 and 4.3 also show that, as the technology scales
from 130-nm to 65-nm, the area, power, delay, PDP and EDP all decrease. The smallest area
obtained is 45.578 µm² for the 2T XOR and 2x1 MUX 5:3 compressor, an approximate 25.79%
reduction in silicon area compared with the 3T XOR and 2x1 MUX design. Fabrication becomes
faster because of the greater use of PMOS transistors rather than CMOS. The complexity
decreases with fewer transistors, which also affects the routing space. So, along with a new
architecture, the choice of technology is also an important criterion. At smaller technology
nodes, the efficiency of the design is enhanced, provided the accuracy of implementation is
kept in mind. A genuine trade-off should be maintained among the design parameters.

Hence, the proposed design can be incorporated efficiently in many systems, such as CPUs or
DSPs, to increase the overall performance of the system. Starting with smaller modules, a more
complex module can be created with good results. This can influence emerging trends in the VLSI
industry and open the way for new research.


Chapter 5

Design of 8 Bit x 8 Bit Multiplier using Novel

2T XOR Gates

This chapter presents a novel design of an 8 bit x 8 bit multiplier using two transistor XOR
gates and six transistor full adders. It takes the thesis to the next level of arithmetic
operation, which can be further utilized in industrial microprocessors and digital signal
processors (DSPs). The design has been compared with other multipliers (formed using the XOR
gates and adders of different transistor counts from Chapter 2 and Chapter 3) in terms of
power, delay, Power Delay Product (PDP) and area. The idea has been extended by applying the
two transistor XOR gates, six transistor adders and 8 bit x 8 bit multiplier to build an 8 bit
Multiply-Accumulate (MAC) unit in 65-nm technology. The comparisons are made in Cadence
Virtuoso Spectre in UMC 65-nm, 90-nm and 130-nm technologies.

The chapter is organized as follows: Section 5.1 explains the basic operation of multipliers in
a general sense. Section 5.2 proposes the 8 bit x 8 bit multiplier design concept, with Section
5.2.1 describing the working of the array multiplier. Section 5.2.2 gives details of the
simulation and performance analysis of the proposed multiplier design compared with multipliers
built from adder designs in literature. Section 5.2.3 presents the layout of the 8 bit x 8 bit
multiplier architecture and Section 5.2.4 gives an overview of the Multiply-Accumulate (MAC)
unit. Section 5.2.5 briefly explains a module of the MAC, i.e. the register/accumulator (True
Single Phased Clocked Register (TSPCR)), and Section 5.2.6 gives the conclusion and results of
the overall chapter.

5.1 What is a Multiplier?

A multiplier is a circuit in the VLSI domain that performs the multiplication operation.
Mathematically, multiplication is an abbreviated process of adding an integer to itself a given
number of times [74]. In its most basic form, multiplication is the product of two binary
numbers, namely the multiplicand and the multiplier. At the elementary level, the
multiplication operation is performed by placing the multiplicand on top of the multiplier. The
result is obtained by multiplying each digit of the multiplier with the multiplicand, beginning
with the least significant digit (LSD). The initial stage involves partial product generation;
the partial products are then compressed through compressors to generate the product matrix.
The intermediate results (partial products) are each offset by one position and placed one atop
the other so that digits of the same weight are aligned. The final product is determined by
summation of all the intermediate results. The multiplication technique applies equally to all
bases, including binary. The basic data flow for multiplication is described in Figure 5.1,
with one black dot for each digit.


Figure 5.1. Basic Multiplication

5.1.1 Multiplication Algorithm

Multiplication is a simple operation in digital electronics. The classical algorithm for
multiplication of two binary numbers is given by the flowchart in Figure 5.2, where the Most
Significant Bit (MSB) represents the sign of the number. The algorithm also dictates the
multiplication of an n-bit multiplicand with an m-bit multiplier to generate m partial products
and n + m bits of product in an array form, as shown in Figure 5.3.

$$Y = Y_{n-1}\,Y_{n-2}\,\ldots\,Y_2\,Y_1\,Y_0 \quad \text{(Multiplicand)}$$

$$X = X_{n-1}\,X_{n-2}\,\ldots\,X_2\,X_1\,X_0 \quad \text{(Multiplier)}$$

Figure 5.2. Signed Multiplication Algorithm


Figure 5.3. Product Matrix

The equation for the product is:

$$P(m+n) = Y(m) \cdot X(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} y_i\, x_j\, 2^{i+j} \tag{21}$$

where P represents the product, $y_i$ is the $i$-th bit of the multiplicand Y and $x_j$ is the $j$-th bit of the multiplier X.
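Equation (21) can be checked numerically with a direct transcription of the double sum. The sketch below is illustrative; the function name and the 4 bit operands are assumptions chosen only to keep the example small.

```python
def product_by_double_sum(y, x, m, n):
    """Evaluate Eq. (21): sum over i < m, j < n of y_i * x_j * 2**(i + j)."""
    total = 0
    for i in range(m):
        y_i = (y >> i) & 1                   # i-th bit of the m-bit operand Y
        for j in range(n):
            x_j = (x >> j) & 1               # j-th bit of the n-bit operand X
            total += (y_i & x_j) << (i + j)  # AND gives one partial-product bit
    return total


# 4 bit x 4 bit example: 13 x 11
print(product_by_double_sum(0b1101, 0b1011, m=4, n=4), 13 * 11)
```

Each term `y_i & x_j` corresponds to one AND gate of the partial product array, and the shift by `i + j` is the column weight of that bit.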

Generally, add and shift operations are performed on the product matrix. Thus, multiplication
mainly involves three steps [45, 46, 53]:

1. Generation of partial product

2. Accumulation of shifted partial product and its reduction

3. Final addition

AND gates are used to generate the partial products, which are then summed up. The logical AND
operation is thus followed by decomposition of the multiplication into addition operations.

As an 8 bit x 8 bit multiplier is proposed in the thesis, an example of multiplying two 8 bit
numbers A and B to produce a 16 bit product P is shown in Figure 5.4.

Figure 5.4. Example: Multiplication of 8 bit x 8 bit Binary Numbers


The above matrix shows clearly that multiplication has been reduced to the addition of binary
numbers and thus exhibits three phases:

1. Add the multiplicand to an accumulator if the Least Significant Bit (LSB) of the multiplier
is ‘1’.

2. Shift the multiplicand one bit left and the multiplier one bit right.

3. Halt the operation when all the bits of the multiplier are zero.
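The three phases above can be sketched as a short software model for unsigned operands. The function name is an illustrative assumption, and the sketch does not handle the sign bit of Figure 5.2.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Unsigned shift-and-add multiplication following the three phases above."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles unsigned operands only")
    accumulator = 0
    while multiplier != 0:               # phase 3: stop when no multiplier bits remain
        if multiplier & 1:               # phase 1: LSB of the multiplier is 1
            accumulator += multiplicand
        multiplicand <<= 1               # phase 2: multiplicand one bit left
        multiplier >>= 1                 #          multiplier one bit right
    return accumulator


a, b = 0b10110101, 0b01101110            # two 8 bit operands (181 and 110)
print(shift_add_multiply(a, b), a * b)   # both values should agree
```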

When the partial products are added serially, a serial adder with the least hardware is used.
When the partial products are added using one combinational circuit, a parallel multiplier is
formed. However, different compression techniques can be exploited for reduction of the partial
products, giving rise to different kinds of multipliers.

5.2 Design of Proposed 8 Bit x 8 Bit Multiplier

An 8 bit x 8 bit multiplier has been implemented using the proposed 6T adder. The result of
multiplication is obtained by multiplying two 8 bit numbers in a traditional array
architecture, as shown in Figure 5.4, to get the desired 16 bit output. The array multiplier is
chosen to achieve low power and high speed multiplication with lower hardware cost.

5.2.1 Array Multiplier

The array multiplier is a multiplier with a traditional structure. The architecture is regular
and performs the operation by repeated addition and shifting. The algorithm of the array
multiplier multiplies the multiplicand bits with one bit of the multiplier at a time, starting
from its Least Significant Bit (LSB). Then, shifting is done according to the bit position. The
structure is organized as several stages of AND gates and full adder cells. For the
multiplication of N bits, N² adders and N² AND gates are required. If A and B are the
multiplicand and multiplier binary numbers respectively, then P denotes the product and the
intermediate results are partial products. $S_i$ and $C_i$ represent the input sum and carry of
a block at the $i$-th stage, received from other blocks. $S_0$ and $C_0$ represent the output
sum and carry at a particular stage. The logic style of the array multiplier is shown in Figure
5.5(a).

Each individual block is made using an AND gate and a full adder as shown in Figure 5.5(b). The
Boolean equations governing the working of the array multiplier are as follows:

$$P = A \cdot B \tag{22}$$

$$S_0 = S_i \oplus P \oplus C_i \tag{23}$$

$$C_0 = S_i \cdot P + C_i\,(S_i \oplus P) \tag{24}$$
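Equations (22) to (24) describe a single cell; a behavioural model of the full array helps confirm that N x N such cells produce the correct product. The sketch below is a simplified assumption about how the cells are wired (each row ripples its carry along the row and its carry-out becomes the next higher sum bit); the exact interconnection of Figure 5.5(a) may differ, and the function names are illustrative.

```python
def cell(a, b, s_in, c_in):
    """One array block: AND gate plus full adder, per Equations (22) to (24)."""
    p = a & b                                 # Eq. (22)
    s_out = s_in ^ p ^ c_in                   # Eq. (23)
    c_out = (s_in & p) | (c_in & (s_in ^ p))  # Eq. (24)
    return s_out, c_out


def array_multiply(a, b, n=8):
    """Multiply two unsigned n-bit numbers using n*n cells (64 cells for n = 8)."""
    acc = [0] * (2 * n)                       # running sum bits, LSB first
    for j in range(n):                        # one row per multiplier bit
        carry = 0
        for i in range(n):                    # one cell per multiplicand bit
            acc[i + j], carry = cell((a >> i) & 1, (b >> j) & 1, acc[i + j], carry)
        acc[j + n] = carry                    # row carry-out feeds the next higher bit
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(181, 110), 181 * 110)    # both values should agree
```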

5.2.2 Simulation and Performance Analysis of Proposed 8 Bit x 8 Bit Multiplier

The proposed 8 bit x 8 bit multiplier has been designed using a combination of 64 six
transistor adders and 64 AND gates in a symmetric array matrix form, as depicted in Figure
5.5(a), to generate the product term. The performance of the proposed 8 bit x 8 bit multiplier
has been analysed and compared with 8 bit x 8 bit multipliers built with adder designs
available in literature [24, 29, 30, 31, 32, 33]. The adders used are the same as those shown
in Chapter 3, from 28T to 6T. A comparison has been made with respect to power, delay and Power
Delay Product (PDP) in 65-nm, 90-nm and 130-nm technologies in Cadence Spectre, in Table 5.1
and Figure 5.6. To show that the results are not specific to one process, simulations are
executed in three different technologies. The simulations are carried out at 50 MHz frequency
with 50 ps rise and fall times. The results indicate that the PDP of the multiplier employing
the six transistor adder is the lowest in all three technologies. Process corner analysis has
also been performed for all process corners with ±10% variations in bias voltage and
temperature.

Figure 5.5(a). An 8 bit x 8 bit Array Multiplier

Figure 5.5(b). Basic Building Block

Figure 5.5. Array Multiplier Architecture


TABLE 5.1

PERFORMANCE ANALYSIS OF 8 BIT x 8 BIT MULTIPLIER USING DIFFERENT ADDERS

Type of adder used in multiplier   Technology (nm)   Avg. power (µW)   Avg. delay (ps)   PDP (10^-18 J)
28T (Fig 1.10(a)) [29,30]          65                42.600            198.557           8458.528
20T (Fig 1.10(b)) [30]             65                36.500            180.468           6587.082
16T (Fig 1.10(c)) [31]             65                27.460            140.083           3846.679
14T (Fig 1.10(d)) [32]             65                24.570            136.774           3360.537
10T (Fig 1.10(e)) [34]             65                23.630            122.866           2903.323
8T (Fig 1.10(f)) [24]              65                20.140            118.018           2376.882
6T                                 65                15.220            121.816           1854.039
28T (Fig 1.10(a)) [29,30]          90                82.120            339.278           27861.509
20T (Fig 1.10(b)) [30]             90                80.470            330.829           26621.809
16T (Fig 1.10(c)) [31]             90                50.290            315.788           15880.978
14T (Fig 1.10(d)) [32]             90                48.460            285.399           13830.435
10T (Fig 1.10(e)) [34]             90                45.900            220.926           10140.503
8T (Fig 1.10(f)) [24]              90                32.168            207.338           6669.648
6T                                 90                29.060            216.472           6290.676
28T (Fig 1.10(a)) [29,30]          130               228.132           392.806           89611.618
20T (Fig 1.10(b)) [30]             130               223.800           364.718           81623.888
16T (Fig 1.10(c)) [31]             130               201.840           306.412           61846.198
14T (Fig 1.10(d)) [32]             130               148.440           303.844           45102.603
10T (Fig 1.10(e)) [34]             130               144.960           301.091           43646.151
8T (Fig 1.10(f)) [24]              130               105.336           258.312           27209.552
6T                                 130               96.420            264.900           25541.658


Figure 5.6(a) Comparative Analysis of PDP of Different Multipliers at 65-nm Technology

Figure 5.6(b) Comparative Analysis of PDP of Different Multipliers at 90-nm Technology

Figure 5.6(c) Comparative Analysis of PDP of Different Multipliers at 130-nm Technology

Figure 5.6. PDP (vs) Technology of Different Multiplier Architectures


5.2.3 Layout Design of Proposed 8 Bit x 8 Bit Multiplier

The layout of the 8 bit x 8 bit multiplier using the proposed 6T adder, designed in 65-nm
technology, is shown in Figure 5.7. The individual array blocks have been positioned in such a
way that the complexity of interconnection is reduced and the layout is symmetric. The array
multiplier gives a smaller and regular layout, leading to a robust and compact design. The
layout is also free from any DRC errors in the Cadence Virtuoso ASSURA verification suite. A
comparative study of the silicon area of the 8 bit x 8 bit multiplier employing adders of
different transistor counts is shown in Table 5.2 and Figure 5.8. The table clearly shows that
the multiplier implemented with the 6T adder has the least area, thus allowing more
applications on a chip.

Figure 5.7. Layout Design of Proposed 8 Bit x 8 Bit Multiplier


TABLE 5.2

COMPARATIVE STUDY OF AREA OF 8 BIT x 8 BIT MULTIPLIER USING DIFFERENT ADDERS

Type of adder used in multiplier   Technology (nm)   Area (µm²)
28T (Fig 1.10(a)) [29,30]          65                11740.087
20T (Fig 1.10(b)) [30]             65                8411.571
16T (Fig 1.10(c)) [31]             65                8264.371
14T (Fig 1.10(d)) [32]             65                6466.196
10T (Fig 1.10(e)) [34]             65                4276.463
8T (Fig 1.10(f)) [24]              65                3735.890
6T                                 65                1581.069
28T (Fig 1.10(a)) [29,30]          90                26668.557
20T (Fig 1.10(b)) [30]             90                18246.264
16T (Fig 1.10(c)) [31]             90                18107.889
14T (Fig 1.10(d)) [32]             90                13507.574
10T (Fig 1.10(e)) [34]             90                12504.680
8T (Fig 1.10(f)) [24]              90                11535.308
6T                                 90                6253.650
28T (Fig 1.10(a)) [29,30]          130               28488.613
20T (Fig 1.10(b)) [30]             130               21300.440
16T (Fig 1.10(c)) [31]             130               20619.317
14T (Fig 1.10(d)) [32]             130               18231.370
10T (Fig 1.10(e)) [34]             130               15272.972
8T (Fig 1.10(f)) [24]              130               12391.646
6T                                 130               7719.373


Figure 5.8(a) Comparative Analysis of Area of Different Multipliers at 65-nm Technology

Figure 5.8(b) Comparative Analysis of Area of Different Multipliers at 90-nm Technology

Figure 5.8(c) Comparative Analysis of Area of Different Multipliers at 130-nm Technology

Figure 5.8. Area (vs) Technology of Different Multiplier Architectures


5.2.4 Overview of Design of Multiply and Accumulate (MAC) Unit

The next level of design is the Multiply and Accumulate (MAC) unit, in which multipliers and
adders are fundamental components, as shown in Figure 5.9 [75, 76]. To achieve a high
performance digital signal processing system for computationally intensive applications and
real-time signal processing, a high speed and high throughput MAC is required. The major
criteria in the design of MAC units over the last few years have been speed and power
consumption. Generally, for personal communication, low power designs are preferred. The
simulation results for the adders in Chapter 3 and for the multiplier in Section 5.2.2 clearly
indicate the improvement in overall performance of the proposed designs in terms of power, PDP
and area. Hence, the proposed architecture is useful for the implementation of a high speed,
low power Multiply-Accumulate (MAC) unit occupying minimal area on the chip.

A typical MAC unit has three sub-units, namely multiplier, adder and accumulator register. The
multiplier computes the partial products involved. The adder adds up the values of the partial
products generated and saves them in the accumulator register. Figure 5.9 depicts the MAC
architecture when the two binary inputs have N bits each, and thus depicts a general design
which can be implemented for any number of bits. The two N-bit inputs are given to the
multiplier, which generates a 2N-bit output. This 2N-bit value is given to the adder for
computation. The output of the adder is 2N+1 bits, i.e. one extra bit is for the carry (2N bits
+ 1 bit). Then, the output is given to the accumulator register. The accumulator register used
in this design is a True Single Phased Clocked Register. The output of the accumulator register
is taken out or fed back as one of the inputs to the carry save adder.
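The data flow of the MAC unit can be summarised by a small behavioural model. This is a sketch under stated assumptions: the class and method names are illustrative, the accumulator is taken to be 2N+1 bits wide (a 2N-bit product plus one carry bit), and wrapping on overflow is an assumption rather than a documented property of the design.

```python
class MacUnit:
    """Behavioural multiply-accumulate unit for unsigned N-bit operands."""

    def __init__(self, n_bits=8):
        self.n_bits = n_bits
        self.acc_bits = 2 * n_bits + 1        # assumed width: 2N-bit product + carry bit
        self.accumulator = 0

    def step(self, a, b):
        """Multiply a by b, add to the accumulator and store the result."""
        limit = 1 << self.n_bits
        if not (0 <= a < limit and 0 <= b < limit):
            raise ValueError("operands must fit in n_bits")
        product = a * b                                          # multiplier: 2N bits
        total = self.accumulator + product                       # adder
        self.accumulator = total & ((1 << self.acc_bits) - 1)    # register (wraps, assumed)
        return self.accumulator


mac = MacUnit(n_bits=8)
for a, b in [(3, 4), (5, 6), (255, 255)]:
    print(a, b, mac.step(a, b))
```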

Figure 5.9. Basic Multiply and Accumulate (MAC) Unit


Therefore, the thesis presents an implementation of a MAC unit in 65-nm technology to show the
application of the proposed models at a higher level of abstraction. The MAC unit can be used
in industry for the manufacture of DSP chips with reduced silicon area and enhanced
performance. An 8 bit Multiply and Accumulate (MAC) unit has been simulated in 65-nm
technology, comprising 8 bit x 8 bit multipliers, 6T adders and registers. A True Single Phased
Clocked Register (TSPCR), as depicted in Figure 5.10, is used as the accumulator/register unit
for the implementation of the MAC design [1].

5.2.5 True Single Phased Clocked Register (TSPCR)

The True Single Phased Clocked Register (TSPCR) combines basic single-phase positive and

negative latches, as shown in Figure 5.11. The main reason for using a TSPCR is to avoid clock

skew, the phenomenon in synchronous circuits in which the clock signal arrives at different

components at different times. The TSPCR eliminates the two-phase clocking scheme and uses a

single clock phase. For the positive latch, when the clock CLK is high, the latch is in

transparent mode and behaves as two cascaded inverters; the latch is non-inverting and

propagates the input (IN) to the output (OUT). When CLK is low, both inverters are

disabled and the latch is in hold mode: the pull-down circuits are deactivated and only the

pull-up circuits remain active. Because of this dual-stage structure, no signal can propagate

from the input of the latch to the output in hold mode. A register is constructed by cascading

positive and negative latches. The load capacitances are C1 = 1 fF and C2 = 0.5 fF; the

difference in their values is due to the loading effect of the cascaded stages. The TSPCR

offers advantages such as a reduced delay overhead associated with the latches, at the cost of

a slight increase in the number of transistors.
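
The transparent and hold modes described above, and the construction of a register from two latches, can be illustrated with a purely logical model. The sketch below does not model the TSPC transistor networks or the load capacitances. The class names `Latch` and `Register` are hypothetical, and the ordering (a negative latch followed by a positive latch, giving a positive-edge-triggered register) is an assumption, since the text only states that the two latch types are cascaded.

```python
class Latch:
    """Level-sensitive latch: transparent while clk equals transparent_level."""

    def __init__(self, transparent_level: int) -> None:
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, d: int, clk: int) -> int:
        if clk == self.transparent_level:
            self.q = d        # transparent mode: input propagates to output
        return self.q         # hold mode: previously stored value is kept


class Register:
    """Assumed ordering: negative (master) latch followed by positive (slave) latch."""

    def __init__(self) -> None:
        self.master = Latch(transparent_level=0)   # negative latch
        self.slave = Latch(transparent_level=1)    # positive latch

    def update(self, d: int, clk: int) -> int:
        return self.slave.update(self.master.update(d, clk), clk)


reg = Register()
stimulus = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]   # (data, clock) pairs
trace = [reg.update(d, clk) for d, clk in stimulus]
print(trace)  # [0, 1, 1, 1, 0]
```

In this model the output changes only after the clock goes high, and a change of the data input while the clock stays high (third stimulus pair) does not reach the output.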

Figure 5.10. True Single Phased Clocked Register (TSPCR)

Figure 5.11(a). Positive Latch

Figure 5.11(b). Negative Latch

Figure 5.11. Positive and Negative Latches


An 8 bit x 8 bit multiplier has also been implemented using the 6T adder, and its

performance has been analysed and compared with similar multipliers built from peer adder

designs available in the literature. In 65-nm technology, the Power Delay Product (PDP) of the

proposed multiplier is as low as 1.854 pJ, compared with 2.376 pJ for the closest competing

multiplier, which uses the 8T adder. The overall reduction in PDP across all technologies is

28.2%. The percentage reduction in power consumption of the proposed 8 bit x 8 bit multiplier,

relative to the multiplier employing the 8T full adder, is smaller than that obtained at the

gate level. This is because inverters are used at the higher level of abstraction to restore the

voltage swing, whose degradation is a drawback of the cascaded 2T XOR gate and the 6T full

adder. However, since very little power is dissipated at the lower level, this trade-off can

be handled well at higher levels. In addition, the MAC unit realized in 65-nm technology

achieves a delay of 3.977 ns and a power dissipation of 1.107 mW.
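
The figures quoted above can be related through the definition PDP = power x delay. The short Python sketch below uses only the numbers given in the text. The function names `pdp_pj` and `percent_reduction` are hypothetical helpers introduced for this illustration. The MAC PDP it prints is derived here from the quoted delay and power and is not a value reported in the thesis.

```python
def pdp_pj(power_mw: float, delay_ns: float) -> float:
    """Power-delay product in picojoules (1 mW x 1 ns = 1 pJ)."""
    return power_mw * delay_ns


def percent_reduction(reference: float, proposed: float) -> float:
    """Reduction of `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# MAC unit in 65-nm technology: 1.107 mW and 3.977 ns (values from the text).
mac_pdp = pdp_pj(power_mw=1.107, delay_ns=3.977)

# Multiplier PDP in 65-nm technology: 2.376 pJ (8T adder) vs 1.854 pJ (6T adder).
mult_reduction_65nm = percent_reduction(reference=2.376, proposed=1.854)

print(f"MAC PDP: {mac_pdp:.3f} pJ")                             # MAC PDP: 4.403 pJ
print(f"65-nm multiplier PDP reduction: {mult_reduction_65nm:.1f} %")  # 22.0 %
```

The 65-nm multiplier figures alone correspond to a reduction of about 22%; the 28.2% quoted in the text is stated as the overall figure across all technologies.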

Table 5.2 shows that the silicon area is smallest for the proposed design in all three

technologies. The estimated silicon area is 1581.069 µm² for the multiplier with the 6T adder

and 3735.890 µm² for the multiplier with the 8T adder, so the proposed multiplier occupies less

than half the area (about 42%) of the existing one. Area is a prime concern because it

determines the overall cost of the system. Reducing area at the basic cell level therefore

benefits larger modules and allows more functions to be integrated on a single chip at lower

cost.

The next chapter of this thesis focuses on conclusions we have drawn from all the experiments

performed.

Chapter 6

Conclusion

To err is digital, to forgive human

- Jonathan Fahey, Forbes Magazine

The thesis presents the simulation and performance analysis of the proposed two-transistor XOR

gate, six-transistor full adder, 5:3 compressor designs and 8 bit x 8 bit multiplier in three

technologies, viz. 65-nm, 90-nm and 130-nm. A multiply-accumulate (MAC) unit, which forms the

basic unit of Digital Signal Processors (DSPs), is also simulated in 65-nm technology. The

designs have been simulated in Cadence Virtuoso using UMC technologies.

6.1 Summary of Present Work

The thesis presented the design of a high-performance 8 bit x 8 bit multiplier based on a

novel 2T XOR gate, which is the XOR gate with the smallest transistor count designed so far.

The XOR gate implementation is compared extensively with its peer designs in terms of

power-delay product and silicon area. Its power-delay product is found to be the lowest, with a

noise margin comparable to that of other XOR gate designs available in the literature. The

six-transistor full adder built from these XOR gates also has the smallest transistor count and

the minimum power-delay product, and thus performs better than other adders in the

literature. The simulated designs of the multiplier, the 6T adder and the 2T XOR gate work

well up to a frequency of 2 GHz. To determine and verify the silicon area required, layouts of

the designs have been drawn and checked for errors.
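
At the logic level, the role of the XOR gate in a full adder can be sketched as follows. The decomposition used here (two XOR gates for the sum and a 2:1 multiplexer, controlled by the first XOR output, for the carry) is a common textbook structure and is assumed only for illustration. It is not claimed to be the exact 6T circuit of the thesis, and the voltage-swing behaviour of the transistor-level design is not modelled.

```python
from itertools import product


def xor(a: int, b: int) -> int:
    """Logic-level XOR of two 1-bit values."""
    return a ^ b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) using two XORs and one multiplexer (assumed structure)."""
    p = xor(a, b)                 # first XOR: propagate signal
    s = xor(p, cin)               # second XOR: sum bit
    cout = cin if p else a        # 2:1 multiplexer selected by p
    return s, cout


# Exhaustive check against integer addition for all eight input combinations.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The exhaustive loop is small enough to serve as a complete functional check of the one-bit cell.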

The current work also presents another arithmetic circuit based on two-transistor XOR gates:

the design, simulation and layout of novel 5:3 compressors. The compressors are implemented

using multiplexers in place of XOR gates, which reduces the critical path delay, while the

novel 2T XOR gates reduce the transistor count. The design uses the fewest transistors for the

logic-level implementation of compressors in the different technologies, and the comparison

covers delay, power, PDP, EDP and area. The layout has also been designed and simulated.

Further, the 5:3 compressors can be used to advantage in the design of other arithmetic

circuits, and the idea can also be employed in the implementation of other processors.
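
The function of a 5:3 compressor can be illustrated with a small behavioural model. The sketch assumes the usual definition of a 5:3 compressor as a counter that encodes the number of 1s among five equally weighted input bits as a 3-bit binary number. The internal structure shown (two full adders and a half adder) is a generic one chosen for this example and is not the multiplexer-based architecture proposed in the thesis. The function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three 1-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry


def compressor_5_3(x1: int, x2: int, x3: int, x4: int, x5: int) -> tuple[int, int, int]:
    """Return (out2, out1, out0), the 3-bit count of ones among the five inputs."""
    s1, c1 = full_adder(x1, x2, x3)      # first stage: three inputs
    out0, c2 = full_adder(s1, x4, x5)    # second stage: weight-1 output bit
    out1 = c1 ^ c2                       # half adder on the two weight-2 carries
    out2 = c1 & c2                       # weight-4 output bit
    return out2, out1, out0


# Exhaustive check over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    out2, out1, out0 = compressor_5_3(*bits)
    assert 4 * out2 + 2 * out1 + out0 == sum(bits)
```

Because the inputs are equally weighted, the check only has to compare the encoded output with the number of 1s at the input.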

An application of the proposed work has also been demonstrated through the 8 bit x 8 bit

multiplier, which can in turn be used to implement the Multiply-Accumulate (MAC) units of

digital signal processors. A MAC unit, which forms the fundamental unit for the architectural

operations of Digital Signal Processors (DSPs) and Application-Specific Integrated Circuits

(ASICs), has been implemented in 65-nm technology.

6.2 Limitations of thesis work

Every design choice has both advantages and disadvantages, and the work presented in this

thesis has some limitations.

The novel 2T XOR gate reduces power consumption by more than 50% compared with the 3T XOR

gate, but its delay increases. At higher levels of implementation, the larger modules require

voltage restorer circuits, which consume extra power. Consequently, the 75.73% reduction in

PDP obtained for the 2T XOR gate falls to 28.2% when the gate is used in the multiplier. The

2T XOR gate and 6T adder are thus most beneficial at the module level; going up the hierarchy,

further optimization is needed in terms of power consumption, delay, PDP and area. The design

of high-density chips in MOS VLSI (Very Large Scale Integration) technology requires that the

packing density of the MOSFETs be as high as possible and, consequently, that the transistors

be as small as possible. The device geometry is kept at the minimum for 2T XOR gate operation.

The device widths can be increased for better voltage swing, at the cost of silicon area.

The frequency of operation is 50 MHz; choosing a higher frequency in the GHz range will

change the voltage levels and power consumption and add glitches. Accordingly, resources

must be chosen carefully at the lower level to make the higher levels more effective. This

means that the amount of power saved by the 2T XOR gate can increase for larger

implementations such as the MAC unit at higher operating frequencies. The substrate bias

applied to the back gate of the PMOS in the 2T XOR gate should be chosen so that the correct

voltage levels are obtained. This is governed by the threshold voltages of the different

technologies: a logical ‘0’ must stay below the threshold and a logical ‘1’ above it.

Therefore, the substrate bias can be adjusted only to a limited extent.

Technology plays an important role in any digital circuit. Technology is evolving towards

smaller nanometre nodes in order to achieve more functions on a single chip with the least

area. Scaling of MOS transistors is the systematic reduction of the overall dimensions of the

devices, as allowed by the available technology, while preserving the geometric ratios found

in the larger devices. Proportional scaling of all devices in a circuit reduces the total

silicon area occupied by the circuit and thereby increases the overall functional density of

the chip. However, the operational characteristics of the MOS transistor change as its

dimensions are reduced. As the device dimensions are scaled down, various physical limitations

(short channel effects) such as velocity saturation, threshold voltage variation, subthreshold

conduction and leakage current come into play and ultimately restrict the extent of scaling

that is practically achievable for some device dimensions. Thus, when going below 65-nm

technology, special considerations are needed to preserve the functionality and optimize the

performance of the designed circuits.

6.3 Future work

Future work can focus on implementing more digital arithmetic circuits using the

proposed novel designs. Additional research could be spent on minimising the power

and obtaining better voltage swings for the adder. New techniques could be introduced to

increase the drive strength of the 2T XOR gates. Future work can also focus on increasing

the frequency of operation of the proposed designs for effective use in the gigahertz range.

The design of DSP chips and high-performance processors is a further direction for the design

of arithmetic circuits.

A major driver for further research is FinFET technology, whose double-gate devices are a

promising substitute for CMOS. FinFETs are expected to allow continued technology scaling below

65-nm by efficiently overcoming fundamental material and process technology limits.

Below 65-nm technology, short channel effects such as subthreshold conduction, channel length

modulation, velocity saturation, drain punch-through, impact ionization and mobility variation

start to play a dominant role. FinFETs are an innovative MOS device structure that gives

superior performance because they are less affected by short channel effects. This is because

FinFETs are fabricated with a thin body structure, which controls short channel effects and

suppresses leakage by keeping the gate capacitance in closer proximity to the whole of the

channel. FinFETs have been used in the literature to design XOR gates with as few as three

transistors, and thus efforts can be made to design a two-transistor XOR gate using FinFETs.

List of Relevant Publications

Published

Himani Upadhyay, Shubhajit Roy Chowdhury, “Design of high speed and

low power 5:3 compressor architectures using novel two transistor XOR

gates”, International Journal of Electronics, Electrical and Computer

Systems (IJEECS), ISSN (Online): 2347-2820, Volume 2, Issue 7, 2014.

Himani Upadhyay, Shubhajit Roy Chowdhury, “A High Performance 8 Bit

x 8 Bit Multiplier Design using Novel Two Transistor (2T) XOR gates”,

Journal of Low Power Electronics (JOLPE), Volume 11, Number 1, March

2015, pp. 37-48.

http://www.ingentaconnect.com/content/asp/jolpe;jsessionid=29x53tml2npao.victoria

Bibliography

1) Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, “Digital Integrated Circuits:

A Design Perspective”, Second Edition, Prentice Hall Electronics and VLSI Series,

2012.

2) F. Faggin, M.E. Hoff, Jr, H. Feeney, S. Mazor, M. Shima, “ The MCS-4 – An LSI

Micro-computer System,” 1972 IEEE Region Six Conference Record, San Diego, CA,

pp. 1-6, April 1972.

3) M. Shima, F. Faggin and S. Mazor, “ An N-Channel, 8-bit Single-Chip

Microprocessor,” ISSCC Digest of Technical Papers, pp. 56-57, Feb.1974.

4) Gordon E. Moore, “Cramming more components onto integrated circuits”, Electronics,

Volume 38, Number 8, April 19, 1965

5) M. Hosseinzadeh, S.J. Jassbi, and Keivan Navi, “A Novel Multiple Valued Logic

OHRNS Modulo r^n Adder Circuit”, International Journal of Electronics, Circuits and

Systems, Vol. 1, No. 4, Fall 2007, pp. 245-249.

6) Neil H.E. Weste, CMOS VLSI Design Circuits & Systems Perspective, Addison

Wesley, 3rd Edition, 2005.

7) A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell,

MA: Kluwer, 1995.

8) Manoj Kumar, Sandeep K. Arya, Sujata Pandey, “Single bit full adder design using 8

transistors with novel 3 transistors XNOR gate,” International Journal of VLSI Design

& Communication Systems, Vol. 2, pp. 47-59, Dec. 2011.

9) R. Zimmermann and R. Gupta, “Low-power logic styles: CMOS versus CPL,” in Proc.

22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp.

112–115.

10) J. Yuan and C. Svensson, “New single-clock CMOS latches and flip-flops with

improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69,

Jan. 1997.

11) Y. Leblebici, S.M. Kang, “CMOS Digital Integrated Circuits”, Singapore: McGraw

Hill, 2nd edition, 1999, Ch. 7.

12) D. Radhakrishnan, “Low-voltage low-power CMOS full adder,” in Proc. IEEE Circuits

Devices Syst., vol. 148, Feb. 2001, pp. 19-24.

13) J. Wang, S. Fang, and W. Feng, “New efficient designs for XOR and XNOR functions

on the transistor level,” IEEE J. Solid-State Circuits,vol. 29, no. 7, Jul. 1994, pp. 780–

786.

14) H. T. Bui, A. K. Al-Sheraidah, and Y.Wang, “New 4-transistor XOR and XNOR

designs,” in Proc. 2nd IEEE Asia Pacific Conf. ASICs, 2000, pp.25–28.

15) H.T. Bui, Y. Wang, A. K. Al-Sheraidah, “Design and analysis of 10-transistor full

adders using novel XOR–XNOR gates,” in Proc. 5th Int. Conf. Signal Process., vol. 1,

Aug. 21–25, 2000, pp. 619–622.

16) H. T. Bui, Y. Wang, and Y. Jiang, “Design and analysis of low-power 10-transistor full

adders using XOR-XNOR gates,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal

Process, vol. 49, no. 1, Jan. 2002, pp. 25–30.

17) A. M. Shams, T. K. Darwish, and M. A. Bayoumi, “Performance analysis of low-power

1-bit CMOS full adder cells,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.

10, no. 1, Feb. 2002, pp. 20–29.

18) K.-H. Cheng and C.-S. Huang, “The novel efficient design of XOR/XNOR function for

adder applications,” in Proc. IEEE Int. Conf. Elect, Circuits Syst., vol. 1, Sep. 5–8,

1999, pp. 29–32.

19) H. Lee and G. E. Sobelman, “New low-voltage circuits for XOR and XNOR,” in Proc.

IEEE Southeastcon, Apr. 12–14, 1997, pp. 225–229.

20) M. Vesterbacka, “A 14-transistor CMOS full adder with full voltage swing nodes,” in

Proc. IEEE Workshop. Signal Process. Syst., Oct. 20–22, 1999, pp. 713–722.

21) Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, and Hiranmay Saha, “A

High Speed 8 Transistor Full Adder Design using Novel 3 Transistor XOR Gates”,

International Journal of Electronics, Circuits and Systems, WASET Fall, (2008)

22) Tripti Sharma, K.G.Sharma, B.P.Singh and Neha Arora, “New Efficient Design for

XOR Function on the Transistor Level”, International Conference on Methods and

Models in Science and Technology, 2010 American Institute of Physics.

23) Ahmed M. Shams and Magdy A, “A structured approach for designing low power

adders,” Conference Record of the Thirty-First Asilomar Conference on Signals,

Systems & Computers, vol.1, pp.757-761, Nov. 1997.

24) R. Zimmermann, and W. Fichtner, “Low-power logic styles: CMOS versus pass-

transistor logic,” IEEE J. Solid State Circuits, vol. 32, no. 7, pp. 1079-1090, Jul. 1997.

25) N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, a System Perspective,

Addison-Wesley, 1993.

26) N. Zhuang and H. Wu, “A new design of the CMOS full adder,” IEEE J. Solid-State

Circuits, vol.27, no. 5, pp. 840–844, May 1992.

27) A. M. Shams and M. Bayoumi, “A novel high-performance CMOS1-bit full adder cell,”

IEEE Transaction on Circuits Systems II, Analog Digital Signal Process, vol. 47, no. 5,

pp. 478–481, May 2000.

28) Yingtao Jiang Al-Sheraidah, A. Yuke Wang Sha, E. and Jin-Gyun Chung, “A novel

multiplexer based low-power full adder,” IEEE Transactions on Circuits and Systems:

Express Briefs, vol. 51, no.7, pp.345-348, Jul. 2004.

29) R. Shalem, E. John, and L. K. John, “A novel low-power energy recovery full adder

cell,” in Proc.Great Lakes Symposium on VLSI, pp. 380–383, Feb. 1999.

30) A. Fayed and M. Bayoumi, “A low-power 10-transistor full adder cell for embedded

architectures,” in Proc. IEEE Symp. Circuits Syst., Sydney, Australia, May 2001, pp.

226–229.

31) J.F. Lin, Y.T.Hwang, M.H. Sheu, C.C. Ho, “A novel high speed and energy efficient

10 transistor full adder design”, IEEE Trans. Circuits Syst. I, Regular papers, Vol. 54,

No.5, May 2007, pp. 1050-1059.

32) S. Goel, A. Kumar, M. A. Bayoumi, “Design of robust, energy-efficient full adders for

deep sub micrometre design using hybrid-CMOS logic style,” IEEE Transactions on

Very Large Scale Integration (VLSI) Systems, vol.14, no.12, pp.1309-1321, Dec. 2006.

33) Zhang, M., J. Gu and C.H. Chang, “A novel hybrid pass logic with static CMOS output

drive full adder cell,” IEEE Int. Symposium on Circuits Systems, vol. 5, pp. 317-320,

May 2003.

34) G.A. Ruiz, M. Granda, “An area-efficient static CMOS carry-select adder based on a

compact carry look-ahead unit”, Microelectronics Journal, Vol. 35, No. 12, 2004, pp.

939-944.

35) Z. Wang, G. A. Jullien, and W. C. Miller, “A new design technique for column

compression multipliers,” IEEE Trans. Comput., vol. 44, pp. 962–970, Aug. 1995.

36) Milos Ercegovac, Tomas Lang, “Digital Arithmetic”, Morgan Kaufmann, 2004.

37) I. Koren, Computer Arithmetic Algorithms. Englewood Cliffs, NJ, Prentice Hall, 1993.

38) Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, Hiranmay Saha, “Design,

Simulation and Testing of a High Speed Low Power 15-4 Compressor for High Speed

Multiplication Applications”, First International Conference on Emerging Trends in

Engineering and Technology, pp. 434–438, 2008.

39) K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proc. of the 35th

Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129–133.

40) C. H. Chang, J. Gu, M. Zhang, “Ultra low-voltage low-power CMOS 4-2 and 5-2

compressors for fast arithmetic circuits” IEEE Transactions on Circuits and Systems I:

Regular Papers, Volume 51, Issue 10, Oct. 2004 Page(s):1985 – 1997

41) Ma GK, Taylor FJ (1990). Multiplier policies for digital signal processing. IEEE

ASSP., 7(1): 6-20.

42) S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, “A 2.4-GS/s FFT processor for OFDM-based

WPAN applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 6, pp. 451–

455, Jun. 2010.

43) V. Gowrishankar, D. Manoranjitham and P. Jagadeesh, “Efficient FIR filter design

using modified carry select adder & Wallace tree multiplier”, International Journal of

Science, Engineering and Technology Research, Vol. 2, pp. 703-711, March 2013.

44) D. Radhakrishnan, A.P. Preethy, “Low Power CMOS pass logic 4-2 compressor for

high speed multiplication”, Proceedings of 43rd IEEE Midwest Symposium on Circuits

and Systems, Vol. 3, 2000, pp. 1296-1298.

45) S.F. Hsiao, M.R. Jiang, J.S. Yeh, “Design of high-speed low-power 3-2 counter and 4-2

compressor for fast multipliers”, Electronics Letters, Vol. 34, No. 4, 1998, pp. 341-343.

46) S. O'uchi, K. Sakamoto, K. Endo, M. Masahara, T. Matsukawa, Y.X. Liu, M. Hioki, T.

Nakagawa, T. Sekigawa, H. Koike and E. Suzuki, “Variable-Threshold-Voltage

FinFETs with a Control-Voltage Range within the Logic-Level Swing Using

Asymmetric Work-Function Double Gates,” in VLSI Technology, Systems and

Applications, 2008.

47) M. C. Wang, “Independent-Gate FinFET Circuit Design Methodology”, International

Journal of Computer Science, 37:1. Feb. 2010.

48) L. Dadda, “Some Schemes for Parallel Multiplier,” Alta Freq., vol. 34, 1965, pp. 349–

356.

49) C.S. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Trans. on Electronic

Computers, vol. EC-13, pp. 14–17, 1964.

50) P. Balasubramanian, R.T. Naayagi, “Critical Path Delay and Net Delay Reduced Tree

Structure for Combinational Logic Circuits”, International Journal of Electronics,

Circuits and Systems, Vol. 1, No.1, 2007, pp. 19-29.

51) J. B. Burr and A. M. Peterson, “Ultra low power CMOS technology,” NASA VLSI

Design Symposium. 1991, pp. 4.2.1–4.2.13.

52) R. Gonzalez, B. M. Gordon, and M. A. Horowitz, “Supply and threshold voltage scaling

for low power CMOS,” IEEE J. Solid-State Circuits, vol.32, pp. 1210–1216, Aug.

1997.

53) Y. Berg and T. S. Lande, “Programmable floating-gate mos logic for low-power

operation,” in Proc. IEEE ISCAS, Hong Kong, June 1997, pp. 1792–1795.

54) Shiv Shankar Mishra, Adarsh Kumar Agrawal and R.K. Nagaria, “A comparative

performance analysis of various CMOS design techniques for XOR and XNOR

circuits”, International Journal on Emerging Technologies 1(1): 1-10(2010) ISSN:

0975-8364.

55) D. Radhakrishnan, S.R. Whitaker, and G.K. Maki, “Formal design

procedures for pass transistor switching circuits”, IEEE J. Solid-State Circuits, 1985,

SC-20, pp. 531-536.

56) P. Gaubert, A. Teramoto, W. Cheng, and T. Ohmi, “Relation between the mobility, 1/f

noise, and channel direction in MOSFETs fabricated on (100) and (110) silicon-

oriented wafers,” IEEE Trans. Electron Devices, vol. 57, no. 7, pp. 1597–1607, Jul.

2010.

57) K. K. Hung, P. K. Ko, C. Hu, and Y. C. Cheng, “A unified model for the flicker noise

in metal-oxide-semiconductor field-effect transistors,” IEEE Trans. Electron Devices,

vol. 37, pp. 654–665, 1990.

58) Y. Tsividis, Mixed Analog-Digital VLSI Devices and Technology, Singapore: McGraw

Hill, 1st edition, 1996.

59) Behzad Razavi, “Design of Analog CMOS Integrated Circuits”, Tata McGraw Hill

Edition, 2002.

60) Amrita Oza, Poonam Kadam, “Techniques for Sub-threshold Leakage Reduction in

Low Power CMOS Circuit Designs”, International Journal of Computer Applications

(0975 – 8887), Volume 97– No.15, July 2014.

61) A. Muttreja, N. Agarwal, and N.K. Jha, “CMOS logic design with independent gate

FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.

62) S. Goel, M.A. Elgamel, M.A. Bayoumi, Y. Hanafy, “Design Methodologies for high

performance noise tolerant XOR-XNOR circuits”, IEEE Transactions on Circuits and

Systems – I: Regular Papers, Vol. 53, No. 4, 2006, pp. 867-878.

63) A. Yurdakul, “Multiplierless implementation of 2D FIR filters”, Integration: The VLSI

Journal, Vol. 38, No. 4, 2005, pp. 597-613.

64) S. B Sukhavasi, S. B Sukhasavi, V. B Madivada, H. Khan, and S. R S.Kalavakolanu,

“Implementation of low power parallel compressor for multiplier using self-resetting

logic,” International Journal of Computer Applications, vol. 47, no. 3, June, 2012.

65) V.G. Oklobdzija, D. Villeger, S.S. Liu, “A method for speed optimized partial product

reduction and generation of fast parallel multipliers using an algorithmic approach”,

IEEE Transactions on Computers, Vol. 45, No. 3, 1996.

66) P. Stelling, C. Martel, V.G. Oklobdzija, R. Ravi, “Optimal circuit for parallel

multipliers”, IEEE Transactions on Computers, Vol. 47, No. 3, 1998.

67) V.G. Oklobdzija, “High speed VLSI arithmetic unit: Adders and Multipliers”, in

“Design of High Performance Microprocessor Circuits”, Editor A. Chandrakasan, IEEE

Press, 2000.

68) J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw- Hill, 1984.

69) Naveen Kumar, Manu Bansal, Navnish Kumar, “VLSI Architecture of Pipelined Booth

Wallace MAC unit”, International Journal of Computer Application (0975-8887).

70) Fayed, Ayman A., Bayoumi, Magdy A., “A Merged Multiplier-Accumulator for high

speed signal processing applications”, IEEE International Conference on Acoustics,

Speech, and Signal Processing (ICASSP), pp 3212 -3215, 2002.

71) S. Knowles, “A Family of Adders”, Proceedings of the 15th IEEE Symposium of

Computer Arithmetic, pp. 271-281, June 2001.

72) F. Carbognani, F. Buergin, N. Felber, H. Kaeslin and W. Fichtner. A low-power

transmission-gate-based 16-bit multiplier for digital hearing aids. Analog Integrated

Circuits and Signal Processing. vol. 56 pp. 5-12 (2008).

73) P.V. Rao, C. Prasanna Raj P, and S. Ravi. Vlsi design and analysis of multipliers for

low power. Fifth IEEE International Conference on Intelligent Information Hiding and

Multimedia Signal Processing, (2009).

74) C.-Y. Han, H.-J. Park, and L.-S. Kim. A low-power array multiplier using separated

multiplication technique. IEEE Transactions on Circuits and Systems II: Analog and

Digital Signal Processing. vol. 48, pp. 866-871 (2001).

75) Avisek Sen, Partha Mitra, Debarshi Datta, “Low Power MAC Unit for DSP Processor”,

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-

3878, Volume-1, Issue-6, January 2013.

76) P.Jagadeesh, S.Ravi, Dr.Kittur Harish Mallikarjun, “Design of High Performance 64-

Bit MAC Unit”, Proceedings of IEEE International Conference on Circuits, Power and

Computing Technologies, Tamilnadu, pp.782-786, 2013.
